ITERA-LLM: Boosting sub-8-bit Large Language Model inference via iterative tensor decomposition
File(s)
2505.08981v1.pdf (2.7 MB)
Preprint
Author(s)
Zheng, Keran
Huang, Yinting
Yu, Zhewen
Bouganis, Christos-Savvas
Type
preprint
Date Issued
2025-05-13
Citation
arXiv, 2025
Journal / Book Title
arXiv
Copyright Statement
Copyright © 2025 The Author(s). This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/).
Identifier
http://arxiv.org/abs/2505.08981v1
Subjects
cs.AR