Pruning and Optimizing Large Language Models in an Era of GPU Scarcity

Ashhadul Islam*, Samir Brahim Belhaouari, Amine Bermak

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The increasing computational and environmental costs of AI models, especially large language models (LLMs), highlight the urgent need for network optimization. These models consume vast amounts of energy and resources, requiring more efficient training strategies that balance performance with ecological responsibility. We focus on enhancing the efficiency of deep neural networks on embedded devices through two novel pruning techniques: “evolution of weights” and “smart pruning.” Unlike traditional pruning approaches, against which we compare on benchmark datasets, these methods evaluate parameter importance during training to better preserve accuracy under compression. Our approach yields faster computation and higher compression rates with minimal accuracy loss. We have successfully applied these techniques to LLMs of around 10 million parameters. The LLM experiment is publicly available on GitHub to facilitate replication testing.
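The abstract's core idea, scoring parameter importance as the weights evolve during training rather than pruning only after convergence, can be sketched as follows. The paper's exact scoring rule is not given here, so this illustration assumes a simple accumulated |weight × gradient| saliency on a toy linear-regression task; names and thresholds are illustrative only.

```python
# Hedged sketch of importance-aware pruning: saliency is accumulated
# *during* training (in the spirit of "evolution of weights"), then the
# least-important weights are zeroed out. The |w * grad| score is an
# assumed stand-in for the paper's unstated criterion.
import numpy as np

rng = np.random.default_rng(0)

# Tiny regression problem: only the first 4 of 16 weights matter.
X = rng.normal(size=(256, 16))
w_true = np.zeros(16)
w_true[:4] = rng.normal(size=4)
y = X @ w_true

w = rng.normal(scale=0.1, size=16)
importance = np.zeros(16)          # evolves alongside the weights
lr = 0.05

for _ in range(200):
    grad = X.T @ (X @ w - y) / len(X)
    importance += np.abs(w * grad)  # accumulate saliency each step
    w -= lr * grad

# Prune the 75% of weights with the lowest accumulated importance.
k = int(0.75 * w.size)
w[np.argsort(importance)[:k]] = 0.0

sparsity = float((w == 0).mean())   # fraction of pruned weights
```

Because the saliency history covers the whole training trajectory, weights that were only transiently large are ranked below weights that stayed influential, which is the intuition behind scoring during training rather than from the final snapshot alone.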

Original language: English
Title of host publication: Artificial Intelligence and Applications, ICAI 2024
Editors: H. R. Arabnia, L. Deligiannidis, S. Amirian, F. Shenavarmasouleh, F. G. Mohammadi, D. DeLaFuente
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 145-153
Number of pages: 9
Volume: 2252
ISBN (Electronic): 978-3-031-86623-4
ISBN (Print): 978-3-031-86622-7
DOIs
Publication status: Published - 3 May 2025
Event: 26th International Conference on Artificial Intelligence and Applications, ICAI 2024, held as part of the World Congress in Computer Science, Computer Engineering and Applied Computing, CSCE 2024 - Las Vegas, United States
Duration: 22 Jul 2024 - 25 Jul 2024

Publication series

Name: Communications in Computer and Information Science

Conference

Conference: 26th International Conference on Artificial Intelligence and Applications, ICAI 2024, held as part of the World Congress in Computer Science, Computer Engineering and Applied Computing, CSCE 2024
Country/Territory: United States
City: Las Vegas
Period: 22/07/24 - 25/07/24

Keywords

  • LLM Optimization
  • Network Pruning
  • Sparse Matrices
