Quantized Mixture-of-LoRA-Experts for Low-Cost Training of Large Language Models
This project explores combining Mixture-of-Experts (MoE) architectures with Quantized Low-Rank Adaptation (QLoRA) to enable efficient training of large language models on consumer hardware. The proposed QLoRA-MoE approach trains multiple QLoRA adapters as experts on different data subsets and merges them during inference using various routing strategies. The goal is to assess whether these hybrid models can outperform standard QLoRA and LoRA while requiring less compute than full fine-tuning. Experimental results demonstrate that, with appropriate hyperparameters, QLoRA-MoE can reliably outperform standard QLoRA and LoRA. This suggests that combining quantization with mixture-of-experts techniques is a promising direction for efficient and effective adaptation of large language models, although further research is needed to optimize the routing strategies and expert architectures. This work highlights the potential of combining MoE with parameter-efficient methods for more accessible development of high-quality models.
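To make the approach described in the abstract concrete, the sketch below shows one way a mixture of LoRA experts can sit on top of a frozen base projection, with a learned softmax router weighting each expert's contribution. This is a minimal illustration, not the project's actual implementation: the class names (`LoRAExpert`, `MoLoRALinear`), the dense softmax routing, and the hyperparameters are assumptions, and the base layer is simply frozen here rather than 4-bit quantized as in QLoRA.

```python
# Minimal sketch (assumed, not the report's code): a softmax-routed mixture of
# LoRA experts added to a frozen base linear layer. In a real QLoRA-MoE setup
# the base weights would be 4-bit quantized; here they are only frozen.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One low-rank adapter: delta(x) = B(A(x)) * (alpha / r)."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.A = nn.Linear(in_features, r, bias=False)   # down-projection
        self.B = nn.Linear(r, out_features, bias=False)  # up-projection
        nn.init.zeros_(self.B.weight)                     # adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.B(self.A(x)) * self.scale


class MoLoRALinear(nn.Module):
    """Frozen base projection plus a softmax-weighted mixture of LoRA experts."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, r: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # only adapters and router are trainable
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, r=r) for _ in range(num_experts)
        )
        self.router = nn.Linear(base.in_features, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = F.softmax(self.router(x), dim=-1)                        # (..., E)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (..., out, E)
        mixed = (expert_out * gates.unsqueeze(-2)).sum(dim=-1)           # (..., out)
        return self.base(x) + mixed


# Tiny usage example on a stand-in projection layer.
layer = MoLoRALinear(nn.Linear(64, 64), num_experts=4, r=8)
y = layer(torch.randn(2, 10, 64))
print(y.shape)  # torch.Size([2, 10, 64])
```

In this sketch each expert would be trained on its own data subset, as the abstract describes, and the router (or a simpler merging rule such as uniform averaging) decides how the experts are combined at inference time.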
- This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
- Creator
- Publisher
- Identifier: 122516; E-project-050424-181903
- Advisor
- Year: 2024
- Date created: 2024-05-04
- Resource type
- Major
- Source: E-project-050424-181903
- Rights statement
- Last modified: 2024-05-17
Relationships
- In Collection:

Items

| Title | Visibility | Embargo Release Date |
|---|---|---|
| MQP_Report_Template-5.pdf | Public | |
Permanent link to this page: https://digital.wpi.edu/show/sn00b309x