Fine-Tuning Open-Source Large Language Models for Generating Math Explanations
Percy Liang’s article, “We have No Moat,” reveals that open-source large language models (LLMs) with 7 billion parameters are able to rival those of large tech companies with 500 billion parameters. Open-source LLMs have also become more accessible and easier to fine-tune with the rise of open-source resources like Hugging Face. The goal of this project was to use prompt engineering and fine-tuning to find and evaluate open-source LLMs that could potentially match the performance of OpenAI’s GPT-3.5. We aim to help ASSISTments, a non-profit organization that focuses on middle-school math education, develop open-source LLMs so that it can move from tedious and somewhat inaccurate hand-written explanations to streamlined, automatically generated ones. Open-source LLMs are more cost-effective than GPT-3.5 and more time-efficient than writing explanations by hand. ASSISTments has already started integrating LLMs into its website, and our focus was on improving the explanation-generating LLMs. Using a framework of prompt engineering and fine-tuning, we tested and evaluated the effectiveness of many models in writing accurate math explanations. During prompt engineering, we double-blinded the responses for each prompt before evaluating them, which allowed us to score each response in an unbiased manner. Through an iterative process, we saw up to 80% improvement with our best prompts compared to prompting the LLM with only a labeled question-answer pair. From fine-tuning, we determined that we were unable to significantly improve WizardMath’s mathematical reasoning, but fine-tuning was highly effective in producing consistently formatted answers, which made the explanations more readable than those of the base WizardMath. This framework was ultimately used to compare the performance of three LLMs in generating explanations for ASSISTments questions.
We found that the fine-tuned model outperformed the base model by about 5%, while GPT-3.5 outperformed the base model by roughly 45%. Our results show promise for using LLMs to generate accurate and readable explanations. Furthermore, our fine-tuning and prompt engineering framework can be applied in other fields that integrate LLMs in order to optimize their performance.
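As a rough illustration of the blinded scoring described in the abstract, the sketch below shuffles responses, hides which prompt produced each one, and restores the mapping only after scoring. It is a minimal sketch under stated assumptions, not the project's actual pipeline: the function names, the item ID format, the numeric scoring scale, and the relative-improvement formula are all hypothetical, since the abstract does not say how its percentages were computed.

```python
import random
from collections import defaultdict


def blind(responses, seed=0):
    """Shuffle (prompt_id, text) pairs and replace prompt IDs with opaque item IDs.

    Returns the blinded items for the rater and a key for unblinding later.
    All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    order = list(range(len(responses)))
    rng.shuffle(order)
    blinded, key = [], {}
    for n, idx in enumerate(order):
        item_id = f"item-{n:03d}"
        prompt_id, text = responses[idx]
        blinded.append((item_id, text))  # the rater sees only this
        key[item_id] = prompt_id         # kept away from the rater
    return blinded, key


def unblind(scores, key):
    """Map rater scores back to prompts and return the mean score per prompt."""
    by_prompt = defaultdict(list)
    for item_id, score in scores.items():
        by_prompt[key[item_id]].append(score)
    return {p: sum(v) / len(v) for p, v in by_prompt.items()}


def relative_improvement(candidate, baseline):
    """Percent change of a mean score over a baseline (assumed formula)."""
    return (candidate - baseline) / baseline * 100.0
```

In use, a rater would score the `blinded` items without access to `key`, and the scores would be passed to `unblind` only once all ratings are recorded.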
- This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
- Creator
- Publisher
- Identifier
- E-project-022824-174239
- 117968
- Keyword
- Advisor
- Year
- 2024
- Date created
- 2024-02-28
- Resource type
- Major
- Source
- E-project-022824-174239
- Rights statement
Relations
- In Collection:
Content
Items
Title | Visibility | Embargo Release Date | Actions |
---|---|---|---|
Fine-Tuning_Open-Source_LLMs_to_Generate_Math_Explanations_Project_Report__2_.pdf | Public | | Download |
Fine-Tuning_Open-Source_LLMs_to_Generate_Math_Explanations_Project_Poster.pptx | Public | | Download |
Permanent link to this page: https://digital.wpi.edu/show/cr56n536x