Large Speech Models for Acoustic Environment Monitoring
Embedded devices have limited resources yet must handle a wide range of tasks, so it is crucial to minimize the resources that machine learning applications consume. To address this, we propose a versatile audio embedding that can serve both human-speech and non-human-speech tasks. We evaluate three existing pre-trained audio embeddings: Wav2Vec, Wav2Vec2, and AST. Based on our results, Wav2Vec2 and Wav2Vec are suitable candidates. We examine these two models further by fine-tuning them on tasks outside their specialization. Through fine-tuning, we achieve a 66.0% F1-score with Wav2Vec2 and a 51.86% F1-score with Wav2Vec for environmental sound classification.
- This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
- Identifier: 119118, E-project-032124-233517
- Year: 2024
- Date created: 2024-03-21
- Source: E-project-032124-233517
Items
Title | Visibility |
---|---|
Large_Speech_Models_for_Acoustic_Environment_Monitoring.pdf | Public |
Permanent link to this page: https://digital.wpi.edu/show/js956k91r