Investigating underrepresentation in AI datasets for cardiovascular health assessment
Cardiovascular diseases (CVDs) are the leading cause of death worldwide. With recent advances in computing, machine learning (ML) and artificial intelligence (AI) are now commonly used in CVD research. In this paper, we investigate potential underlying biases in the data used by CVD studies that leverage ML and AI. After analyzing 11 CVD datasets, we found three that included parameters for race/ethnicity and gender, all of which were demographically consistent with the US Census. However, 7 of the remaining datasets referenced neither race/ethnicity nor gender. CVDs manifest differently across racial/ethnic and gender groups, so research using datasets with unclear demographics could lead to inaccurate results. More investigation is necessary to quantify the impact of misrepresentation across demographic groups in CVD research.
- This report represents the work of one or more WPI undergraduate students submitted to the faculty as evidence of completion of a degree requirement. WPI routinely publishes these reports on its website without editorial or peer review.
- Identifier: E-project-110422-152827, 81296
- Year: 2022
- Sponsor
- UN Sustainable Development Goals
- Date created: 2022-11-04
- Source: E-project-110422-152827
- Last modified: 2022-12-19
Permanent link to this page: https://digital.wpi.edu/show/0v8383858