Cleaning the Race Column

To improve data quality, I replaced every '?' entry in the race column with NaN, a standard marker for missing values. This makes missing demographic data easy to count or filter out, and it keeps unknown values from being treated as a race category of their own. After the cleanup, I checked the distribution: the majority of patients identified as Caucasian, followed by African American and Hispanic patients.
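
The cleanup step can be sketched roughly as follows. This is a minimal illustration, assuming the data is loaded into a pandas DataFrame; the toy rows, the category spellings, and the variable names (`df`, `missing_count`, `race_counts`, `known`) are hypothetical stand-ins, not the actual dataset or the exact code used here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows standing in for the real dataset's race column.
df = pd.DataFrame(
    {"race": ["Caucasian", "?", "AfricanAmerican", "Caucasian", "Hispanic", "?"]}
)

# Replace the '?' placeholder with NaN so pandas treats it as missing.
df["race"] = df["race"].replace("?", np.nan)

# Count the missing entries.
missing_count = int(df["race"].isna().sum())

# value_counts() skips NaN by default, so unknowns are not counted as a category.
race_counts = df["race"].value_counts()

# Keep only the rows where race is known.
known = df.dropna(subset=["race"])

print(missing_count)
print(race_counts)
```

If the raw file is read with `pd.read_csv`, passing `na_values="?"` is another way to get the same result at load time, though it applies the rule to every column rather than only race.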

Source: https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008