To ensure data quality, I replaced all '?' entries in the race column with NaN, a standard placeholder for missing values. This allows for better handling in analysis — for instance, I can now count or filter out missing demographic data without misclassifying unknowns. After this cleanup, I confirmed the distribution: the majority of patients identified as Caucasian, followed by African American and Hispanic populations.