This project presents an in-depth analysis of hospital readmission patterns using the Hospital Readmissions Dataset, a collection of more than 100,000 inpatient records for patients with diabetes, drawn from 130 U.S. hospitals between 1999 and 2008. Its variables cover patient demographics, length of stay, diagnoses, lab results, procedures, and prescribed medications.
The primary focus of this analysis was to explore factors that may influence the likelihood of hospital readmission, with particular attention to readmissions within 30 days. The 30-day readmission rate is a key healthcare quality metric, closely tied to patient outcomes and reimbursement policies. This type of analysis can help identify at-risk populations, highlight potential disparities in care, and support better decision-making in hospital management and chronic disease intervention.
Using Python in a Jupyter Notebook environment, I:
Cleaned and transformed raw data into a usable format by handling missing values, correcting ambiguous labels, and standardizing column names for clarity.
Created a new variable for “readmission status” with easily interpretable categories: Not Readmitted, Readmitted After 30 Days, and Readmitted Within 30 Days.
Conducted exploratory data analysis (EDA) through visualizations to assess readmission patterns across age groups, racial categories, and diabetes medication usage.
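The cleaning and labeling steps above can be sketched in Pandas roughly as follows. This is a minimal sketch, not the project's actual code. It assumes that missing values are encoded as `?` and that the raw `readmitted` column uses the codes `NO`, `>30`, and `<30`, as in the UCI "Diabetes 130-US hospitals" release. The function name `clean_readmissions`, the label mapping, and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical mapping from assumed raw readmission codes to readable labels.
# Dict order defines the order of the resulting categorical.
READMISSION_LABELS = {
    "NO": "Not Readmitted",
    ">30": "Readmitted After 30 Days",
    "<30": "Readmitted Within 30 Days",
}


def clean_readmissions(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw inpatient-record table."""
    df = raw.copy()
    # Standardize column names: trim, lowercase, and collapse any run of
    # non-alphanumeric characters into a single underscore (snake_case).
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(r"[^0-9a-z]+", "_", regex=True)
        .str.strip("_")
    )
    # Assumption: "?" is a placeholder for a missing value; make it a real NaN.
    df = df.replace("?", np.nan)
    # New ordered categorical; any unrecognized code becomes NaN.
    df["readmission_status"] = pd.Categorical(
        df["readmitted"].map(READMISSION_LABELS),
        categories=list(READMISSION_LABELS.values()),
        ordered=True,
    )
    return df


# Toy rows for illustration only (not taken from the dataset).
raw = pd.DataFrame({
    "Race": ["Caucasian", "?", "AfricanAmerican"],
    "Age": ["[60-70)", "[70-80)", "[50-60)"],
    "Time In Hospital": [3, 5, 2],
    "readmitted": ["NO", "<30", ">30"],
})
clean = clean_readmissions(raw)
print(clean[["race", "time_in_hospital", "readmission_status"]])
```

Making the status an ordered categorical keeps the three labels in a fixed, meaningful order, so value counts and plots list them consistently.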
Key findings:
Patients aged 60–80 were most commonly readmitted, highlighting this group as a key demographic for intervention strategies.
Patients on diabetes medication accounted for more hospital encounters overall, possibly indicating more complex or severe conditions.
Most hospital visits came from Caucasian and African American patients, and readmissions were distributed roughly in proportion to each group's share of visits. Racial disparities in readmission could be explored further in future research.
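A raw count of readmissions can reflect a group's size as much as its risk, so comparing counts with within-group rates helps separate the two. The sketch below computes, per group, the number of records, the number of 30-day readmissions, and the 30-day readmission rate. It assumes a cleaned table with a `readmission_status` column like the one described above. The function name `readmission_summary` and the demo rows are hypothetical and made up for illustration; they are not results from this project.

```python
import pandas as pd

WITHIN_30 = "Readmitted Within 30 Days"


def readmission_summary(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Per-group record count, 30-day readmission count, and 30-day rate."""
    # Boolean flag: True when the record was readmitted within 30 days.
    within_30 = df["readmission_status"].eq(WITHIN_30)
    summary = within_30.groupby(df[group_col]).agg(
        encounters="count",   # flag has no NaN, so count equals group size
        readmitted_30d="sum",  # True counts as 1
        rate_30d="mean",       # share of the group readmitted within 30 days
    )
    return summary.sort_values("rate_30d", ascending=False)


# Made-up demo rows: two groups with the same readmission count but different rates.
demo = pd.DataFrame({
    "age": ["[60-70)", "[60-70)", "[60-70)", "[70-80)", "[70-80)", "[40-50)"],
    "readmission_status": [
        WITHIN_30, "Not Readmitted", "Not Readmitted",
        WITHIN_30, "Readmitted After 30 Days",
        WITHIN_30,
    ],
})
print(readmission_summary(demo, "age"))
```

In the demo, the `[60-70)` and `[70-80)` groups each have one 30-day readmission, but their rates differ (1/3 versus 1/2) because the groups differ in size.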
This project allowed me to apply data cleaning and visualization techniques and gave me hands-on experience with complex, real-world healthcare data. It demonstrates my ability to manage an end-to-end analysis, from raw data ingestion to communicating insights, using Pandas, Matplotlib, and Seaborn.
By identifying patterns in patient readmissions, this project reflects the kind of data-driven thinking that can support improved health outcomes, smarter hospital workflows, and more equitable patient care.