This project explores how various houseplants remove volatile organic compounds (VOCs) from the air. The dataset records how much benzene, formaldehyde, and trichloroethylene (TCE) each plant removed, alongside leaf surface area measurements. My goal was to determine which plants are most effective at purifying indoor air and whether leaf size correlates with purification efficiency.
I cleaned and merged three separate datasets:
Benzene Removal
TCE Removal
Formaldehyde Removal
From the merged table I calculated Total VOC Removed (µg) and VOC Efficiency (µg removed per cm² of leaf surface area).
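A minimal pandas sketch of the merge and the two derived columns follows. It assumes each compound lives in its own table keyed by plant name. The column names (`common_name`, `leaf_area_cm2`, `benzene_ug`, and so on) and all values are hypothetical placeholders, not the project's data. Total VOC Removed is assumed to be the plain sum of the three per-compound removals.

```python
import pandas as pd

# Hypothetical per-compound tables; real file and column names are not given in the write-up.
# The benzene table is assumed to carry the leaf surface area column.
benzene = pd.DataFrame({
    "common_name": ["Plant A", "Plant B", "Plant C"],
    "leaf_area_cm2": [800.0, 2000.0, 500.0],
    "benzene_ug": [500.0, 800.0, 100.0],
})
tce = pd.DataFrame({
    "common_name": ["Plant A", "Plant B", "Plant C"],
    "tce_ug": [200.0, 400.0, 50.0],
})
formaldehyde = pd.DataFrame({
    "common_name": ["Plant A", "Plant B", "Plant C"],
    "formaldehyde_ug": [300.0, 800.0, 100.0],
})

# Inner joins on the shared plant name keep only plants present in all three tables.
merged = (
    benzene.merge(tce, on="common_name", how="inner")
           .merge(formaldehyde, on="common_name", how="inner")
)

# Assumption: Total VOC Removed is the sum of the three per-compound removals (µg).
removal_cols = ["benzene_ug", "tce_ug", "formaldehyde_ug"]
merged["total_voc_ug"] = merged[removal_cols].sum(axis=1)

# VOC Efficiency normalizes the total by leaf surface area (µg/cm²).
merged["efficiency_ug_per_cm2"] = merged["total_voc_ug"] / merged["leaf_area_cm2"]

print(merged[["common_name", "total_voc_ug", "efficiency_ug_per_cm2"]])
```

An inner join silently drops any plant missing from one table. `how="outer"` would keep every plant, but `sum(axis=1)` skips NaN by default, so a missing compound would count as zero removal unless handled explicitly.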
To highlight the best purifiers, I ranked the plants two ways:
Top 10 by Total VOCs
Top 10 by Efficiency
Gerbera Daisy ranked highest in both total removal and efficiency, followed by Peace Lily and Mother-in-law’s Tongue.
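The two top-10 rankings can be sketched with `DataFrame.nlargest`. The rows below are hypothetical placeholders, chosen so that the total and efficiency rankings disagree. This shows why ranking both ways can matter.

```python
import pandas as pd

# Hypothetical merged table; values are placeholders, not results from the dataset.
merged = pd.DataFrame({
    "common_name": ["Plant A", "Plant B", "Plant C", "Plant D"],
    "total_voc_ug": [1000.0, 2000.0, 250.0, 1500.0],
    "efficiency_ug_per_cm2": [1.25, 1.0, 0.5, 0.75],
})

TOP_N = 10  # the write-up uses a top 10; with fewer rows, nlargest returns all of them

# nlargest sorts descending by the given column and keeps at most TOP_N rows.
top_total = merged.nlargest(TOP_N, "total_voc_ug")
top_efficiency = merged.nlargest(TOP_N, "efficiency_ug_per_cm2")

print(top_total[["common_name", "total_voc_ug"]])
print(top_efficiency[["common_name", "efficiency_ug_per_cm2"]])
```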
Each VOC was visualized separately to identify standout plants for:
Benzene
TCE
Formaldehyde
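The per-compound charts are not reproduced here. As a hedged sketch, `DataFrame.idxmax` picks out the standout plant for each compound, which is what those charts are meant to highlight. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical per-compound removals (µg), indexed by plant; placeholders only.
removals = pd.DataFrame({
    "common_name": ["Plant A", "Plant B", "Plant C"],
    "benzene_ug": [500.0, 800.0, 100.0],
    "tce_ug": [200.0, 150.0, 50.0],
    "formaldehyde_ug": [300.0, 800.0, 900.0],
}).set_index("common_name")

# For each compound column, idxmax returns the index label (plant) with the largest removal.
standouts = removals.idxmax()
print(standouts)
```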
Descriptive Statistics
I computed summary statistics to get an overview of how VOC removal is distributed across plants. I also built a correlation matrix to assess the relationships between leaf surface area and removal amounts.
Screenshot: Correlation Heatmap
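A sketch of the summary statistics and correlation matrix with pandas follows, on hypothetical placeholder columns. The heatmap itself would be drawn from `corr` with a plotting library; that step is omitted here.

```python
import pandas as pd

# Hypothetical numeric columns; values are placeholders, not the project's measurements.
df = pd.DataFrame({
    "leaf_area_cm2": [800.0, 2000.0, 500.0, 1200.0],
    "benzene_ug": [500.0, 800.0, 100.0, 600.0],
    "tce_ug": [200.0, 400.0, 50.0, 300.0],
    "formaldehyde_ug": [300.0, 800.0, 100.0, 500.0],
})

# describe() reports count, mean, std, min, quartiles, and max for each column.
summary = df.describe()

# corr() computes pairwise Pearson correlations by default.
corr = df.corr()

print(summary)
print(corr.loc["leaf_area_cm2"])  # how leaf area relates to each removal column
```

With few plants and occasional outliers, `df.corr(method="spearman")` is a reasonable rank-based cross-check on the Pearson values.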
Group Comparisons
I split the plants into Small Leaf and Large Leaf groups and compared them with:
Boxplots
Histograms
Violin Plots
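The write-up does not state the cutoff between Small Leaf and Large Leaf plants. This sketch assumes a median split on leaf surface area, which is a common choice but only an assumption here. All rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows; placeholders, not the project's data.
df = pd.DataFrame({
    "common_name": ["Plant A", "Plant B", "Plant C", "Plant D"],
    "leaf_area_cm2": [800.0, 2000.0, 500.0, 1200.0],
    "efficiency_ug_per_cm2": [1.25, 1.0, 0.5, 0.75],
})

# Assumption: plants above the median leaf area are "Large Leaf", the rest "Small Leaf".
cutoff = df["leaf_area_cm2"].median()
df["leaf_group"] = df["leaf_area_cm2"].gt(cutoff).map(
    {True: "Large Leaf", False: "Small Leaf"}
)

# Per-group summary of efficiency, the quantity the boxplots and violin plots compare.
print(df.groupby("leaf_group")["efficiency_ug_per_cm2"].describe())
```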
T-Test
I used an independent t-test to compare VOC efficiency between small and large leaf plants.
T-statistic: 1.357
P-value: 0.189
Interpretation: There is no statistically significant difference in VOC efficiency between the leaf-size groups (p = 0.189 > 0.05), although small-leaf plants showed slightly higher variance and occasional outliers.
Screenshot: T-Test Output
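The independent t-test can be sketched with SciPy's `scipy.stats.ttest_ind`. The write-up does not say whether equal variances were assumed, so both Student's test (the SciPy default) and Welch's test are shown. The group values are invented placeholders and will not reproduce t = 1.357 or p = 0.189.

```python
from scipy import stats

# Hypothetical efficiency values (µg/cm²) per group; placeholders, not the project's data.
small_leaf = [1.25, 0.5, 1.6, 0.9, 2.1]
large_leaf = [1.0, 0.75, 0.8, 0.95, 0.7]

# Student's two-sample t-test with pooled variance (equal_var=True is the default).
student = stats.ttest_ind(small_leaf, large_leaf)

# Welch's t-test drops the equal-variance assumption.
welch = stats.ttest_ind(small_leaf, large_leaf, equal_var=False)

alpha = 0.05
for name, result in [("Student", student), ("Welch", welch)]:
    verdict = "significant" if result.pvalue < alpha else "not significant"
    print(f"{name}: t = {result.statistic:.3f}, p = {result.pvalue:.3f} ({verdict})")
```

Because small-leaf plants showed higher variance, Welch's variant is the more cautious choice. With equal group sizes, as here, the two t-statistics coincide. Only the degrees of freedom differ, and so does the p-value: Welch's is never smaller.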
To scale the analysis:
I loaded the merged dataset into PostgreSQL.
Ran SQL queries to retrieve:
Top 5 purifiers
VOC efficiency rankings
This integration made my analysis more dynamic and gave me practice working with real-world database queries.
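The two queries can be sketched in standard SQL. To keep the sketch runnable without a database server, it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL. The table name `plant_voc`, its columns, and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE plant_voc ("
    " common_name TEXT PRIMARY KEY,"
    " total_voc_ug REAL NOT NULL,"
    " efficiency_ug_per_cm2 REAL NOT NULL)"
)

# Placeholder rows, not results from the dataset.
rows = [
    ("Plant A", 1000.0, 1.25),
    ("Plant B", 2000.0, 1.0),
    ("Plant C", 250.0, 0.5),
    ("Plant D", 1500.0, 0.75),
    ("Plant E", 1800.0, 1.1),
    ("Plant F", 400.0, 0.6),
]
conn.executemany("INSERT INTO plant_voc VALUES (?, ?, ?)", rows)

# Top 5 purifiers by total VOC removed.
top5 = conn.execute(
    "SELECT common_name, total_voc_ug FROM plant_voc "
    "ORDER BY total_voc_ug DESC LIMIT 5"
).fetchall()

# Efficiency ranking via a window function (PostgreSQL, and SQLite 3.25 or later).
ranking = conn.execute(
    "SELECT common_name, efficiency_ug_per_cm2, "
    "RANK() OVER (ORDER BY efficiency_ug_per_cm2 DESC) AS efficiency_rank "
    "FROM plant_voc ORDER BY efficiency_rank"
).fetchall()
conn.close()

print(top5)
print(ranking)
```

Against PostgreSQL, the same SQL would run through a driver such as psycopg. Its parameter placeholder is `%s` rather than sqlite3's `?`.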
Gerbera Daisy removed the most VOCs in total and had the highest removal efficiency in the dataset.
Leaf size (Small Leaf vs. Large Leaf group) did not significantly affect VOC removal efficiency in this dataset (p = 0.189).
PostgreSQL integration improved query performance and reproducibility.
| Term | Definition |
|---|---|
| Common Name | The everyday name of each plant, used for readability and general reference. |
| Leaf Surface Area (cm²) | The total surface area of a plant's leaves in square centimeters. Used to normalize VOC removal. |
| Total VOC Removed (µg) | The total mass, in micrograms, of volatile organic compounds removed by the plant. |
| VOC Efficiency (µg/cm²) | VOCs removed per unit of leaf surface area. Higher values indicate a more efficient plant. |
| Initial (ppm) | Starting pollutant concentration (usually in soil or air), in parts per million. |
| Final (ppm) | Ending concentration after the experiment period, in parts per million. |
| Percent Removed | Percentage reduction in pollutant concentration relative to the initial value. Useful for comparison. |
| Soil Bacterial Counts (cfu/g) | Colony-forming units per gram of soil, an indicator of microbial activity in the soil. |
| Condition | The experimental setup (e.g., full foliage, empty chamber) that defines the exposure or treatment. |
| Exposure Condition | The environment or treatment applied to the sample (plant, soil, or air) during testing. |
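The derived quantities in the glossary can be written out explicitly. This is a sketch under two assumptions. The total is assumed to be the sum of the three per-compound removals. Percent Removed is assumed to be the standard relative reduction, which the glossary describes but does not spell out.

```latex
\begin{aligned}
\text{Total VOC Removed} &= m_{\text{benzene}} + m_{\text{TCE}} + m_{\text{formaldehyde}} \quad (\mu\text{g}) \\
\text{VOC Efficiency} &= \frac{\text{Total VOC Removed}}{A_{\text{leaf}}} \quad (\mu\text{g/cm}^2) \\
\text{Percent Removed} &= \frac{C_{\text{initial}} - C_{\text{final}}}{C_{\text{initial}}} \times 100
\end{aligned}
```

For example, with illustrative numbers, a concentration falling from 10 ppm to 4 ppm gives (10 − 4) / 10 × 100 = 60% removed.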
Interior Landscape Plants for Indoor Air Pollution Abatement. NASA Technical Reports Server (NTRS).