
Hospital Acquired Infection Analysis

This project analyzes healthcare-associated infections across U.S. hospitals, leveraging statistical and machine learning techniques to identify high-risk facilities, benchmark performance against national averages, and uncover infection trends by type and location. These insights aim to support data-driven infection prevention strategies.

The analysis draws on a dataset of more than 21,000 rows and 18 variables, including infection types (e.g., C. difficile, MRSA), infection scores, observed and predicted case counts, and geographic data. Together these give a facility-level view of healthcare-associated infections (HAIs) that can be compared across states and against national benchmarks.

Key objectives included:

  • Identifying facilities with the highest infection rates.

  • Exploring infection patterns across different states.

  • Comparing specific infection types to national benchmarks to evaluate performance.

To achieve these objectives, the following steps were taken:

  • Exploratory Data Analysis (EDA): Investigated infection score distributions, identified high-risk states, and visualized trends across facilities and infection types.

  • Principal Component Analysis (PCA): Reduced dimensionality to focus on key factors driving infection trends and to identify high-performing and underperforming states.

  • K-means Clustering: Grouped facilities into clusters based on infection performance to uncover patterns and similarities across regions and infection types.

  • Predictive Modeling: Used machine learning (Random Forest Classifier) to classify facilities as high-risk or low-risk based on infection scores and other features.

  • Benchmarking Against National Averages: Analyzed deviations from national infection score benchmarks to highlight areas needing improvement.

  • State-Level Analysis: Conducted detailed comparisons of infection rates for high-risk states, focusing on the most concerning infection types, such as C. difficile.
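
The PCA, K-means, and Random Forest steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: it assumes scikit-learn and NumPy, uses synthetic data in place of the real dataset, and the feature names, the choice of three clusters, and the 1.0 high-risk threshold are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the facility table (the real data is not reproduced here).
rng = np.random.default_rng(0)
n_facilities = 300
observed = rng.poisson(lam=8, size=n_facilities).astype(float)  # observed cases
predicted = rng.uniform(4.0, 12.0, size=n_facilities)           # predicted cases
score = observed / predicted  # hypothetical ratio-style infection score
X = np.column_stack([observed, predicted, score])

# Standardize so no single feature dominates PCA or the K-means distances.
X_scaled = StandardScaler().fit_transform(X)

# PCA: project the standardized features onto two principal components.
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# K-means: group facilities in the reduced space (k=3 is an arbitrary choice).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(components)

# Random Forest: label a facility high-risk when its score exceeds an assumed
# threshold of 1.0, and train on the raw counts only so the label is not
# handed to the model directly as a feature.
high_risk = (score > 1.0).astype(int)
features = np.column_stack([observed, predicted])
X_train, X_test, y_train, y_test = train_test_split(
    features, high_risk, test_size=0.25, random_state=0, stratify=high_risk
)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("cluster sizes:", np.bincount(cluster_labels))
print(f"held-out accuracy: {accuracy:.2f}")
```

On real data, the number of components and clusters would normally be chosen from the explained variance and a cluster-quality measure rather than fixed in advance.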

Results

  • Disproportionately high C. difficile infection rates were observed in Washington, D.C., Massachusetts, and New York.

  • K-means clustering revealed patterns in facility performance, grouping similar hospitals for targeted interventions.

  • The Random Forest classifier accurately separated high-risk from low-risk facilities, giving infection management teams a way to prioritize where to intervene.
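
A state-level comparison against a national benchmark, of the kind behind the C. difficile finding, might look like the sketch below. It uses only the Python standard library; the record layout, the placeholder state names, and every number are made up for illustration and are not results from this project.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (state, infection type, facility score, national benchmark).
# All values are invented placeholders, not figures from the dataset.
records = [
    ("State A", "C. difficile", 1.30, 1.00),
    ("State A", "C. difficile", 1.10, 1.00),
    ("State B", "C. difficile", 1.15, 1.00),
    ("State B", "MRSA", 0.90, 1.00),
    ("State C", "C. difficile", 0.85, 1.00),
    ("State C", "MRSA", 0.95, 1.00),
]


def deviation_by_state(rows, infection_type):
    """Return the mean (score - benchmark) per state for one infection type."""
    deviations = defaultdict(list)
    for state, kind, score, benchmark in rows:
        if kind == infection_type:
            deviations[state].append(score - benchmark)
    return {state: mean(values) for state, values in deviations.items()}


cdiff = deviation_by_state(records, "C. difficile")

# Sort so the states furthest above the benchmark come first.
ranked = sorted(cdiff.items(), key=lambda item: item[1], reverse=True)
for state, delta in ranked:
    print(f"{state}: {delta:+.2f}")
```

A positive deviation means the state's facilities score above the benchmark on average for that infection type; whether that is a meaningful gap depends on how many facilities contribute to each mean.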

This project shows how dimensionality reduction, clustering, and classification can be combined to address a concrete healthcare problem, and it provides actionable insights for improving infection prevention strategies.


