Machine Learning

Microbiome Disease Prediction

A machine learning model that predicts diseases from patient microbiota data.

Course: Introduction to Data Science
Co-authors: Diego Quezada

Objectives

  1. Predict diseases from high-dimensional human microbiome data.
  2. Compare machine learning classifiers for small-sample, high-dimensional scenarios.
  3. Identify key microbial species associated with each disease.
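
As a rough illustration of the second objective, the sketch below compares a linear-kernel SVM with a Random Forest using stratified cross-validation. It runs on synthetic data from scikit-learn's make_classification, not on the project's microbiome dataset, so the sample count, feature count, hyperparameters, and resulting scores are all assumptions chosen for illustration. XGBoost is left out only to keep the example dependent on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a small-sample, high-dimensional dataset
# (assumption: 120 samples, 2,000 features, 3 classes).
X, y = make_classification(
    n_samples=120, n_features=2000, n_informative=20,
    n_classes=3, random_state=0,
)

# Candidate classifiers; hyperparameters are illustrative, not the project's.
candidates = {
    "linear_svm": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified folds keep class proportions stable, which matters with few samples.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

mean_accuracy = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in candidates.items()
}

for name, score in mean_accuracy.items():
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into preprocessing.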

Conclusions

  • The SVM with a linear kernel achieved 85% accuracy, outperforming Random Forest and XGBoost in this high-dimensional scenario.
  • PCA reduced 154,000+ features to 25 components while retaining 90% of variance, enabling effective model training.
  • Specific bacterial taxa (e.g., the genera Bacteroides and Prevotella) emerged as key biomarkers for different diseases.
  • The model's predicted probabilities are coupled across diseases: raising the probability of one disease lowers the others, because each patient in the training data has a single diagnosis.
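
The conclusions above can be sketched as a single scikit-learn pipeline: standardize, project onto 25 principal components, then fit a linear-kernel SVM. This is a minimal sketch on synthetic data, not the project's code. The dataset shape, the scaling step, and the use of SVC(probability=True) to obtain class probabilities are assumptions, and the retained variance on synthetic data will not match the 90% reported for the real features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the microbiome feature matrix (assumption:
# 150 samples, 1,000 features, 3 disease classes).
X, y = make_classification(
    n_samples=150, n_features=1000, n_informative=15,
    n_classes=3, random_state=0,
)

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=25, random_state=0),  # 25 components, as in the text
    SVC(kernel="linear", probability=True, random_state=0),
)
model.fit(X, y)

pca = model.named_steps["pca"]
svm = model.named_steps["svc"]

# Fraction of total variance kept by the 25 components.
retained = pca.explained_variance_ratio_.sum()

# Each row of class probabilities sums to 1, so raising one class's
# probability necessarily lowers the others.
proba = model.predict_proba(X[:5])
row_sums = proba.sum(axis=1)

# Map SVM weights from component space back to the (standardized) original
# features. SVC is one-vs-one, so there is one row per pair of classes.
feature_weights = svm.coef_ @ pca.components_
top_features = np.argsort(np.abs(feature_weights), axis=1)[:, ::-1][:, :10]

print(f"variance retained: {retained:.2%}")
print("probability row sums:", np.round(row_sums, 6))
print("top feature indices, first class pair:", top_features[0])
```

Ranking original features by the magnitude of these back-projected weights is one plausible way to surface candidate biomarkers; the source does not state which method the project used.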

Technologies

  • NumPy
  • Pandas
  • Matplotlib
  • Plotly
  • Scikit-learn
  • XGBoost
  • FastAPI