Machine Learning
Microbiome Disease Prediction
A machine learning model that predicts diseases from patient microbiota data.
Course: Introduction to Data Science
Co-authors: Diego Quezada

Objectives
- Predict diseases from high-dimensional human microbiome data.
- Compare machine learning classifiers for small-sample, high-dimensional scenarios.
- Identify key microbial species associated with each disease.
Conclusions
- SVM with linear kernel achieved 85% accuracy, outperforming Random Forest and XGBoost in this high-dimensional scenario.
- PCA reduced 154,000+ features to 25 components while retaining 90% of variance, enabling effective model training.
- Specific bacterial species (e.g., Bacteroides, Prevotella) emerged as key biomarkers for different diseases.
- The model shows disease-specific patterns: raising the predicted probability of one disease lowers the others, because each training sample carries a single diagnosis.
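
The PCA and linear SVM combination described in the conclusions can be sketched with scikit-learn as follows. This is a minimal illustration and not the project's actual code: the data is synthetic, the shapes (120 samples, 2,000 features, 3 classes) are hypothetical stand-ins for the real microbiome table, and the number of components is chosen here by a 90% variance threshold instead of being fixed at 25.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the microbiome table (hypothetical shapes and values).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 120, 2000, 3
y = rng.integers(0, n_classes, size=n_samples)
X = rng.normal(size=(n_samples, n_features))

# Shift a small block of "species" per class so the classes are separable.
for c in range(n_classes):
    X[y == c, c * 10:(c + 1) * 10] += 2.0

# Scale, keep enough principal components to retain 90% of the variance,
# then fit a linear-kernel SVM with probability estimates enabled.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.90, svd_solver="full"),
    SVC(kernel="linear", probability=True, random_state=0),
)
model.fit(X, y)

pca = model.named_steps["pca"]
retained = pca.explained_variance_ratio_.sum()
proba = model.predict_proba(X[:5])

print("components kept:", pca.n_components_)
print("variance retained:", round(float(retained), 3))
print("row sums of class probabilities:", proba.sum(axis=1))
```

Each row of `proba` sums to 1, which is one way to see the pattern noted above: in a single-label classifier, probability mass gained by one disease has to come from the others.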
Technologies
- NumPy
- Pandas
- Matplotlib
- Plotly
- Scikit-learn
- XGBoost
- FastAPI