Machine Learning

Digital Nomads Travel Cost Prediction

A linear regression that predicts travel package costs, and a story about feature engineering. The raw dataset described packages only in text, so we scraped route-distance tables and per-airline cost-per-km tables to derive numeric features. R² rose from 0.45 to 0.72.
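The page does not show the scraped tables' schema, so the sketch below is only an illustration of how such tables could be joined onto text-only package records. It assumes hypothetical columns (`origin`, `destination`, `airline`, `distance_km`, `cost_per_km_usd`) with made-up placeholder values, and it assumes the derived feature is an estimated flight cost equal to route distance × airline cost per km. It also adds description length in words, one of the text features named in the Conclusions.

```python
import pandas as pd

# Hypothetical package records (placeholder values, not the project's data).
packages = pd.DataFrame({
    "package_id": [1, 2, 3],
    "origin": ["city_a", "city_a", "city_b"],
    "destination": ["city_x", "city_y", "city_x"],
    "airline": ["air_1", "air_2", "air_1"],
    "description": [
        "Two weeks with a coworking pass",
        "Beachfront villa and surf lessons",
        "City break",
    ],
})

# Hypothetical scraped tables: route distances and per-airline cost per km.
distances = pd.DataFrame({
    "origin": ["city_a", "city_a", "city_b"],
    "destination": ["city_x", "city_y", "city_x"],
    "distance_km": [1000.0, 2000.0, 500.0],
})
airline_rates = pd.DataFrame({
    "airline": ["air_1", "air_2"],
    "cost_per_km_usd": [0.10, 0.05],
})


def add_numeric_features(pkgs, dist, rates):
    """Join the scraped tables onto the packages and derive numeric features."""
    out = (
        pkgs.merge(dist, on=["origin", "destination"], how="left")
            .merge(rates, on="airline", how="left")
    )
    # Assumed derived feature: estimated flight cost = distance x cost per km.
    out["est_flight_cost_usd"] = out["distance_km"] * out["cost_per_km_usd"]
    # Simple text feature: description length in words.
    out["desc_len_words"] = out["description"].str.split().str.len()
    return out


features = add_numeric_features(packages, distances, airline_rates)
print(features[["package_id", "est_flight_cost_usd", "desc_len_words"]])
```

Left joins keep every package even when a route or airline is missing from the scraped tables; those rows get `NaN` features that would need imputing before fitting.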

Course: Machine Learning
Co-authors: Fernanda Avendaño, Diego Quezada

Objectives

  • Predict the cost of travel packages using linear models.
  • Identify and correct violations of the linear model's assumptions, such as multicollinearity, heteroscedasticity, and non-normal residuals.
  • Apply feature engineering using internal and external data, including NLP methods.
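One common way to check the multicollinearity assumption is the variance inflation factor, VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing feature j on all the other features. The page does not say which diagnostic the project used, so this is an assumed, NumPy-only sketch on synthetic data in which a hypothetical flight-cost feature is nearly proportional to distance.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R^2_j) for each column j of X, where R^2_j is
    from an OLS regression (with intercept) of column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Synthetic stand-in features (not the project's data).
rng = np.random.default_rng(0)
distance = rng.uniform(500, 15000, size=200)
flight_cost = 0.1 * distance + rng.normal(0, 50, size=200)  # nearly collinear with distance
nights = rng.integers(3, 30, size=200).astype(float)        # roughly independent
X = np.column_stack([distance, flight_cost, nights])
print(variance_inflation_factors(X).round(1))
```

A frequently cited rule of thumb flags VIF above about 10. Here `distance` and `flight_cost` exceed it while `nights` stays near 1.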

Conclusions

  • Feature engineering improved R² from 0.45 to 0.72, demonstrating its critical role in linear models.
  • Ridge regularization handled multicollinearity better than eliminating correlated variables, because it keeps their predictive information instead of discarding it.
  • NLP-derived features (destination sentiment, description length) contributed 15% to model performance.
  • Log-transforming the target variable corrected heteroscedasticity and improved residual normality.
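The Ridge and log-target conclusions above can be combined in a single scikit-learn model. This is a minimal sketch, not the project's code: it uses synthetic data with multiplicative noise (so the raw-scale spread grows with price), keeps a deliberately collinear pair of features, fits a standardized `Ridge` on log(price) via `TransformedTargetRegressor`, and checks residual normality on the log scale with SciPy's Shapiro-Wilk test. The feature names, `alpha=1.0`, and the data-generating process are all assumptions.

```python
import numpy as np
from scipy import stats
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (not the project's dataset).
rng = np.random.default_rng(42)
n = 500
distance_km = rng.uniform(500, 15000, n)
flight_cost = 0.1 * distance_km + rng.normal(0, 50, n)  # nearly collinear with distance
nights = rng.integers(3, 30, n).astype(float)
X = np.column_stack([distance_km, flight_cost, nights])
# Multiplicative noise: price variance grows with price (heteroscedastic on raw scale).
price = np.exp(6.0 + 0.0001 * distance_km + 0.03 * nights + rng.normal(0, 0.2, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.25, random_state=0
)

model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    func=np.log,          # the linear model is fit on log(price)
    inverse_func=np.exp,  # predictions are mapped back to the price scale
)
model.fit(X_train, y_train)
print("test R^2 (price scale):", round(model.score(X_test, y_test), 3))

# Residuals on the log scale, where the linear model was actually fit.
log_resid = np.log(y_train) - model.regressor_.predict(X_train)
print("Shapiro-Wilk p-value:", round(stats.shapiro(log_resid).pvalue, 3))
```

Wrapping the target transform in the estimator means `predict` and `score` work in the original price units, so the reported R² is comparable to a model fit on raw prices.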

Technologies

  • spaCy
  • SciPy
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn