Machine Learning
Digital Nomads Travel Cost Prediction
A linear regression that predicts travel package costs, and a story about feature engineering. The raw dataset described packages with text only, so distance and per-airline cost-per-km tables were scraped to derive numeric features. R² rose from 0.45 to 0.72.
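
As a rough illustration of that feature-engineering step, the sketch below joins text-only package records to two lookup tables and derives a numeric flight-cost estimate. Every column name, city, airline and number here is a made-up placeholder, not the project's actual data or schema.

```python
import pandas as pd

# Hypothetical package records: text fields only, as in the raw dataset.
packages = pd.DataFrame({
    "package_id": [1, 2, 3],
    "origin": ["Lisbon", "Lisbon", "Berlin"],
    "destination": ["Bali", "Medellin", "Bali"],
    "airline": ["AirA", "AirB", "AirA"],
})

# Hypothetical scraped lookup tables (values invented for illustration).
distances = pd.DataFrame({
    "origin": ["Lisbon", "Lisbon", "Berlin"],
    "destination": ["Bali", "Medellin", "Bali"],
    "distance_km": [13000.0, 7600.0, 11500.0],
})
cost_per_km = pd.DataFrame({
    "airline": ["AirA", "AirB"],
    "cost_per_km": [0.08, 0.10],
})

# Left joins keep every package; validate= guards against duplicated lookup keys.
features = (
    packages
    .merge(distances, on=["origin", "destination"], how="left", validate="many_to_one")
    .merge(cost_per_km, on="airline", how="left", validate="many_to_one")
)

# Derived numeric feature: route distance times the airline's cost per km.
features["est_flight_cost"] = features["distance_km"] * features["cost_per_km"]
```

A left join leaves `NaN` in the derived column for any route or airline missing from the scraped tables, which makes coverage gaps easy to spot before modelling.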

Objectives
1. Predict the cost of travel packages using linear models.
2. Identify and correct non-compliance with theoretical requirements of linear models.
3. Apply feature engineering using internal and external information, including NLP methods.
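
One common way to check a linear model's requirements, as the second objective asks, is the variance inflation factor (VIF), which flags multicollinearity. The sketch below is an assumption about how such a check could look, not the project's own diagnostic. The helper `variance_inflation_factors` and the synthetic features are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R^2_j), regressing column j on all other columns."""
    X = np.asarray(X, dtype=float)
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic stand-ins: flight cost is almost a linear function of distance.
rng = np.random.default_rng(0)
distance = rng.uniform(500, 12000, size=200)
flight_cost = 0.09 * distance + rng.normal(0, 20, size=200)
nights = rng.integers(2, 15, size=200).astype(float)

X = np.column_stack([distance, flight_cost, nights])
vifs = variance_inflation_factors(X)  # order: distance, flight_cost, nights
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic collinearity. Here the two distance-related columns land far above that range, while `nights` stays close to 1.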
Conclusions
- Feature engineering improved R² from 0.45 to 0.72, demonstrating its critical role in linear models.
- Ridge regularization handles multicollinearity better than variable elimination, preserving predictive information.
- NLP-derived features (destination sentiment, description length) contributed 15% to model performance.
- Log-transforming the target variable corrected heteroscedasticity and improved residual normality.
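
The Ridge and log-target conclusions above can be combined in one scikit-learn estimator. This is a minimal sketch on synthetic data, assuming a `log1p`/`expm1` target transform and an arbitrary `alpha=1.0`. The project's actual features, transform and hyperparameters are not stated in the source.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins with two nearly collinear predictors.
rng = np.random.default_rng(42)
n = 500
distance = rng.uniform(500, 12000, size=n)
flight_cost = 0.09 * distance + rng.normal(0, 20, size=n)
nights = rng.integers(2, 15, size=n).astype(float)
X = np.column_stack([distance, flight_cost, nights])

# Multiplicative noise makes the spread grow with price (heteroscedasticity).
y = (300 + flight_cost + 60 * nights) * np.exp(rng.normal(0, 0.2, size=n))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Ridge keeps both collinear columns and shrinks their coefficients;
# the wrapper fits on log1p(y) and maps predictions back with expm1.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_train, y_train)

pred = model.predict(X_test)      # predictions on the original cost scale
r2 = model.score(X_test, y_test)  # R² computed on the original cost scale
```

Wrapping the transform in `TransformedTargetRegressor` keeps predictions and scores on the original cost scale, so results stay comparable with a model fitted on the untransformed target.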
Technologies
- spaCy
- SciPy
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn