NLP
Movie Sentiment Analysis
Comparative study of ML methods and neural networks for sentiment classification of movie reviews.
Course: Introduction to Data Science

Objectives
- Classify sentiment in movie reviews, comparing traditional ML methods with neural networks.
- Compare text vectorization methods: bag of words, TF-IDF, and word embeddings.
- Analyze the effectiveness of regularization techniques (dropout) in reducing overfitting.
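To make the vectorization comparison concrete, here is a minimal sketch that trains the same Logistic Regression classifier on bag-of-words and TF-IDF features using scikit-learn. The toy reviews, labels, and the helper name `build_classifier` are hypothetical and chosen only for illustration; they are not the project's dataset or code.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reviews (1 = positive, 0 = negative), not the project's data.
reviews = [
    "a wonderful, moving film with great acting",
    "great story and wonderful characters",
    "I loved this movie, truly great",
    "a boring, terrible film with awful acting",
    "terrible plot and boring characters",
    "I hated this movie, truly awful",
]
labels = [1, 1, 1, 0, 0, 0]


def build_classifier(vectorizer):
    """Chain a text vectorizer with a Logistic Regression classifier."""
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))


# Same classifier, two different text representations.
bow_model = build_classifier(CountVectorizer())     # raw term counts
tfidf_model = build_classifier(TfidfVectorizer())   # counts reweighted by rarity

bow_model.fit(reviews, labels)
tfidf_model.fit(reviews, labels)

new_reviews = ["wonderful movie, great acting", "boring and awful film"]
bow_pred = bow_model.predict(new_reviews)
tfidf_pred = tfidf_model.predict(new_reviews)

print("Bag of words:", bow_pred.tolist())
print("TF-IDF:      ", tfidf_pred.tolist())
```

On a corpus this small the two representations behave the same; any difference between them would only show up on a realistic dataset with a held-out test split.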
Conclusions
- Logistic Regression with TF-IDF achieved 88% accuracy, comparable to LSTM networks while being less complex.
- Word embeddings capture semantic relationships but require more data to outperform TF-IDF on this dataset.
- Dropout at rates of 0.3-0.5 reduces LSTM overfitting by 10-15%, as measured by validation accuracy.
- Traditional ML models generalize better on limited data, while neural networks require larger datasets.
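As an illustration of the dropout finding, the sketch below builds a small LSTM classifier in Keras with a dropout layer before the output. Everything about the architecture is an assumption made for the example: the vocabulary size, sequence length, layer widths, the helper name `build_lstm`, and the rate of 0.4 (picked from within the 0.3-0.5 range reported above). It is not the network used in the study.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed hyperparameters, for illustration only.
VOCAB_SIZE = 10_000   # number of distinct token ids
SEQ_LEN = 200         # reviews padded or truncated to this many tokens
DROPOUT_RATE = 0.4    # within the 0.3-0.5 range mentioned above


def build_lstm(dropout_rate=DROPOUT_RATE):
    """Embedding -> LSTM -> Dropout -> sigmoid output for binary sentiment."""
    model = keras.Sequential([
        keras.Input(shape=(SEQ_LEN,), dtype="int32"),
        layers.Embedding(VOCAB_SIZE, 64),
        layers.LSTM(64),
        # Randomly zeroes a fraction of the LSTM outputs during training only.
        layers.Dropout(dropout_rate),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_lstm()

# Random token ids stand in for two tokenized reviews.
batch = np.random.randint(0, VOCAB_SIZE, size=(2, SEQ_LEN))
infer_out = model(batch, training=False).numpy()  # dropout inactive at inference
print(infer_out.shape)
```

In practice the effect of dropout would be assessed by calling `model.fit` with validation data and comparing the training and validation accuracy curves across different values of `dropout_rate`.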
Technologies
- NumPy
- Matplotlib
- Scikit-learn
- TensorFlow
- Keras