A machine learning pipeline to classify IMDB movie reviews as positive or negative using NLP preprocessing, TF-IDF vectorization, and a Logistic Regression model.
This project builds a text sentiment analysis model using the IMDB Reviews Dataset. The pipeline involves preprocessing, vectorization, training, evaluation, and live prediction.
This project uses the IMDB Reviews Dataset.
Steps:
- Download and extract the dataset.
- Rename the file to `IMDB_Dataset.csv` if it has a different name.
- Place it in the project root directory.
⚠️ The dataset is not included in the repo due to GitHub file size limits.
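A quick way to confirm the file is in place before running the script is to load it with Pandas. The sketch below is only an illustration: the helper `load_reviews` is hypothetical, and the column names `review` and `sentiment` are an assumption about the CSV layout, not something this README specifies.

```python
from pathlib import Path

import pandas as pd


def load_reviews(csv_path="IMDB_Dataset.csv"):
    """Load the review CSV and check that the expected columns are present."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download the dataset and place it in the project root."
        )
    df = pd.read_csv(path)
    # Assumed column names; adjust them if your copy of the dataset differs.
    missing = {"review", "sentiment"} - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df
```

Calling `load_reviews()` from the project root returns a DataFrame, or raises a descriptive error if the file is missing or has a different layout.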
- Python 3
- Pandas
- Scikit-learn
- NLTK
- TF-IDF Vectorizer
- Logistic Regression
- Clean and normalize text using NLP techniques
- Convert reviews into numerical features using TF-IDF
- Train and evaluate a logistic regression model
- Save the trained model and vectorizer for reuse
- Predict the sentiment of custom reviews in real time
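The features above can be sketched end to end on a toy dataset. This is a minimal illustration, not the code in `sentiment_analysis.py`: it swaps the NLTK preprocessing for a plain regex cleaner plus scikit-learn's built-in English stop-word list so that it runs without extra downloads, it uses four made-up reviews instead of the IMDB data, and it assumes the `.pkl` files are written with `pickle`.

```python
import pickle
import re
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def clean_text(text):
    """Lowercase, strip HTML tags, and keep only letters and single spaces."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


# Toy stand-in for the IMDB reviews (made-up examples, not real data).
reviews = [
    "A wonderful, moving film.<br />Great acting!",
    "Terrible plot and awful pacing.",
    "I loved every minute of it.",
    "Boring, bad, and far too long.",
]
labels = ["positive", "negative", "positive", "negative"]

# 1. Clean and normalize the text.
cleaned = [clean_text(r) for r in reviews]

# 2. Convert the reviews into TF-IDF features.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(cleaned)

# 3. Train a logistic regression classifier.
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# 4. Save both artifacts; a temporary directory is used here so the
#    sketch does not overwrite any real files in the project root.
out_dir = Path(tempfile.mkdtemp())
with open(out_dir / "sentiment_model.pkl", "wb") as f:
    pickle.dump(model, f)
with open(out_dir / "tfidf_vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)

# 5. Predict a custom review: clean it, transform it with the fitted
#    vectorizer (never refit), then classify.
new_review = clean_text("Awful and boring!")
print(model.predict(vectorizer.transform([new_review]))[0])
```

The vectorizer is saved alongside the model because a new review must be mapped into the same vocabulary and IDF weights that the model was trained on.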
| Metric | Score |
|---|---|
| Accuracy | 85.13% |
| F1-Score | 85% |
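For reference, both metrics can be computed with `sklearn.metrics`. The snippet below uses six made-up labels purely to show the calls; it does not reproduce the scores in the table, and whether the reported F1 is the positive-class score or an average over both classes is not stated here, so the `pos_label` choice is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up true and predicted labels, for illustration only.
y_true = ["positive", "positive", "negative", "negative", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "negative", "positive", "positive"]

# Accuracy: fraction of predictions that match the true label.
accuracy = accuracy_score(y_true, y_pred)

# F1: harmonic mean of precision and recall for the chosen positive class.
f1 = f1_score(y_true, y_pred, pos_label="positive")

print(f"Accuracy: {accuracy:.2%}")
print(f"F1-Score: {f1:.2%}")
```

On a real run, `y_true` would be the held-out test labels and `y_pred` the model's predictions for the same reviews.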
text-sentiment-analysis/
├── IMDB_Dataset.csv
├── sentiment_analysis.py
├── sentiment_model.pkl
├── tfidf_vectorizer.pkl
└── README.md
- Clone the repo:
  `git clone https://github.com/ahsankhizar5/text-sentiment-analysis.git`
  `cd text-sentiment-analysis`
- Install dependencies:
  `pip install -r requirements.txt`
- Run the script:
  `python sentiment_analysis.py`
- Enter your own review for live prediction!
📝 Try your own review:
Enter a movie review: this seems to be bad one
Predicted Sentiment: Negative 😞
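A session like the one above can also be reproduced from the saved artifacts without retraining. The following is a hedged sketch, not the script's actual code: `load_artifacts` and `predict_review` are hypothetical helpers, the files are assumed to be plain `pickle` dumps, the model is assumed to output the labels `positive` and `negative`, and the positive emoji is a guess because only the negative case is shown in the sample.

```python
import pickle
from pathlib import Path

# Assumed label-to-display mapping; only the negative form appears in the sample session.
DISPLAY = {"negative": "Negative 😞", "positive": "Positive 😊"}


def load_artifacts(directory="."):
    """Unpickle the saved model and vectorizer from the given directory."""
    directory = Path(directory)
    with open(directory / "sentiment_model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(directory / "tfidf_vectorizer.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer


def predict_review(review, model, vectorizer):
    """Vectorize one review with the fitted vectorizer and format the predicted label."""
    label = str(model.predict(vectorizer.transform([review]))[0]).lower()
    # Fall back to the raw label if the model uses a different encoding.
    return f"Predicted Sentiment: {DISPLAY.get(label, label)}"


# Example usage from the project root:
#   model, vectorizer = load_artifacts()
#   print(predict_review(input("Enter a movie review: "), model, vectorizer))
```

If the review text was cleaned before training, the same cleaning should be applied to the input before calling `predict_review`. Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.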
MIT License
For queries or collaboration, feel free to reach out to Ahsan Khizar on GitHub or LinkedIn.
“Code is not just about solving problems. It’s about building trust, clarity, and real-world impact, one line at a time.” - Ahsan Khizar