Performing Sentiment Analysis on a Popular IMDB Dataset Using scikit-learn
KEY CONCEPTS:
Build and employ a logistic regression classifier using scikit-learn
Clean and pre-process text data
Perform feature extraction with NLTK
Tune model hyperparameters and evaluate model accuracy
PROJECT PURPOSE:
In this project-based course from the Coursera Project Network, I learned the fundamentals of sentiment analysis and built a logistic regression model that classifies movie reviews as either positive or negative. The project uses the popular IMDB dataset. The goal was to use a simple logistic regression estimator from scikit-learn for document classification.
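To make the goal concrete, here is a minimal sketch of document classification with a scikit-learn logistic regression estimator. It is not the course notebook: the four toy reviews, their labels, and the default hyperparameters are assumptions chosen only for illustration, and the real project trains on the IMDB reviews instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the IMDB reviews (1 = positive, 0 = negative).
reviews = [
    "a wonderful film with great acting",
    "great story and wonderful cast",
    "a terrible film with awful acting",
    "awful story and terrible cast",
]
labels = [1, 1, 0, 0]

# Turn each review into a TF-IDF feature vector, then fit the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(reviews, labels)

# Classify two unseen snippets.
predictions = model.predict(["wonderful and great", "terrible and awful"])
print(predictions)
```

Chaining the vectorizer and the classifier in one pipeline means new text gets the same transformation that was applied during training.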
PROJECT OUTLINE:
Task 1: Introduction and Importing the Data
Task 2: Transforming Documents into Feature Vectors
Task 3: Term Frequency-Inverse Document Frequency
Task 4: Calculate TF-IDF of the Term ‘Is’
Task 5: Data Preparation
Task 6: Tokenization of Documents
Task 7: Document Classification Using Logistic Regression
Task 8: Load Saved Model from Disk
Task 9: Model Accuracy
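Tasks 3 and 4 can be sketched by hand as follows. This assumes the smoothed IDF variant that scikit-learn documents for TfidfTransformer with smooth_idf=True, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and shows the raw score before any length normalization. The three-sentence corpus and the helper name tf_idf are hypothetical, not taken from the course.

```python
import math

# Hypothetical toy corpus, used only to illustrate the arithmetic.
docs = [
    "the sun is shining",
    "the weather is sweet",
    "the sun is shining and the weather is sweet",
]

def tf_idf(term, doc, corpus):
    """Raw TF-IDF of `term` in `doc`, using the smoothed IDF formula."""
    tf = doc.split().count(term)                       # term frequency in this document
    df = sum(1 for d in corpus if term in d.split())   # number of documents containing the term
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1   # smoothed inverse document frequency
    return tf * idf

score_is = tf_idf("is", docs[2], docs)    # "is" occurs in every document
score_sun = tf_idf("sun", docs[2], docs)  # "sun" occurs in two of three documents
print(score_is, score_sun)
```

Because "is" appears in every document, its IDF takes the minimum value of 1, so its score comes only from how often it repeats. Rarer terms such as "sun" receive a larger per-occurrence weight.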
PROJECT SCREENSHOTS:
HELPFUL LINKS: