Using BERT to Analyze a Dataset for Sentiment Analysis
KEY CONCEPTS:
What BERT is and what it can do
Clean and preprocess a text dataset
Split the dataset into training and validation sets using a stratified approach
Tokenize (encode) the dataset using the BERT tokenizer
Design a BERT fine-tuning architecture
Evaluate performance using F1 scores and accuracy
Fine-tune BERT using a training loop
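A stratified split keeps each sentiment class represented in the same proportion in both the training and validation sets, which matters when classes are imbalanced. The project most likely used scikit-learn's train_test_split with the stratify argument; the following is a minimal dependency-free sketch of the same idea (function name and parameters are illustrative, not from the project):

```python
import random
from collections import defaultdict

def stratified_split(texts, labels, val_fraction=0.15, seed=17):
    """Split (texts, labels) so that each class keeps roughly the same
    proportion in the training and validation sets."""
    # Group example indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label, indices in by_label.items():
        rng.shuffle(indices)
        # Reserve val_fraction of each class for validation.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])

    train = [(texts[i], labels[i]) for i in train_idx]
    val = [(texts[i], labels[i]) for i in val_idx]
    return train, val
```

Because the split is done per class, a dataset with 25 examples of each of four classes and val_fraction=0.2 yields exactly 5 validation examples per class.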
PROJECT PURPOSE:
This guided project from the Coursera Project Network analyzes a dataset for sentiment analysis. We learned how to load a pretrained BERT model in PyTorch and adjust its architecture for multi-class classification, and how to configure an optimizer and learning-rate scheduler for stable training and good performance. While fine-tuning the model, we also designed a train-and-evaluate loop to monitor performance during training, including saving and loading model checkpoints. The end result was a sentiment analysis model that leverages BERT’s large-scale language knowledge.
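The pieces described above (optimizer, linear-decay scheduler, gradient clipping, training loop, checkpoint save/load) fit together roughly as follows. This is a toy sketch, not the project's code: a small linear classifier over random features stands in for BERT so it runs anywhere, and the LambdaLR schedule mimics the linear decay of transformers' get_linear_schedule_with_warmup with zero warmup steps.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES, EPOCHS = 3, 2

# Stand-in for a BERT sequence-classification model (hypothetical toy data:
# 8-dimensional "embeddings" instead of token IDs).
model = nn.Linear(8, NUM_CLASSES)
X = torch.randn(64, 8)
y = torch.randint(0, NUM_CLASSES, (64,))
train_loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Learning rate decays linearly to zero over all training steps.
total_steps = len(train_loader) * EPOCHS
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1 - step / total_steps))
loss_fn = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        # Clip gradients, as is standard when fine-tuning BERT.
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()  # advance the schedule once per batch

# Save the fine-tuned weights, then reload them (as done each epoch
# in the project to keep the best checkpoint).
torch.save(model.state_dict(), "sentiment_model.pt")
model.load_state_dict(torch.load("sentiment_model.pt"))
```

In the real project the model would be a BERT classification head fine-tuned on tokenized text, and evaluation on the validation loader would run inside the epoch loop under torch.no_grad().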
PROJECT OUTLINE:
Task 1: Introduction (this section)
Task 2: Exploratory Data Analysis and Preprocessing
Task 3: Training/Validation Split
Task 4: Loading Tokenizer and Encoding our Data
Task 5: Setting up BERT Pretrained Model
Task 6: Creating Data Loaders
Task 7: Setting Up Optimizer and Scheduler
Task 8: Defining our Performance Metrics
Task 9: Creating our Training Loop
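The performance metrics from Task 8 need no ML dependencies at all; the project likely used sklearn.metrics.f1_score, but the computation can be sketched directly (function names here are illustrative):

```python
from collections import Counter

def f1_per_class(preds, labels):
    """Per-class F1 score: the harmonic mean of precision and recall."""
    scores = {}
    for cls in sorted(set(labels)):
        tp = sum(p == cls and t == cls for p, t in zip(preds, labels))
        fp = sum(p == cls and t != cls for p, t in zip(preds, labels))
        fn = sum(p != cls and t == cls for p, t in zip(preds, labels))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[cls] = (2 * precision * recall / (precision + recall)
                       if precision + recall else 0.0)
    return scores

def weighted_f1(preds, labels):
    """F1 averaged across classes, weighted by class frequency
    (equivalent to sklearn's average='weighted')."""
    counts = Counter(labels)
    scores = f1_per_class(preds, labels)
    return sum(scores[c] * counts[c] for c in scores) / len(labels)

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)
```

Per-class F1 is the more informative metric here: on an imbalanced sentiment dataset a model can reach high accuracy by always predicting the majority class while scoring near zero F1 on the rare classes.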
PROJECT SCREENSHOTS:
HELPFUL LINKS: