Using BERT to Analyze a Dataset for Sentiment Analysis

KEY CONCEPTS:

What BERT is and what it can do

Clean and preprocess a text dataset

Split the dataset into training and validation sets using a stratified approach

Tokenize (encode) the dataset using the BERT tokenizer (see the sketch after this list)

Design a BERT fine-tuning architecture

Evaluate performance using F1 scores and accuracy

Fine-tune BERT using a training loop
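
To make the split and encoding steps concrete, here is a minimal sketch (not the notebook's exact code) using scikit-learn and the Hugging Face transformers tokenizer. The DataFrame `df` with its `text`/`label` columns, the split size, and `max_length` are illustrative assumptions:

```python
import torch
from sklearn.model_selection import train_test_split
from transformers import BertTokenizer

# Assumed: a pandas DataFrame `df` with 'text' and 'label' columns (hypothetical names).
X_train, X_val, y_train, y_val = train_test_split(
    df['text'].values,
    df['label'].values,
    test_size=0.15,              # placeholder split size
    random_state=17,
    stratify=df['label'].values, # stratified split keeps class proportions in both sets
)

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

# Encode the training texts into padded input IDs and attention masks.
encoded_train = tokenizer.batch_encode_plus(
    list(X_train),
    add_special_tokens=True,
    return_attention_mask=True,
    padding='max_length',
    truncation=True,
    max_length=256,              # placeholder sequence length
    return_tensors='pt',
)
input_ids_train = encoded_train['input_ids']
attention_masks_train = encoded_train['attention_mask']
labels_train = torch.tensor(y_train)
```

The validation texts would be encoded the same way with the same tokenizer settings so that training and evaluation inputs match.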

PROJECT PURPOSE:

The purpose of this guided project from the Coursera Project Network was to analyze a dataset for sentiment analysis. We learned how to load a pretrained BERT model in PyTorch and adjust its architecture for multi-class classification, and how to configure an optimizer and learning-rate scheduler for stable training and good performance. While fine-tuning the model, we also learned how to design a training and evaluation loop that monitors model performance as it trains, including saving and loading model checkpoints. The end result was a sentiment analysis model that leverages BERT's large-scale pretrained language knowledge.
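
As a rough illustration of the pieces described above (not the project's exact code), the multi-class BERT model, optimizer, and scheduler can be wired together along these lines with the transformers library. The number of labels, learning rate, epoch count, and `dataloader_train` are placeholder assumptions:

```python
import torch
from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup

# Pretrained BERT with a multi-class classification head on top.
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=6,                # placeholder: number of sentiment classes in the dataset
    output_attentions=False,
    output_hidden_states=False,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, eps=1e-8)

epochs = 10                                      # placeholder
steps_per_epoch = len(dataloader_train)          # assumes a DataLoader over the encoded data
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=steps_per_epoch * epochs, # linear decay over all training steps
)
```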

PROJECT OUTLINE:

Task 1: Introduction (this section)

Task 2: Exploratory Data Analysis and Preprocessing

Task 3: Training/Validation Split

Task 4: Loading Tokenizer and Encoding our Data

Task 5: Setting up BERT Pretrained Model

Task 6: Creating Data Loaders

Task 7: Setting Up Optimizer and Scheduler

Task 8: Defining our Performance Metrics

Task 9: Creating our Training Loop (Tasks 8 and 9 are sketched below)
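
The tasks above follow a fairly standard PyTorch fine-tuning workflow. Below is a condensed, hypothetical sketch of Tasks 8 and 9 (a weighted F1 metric plus a train/evaluate loop with per-epoch checkpointing); it assumes the `model`, `optimizer`, `scheduler`, `epochs`, and train/validation DataLoaders (`dataloader_train`, `dataloader_val`) from the sketches above:

```python
import numpy as np
import torch
from sklearn.metrics import f1_score

def f1_score_func(preds, labels):
    """Weighted F1 across all classes, computed from raw logits."""
    preds_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return f1_score(labels_flat, preds_flat, average='weighted')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

for epoch in range(1, epochs + 1):
    # Training pass
    model.train()
    for batch in dataloader_train:   # assumed batches of (input_ids, attention_mask, labels)
        optimizer.zero_grad()
        input_ids, attention_mask, labels = (t.to(device) for t in batch)
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping
        optimizer.step()
        scheduler.step()

    # Save a checkpoint each epoch so the best model can be reloaded later.
    torch.save(model.state_dict(), f'finetuned_BERT_epoch_{epoch}.model')

    # Validation pass
    model.eval()
    preds, true_vals = [], []
    with torch.no_grad():
        for batch in dataloader_val:
            input_ids, attention_mask, labels = (t.to(device) for t in batch)
            outputs = model(input_ids, attention_mask=attention_mask)
            preds.append(outputs.logits.detach().cpu().numpy())
            true_vals.append(labels.cpu().numpy())
    val_f1 = f1_score_func(np.concatenate(preds), np.concatenate(true_vals))
    print(f'Epoch {epoch}: validation weighted F1 = {val_f1:.3f}')
```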

PROJECT SCREENSHOTS:

Screen Shot 2020-10-01 at 10 03 33 AM

Screen Shot 2020-10-01 at 10 38 44 AM

COURSERA PROJECT LINK

PROJECT GOOGLE DRIVE

MY COURSE CERTIFICATE

Screen Shot 2020-10-01 at 10 35 14 AM