Tweet Sentiment Analysis of English Premier League Clubs
Project Summary
Abstract
Football Twitter is noisy, emotional, and context-heavy, which makes it a useful benchmark for comparing sentiment-analysis approaches beyond clean textbook datasets.
This project analyzed a two-month Kaggle dataset covering 14 English Premier League clubs and compared a fast lexicon-based baseline with a pre-trained transformer model.
VADER captured short-form polarity efficiently, while BERT offered stronger handling of semantics, ambiguity, and sarcasm in conversational sports data.
What I Built
- Used the 14-club EPL dataset as a practical benchmark for comparing lexicon-based and transformer-based sentiment analysis.
- Ran VADER as a strong, fast baseline and compared it against BERT, which captured more semantic nuance and sarcasm.
Impact
- Framed the work as a practical NLP comparison relevant to social listening and fan-reaction analysis.
- Added an NLP project grounded in real conversational data rather than synthetic examples.
Page Info
EPL Tweet Dataset
Analyzed a Kaggle dataset covering Twitter discussions around 14 English Premier League clubs across a continuous two-month window.
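The dataset's schema is not described on this page. The sketch below assumes each tweet has already been reduced to a hypothetical row with `club`, `day`, and `score` fields, and shows one way to roll sentiment up per club and per club-day so the two-month window can be compared across clubs. The field names, club labels, and values are illustrative placeholders, not data from the Kaggle set.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment_by(rows, key):
    """Average the "score" field of each row, grouped by key(row).

    rows: iterable of dicts with at least a "score" field in [-1, 1].
    key:  function mapping a row to its group label (e.g. club, or (club, day)).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row["score"])
    return {label: mean(scores) for label, scores in groups.items()}


# Hypothetical toy rows; field names and values are illustrative only.
tweets = [
    {"club": "Club A", "day": 1, "score": 0.5},
    {"club": "Club A", "day": 2, "score": -0.25},
    {"club": "Club B", "day": 1, "score": 0.75},
]

per_club = mean_sentiment_by(tweets, key=lambda r: r["club"])
per_club_day = mean_sentiment_by(tweets, key=lambda r: (r["club"], r["day"]))
print(per_club)  # {'Club A': 0.125, 'Club B': 0.75}
```

Passing the grouping as a function keeps one helper for both views; swapping the key to a week number or match day would give other slices of the same window.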

VADER Baseline
Used VADER to score polarity in short-form social-media text; its rules account for intensity cues such as slang and punctuation, which suits fast-moving fan reactions.
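In practice VADER is typically called through the `vaderSentiment` package, whose `SentimentIntensityAnalyzer().polarity_scores(text)` returns `neg`, `neu`, `pos`, and `compound` scores. To show what the compound score does without that dependency, here is a deliberately simplified, VADER-style scorer. The tiny lexicon, the all-caps and exclamation boosts, and the scoring loop are hypothetical stand-ins, far cruder than VADER's rules; the normalization x / sqrt(x² + α) with α = 15 and the ±0.05 compound thresholds follow VADER's documented defaults.

```python
import math
import re

# Hypothetical mini-lexicon (word -> valence, roughly on VADER's -4..+4 scale).
LEXICON = {"win": 2.8, "great": 3.1, "awful": -3.1, "robbed": -2.5, "defeat": -2.0}

ALPHA = 15          # VADER's default normalization constant
CAPS_BOOST = 0.733  # illustrative extra intensity for an ALL-CAPS lexicon word
EXCL_BOOST = 0.292  # illustrative extra intensity per "!" (capped at 4)


def normalize(total: float, alpha: float = ALPHA) -> float:
    """Squash an unbounded valence sum into (-1, 1), like VADER's compound score."""
    return total / math.sqrt(total * total + alpha)


def compound_score(text: str) -> float:
    """Sum lexicon valences with simple intensity boosts, then normalize."""
    total = 0.0
    for word in re.findall(r"[A-Za-z']+", text):
        valence = LEXICON.get(word.lower(), 0.0)
        if valence and word.isupper() and len(word) > 1:
            # Shouting intensifies the word in its own direction.
            valence += CAPS_BOOST if valence > 0 else -CAPS_BOOST
        total += valence
    if total:
        # Exclamation marks amplify whatever polarity is already present.
        excl = min(text.count("!"), 4) * EXCL_BOOST
        total += excl if total > 0 else -excl
    return normalize(total)


def label(compound: float) -> str:
    """Map a compound score to a label using VADER's recommended thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


for tweet in ["Great win today", "GREAT win today!!!", "Robbed again, awful ref",
              "Kick-off at 3pm", "Great, another 0-3 defeat"]:
    print(f"{tweet!r}: {label(compound_score(tweet))}")
```

The last example shows the limitation that motivates the transformer comparison: under this toy lexicon the sarcastic "Great, another 0-3 defeat" still sums to a positive score, because word-level valences cannot see that "Great" is meant ironically.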

BERT Comparison
Compared the lexicon-based baseline with a pre-trained BERT model to better capture context, semantic nuance, and sarcasm in football conversations.
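This page does not say how the two models' outputs were compared. One simple option, sketched here under that assumption, is to map both models to the same label set, measure how often they agree, and keep the disagreements for manual review, since that is where context and sarcasm effects tend to surface. The BERT labels could come from, for example, a Hugging Face `transformers` sentiment pipeline; here both label lists are hypothetical toy inputs, not results from the project.

```python
from collections import Counter


def compare_labels(tweets, vader_labels, bert_labels):
    """Return (agreement rate, confusion counts, disagreeing tweets) for two labelers."""
    if not (len(tweets) == len(vader_labels) == len(bert_labels)):
        raise ValueError("all inputs must be the same length")
    pairs = list(zip(vader_labels, bert_labels))
    confusion = Counter(pairs)  # (vader_label, bert_label) -> count
    agree = sum(v == b for v, b in pairs)
    disagreements = [(text, v, b) for text, (v, b) in zip(tweets, pairs) if v != b]
    rate = agree / len(pairs) if pairs else 0.0
    return rate, confusion, disagreements


# Hypothetical toy inputs; not outputs of the actual models.
tweets = ["Great win today", "Great, another 0-3 defeat", "Kick-off at 3pm", "Robbed again"]
vader = ["positive", "positive", "neutral", "negative"]
bert = ["positive", "negative", "neutral", "negative"]

rate, confusion, disagreements = compare_labels(tweets, vader, bert)
print(rate)           # 0.75
print(disagreements)  # [('Great, another 0-3 defeat', 'positive', 'negative')]
```

Agreement alone does not say which model is right; without gold labels, the disagreement list is mainly a starting point for inspecting cases such as sarcasm.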
