In (202400006), I’m learning to extract insights from text using text mining techniques. Here is a breakdown by topic:
Regular Expressions: Mastered pattern matching for text extraction and manipulation.
Text Preprocessing: Applied techniques like stemming, stop word removal, and tokenization to clean text data.
Feature Selection: Learned to identify and prioritize key text features for modeling.
Sentiment Analysis: Built a model to detect emotional tone in text.
Responsible Text Mining: Explored ethical considerations and applications.
Text Classification: Explored methods like logistic regression, k-nearest neighbors (KNN), and Naive Bayes for sentiment labeling.
Assignment 1 (Sentiment Classification): Classified the sentiment of a review dataset with three different models and compared their performance.
Clustering & Topic Modeling: Studied K-means clustering and Latent Dirichlet Allocation (LDA) to uncover text patterns.
Word Embeddings: Investigated text representations ranging from sparse TF-IDF vectors to dense Word2Vec embeddings and contextual BERT embeddings.
Deep Learning & LLMs: Got an introduction to neural networks and large language models for advanced text tasks.
Assignment 1: Built a reproducible sentiment classification pipeline covering preprocessing, model training, and evaluation, and submitted it as a report and code package.
Assignment 2: Compared different types of word embeddings with clustering and topic modeling techniques.
I’m developing the skills to transform raw text into structured knowledge with both supervised and unsupervised methods, as preparation for deeper work in sentiment analysis and responsible text mining.
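To give a feel for the preprocessing steps listed above (regex tokenization, stop word removal, stemming), here is a minimal sketch in plain Python. The stop word list, the suffix rules, and the function names `tokenize`, `strip_suffix`, and `preprocess` are simplifications made up for illustration. They are not the course's code, and the suffix stripping is only a crude stand-in for a real stemmer such as Porter's.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "it", "this", "of", "to"}

# Crude suffix rules standing in for a real stemmer (hypothetical, not Porter).
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lowercase the text and pull out alphabetic word tokens with a regex."""
    return re.findall(r"[a-z]+", text.lower())


def strip_suffix(token):
    """Remove the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then apply the crude suffix stripping."""
    return [strip_suffix(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The movie was amazing and the actors played it perfectly."))
# -> ['movie', 'amaz', 'actor', 'play', 'perfect']
```

The truncated stem `amaz` shows why stemming output is meant as a feature key rather than readable text. A production pipeline would more likely rely on an established stemmer or lemmatizer and a fuller stop word list.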