CB.

Text Mining: Transforming Text into Knowledge (202400006)

Completed: 10-04-2025 | 7.5 EC | Universiteit Utrecht

What I Learned

In Text Mining: Transforming Text into Knowledge (202400006), I learned to extract insights from text using advanced text mining techniques. Below is a breakdown of what I learned:

Text Processing Foundations

Regular Expressions: Mastered pattern matching for text extraction and manipulation.
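To illustrate pattern matching for extraction and manipulation, here is a minimal sketch using Python's standard `re` module. The sample text and all patterns are hypothetical examples and were not taken from the course. The e-mail pattern in particular is deliberately simplified.

```python
import re

# Hypothetical review text, used only for illustration
text = "Great phone!! Rated 5/5 on 2024-11-03. Contact: support@example.com"

# Extraction: ISO-style dates (YYYY-MM-DD)
dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)

# Extraction: the digit in a rating of the form "N/5" (the capture group is returned)
ratings = re.findall(r"\b(\d)/5\b", text)

# Manipulation: collapse repeated punctuation such as "!!" into a single mark
normalized = re.sub(r"([!?.])\1+", r"\1", text)

# Manipulation: mask e-mail addresses (simplified pattern, not RFC-compliant)
masked = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "<EMAIL>", normalized)

print(dates)    # ['2024-11-03']
print(ratings)  # ['5']
print(masked)   # Great phone! Rated 5/5 on 2024-11-03. Contact: <EMAIL>
```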

Text Preprocessing: Applied techniques such as tokenization, stop-word removal, and stemming to clean text data.
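The three preprocessing steps can be chained as shown below. This is a self-contained sketch under stated assumptions. The stop-word list is a tiny hypothetical one. The `stem` function is a crude suffix stripper that stands in for a real stemmer such as Porter's (for example, NLTK's `PorterStemmer`), so it is only an approximation.

```python
import re

# Tiny hypothetical stop-word list; real lists (e.g. NLTK's) are much longer
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "this", "of", "to"}

# Crude suffixes to strip; a stand-in for a proper stemmer, not the Porter algorithm
SUFFIXES = ("ing", "ly", "ed", "es", "s")

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("The battery was lasting longer and charging quickly!"))
# ['battery', 'last', 'longer', 'charg', 'quick']
```

The non-word stem "charg" is typical of suffix stripping. Lemmatization would return a dictionary form instead.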

Feature Selection: Learned to identify and prioritize key text features for modeling.
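One common way to prioritize features is to score each term by its chi-squared association with the class label and keep the top k. The summary does not say which selection criterion the course used, so chi-squared is an assumed example here. The corpus below is also hypothetical.

```python
# Hypothetical toy corpus: each document is a set of tokens with a binary label (1 = positive)
docs = [
    ({"great", "battery"}, 1),
    ({"great", "screen"}, 1),
    ({"love", "battery"}, 1),
    ({"poor", "battery"}, 0),
    ({"poor", "screen"}, 0),
    ({"broken", "screen"}, 0),
]

def chi_square(term, docs):
    """Chi-squared statistic for term presence vs. class label (2x2 contingency table)."""
    a = sum(1 for toks, y in docs if term in toks and y == 1)      # present, positive
    b = sum(1 for toks, y in docs if term in toks and y == 0)      # present, negative
    c = sum(1 for toks, y in docs if term not in toks and y == 1)  # absent, positive
    d = sum(1 for toks, y in docs if term not in toks and y == 0)  # absent, negative
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def select_top_k(docs, k):
    """Rank all vocabulary terms by chi-squared score (ties broken alphabetically)."""
    vocab = set().union(*(toks for toks, _ in docs))
    return sorted(vocab, key=lambda t: (-chi_square(t, docs), t))[:k]

print(select_top_k(docs, 2))  # ['great', 'poor']
```

Both "great" and "poor" score highly because chi-squared measures the strength of an association with the label. It does not capture the direction of that association.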

 

Sentiment Analysis: Built a model to analyze the emotional tone of text.

Responsible Text Mining: Explored the ethical considerations involved in text mining and its applications.

Supervised Learning for Text

Text Classification: Explored methods such as logistic regression, k-nearest neighbors (KNN), and Naive Bayes for sentiment labeling.
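Of the three methods, Naive Bayes is the simplest to write from scratch. Below is a minimal sketch of multinomial Naive Bayes with Laplace (add-one) smoothing on a hypothetical tokenized training set. It is illustrative only and does not reproduce the course's models. Logistic regression and KNN are not shown.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Fit multinomial Naive Bayes: log class priors and add-one-smoothed log likelihoods."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    log_prior = {c: math.log(k / n_docs) for c, k in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab}
    return log_prior, log_lik

def predict_nb(model, tokens):
    """Return the label with the highest log posterior; tokens outside the vocabulary are ignored."""
    log_prior, log_lik = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical, already-preprocessed training reviews
train = [
    (["great", "battery", "love"], "pos"),
    (["great", "screen"], "pos"),
    (["poor", "battery"], "neg"),
    (["broken", "screen", "poor"], "neg"),
]
model = train_nb(train)
print(predict_nb(model, ["great", "screen"]))  # pos
print(predict_nb(model, ["poor", "screen"]))   # neg
```

The model works in log space so that multiplying many small probabilities does not underflow.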

Assignment 1: Sentiment Classification: Classified the sentiment of reviews in a dataset using three different models and compared their performance.
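Comparing models requires computing the same metrics on the same held-out labels. The sketch below computes accuracy and macro-averaged F1 in plain Python. The assignment's actual metrics are not stated, so these two are assumptions. The model names and predictions are hypothetical placeholders, not results from the assignment.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, macro-averaged F1) over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return accuracy, sum(f1_scores) / len(f1_scores)

# Hypothetical gold labels and predictions from three placeholder models (not real results)
y_true = ["pos", "pos", "neg", "neg", "pos", "neg"]
predictions = {
    "model_a": ["pos", "pos", "neg", "pos", "pos", "neg"],
    "model_b": ["pos", "neg", "neg", "neg", "neg", "neg"],
    "model_c": ["pos", "pos", "neg", "neg", "pos", "neg"],
}
for name, y_pred in predictions.items():
    acc, macro_f1 = evaluate(y_true, y_pred)
    print(f"{name}: accuracy={acc:.2f}, macro-F1={macro_f1:.2f}")
```

Macro-F1 weights each class equally. This makes it more informative than accuracy alone when the review labels are imbalanced.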

 

Unsupervised Learning and Embeddings

Clustering & Topic Modeling: Studied K-means clustering and Latent Dirichlet Allocation (LDA) to uncover text patterns.
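The core of k-means is short: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until no assignment changes. Below is a minimal sketch of Lloyd's algorithm on hypothetical 2-D vectors. For determinism it uses the first k points as initial centroids. Library implementations typically use k-means++ initialization and multiple restarts instead. LDA is not shown.

```python
def kmeans(points, k, n_iter=100):
    """Lloyd's k-means using the first k points as initial centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean distance)
        new_assignment = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical 2-D document vectors (e.g. after dimensionality reduction), for illustration only
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```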

Word Embeddings: Compared sparse TF-IDF vectors with dense embeddings from Word2Vec and BERT for representing text meaning.
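For the sparse end of this comparison, TF-IDF weighting and cosine similarity fit in a few lines. This sketch makes two assumptions. It uses raw counts for tf and the unsmoothed idf = log(N / df); libraries such as scikit-learn's `TfidfVectorizer` use smoothed variants by default, so their numbers will differ. The documents are also hypothetical. Dense Word2Vec or BERT embeddings would require external libraries and are not shown.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * idf} dict (idf = log(N / df))."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical tokenized reviews
docs = [
    ["battery", "life", "great"],
    ["battery", "life", "poor"],
    ["screen", "great"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]), cosine(vecs[1], vecs[2]))
```

The first two reviews share two terms and come out most similar. The last pair shares no terms and scores 0.0. Unlike dense embeddings, TF-IDF cannot tell that "great" and "excellent" are related. A term that occurs in every document also gets an idf of zero under this formula.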

Deep Learning & LLMs: Received an introduction to neural networks and large language models for advanced text tasks.

 

Practical Application

Assignment 1: Built a reproducible pipeline for sentiment classification, including preprocessing, model training, and evaluation, submitted as a report and code package.
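A key ingredient of a reproducible pipeline is a data split that comes out the same on every run. Here is a minimal sketch that does this with a local, fixed-seed random generator. `train_test_split` here is a hypothetical helper written for illustration. It is not scikit-learn's function of the same name, and it does not stratify by class.

```python
import random

def train_test_split(items, test_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle a copy with a fixed seed and return (train, test) lists."""
    shuffled = list(items)                 # copy, so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed gives the same order
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

reviews = list(range(10))  # stand-in for ten review records
train, test = train_test_split(reviews, test_fraction=0.2, seed=7)
print(len(train), len(test))  # 8 2
```

Python only guarantees that seeded sequences from `random()` stay stable across versions. To be safe, record the Python version alongside the seed, or save the split itself.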

Assignment 2: Compared different types of word embeddings in combination with clustering and topic modeling techniques.

 

I developed the skills to transform raw text into structured knowledge, with a focus on both supervised and unsupervised methods, building a foundation for deeper work in sentiment analysis and responsible text mining.