CB.

Applied Data Analysis and Visualization (201900027)

Completed: 24-06-2025 | 7.5 EC | Universiteit Utrecht

What I Learned

In Applied Data Analysis and Visualization (201900027), I developed practical skills in data analysis and visualization, with a focus on real-world applications in R. The course emphasized hands-on experience with statistical methods, machine learning, and visualization techniques, and was assessed through weekly homework, a group assignment, and a digital exam. Below is a breakdown of the topics covered:

Data Analysis Foundations

Exploratory Data Analysis (EDA): Learned to explore datasets using summary statistics and visualizations to uncover patterns.

Supervised Learning: Mastered techniques like linear regression, logistic regression, and K-nearest neighbors for predictive modeling.

Model Evaluation: Studied model fit, cross-validation, and error metrics (e.g., mean squared error) to assess performance.
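To make the model evaluation idea concrete, here is a minimal sketch of k-fold cross-validation scored with mean squared error. The course work itself was done in R; this version uses plain Python only to stay dependency-free. The function names, the toy data, and the choice of a one-predictor linear model are all assumptions made for illustration, not material from the course.

```python
import random


def fit_simple_linear(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_squared_error(actual, predicted):
    """Average of squared differences between observed and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average held-out MSE across k folds (hypothetical helper)."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes into the same fold.
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        intercept, slope = fit_simple_linear(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        predictions = [intercept + slope * xs[i] for i in held_out]
        scores.append(mean_squared_error([ys[i] for i in held_out], predictions))
    return sum(scores) / k


# Made-up toy data: a noisy straight line.
rng = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]
print("5-fold CV MSE:", k_fold_mse(xs, ys, k=5))
```

Because each fold is scored on points the model never saw during fitting, the averaged error is a fairer estimate of predictive performance than the error on the training data.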

Advanced Analytical Techniques

Linear Regression with Big Data: Applied subset selection and shrinkage methods (ridge regression, lasso) to handle high-dimensional datasets.

Tree-Based Methods: Explored decision trees and random forests for regression and classification tasks.

Text Mining: Learned preprocessing, sentiment analysis, and frequency analysis (e.g., TF-IDF) for text data.

Network Analysis: Studied network representations, centrality measures, and community detection using igraph.
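As a small illustration of the TF-IDF weighting mentioned under Text Mining, the sketch below computes one common variant: term frequency as a share of the document's tokens, multiplied by the natural log of the number of documents over the number of documents containing the term. The course used R; this is a plain-Python sketch, and the function name, the example sentences, and the exact weighting variant are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    Other variants (for example smoothed idf) exist; this is just one choice.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Count each term once per document.
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Made-up example documents, tokenized by whitespace.
docs = [
    "the plot shows the trend".split(),
    "the model fits the data".split(),
    "the network has hubs".split(),
]
scores = tf_idf(docs)

# Print the terms of the first document from highest to lowest weight.
for term, weight in sorted(scores[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

A word such as "the" appears in every document, so its idf is ln(1) = 0 and its weight drops to zero, while words unique to one document receive the highest weights.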

Visualization Techniques

Grammar of Graphics: Mastered ggplot2 for creating effective visualizations, including scatter plots, density plots, and labeled graphs.

Interactive Visualizations: Built RShiny apps for dynamic, user-driven data dashboards, allowing real-time data interaction.

Practical Application

Weekly Homework: Completed R-based exercises on EDA, model fitting, and visualization, graded pass/fail.

Group Assignment (Part 1): Conducted linear regression with subset selection or shrinkage methods on a dataset, creating visualizations to summarize findings.

Group Assignment (Full): Developed an RShiny app to visualize and analyze a chosen dataset, integrating supervised learning and interactive visualizations.