
Module description - Statistical Learning
(Statistisches Lernen)

Number stl
ECTS 4.0
Specification Find the optimal f for y = f(x) by means of statistics
Level Advanced
Content Many statisticians argue that Data Science and Machine Learning are just new names for statistics. The discussion of this statement is left to the students, but machine learning is really not much more than fitting a function to a training data set, with the hope that the recovered function also generalizes to test data.

Statistical Learning deals with estimating a function f that optimally solves the regression or classification problem y = f(x). This module explores different families of functions for f and, in particular, how they compare with one another in terms of an error or performance measure and, ultimately, which one is best suited to the problem under consideration. Throughout the module, all of this is done taking into account the limited size of the available sample.
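As a rough illustration of comparing function families by an error measure, the following Python sketch fits two simple families to the same training data and evaluates both on held-out data. The synthetic data, the 40/20 split, and the helper names (`fit_constant`, `fit_line`, `mse`) are assumptions made for this example and are not part of the module material.

```python
import random

def fit_constant(xs, ys):
    """Family 1: f(x) = c, fitted by the training mean of y."""
    c = sum(ys) / len(ys)
    return lambda x: c

def fit_line(xs, ys):
    """Family 2: f(x) = a + b*x, fitted by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def mse(f, xs, ys):
    """Mean squared error of the fitted function f on the given data."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumed for illustration): y = 1 + 2x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Limited sample: 40 points for training, 20 held out for testing.
train_x, test_x = xs[:40], xs[40:]
train_y, test_y = ys[:40], ys[40:]

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line)]:
    f = fit(train_x, train_y)
    results[name] = (mse(f, train_x, train_y), mse(f, test_x, test_y))
    print(f"{name}: train MSE = {results[name][0]:.3f}, "
          f"test MSE = {results[name][1]:.3f}")
```

The test error, not the training error, is what indicates how well the recovered function generalizes.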
Learning outcomes LE1: Theoretical foundations of STL
Students are able to formulate the regression and classification problems and their optimal solutions in statistical terms. They understand the difference between parametric and non-parametric function families, know suitable measures for assessing goodness of fit and are familiar in particular with the bias-variance tradeoff.

LE2: Linear regression
Students understand regression parameters as statistical quantities and can include categorical variables, interactions between variables, and non-linear relationships in regression problems. They are aware of the limitations of using the linear regression method.

LE3: Classification problems
Students will be familiar with the best-known approaches to solving classification problems (logistic regression, linear discriminant analysis (LDA), Naive Bayes) and will be able to apply them to appropriate data sets.

LE4: Generalized Linear Models (GLMs)
Students understand GLMs as a generalization of the classical regression model. They know the application areas of frequently used link functions and can use them to model suitable data sets.

LE4: Resampling
Students can use cross-validation (CV) and the bootstrap to assess statistically how a limited sample affects performance measures.
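A minimal Python sketch of both resampling ideas, assuming a simple linear model and synthetic data: k-fold cross-validation estimates the test error from a single limited sample, and the bootstrap estimates the standard error of the fitted slope. The function names (`fit_line`, `kfold_mse`, `bootstrap_slope_se`) and all numeric settings are hypothetical choices for this illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def kfold_mse(xs, ys, k, rng):
    """k-fold CV: average held-out MSE over k disjoint folds."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold) / len(fold)
        errors.append(fold_mse)
    return sum(errors) / k

def bootstrap_slope_se(xs, ys, n_boot, rng):
    """Bootstrap: refit on samples drawn with replacement and
    return the standard deviation of the resulting slopes."""
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        sample = [rng.randrange(n) for _ in range(n)]
        _, b = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        slopes.append(b)
    mean = sum(slopes) / n_boot
    return (sum((s - mean) ** 2 for s in slopes) / (n_boot - 1)) ** 0.5

# Synthetic data (assumed for illustration): y = 1 + 2x + Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(50)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

cv_mse = kfold_mse(xs, ys, k=5, rng=rng)
slope_se = bootstrap_slope_se(xs, ys, n_boot=500, rng=rng)
print(f"5-fold CV estimate of test MSE: {cv_mse:.3f}")
print(f"bootstrap standard error of the slope: {slope_se:.4f}")
```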

LE5: Model Selection
Students can select the best model from a group of models using various selection criteria (subset selection, AIC, BIC, adjusted R²) and taking into account the limited size of the sample.
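The criteria named above can be sketched in Python for two nested least-squares models. The sketch assumes Gaussian errors, so AIC and BIC are computed from the residual sum of squares up to an additive constant shared by all models; the synthetic data and the helper names (`rss_constant`, `rss_line`, `criteria`) are assumptions made for this example.

```python
import math
import random

def rss_constant(ys):
    """Residual sum of squares of the intercept-only model."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def rss_line(xs, ys):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return syy - sxy ** 2 / sxx

def criteria(rss, tss, n, p):
    """Selection criteria for a model with p fitted coefficients
    (intercept included). AIC and BIC: lower is better.
    Adjusted R^2: higher is better."""
    aic = n * math.log(rss / n) + 2 * p
    bic = n * math.log(rss / n) + p * math.log(n)
    adj_r2 = 1 - (rss / (n - p)) / (tss / (n - 1))
    return {"AIC": aic, "BIC": bic, "adj_R2": adj_r2}

# Synthetic data (assumed for illustration): y = 1 + 0.5x + Gaussian noise.
rng = random.Random(2)
n = 40
xs = [rng.uniform(0, 10) for _ in range(n)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

tss = rss_constant(ys)  # total sum of squares equals the constant model's RSS
table = {
    "constant": criteria(rss_constant(ys), tss, n, p=1),
    "line": criteria(rss_line(xs, ys), tss, n, p=2),
}
for name, row in table.items():
    print(name, {key: round(value, 3) for key, value in row.items()})
```

BIC penalizes each additional coefficient by log(n) rather than 2, so for samples with more than about 7 observations it favours smaller models than AIC does.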

LE6: Non-linear regression
Students will recognize the applications of nonlinear regression and, in particular, be able to fit polynomial regression, splines, local regression, and generalized additive models (GAMs) to data.
Evaluation Mark
Built on the following competences Probability Modelling (WER), Exploratory Data Analysis (EDA), Foundation in Linear Algebra (GLA), Foundation in Calculus (GAN), Linear and Logistic Regression (LLR).
Module type Portfolio Module