Teaching

Thesis Projects (Bachelor and Masters)

 

See the thesis project page.

(Summer 2025) Bridging Theory and Practice: Reliable ML in Data Corruption Scenarios

Are you curious about what happens to machine learning when the data goes wrong? In a field dominated by large models and scaling breakthroughs, one crucial aspect has often been overlooked: data!

This seminar takes you back to the roots of machine learning, the data, to examine the threats posed by data corruption. Using tools from learning theory and statistics, we will explore how data corruption impacts applications in medicine, social systems, and fairness-aware machine learning.

Topics we will cover

To make this seminar interactive and collaborative, we have designed an engaging Data Corruption Bingo game:

  • Bingo board: the table below, with rows representing types of data corruption and columns representing application areas. We will provide some references for each Bingo board entry as kick-starters for your work. Treat them as starting points for your research, not as an exhaustive reading list. You will ultimately decide which papers to include in your presentation and essay!

  • Your task: find a group partner and pick a horizontal or vertical line on the board, choosing one of the following paths:

    • Study one type of corruption across all three application fields;

    • Study at least 3 types of corruption within one application field;

    • Study another interesting combination.

  • We must discuss and approve your choice.

    Types of corruption   | Medicine systems | Social systems | Fairness-aware ML
    Sample selection bias |                  |                |
    Label noise           |                  |                |
    Missing attributes    |                  |                |
    Adversarial examples  |                  |                |
    Concept drift         |                  |                |
    Distribution shift    |                  |                |

Seminar structure

The seminar will begin with an introductory lecture covering the fundamentals of data corruption and the relevant tools from learning theory and statistics. Students will then form groups and select a topic line of interest. Each group will conduct in-depth research on its chosen topic and prepare a 30-minute talk, followed by a discussion of at most 30 minutes. All the students will present their findings on the same date.

After the presentations, each group will submit a technical report summarizing their findings by the indicated deadline. The report should adhere to the JMLR format, and be no more than 6 pages. Grading will be based on both the presentation and the final report. 

Goal

The goal of this seminar is to explore the growing body of research on data corruption across different application fields, which spans both theoretical and practical aspects. We do not expect students to adopt a single perspective. Instead, we encourage you to explore what is known about the topic you are studying and ask your own questions: Which problems related to the topic would you like to think about, and which approaches (applied, theoretical, or philosophical) would you use? For the presentation and essay, we expect you to critically review the current state of the chosen topic and to discuss open questions, challenges, and any possible solutions you have identified.

Practical Information

Dates

  • Kick-off lecture: April 25th, 10:00 - 13:00 (3rd floor meeting room at Maria-von-Linden-Strasse 6)
  • Presentations: June 5th to 6th (meeting room TBD at Maria-von-Linden-Strasse 6)
  • Final report deadline: July 6th (tentative, open for discussion with participating students)
  • Students will have the opportunity for one-on-one discussions with the instructors to refine and develop concrete research directions. Office hours will be announced during the first lecture.

Registration: send an email to nan.lu[at]uni-tuebingen.de and laura.iacovissi[at]uni-tuebingen.de (please include both addresses).

Capacity: Limited to 20 participants (10 groups of 2).

Who can join? This course is at the master's level, and students from the Machine Learning Master's program will be preferentially admitted. Master's students in Computer Science, Media Informatics, Medical Informatics, and Bioinformatics can also join the course if they have sufficient background knowledge in Statistical Machine Learning, Probabilistic Machine Learning, and Deep Learning.

 

(Summer 2024) A Zoo of Non-Standard Learning Problems

In the block seminar "A Zoo of Non-Standard Learning Problems" we will explore lesser-known learning problems. In the vast landscape of machine learning, many problems remain overshadowed by mainstream research despite their practical significance and theoretical richness. This seminar aims to shed light on these neglected territories, focusing primarily on supervised learning settings.

Topics:

  1. Superset Learning
  2. Multi-Instance Learning
  3. Label Ranking
  4. Ordinal Regression
  5. Learning with Soft Labels
  6. Multi-Label Learning
  7. Anomaly Detection
  8. Learning with Heavy-Tailed Class Distribution
  9. Learning with Interval (Imprecise) Input Data
  10. Learning with Coherent Risk Measures as Aggregation Functionals
  11. Learning with Option to Reject
  12. Learning with Option to Defer to Human

If you are interested in joining the block seminar, send an email to rabanus.derr[at]uni-tuebingen.de!

Each student will choose a topic of interest from the list and give a presentation about it to their fellow students. After the presentations, each student will dive deeper into their chosen paradigm through a hands-on coding project. The grading therefore consists of two parts: the presentation and the coding project. Join us on this journey as we broaden our horizons and explore a variety of learning problems!

Dates:
  • Preparatory meeting: 30th April, 17:00 - 18:00 (Meeting Room, 1st floor, Maria-von-Linden-Strasse 6). Some topics are still open; send a mail to rabanus.derr[at]uni-tuebingen.de.
  • Main dates: 21st June, 13:00 - 18:00, and 22nd June, 10:00 - 15:00 (Meeting Room, 3rd floor, Maria-von-Linden-Strasse 6).
Credits: 3 ECTS
Workload: a presentation and a coding project.

Registration:
1. Send us a mail to: rabanus.derr[at]uni-tuebingen.de
2. Select a topic on a first-come, first-served basis via a Doodle link that will be announced by mail on a fixed date (more information via mail).

This is a master's-level course with a maximum capacity of 12 participants. Students in the M.Sc. Machine Learning will be preferentially admitted, but master's students in Computer Science, Media Informatics, Medical Informatics, and Bioinformatics can also join the course if they have sufficient background knowledge in Statistical Machine Learning, Probabilistic Machine Learning, and Deep Learning.

(Winter 2021, Winter 2023) Beyond Fairness: A Socio-Technical View of Machine Learning

This is a Master's course that starts by looking at various mathematisations of fairness in ML and then examines more general issues, in particular the role of data and categorisation. It draws on a wide literature, and the assessment is a team term paper. Moodle site 2023. Taught by Bob Williamson. Tutors: Nan Lu, Benedikt Höltgen, and Sebastian Zezulka.

(Winter 2022, Winter 2023) Mathematics for Machine Learning

This course provides the necessary mathematical background for students in the Master of Machine Learning. Ilias site. Taught by Armando Cabrera Pacheco.

(Winter 2022) Information Theory

This is a Bachelor course that covers the source and channel coding theorems and the role of information theory in machine learning. Taught by Bob Williamson.