Munich Center for Mathematical Philosophy (MCMP)


Epistemology and Theory of Machine Learning (23-24 March 2023)

Idea & Motivation

The rapid rise and huge impact of methods in machine learning raise important philosophical questions. There is, in particular, an increasing interest in questions of epistemology: how exactly do machine learning methods facilitate or generate knowledge? Issues under this header include the justification and the fundamental limitations of such methods, their interpretability, and their implications for scientific reasoning in general. Since machine learning algorithms are, in the end, formal procedures, a formally-minded philosophical approach promises to be particularly fruitful for making progress on these issues. Such a study of modern machine learning algorithms can draw from a long tradition of work in formal epistemology and philosophy of science, as well as from work in computer science and the mathematics of machine learning. The aim of this workshop is to discuss foundational issues of machine learning in this formal spirit.
This workshop marks the conclusion of the project “The Epistemology of Statistical Learning Theory,” funded by the German Research Foundation (DFG).

Confirmed Speakers


To register, please send a message to Tom Sterkenburg (


LMU München - Room M 209
Geschwister-Scholl-Platz 1

Day 1 (23 March 2023)

09:15 Welcome
09:30 - 10:15 Gitta Kutyniok: “Reliable AI: Successes, Challenges, and Limitations”
10:15 - 11:00 Jan-Willem Romeijn: “Reverse-Engineering the Model”
11:00 - 11:15 Coffee Break
11:15 - 12:00 Oliver Buchholz: “The Curve-Fitting Problem Revisited”
12:00 - 12:45 Daniela Schuster: “Philosophical Considerations on Abstaining Machine Learning”
12:45 - 14:00 Lunch Break
14:00 - 14:45 David Watson: “Philosophical Aspects of Unsupervised Learning”
14:45 - 15:30 Konstantin Genin: “Reconsidering the Foundations of Experimental Design”
15:30 - 15:45 Coffee Break
15:45 - 16:30 Timo Freiesleben: “Beyond Generalization: A Theory of Robustness in Machine Learning”
18:30 Conference Dinner

Day 2 (24 March 2023)

09:30 - 10:15 Gerhard Schurz: “Meta-Inductive Justification of Universal Generalizations”
10:15 - 11:00 Rianne de Heide: “The Limits of Explainable Machine Learning”
11:00 - 11:15 Coffee Break
11:15 - 12:00 Rolf Pfister: “An Approach to Solve the Abstraction and Reasoning Corpus by Means of Scientific Discovery”
12:00 - 12:45 Daniel Herrmann: “Artificial Agency”
12:45 - 14:00 Lunch Break
14:00 - 15:00 Tom Sterkenburg: “Statistical Learning Theory and Occam’s Razor”
15:00 - 16:30 Discussion


Oliver Buchholz (Tübingen): The Curve-Fitting Problem Revisited

The curve-fitting problem (CFP) is ubiquitous across scientific disciplines and well-studied in the philosophy of science. Generally, it refers to the task of fitting a mathematical function to given data. On the conventional analysis, this task involves a tradeoff between the function’s simplicity and accuracy as well as a closely related tradeoff between overfitting and underfitting: complex functions might overfit given data by incorporating its idiosyncrasies. This usually leads to low accuracy when the fitted function is used for predicting previously unseen data. Simple functions can prevent overfitting, yet at the risk of being too simple for capturing patterns that are relevant for making accurate predictions. Although solving tasks akin to the CFP, deep neural networks (DNNs) have been shown to exhibit high predictive accuracy regardless of their complexity and their exact fit to given data. Thus, DNNs escape the conventional analysis: apparently, they are not susceptible to overfitting and seem unaffected by the tradeoff between simplicity and accuracy. In this talk, I explore the philosophical ramifications of this result. In particular, I argue that it calls for rethinking the nature and justification of statistical

Timo Freiesleben (Tübingen & LMU/MCMP): Beyond Generalization: A Theory of Robustness in Machine Learning

The term robustness is ubiquitous in modern Machine Learning (ML). However, its meaning varies depending on context and community. Researchers either focus on narrow technical definitions, such as adversarial robustness, natural distribution shifts, and performativity, or they simply leave open what exactly they mean by robustness. In this paper, we provide a conceptual analysis of the term robustness in ML. We formally define robustness as the relative stability of a robustness target with respect to specific interventions on a modifier. Our account captures the various sub-types of robustness that are discussed in the research literature, including robustness to distribution shifts, prediction robustness, or the robustness of algorithmic explanations. Finally, we delineate robustness from adjacent key concepts in ML, such as extrapolation, generalization, and uncertainty, and establish it as an independent epistemic

Konstantin Genin (Tübingen): Reconsidering the Foundations of Experimental Design

The randomized controlled trial is the reigning champion of scientific methodology. But the supremacy of the RCT is more fragile than it appears. Bayesians do not endorse it. But challenges are also mounting from machine learning and machine learning-adjacent fields such as reinforcement learning and causal discovery. Incredibly, even the frequentist theory of optimal design of experiments---stretching back to Kirstine Smith and the birth of modern statistics---does not provide a theoretical justification for randomized experiments. In this talk I am concerned with the minimax frequentist foundations of randomization: theorems purporting to show that, under certain conditions, randomization minimizes the worst-case expected error in estimating parameters. I attempt to contrast these foundational theorems with the justifications of alternative methodologies, especially those emerging from the reinforcement learning tradition. I close by drawing out some consequences for the received ethics of clinical trials.


Rianne de Heide (VU Amsterdam): The Limits of Explainable Machine Learning

We study the theoretical foundations of explainable artificial intelligence (XAI). We study two important properties of attribution functions: robustness (a small change in input should not result in a big change in feature weights) and recourse sensitivity (allowing a user to change the decision of a machine learning system by making limited changes to its input), the latter being important for making machine learning accountable to society. We formalise the latter notion, and prove that it is in general impossible for any single attribution method to be both recourse sensitive and robust at the same time. We provide examples of this impossibility for several popular attribution methods, including LIME, SHAP, Integrated Gradients and SmoothGrad. We exactly characterise the class of functions for which the impossibility occurs in the case where the user is only able to change a single feature.

This talk is based on joint work with Hidde Fokkema and Tim van Erven.


Daniel Herrmann (UC Irvine): Artificial Agency

The core promise and challenge of AI systems stems from their sophisticated learning and decision-making capabilities. We want to understand how such artificial agents make their decisions, which will help us design better agents. Two possible strategies for understanding artificial agents are (i) to import the rich formal work in philosophy and economics describing how ideally rational agents learn and make decisions and (ii) to break artificial systems down into subagents, each of which is simpler and more interpretable. In this talk I will present a framework for modeling different agents and their relationships to each other called "Cartesian Frames", originally developed by Scott Garrabrant, and show how it can help with both strategies. I will also identify some core open questions.

This talk is based on joint work with Scott Garrabrant and Josiah Lopez-Wild.


Gitta Kutyniok (LMU/Mathematics): Reliable AI: Successes, Challenges, and Limitations

Artificial intelligence is currently leading to one breakthrough after the other, both in public life with, for instance, autonomous driving and speech recognition, and in the sciences in areas such as medical diagnostics or molecular dynamics. However, one current major drawback is the lack of reliability of such methodologies.

In this lecture we will take a mathematical viewpoint towards this problem. We start with a brief introduction to this vibrant research area, focussing specifically on deep neural networks. We will then survey recent advances, in particular, concerning generalization guarantees and explainability. Finally, we will discuss fundamental limitations of deep neural networks imposed by today's hardware, which seriously affect their reliability.


Rolf Pfister (LMU/MCMP): An Approach to Solve the Abstraction and Reasoning Corpus by Means of Scientific Discovery

The Abstraction and Reasoning Corpus (ARC) by François Chollet (On the Measure of Intelligence, 2019) serves as an IQ test for machines that requires the implementation of abstraction and reasoning skills. Despite many diverse attempts, the best algorithms so far solve only 20% of the ARC test – this is also true for artificial neural networks and the currently famous transformer networks.

In my talk, I present a new approach on how ARC could be solved, based on methods of scientific discovery. Building on Mill's methods of agreement and difference, Gärdenfors' conceptual spaces, abduction and various other scientific methods, I outline a concept of how abstraction and reasoning could be successfully implemented and lead to the solution of ARC, at least theoretically.


Jan-Willem Romeijn (Groningen): Reverse-Engineering the Model

Data-driven or "machine learning" prediction methods generate predictions from data without explicitly stating their modeling assumptions. In fact there are substantial obstacles to bringing those assumptions out in the traditional format, because machine learning methods mostly do not rely on representations of their target systems. For this reason machine learning methods may be considered fully instrumentalist, offering predictions but no understanding.

In my talk I present a road map for how we can reverse-engineer the models inherent to any prediction method, and I discuss how far this gets us towards regaining a conception of such models as representations. The first part of the talk uses insights from Carnap's inductive logic program and de Finetti's views on statistics, and combines these with recent work on machine learning by Fong et al (2022) and Freiesleben (2023). This will result in an empiricist but not fully instrumentalist view on machine learning methods, in which we can identify certain structures as implicit models.

In the second part of the talk I will determine whether and in what ways the implicit models identified in machine learning methods represent their target, drawing on ideas about randomness and nonlinear dynamics. My preliminary conclusion is that the reverse-engineered models represent only superficially, and that they make visible how our traditional conception of models is ill-suited for a science of complex systems, echoing Breiman's seminal paper on the two cultures of statistics.


Gerhard Schurz (Düsseldorf): Meta-Inductive Justification of Universal Generalizations

The account of meta-induction (Schurz, 2019) proposes a two-step solution to the problem of induction. Step 1 consists in a mathematical a priori justification of the predictive optimality of meta-induction, upon which step 2 builds a meta-inductive a posteriori justification of object-induction based on its superior track record. Sterkenburg (2021) challenged this account by arguing that meta-induction can only provide a (non-circular) justification of inductive predictions for now and for the near future, but not a justification of inductive generalizations. In this talk I present a meta-inductive method that does provide an a posteriori justification of inductive generalizations, in the form of exchangeability conditions. In the final part of the talk, a limitation of the proposed method is worked out: while the method can justify weakly lawlike generalizations, the justification of strongly lawlike generalizations requires epistemic principles going beyond meta-induction over predictive success.


Daniela Schuster (Konstanz): Philosophical Considerations on Abstaining Machine Learning

A key question concerning the appropriate attribution of the notion of artificial intelligence is to what extent artificial systems can act autonomously and make decisions by themselves. In this talk, I want to focus on a largely neglected aspect of decision-making competence, which is the capability of actively refraining from deciding. For this, I will introduce different Machine Learning (ML) models that belong to a research area in computer science that is called “Abstaining Machine Learning” (AML) and I will categorize them into different types. Next, I will show how the different types of AML models behave differently with respect to certain epistemological questions. Most prominently, I will relate to the current epistemological debate about suspension of judgment and show how different AML models meet the standards of suspension to different degrees. Moreover, I show the varying behavior of the different types of models with respect to questions about autonomy, the possibility to explain outcomes, and conceptual

Tom Sterkenburg (LMU/MCMP): Statistical Learning Theory and Occam’s Razor

A central debate in the philosophy of science concerns the justification of Occam's razor, the principle that a preference for simplicity is conducive to successful inductive reasoning. In machine learning, there is a parallel and likewise unresolved debate around the question whether statistical learning theory can provide a formal justification for a simplicity preference in machine learning algorithms.

In this talk, I will present an epistemological perspective that synthesizes the arguments of the opposing camps in this debate, and yields a qualified means-ends justification of Occam's razor in statistical learning theory.


David Watson (KCL): Philosophical Aspects of Unsupervised Learning

Unsupervised learning algorithms are widely used for many important statistical tasks with numerous applications in science and industry. Yet despite their prevalence, they have attracted remarkably little philosophical scrutiny to date. This stands in stark contrast to supervised and reinforcement learning algorithms, which have been the subject of much critical analysis. I argue that unsupervised learning methods raise unique epistemological and ontological questions, providing data-driven tools for discovering natural kinds and distinguishing essence from contingency. This analysis goes some way toward filling the lacuna in contemporary philosophical discourse on unsupervised learning, as well as bringing conceptual unity to a heterogeneous field more often described by what it is not (i.e., supervised or reinforcement learning) than by what it is. I submit that unsupervised learning is not just a legitimate subject of philosophical inquiry but perhaps the most fundamental branch of all AI. However, an uncritical overreliance on unsupervised methods poses major epistemic and ethical risks.


Practical information

The workshop will take place in the main building of LMU Munich (Geschwister-Scholl-Platz 1), in room M209 (OG 2, second floor).
You can find a map of the building with room M209 marked at

How to get there by public transport:

Train: Arrival at München Hauptbahnhof (Munich Main Station) or München Ostbahnhof (Munich East Station), then take the S-Bahn to Marienplatz and from there the U3 or U6 to stop Universität. To plan your travel, visit .
S-Bahn (City train): All lines to Marienplatz, then U-Bahn.
U-Bahn (Metro): Line U3/U6, stop Universität.
Bus: Line 53, stop Universität.


Tom Sterkenburg (LMU/MCMP)


This workshop is supported by the German Research Foundation (DFG).