Munich Center for Mathematical Philosophy (MCMP)

Epistemology and Theory of Machine Learning (30-31 May 2025)

Location: Geschwister-Scholl-Platz 1 (A), Room A 120

30.05.2025 at 09:15 – 31.05.2025 at 15:45

Idea & motivation

This is the second edition of the Epistemology and Theory of Machine Learning workshop series, which started in 2023.

The rapid rise and huge impact of methods in machine learning raise important philosophical questions. There is, in particular, an increasing interest in questions of epistemology: how exactly do machine learning methods contribute to the pursuit of knowledge? Issues under this heading include the justification and the fundamental limitations of such methods, their interpretability, and their implications for scientific reasoning in general. Since machine learning algorithms are, in the end, formal procedures, a formally-minded philosophical approach promises to be particularly fruitful for making progress on these issues. Such a study of modern machine learning algorithms can draw from a long tradition of work in formal epistemology and philosophy of science, as well as from work in computer science and the mathematics of machine learning. The aim of this workshop is to discuss epistemological questions of machine learning in this spirit.

This edition is organized by the Emmy Noether junior research group “From Bias to Knowledge: The Epistemology of Machine Learning”, funded by the German Research Foundation (DFG).

Confirmed speakers

Heather Champion (Western/Tübingen), Vincent Fortuin (TUM & Helmholtz AI), Timo Freiesleben (Tübingen), Michael Herrmann (Tübingen), Moritz Herrmann (LMU/IBE), Levin Hornischer (MCMP), Sara Jensen (Oslo), Donal Khosrowi (Hannover), Luis Lopez (MCMP), Silvia Milano (Exeter), Anders Søgaard (Copenhagen), Frauke Stoll (Dortmund), and Lisa Wimmer.

Registration

Registration is free but required. You can register here.

Location

LMU München
Geschwister-Scholl-Platz 1
80539 München

Schedule

Day 1 (Friday 30 May)
09:15 Welcome
09:30 - 10:15 Levin Hornischer: "Robustness and trustworthiness in AI: A no-go result from formal epistemology"
10:15 - 11:00 Lisa Wimmer: "Uncertainty in machine learning - a pitfalls kind of talk"
11:00 - 11:15 Coffee break
11:15 - 12:00 Michael Herrmann: "A farewell to the bias-variance tradeoff – 'thanks' to deep machine learning models?"
12:00 - 12:45 Sara Jensen: "The underdetermination of representational content in DNNs"
12:45 - 14:15 Lunch break
14:15 - 15:00 Heather Champion: "Beyond conceptual change: towards a mid-level theory of strong novelty for ML-enabled science"
15:00 - 15:45 Vincent Fortuin: "Philosophical reflections on Bayesian deep learning"
15:45 - 16:00 Coffee break
16:00 - 16:45 Timo Freiesleben: "What is bare-bones machine learning still missing for a general scientific methodology?"
18:30 Workshop dinner

 

Day 2 (Saturday 31 May)
09:30 - 10:15 Donal Khosrowi: "Can generative AI produce novel evidence?"
10:15 - 11:00 Moritz Herrmann: "Machine learning as an empirical science: Conceptual approaches and practical insights"
11:00 - 11:15 Coffee break
11:15 - 12:00 Luis Lopez: "What did AlphaFold learn about protein folding?"
12:00 - 12:45 Anders Søgaard: "What does mechanistic interpretability buy the humanities?"
12:45 - 14:15 Lunch break
14:15 - 15:00 Frauke Stoll: "Empirical and theoretical links: Rethinking the role of DNNs in scientific understanding"
15:00 - 15:45 Silvia Milano: "Algorithmic profiling as a source of hermeneutical injustice"

Abstracts

Heather Champion (Western/Tübingen): Beyond conceptual change: towards a mid-level theory of strong novelty for ML-enabled science

Recent philosophical accounts of the impact of machine learning (ML) on science prioritize context-specific views of strong novelty as theoretical or conceptual revision. While it is uncontroversial that conceptual change represents a more general, widely relevant dimension of high impact, I argue that a “mid-level” theory of strong novelty has several upshots. In particular, it should guide the design of new research projects with ML, including those that might aim at conceptual change. I present novelty desiderata that signal high impact on existing scientific knowledge or research direction. I illustrate these with cases of scientific discovery from various domains, such as economics and astrophysics. Furthermore, I define novelty relative to a discovering collective in contrast to purely psychological (individual) or historical (domain-wide) accounts.

While conceptual change makes broad scientific impact (e.g. concepts structure theory), local belief revisions make deep impact—either enlarging existing theory or changing the direction of research. First, eliminating deep ignorance generates awareness of useful patterns, evidence, or hypotheses. Surprising outcomes change an idea’s expected utility, while reducing utility “blindness” steers research when prior uncertainty is very high. Meanwhile, learning outcomes achieved with some independence of local theory that demarcates or explains phenomena afford strong novelty: local theory captures the kind of prior information regarding phenomena that diminishes scientific impact. Thus, to fully appreciate the ways that ML advances science, philosophical consideration of novelty and ML must move beyond conceptual change.

Vincent Fortuin (TUM & Helmholtz AI): Philosophical Reflections on Bayesian Deep Learning

Bayesian inference has long been celebrated as a normative model of rational belief updating, grounded in foundational work by de Finetti, Cox, Savage, and Wald. These philosophical justifications paint Bayesianism as uniquely suited to represent uncertainty, incorporate prior knowledge, and guide rational action under uncertainty. However, when Bayesian methods are brought into the domain of deep learning, these justifications come under strain. Priors are often chosen for computational convenience rather than sincere epistemic commitment, and inference is typically approximate, relying on techniques like variational inference or Monte Carlo sampling that may diverge significantly from ideal Bayesian reasoning.
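
As background for the approximation worry raised here, the exact Bayesian update and its variational surrogate can be written in their textbook form (a minimal sketch in my notation, not the speaker's):

\[ p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}, \qquad \log p(\mathcal{D}) \;\geq\; \mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi(\theta)}\!\left[\log p(\mathcal{D} \mid \theta)\right] - \mathrm{KL}\!\left(q_\phi(\theta) \,\|\, p(\theta)\right). \]

Variational inference maximizes the lower bound over a tractable family $q_\phi$, and Monte Carlo methods draw samples that only approximate the posterior; in both cases the beliefs actually computed can differ from the exact posterior presupposed by the classical justifications.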

In this talk, I explore the philosophical tensions that arise when Bayesian principles are applied in large-scale, high-dimensional machine learning settings. I argue that while traditional justifications falter under practical constraints, Bayesian deep learning can be reframed within a more pragmatic perspective. I consider several paths forward, including bounded rationality, engineering pragmatism, and the idea of a computational epistemology that accommodates approximation and heuristic reasoning. Rather than abandoning Bayesianism, we may need to reinterpret its role—not as a strict epistemic ideal, but as a guiding framework for navigating uncertainty in complex, computationally limited systems.

Timo Freiesleben (Tübingen): What is bare-bones machine learning still missing for a general scientific methodology?

Machine learning shows great promise for becoming an integral part of science. Yet in its raw form, it remains a one-trick pony: powerful at pattern recognition, but limited in its ability to support deeper scientific understanding and reasoning. In this talk, I explore several key add-ons that enhance the scientific utility of machine learning. For example: combining machine learning with causal modeling enables the integration of domain knowledge and the estimation of treatment effects; incorporating uncertainty quantification turns predictions into reliable guides for action; and applying interpretability methods provides insight into both the model and the underlying phenomenon. While these and other add-ons have begun to take shape in recent years, they still require significant refinement—and face important obstacles—before machine learning can fully realize its potential as a tool for scientific inquiry.

Michael Herrmann (Tübingen): A farewell to the bias-variance tradeoff – 'thanks' to deep machine learning models?

The phenomenon called double descent in machine learning (ML) has created some tension with the validity of the bias-variance tradeoff. According to the latter, the relationship between prediction error and model complexity can be understood as a tradeoff: a good model can be chosen by trading off bias and variance as a function of model complexity, leading to a U-shaped test error curve (MSE) with a sweet spot between under- and overfitting. However, empirical studies in machine learning with deep neural networks, but also with other statistical models (e.g. linear regression or decision trees), have shown a clear absence of the U-shaped curve. Instead, beyond a certain model complexity, overparametrized models show a second decrease of the test error, creating the so-called double descent and the rather surprisingly good performance of heavily overfitted models. The striking consequence is that, by increasing model complexity, it is no longer necessary to trade bias for variance. This phenomenon and the debate about double descent in ML create a twofold tension: firstly, the validity of the bias-variance tradeoff as a theoretical principle is put into question; and secondly, taking ML's conclusions seriously, it displays a divergence between statistical intuition and modern ML phenomena, providing reasons for a further discontinuity between classical statistics and machine learning.
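
For reference, the decomposition behind the tradeoff can be stated in its standard textbook form (a sketch in common notation, not the speaker's own formulation): for observations $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, and an estimator $\hat{f}$ fitted on a random training sample,

\[ \mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} \;+\; \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}} \;+\; \sigma^2. \]

The decomposition itself is an identity; the "tradeoff" is the further claim that bias falls while variance rises as model complexity grows, and it is this further claim, together with its implicit assumptions, that the double-descent debate puts under pressure.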

I want to argue that this twofold tension does not need to be solved but rather dissolved. First, the dissolution consists in pointing out that nowhere in the debate is a precise mathematical statement of the bias-variance tradeoff given. In particular, it is not spelled out under which assumptions the statement holds. This is detrimental, since the statement then appears unconditionally valid, which is clearly false. Furthermore, without reference to any assumptions, ML's claim of a weakening, restriction, or refutation of its validity is rendered impossible.
Second, the dissolution requires more scrutiny of the conceptualization of model complexity. This can be made clear by refocusing on statistical knowledge about what the "statistical currency" for formalizing model complexity is. The double-descent plots in ML papers often use the total number of parameters as a proxy for model complexity, but this decision remains unjustified. This is unfortunate, because there is evidence that the double-descent phenomenon is not merely a function of the total parameter count.

Moritz Herrmann (LMU/IBE): Machine Learning as an Empirical Science: Conceptual Approaches and Practical Insights

Treating machine learning as an empirical science grounded in experimental exploration and evaluation raises specific epistemic challenges. How can theoretical assumptions be translated into experimental designs that meaningfully address scientific questions? How should we account for the multiplicity of design choices and other sources of uncertainty? And in what sense can we generalize experimental findings? In this talk, I reflect on these questions through several experimental studies in both supervised and unsupervised learning. I focus in particular on the impact of design and analysis choices, and on the challenges of generalizing results from method comparison experiments. I argue that a narrow understanding of experimental research risks limiting the development of machine learning as a scientific field. To move forward, we need a broader perspective that embraces diverse types of research contributions and epistemic goals – but also accepts inconclusiveness as a valid and sometimes unavoidable outcome.

Levin Hornischer (MCMP): Robustness and Trustworthiness in AI: A No-Go Result From Formal Epistemology

A major issue for the trustworthiness of modern AI models is their lack of robustness. A notorious example is that putting a small sticker on a stop sign can cause AI models to classify it as a speed limit sign. This is not just an engineering challenge, but also a philosophical one: we need to better understand the concepts of robustness and trustworthiness. Here, we contribute to this using methods from (formal) epistemology and prove a no-go result: no matter how these concepts are understood exactly, they cannot have four prima facie desirable properties without trivializing. To do so, we describe a modal logic to reason about the robustness of an AI model, and then we prove that the four properties imply triviality via a novel interpretation of Fitch's lemma. In particular, we show that standard methods to explicate robustness are not fully satisfying. A broader theme of the paper is to build bridges between AI and epistemology: not only does epistemology provide novel methods for AI, but modern AI also provides new questions and perspectives for epistemology.
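
For orientation, the classical form of the lemma alluded to can be sketched as follows (the standard textbook derivation, not the paper's novel interpretation or its four properties). Writing $K\varphi$ for "it is known that $\varphi$" and assuming factivity, $K\varphi \to \varphi$, and distribution over conjunction, $K(\varphi \wedge \psi) \to K\varphi \wedge K\psi$, the assumption $K(\varphi \wedge \neg K\varphi)$ yields $K\varphi \wedge K\neg K\varphi$ and hence $K\varphi \wedge \neg K\varphi$, a contradiction; so

\[ \neg \Diamond K(\varphi \wedge \neg K\varphi), \]

i.e. the truth "$\varphi$ and $\varphi$ is not known" is unknowable, and combined with a knowability principle $\varphi \to \Diamond K\varphi$ this collapses into $\varphi \to K\varphi$. The talk's no-go result proceeds via a novel interpretation of this lemma.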

Sara Jensen (Oslo): The Underdetermination of Representational Content in DNNs

There is widespread hope of using ML models to make new scientific discoveries. As part of this, much effort is being put into establishing methods for interpreting the learned basis vectors in the latent spaces of deep neural networks (DNNs), motivated by the belief that the networks implicitly learn scientifically relevant representations and concepts from the data. By studying these learned representations, we may learn about new dependencies and structures in nature. There is disagreement regarding how concepts are represented in the hidden layers, specifically whether they are localised or distributed across nodes, and whether they are linear or non-linear. I argue that for distributed representations, the conceptual content of the representations will often be underdetermined. This happens for sets of variables which are defined in terms of each other, such as volume, temperature and pressure. This shows a crucial difference between classical scientific representations and representations in DNNs, which will likely have implications for the hope of extracting and learning new scientific concepts and dependencies from such models.
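
A textbook instance of such interdefined variables (my illustration, not the speaker's) is the ideal gas law,

\[ pV = nRT, \]

where, for a fixed amount of gas $n$, any two of pressure $p$, volume $V$, and temperature $T$ determine the third. A distributed latent direction that correlates with one of these quantities therefore correlates equally well with a function of the other two, so the network's behaviour alone does not settle which concept the direction represents.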

Donal Khosrowi (Hannover): Can Generative AI Produce Novel Evidence?

Researchers across the sciences increasingly explore the use of generative AI (GenAI) systems for various inferential and practical purposes, such as for drug and materials discovery and synthesis, or for reconstructing destroyed manuscripts and artifacts in the historical sciences. This paper explores a novel epistemological question: can GenAI systems generate evidence that provides genuinely new knowledge about the world, or can they only produce hypotheses that we might seek evidence for? Exploring responses to this question, the paper argues that 1) GenAI outputs can at least be understood as higher-order evidence (Parker 2022) and 2) may also constitute de novo synthetic evidence. We explore the wider ramifications of this latter thesis and offer additional strictures on when synthetic evidence can be strong evidence for claims about the world.

Luis Lopez (MCMP): What did AlphaFold learn about protein folding?

AlphaFold2's unprecedented success in protein structure prediction has revolutionized structural biology, yet a fundamental question persists: what exactly has it learned to achieve such predictive power? In this talk, I argue that AlphaFold2 has effectively learned (and approximated) a folding "code" latent in the Protein Data Bank (PDB). Such an answer requires a careful (re)formulation of the protein folding problem (which consists of at least three interrelated sub-problems), a precise definition of folding "codes" (including a PDB folding "code"), and a closer look at AlphaFold2's algorithms. I begin by applying a set-theoretic definition of systems to proteins, differentiating their composition, environment, structure, and mechanisms. These distinctions are then used to reformulate the protein folding problem, which allows for the definition of a PDB folding "code." Such a "code" is defined within the endo-structure of proteins (i.e., the collection of relations among their components), assuming a typical aqueous environment, while disregarding mechanisms. The PDB folding "code" should not be confused with the overall function mapping entire sequences to their complete three-dimensional structures. Instead, it consists of a set of sub-mappings—analogous to the genetic code's codon-to-amino-acid assignments (save important differences)—that associate specific local and global sequence motifs with corresponding local or higher-order structural motifs. The overall mapping from sequence to structure emerges from the composition and context-dependent application of these sub-mappings. I further interpret AlphaFold2's performance as evidence of its capacity to learn these complex mappings, effectively addressing the intrinsic n-body problem of protein folding. The talk concludes with a discussion of prospective approaches toward the interpretability of such a folding "code."

Silvia Milano (Exeter): Algorithmic profiling as a source of hermeneutical injustice

It is a well-established fact that algorithms can be instruments of injustice. It is less frequently discussed, however, how current modes of AI deployment often make the very discovery of injustice difficult, if not impossible. In this paper, we focus on the effects of algorithmic profiling on epistemic agency. In particular, we show how algorithmic profiling can give rise to epistemic injustice through the depletion of epistemic resources that are needed to interpret and evaluate certain experiences. By doing so, we not only demonstrate how the philosophical conceptual framework of epistemic injustice can help pinpoint systematic harms from algorithmic profiling, but we also identify a novel source of hermeneutical injustice that to date has received little attention in the relevant literature.

Anders Søgaard (Copenhagen): What does Mechanistic Interpretability buy the Humanities?

I’ll talk about mechanistic interpretability and the humanities, arguing that they make for a better fit than first-generation XAI and the natural sciences. I take mechanistic interpretability to refer to global-forward XAI approaches to inference opacity that focus on higher-order explanations. I take science, broadly construed, to be in the business of providing transition or property theories, with transition theories being almost exclusively proposed within the natural sciences. Many have shown that standard XAI tools cannot provide faithful transition theories of DNNs. Property theories do not seem to run into the same problem, but XAI tools have limited expressive power. The property theories floated in the humanities tend to have a higher-order nature. To the extent that mechanistic interpretability can detect the properties of interest to the humanities, DNNs and mechanistic interpretability may have explanatory value in the humanities.

Frauke Stoll (Dortmund): Empirical and Theoretical Links: Rethinking the Role of DNNs in Scientific Understanding

What role can deep neural networks (DNNs) play in advancing scientific understanding? In this talk, I argue that DNNs can support the early stages of understanding by uncovering empirical patterns and regularities, but that their contribution to deeper explanatory understanding depends on overcoming two distinct kinds of link uncertainty. Drawing on Emily Sullivan’s account, empirical link uncertainty refers to the degree of evidence connecting a model to its target phenomenon. But as the case of the Rydberg formula illustrates, empirical connection alone is insufficient: explanatory understanding also depends on theoretical link certainty—the integration of models into a broader theoretical framework. DNNs, even when supplemented by explainable AI (XAI) methods, largely operate at the instrumental and descriptive levels, clarifying what patterns are present and sometimes how they emerge, but not why they hold. This positions DNNs as akin to phenomenological models, which capture surface regularities without revealing underlying mechanisms. Yet unlike phenomenological models, DNNs are doubly opaque: they obscure both mechanisms and the regularities themselves, lacking an interpretable mathematical structure. I propose that opacity in DNNs should be understood hierarchically—as what-, how-, and why-opacity—each posing distinct challenges for understanding. While empirical link certainty facilitates progress across these levels, only theoretical embedding can transform DNN outputs from descriptive tools into sources of explanatory understanding. Situating DNNs within the broader modeling literature thus clarifies both their potential and their limitations as instruments of scientific inquiry.
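
For reference, the Rydberg formula mentioned as a case gives the wavelengths of the hydrogen spectral lines,

\[ \frac{1}{\lambda} = R_H \left( \frac{1}{n_1^2} - \frac{1}{n_2^2} \right), \qquad n_2 > n_1, \]

with $R_H$ the Rydberg constant. The formula fits the observed lines with great accuracy, yet why it holds only became clear once it was embedded in Bohr's model and, later, quantum mechanics; in the terminology above, it enjoyed empirical link certainty long before theoretical link certainty.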

Lisa Wimmer: Uncertainty in machine learning - a pitfalls kind of talk

It has become consensus in the ML community that uncertainty-aware predictions are strictly more informative than, and thus preferable over, point estimates. For example, obtaining the output "healthy patient, with 56% certainty" from a hypothetical diagnostic system seems to inform a downstream human decision more effectively than a mere "healthy patient".

While I agree with the general sentiment, I identify several unresolved issues with the current state of uncertainty estimation which risk undermining the endeavor. I will use this talk to shed light on some of these problems that go beyond improving algorithms, and argue that we have more conceptual fish to fry.

Practical information

The workshop will take place in the main building of LMU Munich (Geschwister-Scholl-Platz 1), in room A 120.

You can find a map of the building with room A 120 marked at
https://www.lmu.de/raumfinder/#/building/bw0000/map?room=000001209_

How to get there by public transport

Train: Arrival at München Hauptbahnhof (Munich Main Station) or München Ostbahnhof (Munich East Station), then take the S-Bahn to Marienplatz and from there the U3 or U6 to stop Universität. To plan your travel, visit https://mvv-muenchen.de.

S-Bahn (City train): All lines to Marienplatz, then U-Bahn.

U-Bahn (Metro): Line U3/U6, stop Universität.

Bus: Line 53, stop Universität.

Organizers

Timo Freiesleben, Katia Parshina, and Tom Sterkenburg.

Acknowledgement

This workshop is supported by the German Research Foundation (DFG).