Bayesian Epistemology: Perspectives and Challenges (10-14 August 2020)
The conference on 12-14 August 2020 is preceded by a Summer School on 10-11 August 2020.
Idea & Motivation
Bayesian epistemology remains the dominant account of rational belief. It underpins the dominant account of decision making in science and beyond, as well as many of our statistical methods.
While important applications continue to emerge, work on the foundations of Bayesian epistemology never stops, and a number of challenges are emerging.
The aim of this conference is to bring together scholars exploring the applications, challenges and foundations of Bayesian epistemology.
Topics of interest include, but are not limited to (in alphabetical order):
- Bayesianism and Artificial Intelligence
- Bayesian Networks
- Bounded Rationality
- Evidence Aggregation
- Foundational Aspects of Bayesian Statistics
- Higher Order Evidence
- Imprecise Bayesian Approaches
- Interpretations of Probabilities
- Judgement Aggregation
- Maximum Entropy (Applications, Inference and Methods)
- Multi Agent Epistemology
- Objective Bayesian Epistemology
- Principles of Bayesianism (Conditionalisation, Probabilism, Total Evidence)
- Updating Procedures (Jeffrey, KL, L&P)
Speakers for the Conference
Speakers for the Summer School
- Leah Henderson (Groningen)
- James Joyce (Michigan)
- Anna Mahtani (LSE)
- Gerhard Schurz (Düsseldorf)
- Naftali Weinberger (MCMP, LMU Munich)
- Jürgen Landes (MCMP, LMU Munich)
In order to register for the conference, please send an email to Juergen.Landes@lrz.uni-muenchen.de with the subject line: Registration: Bayesian Epistemology.
The conference will be held online.
Summer School 10.08 - 11.08.2020
|09:40 - 10:00||Welcome|
|10:00 - 11:00||Gerhard Schurz: Metainduction - Basic Account: A New Solution to the Problem of Induction?|
|11:00 - 11:15||Break|
|11:15 - 12:15||Anna Mahtani: The objects of credence: propositions|
|12:15 - 13:45||Lunch Break|
|13:45 - 14:45||Leah Henderson: Hierarchical Bayesian Modelling - Theory|
|14:45 - 15:45||Jürgen Landes: Objective Bayesian Epistemology|
|15:45 - 16:00||Break|
|16:00 - 17:00||James Joyce: TBA|
11.08.2020
|10:00 - 11:00||Gerhard Schurz: Metainduction - Extensions of the account|
|11:00 - 11:15||Break|
|11:15 - 12:15||Anna Mahtani: The objects of credence: two-dimensionalism|
|12:15 - 13:45||Lunch Break|
|13:45 - 14:45||Leah Henderson: Hierarchical Bayesian Modelling - Applications|
|14:45 - 15:45||Naftali Weinberger: Causal Modelling|
|15:45 - 16:00||Break|
|16:00 - 17:00||James Joyce: TBA|
Conference 12.08. - 14.08.2020
|09:40 - 10:00||Welcome|
|10:00 - 11:00||Gerhard Schurz: Meta-inductive Probability Aggregation|
|11:00 - 11:15||Break|
|11:15 - 12:15||Seamus Bradley: Learning through ignoring the most wrong|
|12:15 - 13:45||Lunch Break|
|13:45 - 14:30||Mario Günther & Borut Trpin: Bayesians still don't learn from conditionals||Alicja Kowalewska: Story coherence with Bayesian networks|
|14:35 - 15:20||Miriam Bowen: Comparative beliefs and Imprecise Credences||Pavel Janda: Accuracy and Games with Absentmindedness|
|15:25 - 15:35||Break|
|15:35 - 16:20||Palash Sarkar & Prasanta Bandyopadhyay: Simpson's Paradox and Causality||Sven Neth: Rational Aversion to Information|
|16:25 - 17:10||Francesca Zaffora Blando: Pride and Probability: A Tale of (Co-)Meagre Success||Ted Poston: Coherence and Confirmation|
13.08.2020
|09:30 - 10:30||Anna Mahtani: The Ex Ante Pareto Principle and Frege's puzzle|
|10:30 - 10:45||Break|
|10:45 - 11:30||Richard Lohse: A general Worry about the Accuracy first Programme||Krzysztof Mierzewski: Probabilistic Stability and Statistical Learning|
|11:35 - 12:25||Andree Weber: Conciliatory Views on Peer Disagreement|
|12:25 - 14:00||Lunch Break|
|14:00 - 15:00||Leah Henderson: Emergent Compatibilism for IBE and Bayesianism|
|15:00 - 15:15||Break|
|15:15 - 16:00||Alex Meehan: Kolmogorov Conditionalizers Can Be Dutch Booked||Aviezer Tucker: Testimony and the analysis of disinformation|
|16:05 - 16:50||Snow Zhang: Trilemma about Deference, Judgment Aggregation and Disagreement||Ted Poston: Coherence and Confirmation|
|16:55 - 17:40||David Kinney: Why Average When You Can Stack?|
14.08.2020
|09:30 - 10:15||Mario Günther: An Analysis of Actual Causation|
|10:15 - 10:30||Break|
|10:30 - 11:15||Michael Nielsen & Kenny Easwaran: Learning by Maximizing Expected Accuracy|
|11:20 - 12:05||Patrick Klösel: Graphical Causal Modeling in Econometrics||Rafal Urbaniak: Imprecise credences can increase accuracy wrt. claims about expected frequencies|
|12:05 - 13:50||Lunch Break|
|13:50 - 14:35||Andrea G. Ragno: A Vindication of Rudner-Steele's Argument||Patryk Dziurosz-Serafinowicz: The Value of Uncertain Evidence|
|14:40 - 15:25||Margherita Harris: Model-based Robustness Analysis||Michal Godziszewski: Fairness and Justified Representation in Judgment Aggregation and Belief Merging|
|15:25 - 15:40||Break|
|15:40 - 16:40||James Joyce: TBA|
|16:40 - 16:45||Closing Words|
Abstracts Summer School
Leah Henderson (Groningen): Hierarchical Bayesian Modelling - Theory
Hierarchical Bayesian modelling is a technique which allows for learning at multiple levels of abstraction. I will introduce the fundamentals of these models and some of the practical challenges in using them.
Leah Henderson (Groningen): Hierarchical Bayesian Modelling - Applications
Hierarchical Bayesian models have been used to tackle a variety of problems in cognitive science and in philosophy of science. I will give an overview of some of these applications.
Jürgen Landes (MCMP, LMU Munich): Objective Bayesian Epistemology
Bayesians hold that rational degrees of belief are obtained by (Jeffrey) updating prior probabilities. They are, however, split about the choice of prior probabilities. Objective Bayesians insist that the relation between evidence and rational beliefs is, to a degree, objective. Hence, the choice of prior probabilities has to be objective, in some sense. In my talk, I will first briefly motivate and introduce this epistemology, then discuss recent advances from objectivists, and end with some open problems.
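One influential way of making the objectivity of prior choice precise is the Maximum Entropy Principle: among all probability functions satisfying the evidential constraints, adopt the one with greatest entropy. The following is only a minimal brute-force sketch of that idea (toy outcome space {1, 2, 3} and a hypothetical mean constraint of my own, not Landes's formalism):

```python
import math

def entropy(p):
    """Shannon entropy, with the convention 0 * log(0) = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def maxent_with_mean(target_mean, step=0.005, tol=0.01):
    """Grid search for the maximum-entropy distribution over the
    outcomes {1, 2, 3} whose expectation is close to target_mean."""
    best, best_h = None, -1.0
    n = int(round(1 / step))
    for i in range(n + 1):
        for j in range(n + 1 - i):
            p = (i * step, j * step, 1 - (i + j) * step)
            mean = p[0] * 1 + p[1] * 2 + p[2] * 3
            h = entropy(p)
            if abs(mean - target_mean) < tol and h > best_h:
                best, best_h = p, h
    return best

# Evidence: the expectation must be 2.5; maxent tilts mass toward 3.
prior = maxent_with_mean(2.5)
```

With no constraints at all, the same search returns the uniform distribution, which is the equivocation norm that objective Bayesian epistemology generalises.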
Anna Mahtani (LSE): The objects of credence: propositions
A wide range of different disciplines work with the idea of credences (economics, formal epistemology, decision theory, philosophy), but little attention has been paid to the question: what are the objects of credence? In contrast, an analogous question - what are propositions? - has received a great deal of attention in the philosophy of language. I show how the two issues relate, and discuss some of the implications for users of the credence framework.
Anna Mahtani (LSE): The objects of credence: two-dimensionalism
A prominent theory of propositions has been given by David Chalmers: his two-dimensionalist account. I explain this account, and then consider how it can be applied by users of the credence framework to answer the question: what are the objects of credence? I discuss some implications of applying two-dimensionalism in this way, and conclude that it leads to some serious problems.
Gerhard Schurz (Düsseldorf): First lecture: Metainduction - Basic Account: A New Solution to the Problem of Induction?
The problem of induction, or Hume's problem, consists in the apparent impossibility of establishing a non-circular justification of induction, i.e. of the transfer of observed regularities from the past to the future. Hume's problem exemplifies with particular sharpness the regress problem of traditional foundationalism. This talk introduces a new approach to Hume's problem and to the regress problem in general: the method of optimality justifications. This account concedes the force of Hume's sceptical arguments against the possibility of a non-circular demonstration of the reliability of induction. What it demonstrates is that one can nevertheless give a non-circular justification of the optimality of induction, more precisely of meta-induction.
Results in mathematical learning theory have shown that it is impossible to demonstrate the optimality of an 'object-level' prediction method in comparison to all other possible methods. This is the reason why Reichenbach's "best alternative" account of induction fails. The breakthrough of the optimality program lies in its application at the level of meta-methods. Meta-inductive methods take all cognitively accessible object-level methods and their track records as input and attempt to construct from them an optimal method. Based on results in machine learning, it can be demonstrated that there are meta-inductive prediction strategies whose predictive success is long-run optimal in all possible worlds with regard to all accessible methods, with tight upper bounds on short-run losses that quickly vanish as the number of predictions increases. Moreover, the a priori justification of meta-induction generates a non-circular a posteriori justification of object-induction.
Literature: Gerhard Schurz: Hume's Problem Solved: The Optimality of Meta-induction. MIT Press, Cambridge/MA, 2019.
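The optimality results Schurz draws on come from the "prediction with expert advice" literature. Here is a minimal, hedged sketch of one such meta-inductive strategy, the exponentially weighted average forecaster under squared loss (toy experts and parameters of my own, not Schurz's exact formalism):

```python
import math

def meta_induction(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster: predict a weighted
    average of the experts' forecasts, down-weighting each expert
    exponentially in its accumulated squared loss.
    expert_preds[t][i]: forecast of expert i in round t, in [0, 1].
    outcomes[t]: realised event in [0, 1]."""
    n = len(expert_preds[0])
    weights = [1.0] * n
    meta_loss = 0.0
    expert_losses = [0.0] * n
    for preds, y in zip(expert_preds, outcomes):
        w_sum = sum(weights)
        forecast = sum(w * p for w, p in zip(weights, preds)) / w_sum
        meta_loss += (forecast - y) ** 2
        for i, p in enumerate(preds):
            loss = (p - y) ** 2
            expert_losses[i] += loss
            weights[i] *= math.exp(-eta * loss)
    return meta_loss, expert_losses

# Two accessible methods: one tracks the outcomes, one anti-tracks them.
outcomes = [0.0, 1.0] * 25
experts = [[y, 1.0 - y] for y in outcomes]
meta_loss, expert_losses = meta_induction(experts, outcomes)
```

Over these 50 rounds the meta-inductivist's total loss stays within a small constant of the best expert's, illustrating long-run optimality with per-round regret that vanishes as the number of predictions grows.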
Gerhard Schurz (Düsseldorf): Second lecture: Metainduction - Extensions of the account
The basic account of meta-induction has two restrictions:
(i) it is restricted to prediction games with a finite number of competing prediction methods that are accessible to the meta-inductivist, and
(ii) it assumes that the events to be predicted are real-valued, so that probabilities or weighted averages of events can be predicted.
There are two important extensions that overcome these restrictions. The finiteness restriction can be relaxed by allowing the class of competing prediction methods to grow unboundedly. A universal long-run optimality result is provable even for this case.
Restriction (ii) is relaxed in so-called discrete prediction games. In these games, predictions have to coincide with possible events. These games can be handled by allowing for probabilistic prediction methods and considering their expected, as opposed to their actual, success. While the optimality of this method (dominant in machine learning) is not fully universal, a method of collective meta-induction has been developed that is universally optimal.
Discrete prediction games are of particular importance for a further extension of the meta-induction approach: its generalization from prediction games to action games.
Finally, it is shown that, besides their universality, a variety of dominance results can be established for meta-induction. These results may provide a new solution to the no-free-lunch problem.
Naftali Weinberger (MCMP, LMU Munich): Causal Modelling
In my talk I cover some of the basics of graphical causal models, focusing on bridge principles for linking causal hypotheses to probability distributions, and the role of such principles in causal search and in eliminating confounding. I then discuss the interpretation of probability in causal models and criticize an argument relying on the uniform assignment of probabilities to causal parameters.
Abstracts Conference
Francesca Zaffora Blando: Pride and Probability: A Tale of (Co-)Meagre Success
In "Bayesian Orgulity" (2013), Belot argues that Bayesian agents are plagued by a pernicious type of epistemic immodesty. By the very nature of the Bayesian framework, they are bound to invariably expect that their beliefs will converge to the truth—and this is so even when, from a topological point of view, there are many data streams on which they will in fact fail to be inductively successful. In this talk, I will propose one possible strategy for evading Belot's worry. By appealing to the theory of algorithmic randomness, I will show that Belot's objection does not apply if one restricts attention to computable open-minded Bayesian agents and computable inductive problems. More precisely, we will see that, when a Bayesian agent with a computable open-minded prior estimates the values of a computable random variable, their successive estimates are guaranteed to converge to the truth both almost surely and on a topologically large set of data streams (i.e., on a co-meagre set of data streams).
Seamus Bradley: Learning through ignoring the most wrong
While Imprecise Probabilities (IP) are, in some respects, a modest and useful generalisation of the standard Bayesian probabilistic approach to epistemology, IP has some issues that are perhaps holding back its widespread adoption. One such problem is "Belief Inertia": the alleged inability of agents with imprecise priors to learn from evidence. In this paper I will demonstrate that a small change to the rule for updating imprecise probabilities solves the problem of belief inertia.
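The belief inertia problem can be seen in a two-line computation. Suppose the agent's imprecise prior for a hypothesis H is a credal set whose members span almost the whole unit interval; here is a sketch (toy numbers of my own, not Bradley's example) of what vanilla pointwise conditionalisation then does:

```python
def posterior(prior, likelihood_ratio):
    """Bayes' theorem via the likelihood ratio P(E|H) / P(E|not-H)."""
    odds = (prior / (1 - prior)) * likelihood_ratio
    return odds / (1 + odds)

# A near-vacuous imprecise prior: members close to 0, 1/2, and 1.
credal_set = [0.0001, 0.5, 0.9999]
lr = 9.0  # strong evidence in favour of H
posteriors = [posterior(p, lr) for p in credal_set]
```

The middle prior moves sharply (0.5 to 0.9), yet the extreme members barely move, so the posterior set still spans almost all of (0, 1): the imprecise credence has, in effect, learned nothing. Proposals like Bradley's modify the updating rule (e.g. by discarding the worst-performing priors) to break this inertia.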
Michal Godziszewski (MCMP): Fairness and Justified Representation in Judgment Aggregation and Belief Merging
Mario Günther: An Analysis of Actual Causation
We put forth an analysis of actual causation. The analysis centers on the notion of a causal model that provides only partial information as to which events occur, but complete information about the dependences between the events. The basic idea is this: c causes e just in case there is a causal model that is uninformative on e and in which e will occur if c does. Notably, our analysis has no need to consider what would happen if c were absent. We show that our analysis captures more causal scenarios than any counterfactual account to date.
Mario Günther & Borut Trpin: Bayesians still don't learn from conditionals
One of the open questions in Bayesian epistemology is how to rationally learn from indicative conditionals (Douven, 2016). Eva et al. (2019) propose a strategy to resolve this question. They claim that their strategy provides a "uniquely rational response to any given learning scenario". We show that their updating strategy is neither very general nor always rational. Even worse, we generalize their strategy and show that it still fails. Bad news for the Bayesians.
Margherita Harris: Model-based Robustness Analysis
In science, obtaining the same result through different means (i.e. obtaining a 'robust' result) is often seen as a valid way to further confirm a hypothesis. The Bayesian should of course have something to say about the logic underpinning this method of confirmation. But, as Schupbach (2018) persuasively argues, Bayesian accounts of robustness analysis (RA) which rely on probabilistic independence to explicate the notion of RA diversity are in many cases woefully inadequate. Schupbach's explanatory account of RA is a promising attempt to fill this gap. Indeed, by having 'as its central notions explanation and elimination', this account fits nicely with many empirically driven cases of RA in science, while at the same time providing important normative implications.
In this talk, however, I will assess Schupbach’s further claim that his explanatory account of RA ‘applies to model-based RAs just as well as it does to empirically driven RAs’. I will argue that applying his explanatory account of RA in the context of models is considerably more difficult than Schupbach suggests. Finally, I will consider what lessons we might learn from this difficulty, lessons about the viability of model-based robustness analysis as a method of confirmation.
Leah Henderson (Groningen): Emergent Compatibilism for IBE and Bayesianism
A number of different views of the relationship between Inference to the Best Explanation (IBE) and Bayesianism have been proposed. I argue for a position I call 'emergent compatibilism', according to which the explanatory considerations involved in IBE emerge from an independently motivated Bayesian account. I discuss the assumptions behind this view, and show how it allows the Bayesian account to shed light on the relationship between different explanatory virtues.
David Kinney: Why Average When You Can Stack?
Formal and social epistemologists have devoted significant attention to the question of how to aggregate the credences of a group of agents who disagree about the probabilities of events. Most of this work focuses on strategies for calculating the mean credence function of the group. In particular, Moss (2011) and Pettigrew (2019) argue that group credences should be calculated by taking a linear mean of the credences of each individual in the group, on the grounds that this method leads to more accurate group credences than all other methods. In this paper, I argue that if the epistemic value of a credence function is determined solely by its accuracy, then we should not generate group credences by finding the mean of the credences of the individuals in a group. Rather, where possible, we should aggregate the underlying statistical models that individuals use to generate their credence function, using "stacking" techniques from statistics and machine learning first developed by Wolpert (1992). My argument draws on a result by Le and Clarke (2017) that shows the power of stacking techniques to generate predictively accurate aggregations of statistical models, even when all models being aggregated are highly inaccurate.
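To see the kind of contrast at issue, here is a deliberately small sketch (toy data of my own, and a one-parameter combination rule, far simpler than Wolpert-style stacking with a fitted meta-learner) comparing an equal-weight linear pool with weights chosen to minimise validation log loss:

```python
import math

def log_loss(probs, outcomes):
    """Average negative log likelihood of binary outcomes."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, outcomes)) / len(outcomes)

def stack_weights(preds_a, preds_b, outcomes, grid=101):
    """Choose the convex combination of two probabilistic models that
    minimises validation log loss (a one-dimensional 'stacking' step)."""
    best_w, best_loss = 0.0, float("inf")
    for k in range(grid):
        w = k / (grid - 1)
        combined = [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]
        loss = log_loss(combined, outcomes)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss

# Toy validation data: model A is well calibrated, model B is not.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
preds_a  = [0.9, 0.2, 0.8, 0.7, 0.1, 0.3, 0.8, 0.2]
preds_b  = [0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
w, stacked_loss = stack_weights(preds_a, preds_b, outcomes)
equal_loss = log_loss([0.5 * a + 0.5 * b
                       for a, b in zip(preds_a, preds_b)], outcomes)
```

The data-driven weights never do worse than the fixed equal-weight pool on the validation set, and here they shift essentially all weight onto the better-calibrated model, which is the basic motivation for aggregating models rather than averaging credences.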
Richard Lohse: A general Worry about the Accuracy first Programme
The accuracy-first programme tries to lay the normative foundations of Bayesianism. In particular, accuracy-firsters argue that accuracy, i.e. closeness to truth, is the sole source of epistemic value and that credences satisfying the Bayesian norms are somehow systematically more accurate than others. An important part of this task is to justify a mathematical characterisation of accuracy that delivers the desired result. In this talk, I question the possibility of such a justification.
Typically, accuracy-firsters justify their characterisation by arguing that the different aspects of it are “intuitive” or “natural” or that the alternatives are “absurd”. This line of argument seems to presuppose that characterising accuracy is about analysing the ordinary language concept of accuracy, where this analysis is justified by linguistic intuition. If this is indeed the case, then a general worry is not far away: Is our ordinary language concept of accuracy really determinate and precise enough to warrant the rather narrow kind of mathematical characterisation that accuracy-firsters require? A natural suspicion is that it is not.
To corroborate this suspicion, I examine two influential characterisations of accuracy. First, Joyce’s famous original one in the 1998 paper that launched the accuracy-first programme. Second, the latest one given by Richard Pettigrew in his 2016 book about the accuracy-first programme. In both cases, I pick out one central aspect of the characterisation and show that it is doubtful that (linguistic) intuition supports it in the right way. In fact, I argue that intuition even undermines it.
Moreover, I identify a common property of both characterisations that causes the trouble: Both characterisations entail that distance to truth has the mathematical property of being strictly convex, but this does not seem to be part of our ordinary language concept of accuracy. Since strict convexity is apparently indispensable for the desired mathematical results, this is a significant challenge for the accuracy-first programme. Finally, I briefly explore the option of taking the characterisations of accuracy to be fruitful precisifications instead of conceptual analyses. In this case, accuracy-firsters would be less bound to the ordinary language concept. Instead, they would have to argue for the theoretical fruitfulness of their characterisations.
Alex Meehan: Kolmogorov Conditionalizers Can Be Dutch Booked
A vexing question in Bayesian epistemology is how an agent should update on evidence to which she assigned zero prior credence. Some theorists have suggested that she should update by Kolmogorov conditionalization (a norm based on Kolmogorov's theory of regular conditional distributions). However, it turns out that, in some situations, a Kolmogorov conditionalizer will plan to always assign a posterior credence of zero to the evidence she learns. Intuitively, such a plan is irrational and easily Dutch bookable. In this talk, based on joint work with Snow Zhang, I propose a revised norm, Kolmogorov-Blackwell conditionalization, which avoids this problem. I present our main result, a Dutch book and converse Dutch book theorem for this new norm, and relate it to the results of Rescorla (2018).
Krzysztof Mierzewski: Probabilistic Stability and Statistical Learning
Leitgeb offered an acceptance rule based on the notion of probabilistically stable hypotheses: that is, hypotheses that maintain sufficiently high probability under conditioning on new information. According to the stability rule, a proposition ought to be accepted whenever it is logically entailed by some probabilistically stable hypothesis. When applied to discrete probability spaces, the stability rule guarantees logically closed and consistent belief sets, and it suggests a promising account of the relationship between subjective probabilities and qualitative belief.
Yet, most natural inductive problems—particularly those commonly occurring in statistical inference—are best modelled with continuous probability distributions and statistical models with a richer internal structure. In this talk, I discuss the behaviour of Leitgeb’s stability rule on Bayesian statistical models. I show that, for a very wide class of probabilistic learning problems, Leitgeb's rule yields a notion of acceptance that either fails to be conjunctive (accepted hypotheses are not closed under finite conjunctions) or is trivial (only hypotheses with probability one are accepted). These results apply to most canonical Bayesian models involving exchangeable random variables, such as parametric models with Dirichlet priors over discrete distributions (and, in particular, to every method in Carnap’s family of inductive methods). Analogous results also affect refined notions of stability which take into account the evidence structure in the learning problem at hand.
These results exhibit a serious tension for the stability rule: in Bayesian statistical models, important properties of priors that are conducive to inductive learning—open-mindedness, as well as certain symmetries in the agent’s probability assignments—act against conjunctive belief. We will see that the main selling points of the stability account of belief—its good logical behaviour and its close connection to the Lockean thesis—do not survive the passage to richer probability models.
Sven Neth: Rational Aversion to Information
Is more information always better? Or are there some situations in which more information can make us worse off? Good (1966) famously argued that expected utility maximizers should always accept more information, provided that the information is cost-free. I argue that Good presupposes that we are certain that we will update on any new information by Bayesian conditionalization. If we relax this assumption and assign a non-zero probability to Non-Bayesian updating, then it can be rational to reject free information – from both a pragmatic and an epistemic point of view.
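Good's (1966) result is easy to verify numerically for a conditionalizer: with cost-free evidence, the expected utility of deciding after looking is never less than that of deciding now. A sketch with made-up states, likelihoods and payoffs:

```python
def expected_value_of_information(prior, likelihoods, utilities):
    """Good (1966)-style comparison for a Bayesian conditionalizer.
    prior[s]: prior over states; likelihoods[s][e]: P(e | s);
    utilities[a][s]: payoff of act a in state s."""
    n_s = len(prior)
    # Expected utility of choosing now, with no extra evidence.
    u_now = max(sum(prior[s] * u[s] for s in range(n_s))
                for u in utilities)
    # Expected utility of observing the evidence first, then choosing.
    n_e = len(likelihoods[0])
    u_later = 0.0
    for e in range(n_e):
        p_e = sum(prior[s] * likelihoods[s][e] for s in range(n_s))
        post = [prior[s] * likelihoods[s][e] / p_e for s in range(n_s)]
        u_later += p_e * max(sum(post[s] * u[s] for s in range(n_s))
                             for u in utilities)
    return u_now, u_later

u_now, u_later = expected_value_of_information(
    prior=[0.5, 0.5],
    likelihoods=[[0.8, 0.2], [0.3, 0.7]],  # P(e | s)
    utilities=[[1.0, 0.0], [0.0, 1.0]],    # act i pays off in state i
)
```

Here looking first raises expected utility from 0.5 to 0.75. Neth's point is that this guarantee depends on being certain one will conditionalize: once non-Bayesian updating gets positive probability, the comparison can flip.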
Michael Nielsen (ANU) & Kenny Easwaran (Texas A&M University): Learning by Maximizing Expected Accuracy
What is the correct way to update probability judgments in response to new evidence? A platitude of Bayesian epistemology is that rational updating goes by conditionalization. But that's not right---at least not in general. The standard arguments supporting conditionalization show that it is the correct policy for updating probabilities only in certain learning situations, namely those in which evidence can be represented as a finite partition of the agent's space of possibilities and in which the agent observes which member of this partition is true. A number of recent papers in formal epistemology have pointed out that conditionalization is not an adequate learning procedure in scenarios that involve non-partitional evidence or non-factive learning (Salow 2017, Schoenfield 2017, Carr 2019, Gallow 2019a,b, among others). How do the arguments for conditionalization generalize to non-partitional and non-factive learning?
The current results addressing this question assume that agents assign probabilities to finitely many propositions. Although this assumption simplifies the mathematics a great deal, it doesn't seem to us to be warranted in general. There are many natural scenarios, which we discuss in the paper, in which it is reasonable for agents to assign probabilities to infinitely many propositions. How do the arguments for conditionalization generalize to cases like these?
In this paper, we pursue both of the aforementioned questions simultaneously in a very general expected accuracy framework. Our first main result generalizes a well-known theorem of Greaves and Wallace (2006). They showed that in finite probability spaces the learning procedure that maximizes expected accuracy for strictly proper scoring rules is conditionalization, provided the updating is in response to evidence that can be represented by a finite partition. We generalize this result to infinite probability spaces with information represented by a sub-sigma-algebra of the space. One can view this result as a companion to recent Dutch book theorems due to Rescorla (2018). Rescorla's theorems characterize the exact same learning procedure that we deal with here (which he calls "Kolmogorov conditionalization") in terms of invulnerability to Dutch book. Our result also answers some questions about maximizing expected accuracy in infinitary settings that Easwaran (2013) raises.
We go on to generalize further by allowing evidence of a more general sort. This will include learning scenarios where agents are certain they will receive evidence consisting of a proposition from the original sigma-algebra, where the possible pieces of evidence do not form a partition. But it will also include learning scenarios where the evidence is not composed of propositions at all. We don't give an exhaustive account of the possible interpretations of this general framework, but we argue that it is able to account for several important issues raised in the recent formal epistemology literature (see citations above).
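The finite Greaves-Wallace result that serves as the authors' starting point can be checked directly in small cases: under a strictly proper scoring rule such as the Brier score, conditionalization's expected inaccuracy is no worse than that of any rival update policy. A toy check with numbers of my own:

```python
def brier(probs, true_world):
    """Brier inaccuracy of a credence function at a world."""
    return sum((p - (1.0 if w == true_world else 0.0)) ** 2
               for w, p in enumerate(probs))

def expected_inaccuracy(prior, partition, policy):
    """Prior-expected Brier inaccuracy of an update policy, where
    policy[c] is the credence function adopted on learning cell c."""
    return sum(prior[w] * brier(policy[c], w)
               for c, cell in enumerate(partition) for w in cell)

prior = [0.5, 0.3, 0.2]
partition = [[0, 1], [2]]  # the agent learns which cell is true

# Updating by conditionalization on each cell:
conditionalize = [[5/8, 3/8, 0.0], [0.0, 0.0, 1.0]]
# A rival policy: ignore the evidence and keep the prior:
stubborn = [prior, prior]

cond_score = expected_inaccuracy(prior, partition, conditionalize)
stub_score = expected_inaccuracy(prior, partition, stubborn)
```

Conditionalization's expected inaccuracy (0.375) beats the stubborn policy's (0.62), as the theorem predicts; the paper's contribution is extending this comparison to infinite spaces and non-partitional evidence.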
Ted Poston: Coherence and Confirmation
It is a widespread intuition that the coherence of independent reports provides a powerful reason to believe that the reports are true. Formal results by Huemer (1997), Olsson (2002, 2005), and Bovens and Hartmann (2003) prove that, under certain conditions, coherence cannot increase the probability of the target claim. These formal results are taken to have significant epistemic upshot. In particular, they are taken to show that reports must first individually confirm the target claim before the coherence of multiple reports offers any positive confirmation. In this paper, I dispute this epistemic interpretation. The formal results are consistent with the idea that the coherence of independent reports provides a powerful reason to believe that the reports are true even if the reports do not individually confirm prior to coherence. Once we see that the formal discoveries do not have this implication, we can recover a model of coherence justification consistent with Bayesianism and these results. This paper, thus, seeks to turn the tide of the negative findings for coherence reasoning by defending coherence as a unique source of confirmation.
After tracking through the formal debate and its epistemic implications, I introduce as a constant a term 'C' that is learned in addition to learning the conjunctive fact that multiple coherent reports are true. I then investigate whether the coherence of multiple independent testimonies increases probability when individually the testimonies do not. Mathematical models vindicate this intuition. I discuss this result and show how it fits with BonJour's original judgment that coherence can provide confirmation apart from what is now known as individual reliability. I also provide a second model of confirmation by coherence that challenges an assumption about how conditionalization is applied. Conditionalization is typically understood as updating on every single item of evidence that is learned. However, it is compatible with conditionalization that one updates only on a mass of evidence. This second understanding of conditionalization is compatible with confirmation by coherence. However, questions remain. Future research is needed to investigate the nature of 'C' and whether the widespread assumption that coherence is modelled as the conjunction of individual items of evidence is accurate for the role coherence plays in epistemology.
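For readers new to the formal debate, the baseline model at issue is the conditionally independent witness model of Bovens and Hartmann (2003). Here is a minimal sketch of that baseline (illustrative reliability numbers of my own; this is the setup the debate starts from, not Poston's 'C' model):

```python
def posterior_given_reports(prior, hit_rate, false_alarm_rate, n):
    """P(H | n positive reports), where reports are conditionally
    independent given H, with P(report | H) = hit_rate and
    P(report | not-H) = false_alarm_rate."""
    num = prior * hit_rate ** n
    den = num + (1 - prior) * false_alarm_rate ** n
    return num / den

prior = 0.1
one_report = posterior_given_reports(prior, 0.7, 0.4, 1)
two_reports = posterior_given_reports(prior, 0.7, 0.4, 2)
```

With individually confirming reports (hit rate above false-alarm rate), agreement compounds: two reports confirm more than one. The contested question is whether the agreement itself can confirm when the individual reports do not, which is what Poston's additional term 'C' is meant to capture.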
Andrea G. Ragno: A Vindication of Rudner-Steele's Argument
Epistemic risk refers to the danger of committing to a wrong claim due to the state of uncertainty of our knowledge. To accept or reject a hypothesis, Rudner (1953) argued that "the scientist qua scientist makes value judgments" in virtue of this risk. Levi (1960), by contrast, believed that this risk involves no ethical trade-off between science and socio-political goals, and thus he retained the value-free ideal (VFI) of science. Levi defends this position within the Bayesian framework.
An important stance in this debate is offered by Steele (2012), who sharpens Rudner's central thesis into the following: "the scientist qua policy advisor makes value judgments". In fact, the way scientists are instructed to transmit information is (very often) either too orthodox or involves a different probability function, and so does not properly represent their credal state. In other words, the degrees of belief held by a science advisor are transmitted in a crude shape for various reasons.
This means that scientists cannot avoid making value judgments. Steele's claim seems to be well-founded because of the pressure policy-makers put on scientists. Therefore, one might ask: if there were no immediate policy consequences determining scientists' prior beliefs (unlike in the case of policies which act upon climate change), could science be value-free with respect to Steele's argument? In this article, we will outline why it is difficult to talk about the VFI in cases where immediate policy consequences are absent.
In the first section, we will give an overview of Steele's main argument. In the second section, we will introduce the case where policy consequences do not inform scientists' priors and explain why Levi's contention with Rudner seems prima facie plausible. Finally, in the last section, we will object to Levi's view and to two further arguments for the VFI with two different objections. Both will criticise objective Bayesianism and its view on priors, focusing on the work of Wheeler and Williamson (2011).
Palash Sarkar (Indian Statistical Institute) & Prasanta Bandyopadhyay (Montana State University): Simpson's Paradox and Causality
Rafal Urbaniak (Gdańsk): Imprecise Credences Can Increase Accuracy wrt. Claims about Expected Frequencies
Andree Weber: Conciliatory Views on Peer Disagreement
The evidence that we get from peer disagreement is especially problematic from a Bayesian point of view since the belief revision caused by a piece of such evidence cannot be modelled along the lines of Bayesian conditionalisation. In my talk, I will explain how exactly this problem arises, what features of peer disagreements are responsible for it, and what lessons should be drawn for both the analysis of peer disagreements and Bayesian conditionalisation as a model of evidence acquisition.