Recent conversations with friends & colleagues about probability theory reminded me of discussions with a friend of mine in graduate school about the supposed virtues of making one's own reasoning in daily life more systematic through Bayesian inference. The basic idea, in rough qualitative terms, is that one's belief in a hypothesis can be quantified through a prior probability; upon observing some data related to that hypothesis, one can use the probabilities of observing that data when the hypothesis does or does not hold to update one's belief in the hypothesis, yielding the posterior probability. Examples of quantitative & qualitative explanations can be found on the site LessWrong [LINK]. However, both in graduate school and again more recently, I realized that it is very easy to talk oneself into believing that one is using systematic Bayesian reasoning while actually just rationalizing one's own prior beliefs & changes in beliefs after the fact. This can be illustrated mathematically in a few ways, which are by no means exhaustive. Follow the jump to see more.
Bayes's theorem
It is good to first review Bayes's theorem. Given a hypothesis \( \mathrm{H} \) and data \( \mathrm{D} \), Bayes's theorem is \[ \mathrm{P}(\mathrm{H}|\mathrm{D}) = \frac{\mathrm{P}(\mathrm{D}|\mathrm{H})}{\mathrm{P}(\mathrm{D}|\mathrm{H})\mathrm{P}(\mathrm{H}) + \mathrm{P}(\mathrm{D}|\neg\mathrm{H})\mathrm{P}(\neg\mathrm{H})} \mathrm{P}(\mathrm{H}) \] where \( \mathrm{P}(\mathrm{H}) \) represents the degree of belief in the hypothesis before seeing the data and \( \mathrm{P}(\mathrm{H}|\mathrm{D}) \) represents the degree of belief in the hypothesis after seeing the data.
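As a quick sanity check, the theorem can be evaluated numerically. Here is a minimal sketch in Python; all of the probability values are made up purely for illustration.

```python
# Minimal numerical sketch of Bayes's theorem for a binary hypothesis H.
# All probabilities below are made-up values chosen purely for illustration.

def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Return P(H|D) from the prior P(H) and the two conditional
    probabilities P(D|H) and P(D|not H)."""
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1.0 - p_h)
    return p_d_given_h * p_h / p_d

# A prior belief of 0.3, with data twice as likely under H as under not-H,
# pushes the belief up to about 0.46.
print(posterior(0.3, 0.8, 0.4))  # 0.4615...
```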
It is also good to formalize this in the specific case of a hypothesis being represented by a parameter that is taken to be a random variable \( A \) with values \( \alpha \) and data being represented by a random variable \( X \) with values \( x \). In this case, Bayes's theorem reads as \[ f_{A|X = x}(\alpha) = \frac{f_{X|A = \alpha}(x)}{\int f_{X|A = \alpha'}(x) f_{A}(\alpha')~\mathrm{d}\alpha'} f_{A}(\alpha) \] in terms of the relevant unconditional & conditional probability densities (and \( \alpha' \) in the denominator is a dummy variable for integration). Notably, the factor \( \frac{f_{X|A = \alpha}(x)}{\int f_{X|A = \alpha'}(x) f_{A}(\alpha')~\mathrm{d}\alpha'} \) multiplying the prior belief \( f_{A}(\alpha) \) is a function of \( (x, \alpha) \), meaning that for fixed \( x \) (observed data), the prior belief in the parameter \( \alpha \) is modified by a pointwise multiplication to yield the posterior belief. This will become important for one of the specific dangers discussed in this post.
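For concreteness, the density form can also be sketched numerically on a grid. The Gaussian prior, the Gaussian measurement model standing in for \( f_{X|A = \alpha}(x) \), and the observed value of \( x \) below are all assumptions made purely for illustration.

```python
import numpy as np

# Grid-based sketch of the density form of Bayes's theorem.
alpha = np.linspace(-10.0, 10.0, 2001)
d_alpha = alpha[1] - alpha[0]

# Assumed prior f_A(alpha): standard normal density.
prior = np.exp(-0.5 * alpha**2) / np.sqrt(2.0 * np.pi)

# Assumed measurement model f_{X|A=alpha}(x): x is alpha plus unit-variance
# Gaussian noise, with observed value x = 1.5.
x = 1.5
likelihood = np.exp(-0.5 * (x - alpha)**2) / np.sqrt(2.0 * np.pi)

# The denominator integral, approximated as a Riemann sum over alpha'.
evidence = np.sum(likelihood * prior) * d_alpha

# Pointwise multiplication of the prior by the factor, as in the formula.
post = (likelihood / evidence) * prior

print(np.sum(post) * d_alpha)          # ~1: the posterior is a proper density
print(np.sum(alpha * post) * d_alpha)  # posterior mean lies between 0 and x
```

The pointwise nature of the update is visible in the last step: each value of the prior at a given \( \alpha \) is simply rescaled, which is the property that matters for Danger 3 below.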
Danger 1: Rationalization only after the fact
If one is willing to rationalize after the fact, then given fixed data \( x \), one can always dishonestly choose a different form for the likelihood ratio \( \frac{f_{X|A = \alpha}(x)}{\int f_{X|A = \alpha'}(x) f_{A}(\alpha')~\mathrm{d}\alpha'} \), which is a function of \( (x, \alpha) \), such that the prior belief \( f_{A}(\alpha) \) is modified to fit a certain preconceived posterior belief \( f_{A|X = x}(\alpha) \). This is the least honest danger, so it can be dealt with the most easily.
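To make this concrete, here is a numerical sketch of the dishonest procedure: given a preconceived posterior, one simply reverse-engineers the multiplying factor as the ratio of the target posterior to the prior. The Gaussian densities are assumptions chosen purely for illustration.

```python
import numpy as np

# Danger 1 sketch: reverse-engineering the multiplying factor so that any
# preconceived posterior appears "justified" by the prior.
alpha = np.linspace(-10.0, 10.0, 2001)

prior = np.exp(-0.5 * alpha**2) / np.sqrt(2.0 * np.pi)  # assumed N(0, 1)

# Preconceived posterior: N(3, 4), chosen before any "updating" happens.
target = np.exp(-0.125 * (alpha - 3.0)**2) / np.sqrt(8.0 * np.pi)

# The dishonestly chosen factor, built backward from the desired answer.
factor = target / prior

# The "update" then lands exactly on the preconceived belief.
post = factor * prior
print(np.max(np.abs(post - target)))  # 0 up to floating-point rounding
```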
Danger 2: Rigid prior beliefs
Going back to the broader form of Bayes's theorem, even without appealing to the formal structure of probability densities, one can see that if \( \mathrm{P}(\mathrm{H}) \) is either 0 or 1, meaning that one is completely sure even in the absence of data that the original hypothesis is false or true, then as long as the data can possibly be observed, meaning that \( \mathrm{P}(\mathrm{D}) \neq 0 \), the posterior satisfies \( \mathrm{P}(\mathrm{H}|\mathrm{D}) = \mathrm{P}(\mathrm{H}) \), so no amount of additional data will ever change the initial belief. This is also true in a limiting sense, though the details depend somewhat on the mathematical structure of the prior belief: the closer the prior belief \( \mathrm{P}(\mathrm{H}) \) is to either 0 or 1, the more contradictory data is required to significantly push the posterior belief \( \mathrm{P}(\mathrm{H}|\mathrm{D}) \) away from the prior belief \( \mathrm{P}(\mathrm{H}) \). This is a little more honest but also not too hard to deal with.
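Both halves of this danger can be seen numerically. In the following sketch (with made-up likelihood values), a prior of exactly 0 never moves, while a prior of \( 10^{-6} \) needs several consecutive observations, each 10 times likelier under the hypothesis, before the belief crosses 0.5.

```python
# Danger 2 sketch: rigid and nearly rigid prior beliefs. The likelihood
# values are made up purely for illustration.

def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Return P(H|D) via Bayes's theorem for a binary hypothesis."""
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1.0 - p_h)
    return p_d_given_h * p_h / p_d

# Complete certainty is immovable: with P(H) = 0, no data helps.
print(posterior(0.0, 0.99, 0.01))  # 0.0

# Near-certainty: starting from P(H) = 1e-6, observe data 10 times likelier
# under H than under not-H, and count updates until the belief crosses 0.5.
p, n = 1e-6, 0
while p < 0.5:
    p = posterior(p, 0.10, 0.01)
    n += 1
print(n)  # 6: each update multiplies the odds by 10, and the odds start near 1e-6
```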
Danger 3: Changing one's mind in a way that mathematically cannot be Bayesian
This is the most honest way that I can think of to mistakenly ascribe the changing of one's mind to Bayesian updating, so it requires much more mathematical care to deal with. Suppose that one has a prior belief \( f_{A}(\alpha) \) and then, upon observing data \( x \), one decides that the structure of one's prior belief was fundamentally sound but its mean simply needed to shift by an amount \( \beta \) that depends on the observed data \( x \). In particular, shifting \( \mathrm{E}(A) \) to \( \mathrm{E}(A) + \beta(x) \) but otherwise keeping the structure of the belief the same formally means that the posterior belief can be written as \( f_{A}(\alpha - \beta(x)) \).
Intuitively, this kind of changing of one's mind seems like it should be consistent with Bayesian updating, but mathematically, it is possible to prove that such a change of mind cannot be done within the bounds of Bayesian updating. Notably, the likelihood ratio \( \frac{f_{X|A = \alpha}(x)}{\int f_{X|A = \alpha'}(x) f_{A}(\alpha')~\mathrm{d}\alpha'} \) performs a pointwise multiplication of \( f_{A}(\alpha) \) at each \( \alpha \).

By contrast, changing from \( f_{A}(\alpha) \) to \( f_{A}(\alpha - \beta(x)) \) can be done through the linear translation operation \[ f_{A}(\alpha - \beta(x)) = \exp\left(-\beta(x)\frac{\partial}{\partial \alpha}\right)f_{A}(\alpha) \] which can be evaluated as a Taylor series \[ f_{A}(\alpha - \beta(x)) = \sum_{n = 0}^{\infty} \frac{1}{n!} \left(-\beta(x)\frac{\partial}{\partial \alpha}\right)^{n} f_{A}(\alpha) \] or a convolution \[ f_{A}(\alpha - \beta(x)) = \int \delta(\alpha - \beta(x) - \alpha')f_{A}(\alpha')~\mathrm{d}\alpha' \] or a Fourier expansion \[ f_{A}(\alpha - \beta(x)) = \int_{-\infty}^{\infty} \exp(\mathrm{i}k(\alpha - \beta(x))) \tilde{f}_{A}(k)~\frac{\mathrm{d}k}{2\pi} \] where \( \tilde{f}_{A}(k) = \int \exp(-\mathrm{i}k\alpha) f_{A}(\alpha)~\mathrm{d}\alpha \) is the Fourier transform of the prior belief \( f_{A}(\alpha) \).

In any case, such a convolution cannot be expressed as a pointwise multiplication with respect to a function of \( \alpha \). Put another way, the linear translation operator \( \exp\left(-\beta(x) \frac{\partial}{\partial \alpha}\right) \) does not commute with \( \alpha \), so there is no way for application of the linear translation operator to any prior belief \( f_{A}(\alpha) \) to ever correspond to a pointwise multiplication as in Bayes's theorem. Thus, shifting one's expected value for the hypothesis in response to data while otherwise keeping the structure of one's prior belief unchanged can never be called Bayesian updating.
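The translation machinery itself is easy to verify numerically: shifting the density through its Fourier transform reproduces the directly shifted density. The Gaussian prior, the grid, and the shift \( \beta \) below are assumptions chosen purely for illustration; this sketch checks the equivalence of the two routes to \( f_{A}(\alpha - \beta) \), not the non-commutation argument itself.

```python
import numpy as np

# Sketch: translate a prior density by beta via the Fourier route and
# compare against the directly shifted density. Grid, prior, and beta are
# assumed purely for illustration.
n = 4096
alpha = np.linspace(-20.0, 20.0, n, endpoint=False)
d_alpha = alpha[1] - alpha[0]
beta = 1.5

prior = np.exp(-0.5 * alpha**2) / np.sqrt(2.0 * np.pi)  # assumed N(0, 1)

# Fourier route: transform, multiply by exp(-i k beta), transform back.
k = 2.0 * np.pi * np.fft.fftfreq(n, d=d_alpha)
shifted = np.fft.ifft(np.fft.fft(prior) * np.exp(-1j * k * beta)).real

# Direct route: evaluate the prior at alpha - beta.
direct = np.exp(-0.5 * (alpha - beta)**2) / np.sqrt(2.0 * np.pi)

print(np.max(np.abs(shifted - direct)))  # tiny: the two routes agree
```

Note that the Fourier route multiplies pointwise in \( k \), the conjugate variable, not in \( \alpha \), which is exactly the distinction drawn above between translation and Bayes-style pointwise multiplication in \( \alpha \).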