There is always a more fundamental question. Kenny Easwaran at Antimeta has written a response to my post What Does Bayesian Epistemology Have to Do With Probabilities? In his post, he raises the question, just what is a probability? I want to take a look at my own assumptions about what a probability is, and what he has to say, and see if this has any relevance for our discussion of Bayesian epistemology.
I will not attempt here to develop a philosophy of probability, like Bayesianism, or frequentism, or anything of that sort. These are accounts of what probabilities mean, but not of what probabilities are. Easwaran and I agree that probabilities, in the sense in which we are using the term, are certain formal constructions. I was assuming a particular set-theoretic construction, because that's what I was taught (although what I'm about to present is slightly different than what I was assuming before, because I didn't remember things quite right).
I had assumed that a probability was defined over what I called a "state space" (which is actually a computer science term, but is not totally inapplicable here) which is a set of equally likely outcomes.
The correct term is, in fact, "sample space," and, according to my textbook (Mathematics: A Discrete Introduction by Edward R. Scheinerman), a sample space is an ordered pair (S, P), where S is a set of outcomes and P is a function from S to the real numbers between 0 and 1 (inclusive) such that the sum of P(s) over every s ∈ S is 1. (In some formulations, P is defined on the power set of S instead, but that makes everything else more complicated, and I think all it buys you is a simpler notation.)
Once we've got this, we interpret 1 as certain truth and 0 as certain falsity, and so we can map things back to a Boolean algebra. Easwaran constructs this in reverse:
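To make the definition concrete, here is a minimal Python sketch of a finite sample space. The names make_sample_space and prob are my own illustrative choices, not anything from Scheinerman's book, and the example assumes a fair six-sided die.

```python
from fractions import Fraction

def make_sample_space(weights):
    """Build (S, P) from a dict mapping each outcome to its probability.

    Raises ValueError unless every value lies in [0, 1] and the values sum to 1.
    """
    P = {outcome: Fraction(w) for outcome, w in weights.items()}
    if any(p < 0 or p > 1 for p in P.values()):
        raise ValueError("each P(s) must lie between 0 and 1 inclusive")
    if sum(P.values()) != 1:
        raise ValueError("the sum of P(s) over all s in S must be 1")
    return set(P), P

def prob(event, P):
    """Probability of an event (a subset of S): the sum of P(s) for s in the event."""
    return sum((P[s] for s in event), Fraction(0))

# Assumed example: a fair die, so each face gets probability 1/6.
S, P = make_sample_space({face: Fraction(1, 6) for face in range(1, 7)})
even = {s for s in S if s % 2 == 0}
print(prob(even, P))
```

Defining P on outcomes and summing over subsets is what lets us recover the power-set formulation when we want it.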
My understanding of the word is that “probability” refers to any function from a Boolean algebra to the real numbers satisfying the following three properties: (1) it is never negative; (2) the tautology is assigned value 1; (3) finite additivity (that is, given two elements whose conjunction is the contradiction, the probability of their disjunction is the sum of their probabilities).
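As a rough check that the two constructions line up in the finite case, the sketch below treats the power set of a small set as the Boolean algebra (union as disjunction, intersection as conjunction, the whole set as the tautology, the empty set as the contradiction) and tests the three properties directly. The function names and the example weights are assumptions of mine, chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_probability(mu, universe):
    """Check the three properties on the power-set algebra of `universe`."""
    algebra = powerset(universe)
    top, bottom = frozenset(universe), frozenset()
    non_negative = all(mu(a) >= 0 for a in algebra)
    tautology_is_one = mu(top) == 1
    finitely_additive = all(
        mu(a | b) == mu(a) + mu(b)
        for a in algebra for b in algebra
        if a & b == bottom  # conjunction is the contradiction
    )
    return non_negative and tautology_is_one and finitely_additive

# Hypothetical point weights that sum to 1.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def mu(event):
    return sum((weights[x] for x in event), Fraction(0))

def not_additive(event):
    # Non-negative and gives the tautology 1, but fails additivity.
    return Fraction(len(event) ** 2, 9)

print(is_probability(mu, set(weights)))
print(is_probability(not_additive, set(weights)))
```

The second function is there to show that the third property does real work: non-negativity and normalization alone do not make something a probability.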
Now, my discussion before was built on this sample space construction, and I was discussing what the members of the set were. Easwaran's construction has the benefit of allowing us to deal directly with propositions, without introducing the possible worlds semantics. This, I think, is why he seems to describe his view as in between (P) and (KPW): he can hold that there is a real sample space, and construct it out of propositions. With his construction, he doesn't need to go much further than what Kripke says explicitly. Ignoring the facts we're not interested in isn't a simplification for practical purposes: it's actually what we want to do.
Now, a benefit of (KPW) proper (that is, the view I originally dubbed (KPW)) over Easwaran's view is that it explains where these probabilities come from, at least in the case of an abstract ideal reasoner: we assign the same probability to every epistemically possible world, and take a look at how many worlds the proposition in question comes out true in. As Easwaran points out, this may run into trouble, because these probabilities may not be defined. Things get tricky with infinite sample spaces: if they are similar enough to the real line (or plane, etc.), then things work out, but otherwise they may not. So my (KPW) may be in trouble. I wonder, though, on Easwaran's view or on (P), where the probabilities are supposed to come from.
The answer to the question in the title of this post may seem obvious (after all, isn't Bayesianism all about probabilities?), but I think that the long discussion that followed Lauren's post on van Fraassen's objection to Bayesianism from quantum mechanics shows that it isn't clear at all - or at least, that it wasn't clear to either of us as we were discussing the issue. I think that I now understand why. In this post, I'm going to give three answers to this question, which I will call The Primitivist Account (P), The Kripkean Possible Worlds Account (KPW), and the Lewisian Possible Worlds Account (LPW). This post will discuss what each view means, and where vagueness enters each account. I will also be identifying three crucial problems with (P) and showing how each of the other views answers these difficulties.
Here are brief definitions of each view, and how each one relates subjective degrees of rational confidence to probabilities (I will explain in more depth later).
Part of the reason for the previous confusion is that I was more or less assuming (P), and I think that Lauren had noticed some serious problems with it. First, a word on the reason for my assumption, and then I will try to state Lauren's objections.
(P) may be the dominant interpretation of Bayesianism. I don't really know. But there is good reason why someone reading the literature might think it's the dominant interpretation: it maps especially well to how Bayesian philosophers actually apply Bayesianism. Most philosophers who apply Bayesian reasoning (myself included) do it by simply making up numbers that are supposed to represent their degrees of confidence. Where do these numbers come from? We simply observe that we have varying degrees of confidence about different beliefs, and map these degrees of confidence to the real numbers between 0 and 1. Vagueness comes in from the fact that we don't have mathematically precise degrees of confidence, and our numbers are simply made up from our imprecise degrees of confidence, rather than computed somehow.
Now, as I have said, I believe that three important questions came out in our previous discussion which this theory leaves unanswered:
(P) does not answer these questions. This is, of course, to be expected from a theory called "primitivism," but I think the third question is particularly problematic. In the previous discussion, it was van Fraassen's assumption that we should do this very thing that brought up the issue. However, Bayesianism really needs these principles. Is it possible to provide an analysis of degrees of rational confidence that adequately answers these questions? (KPW) and (LPW) will attempt this very thing.
(KPW) is inspired by Kripke's treatment of possible worlds in terms of state spaces in the 1980 preface to Naming and Necessity, pp. 15-20. Kripke here argues that possible worlds are the same sorts of things as the "states" used in school probability theory, the difference being that possible worlds are maximally specific. Now, consider a view according to which the state space of Bayesian reasoning is the space of all epistemically possible worlds - that is, all the world-states (which are abstractions just like dice-states) which might for all we know be actual. Note that not all of these may be really possible. For instance, the Anselmian God either exists or does not exist, with logical necessity, but his existence and non-existence may both be epistemically possible for a particular person. So, when we say that we have subjective degree of confidence .5 for a given proposition, we are saying that that proposition holds in half of all epistemically possible worlds.
This view will be helped by Lewis's observation about the relationship between propositions and possible worlds: namely, that every proposition picks out a set of possible worlds, the worlds in which it obtains. (Lewis wants this to be a reductive analysis of propositions, but we need not do that.) So, consider any given proposition you believe. There is a set of possible worlds in which that proposition obtains. The set of epistemically possible worlds (for you) is the intersection of the sets for all the propositions you believe.
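Here is a toy sketch of that idea. The three atomic sentences and the two beliefs are hypothetical, and real worlds are of course maximally specific rather than three bits long.

```python
from itertools import product

# Hypothetical atomic sentences; each toy world settles all three.
ATOMS = ("raining", "door_open", "cat_inside")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]

def worlds_where(proposition):
    """The set of (indices of) worlds in which the proposition obtains."""
    return {i for i, world in enumerate(WORLDS) if proposition(world)}

def epistemically_possible(beliefs):
    """Intersect the world-sets of every believed proposition."""
    possible = set(range(len(WORLDS)))
    for belief in beliefs:
        possible &= worlds_where(belief)
    return possible

beliefs = [
    lambda w: w["raining"],
    lambda w: w["door_open"] or w["cat_inside"],
]
possible = epistemically_possible(beliefs)
print(len(possible), "of", len(WORLDS), "worlds remain epistemically possible")
```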
The (KPW) answer to question (1) has already been given - Bayesian degrees of confidence are probabilities. Let's proceed to give an interpretation of the math on (KPW).
Bayes' theorem is a relation between an initial probability - a probability over some state space S - and a conditional probability - a probability over some subset S' of S. Usually, we consider some proposition p and some evidence e. We already have assigned a particular degree of confidence to p and we want to adjust our confidence in light of learning the new evidence e. We use Bayes' theorem to calculate P(p|e). What has happened here? The new evidence e has eliminated certain formerly epistemically possible worlds - namely, all the worlds according to which ~e. In order to compute P(p|e) we have to know something about the relationship between p and e. In particular, we have to know P(e|p), P(p) and P(e) (all of them over the initial state space S). This involves knowing how many of our epistemically possible worlds certain conditions obtain in.
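A small sketch of this reading of Bayes' theorem, assuming a toy state space of 36 equally weighted worlds (the outcomes of two dice) and a hypothetical proposition p and evidence e. It computes P(p|e) twice: once by throwing out the ~e worlds and counting again, and once from P(e|p), P(p) and P(e) over the original space.

```python
from fractions import Fraction
from itertools import product

# Toy state space S: 36 equally weighted worlds, one per (first die, second die).
S = list(product(range(1, 7), repeat=2))

def P(proposition, space):
    """Fraction of the worlds in `space` where the proposition obtains."""
    return Fraction(sum(1 for w in space if proposition(w)), len(space))

p = lambda w: w[0] + w[1] == 8   # hypothetical proposition: the dice sum to 8
e = lambda w: w[0] >= 5          # hypothetical evidence: the first die shows 5 or 6

# Conditionalizing directly: eliminate every ~e world, then count in S'.
S_prime = [w for w in S if e(w)]
direct = P(p, S_prime)

# Bayes' theorem, with every term computed over the initial space S.
P_e_given_p = P(e, [w for w in S if p(w)])
bayes = P_e_given_p * P(p, S) / P(e, S)

print(direct, bayes)
```

Both routes give the same number, which is the point: conditionalizing is just counting again in the smaller space.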
The (KPW) answer to question (3) should now be quite clear. Probabilities for events like dice rolls are, on this view, actually just special cases of degrees of subjective confidence. Why is there a 1/6 probability of a single die rolling 1? Because in 1/6 of all epistemically possible worlds it will land 1. (We should think of these world-states as covering the whole history of the world, so future events can be handled the same as past or present events.) In our school probability exercises, we simplify the case by supposing there are only 6 worlds. In fact, there are 6 sets of worlds. We know that the worlds will divide more or less evenly (we assume we know with certainty that the die will be rolled), because most of the propositions we are uncertain about vary independently of the result of the die roll. The ones that don't vary independently (e.g. propositions stating that the die is unfair in some particular way) are, for all we know, as likely to favor one side of the die as another.
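The "6 sets of worlds" picture can be sketched as follows. I assume, purely for illustration, that everything else we are uncertain about boils down to three yes/no propositions that vary independently of the die.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
# Hypothetical: three independent yes/no propositions, giving 8 combinations.
OTHER_FACTS = list(product([True, False], repeat=3))

# Every epistemically possible world fixes the die face and the other facts.
WORLDS = [(face, other) for face in FACES for other in OTHER_FACTS]

# Fine-grained count: the share of all worlds in which the die lands 1.
p_full = Fraction(sum(1 for face, _ in WORLDS if face == 1), len(WORLDS))

# Coarse-grained count: group worlds by die face and treat each group as one world.
cells = {}
for face, other in WORLDS:
    cells.setdefault(face, []).append(other)
p_coarse = Fraction(1, len(cells))

print(p_full, p_coarse)
```

The two counts agree only because the groups are the same size, which is what the independence assumption buys us.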
Vagueness enters (KPW) by virtue of the fact that the worlds are created by us. They don't exist objectively. As such, there is vagueness as to how many worlds there are, and there is vagueness as to whether certain propositions are true in certain worlds. These things are simply not fully defined. We should nevertheless be able to fix upper and lower bounds by considering all of the possible resolutions of the vagueness (actually, we can probably do better by figuring out in advance which resolutions will lead to high values, and which to low values). In practice, however, we do something more like the die roll case: we eliminate all the propositions that vary (more or less) independently (as far as we know) of the propositions under consideration, to divide the epistemically possible worlds into sets, and then consider each set as a single underspecified world.
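A very rough sketch of the bounds idea, covering only one of the two kinds of vagueness (whether a proposition is true in a world, with the number of worlds held fixed). The list of verdicts is made up: True and False mean the proposition is settled in that world, and None means it is not.

```python
from fractions import Fraction

def confidence_bounds(verdicts):
    """Lower and upper bounds on the share of worlds where a proposition holds.

    The lower bound resolves every unsettled world as false,
    the upper bound resolves every unsettled world as true.
    """
    total = len(verdicts)
    definitely_true = sum(1 for v in verdicts if v is True)
    unsettled = sum(1 for v in verdicts if v is None)
    return (Fraction(definitely_true, total),
            Fraction(definitely_true + unsettled, total))

# Hypothetical verdicts for ten worlds.
verdicts = [True, True, True, None, None, False, False, False, False, False]
low, high = confidence_bounds(verdicts)
print(low, high)
```

This is the shortcut mentioned above: rather than enumerating every resolution, we go straight to the two resolutions that give the extreme values.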
(LPW) is very similar to (KPW) in its answers to the three questions. (LPW) holds that we are talking about real possible worlds, which are epistemically accessible - that is, which might, for all we know, be the world we're in. Vagueness on (LPW) is different. Because the worlds are fully defined and there is an objective truth about how many there are, there is only one source of vagueness properly so-called: vagueness about whether a given world is epistemically accessible. However, there is also second-order uncertainty - uncertainty about whether a certain world is genuinely possible, or whether a given proposition obtains in a certain world.
These two theories improve on (P) by providing explanations for why we use Bayesian reasoning the way we do, and why it works like probability theory at all. They also allow us to define our degrees of confidence much more clearly.
Hello. As a brief introductory reminder, I'm Lauren, Kenny's fiancée, and a guest blogger here when I have time (which isn't very often). However, I am going to take some time to discuss a paper by Bas C. van Fraassen, Conditionalizing on Violated Bell's Inequalities, in which he claims that quantum mechanics creates problems for Bayesian epistemology. I have two main points to make in response. The first is that he doesn't actually need quantum mechanics for his argument. The second is that he has failed to account for the effect of choosing which events to talk about, which changes the conclusions of his paper. I will treat these in reverse order, though.
A brief summary of van Fraassen's argument is this:
In an experimental set up involving measuring the spin of entangled photons, there are two detectors, each of which has a 50% probability of detecting something (or registering something) for each run of the experiment. (Here I'm going to be slightly sloppy and use "register something" and "detect something" interchangeably to mean "made a positive spin reading".)
However, the detectors are not uncorrelated: the probability of one detecting something is related to the square of the cosine of the angle between the detectors. This is well established in quantum mechanics.
Then, van Fraassen imagines a situation where someone named Hilary is asked to predict whether or not one of the detectors registers something. She initially answers that the probability is 50%. She is then told that the other, hereafter referred to as the second, detector did register something, but she is not told what the angle is between the detectors, although she does know of the cosine squared relation. She is then asked the same question, but she now no longer knows what the probability is, because she knows it could be anything between 0% and 100% depending on the angle between the detectors.
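To see why Hilary's answer could land anywhere between 0% and 100%, here is a small sketch. It assumes, for illustration only, that the probability of the first detector registering something given that the second did is exactly the cosine squared of the angle; the summary above only says the two are "related", so treat the exact formula as my assumption.

```python
import math

def p_first_given_second(angle_degrees):
    """ASSUMED form of the relation: conditional probability = cos^2(angle)."""
    return math.cos(math.radians(angle_degrees)) ** 2

# A few sample angles between the detectors, in degrees.
for angle in (0, 30, 45, 60, 90):
    print(angle, round(p_first_given_second(angle), 3))
```

Under that assumption the value sweeps the whole range from 1 down to 0 as the angle goes from 0 to 90 degrees, and it equals 50% only at 45 degrees.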
Van Fraassen then asks, if Hilary were forced to bet, what the best thing would be for her to do. He concludes that she ought to ignore this new piece of information, even though it is relevant to the probability of the first detector registering something, and to bet the first detector registers something 50% of the time, because, he claims, she would break even doing this. Then, van Fraassen questions why Hilary is justified in ignoring the information about the second detector, since it would change her opinion. This is especially a problem for Bayesian inference, which claims that we should include all relevant evidence in our probability calculus, and as we include more relevant evidence, our probabilities become "better".
I will argue that her initial answer of 50% is actually incorrect, because of the effect of only asking her about situations where the second detector registers something, which is not a sufficiently random subset, with respect to the first detector, of all the events. Thus, her second answer is, in fact, the better answer, and Bayesian inference still stands.
Consider, for a moment, this example. (I'll explain in a moment how it relates.) I have a perfectly fair coin, which you know is fair. I then flip the coin, and ask you to guess whether or not it's heads. You win if, when I ask you, your guess matches the coin. As is well known, you should guess heads 50% of the time, to maximize your likelihood of winning. If you answer yes 49% or 51% of the time, odds are that you'll win less often than if you answered yes 50% of the time. Now, however, imagine that I flip the same fair coin, but that I look at the coin before I ask you to guess whether or not it is heads. If it's tails, then I ask you to guess whether or not it's heads. (If it's heads, I just ignore it and flip the coin again, although you don't know this.) In this case, your likelihood of winning is greatest if you never guess heads. Similarly, if I only asked you when the coin landed heads, your likelihood of winning is greatest if you always answer heads. So, when we ask someone only about a specific subset of events, the properties of that subset are relevant to the rate someone should guess at. So then, if you are playing this game with someone and tell them that you're only going to ask them about a certain subset of events, but don't tell them what the subset is, they will be at a loss as to what rate they should guess, and also if they continue to guess yes 50% of the time, they will not necessarily break even (depending on your subset), even though the coin lands heads 50% of the time.
Now let's look at van Fraassen's argument again, and ask whether we are, at any point, asking Hilary to guess on only a certain subset of events, and if so, whether the way that subset was chosen would influence the probability. Recall that we do inform Hilary that the second detector did register something. Now, since the second detector will not always register something, and since we presumably are not lying to her, we are thus picking out a subset of the events, namely, the ones where the second detector goes off. Next we need to question whether this is effectively a random subset with respect to the first detector (the one we are asking Hilary about). If it is, then she will still break even guessing it detected something 50% of the time, but if it's not, then just like in the coin game above, she will no longer break even when guessing yes 50% of the time. However, we know that there is a correlation between the first detector registering something and the second detector registering something (namely, that this correlation is related to the square of the cosine of the angle), and so this is NOT an effectively random subset with respect to the first detector. Hence, Hilary will not necessarily break even by guessing 50%.
But wait, you say, doesn't the first detector have to register "yes" 50% of the time? Then why doesn't she break even? Yes, the detector does register yes 50% of the time, but only when we're talking about averaging over ALL the events. Similarly, the coin lands heads up 50% of the time, over ALL the flips I make, but not over all the flips I ask you about. Likewise, we're not asking Hilary about all the events, only some of them. If you were asking Hilary about all of them, independent of what the second detector did, then she would break even guessing yes 50% of the time. But this isn't the case, since we're only asking her about ones where the second detector registered something. Thus, the error in van Fraassen's argument is when he says that "For at the right [first] side the clicks come at the 50% rate, and changes in Hilary's personal information or opinions do not affect that at all. Thus Evelyn [a hypothetical person standing at the first detector] at least would be right to advise Hilary to just ignore the ...information [about the second detector]." Evelyn would NOT be right to advise that, since Hilary is being asked about a specific subset of the events Evelyn is seeing, and those events DON'T come in at the 50% rate. Evelyn should inform her of the rate for that subset of events.
Van Fraassen then goes on to investigate "Yet by what epistemic principle can one license ignoring evidence that would clearly change one's opinion if heeded?". However, this isn't necessary, because as we've seen, Hilary shouldn't ignore the evidence she has gained: if she continues guessing 50%, she won't break even, contrary to what van Fraassen claimed. Thus, she does have a "better" probability after the evidence from the second detector than she did beforehand.
Now, you may be wondering how "I don't know anything" is "better" than 50%. The reason is that when we initially asked Hilary what the probability was of the first detector registering something and she answered 50%, she was implicitly assuming that there was no correlation between whether or not we asked her about the first detector and what the first detector registered. To be correct, she should have said "Depends: was this a randomly selected run?". As we've seen, it was not. So her answer of 50% is actually wrong, not because the second detector doesn't register something 50% of the time, but because we're asking her about a subset that she knows nothing about instead of the entire set. The extra information tells her that she was wrong in that assumption, and thus, the probability "something between 0% and 100%" is in fact better than "50%".
Now on to my second point, that this doesn't actually require quantum mechanics. Hopefully by stripping away the quantum mechanics, it will become clearer where the flaw in van Fraassen's argument is. So here is an argument isomorphic to van Fraassen's, but without the quantum mechanics.
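The selection effect in the coin game can be sketched with a quick simulation. The function name, the fixed seed and the number of flips are arbitrary choices of mine. It compares how often the coin lands heads over all flips with how often it lands heads among only the flips I actually ask about.

```python
import random

def heads_frequencies(n_flips, ask_when, seed=0):
    """Return (heads rate over all flips, heads rate over the flips asked about).

    `ask_when` decides, after looking at a flip, whether to ask about it.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    flips = [rng.choice("HT") for _ in range(n_flips)]
    asked = [f for f in flips if ask_when(f)]
    overall = flips.count("H") / len(flips)
    among_asked = asked.count("H") / len(asked) if asked else None
    return overall, among_asked

# Only ask when the coin came up tails.
overall, asked_tails_only = heads_frequencies(10_000, lambda f: f == "T")
# Ask about every flip, for comparison.
_, asked_always = heads_frequencies(10_000, lambda f: True)

print("heads over all flips:", overall)
print("heads among flips asked about (tails only):", asked_tails_only)
print("heads among flips asked about (always ask):", asked_always)
```

The coin is just as fair in both runs; the only thing that changes is which flips make it into the subset the guesser is asked about.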
Consider this case:
Assume that it rains in Timbuktu 50% of the time.
Also assume that due to the global air flow, ocean currents, and everything else, there is a correlation between whether or not it snows in Philadelphia and whether it rains in Timbuktu. Examples of such relations would be:
1) Whenever it snows in Philadelphia, it always rains in Timbuktu, and never rains any other time. (In this case, it'd snow 50% of the time in Philadelphia).
2) Whenever it snows in Philadelphia, it never rains in Timbuktu, and always rains when it's not snowing in Philadelphia. (In this case again, it'd snow 50% of the time in Philadelphia).
Assume that I know this exact mathematical relation, but that Kenny doesn't. He can know the form of it, but not its exact mathematical value.
Additionally assume that Kenny and I are both aware that it's not snowing here in Philadelphia.
Finally, assume Kenny has a friend in Timbuktu.
Now, assume I tell Kenny that if it rains in Timbuktu, I will make him hot chocolate. Kenny would like to know what the odds are that I'm going to make him hot chocolate. So Kenny calls his friend in Timbuktu. We expect the conversation to go something like this:
Kenny: "Hi. What are the odds that it is going to rain there?"
Kenny's friend: "50%."
However, the conversation really should go like this:
Kenny: "Hi. What are the odds that it is going to rain there?"
Kenny's friend: "I know that it depends on whether it's snowing over there, but I don't know how."
It's wrong for Kenny's friend in Timbuktu to say 50%, because the probability of it raining in Timbuktu actually depends on whether it is snowing in Philadelphia, and I am forcing the case where it's not snowing in Philadelphia. Essentially, I'm making a cut and ignoring the days when it snows in Philadelphia. So, the relevant probability isn't the probability that it rains in Timbuktu on any day, but the probability that, on days it isn't snowing in Philadelphia, it rains in Timbuktu. Now, if Kenny bet that in 50% of these cases he'd get hot chocolate, as van Fraassen recommends, he's not necessarily going to average out even: in the first case, he's never going to get any hot chocolate. In the second case, he'll always get hot chocolate. Thus, the "extra information" that it depends on whether it's snowing in Philadelphia is not at all irrelevant, nor should he ignore it. I hope this case is somewhat clearer than the quantum mechanical case in his paper.
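The two example relations can be written out as joint distributions over (snow in Philadelphia, rain in Timbuktu). This is only a sketch of the arithmetic, with names I made up. In both cases the overall chance of rain is 50%, but the chance of rain on days without snow in Philadelphia is not.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Joint probabilities keyed by (snow_in_philadelphia, rain_in_timbuktu).
# Relation 1: it rains in Timbuktu exactly when it snows in Philadelphia.
RELATION_1 = {(True, True): HALF, (False, False): HALF}
# Relation 2: it rains in Timbuktu exactly when it does not snow in Philadelphia.
RELATION_2 = {(True, False): HALF, (False, True): HALF}

def p_rain(joint):
    """Probability of rain in Timbuktu on any day."""
    return sum(p for (snow, rain), p in joint.items() if rain)

def p_rain_given_no_snow(joint):
    """Probability of rain in Timbuktu on days with no snow in Philadelphia."""
    no_snow = sum(p for (snow, rain), p in joint.items() if not snow)
    rain_and_no_snow = sum(p for (snow, rain), p in joint.items()
                           if rain and not snow)
    return rain_and_no_snow / no_snow

for name, joint in (("relation 1", RELATION_1), ("relation 2", RELATION_2)):
    print(name, p_rain(joint), p_rain_given_no_snow(joint))
```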