May I recommend an explanation of what economists mean by “Bayesian”? I see it everywhere, but even though I’ve googled the term looking for some simple, understandable definition, I just cannot grasp it.
1. I don’t use that term much, if at all. So maybe someone else should answer it.
2. A Bayesian as opposed to what? In statistics, the opposite is a Frequentist. The difference is one of interpretation, and it shows up, for example, in the interpretation of a confidence interval. Suppose we poll a sample of voters and find that 55 percent support policy X, with a margin of error of plus or minus 3 percent at a 90 percent confidence level. A Bayesian statistician would be comfortable saying that these results indicate that there is a 90 percent chance that the true proportion of supporters in the overall population is between 52 and 58 percent. The frequentist philosophy is that the proportion of supporters in the overall population is what it is. You cannot make probability statements about it. What you can say about your confidence interval of 52 to 58 is that if the true proportion of supporters were outside of that interval, the probability that your poll would have found 55 percent support is less than 10 percent. (A numerical sketch of both computations appears at the end of this post.)
3. By analogy, I would guess that economists use the term Bayesian to describe someone who is willing to make probability statements that describe their degree of belief in a proposition that in practice has to be either true or false. When a weather forecaster says that there is a 20 percent chance of measurable precipitation tomorrow, that sounds like a Bayesian forecast. In the end, we will either have measurable precipitation or we won’t. The “20 percent chance” formulation says something like “I don’t expect rain, but I could turn out to be wrong.”
4. “Bayesian” also refers to a process of updating predictions. As new information comes in, the forecaster may say, “Now I think that there is a 40 percent chance of measurable precipitation tomorrow.”
5. Similarly, a statement like “The Democrats will nominate an avowed socialist in 2020” is either going to turn out to be true or false. But a Bayesian would be willing to say something like “I give it a 10 percent chance” and then revise that probability up or down as new information develops (a worked revision of this kind appears just below).
In this case, the opposite of a Bayesian would be someone with firm beliefs that are not responsive to new information.
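To make points 4 and 5 concrete, here is a minimal sketch of one round of updating via Bayes’ rule. The 10 percent prior comes from point 5; the two likelihood numbers are invented purely for illustration.

```python
# A minimal sketch of Bayesian updating. The prior is the 10 percent
# from point 5; the likelihoods below are made-up numbers.
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(H | E) given P(H) and the likelihood of E under H and not-H."""
    joint_true = p_evidence_if_true * prior
    joint_false = p_evidence_if_false * (1 - prior)
    return joint_true / (joint_true + joint_false)

prior = 0.10  # "I give it a 10 percent chance"
# Suppose some new development is three times as likely if the
# proposition is true as if it is false (hypothetical likelihoods).
posterior = bayes_update(prior, p_evidence_if_true=0.6, p_evidence_if_false=0.2)
print(f"revised probability: {posterior:.2f}")  # 0.25
```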
Again, I don’t apply the Bayesian label myself, so I am not sure that I am the best person to articulate the intent of those who do use it.
I am kind of winging it here.
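And to put numbers on the contrast in point 2: the sketch below assumes a hypothetical poll of 744 voters (a size chosen so the margin of error lands near plus or minus 3 points) and a flat Beta(1, 1) prior for the Bayesian side.

```python
# Contrasting the two readings of the poll in point 2.
# n = 744 is hypothetical, chosen so the margin of error is ~3 points.
import math
from scipy import stats

n, p_hat = 744, 0.55
k = round(n * p_hat)  # supporters observed

# Frequentist: 90 percent confidence interval via the normal approximation.
z = stats.norm.ppf(0.95)                      # ~1.645 for 90 percent
moe = z * math.sqrt(p_hat * (1 - p_hat) / n)  # margin of error, ~0.030
print(f"90% confidence interval: {p_hat - moe:.3f} to {p_hat + moe:.3f}")

# Bayesian: posterior under a flat Beta(1, 1) prior. The 90 percent
# *credible* interval licenses the direct probability statement.
posterior = stats.beta(1 + k, 1 + n - k)
print(f"90% credible interval:   {posterior.ppf(0.05):.3f} to {posterior.ppf(0.95):.3f}")
```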
A Bayesian choice isn’t a conserved quantity, in that the chooser does not check the ‘market’.
A frequentist assumes a finite trade space, and the chooser considers what others are doing.
Let me try a short example of a frequentist.
When shopping, the frequentist may decide to just buy tonight’s dinner and go through the short line, ‘five items or less’. The checkout system is designed to keep the queues stable and to allow small variations in the ‘number of items per basket’. Choosers, in this case, are managing to keep the ‘error term’ bounded, adjusting their arrival times to match the number of checkout clerks.
In trials, lawyers sometimes try to explain what is meant by ‘reasonable’ in burden-of-proof standards like “beyond a reasonable doubt.” Most of the time these efforts fall flat, but one technique that seems to work reasonably well is to define by contrast with the opposite; that is, it’s easier to imagine what an un-reasonable person is like, and then understand that one’s duty is to endeavor not to be like that.
The image of an unreasonable person is a pigheaded one who is stubbornly and obstinately fixed to his position and who refuses to alter his opinion or change his mind no matter what one says or does, and regardless of what new information or evidence comes to light, no matter how compelling. He won’t “listen to reason”, so to speak. One isn’t supposed to require a case to be presented such that the conclusion is 100% certain; it would be unreasonable to hold law enforcement to such a standard, since no one could ever be convicted under it.
By contrast, being “reasonable” means retaining some degree of flexibility and open-mindedness, of having a “scientific spirit” in which one has the humility to admit that many of his beliefs are provisional, carry some degree of uncertainty, and could change when exposed to new data or logic.
One could think about it in a visual, pseudo-mathematical way by plotting “belief” on the y-axis and “evidence in favor of” on the x-axis, and maybe the graph looks something like a fuzzy “erf” error function, but the point is, the derivative is always positive. If the derivative is ever zero, you aren’t updating, and so you are being unreasonable.
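Here is a hypothetical rendering of that picture; the erf shape comes from the comment, but the exact scaling and range are illustrative assumptions.

```python
# Belief as a function of net evidence, rendered as a scaled erf curve
# per the comment above; the scaling here is an illustrative assumption.
import numpy as np
from scipy.special import erf

evidence = np.linspace(-4, 4, 201)     # net "evidence in favor of"
belief = 0.5 * (1 + erf(evidence))     # maps evidence onto (0, 1)
slope = np.gradient(belief, evidence)  # numerical derivative

assert (slope > 0).all()               # never zero: always updating
print(f"belief at zero evidence: {belief[100]:.2f}; smallest slope: {slope.min():.2e}")
```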
From this perspective, being “Bayesian” is hard to distinguish from being “reasonable”, except as some cool-smart-guy jargon with the implication that there is some mathematically sophisticated calculation based on probability theory behind what the speaker is claiming, to amplify its perceived accuracy and authoritativeness. (Note, I’m not bashing the people who do this stuff the right way, just saying a lot of people imitate their language in an intellectually hollow manner and without understanding, like a cargo cult. For the actual sophisticated theory, I would highly recommend Jaynes’ Probability Theory: The Logic of Science.)
I like the visual and the use of the function’s derivative as a gauge for “reasonableness”. However, I wonder whether the x-variable (degree of “evidence for…”) should be expressed either as net of “evidence of falsification” or as zero if there is any evidence of falsification. As you know, most of the belief-evidence “discussion” nowadays is about juicing the “evidence for…” variable and seldom looking for falsification.
Prof. Arnold,
If I am not mistaken, I have seen you use the term “a Bayesian prior.”
The attestation may have been “A Bayesian prior [against launching a war by invading a distant country].”
Apologies if I am mixing you up with someone else.
A Bayesian prior is a very precise concept, so I wouldn’t equate using that term with being a Bayesian.
I’d say points 3–5 are what most people mean when they more casually say they are Bayesian.
If you’re reading a statistics paper, then saying they are Bayesian means they use Bayesian techniques, like Markov Chain Monte Carlo or Gibbs Sampling to estimate the posterior distribution of parameters, rather than maximum likelihood or ordinary least squares.
Also, in #2, the Bayesian would refer to what you’re describing as a credible interval.
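For a feel of what those techniques do, here is a bare-bones Metropolis-Hastings sketch (one member of the MCMC family; Gibbs sampling is another) that samples the posterior of a poll proportion. The counts reuse the hypothetical 744-voter poll from the sketch in the post.

```python
# Bare-bones Metropolis-Hastings sampling of a poll proportion's
# posterior. n and k reuse the hypothetical counts from the post.
import math, random

n, k = 744, 409  # trials and successes (hypothetical)

def log_posterior(p):  # flat prior + binomial log-likelihood
    if not 0 < p < 1:
        return float("-inf")
    return k * math.log(p) + (n - k) * math.log(1 - p)

random.seed(0)
samples, p = [], 0.5
for step in range(20_000):
    proposal = p + random.gauss(0, 0.02)  # random-walk proposal
    accept = math.exp(min(0.0, log_posterior(proposal) - log_posterior(p)))
    if random.random() < accept:
        p = proposal
    if step >= 2_000:  # discard burn-in
        samples.append(p)

samples.sort()
print(f"posterior mean ~ {sum(samples) / len(samples):.3f}")
print(f"90% credible interval ~ {samples[len(samples) // 20]:.3f}"
      f" to {samples[-len(samples) // 20]:.3f}")
```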
The best “layman’s approach” to updating probabilities Bayesian-style is the old “Let’s Make a Deal” game show with Monty Hall as game host.
Wikipedia is useful here, as is Marilyn vos Savant (of “Highest IQ” fame).
https://en.wikipedia.org/wiki/Monty_Hall_problem
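And for anyone who prefers to check the counterintuitive answer by brute force, a quick simulation sketch of the game, following the standard statement of the problem:

```python
# Monte Carlo check of the Monty Hall problem: switching wins about
# two-thirds of the time, staying about one-third.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)   # door hiding the car
        pick = random.randrange(3)  # contestant's first pick
        # Host opens a door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == car
    return wins / trials

print(f"stay:   {play(switch=False):.3f}")  # ~0.333
print(f"switch: {play(switch=True):.3f}")   # ~0.667
```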
“When a weather forecaster says that there is a 20 percent chance of measurable precipitation tomorrow, that sounds like a Bayesian forecast. In the end, we will either have measurable precipitation or we won’t. The “20 percent chance” formulation says something like “I don’t expect rain, but I could turn out to be wrong.”
My understanding is that a 20 percent rain forecast actually means that, with very high reliability, they predict that 20 percent of their coverage area will receive rain. It’s saying “I definitely expect rain, but I don’t necessarily expect rain for any particular individual out there.”
I don’t like the description in point 2 of the original post, for at least two reasons.
reason the first: As I understand it, Bayesian methods depend fundamentally on a prior probability estimate (roughly; might be a more general multivariate correlated thing for which the simplicity connotations of “probability estimate” don’t apply) which is then folded in with the new data to make a revised estimate. Basically, I think that what you acknowledge in point 4 is so central that it should show up in point 2 as well. (Thus the standard bickering: “how can one get started when you don’t know anything initially” and “one can’t do it perfectly, but we can show that in many cases the estimation process converges so fast that even huge errors in the initial probability estimate don’t matter much in practice — and in the other cases, alternative approaches which don’t reason explicitly about prior probability estimates tend to go bad even harder.”)
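A toy illustration of that convergence point, with made-up numbers: three very different Beta priors on a coin’s bias land close together after a few thousand flips.

```python
# Made-up demonstration that the prior washes out: a flat prior and two
# extreme priors converge on nearly the same posterior mean.
from scipy import stats

true_bias, flips = 0.7, 5_000
heads = stats.binom.rvs(flips, true_bias, random_state=0)

for a, b in [(1, 1), (50, 1), (1, 50)]:  # flat vs. two badly wrong priors
    posterior = stats.beta(a + heads, b + flips - heads)
    print(f"prior Beta({a},{b}): posterior mean = {posterior.mean():.3f}")
```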
reason the second: People, including Bayesians, may refer to the probability of something being true, but if you get serious enough about arguing about fundamental principles that fundamental objections like “the proportion of supporters in the overall population is what it is” and “you cannot make probability statements about it” are admissible, then the serious Bayesians I am most familiar with seem to argue that while the reality is indeed not fundamentally probabilistic, nonetheless for decision theoretic purposes the probabilistic analysis fundamentally works to give exactly the correct outcome. These days they may impatiently point to a bunch of practical engineering applications — handwriting recognition, e.g. — to support the “just works”. Before there were so many practical tests of automated inference to appeal to, though, people like Jaynes and Vapnik were writing about how (roughly) an intelligent agent using decision theory based on these probability estimates will make good decisions, i.e., decisions that given the limited data available to it tend (as strongly as is practical) to yield its preferred outcomes. That is, the probabilistic uncertainty doesn’t need to be in the world, merely in the decision agent’s mind, and even though the physical world “is what it is” with no physical uncertainty corresponding to the agent’s mental uncertainty, everything just works. (Indeed, just works mathematically exactly in common cases, so that making the distinction is often a waste of effort.)
Also, re. “in this case, the opposite of a Bayesian would be someone with firm beliefs that are not responsive to new information”… Relatedly, there is a tradition in credentialed ivory tower philosophy of science of (usually implicitly) reasoning with pure true and false instead of thinking about degrees of confidence, which tends to lead to various kinds of confusion (such as a false dichotomy between knowing and not knowing when the real world is dominated by organisms and mechanisms that never *know* facts in those perfectly pure black and white terms, but which might be able to make pretty good decisions based on being e.g. about 41,900,000,000 times more confident about a relevant proposition about the world than they were when they were born or booted up).
I always liked this explanation
http://yudkowsky.net/rational/bayes
Though the author now says this is better
https://arbital.com/p/bayes_rule/?l=1zq
Shorter version:
https://xkcd.com/1132/
I’d argue that the main difference between a Bayesian and a Frequentist is over what probability is and how to use it.
The Frequentist / statistician sort of believes that uncertainty is probabilistic, based on the Law of Large Numbers, and that for most systems the problem is discovering what the right frequency is. Is a coin toss heads or tails? 50% chance (plus a tiny bit for heads). Will it rain tomorrow? Some x%, based on various models of prior frequency.
The Bayesian uses all of the frequentist data, but claims probability is a measure of the decision maker’s uncertainty at the moment. For almost any possible decision, one choice is “gain info” / do some test or more analysis. Bayes gives a good basis for the possible value of more info.
Given a situation and an “attempted optimal decision” choice, it might well be that getting more info that doesn’t change the decision is just a waste.
The Frequentist does not have as effective a mechanism for estimating the value of new information, nor of how to integrate the new info into the prior probabilistic model.
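A toy version of that value-of-information calculation, with every number invented: a test is worth something exactly when some result of it would change the decision.

```python
# Toy expected-value-of-information calculation. The prior, payoffs,
# and the test's hit/false-alarm rates are all hypothetical.
def posterior(prior, p_pos_if_good, p_pos_if_bad, positive):
    like_good = p_pos_if_good if positive else 1 - p_pos_if_good
    like_bad = p_pos_if_bad if positive else 1 - p_pos_if_bad
    return like_good * prior / (like_good * prior + like_bad * (1 - prior))

prior = 0.3                          # P(project succeeds)
payoff_good, payoff_bad = 100.0, -60.0

def best_ev(p):                      # invest only if expected value > 0
    return max(0.0, p * payoff_good + (1 - p) * payoff_bad)

ev_without_test = best_ev(prior)     # decide on the prior alone: 0.0

# A test that reads positive 80% of the time on good projects and 10%
# of the time on bad ones (made-up rates).
p_pos = 0.8 * prior + 0.1 * (1 - prior)
ev_with_test = (p_pos * best_ev(posterior(prior, 0.8, 0.1, True))
                + (1 - p_pos) * best_ev(posterior(prior, 0.8, 0.1, False)))

print(f"value of the test: {ev_with_test - ev_without_test:.1f}")  # ~19.8
```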
Finally, machine learning.
@Marc nice link above.
You may want to learn about Bayes’ rule if you are:
A professional who uses statistics, such as a scientist or doctor;
A computer programmer working in machine learning;
A human being.
Machine learning is moving towards lots of trials and changing strategies to see what works. That’s Bayesian updating of probabilities.
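One concrete form of that trial-and-update loop is Thompson sampling on a two-armed bandit, sketched below with made-up success rates: each arm keeps a Beta posterior that is updated after every trial, and play shifts toward whichever arm looks better.

```python
# Thompson sampling on a two-armed bandit: Bayesian updating of each
# arm's Beta posterior after every trial. Success rates are made up.
import random

random.seed(0)
true_rates = [0.4, 0.6]  # hypothetical per-arm success rates
wins, losses = [0, 0], [0, 0]

for _ in range(10_000):
    # Draw a plausible rate for each arm from its posterior, play the
    # arm with the higher draw, then update that arm's counts.
    draws = [random.betavariate(1 + wins[a], 1 + losses[a]) for a in (0, 1)]
    arm = draws.index(max(draws))
    if random.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

print("plays per arm:", [wins[a] + losses[a] for a in (0, 1)])  # mostly arm 1
```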
It seems like some of what the post is discussing is the usefulness of statistics in general. Statistical methods work well when you have large numbers of observations of little individual impact (whether or not it rains tomorrow) but poorly when you have unique events of high impact (say, the probability of a nuclear war).