Bayesian Inference is thought by many to have originated in the work of an
English statistician and philosopher on probability theory. Thomas Bayes, the
statistician in question, wrote “An Essay towards solving a Problem in the
Doctrine of Chances”, which was published posthumously in 1763 by Richard Price
(Fienberg 2006). The paper contained groundbreaking work on conditional
probability and a number of interesting propositions, but its most influential
contribution was Bayes’ examination of the problem, “Given the number of times
in which an unknown event has happened and failed: Required the chance that the
probability of its happening in a single trial lies somewhere between any two
degrees of probability that can be named” (Bayes 1763).

Bayes’ paper introduced a special continuous case of what is today known as
Bayes’ Theorem, but it never stated the general formula for the theorem. His
special case used a uniformly distributed prior: he took an initial belief about
the probability of an event, now known as the prior, and combined it with new
data to produce an improved result, the posterior.
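As a rough illustration of the problem Bayes posed, the sketch below assumes the modern reading of his special case: a uniform prior on the unknown probability is a Beta(1, 1) distribution, so after observing a number of successes and failures the posterior is Beta(successes + 1, failures + 1). The function names `beta_pdf` and `prob_between` are hypothetical, and the integral is approximated numerically rather than computed the way Bayes did.

```python
import math


def beta_pdf(theta, a, b):
    """Density of a Beta(a, b) distribution at theta, for 0 < theta < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_kernel = (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta)
    return math.exp(log_norm + log_kernel)


def prob_between(successes, failures, lower, upper, steps=10_000):
    """Posterior probability that the chance of success lies in (lower, upper).

    Assumes a uniform prior, so the posterior is
    Beta(successes + 1, failures + 1). The integral of the posterior
    density is approximated with the midpoint rule.
    """
    a, b = successes + 1, failures + 1
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        theta = lower + (i + 0.5) * width  # midpoint of the i-th slice
        total += beta_pdf(theta, a, b)
    return total * width


if __name__ == "__main__":
    # Example: 7 successes and 3 failures; how plausible is 0.5 < theta < 0.9?
    print(prob_between(7, 3, 0.5, 0.9))
```

With no data at all (zero successes and zero failures) the function simply returns the width of the interval, which is the uniform prior itself.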

Many notable statisticians had serious issues with Bayes’ choice of prior
distribution, which they felt came from something very close to guesswork, and
as a result his paper did not gain much traction in the community at the time
(Lee 1988). There has also been considerable controversy over the naming of
Bayes’ Theorem, as there is evidence to suggest that Bayes’ work alone was not
enough for him to be credited as its founder, and that it was Price who
understood the significance of the notes he received from the late Bayes, which
he then used to produce a publishable paper.

Headway was made again when the French scientist Pierre-Simon Laplace, now
widely considered to be the world’s first Bayesian, independently published
“Mémoire sur la probabilité des causes par les événements” eleven years after
Bayes’ paper appeared, shedding new light on probability theory and Bayesian
statistics (Laplace 1774). Like Bayes, Laplace used uniform priors as his
uninformative priors. His justification differed, however: where Bayes had
chosen the uniform prior for mathematical simplicity, Laplace regarded his own
central limit theorem and the principle of insufficient reason, now known as
the principle of indifference, as an obvious justification for the assumption.

He went on to introduce the first conjugate priors and a general version of
Bayes’ Theorem that worked for continuous as well as discrete data, and even
for cases with multiple parameters. His methods later became known as Inverse
Probability because of the direction in which he worked, from effect to cause
(Fienberg 2006).

During the early 20th century, many statisticians were exploring areas rather
different from Bayesian methods. Sir Ronald Fisher, an English statistician and
geneticist, argued against inverse probability because of its dependence on the
choice of ignorance prior. He began working on new methods, which resulted in
the 1922 publication of his groundbreaking paper, “On the Mathematical
Foundations of Theoretical Statistics”. This paper changed statistical thinking
and practice extensively by introducing the notion of likelihood, which led
directly to maximum likelihood estimators, alongside basic tests of
significance, new forms of variance analysis and randomization methods not seen
before. Terms such as sufficiency, efficiency and parameter were used in the
paper for the first time (Fisher 1922).

Fisher’s paper cast a shadow of sorts over Bayesian statistics for several
years and resulted in a decline in its study and use. Numerous statisticians,
such as Jerzy Neyman and Egon Pearson, developed frequentist methods such as
hypothesis testing and confidence intervals, stemming primarily from Fisher’s
work (Fienberg 2006, Fisher 1921).

There was new life for Bayesian statistics when the early ideas of Laplace
resurfaced and split into two separate paths, objective and subjective
probability. Through the 1920s Bruno de Finetti developed ideas on subjective
probability and exchangeability in Italy, whilst Frank Ramsey independently did
the same in England; their work appeared in print in 1930 and 1931 respectively
(Fienberg 2006).

Objective Bayesian inference was making headway with Sir Harold Jeffreys at its
forefront. In 1939 he published the book “Theory of Probability”, which was
crucial to what is now known as the Bayesian revival. Jeffreys used an
invariance approach to derive objective ignorance priors, pacifying some of the
major concerns anti-Bayesians had about using only a constant prior
distribution while also making major headway in Inverse Probability (Fienberg
2006 2nd).

Views were still largely focused on frequentist methods, but throughout World
War Two men such as Alan Turing and Irving John “Jack” Good continued Jeffreys’
work and pioneered applied Bayesian statistics at Bletchley Park. It is well
known that Turing, Good and other members of Hut 8 used Bayesian Inference in
crucial codebreaking work, yielding German intelligence that helped to shorten
and ultimately win the war. Their work and progress as Bayesians was not
declassified until the middle of the 1970s.

It was around this time that frequentist methods truly became overshadowed, as
statisticians such as Dennis Lindley published works devoted wholly to Bayesian
Inference which gained significant recognition globally. “Introduction to
Probability and Statistics from a Bayesian Viewpoint” and “The Future of
Statistics: A Bayesian 21st Century”, from 1965 and 1975 respectively, were two
such works that had a large impact on the overall opinion of Bayesian
statistics (Lindley 1965, 1975).

By the early 1990s computers were becoming less scarce and more widely
accessible, and were far more capable than their predecessors of only a few
years earlier. This meant that when computational techniques called Markov
Chain Monte Carlo (MCMC) methods emerged, they were readily adopted and proved
pivotal in solving one of the few major problems Bayesian Inference still had,
which was sampling from unusual distributions. MCMC approximates a posterior
distribution by drawing a very large number of random samples from a Markov
chain whose long-run distribution is that posterior (Casella 2011).
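To make the idea of Markov Chain Monte Carlo concrete, the following is a minimal sketch of one such method, a random-walk Metropolis sampler. It is not taken from the sources cited here. It assumes a simple illustrative target (the posterior for an unknown success probability under a uniform prior), and the names `log_unnormalised_posterior` and `metropolis`, the step size and the seed are all hypothetical choices.

```python
import math
import random


def log_unnormalised_posterior(theta, successes, failures):
    """Log of likelihood times a uniform prior, up to an additive constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(theta) + failures * math.log(1.0 - theta)


def metropolis(successes, failures, n_samples=20_000, step=0.15, seed=0):
    """Random-walk Metropolis sampler for the success probability theta."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting point inside (0, 1)
    current = log_unnormalised_posterior(theta, successes, failures)
    samples = []
    for _ in range(n_samples):
        # Propose a small symmetric Gaussian move around the current value.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_unnormalised_posterior(proposal, successes, failures)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        samples.append(theta)  # a rejected move repeats the current value
    return samples


if __name__ == "__main__":
    draws = metropolis(successes=7, failures=3)
    kept = draws[2_000:]  # discard early draws as burn-in
    print(sum(kept) / len(kept))  # Monte Carlo estimate of the posterior mean
```

The sampler only ever needs the posterior up to a constant, which is why methods of this kind suit distributions whose normalising constant is hard to compute. For this particular target the exact posterior is known, so the estimate can be checked against the exact posterior mean of (successes + 1) / (successes + failures + 2).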