Statistics is divided between two camps: the frequentists and the Bayesians. The latter has attracted growing attention in recent years with the rise of AI and related technologies. This has extended to the life sciences, where Bayesian approaches are taking centre stage in several fields. So, how do Bayesian statistics differ from the statistical techniques typically encountered in the classroom, and how might we put them to use?
Frequentist vs Bayesian Statistics
When we think about statistical inference, we often envision hypothesis tests, p-values, and confidence intervals – indeed, this is the vast majority of the statistics we’ve covered in this blog over the years. This is the realm of frequentist statistics.
While frequentism is the more prevalent approach today, Bayesian inference actually came about first. It was developed by Thomas Bayes in the mid-18th century, and later popularised by Pierre-Simon Laplace. Bayesian inference centres on Bayes’ Theorem, which is often stated in the form of the equation:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Let’s break this down. A probability in the form $P(A \mid B)$ gives the chance that we’d observe A given B. For example, the probability of hearing thunder in the next 10 minutes given that it’s raining would be a lot higher than the probability of hearing thunder with no other information.
In our equation, $B$ is the data we’ve collected in our experiment, and $A$ is an outcome we’re interested in. So, Bayes’ Theorem gives the probability of observing the outcome given the collected data we observe – we call this quantity the posterior. The posterior depends on three quantities (a short numerical sketch follows the list below):
- $P(B \mid A)$: The probability of observing our data given a certain outcome (the likelihood).
- $P(A)$: The probability of a certain outcome based on our knowledge of the situation alone (the prior).
- $P(B)$: The probability of observing the collected data (the marginal likelihood).
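To make this concrete, here is a minimal sketch in Python of plugging numbers into Bayes’ Theorem for the thunder-and-rain example. All of the probabilities are invented purely for illustration.

```python
def posterior(likelihood, prior, marginal_likelihood):
    """Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / marginal_likelihood

# A = "thunder in the next 10 minutes", B = "it is currently raining".
# These numbers are made up for illustration only.
p_rain_given_thunder = 0.9   # P(B|A): if thunder is imminent, it is probably already raining
p_thunder = 0.02             # P(A): prior chance of thunder, with no other information
p_rain = 0.10                # P(B): chance that it is raining at any given moment

# P(A|B): the chance of thunder given that it is raining
print(posterior(p_rain_given_thunder, p_thunder, p_rain))  # about 0.18, nine times the prior of 0.02
```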
Thinking Bayesian
In a sense, Bayesian statistics is a formalised way of how we ourselves tend to naturally analyse evidence and form conclusions. We begin with our prior knowledge of the problem, consider new evidence, and use it to update our views.
For an example, imagine you were a bookmaker setting odds for a football team every week. How would you formulate a representative betting line? At the very start of the season, you would start with your priors – the strength of the squad, the results from the previous season, etc. This is the information which you would use to set the odds of the team winning in their first game. In this case, the posterior is identical to the priors because we have no evidence to update our probabilities.
Once the result of the first match is in the books, we now have evidence of the performance of the team. This means we can set the odds – that is, find a posterior – for the second game based on both the priors and the evidence.
Importantly, this posterior is then used as a prior to set the odds for the third game, which are then used as a prior for the fourth game, and so on. This means that, as the season progresses, the odds for each subsequent game are based more and more on the evidence of the results of previous games and less on the priors.
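One simple way to formalise this game-by-game updating is a Beta-Binomial model, where the belief about the team’s win probability is a Beta distribution and each result nudges its parameters. The model choice and the numbers below are ours for illustration; the bookmaker example above doesn’t prescribe any particular model.

```python
# A minimal sketch of sequential Bayesian updating with a Beta-Binomial model.
# Beta(alpha, beta) encodes the current belief about the team's per-game win probability.
alpha, beta = 7.0, 3.0         # prior: roughly "7 wins' worth vs 3 losses' worth" of belief

results = [1, 0, 1, 1, 1, 0]   # hypothetical results so far: 1 = win, 0 = loss

for game, won in enumerate(results, start=1):
    # The posterior mean going into each game is the current estimate of the win probability
    print(f"Before game {game}: estimated win probability = {alpha / (alpha + beta):.3f}")
    # After the result, this game's posterior becomes the next game's prior
    alpha += won
    beta += 1 - won

print(f"After the run: estimated win probability = {alpha / (alpha + beta):.3f}")
```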
A Medical Example
One area of the life sciences where Bayesian inference is vital is diagnostic testing. Let’s consider some examples from that field to see how Bayes’ theorem behaves under some different circumstances.
Imagine we were developing an at-home test for the flu. As we would expect, this test returns one of two results, positive or negative. There are, however, four possible scenarios from the results of the test:
- Positive, and the patient has the flu (true positive)
- Positive, but the patient doesn’t have the flu (false positive)
- Negative, and the patient doesn’t have the flu (true negative)
- Negative, and the patient does have the flu (false negative)
That means to effectively use the diagnostic test, we need to understand the probability of a patient having the flu given a positive test. Our outcome is whether the patient has the flu or not, which we’ll designate $F$ if they have the flu and $\lnot F$ if they don’t, and our data is the result of the test ($+$ or $-$). In the language of Bayes’ Theorem, that means we are interested in the quantities $P(F \mid +)$ and $P(F \mid -)$.
Let’s imagine that we’ve already done some thorough testing on our diagnostic, and we’ve established that:
- $P(+ \mid F) = 1$. The test will always be positive given the patient has the flu.
- $P(- \mid \lnot F) = 0.999$. The test will be negative given the patient does not have the flu 99.9% of the time.
- $P(+ \mid \lnot F) = 0.001$. The false positive rate, here 0.1% – the probability of the test being positive given the patient doesn’t have the flu.
For now, we’ll consider the first of our interesting probabilities, $P(F \mid +)$. Using Bayes’ Theorem, we can say that:

$$P(F \mid +) = \frac{P(+ \mid F)\,P(F)}{P(+)}$$

Which, since $P(+ \mid F) = 1$, reduces to:

$$P(F \mid +) = \frac{P(F)}{P(+)}$$
Now we must determine our prior. A sensible choice here would be the prevalence of flu – the percentage of the population who have the flu. Let’s say this is 0.5%, which means that $P(F) = 0.005$.
To determine $P(+)$, imagine we test 1 million patients. Based on the probabilities we’ve already outlined, 5000 of these will have the flu, all of whom will return a positive test. Of the remaining 995,000, we would see a false positive result from 995 subjects. So, we would see 5995 positive tests from our sample altogether. This means that $P(+) = 5995/1{,}000{,}000 = 0.005995$.
Putting it all together:

$$P(F \mid +) = \frac{P(F)}{P(+)} = \frac{0.005}{0.005995} \approx 0.83$$
So, we can say that the chance of a patient having the flu, given that they have returned a positive test, is around 83%. In this case, the priors are very powerful – if the prevalence of the flu had instead been 0.1%, then our final probability would have been just 50%!
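The arithmetic above is easy to reproduce in code. The sketch below simply re-expresses the calculation, expanding $P(+)$ over the flu and no-flu cases, and also recomputes the result for the 0.1% prevalence mentioned above.

```python
def p_flu_given_positive(prevalence, p_pos_given_flu=1.0, false_positive_rate=0.001):
    """P(F|+) via Bayes' Theorem, expanding P(+) over the flu / no-flu cases."""
    p_positive = p_pos_given_flu * prevalence + false_positive_rate * (1 - prevalence)
    return p_pos_given_flu * prevalence / p_positive

print(p_flu_given_positive(prevalence=0.005))  # about 0.834, the ~83% figure in the text
print(p_flu_given_positive(prevalence=0.001))  # about 0.500, the 50% figure for 0.1% prevalence
```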
Why Bayesian Statistics?
The influence of priors on the outcome of a Bayesian analysis is often held up as a weakness, as one could envision choosing priors that manipulate the result to suit one’s purposes. To counter this, the probabilities are often calculated using optimistic, realistic, and sceptical priors to demonstrate how the choice of prior affects the overall results. It is also common, when little is known about the situation, to use an uninformative prior, which provides no additional information beyond the collected data.
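As a rough illustration of such a sensitivity check, the sketch below reuses the flu-test calculation with three different prevalence priors. The labels and values are invented – “optimistic” here just means a prior under which a positive test is most convincing – and the point is simply to show how the posterior moves with the prior.

```python
# Hypothetical prevalence priors for the flu-test example (values chosen only to
# illustrate a sensitivity check, not taken from any real data).
priors = {"optimistic": 0.02, "realistic": 0.005, "sceptical": 0.001}

false_positive_rate = 0.001   # as in the example above; P(+|flu) is taken to be 1

for label, prevalence in priors.items():
    p_positive = prevalence + false_positive_rate * (1 - prevalence)
    posterior = prevalence / p_positive   # P(flu | positive test)
    print(f"{label:>10} prior {prevalence:.1%} -> posterior {posterior:.1%}")
```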
However, priors can also be an extremely powerful tool, as they ensure that existing evidence is properly accounted for. If a football team has an elite squad, it would take a long run of defeats before the odds predicted them losing regularly – rightly, it takes more evidence than a slow start to the season to overcome those priors. Similarly, if a new drug shows a very large effect in clinical trials, it would later take a lot of evidence to overturn the prior that the drug is effective. A frequentist interpretation would only account for the data in these situations, which could lead to unrepresentative results.
Bayesian reasoning often provides a more intuitive interpretation of probability than frequentist reasoning. So why has its prevalence only grown recently? Many Bayesian methods demand far more computational power than their frequentist counterparts, and that power has only become readily available in recent years. Combined with the development of methods such as Markov Chain Monte Carlo (MCMC) algorithms, this means that Bayesian analysis has become much more computationally accessible.
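To give a flavour of what an MCMC method looks like, here is a deliberately tiny Metropolis sampler for a toy problem – inferring a coin’s bias from hypothetical data under a uniform prior. This is purely a sketch; a real analysis would use an established tool such as Stan or PyMC rather than hand-rolled code.

```python
import math
import random

# Toy problem: infer a coin's probability of heads p, given invented data,
# using a uniform prior on p and a bare-bones Metropolis sampler.
heads, flips = 7, 10   # hypothetical data for illustration

def log_unnormalised_posterior(p):
    """Uniform prior on (0, 1) times a binomial likelihood; constants dropped."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return heads * math.log(p) + (flips - heads) * math.log(1.0 - p)

random.seed(1)
p_current = 0.5
samples = []
for step in range(20000):
    p_proposed = p_current + random.gauss(0.0, 0.1)          # symmetric random-walk proposal
    log_ratio = log_unnormalised_posterior(p_proposed) - log_unnormalised_posterior(p_current)
    if random.random() < math.exp(min(0.0, log_ratio)):      # Metropolis acceptance rule
        p_current = p_proposed
    if step >= 2000:                                          # discard burn-in samples
        samples.append(p_current)

print(sum(samples) / len(samples))   # close to the exact posterior mean of 8/12, about 0.67
```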
Going forward, then, where could Bayesian statistics be used where they are not currently common? In part 2 of our series on Bayesian methods, we’ll outline recent work at Quantics into Bayesian approaches for sample size calculations.