
Defining a Statistical Analysis for Bioassay: Response Modelling

The previous blogs in our collaboration with the team at RoukenBio – the Immunology CRO redefined – have focused predominantly on the biological aspects of bioassay development, including cell selection and DOE. We have been answering important questions about ensuring that the biological response in our assay is present, that it is representative of real-world interactions, and that the assay conditions are such that the response is as strong and consistent as possible. Once these have been established, attention must turn to defining the statistical analysis of the bioassay.

As a first step, we must make decisions on the bare bones of the analysis. This includes strategically choosing a concentration series and statistical model, and looking at whether any response transformations or weighting will be required. Here, we will look at the thought process behind these decisions and highlight some handy tips and tricks from our experience in bioassay development.

Key Takeaways

  • Linear models suit a narrow, clearly linear segment; 4-parameter logistic curves map the full sigmoidal response; 5-parameter logistic variants add asymmetry when the data demand it.

  • Multiplicative dilutions on a log scale should capture both asymptotes and the steep linear portion, giving the assay breathing room against day-to-day shifts and potential hook effects.

  • Log-transforming responses can restore homogeneity of variance where required, while comparing to the “next model up” (or applying equivalence limits) offers a more reliable view of goodness-of-fit than an F-test or R².

Selecting a model

Fundamentally, a bioassay concentration-response curve will almost always take the form of an s-shaped curve if the range of concentrations tested is wide enough. If the response is positive – i.e. we expect an increased response for an increased sample concentration – the curve begins in a flat region of low response where the sample concentration is not high enough to elicit a response.

At higher concentrations, we eventually see a stronger response. From here, the response increases quickly – almost linearly – with concentration. An important milestone which usually falls within this linearly increasing phase is the EC50: the concentration at which half of the maximal response is observed.

As the concentration continues to increase, the curve once again flattens out. Here, we have reached a maximal response which will not increase noticeably as the concentration continues to increase.

We refer to the flat regions at high and low concentration as the asymptotes, with the part of the curve between the asymptotes known as the linear region. In an assay with an inhibitory response – i.e. the response is expected to decrease with increased concentration – the broad shape of the curve is the same, except the asymptotes are reversed. That is, we see the high-response asymptote at low concentration, and the low-response asymptote at high concentration.

Note that, for bioassay data, we (almost) always plot the concentration on a logarithmic scale. This is because we typically choose a concentration range which spans several orders of magnitude, so using a linear scale would mean it would be difficult to determine the shape of the curve at lower concentrations. The shapes we describe above are those we would see when plotting bioassay data on a log scale: we would expect very different curves on a linear scale.

Figure 1: A typical concentration-response curve for a bioassay

The specific base of logarithm used is not important – though it should be kept consistent between analyses for the same assay as changing bases can alter measured results. Base-10, Base-2, and the natural logarithm are common choices.

To perform a statistical analysis on a bioassay response, we need to fit a mathematical function – the concentration-response model – to the data. In this case, the data are a series of responses (y_i) associated with a series of concentration values (x_i). As there will always be noise in the data, the model also includes a term to account for random error in the data (\epsilon_i).

The Linear model

A linear model in bioassay is about as simple as it gets: a straight line. As any good high-school maths student ought to be able to tell you, a straight line has just two parameters: the y-intercept (A) and the gradient (B). The relationship is given by:

    \[y_i=A+Bx_i+\epsilon_i\]

The linear model is best used when the concentration range is chosen such that only responses from the linear region of the concentration-response curve are present.
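
As an illustration, here is a minimal sketch of fitting such a line in Python with NumPy (the concentrations and responses below are purely hypothetical):

```python
import numpy as np

# Hypothetical responses taken from the linear portion of the curve
conc = np.array([25, 50, 100, 200, 400])       # concentration (arbitrary units)
logx = np.log10(conc)                          # analyse on a log10 scale
y = np.array([0.42, 0.61, 0.83, 1.01, 1.22])   # measured responses

# Fit y = A + B * log10(conc) by ordinary least squares
B, A = np.polyfit(logx, y, deg=1)              # polyfit returns the gradient first
print(f"intercept A = {A:.3f}, gradient B = {B:.3f}")
```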

The huge benefit of a linear model is its simplicity. It requires less data to fit than more complex models. Indeed, the choice of a linear model can sometimes be forced in cases where the available data is limited.

Conversely, a major downside of a linear model is its lack of ability to account for any curvature in the data. The linear region of the concentration-response curve often falls within quite a narrow window of concentrations, meaning only a small shift to the left or right can result in noticeable curvature and a poor model fit. As this can lead to assay failures, consideration should be given to the observed variability in the assay before a linear model is used for an assay for, say, lot release.

Figure 2: A linear bioassay model

The assay is considered to be sufficiently accurate if a confidence interval for the relative bias falls entirely within pre-defined acceptance limits for the relative bias. The acceptance limits are chosen based on the performance requirements of the assay, alongside pre-validation or qualification data.

Similarly, the precision of the assay is quantified using its intermediate precision, which can be measured as a geometric coefficient of variation. The intermediate precision captures the variability of the assay under normal conditions for a single laboratory, meaning both within-assay variability and between-assay variability are included. Just as with the relative bias, the intermediate precision is acceptable if its confidence interval falls within its acceptance limit. In this case, however, acceptance is bounded only on the upper end: an assay cannot be too precise, so we only care that the intermediate precision falls below the acceptance limit.
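
As a rough sketch, checking these criteria amounts to comparing confidence intervals against pre-defined limits (all the numbers below are purely illustrative, and the limits themselves would come from your own performance requirements and qualification data):

```python
# Accuracy: the confidence interval for relative bias (%) must sit entirely
# inside the pre-defined acceptance limits. All values here are illustrative.
bias_ci = (-4.2, 3.1)          # observed confidence interval for relative bias, %
bias_limits = (-12.0, 12.0)    # hypothetical acceptance limits, %
accuracy_ok = bias_limits[0] <= bias_ci[0] and bias_ci[1] <= bias_limits[1]

# Precision: the upper confidence bound on intermediate precision (%GCV)
# only needs to fall below its acceptance limit (bounded on the upper end).
gcv_upper_bound = 9.8          # upper confidence bound on %GCV, illustrative
gcv_limit = 15.0               # hypothetical acceptance limit, %
precision_ok = gcv_upper_bound <= gcv_limit

print(f"accuracy acceptable: {accuracy_ok}, precision acceptable: {precision_ok}")
```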

The 4-Parameter Logistic (4PL)

The 4PL model is a step up in complexity compared to a linear model. As its name suggests, there are four parameters which need to be fit to the data. The A and D parameters fix the two asymptotes, while the C parameter designates the log(EC50) of the concentration-response curve. The B parameter, also known as the slope parameter, is related to the gradient of the curve at the EC50, but is not exactly equal to it.

There are several different formulae available for the 4PL, all of which give the same results for relative potency, albeit with some slight differences in exact parameter values. One commonly used formula for the 4PL is:

    \[y_i=D+\frac{A-D}{1+L^{B\left(x_i-C\right)}}+\epsilon_i\]

Where L is the base of the logarithm used for the concentration (e.g. if we use a base-10 logarithm, L=10).
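
To make this concrete, here is a minimal sketch of fitting a 4PL with SciPy's curve_fit, assuming a base-10 log scale (L = 10) and purely illustrative data:

```python
import numpy as np
from scipy.optimize import curve_fit

def four_pl(logx, A, D, C, B):
    """4PL on a log10 concentration scale (L = 10): A and D are the
    asymptotes, C is log10(EC50) and B is the slope parameter."""
    return D + (A - D) / (1.0 + 10.0 ** (B * (logx - C)))

# Illustrative data: a two-fold dilution series and measured responses
conc = np.array([1, 2, 4, 8, 16, 32, 64, 128])
logx = np.log10(conc)
y = np.array([0.05, 0.08, 0.20, 0.55, 1.10, 1.60, 1.85, 1.95])

# Rough starting values help the iterative fit converge
p0 = [y.min(), y.max(), np.median(logx), 1.0]
params, cov = curve_fit(four_pl, logx, y, p0=p0)
A, D, C, B = params
print(f"log10(EC50) = {C:.3f}, EC50 = {10 ** C:.3f}")
```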

The 4PL model is a good balance between complexity and flexibility. It directly maps the full s-shaped curve of a typical bioassay concentration-response curve, meaning it is less likely to run into model fitting issues due to the narrow linear window.

The added parameters do mean, however, that more data is required to fit a 4PL. You must have at least 5 data points, with points falling on each asymptote. This can usually be accommodated, but might prove problematic in some scenarios with limited resources. The complexity of a 4PL means that, unlike a linear model, it requires an iterative process to fit, so we would recommend using specialist statistical software if a 4PL is chosen for an assay.

Figure 3: A 4PL bioassay model

The 5-Parameter Logistic (5PL)

The 5PL is similar to the 4PL model in that it also takes the form of an s-shaped curve, meaning it is well-suited to mapping the concentration-response curve of a bioassay. The major difference is that a 5PL allows for asymmetry in the concentration-response curve. The 4PL is symmetric – the curvature of both asymptote transitions is identical – which means that, if the transitions have different curvatures in the data, a 4PL might not be a good fit for the data. This problem can be solved by using a 5PL.

The 5PL adds an extra parameter compared to the 4PL, the asymmetry parameter denoted by E, which controls the asymmetry. The full formula for the 5PL is given by:

    \[y_i=D+\frac{A-D}{\left(1+L^{B\left(x_i-C\right)}\right)^E}+\epsilon_i\]

While the ability of a 5PL to fit asymmetry in the concentration-response data is useful, it comes at the cost of a yet more complex model than the 4PL. This means that a 5PL curve can be difficult to fit, even for specialist statistical software.
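
Continuing the sketch from the 4PL section (and reusing its logx, y and p0), the 5PL only changes the model function passed to the fitting routine:

```python
def five_pl(logx, A, D, C, B, E):
    """5PL on a log10 concentration scale: as the 4PL, but with an extra
    asymmetry parameter E (E = 1 recovers the symmetric 4PL)."""
    return D + (A - D) / (1.0 + 10.0 ** (B * (logx - C))) ** E

# Start the asymmetry parameter at 1 (i.e. a symmetric curve)
params5, cov5 = curve_fit(five_pl, logx, y, p0=p0 + [1.0])
```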

Figure 4: A 5PL Bioassay model

Other models

The Linear, 4PL, and 5PL models are the most commonly used statistical models for bioassays. They are, however, not the only available options.

A commonly discussed model for bioassays is the 3-Parameter logistic model (3PL). This is similar to the 4PL, but with one of the asymptote parameters fixed to a constant value rather than fitted to the data. If one believes that one of the asymptotes ought to be the same for every assay run – if the response at low concentration is expected to be zero, for example – then it might seem tempting to fix that asymptote in the model.
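
As an illustration, fixing the low asymptote to zero (A = 0) in the 4PL formula above gives a 3PL of the form:

    \[y_i=D+\frac{0-D}{1+L^{B\left(x_i-C\right)}}+\epsilon_i\]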

We do not, however, recommend doing this. The assumption that one asymptote remains constant is a strong one, and can easily prove incorrect even just due to the natural variability of the assay. This can lead to poorly fit models and the potential for bias in the estimate of relative potency. Fixing an asymptote can also lead to an underestimate of the variance of the data, resulting in confidence intervals which are too narrow. Fitting a 4PL, then, is a better choice even if one asymptote is expected to remain consistent for the assay.

Selecting a concentration series

The overriding goal of any assay development process is for the resulting assay to be reliable and consistent. One way this can be achieved is by including as many concentrations as possible, which helps ensure that the assay curve is well-defined. Maximising the number of concentrations, however, comes at the cost of increased resource use. This can increase the cost of running the assay, so the choice of concentration series is a trade-off between maximising the performance of the assay and minimising its expense.

Ideally, we want several points on the linear portion as this is the part of the curve where we see the highest variability. Several points on the linear portion, therefore, can reduce this variability and lead to a more consistent assay. Similarly, it is important to ensure that both asymptotes are well anchored. This can be done by choosing a concentration series which allows for at least one but, ideally, two or more points on each asymptote. The concentration series should also accommodate left-right shifts in the concentration-response curve. For example, a validation study might assess the performance of the assay when using samples from as low as 50% potency to as high as 200% potency. It is important that the curve remains well characterised – several points on the linear portion and well-anchored asymptotes – at such extremes.

Since we typically plot the concentration on a logarithmic scale, it usually makes sense for our concentration series to be multiplicative. Simple examples could be powers of two (e.g. 2, 4, 8, 16…) or a series of multiples of ten (e.g. 10000, 1000, 100, 10…). This ensures that the concentrations fall with even spacing on the logarithmic scale. It can be tempting to vary the dilution factors within the series in an attempt to control exactly where the responses fall on the curve. This is a risky approach as it can lead to gaps in the dilution series into which the linear portion might fall due to natural day-to-day variability.
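
For instance, here is a minimal sketch of generating a multiplicative (two-fold) dilution series, which falls with even spacing on a log scale (the starting concentration and number of points are arbitrary):

```python
import numpy as np

top = 1000.0            # arbitrary top concentration
dilution_factor = 2.0   # two-fold serial dilution
n_points = 10

conc = top / dilution_factor ** np.arange(n_points)
print(np.round(conc, 2))
print(np.round(np.diff(np.log10(conc)), 3))   # constant step of -log10(2) ≈ -0.301
```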

This highlights a key consideration: one ought not to design a concentration series based on data from a single assay. This approach does not take into account the variability inherent in running an assay multiple times, and can lead to problems down the line. Instead, a potential concentration series should be evaluated over several runs of the assay to understand how the day-to-day variability of the assay interacts with the chosen series.

One biological issue which might cause us to choose a different concentration series is known as the hook effect. This is where, instead of a flat upper asymptote, we see the response drop off as the concentration continues to increase. This is a false negative – the reduction in measured response is not a genuine reduction in biological response, but is caused by, for example, there being insufficient detection antibodies present to show the full response.

A hook can be problematic for model fitting. One way to avoid the negative consequences of a hook in an assay is to simply truncate the concentration series before the hook kicks in! This will not be possible in all cases, but it can be a helpful trick for resolving a common issue. This method, however, is a sticking plaster: a drift in the assay or day-to-day variability might result in the hook reappearing. A more robust approach to removing a hook might be to examine the biological reasons why it occurs and investigate conditions which might lessen it. This could include returning to fundamental design choices such as the selection of the cell line used in the assay.

The importance of variance homogeneity

When fitting statistical models to data, it is also important to consider statistical inference, which includes the construction of confidence intervals and tests for suitability. Several things must be true of the response data in order for this inference to behave as expected. One assumption which we commonly see causing problems is that of homogeneity of variance.

Let’s break this down. Variance can be visualised as the spread of the data. The responses of a concentration group with high variance will appear spread out when plotted, while a concentration group with low variance will appear tightly packed. For statistical inference to behave properly, we require that the variance of our concentration-response data is homogenous: that is, it does not vary with concentration. The spread of the concentration groups should be roughly the same no matter what the concentration is.

In reality, we regularly see that the variance changes as the concentration increases. Often we find that the variance of higher concentration groups is greater than that of lower concentration groups – the variance increases with concentration. This can often look like the data is “fanning out”, particularly for linear models. This variance heterogeneity (sometimes called heteroskedasticity) breaks the assumptions we must make in order to perform the usual inference procedures and should be corrected before the data is used to draw conclusions about any samples.
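
A quick diagnostic, sketched below with hypothetical replicate data, is simply to compare the spread of responses within each concentration group:

```python
import numpy as np

# Hypothetical replicate responses at four concentrations
responses = {
    10:    [0.11, 0.12, 0.10],
    100:   [0.48, 0.55, 0.42],
    1000:  [1.35, 1.60, 1.15],
    10000: [2.60, 3.40, 2.10],
}

for conc, reps in responses.items():
    print(f"conc {conc:>6}: SD = {np.std(reps, ddof=1):.3f}")
# A standard deviation that grows steadily with concentration is a sign
# of variance heterogeneity.
```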

Figure 5: Bioassay data showing increasing variance with concentration
Figure 6: Bioassay data with a response transform applied to the response

So, how can we restore homogeneity of variance? The simplest and most powerful approach is to use a response transform. This is a mathematical function, applied to the response data before model fitting, chosen to ensure that the transformed data has variance homogeneity.

The most common choice of function is a logarithmic transform, i.e. taking the log of the response values. A log transform has the effect of compressing concentration groups with high variance and spreading out concentration groups with low variance, ensuring the variance of the full data set is closer to homogenous. Once again, the base of the logarithm isn’t important, nor does it have to match that used for the concentration scale, but once a transform is finalised it should be kept consistent.

Other transformations, such as a square root or an exponential, can be used in situations where a log transform is inappropriate. If none of the common response transformations work, then a solution is to use response weighting. This is a process where concentration groups with higher variation are systematically weighted to have less influence during the model fitting process. While this can be a useful tool, it is statistically involved and requires significant amounts of historical data to utilise effectively. In general, response transformations should be the first port of call to correct issues with variance homogeneity.
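
As a sketch (reusing four_pl, logx, y and p0 from the 4PL example above), a log response transform simply means fitting the model to the logged responses; and if weighting were ever needed, SciPy's curve_fit can down-weight noisier points via its sigma argument, with the per-point standard deviations in practice coming from historical data:

```python
import numpy as np
from scipy.optimize import curve_fit

# Log response transform: fit the model to log(response) rather than response
log_y = np.log(y)
p0_log = [log_y.min(), log_y.max(), np.median(logx), 1.0]
params_t, _ = curve_fit(four_pl, logx, log_y, p0=p0_log)

# Response weighting (alternative): noisier concentration groups get larger
# standard deviations and therefore less influence on the fit.
group_sd = np.full_like(y, 0.05)   # placeholder values; use historical estimates
params_w, _ = curve_fit(four_pl, logx, y, p0=p0, sigma=group_sd)
```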

Testing model fit

Our focus in this article so far has been on building a statistical model which is an appropriate fit to the data produced in a bioassay. As we’ve described, a poor model fit can lead to biases in reportable results, so it is important to consider an in-life test of the fit quality as part of the assay set-up.

There are several ways to test for goodness-of-fit (sometimes also referred to as lack-of-fit). One commonly used method is the F-test. This compares two types of error in the data for each concentration group:

  • The pure error: the difference between each response and the mean of a concentration group.
  • The lack-of-fit error: the difference between the mean response of a concentration group and the modelled value at that concentration.

If the lack-of-fit error is much greater than the pure error, then this is a sign that most of the “scatter” of the data about the model is due to a poor model fit, rather than random variability. The test would, therefore, fail.
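
As a sketch of the calculation, using hypothetical replicate data and fitted values (and assuming a two-parameter linear model for the degrees of freedom):

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical replicate responses (rows = concentration groups) and the
# model's fitted value at each concentration
reps = np.array([[0.40, 0.44, 0.42],
                 [0.62, 0.58, 0.60],
                 [0.79, 0.84, 0.83],
                 [1.02, 0.98, 1.01]])
fitted = np.array([0.41, 0.60, 0.82, 1.00])
n_params = 2                                   # e.g. a linear model

group_means = reps.mean(axis=1)
ss_pure = ((reps - group_means[:, None]) ** 2).sum()
ss_lof = (reps.shape[1] * (group_means - fitted) ** 2).sum()

df_pure = reps.size - reps.shape[0]            # N - number of groups
df_lof = reps.shape[0] - n_params              # number of groups - parameters
F = (ss_lof / df_lof) / (ss_pure / df_pure)
p_value = f_dist.sf(F, df_lof, df_pure)        # small p => evidence of lack of fit
print(f"F = {F:.2f}, p = {p_value:.3f}")
```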

The F-test has some well-known flaws, however. Specifically, if the pure error is very small – that is, the assay is very precise – then the lack-of-fit error will almost always be much larger in comparison, and the F-test will almost always fail even if the model fit is good. An example of this effect can be seen in the real-life data in Figure 7. Visually, this data looks excellent – we have both high accuracy and precision, as well as what seems to be a good model fit. This data, however, actually fails an F-test for goodness-of-fit!

A similar effect can happen in the other direction, too. If the assay is very noisy and imprecise, then the lack-of-fit error is unlikely to be sufficiently large in comparison to the pure error for the test to fail.

Another metric which is commonly – and incorrectly – thought of as measuring goodness-of-fit is the coefficient of determination, or R². We’ve outlined the reasons why R² is not an appropriate measure of goodness-of-fit elsewhere, but the main takeaway is this: a high R² does not always indicate a good model fit, and a low R² does not always indicate a bad model fit. While R² is easy to calculate, and thus might seem tempting, we do not recommend it for testing goodness-of-fit.

Figure 7: Real life data which fails the F-test. We can visually see that this data is well-fit by the models, meaning these failures could be a result of the F-test’s tendency to fail very precise assays.

What should you use to test for goodness-of-fit then? A good option might be to test the “next model up”. For example, when using a linear model, one could fit a quadratic model and check whether the coefficient of the quadratic term is non-zero. If it is, then it’s a sign that there is curvature in the data, meaning a linear model might no longer be a good choice. Similarly, when using a 4PL, you could check that the E parameter of a 5PL remains close to one, which indicates that the data is sufficiently symmetric to be well-fit by a 4PL.
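
As a sketch of the “next model up” idea for a linear model, one can compare linear and quadratic fits with an extra-sum-of-squares F-test on hypothetical data; a small p-value for the curvature term suggests the linear model is no longer adequate:

```python
import numpy as np
from scipy.stats import f as f_dist

# Hypothetical log-concentrations and responses showing slight curvature
logx = np.log10([25, 50, 100, 200, 400, 800])
y = np.array([0.40, 0.61, 0.83, 1.01, 1.15, 1.21])

def rss(degree):
    """Residual sum of squares for a polynomial fit of the given degree."""
    coeffs = np.polyfit(logx, y, degree)
    return ((y - np.polyval(coeffs, logx)) ** 2).sum()

rss_lin, rss_quad = rss(1), rss(2)
n = len(y)
F = (rss_lin - rss_quad) / (rss_quad / (n - 3))   # one extra parameter; n - 3 residual df
p_value = f_dist.sf(F, 1, n - 3)
print(f"p-value for curvature: {p_value:.3f}")
```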

This type of test can be performed using a significance (or difference) test, similar to the F-test. However, a method which is fast becoming more popular is the equivalence test. The intricacies of this type of testing are well beyond the scope of this piece, but it provides a more statistically robust method of testing goodness-of-fit by setting fixed limits on the allowed variation in the parameter.

These limits are set based on historical data, so equivalence testing may not be a good option for early stages of assay development before this data is readily available. We often find that a good approach is to use significance testing – such as the F-test – during early development. This is often before the precision of the assay reaches a level where the F-test becomes problematic, and also before assay failures on goodness-of-fit are likely to prove a serious issue. As the example in Figure 7 shows, however, this is not always the case. This is a challenge regularly faced in early assay development: a metric to test goodness-of-fit is required, but there is not enough data available to properly implement equivalence testing.

Beyond model development

Designing a model for bioassay data is a crucial first step in defining a statistical analysis. But it is only the first step. In future additions to this series, we’ll look at ideas behind other parts of the bioassay statistical analysis, such as assay layout, replication strategy, and outlier management.


About the Authors

  • Matthew Stephenson is Director of Statistics at Quantics Biostatistics. He completed his PhD in Statistics in 2019, and was awarded the 2020 Canadian Journal of Statistics Award for his research on leveraging the graphical structure among predictors to improve outcome prediction. Following a stint as Assistant Professor in Statistics at the University of New Brunswick from 2020-2022, he resumed a full-time role at Quantics in 2023.

  • Jason joined the marketing team at Quantics in 2022. He holds master's degrees in Theoretical Physics and Science Communication, and has several years of experience in online science communication and blogging.
