Choosing a statistical model: Continuous response data

Francis Bursa

Previously, we discussed the definition of relative potency, how it relates to the concept of biological similarity, and hence the need for parallelism between the reference and test dose-response curves. To test for parallelism and then calculate a relative potency, the data has to be modelled mathematically.

From a mathematical point of view, there are 2 fundamentally different types of assay data. In a continuous assay, the response is effectively continuous, eg an optical density or a chemical measurement of a substance. In a quantal assay, the response is “all or none”, eg alive or dead, or reacted or not reacted. The distinction matters because each type needs a different mathematical model.

Continuous Assay Data

Whilst modern computational and mathematical techniques make it possible to model data without any assumptions about its distribution within and between groups, this is complex. In most situations a few simple assumptions turn out to be (reasonably) valid, and they make everything much simpler.

In this example there are 8 doses with 3 replicates at each dose. The doses span almost the whole response range, from 0 to the maximum, and dose is plotted on a log scale.

Key Takeaways

Continuous response data in bioassays is often modelled using the 4-parameter logistic (4PL) model, whose symmetrical S-shape usually fits the data well. While simpler models like linear regression might seem attractive, they can lead to false assay failures when the relative potency (RP) moves away from 1.
The standard 4PL model assumes that variability is the same at every dose and normally distributed, but in real-world bioassay data, variability often increases at higher response values. Log(dose) is standard, and log-transforming the response often helps normalise this variability.
Although linear models may appear simpler, a shift in relative potency can make linear fits to the reference and test data non-parallel, causing assay failures. Where possible, a 4PL model is the better choice for avoiding unnecessary failures in bioassay analysis.

A simple, symmetrical S-shaped curve appears to fit the data reasonably well. This curve is called a 4-parameter logistic (4PL) curve because 4 parameters are needed to specify it fully: the 2 asymptotes, A and D; the slope parameter B, which is proportional to the slope at the inflection point; and C, the position of the inflection point on the x axis.

The equation for the 4PL, for a response y and log(dose) x is:

$y =D+\frac{A-D}{1+e^{B(x-C)}}$
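As a quick check on the parameter descriptions above, the sketch below codes the 4PL equation directly in Python. It assumes x is log10(dose) and B > 0; the helper names `four_pl` and `slope_at` and the parameter values are illustrative only. It confirms that the curve tends to A at low doses and D at high doses, passes through the midpoint (A + D)/2 at x = C, and has slope B(D - A)/4 there, which is why B is proportional to the slope at the inflection point.

```python
import math


def four_pl(x: float, a: float, b: float, c: float, d: float) -> float:
    """4PL response at log-dose x: y = D + (A - D) / (1 + exp(B * (x - C)))."""
    return d + (a - d) / (1.0 + math.exp(b * (x - c)))


def slope_at(x: float, a: float, b: float, c: float, d: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dy/dx at x."""
    return (four_pl(x + h, a, b, c, d) - four_pl(x - h, a, b, c, d)) / (2 * h)


# Illustrative parameter values (not taken from any real assay).
A, B, C, D = 0.1, 2.5, 1.0, 2.0

low = four_pl(C - 10, A, B, C, D)   # far left of C: close to A (since B > 0)
mid = four_pl(C, A, B, C, D)        # at the inflection point: (A + D) / 2
high = four_pl(C + 10, A, B, C, D)  # far right of C: close to D

print(f"low={low:.4f} mid={mid:.4f} high={high:.4f}")
# The analytic slope at the inflection point is B * (D - A) / 4.
print(f"numerical slope at C = {slope_at(C, A, B, C, D):.4f}, "
      f"B*(D-A)/4 = {B * (D - A) / 4:.4f}")
```

If B were negative, the roles of A and D as the left and right asymptotes would swap; the midpoint and the slope formula at x = C still hold.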

Simpler models

Although the 4PL appears to fit the data quite well, in routine use we rarely expect extreme values of response (or dose). So why not restrict the model to the middle section, which looks quite linear?
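One way to make “the middle section looks quite linear” concrete is to compare the 4PL with its tangent line at the inflection point. If the response is rescaled to the fraction of the way from A to D, the 4PL becomes 1/(1 + e^(-u)) with u = B(x - C), and the tangent line becomes 1/2 + u/4, so the gap between them depends only on u. The Python sketch below assumes the tangent line as the “linear middle section”; a least-squares straight line fitted to real data would differ somewhat. The function names are illustrative.

```python
import math


def normalised_4pl(u: float) -> float:
    """(y - A) / (D - A) for the 4PL, written in terms of u = B * (x - C)."""
    return 1.0 / (1.0 + math.exp(-u))


def tangent_line(u: float) -> float:
    """Tangent to the normalised 4PL at its inflection point (u = 0)."""
    return 0.5 + u / 4.0


def deviation(u: float) -> float:
    """Gap between curve and tangent, as a fraction of the full range D - A."""
    return normalised_4pl(u) - tangent_line(u)


for u in (0.5, 1.0, 1.5, 2.0):
    print(f"|B(x - C)| = {u:.1f}: deviation = {abs(deviation(u)):.3f} of the range")
```

Within |B(x - C)| ≤ 1 the gap stays under 2% of the response range, but by |B(x - C)| = 2 it is about 12%, so how “linear” the middle looks depends strongly on how far the doses extend from C.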

A linear model is of course very simple. This simplicity makes it very tempting, particularly if statistical software or support is not available. But choosing a linear model can have serious practical consequences. Remember that the chosen model has to be fitted to the test substance as well as the reference. Look at what happens if the potency of the test sample drops off.

The linear model fits the test and reference data nicely when the RP is close to 1. But as the RP falls, the test S-curve moves to the right along the dose axis, so over the same dose range the test data comes from the flatter, more curved lower part of the S, and the two linear fits become increasingly non-parallel. The assay would then fail a parallelism test, even though the test and reference curves have the same shape; had a 4PL been fitted, the assay would have passed. The animation below illustrates this well. Use a linear model with caution if you want to avoid unnecessary assay failures.
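The effect described above can be reproduced with a small, noise-free simulation. The Python sketch below is an illustration under assumed conditions, not the analysis behind the animation: the test curve is the reference 4PL shifted right by log10(1/RP), as expected for a biologically similar sample, and a straight line is fitted by ordinary least squares to each curve over the same fixed window of log10 doses. The parameter values, the dose window and the helper names (`ls_slope`, `slope_ratio`) are all hypothetical.

```python
import math


def four_pl(x, a, b, c, d):
    """4PL response at log10-dose x."""
    return d + (a - d) / (1.0 + math.exp(b * (x - c)))


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical reference curve and a fixed "linear-looking" window of log10 doses.
A, B, C_REF, D = 0.0, 3.0, 0.0, 2.0
LOG_DOSES = [-0.5, -0.25, 0.0, 0.25, 0.5]


def slope_ratio(rp):
    """Test/reference ratio of straight-line slopes for a given relative potency."""
    # A similar test sample is the same curve shifted: C_test = C_ref - log10(RP).
    c_test = C_REF - math.log10(rp)
    ref = [four_pl(x, A, B, C_REF, D) for x in LOG_DOSES]
    test = [four_pl(x, A, B, c_test, D) for x in LOG_DOSES]
    return ls_slope(LOG_DOSES, test) / ls_slope(LOG_DOSES, ref)


for rp in (1.0, 0.5, 0.25):
    print(f"RP = {rp:.2f}: linear slope ratio (test/ref) = {slope_ratio(rp):.3f}")
```

With these settings the slope ratio drops from 1 at RP = 1 to about 0.88 at RP = 0.5 and about 0.59 at RP = 0.25, although the two curves share the same A, B and D. A 4PL fitted to each would return the same B, and the relative potency could be recovered from the shift in C as 10^(C_ref - C_test).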

4PL assumptions

As mentioned earlier, there are some mathematical assumptions behind the standard 4PL that are generally satisfied in practice but are important to understand.

The model assumes that the amount of variability in the response data is the same for each dose group, and that this variability follows a normal distribution. Unfortunately, real bioassay data does not always behave this way.
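Written as an equation, the standard 4PL model treats each observation as the curve value plus an independent, normally distributed error whose variance is the same at every dose. Here $y_{ij}$ (replicate $j$ in dose group $i$), $x_i$ (that group's log(dose)) and the common error variance $\sigma^2$ are notation introduced for this illustration:

$y_{ij} = D+\frac{A-D}{1+e^{B(x_i-C)}}+\varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0,\sigma^2) \text{ independently}$

Because $\sigma^2$ carries no subscript $i$, the model assumes the same spread in every dose group.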

In real-world bioassays, the variability of the data is often greater at higher response values. Taking the log of the response (usually) has the effect of normalising this variability, making it similar across dose groups. So always use log(dose), and consider log(response) unless you are sure that using the response on the raw scale will not violate the variability assumptions.
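A common reason for variability growing with the response is multiplicative error, where the standard deviation is roughly a constant fraction of the response (a constant coefficient of variation). The Python sketch below assumes exactly that, simulating lognormal responses at three hypothetical response levels, and compares the spread on the raw and log scales. The levels, the 0.1 log-scale SD, the sample size and the seed are arbitrary choices for illustration; a real assay may follow a different error structure.

```python
import math
import random
import statistics

SIGMA_LOG = 0.1           # assumed SD of the multiplicative error on the natural-log scale
N = 2000                  # simulated replicates per response level (illustration only)
LEVELS = (0.1, 0.5, 2.0)  # hypothetical median response levels


def spread_by_level(levels, seed=1):
    """Return {level: (raw-scale SD, log-scale SD)} for lognormal responses."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    results = {}
    for level in levels:
        # Multiplicative error: each response is level * exp(noise).
        ys = [level * math.exp(rng.gauss(0.0, SIGMA_LOG)) for _ in range(N)]
        raw_sd = statistics.stdev(ys)
        log_sd = statistics.stdev([math.log(y) for y in ys])
        results[level] = (raw_sd, log_sd)
    return results


spreads = spread_by_level(LEVELS)
for level in LEVELS:
    raw_sd, log_sd = spreads[level]
    print(f"level {level:>4}: raw SD = {raw_sd:.4f}, log SD = {log_sd:.4f}")
```

Under these assumptions the raw-scale SD grows roughly in proportion to the response level, while the log-scale SD stays near 0.1 at every level, which is the “normalising” effect described above.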

In addition, a normal distribution is symmetrical and unbounded, so it assumes that the data is not limited on one side. As the response approaches zero, this assumption fails: responses cannot fall below zero, so their distribution becomes skewed. If we use log(response), however, the scale has no lower bound, because the log of a very small positive number is a large negative number, so a symmetrical distribution becomes plausible again.
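The asymmetry near zero can be illustrated in the same way. The Python sketch below assumes lognormal responses close to zero (a hypothetical median of 0.05 with a log-scale SD of 0.5), so every simulated value is positive. It compares the moment-based sample skewness on the raw scale, where values pile up just above zero with a long right tail, with the skewness after taking logs. The helper `sample_skewness`, the parameters and the seed are illustrative assumptions.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=m)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


MEDIAN = 0.05     # hypothetical typical response, close to zero
SIGMA_LOG = 0.5   # assumed SD on the natural-log scale
N = 20000         # simulated replicates (illustration only)

rng = random.Random(2)  # fixed seed for reproducibility
raw = [MEDIAN * math.exp(rng.gauss(0.0, SIGMA_LOG)) for _ in range(N)]
logged = [math.log(v) for v in raw]

raw_skew = sample_skewness(raw)
log_skew = sample_skewness(logged)
print(f"smallest raw value: {min(raw):.4f} (never below 0)")
print(f"skewness on the raw scale: {raw_skew:.2f}")
print(f"skewness on the log scale: {log_skew:.2f}")
```

Under these assumptions the raw values are strongly right-skewed (the population skewness of this lognormal is about 1.75), while the logged values are close to symmetrical.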

Other transforms can be used, but the log is by far the most common, and in this context it does not matter whether you use log base 10 or log base 2.
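The base does not matter because changing it only multiplies every log value by the same constant: log2(z) = log10(z) / log10(2). The Python sketch below uses hypothetical fitted inflection points for a reference and a test curve on the log10(dose) scale, converts them to log2, and shows that the relative potency obtained from the horizontal shift is identical either way. It also shows that the SD of log(response) for a hypothetical replicate group changes only by that same constant factor, so equal spread across groups on one log scale means equal spread on the other.

```python
import math
import statistics

# Converting log10 values to log2 multiplies every one by the same constant.
TO_LOG2 = 1.0 / math.log10(2.0)

# Hypothetical fitted inflection points (C) on the log10(dose) scale.
C_REF_LOG10 = 1.20
C_TEST_LOG10 = 1.50  # test curve shifted right, i.e. less potent than the reference

c_ref_log2 = C_REF_LOG10 * TO_LOG2
c_test_log2 = C_TEST_LOG10 * TO_LOG2

# Relative potency from the horizontal shift, back-transformed in the matching base.
rp_from_log10 = 10 ** (C_REF_LOG10 - C_TEST_LOG10)
rp_from_log2 = 2 ** (c_ref_log2 - c_test_log2)

# Spread of log(response) in one hypothetical replicate group, in each base.
group = [0.82, 0.95, 1.10]
sd_log10 = statistics.stdev([math.log10(y) for y in group])
sd_log2 = statistics.stdev([math.log2(y) for y in group])

print(f"RP via log10: {rp_from_log10:.4f}, RP via log2: {rp_from_log2:.4f}")
print(f"SD ratio log2/log10: {sd_log2 / sd_log10:.4f} (constant = {TO_LOG2:.4f})")
```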


About the Author

Francis Bursa

Senior Statistician – Francis joined Quantics in 2013. With a Master's from Cambridge and a DPhil in Theoretical Physics from Oxford University, Francis brings high-level mathematical ability and extensive experience in simulation techniques to Quantics. These techniques can be used to explore “what if” scenarios, reducing the need for further experimental data. Francis heads the R&D team.
