Aug 24

# Choosing a statistical model: Continuous response data

In the last blog we discussed the definition of relative potency, how it relates to the concept of biological similarity, and why this requires parallelism between the reference and test dose response curves. To test for parallelism and then calculate a relative potency, the data has to be modelled mathematically.

There are 2 fundamentally different types of assay data (from a mathematical point of view). In a continuous data assay, the response is effectively continuous, eg optical density or a chemical measurement of a substance. In a quantal assay the response is measured as “all or none”, eg alive or dead, or reacted or not reacted. The difference is important because different mathematical models have to be used.

#### Continuous data assays

Whilst it is possible to use modern computer and mathematical techniques to model data without any assumptions about the data distribution between and within groups, this is complex and it turns out that in most situations some simple assumptions are (reasonably) valid and make everything much simpler.

###### Bioassay response curve 8 doses

In this example there are 8 doses with 3 replicates at each dose and the doses span pretty much the whole range of response from 0 to maximum. The dose data is plotted on the log scale (see blog 1). A simple symmetrical S shaped curve appears to fit the data reasonably well. This symmetrical S shaped curve is called a 4 parameter logistic (4PL) curve because 4 parameters are required to specify it fully: the 2 asymptotes A and D, the parameter B, which is proportional to the slope at the inflection point, and the position on the x axis, C.
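As a rough sketch of the curve just described, the snippet below evaluates a 4PL at 8 doses spaced evenly on the log scale. The post names the parameters A, B, C and D. The particular parameterisation used here (a common one, with C as the dose at the inflection point) and the parameter values are assumptions chosen purely for illustration.

```python
import math


def four_pl(dose, a, b, c, d):
    """Response of a 4PL curve at a given dose (dose > 0).

    Assumed parameterisation: y = d + (a - d) / (1 + (dose / c) ** b)
      a: asymptote as dose -> 0 (for b > 0)
      d: asymptote as dose -> infinity (for b > 0)
      b: slope parameter
      c: dose at the inflection point, where y is halfway between a and d
    """
    return d + (a - d) / (1.0 + (dose / c) ** b)


# Hypothetical parameter values, not taken from the post
A, B, C, D = 0.1, 1.5, 10.0, 2.0

# 8 doses, evenly spaced on the log scale (each is 3x the previous one)
doses = [0.1 * 3 ** k for k in range(8)]

for dose in doses:
    response = four_pl(dose, A, B, C, D)
    print(f"dose={dose:8.2f}  log10(dose)={math.log10(dose):6.2f}  response={response:.3f}")
```

Plotted against log10(dose), these points trace the symmetrical S shape: the response at doses C*k and C/k always sums to A + D, which is what "symmetrical" means for this curve.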

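The equation for the 4PL, in one commonly used parameterisation with x the dose and y the response, is:

$$y = D + \frac{A - D}{1 + (x/C)^{B}}$$

#### Simpler models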

Although the 4PL appears to fit the data quite well, in real use we do not often expect extreme values of response (or dose) so what about restricting the model to the middle section which actually looks quite linear?

A linear model is of course very simple. This simplicity makes the model very tempting, particularly if statistical software or support is not available. But there are potentially serious practical consequences to choosing a linear model. Remember that the model chosen will have to be fitted to the test substance as well as the reference. Look what happens if the test sample potency drops off.

The linear model fits the test and reference data nicely when the RP is close to 1, but as the RP falls, the test S shaped curve moves to the right on the dose axis and the linear model fits become increasingly non-parallel. The assay would fail a parallelism test. Had a 4PL been fitted, the assay would have passed. The animation below illustrates this well.

So use a linear model with caution if you want to avoid unnecessary assay failures.
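To put rough numbers on this, the sketch below takes noise-free reference and test responses from the same 4PL shape, shifts the test curve to the right as the RP falls, fits a straight line to each over a fixed set of doses, and compares the two slopes. The parameterisation, the parameter values and the dose range are assumptions made for illustration, and a simple slope ratio stands in for a formal parallelism test.

```python
import math


def four_pl(dose, a, b, c, d):
    # Assumed 4PL parameterisation: c is the dose at the inflection point
    return d + (a - d) / (1.0 + (dose / c) ** b)


def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical reference curve and a dose range covering its middle section
A, B, C, D = 0.1, 1.5, 10.0, 2.0
doses = [4.0, 6.3, 10.0, 15.8, 25.0]
log_doses = [math.log10(x) for x in doses]


def slope_ratio(rp):
    """Ratio of test to reference linear slopes for a given relative potency.

    A test sample with relative potency rp needs 1/rp times the dose for the
    same response, so its curve has the same A, B and D but C / rp in place of C.
    """
    reference = [four_pl(x, A, B, C, D) for x in doses]
    test = [four_pl(x, A, B, C / rp, D) for x in doses]
    return ols_slope(log_doses, test) / ols_slope(log_doses, reference)


for rp in (1.0, 0.5, 0.25, 0.1):
    print(f"RP={rp:<5} slope ratio (test / reference) = {slope_ratio(rp):.2f}")
```

The two underlying curves are exactly parallel in every case, since only C changes. The slope ratio drifts away from 1 only because the fixed dose range increasingly covers the flat lower part of the test curve.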

#### 4PL assumptions

As mentioned earlier, there are some mathematical assumptions behind the standard 4PL that are generally satisfied in practice but are important to understand.

The model assumes that the amount of variability in the response data is the same for each dose group, and that variability follows a normal distribution. Unfortunately this creates some problems.

In the real world of bioassay, the variability of the data is often greater at higher response values. Logging the response variable (usually) has the effect of making this variability more nearly equal across dose groups. So always use log(dose), and consider log(response) unless you are sure that using response on the raw scale will not violate the variability assumptions.
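A small made-up example shows the effect. It assumes the spread of the replicates is proportional to the size of the response (a constant relative error), which is one common way for variability to grow with the response. The group means and relative errors below are hypothetical.

```python
import math
import statistics

# Hypothetical dose-group means and a fixed relative spread of 3 replicates
group_means = [0.2, 0.5, 1.0, 2.0]
relative_errors = [0.9, 1.0, 1.1]


def group_sds(transform):
    """Standard deviation of the (transformed) replicates in each dose group."""
    sds = []
    for mean in group_means:
        replicates = [transform(mean * err) for err in relative_errors]
        sds.append(statistics.stdev(replicates))
    return sds


raw_sds = group_sds(lambda y: y)   # response on the raw scale
log_sds = group_sds(math.log10)    # response on the log scale

for mean, raw_sd, log_sd in zip(group_means, raw_sds, log_sds):
    print(f"mean={mean:4.1f}  SD raw={raw_sd:.4f}  SD log10={log_sd:.4f}")
```

On the raw scale the SD grows in step with the group mean. On the log scale it is the same in every group, which is what the equal-variability assumption needs.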

In addition, a normal distribution assumes that the data is not limited on one side, ie the distribution is symmetrical. As the response approaches zero, this assumption fails, because a response cannot go below zero. However if we use log(response), then since log(a very small number) = a very large (negative) number, the logged response is no longer limited on that side and the symmetrical distribution requirement can be satisfied.

Other transforms can be used, but the log function is by far the most common, and in this context it does not matter whether you use log base 10 or log base 2.
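The reason the base does not matter can be written in one line. For any positive response y:

$$\log_{2} y = \frac{\log_{10} y}{\log_{10} 2} \approx 3.32\,\log_{10} y$$

Changing the base therefore multiplies every logged response by the same constant. The standard deviation in every dose group is scaled by that same factor, so groups with equal variability stay equal and a symmetrical distribution stays symmetrical.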

That’s all for this blog. Next time we will discuss other models including the 5PL, and how the models are optimised to get the best fit to the data.
