Welcome to our fourth bioassay blog from Quantics Biostatistics. In our previous blog we discussed the 4 parameter logistic (4PL) model. This is a symmetrical S-shaped curve described by an equation in four parameters: A, B, C and D.
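One way of writing the 4PL that is consistent with the parameter descriptions used in this post is sketched below. The exact form is an assumption, since several equivalent versions are in use; here x is the log of the dose and B is taken to be positive.

```latex
y = A + \frac{D - A}{1 + e^{-B\,(x - C)}}, \qquad x = \log(\text{dose})
```

In this form y tends to A at very low doses and to D at very high doses, and y is exactly halfway between A and D when x = C.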
In this bioassay blog we will go into more detail about models for continuous response data, and in particular some of the problems that can arise.
(Sometimes you may see slightly different versions of the 4PL equation shown above – these are all mathematically equivalent. However, one thing to watch out for is that the meanings of the four parameters can vary between these versions. So if you’re fitting a 4PL and getting parameter values very different from those you were expecting, check that you’re using the version of the 4PL you thought you were!)
The 4PL is a symmetrical curve, which starts out at an asymptote (a constant value) at low doses, increases in an S-shaped curve, and ends up at another asymptote at high doses. An example 4PL curve for bioassay data is shown in the figure below. Note that A and D are the asymptotes, and the EC50 is antilog(C). B is the “slope parameter”, which is proportional to the slope of the curve at the EC50.
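The roles of the four parameters can be checked numerically. The sketch below assumes one common 4PL form on the natural-log dose scale, A + (D - A) / (1 + exp(-B (x - C))); the function name and the parameter values are hypothetical and chosen only for illustration.

```python
import math

def four_pl(log_dose, A, B, C, D):
    """Assumed 4PL form: A + (D - A) / (1 + exp(-B * (log_dose - C)))."""
    return A + (D - A) / (1.0 + math.exp(-B * (log_dose - C)))

# Hypothetical parameter values, for illustration only.
A, B, C, D = 0.1, 2.0, 1.5, 2.3

ec50 = math.exp(C)                     # antilog(C) when natural logs are used
mid = four_pl(C, A, B, C, D)           # response at the EC50
low = four_pl(C - 20.0, A, B, C, D)    # far below the EC50: close to A
high = four_pl(C + 20.0, A, B, C, D)   # far above the EC50: close to D

# Symmetry: the same log-dose distance either side of C gives the same
# distance from the midpoint response.
step = 0.7
below = mid - four_pl(C - step, A, B, C, D)
above = four_pl(C + step, A, B, C, D) - mid

print(f"EC50 = {ec50:.4f}, response at EC50 = {mid:.4f}")
print(f"low-dose response = {low:.4f}, high-dose response = {high:.4f}")
print(f"drop below midpoint = {below:.4f}, rise above midpoint = {above:.4f}")
```

If your software uses a different version of the 4PL, the same checks still apply, but the parameters playing each role may be named or scaled differently.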
The 5 parameter logistic
The 4PL often fits bioassay data quite well. But since it is symmetrical, it will not fit asymmetrical data well. In this case a commonly-used alternative is the 5 parameter logistic (5PL) model. This is similar to the 4PL but has an additional parameter, E, which allows it to be asymmetric. The equation for the 5PL is:
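A 5PL form consistent with the description in this post is sketched below. As with the 4PL, this particular version is an assumption, since equivalent versions with different parameter meanings are in use; x is the log of the dose.

```latex
y = A + \frac{D - A}{\left(1 + e^{-B\,(x - C)}\right)^{E}}, \qquad x = \log(\text{dose})
```

Setting E = 1 in this form gives back a 4PL with the same A, B, C and D.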
Like the 4PL, the 5PL starts at one asymptote at low doses and ends at another asymptote at high doses. However, the curve from one asymptote to the other is not symmetrical. Here’s an example:
The asymptotes are A and D again; however, the EC50 is not antilog(C) for the 5PL. It is:
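As a sketch of where the EC50 comes from, assume the 5PL is written as y = A + (D - A) / (1 + exp(-B (x - C)))^E with x the natural log of the dose (an assumed form; other versions exist). The EC50 is the dose at which y is halfway between A and D, which gives:

```latex
\left(1 + e^{-B\,(x_{50} - C)}\right)^{E} = 2
\;\Longrightarrow\;
x_{50} = C - \frac{1}{B}\,\ln\!\left(2^{1/E} - 1\right),
\qquad \mathrm{EC50} = \operatorname{antilog}(x_{50})
```

When E = 1 the logarithm term is ln(1) = 0, so the expression reduces to antilog(C), matching the 4PL.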
B is again the slope parameter. The new parameter E controls the asymmetry: if E is 1 the curve is symmetric. For larger values the part of the curve near the A asymptote becomes more tightly curved while the part near the D asymptote is less curved; for values below 1 it is the other way around.
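The effect of E can be explored numerically. The sketch below assumes the 5PL form A + (D - A) / (1 + exp(-B (x - C)))^E on the natural-log dose scale; the function names and parameter values are hypothetical. It finds the log EC50 for several values of E and measures how unequal the curve is either side of it.

```python
import math

def five_pl(log_dose, A, B, C, D, E):
    """Assumed 5PL form: A + (D - A) / (1 + exp(-B * (log_dose - C))) ** E."""
    return A + (D - A) / (1.0 + math.exp(-B * (log_dose - C))) ** E

def log_ec50(B, C, E):
    """Log-dose at which the assumed 5PL is halfway between A and D."""
    return C - math.log(2.0 ** (1.0 / E) - 1.0) / B

def asymmetry(A, B, C, D, E, step=0.7):
    """Rise above the midpoint minus drop below it, measured the same
    log-dose distance either side of the EC50 (zero for a symmetric curve)."""
    x50 = log_ec50(B, C, E)
    mid = five_pl(x50, A, B, C, D, E)
    above = five_pl(x50 + step, A, B, C, D, E) - mid
    below = mid - five_pl(x50 - step, A, B, C, D, E)
    return above - below

# Hypothetical parameter values, for illustration only.
A, B, C, D = 0.1, 2.0, 1.5, 2.3
for E in (0.5, 1.0, 3.0):
    x50 = log_ec50(B, C, E)
    print(f"E={E}: log EC50={x50:.4f}, "
          f"response there={five_pl(x50, A, B, C, D, E):.4f}, "
          f"asymmetry={asymmetry(A, B, C, D, E):+.4f}")
```

For E = 1 the log EC50 equals C and the asymmetry measure is zero; for other values of E the EC50 moves away from antilog(C) and the two sides of the curve no longer mirror each other.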
The fitting process
To understand some of the problems that can arise when using the 4PL and 5PL, it is important to have an idea of what bioassay statistical software does when it “fits” a model. Statistical software typically uses a method called “least squares” fitting. This method has the advantages that it is mathematically simple, and that it is consistent with the assumption that the random error in the responses follows a normal distribution.
Robust regression is an alternative approach, but that will be discussed in a later blog.
Least squares fitting, as the name suggests, tries to minimise the sum of “squares” – that is, the squares of the vertical distances between the data points and the model curve. This makes sense because making these distances small brings the curve close to the data, which is what we want.
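Written out, the quantity being minimised looks like this. The notation is ours rather than anything standard to a particular package: the y_i are the observed responses, the x_i are the corresponding log doses, f is the model curve (4PL or 5PL) and θ stands for its parameters.

```latex
SS(\theta) = \sum_{i=1}^{n} \bigl( y_i - f(x_i;\, \theta) \bigr)^{2}
```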
It is impossible to directly calculate the best fit 4PL or 5PL model through a single equation. Instead, bioassay statistical software starts with a rough guess at the values of A, B, C and D (and E if 5PL), which may be quite far from the best fit. It then tries to adjust this fit a little and checks whether the sum of squares has decreased. If it has, it accepts the change and then tries another one; if not, it tries a different change to the curve. This continues until it is impossible to find a change that decreases the sum of squares, at which point the software concludes that it has found the best fit.
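The loop described above can be sketched in a few lines. This is a deliberately simple illustration, not the algorithm used by any particular package: real software typically chooses its adjustments far more cleverly (for example with Levenberg-Marquardt). The 4PL form, the function names, the step sizes and the noise-free data are all assumptions made for the example.

```python
import math

def four_pl(x, params):
    """Assumed 4PL form on the log-dose scale: A + (D - A) / (1 + exp(-B (x - C)))."""
    A, B, C, D = params
    z = min(-B * (x - C), 700.0)  # cap the exponent so exp() cannot overflow
    return A + (D - A) / (1.0 + math.exp(z))

def sum_of_squares(params, xs, ys):
    return sum((y - four_pl(x, params)) ** 2 for x, y in zip(xs, ys))

def fit_by_trial_changes(xs, ys, start, step=0.5, min_step=1e-6):
    """Nudge one parameter at a time; keep a change only if it lowers the sum
    of squares. When no nudge of the current size helps, try smaller nudges.
    Stop once even very small nudges no longer help."""
    best = list(start)
    best_ss = sum_of_squares(best, xs, ys)
    while step > min_step:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_ss = sum_of_squares(trial, xs, ys)
                if trial_ss < best_ss:      # accept the change
                    best, best_ss = trial, trial_ss
                    improved = True
        if not improved:                    # nothing helped: refine the step
            step /= 2.0
    return best, best_ss

# Hypothetical noise-free data generated from known parameters A, B, C, D.
true_params = [0.1, 2.0, 0.3, 2.3]
xs = [-3.0 + 0.75 * i for i in range(9)]
ys = [four_pl(x, true_params) for x in xs]

# Rough starting guess: asymptotes from the data range, C mid-range, B = 1.
start = [min(ys), 1.0, sum(xs) / len(xs), max(ys)]
start_ss = sum_of_squares(start, xs, ys)
fitted, final_ss = fit_by_trial_changes(xs, ys, start)

print("start SS:", start_ss)
print("final SS:", final_ss)
print("fitted A, B, C, D:", [round(p, 3) for p in fitted])
```

Note that the stopping rule only says "no small change helps any more". It does not by itself guarantee that the curve it stops at is a good description of the data.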
Problems with fitting
Normally the fitting process works well. However, in some situations it can give unexpected results.
One example of this is that the “best” fit determined by the software can still be very poor. This happens if we try to fit bioassay data to an inappropriate model. For example, if the data is asymmetric but we try to fit a 4PL, there is no way the fit can be close to the data everywhere, since the 4PL is inherently symmetric. Another example is if the bioassay data are extremely asymmetric and we try to fit a 5PL. Although the 5PL is asymmetric, there is a maximum amount of asymmetry possible. The figure below shows the full range of asymmetry possible:
All the curves in the plot above are 5PLs, with different values of the E parameter (the additional parameter which allows the curve to be asymmetric). The black curve is E=1, for which the 5PL is symmetric, and the dotted curves are other values of E. However, it is impossible to go beyond the solid purple curve on the right side (E=0) or the solid red curve on the left (E=∞). So if your bioassay data is more asymmetric than this, it will be impossible to get a good fit with a 5PL.
An appropriate goodness-of-fit test, for example the F test, is useful to check whether this is happening. The F test can be used as a goodness-of-fit test for any model as long as there are two or more replicates at each dose. The F test compares two quantities: the “lack-of-fit” error, which measures how far the fit is from the bioassay data, and the “pure error”, which measures how far apart the replicates are, on average. If the former is large compared to the latter, the p-value for the F test will be small, indicating a poor fit.
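The two quantities compared by the F test can be computed directly. The sketch below uses the standard lack-of-fit decomposition; the function name, the replicate data and the "fitted" values are hypothetical and serve only to show the arithmetic. Turning the F statistic into a p-value needs the F distribution with the two degrees of freedom shown, which a statistics library can provide.

```python
from statistics import mean

def lack_of_fit_f(groups, fitted, n_params):
    """groups: one list of replicate responses per dose.
    fitted: the model's predicted response at each dose.
    n_params: number of fitted model parameters (4 for a 4PL, 5 for a 5PL).
    Returns the F statistic and its two degrees of freedom."""
    n_total = sum(len(g) for g in groups)
    n_doses = len(groups)
    # Pure error: spread of the replicates around their own dose mean.
    ss_pure = sum((y - mean(g)) ** 2 for g in groups for y in g)
    # Lack of fit: distance of each dose mean from the fitted curve.
    ss_lof = sum(len(g) * (mean(g) - f) ** 2 for g, f in zip(groups, fitted))
    df_lof = n_doses - n_params
    df_pure = n_total - n_doses
    f_stat = (ss_lof / df_lof) / (ss_pure / df_pure)
    return f_stat, df_lof, df_pure

# Hypothetical data: 6 doses, 3 replicates each.
groups = [
    [0.10, 0.12, 0.11],
    [0.30, 0.33, 0.28],
    [0.95, 1.02, 0.99],
    [1.80, 1.75, 1.84],
    [2.20, 2.25, 2.18],
    [2.30, 2.28, 2.33],
]
# Hypothetical fitted values from a curve that tracks the data closely...
good_fitted = [0.11, 0.31, 0.98, 1.79, 2.21, 2.30]
# ...and from one that misses the middle of the curve.
poor_fitted = [0.11, 0.50, 1.20, 1.60, 2.10, 2.30]

f_good, df_lof, df_pure = lack_of_fit_f(groups, good_fitted, n_params=4)
f_poor, _, _ = lack_of_fit_f(groups, poor_fitted, n_params=4)
print(f"good fit: F = {f_good:.2f} on ({df_lof}, {df_pure}) degrees of freedom")
print(f"poor fit: F = {f_poor:.2f} on ({df_lof}, {df_pure}) degrees of freedom")
```

A large F statistic corresponds to a small p-value, so the poorly fitting curve is the one the test would flag.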
Another issue is that the fit can be close to the data, but the model curve may not be as expected. For the 4PL, we can end up with curves like this:
What’s going on here?
Well, with the bioassay data shown above it so happens that the sum of squares always becomes smaller if B is increased – but more and more gradually as B increases. Mathematically, the best fit (the solid line) has infinite B, which means the model curve is flat all the way up to the EC50, where it abruptly jumps to the other asymptote.
A particular problem with this situation is that it may not be as obvious as the example above that it is happening. Most bioassay statistical software will adjust the fit by increasing B a large number of times, but eventually the change in the sum of squares becomes so small that the software concludes that it has reached the best fit, and stops. It will then output whatever value of B it has reached, which might be something like 10 or 20. This will give a steep model curve, like the dotted curve in the plot, but without an abrupt jump. There may be no obvious sign anything is wrong.
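This behaviour is easy to reproduce with data that has no points on the steep part of the curve. In the sketch below the data are hypothetical, the 4PL form A + (D - A) / (1 + exp(-B (x - C))) is an assumed version, and A, C and D are held fixed purely to keep the example short; a real fit would adjust them too.

```python
import math

def four_pl(x, A, B, C, D):
    """Assumed 4PL form on the log-dose scale."""
    return A + (D - A) / (1.0 + math.exp(-B * (x - C)))

# Hypothetical data: four doses sit on the lower asymptote, four on the
# upper one, and nothing in between.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0.01, -0.01, 0.02, -0.02, 1.02, 0.98, 0.99, 1.01]

# Held fixed for illustration only.
A, C, D = 0.0, 0.0, 1.0

def sum_of_squares(B):
    return sum((y - four_pl(x, A, B, C, D)) ** 2 for x, y in zip(xs, ys))

slopes = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
ss_values = [sum_of_squares(B) for B in slopes]
for B, ss in zip(slopes, ss_values):
    print(f"B = {B:5.1f}   sum of squares = {ss:.8f}")
```

Each increase in B lowers the sum of squares, but by less and less each time. A fitting routine that stops when the improvement falls below some small tolerance will therefore halt at a finite B that depends on the tolerance rather than on the data.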
To avoid this problem, always examine the software output for warning messages – in this situation, the software output might say something like “Fit failed to converge” – and consider using suitability criteria which reject fits with very large values of B. If this happens repeatedly, it may be necessary to decrease the spacing between doses to get more data on the steep part of the curve.
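A suitability criterion of this kind can be as simple as the check sketched below. The function name and the limit on B are hypothetical; an appropriate limit would need to be set for the assay in question, for example from historical data.

```python
def passes_suitability(slope_b, converged, max_abs_b=10.0):
    """Return True only if the fit converged and |B| is not implausibly large.

    max_abs_b is a hypothetical limit chosen for illustration.
    """
    return converged and abs(slope_b) <= max_abs_b

print(passes_suitability(2.1, converged=True))    # ordinary fit
print(passes_suitability(25.0, converged=True))   # suspiciously steep fit
print(passes_suitability(2.1, converged=False))   # software reported a warning
```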
Quantics’ bioassay software, QuBAS, deals with this problem automatically: it specifically checks whether what looks like the best fit is a sensible smooth curve.
Our Bioassay blog so far has been focussing on continuous response (quantitative) data, but next time we plan to move on to binary (quantal) response data and choosing an appropriate dose-response model.
Before we close, however, we wanted to reflect on an interesting paper that had us thinking here at Quantics. Is the biological data presented above really always unexpected?
A “jump” similar to the one in the 4PL example above can also happen for 5PL fits. The purple curve in our 5PL figure above shows a different example, where the response is exactly flat up to a certain dose and then suddenly starts going steeply upwards.
However, the recent paper Growth-dependent bacterial susceptibility to ribosome-targeting antibiotics (Greulich et al 2015) used a simple mathematical model for bacterial growth in the presence of antibiotics and found curves that look rather similar to the purple curve, using an equation very similar to the 5PL – look at the green curves in this figure from their paper.
The authors of this paper also find that their model fits experimental data quite well. So maybe these “unexpected” fits can happen after all?
Greulich P, Scott M, Evans MR, Allen RJ. Growth‐dependent bacterial susceptibility to ribosome‐targeting antibiotics. Molecular systems biology. 2015 Mar 1;11(3):796.