Aug 29

When and How Should I Transform my Response Data?

Our clients sometimes tell us that they don’t want to use response transformations since it feels like an unwarranted manipulation of the data. However, there is little point in avoiding a transformation just to get results that may be wrong. In this blog we will explain when you should consider transforming the response, and how to choose a response transformation.

When analysing a relative potency assay, the dose itself is almost never used directly. Instead, a log transformation is applied, and the whole analysis is carried out using the log dose. This dose transformation is required for the relative potency calculation to make sense. (The only exception is for slope ratio assays, but these are rarely used.) In addition to transforming the dose, the response can also be transformed. Unlike the dose transformation, transforming the response is optional.

 

Testing variance homogeneity

First of all, response transformations should only be used for continuous data, never for quantal “all or none” data (e.g., alive or dead, or reacted or not reacted). So in the rest of this blog I’ll only be talking about continuous data.

The underlying reason why a response transformation might be required is to satisfy the statistical assumptions made when fitting a dose-response model to the data. One of these assumptions is that the variance is “homogeneous”. This means that the spread of the responses around the model is the same at every dose. This is illustrated below, where the distribution of responses, shown in red, has the same width at every dose.

[Figure: distribution of response]

However, in practice this doesn’t always happen. An example we came across recently is shown below. Here it is clear that the responses are becoming more and more spread out at higher doses.

[Figure: increasing variance of response]

Sometimes the fact that the variability increases (or decreases) with the dose can be very obvious, as in the example above. However, sometimes the effect can be less extreme. In that case it can still be detected by calculating the standard deviation for each dose group. If the standard deviation is roughly the same for each dose group, the variance is homogeneous; if not, the assumption does not hold.
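As a minimal sketch of that calculation, the snippet below computes the sample standard deviation of each dose group. The function name `sd_by_dose` and the numbers are hypothetical, made up purely for illustration; only the Python standard library is assumed.

```python
from statistics import stdev


def sd_by_dose(responses_by_dose):
    """Return {dose: sample standard deviation} for each dose group."""
    return {dose: stdev(values) for dose, values in responses_by_dose.items()}


# Hypothetical replicate responses at three doses (illustrative numbers only).
example = {
    1.0: [10.0, 11.0, 12.0],
    10.0: [48.0, 52.0, 56.0],
    100.0: [180.0, 200.0, 220.0],
}

for dose, sd in sd_by_dose(example).items():
    print(f"dose {dose:g}: SD = {sd:.2f}")
```

In this made-up data set the standard deviation grows with the dose, which is the pattern that would suggest a lack of variance homogeneity.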

Assessing whether the variance is homogeneous based on a single assay can be difficult. This is because the estimated standard deviation is itself very variable when each dose group contains only two or three replicates. Instead, multiple assays should be used. If they all show a consistent dependence of the standard deviation on the dose, it is likely there is a lack of variance homogeneity.

The coefficient of variation (often called the CV or %CV) of each dose group can also be used to test variance homogeneity. However, a constant CV means the standard deviation isn’t constant; this follows from the definition of the coefficient of variation:

$$\text{CV} = \frac{\text{standard deviation}}{\text{mean}}$$

Since the mean response varies with the dose, the standard deviation must also vary in order to give a constant CV. So if the CV is constant, this is evidence that the variance is not homogeneous.
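A small numerical illustration of this point, using made-up numbers and a hypothetical helper `cv`: the three dose groups below all have the same CV, yet their standard deviations differ by a factor of 100.

```python
from statistics import mean, stdev


def cv(values):
    """Coefficient of variation: sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical dose groups, constructed so that the spread scales with the mean.
groups = {
    "low dose": [9.0, 10.0, 11.0],
    "mid dose": [90.0, 100.0, 110.0],
    "high dose": [900.0, 1000.0, 1100.0],
}

for name, values in groups.items():
    print(f"{name}: mean = {mean(values):g}, SD = {stdev(values):g}, CV = {cv(values):.3f}")
```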

Choosing a transformation

A lack of variance homogeneity can lead to serious problems. Since one of the assumptions needed for fitting is wrong, the fits won’t be accurate. This will lead directly to inaccurate estimates for the relative potency. Estimates for system and sample suitability criteria will be affected too, and so will confidence intervals and p-values.

The easiest way to deal with this is to transform the responses so that the variance becomes homogeneous. In theory any transformation can be used, but a few simple ones are very commonly used. One of these will usually work, so I would suggest trying them first before trying something more complicated.

If the standard deviation is higher for higher responses, the simple transformations are:

  • square root of response
  • log(response) – any base can be used for the logarithm
  • log(response + 1) – this one is useful if the response can be zero, in which case log(response) can’t be used

On the other hand if the standard deviation is lower for higher responses, the simple transformations are:

  • response squared
  • exp(response)
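The simple transformations listed above could be applied along the following lines. This is only a sketch: the `TRANSFORMS` table and the `transform` helper are hypothetical names, and base 10 is an arbitrary choice for the logarithm since any base can be used.

```python
import math

# Hypothetical lookup table of the simple transformations listed above.
TRANSFORMS = {
    "sqrt": math.sqrt,                       # needs response >= 0
    "log": math.log10,                       # needs response > 0
    "log_plus_1": lambda y: math.log10(y + 1),  # usable when the response can be zero
    "square": lambda y: y ** 2,
    "exp": math.exp,
}


def transform(values, name):
    """Apply the named transformation to every response in a list."""
    func = TRANSFORMS[name]
    return [func(y) for y in values]


print(transform([4.0, 9.0], "sqrt"))
print(transform([0.0, 9.0], "log_plus_1"))
```

Note that `transform([0.0], "log")` raises a `ValueError`, which is the situation where log(response + 1) is the practical alternative.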

A particularly common situation is that the CV is constant. In that case it can be shown mathematically that a log(response) transformation should be used.
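One way to see this is sketched below, using a first-order Taylor (delta method) approximation; it is an outline under that approximation, not a formal proof. The symbols are introduced here for illustration: Y is the response at a given dose, μ its mean and σ its standard deviation.

```latex
\begin{align*}
\sigma &= \mathrm{CV} \cdot \mu && \text{(constant CV)} \\
\ln Y &\approx \ln \mu + \frac{Y - \mu}{\mu} && \text{(first-order expansion about } \mu\text{)} \\
\operatorname{Var}(\ln Y) &\approx \frac{\sigma^2}{\mu^2} = \mathrm{CV}^2
\end{align*}
```

So the standard deviation of the log response is approximately equal to the CV, whatever the dose. The approximation is best when the CV is small.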

 

Transformations all the way down

As we mentioned in the introduction, our clients sometimes tell us that they don’t want to use response transformations since it feels like an unwarranted manipulation of the data, and they would prefer to analyse the “raw” data. However, there is little point in avoiding a transformation just to get results that may be wrong – which is certainly possible if the required statistical assumptions, such as variance homogeneity, are not satisfied.

Furthermore, the apparently “raw” data has often been transformed already. For example, a common response variable in bioassay is the optical density, measured by a plate reader. However, a plate reader doesn’t measure optical density directly; instead it measures the light transmission, and the optical density is calculated from it using the equation

$$\text{optical density} = -\log_{10}\left(\frac{\text{light transmitted}}{\text{light incident}}\right)$$

This is nothing but a response transformation: the raw response is the light transmitted, and a logarithmic transformation has been applied to it to get the transformed response, which we call the optical density.
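To make the point concrete, here is a minimal sketch of that calculation, assuming the standard definition of optical density as minus the base-10 logarithm of the fraction of light transmitted. The function and argument names are hypothetical, and a real plate reader will apply further corrections that are not shown here.

```python
import math


def optical_density(transmitted, incident):
    """Optical density from transmitted and incident light intensities.

    Assumes OD = -log10(transmitted / incident); both intensities must be positive.
    """
    if transmitted <= 0 or incident <= 0:
        raise ValueError("intensities must be positive")
    return -math.log10(transmitted / incident)


# If only a tenth of the light gets through, the optical density is about 1.
print(round(optical_density(10.0, 100.0), 6))
```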

In fact, it’s worse than that. Even the light transmitted isn’t measured directly. There’s a photodiode inside the plate reader, and what’s actually measured is the electrical current produced by the photodiode when it’s exposed to the light, which is then converted to a measurement of the light transmitted. So there are actually two transformations between the raw data and the optical density.

But hang on. It turns out that even the current isn’t measured directly – there are a whole series of transformations needed to get to the optical density. The same thing is true for other response variables used in bioassays.

In fact, it’s hard to think of any data that is truly “raw”. Whether such a thing can exist even in principle is an interesting question in the philosophy of science, but perhaps a bit off-topic for this blog. But for our purposes, it’s enough to realise that the “raw” data we’re handling has already been transformed in various ways, so we may as well transform it again if that makes things more convenient.

 

After transforming

Returning to the example above, the same data after applying a log transformation is shown in the figure below.

[Figure: transformed response]

The variability is now clearly much more similar across the doses, and we could confirm this by looking at whether the standard deviations of the dose groups are similar. In practice, it is better to check the transformation works by looking at multiple assays, not just one.
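A check of this kind could be sketched as follows. The data are hypothetical numbers with a constant CV, not the assay shown in the figures, and `sd_ratio` is an illustrative helper rather than a standard statistic.

```python
import math
from statistics import stdev


def sd_ratio(groups):
    """Largest dose-group SD divided by the smallest (1.0 means identical spread)."""
    sds = [stdev(group) for group in groups]
    return max(sds) / min(sds)


# Hypothetical dose groups whose spread grows in proportion to the mean.
raw = [[9.0, 10.0, 11.0], [90.0, 100.0, 110.0], [900.0, 1000.0, 1100.0]]
logged = [[math.log10(y) for y in group] for group in raw]

print(f"SD ratio before transforming: {sd_ratio(raw):.2f}")
print(f"SD ratio after log transform: {sd_ratio(logged):.2f}")
```

A ratio close to 1 after transforming indicates that the dose groups now have similar spread. With only a few replicates per group the ratio is noisy, which is one more reason to look at several assays.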

It is important to remember that the response transformation will change the entire subsequent analysis. In particular the dose-response relationship will change, so if, for example, a 4PL was a good fit to the data before the transformation, that may no longer be the case after the transformation.

We talk about model choice in our blog on that specific topic, Choosing a statistical model: Continuous response data.

System and sample suitability criteria will also be affected. Therefore both the choice of model and the suitability criteria need to be reassessed if a response transformation is used. In particular, parallelism tests may need to be changed; this is something we are going to write about in our next blog.


    About The Author

    Senior Statistician – Francis joined Quantics in 2013. With a Masters from Cambridge and DPhil in Theoretical Physics from Oxford University, Francis brings high level mathematical ability and extensive experience in simulation techniques to Quantics. These techniques can be used to explore “what if” scenarios, reducing the need for further experimental data. Francis heads the R&D team.