It has been some time since I last set foot in a lab, but the thought of seeing outliers in my data still sends a shiver down my spine – don’t you just internally (or externally) groan when you see them in your assay results? Do you mask them and pretend they are not there, or do you remove them from your analysis altogether, oftentimes based on just a hunch or visual observation? After all, outliers are clearly identifiable most of the time, and generally speaking, you know what your data should and should not look like… right? To help shed some light on this issue that keeps us all up at night, I asked one of the team to provide some answers (thanks Francis). In this blog we will discuss the impact that outliers can have on your assay, as well as best practices for handling them.
Outliers are a common occurrence in bioassays, but how they are dealt with can be a contentious issue. Many of the scientists Quantics speak to believe they should be able to simply remove a data point from their analysis if they consider it to be an outlier. However, from a regulatory point of view this is unacceptable unless the removal is documented and made before any analysis has been carried out.
So first of all, what is an outlier? According to the USP, an outlier is data “that appears not to belong among the other data present”. An example is shown in the figure below – it is clear that most of the data (the green points) are clustered tightly together and follow a linear dose-response relationship, but there is one point (coloured red) that clearly “appears not to belong”.
Of course, there will always be some points that lie further from the mean than others. That alone does not make them outliers: a point only qualifies if it sits well outside the spread of the rest of the data, not merely a little beyond it.
Problems Caused by Outliers
It is clear that having numerous outliers in the data, say one in every dose group, is undesirable. What is less obvious is that even a single outlier can cause problems. Looking at Figure 1, at first glance everything seems clear (even if the points had not been colour-coded): there is a linear dose-response relationship running through all of the points except the outlier… and then there is the outlier doing its own thing.
The implications of this one outlier become apparent when you try to fit a model to the data – the best fit will try to compromise. It will run close to the majority of the data, but it will also try to take the outlier into account, so it ends up somewhere in between. This is shown in Figure 2, where the best fit including all the data is the red line. It passes above most of the data at the low doses because of the high outlier at the bottom dose. For comparison, removing the outlier entirely and fitting the remaining data gives the green line.
There is clearly quite a large difference between the fits – in particular, the red line has a smaller slope than the green line. In a bioassay, this could mean that the assay fails system suitability criteria, or if it passes, the relative potency estimate could be inaccurate.
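This slope-flattening effect can be sketched numerically. The data below are invented for illustration (they are not the actual values behind Figures 1 and 2): a linear dose-response measured in duplicate, with one aberrantly high response at the lowest dose. Fitting an ordinary least-squares line with and without that point shows how a single outlier drags the slope down.

```python
import numpy as np

# Illustrative data (not the actual figure values): duplicate wells
# at five log-doses, with one high outlier at the lowest dose.
log_dose = np.repeat(np.arange(1.0, 6.0), 2)
response = np.array([2.0, 7.0,   # 7.0 is the outlier
                     3.1, 2.9,
                     4.0, 4.2,
                     5.1, 4.9,
                     6.0, 6.1])

def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept

slope_all, _ = fit_line(log_dose, response)

# Remove the outlier (index 1) and refit the remaining data.
mask = np.ones(len(response), dtype=bool)
mask[1] = False
slope_clean, _ = fit_line(log_dose[mask], response[mask])

print(f"slope with outlier:    {slope_all:.2f}")
print(f"slope without outlier: {slope_clean:.2f}")
```

With this invented data the slope roughly halves when the outlier is included, which is exactly the kind of change that can tip an assay past a system suitability limit on slope.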
Removing Outliers
Given the problems outliers cause, the simplest solution appears to be to just remove them. However, this is not as simple as it sounds, as it is not always obvious from the data which points are outliers and which are not. In Figures 1 and 2, the point at the lowest dose with a very high response (around 7) is clearly an outlier. If the response was much lower (e.g. 3), it would be typical for its dose group and clearly not an outlier. But in between there is a range of values around 4 or 5 where it might not be as clear whether the point is an outlier or just a slightly high response.
Making a manual judgement in such cases would be difficult, and manual judgements are subjective in any case – different people might make different decisions on whether a particular point is an outlier or not. For consistency and for regulatory reasons it is better to use an automatic method of detecting outliers.
A variety of methods for detecting outliers exist, which in practice give very similar results; a couple of the most popular are Studentised residuals and the Grubbs’ test. Both are commonly available in software packages used for bioassay analysis, including our commercially available software, QuBAS.
These two methods work by calculating the “residuals” (that is, the distance of each point from the best fit), adjusting for the expected size of the residuals, and then identifying whether any are too large. The adjustment step differs between the two methods, which can lead to different points being identified as outliers in each case. The other difference is that the Studentised residuals method flags all the points that it considers outliers, whereas the Grubbs’ test only checks whether the largest residual is an outlier. Therefore, if there are multiple outliers, the Grubbs’ test will only identify one. This can be dealt with by removing the outlier and applying the Grubbs’ test again until no more outliers are found. Most software such as QuBAS automatically conducts these repeated Grubbs’ tests when the “Remove outliers” option is ticked.
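The repeated Grubbs’ procedure can be sketched in a few lines. This is a minimal illustration of the standard two-sided Grubbs’ test applied iteratively to one set of values (here imagined as replicates at a single dose), not a description of how QuBAS implements it; the data and the significance level are invented for the example.

```python
import numpy as np
from scipy import stats

def grubbs_outliers(values, alpha=0.05):
    """Repeatedly apply the two-sided Grubbs' test, removing the most
    extreme value each time, until no further outlier is found.
    Returns (remaining values, removed outliers)."""
    data = np.asarray(values, dtype=float)
    removed = []
    while len(data) > 2:
        n = len(data)
        mean, sd = data.mean(), data.std(ddof=1)
        # Test statistic: largest absolute deviation from the mean, in SDs
        idx = np.argmax(np.abs(data - mean))
        g = abs(data[idx] - mean) / sd
        # Critical value derived from the t distribution
        t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
        g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(
            t_crit**2 / (n - 2 + t_crit**2))
        if g > g_crit:
            removed.append(float(data[idx]))
            data = np.delete(data, idx)   # remove and test again
        else:
            break
    return data, removed

# Invented replicate responses at one dose, with one aberrant value
responses = [4.1, 3.9, 4.0, 4.2, 7.5]
clean, outliers = grubbs_outliers(responses)
print(outliers)   # → [7.5]
```

The loop stops as soon as the largest remaining residual falls below the critical value, so in this example only the aberrant 7.5 is removed and the four consistent replicates are kept.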
A completely different way of dealing with apparent outliers is to transform the data. Sometimes a response that looks like an outlier can actually be due to increased variation in one part of the dose-response curve. For example, in Figure 3, the responses at the lowest dose are very far apart, so it looks like one of them is an outlier (although there is no way of telling which one).
But after log-transforming, the same data looks like Figure 4.
Now the bottom dose group does not look unusual – the responses are no further apart than at several of the other doses. Therefore, there are actually no outliers here at all! We’ve explored response transformations in more detail in its own blog.
Alternative Methods of Outlier Detection
An alternative to statistical detection is to try to establish directly that a particular response “does not belong” by identifying an underlying cause. However, in practice this is rarely possible in bioassay work: with so much inherent variability already present, there are numerous possible causes for any observed discrepancy.
Another approach is to accept that there will be occasional outliers in the data and choose an analysis method that is less affected by them. This approach is called robust regression. It can work well, but it requires specialist statistical software. We will focus on this method in an upcoming blog, so please sign up below to ensure you receive the next update.
Regulatory Perspective
Finally, any approach to outliers should be consistent with regulatory expectations, for example Chapter 1032 of the USP. This states amongst other things that “discarding data solely because of statistical considerations should be a rare event”. Therefore, if a method like Studentised residuals or Grubbs’ test is used, the parameters of the method should be chosen to only remove the most extreme outliers.
If outliers are still being removed frequently, something else is almost certainly going on. Perhaps some error is consistently being made in the assay procedure, or maybe the “outliers” are not really outliers at all and would disappear after transforming the response. Both these possibilities should be investigated before deciding to remove any data point that you think is an outlier.