Apr 14

Post COVID-19 – Handling Missing Data in Clinical Trials

COVID-19 disruption of clinical trials is likely to result in missing data, disrupted timelines, and perhaps patient population changes. We thought we would dedicate a blog post to the actions that can be considered when a study is facing the issue of missing data.

The COVID-19 situation is clearly still developing and regulators are still clarifying their thinking. However, the FDA has issued guidance [1], and the EMA Committee for Medicinal Products for Human Use (CHMP) has just issued a “points to consider” document for public consultation [2].

The main statistical analysis considerations include:

  1. Outcomes and results may be safeguarded if appropriate action is taken quickly.
  2. The database should be preserved, with a clear record of how and why data are missing and of the reasons subjects discontinued.
  3. The statistical analysis plan (SAP) may need revising to address changes in how results are reported.
  4. COVID-19 may have wider impacts on data gathering and patient outcomes, for example through social or economic pressures on subjects.

Missing data is common

Missing data is a common problem in all clinical trials, but it is normally expected to affect only a small proportion of the total data set. Often the missing values can simply be left as missing in the analysis, but if a significant proportion of the data is affected, as may occur during and after COVID-19, protocol amendments may be required to introduce more sophisticated statistical ways of managing the problem.

This is particularly important when the primary outcome is missing for some patients, because the study loses power and the sample size may need to be revisited.
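
To make the effect on sample size concrete, here is a minimal sketch of the usual back-of-envelope adjustment: if only patients with an observed primary outcome contribute to the analysis, the number randomised is inflated by 1 / (1 - expected missing fraction). The function name and the figures are illustrative assumptions, not taken from any guidance, and the calculation assumes the missing data cause a loss of information only, not bias.

```python
import math

def inflate_sample_size(n_required, missing_fraction):
    """Number to randomise so that about n_required patients have an observed outcome.

    Assumes (hypothetically) that missingness only removes information
    and does not bias the treatment comparison.
    """
    if not 0 <= missing_fraction < 1:
        raise ValueError("missing_fraction must be in [0, 1)")
    return math.ceil(n_required / (1 - missing_fraction))

# Illustrative figures only: 200 evaluable patients were planned.
for fraction in (0.05, 0.25):
    print(f"{fraction:.0%} missing -> randomise {inflate_sample_size(200, fraction)}")
```

A formal re-estimation would normally be done by the trial statistician against the original power calculation; this only shows the direction and rough size of the effect.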

Minimise the impact of missing data

Some steps can be taken to try to minimise the impact.

  • Some data is always better than none, so when and where possible continue to collect outcome data, even if it is out of sync, or at the wrong time points. It is also an ethical mandate, as pointed out in the EMA paper [2], to proceed with a trial where possible.
  • Ensure that you have a mechanism for identifying missing data and record the reasons for any subjects discontinuing the trial.
  • If the trial involves patient-reported outcomes or diaries, consider contacting patients and encouraging them to continue recording even if clinic visits have had to be stopped or delayed.
  • If a patient discontinues a trial, efforts should be made to obtain the participant’s consent for the use of data on treatments and outcomes. This preserves the ability to analyse endpoints for all participants who underwent randomization and thus to make possible intention-to-treat inferences, which are grounded in randomization.
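
As a rough illustration of a mechanism for identifying missing data and recording the reasons, the sketch below tallies missing outcomes per visit by recorded reason. The record layout, field names and reason labels are all hypothetical; a real trial would take these from its own data management system.

```python
from collections import Counter

# Hypothetical visit log: one record per scheduled visit.
visits = [
    {"subject": "S01", "visit": "Week 4", "outcome": 12.1, "reason_missing": None},
    {"subject": "S02", "visit": "Week 4", "outcome": None, "reason_missing": "clinic closed (COVID-19)"},
    {"subject": "S01", "visit": "Week 8", "outcome": None, "reason_missing": "clinic closed (COVID-19)"},
    {"subject": "S02", "visit": "Week 8", "outcome": None, "reason_missing": "clinic closed (COVID-19)"},
    {"subject": "S03", "visit": "Week 8", "outcome": None, "reason_missing": "withdrew consent"},
    {"subject": "S04", "visit": "Week 8", "outcome": None, "reason_missing": None},
]

def missingness_summary(records):
    """Count missing outcomes per visit, broken down by recorded reason."""
    summary = {}
    for rec in records:
        if rec["outcome"] is None:
            # Flag missing values whose reason was never captured.
            reason = rec["reason_missing"] or "reason not recorded"
            summary.setdefault(rec["visit"], Counter())[reason] += 1
    return summary

for visit, reasons in missingness_summary(visits).items():
    print(visit, dict(reasons))
```

Entries with no recorded reason are the ones to chase first, since the reason for missingness informs which analysis assumptions are plausible later on.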

Take stock of the accumulated data – Data Monitoring Committee (DMC) and interim analysis

To assess the impact of missing data and changes in patient populations, a blinded review of data by an independent DMC should be considered. The review will be able to advise on changes to trial design, sample size and formal missing data management, and on the potential for an interim efficacy or futility analysis if the trial was reasonably advanced when the lockdown hit.

An interim analysis that was not originally planned has significant impact on study power and MUST be carefully planned with statistical support to maintain scientific validity. The proposed analysis must be fully documented and, if unblinding is considered, a separate statistical team may be required to maintain blinding for the main clinical teams and CRO staff.

Plan for management of missing data

Assuming the trial is to continue, then now is the time to define in advance the plan for dealing with the increase in missing data. Formal planning is necessary in order to avoid introducing bias into the comparison of treatments which could happen if the methods are chosen after seeing the data.

There is no single correct analysis when data are missing: all methods require assumptions which are usually unverifiable. The best approach in this post-COVID-19 situation (where the missing data is related to external circumstances beyond the control of the trial) will depend on what is missing and why.

There are three fundamentally different scenarios for missing data:

  1. ‘Missing at random’: when a recorded characteristic about the participant can account (or partially account) for differences in the data for observed and missing cases.

As an example, suppose that the trial requires diary data on exercise. In the current situation, it may be that older patients, considered more at risk of COVID-19, will be less willing to take exercise outside than younger patients. The reluctance is just related to being older and more concerned about COVID-19, and is not related to their actual level of fitness.

  2. ‘Missing completely at random’: when all participants are equally likely to have any given variable missing. This means that the complete cases are representative of all the original cases as randomized, so can be used for inferences about the treatments.
  3. ‘Missing not at random’: when the chance that a value is missing depends on the unobserved value itself, even after allowing for recorded characteristics. For example, data would be missing not at random if older patients exercised less because they had become unfit due to COVID-19. In this case, any analysis of the study endpoints has the potential to be biased by missing data.
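
The difference between the three scenarios can be seen in a small simulation built around the exercise diary example. Everything here is an illustrative assumption: the population, the effect of age, and the missingness probabilities are invented, and the "age-adjusted" estimate is just a stratified mean using the recorded age group.

```python
import random

random.seed(2020)
N = 20000

# Hypothetical population: weekly exercise minutes, lower on average for older patients.
people = []
for _ in range(N):
    older = random.random() < 0.5
    people.append((older, random.gauss(40 if older else 60, 10)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def observed(prob_missing):
    """Keep each record with probability 1 - prob_missing(older, minutes)."""
    return [(o, y) for o, y in people if random.random() >= prob_missing(o, y)]

def age_adjusted_mean(obs):
    """Stratify by age group (recorded for everyone) and reweight to the full sample."""
    share_older = sum(o for o, _ in people) / len(people)
    old = mean(y for o, y in obs if o)
    young = mean(y for o, y in obs if not o)
    return share_older * old + (1 - share_older) * young

mechanisms = {
    "MCAR": lambda o, y: 0.3,                     # same chance for everyone
    "MAR": lambda o, y: 0.6 if o else 0.1,        # depends on recorded age only
    "MNAR": lambda o, y: 0.6 if y < 50 else 0.1,  # depends on the unseen outcome
}

true_mean = mean(y for _, y in people)
results = {}
for name, rule in mechanisms.items():
    obs = observed(rule)
    results[name] = (mean(y for _, y in obs), age_adjusted_mean(obs))
    cc, adj = results[name]
    print(f"{name}: complete-case {cc:.1f}, age-adjusted {adj:.1f}, true {true_mean:.1f}")
```

In this sketch the complete-case mean is close to the truth only under MCAR, adjusting for age repairs the MAR case, and neither estimate recovers the truth under MNAR.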

Possible analysis approaches include the following:

  1. Complete-case analysis: participants with missing data are simply excluded from the analysis. This would be the basis of the interim analysis suggested above.

This approach is ONLY valid if the data are ‘missing completely at random’, so the complete cases are representative of all the original cases as randomized.

  2. Single imputation methods: a single value is filled in for each missing data point by means of, e.g.,
    1. last observation carried forward (LOCF)
    2. baseline observation carried forward (BOCF).

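These methods MAY be valid if data are ‘missing at random’, but each also rests on a strong assumption of its own: LOCF, for example, assumes the outcome would have stayed at its last observed value. Clinical input (blinded to the data) is generally needed to inform and support the choice of imputation method.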
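
For readers who have not seen them written out, the two single imputation rules can be sketched as below for one patient's series of visits, with None marking a missed visit. The function names and the example values are illustrative assumptions.

```python
def locf(values):
    """Last observation carried forward: fill each gap with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if nothing has been observed yet
    return filled

def bocf(values):
    """Baseline observation carried forward: fill each gap with the first (baseline) value."""
    baseline = values[0]
    return [baseline if v is None else v for v in values]

# Hypothetical scores at baseline and five follow-up visits.
visits = [52.0, 48.5, None, 45.0, None, None]
print(locf(visits))  # [52.0, 48.5, 48.5, 45.0, 45.0, 45.0]
print(bocf(visits))  # [52.0, 48.5, 52.0, 45.0, 52.0, 52.0]
```

Because every gap is filled with a single fixed value, the imputed data look more certain than they are, which is one reason these methods need careful justification.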

  3. Methods based on statistical models, including:
    1. Likelihood-based methods for repeated measures, in which the measurements are assumed to have a normal distribution with a specified form of mean and covariance matrix.
    2. Bayesian methods, in which inferences are based on a statistical model that includes an assumed prior distribution for the measurements.
    3. Multiple imputation, in which multiple sets of plausible values for missing data are created from their model-based predictive distribution and estimates and standard errors are obtained with the use of multiple-imputation combining rules.
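
The multiple-imputation combining rules (commonly known as Rubin's rules) are short enough to sketch directly. The function name and the five per-imputation results are made up for illustration; in practice the imputation and pooling would be done with validated statistical software.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool m per-imputation estimates and standard errors using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with one SE each")
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)  # average within-imputation variance
    between = variance(estimates)                # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical treatment effects from five imputed data sets.
estimates = [1.8, 2.1, 2.0, 1.7, 2.4]
std_errors = [0.50, 0.48, 0.52, 0.49, 0.51]
pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate {pooled_estimate:.2f}, pooled SE {pooled_se:.3f}")
```

The pooled standard error is larger than any of the individual ones because it adds the variation between imputations, which is how the method reflects uncertainty about the missing values.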

The choice of strategy should be described and justified in the statistical section of the protocol and the assumptions underlying any mathematical models employed should be clearly explained.  Analysis methods that are based on plausible scientific assumptions should be used.

Sensitivity analysis

Whichever analysis method is chosen, a sensitivity analysis should be conducted for the method of handling missing values, especially if the number of missing values is substantial. One approach is to use pattern mixture models, which examine subgroups of participants with different patterns of drop-out.
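
One simple form of pattern mixture sensitivity analysis is a "delta adjustment": assume that participants who dropped out differ from completers by some amount delta, and see how large delta must be before the conclusion changes. The sketch below works on point estimates only, with invented summary figures; a real analysis would also carry through the uncertainty, typically within multiple imputation.

```python
def pattern_mixture_mean(mean_observed, frac_missing, delta):
    """Arm mean over completers and drop-outs, assuming drop-outs differ from completers by delta."""
    return (1 - frac_missing) * mean_observed + frac_missing * (mean_observed + delta)

# Hypothetical summary data for each arm (higher score = better).
active = {"mean_observed": 6.0, "frac_missing": 0.25}
control = {"mean_observed": 4.0, "frac_missing": 0.20}

def treatment_difference(delta_active, delta_control=0.0):
    """Active minus control, penalising drop-outs in each arm by its own delta."""
    return (pattern_mixture_mean(**active, delta=delta_active)
            - pattern_mixture_mean(**control, delta=delta_control))

# Scan increasingly pessimistic assumptions about drop-outs in the active arm.
for delta in (0, -2, -4, -6, -8):
    print(f"delta = {delta:>3}: estimated difference = {treatment_difference(delta):.2f}")
```

If the estimated benefit only disappears for values of delta that clinicians consider implausible, the primary result can be regarded as reasonably robust to the missing data assumptions.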


  1. FDA Guidance on Conduct of Clinical Trials of Medical Products during COVID-19 Pandemic
  2. Committee for Medicinal Products for Human Use. Points to consider on implications of Coronavirus disease (COVID-19) on methodological aspects of ongoing clinical trials. European Medicines Agency, 25 March 2020, EMA/158330/2020

About The Author

Daniel joined Quantics in 2015. He has a Masters in Applied Statistics and Datamining from the University of St Andrews in Scotland. Since joining Quantics, Daniel has been part of our HTA team. He has used R and WinBUGS to conduct network meta-analyses for urology, ophthalmology and respiratory indications. He has also been involved in the reporting of these analyses.