When it comes to clinical trial statistical analysis, missing values are a major challenge that we need to address. Have they ever been a problem for you? If so, or if you just want to learn more about them, keep on reading!
By Mercedes Ovejero Bruna (Senior Statistician/Data Scientist) and Iratxe Herráez Sánchez-Mariscal (Junior Statistician)
Biostatistics and Data Management Unit at Sermes CRO
What are missing values?
Missing values are losses of information that occur when one or more values of a patient’s variables are not recorded or not available. Although undesirable, missing data are common in clinical trials despite all efforts to avoid them. Moreover, they can significantly affect the conclusions drawn from the data: they reduce the power of the study and, in some cases, may introduce substantial bias (Dziura et al., 2013).
Some of the most common causes of missing values are (Mack et al., 2018):
From the data analysis perspective, there are three categories of missing values (Allison, 2001; Mirzaei et al., 2022; Rubin, 1987): missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR).
It is not always easy to identify the type of missing values. However, there are guidelines that help to determine whether the missing values follow a pattern, or whether certain variables are associated with a higher probability of missingness. For example, the variables that contain missing values can be visualized, and the relationship between the occurrence of missing values and patterns in the other study variables can be analysed. This makes it possible to detect situations in which MAR or MNAR mechanisms are involved. For this visual inspection, R packages such as VIM (Kowarik and Templ, 2016) and naniar (Tierney et al., 2021) offer a straightforward way to understand the pattern of missing values. There are also omnibus statistical tests of whether the missing data are MCAR, such as the one implemented in the R package MissMech (Jamshidian et al., 2014).
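The kind of inspection described above can be sketched in a few lines. The example below is a minimal illustration in plain Python and does not reproduce the interface of VIM, naniar or MissMech. The patient records, the variable names (age, baseline_score, week12_score) and the helper functions are all hypothetical. It computes the proportion of missing values per variable and compares the mean of a fully observed covariate between patients with and without a missing outcome.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
patients = [
    {"age": 34, "baseline_score": 12.0, "week12_score": 9.5},
    {"age": 71, "baseline_score": 15.5, "week12_score": None},
    {"age": 45, "baseline_score": None, "week12_score": 10.0},
    {"age": 68, "baseline_score": 14.0, "week12_score": None},
    {"age": 52, "baseline_score": 11.0, "week12_score": 8.0},
    {"age": 77, "baseline_score": 16.0, "week12_score": None},
]


def missing_proportion(rows, variable):
    """Share of rows in which `variable` is missing."""
    return sum(row[variable] is None for row in rows) / len(rows)


def mean_by_missingness(rows, target, covariate):
    """Mean of `covariate` where `target` is missing vs. where it is observed.

    Assumes both groups are non-empty; statistics.mean raises otherwise.
    """
    when_missing = [r[covariate] for r in rows
                    if r[target] is None and r[covariate] is not None]
    when_observed = [r[covariate] for r in rows
                     if r[target] is not None and r[covariate] is not None]
    return mean(when_missing), mean(when_observed)


# Proportion of missing values for every variable in the toy data set.
proportions = {name: missing_proportion(patients, name) for name in patients[0]}

# Is a missing week-12 outcome related to the patient's age?
age_if_missing, age_if_observed = mean_by_missingness(
    patients, "week12_score", "age"
)

print(proportions)
print(age_if_missing, age_if_observed)
```

In this toy data set, patients with a missing week-12 score are clearly older on average than those with an observed score, which would argue against MCAR. A comparison on observed covariates cannot, on its own, distinguish MAR from MNAR.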
Strategies for coping with missing values
The treatment of missing values is therefore of utmost importance, since failing to consider missing values and their mechanism during the analysis can lead to misleading results (Kang, 2013). That is why there are different strategies for coping with missing values (Jakobsen et al., 2017):
However, not every method is suitable in every situation: the choice depends on the type of missing values involved and on how many there are. Those factors will guide the methodology to be applied. The figure below indicates the methods recommended for each type of missing value as well as practices that are not appropriate (Dziura et al., 2013; Fielding et al., 2008).
If you are working with R, packages like mice (van Buuren and Groothuis-Oudshoorn, 2011) and Amelia (Honaker et al., 2011) are two examples of versatile tools for handling missing values.
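Both mice and Amelia perform multiple imputation, in which the analysis is repeated on several imputed data sets and the results are then combined with Rubin's rules (Rubin, 1987). The packages take care of this pooling themselves. The sketch below only illustrates the arithmetic in plain Python, and the function name and the input numbers are hypothetical. The pooled estimate is the mean of the per-data-set estimates, and the total variance is the mean within-imputation variance plus the between-imputation variance inflated by (1 + 1/m).

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their variances (Rubin's rules).

    Returns the pooled estimate and its total variance. Requires m >= 2.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical treatment-effect estimates from m = 5 imputed data sets,
# with the squared standard error obtained in each of them.
estimates = [1.8, 2.1, 2.0, 2.3, 1.8]
variances = [0.25, 0.30, 0.28, 0.27, 0.30]

pooled_estimate, total_variance = pool_rubin(estimates, variances)
print(pooled_estimate, total_variance)
```

The between-imputation term is what separates multiple imputation from single imputation: it carries the uncertainty about the missing values into the final standard error.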
The role of sensitivity analysis
As we have seen so far, missing values in clinical trials are unintentional but, unfortunately, unavoidable. Their presence adds complexity, because every statistical analysis then makes assumptions about the distribution of the unobserved values that cannot be verified. If an incorrect assumption is made, the estimated treatment effect and its standard error will be biased, resulting in misleading inferences. Since the actual values of these data cannot be known, it is necessary to evaluate the impact of the chosen approach by means of a sensitivity analysis (EMA, 2010).
A sensitivity analysis can be defined as a set of analyses in which the data are handled differently from the primary analysis. Sensitivity analyses show how assumptions different from those made in the primary analysis influence the results obtained (Jakobsen et al., 2017). They must be specified in the Clinical Trial Protocol or in the Statistical Analysis Plan before the study takes place, and in no case should they be defined afterwards (Mack et al., 2018).
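One common way to vary the assumptions is a delta adjustment: the imputed values in one arm are shifted by a fixed amount, and the treatment effect is recomputed for a range of shifts. The sketch below is a deliberately simplified illustration in plain Python. The data, the function name and the use of single mean imputation are assumptions made for brevity. In practice such shifts are typically applied within multiple imputation rather than to a single mean-imputed data set.

```python
from statistics import mean


def treatment_effect_with_delta(treated, control, delta):
    """Difference in arm means after a delta-adjusted mean imputation.

    Missing values (None) are replaced by the observed mean of their own arm;
    imputed values in the treated arm are additionally shifted by `delta`.
    """
    def fill(values, shift):
        observed_mean = mean(v for v in values if v is not None)
        return [v if v is not None else observed_mean + shift for v in values]

    return mean(fill(treated, delta)) - mean(fill(control, 0.0))


# Hypothetical outcomes; None marks a missing value.
treated = [5.0, 7.0, None, 6.0, None]
control = [3.0, 4.0, 5.0, None, 4.0]

# delta = 0 treats dropouts like completers in the same arm; negative deltas
# assume that treated patients with missing outcomes did progressively worse.
effects = {
    delta: treatment_effect_with_delta(treated, control, delta)
    for delta in (0.0, -1.0, -2.0)
}
print(effects)
```

If the conclusion of the trial only changes for shifts that are clinically implausible, the primary result can be considered robust to that departure from its missing-data assumption.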
In conclusion…
Avoiding missing values entails considering all the development phases of a clinical trial, from study design to final data analysis, implementing methods that minimize the risk of missing data and having action plans for detecting and handling them (Pugh et al., 2022). This is why the Sermes team works, directly or indirectly, to reduce the impact of missing values so that studies can be carried out properly. Tasks such as designing the patient monitoring plan, calculating the sample size and designing the CRF are critical to achieving this objective.
From a data analysis point of view, no single universal method for dealing with missing values yields results equivalent to an analysis of complete data. The best strategy starts by studying the assumptions about, and the causes of, the missing values and inspecting them to uncover the underlying mechanisms that explain their occurrence.
In clinical trials, it can usually be assumed that missing data are MAR or MNAR, which implies the need for methodology adapted to these cases and the abandonment of traditional practices that have proved unreliable and more likely to produce biased conclusions. Finally, conducting sensitivity analyses is highly recommended to evaluate potential biases in the results (Cro et al., 2020).
References
Allison, P. D. (2001). Missing Data. Sage publications.
Cro, S., Morris, T. P., Kenward, M. G., & Carpenter, J. R. (2020). Sensitivity analysis for clinical trials with missing continuous outcome data using controlled multiple imputation: a practical guide. Statistics in medicine, 39(21), 2815-2842.
Dziura, J. D., Post, L. A., Zhao, Q., Fu, Z., & Peduzzi, P. (2013). Strategies for dealing with missing data in clinical trials: from design to analysis. The Yale journal of biology and medicine, 86(3), 343.
European Medicines Agency (EMA) (2010). Committee for Medicinal Products for Human Use. Guideline on Missing Data in Confirmatory Clinical Trials. Available at: https://www.ema.europa.eu/en/missing-data-confirmatory-clinical-trials.
Fielding, S., Fayers, P. M., McDonald, A., McPherson, G., & Campbell, M. K. (2008). Simple imputation methods were inadequate for missing not at random (MNAR) quality of life data. Health and Quality of Life Outcomes, 6(1), 1-9.
Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: A Program for Missing Data. Journal of Statistical Software, 45(7), 1-47. URL https://www.jstatsoft.org/v45/i07/.
Jakobsen, J. C., Gluud, C., Wetterslev, J., & Winkel, P. (2017). When and how should multiple imputation be used for handling missing data in randomised clinical trials–a practical guide with flowcharts. BMC medical research methodology, 17(1), 1-10.
Jamshidian, M., Jalal, S., & Jansen, C. (2014). MissMech: An R package for testing homoscedasticity, multivariate normality, and missing completely at random (MCAR). Journal of Statistical Software, 56, 1-31.
Kang, H. (2013). The prevention and handling of the missing data. Korean journal of anesthesiology, 64(5), 402-406. https://doi.org/10.4097/kjae.2013.64.5.402.
Kowarik, A. & Templ, M. (2016). Imputation with the R Package VIM. Journal of Statistical Software, 74(7), 1-16. doi:10.18637/jss.v074.i07.
Mack, C., Su, Z., & Westreich, D. (2018, February). Managing Missing Patient Data in Patient Registries. White Paper, addendum to Registries for Evaluating Patient Outcomes: A User’s Guide, Third Edition. (Prepared by L&M Policy Research, LLC, under Contract No. 290-2014-00004-C.) AHRQ Publication No. 17(18)-EHC015-EF. Rockville, MD: Agency for Healthcare Research and Quality. www.effectivehealthcare.ahrq.gov. DOI: https://doi.org/10.23970/AHRQREGISTRIESMISSDATA.
Pugh, S. L., Brown, P. D., & Enserro, D. (2022). Missing repeated measures data in clinical trials. Neuro-Oncology Practice, 9(1), 35-42.
Rubin, D. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, LTD.
Tierney, N., Cook, D., McBain, M., & Fay, C. (2021). naniar: Data Structures, Summaries, and Visualisations for Missing Data. R package version 0.6.1. https://CRAN.R-project.org/package=naniar.
van Buuren, S, & Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1-67. DOI 10.18637/jss.v045.i03.