Clinical trial design - Comparing clinical trial endpoint summaries

Ann Yellowlees

This third clinical trial design blog from Quantics Biostatistics takes an introductory look at how the type of clinical endpoint (introduced in our previous blog) determines which statistical calculations are recommended for drawing inferences from summary data.

Formal statistical testing in a clinical trial is traditionally based on the concept of proof by contradiction. We look for evidence strong enough to reject the assumption (the null hypothesis) that there is no difference between the treatments being compared (often a placebo and a new treatment). Bayesian approaches are also used but are discussed elsewhere.

Key Takeaways

Clinical trials commonly use hypothesis testing based on “proof by contradiction”.
A small P value indicates evidence against the assumption of no treatment difference.
Different endpoint types require different statistical tests.
Choosing an inappropriate test can lead to misleading conclusions.

The general approach is as follows. First, we calculate the difference between the average endpoint values for the treatment groups. Second, we determine the probability of observing a difference at least this large if the true difference is zero. This probability is known as the P value. A small P value means that a difference this large would rarely arise by chance alone if the treatments were truly equivalent.
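To make this definition concrete, here is a minimal Python sketch of a Monte Carlo permutation test, one generic way to approximate "the probability of a difference at least this large if the true difference is zero". It is not one of the specific tests discussed in this blog; the function name `permutation_p_value`, the number of shuffles, the seed and the example values are all illustrative assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=1):
    """Monte Carlo estimate of a one-sided P value for mean(b) - mean(a).

    Under the null hypothesis (no treatment difference) the group labels are
    exchangeable, so we shuffle the labels and count how often the shuffled
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    observed = sum(group_b) / len(group_b) - sum(group_a) / len(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # random relabelling of patients
        a, b = pooled[:n_a], pooled[n_a:]
        diff = sum(b) / len(b) - sum(a) / len(a)
        if diff >= observed:                    # "at least this large"
            count += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero
    return (count + 1) / (n_perm + 1)


# Hypothetical endpoint values, for illustration only
placebo = [12, 15, 9, 14, 11, 10]
treatment = [16, 18, 13, 17, 15, 19]
p = permutation_p_value(placebo, treatment)
print(f"Estimated one-sided P value: {p:.4f}")
```

The dedicated tests described below compute such probabilities exactly or from a known approximating distribution, rather than by simulation.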

If the P value is less than a pre-specified threshold (commonly 5% or 1%), there is evidence that the true difference is not zero. This threshold is called the alpha level and represents the probability of concluding there is a difference when none exists.

If the treatment could plausibly have either a beneficial or a harmful effect, a two-sided test is used. If only an improvement is plausible or of interest, a one-sided test may be appropriate.
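The effect of this choice can be sketched for a test statistic that is assumed to follow a standard normal distribution under the null hypothesis (a common large-sample approximation, not something this blog specifies). The function name and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist


def p_values_from_z(z):
    """Return (one_sided, two_sided) P values for a standard normal statistic z.

    one_sided tests H1: true effect > 0 (improvement only);
    two_sided tests H1: true effect != 0 (benefit or harm).
    """
    upper_tail = 1 - NormalDist().cdf(abs(z))
    one_sided = 1 - NormalDist().cdf(z)
    two_sided = 2 * upper_tail  # both tails count as "at least as extreme"
    return one_sided, two_sided


alpha = 0.05   # pre-specified significance threshold
z = 1.8        # hypothetical observed test statistic
one_sided, two_sided = p_values_from_z(z)
print(f"one-sided P = {one_sided:.4f}, reject H0: {one_sided < alpha}")
print(f"two-sided P = {two_sided:.4f}, reject H0: {two_sided < alpha}")
```

With z = 1.8 the one-sided P value (about 0.036) falls below 5% while the two-sided P value (about 0.072) does not, which is why the sidedness of the test must be fixed before the data are seen.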

Calculation of the P value

The method used to calculate the P value depends on the type of clinical trial endpoint. These methods are known as statistical tests.

Example 1: Binary endpoint

Consider a psoriasis trial where the primary endpoint is PASI-75 at 8 weeks (a reduction of at least 75% from baseline in the Psoriasis Area and Severity Index, PASI). In the placebo group, 31 of 63 patients (49%) achieved PASI-75. In the experimental group, 42 of 64 patients (66%) achieved PASI-75.

Fisher’s Exact test can be used to compare these proportions. Assuming interest only in improvement with the experimental treatment, a one-sided test is appropriate.

Fisher’s Exact test gives P = 0.045. Since P < 0.05, there is evidence at the 5% level to support a higher PASI-75 response rate with the experimental treatment.
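Fisher's exact test conditions on the row and column totals of the 2×2 table, so under the null hypothesis the number of responders in the experimental group follows a hypergeometric distribution. Below is a minimal standard-library Python sketch of the one-sided version; the function name is hypothetical, and in practice a tested routine such as SciPy's `scipy.stats.fisher_exact` with `alternative="greater"` would normally be used instead.

```python
from math import comb


def fisher_one_sided_greater(resp_exp, n_exp, resp_ctl, n_ctl):
    """One-sided Fisher's exact test P value for H1: the experimental group
    has a higher response proportion than the control group.

    With all table margins fixed, the number of experimental responders is
    hypergeometric under H0; the P value is P(X >= resp_exp).
    """
    total = n_exp + n_ctl              # all patients
    responders = resp_exp + resp_ctl   # fixed column total
    non_responders = total - responders
    denom = comb(total, n_exp)         # ways to choose the experimental group
    k_max = min(n_exp, responders)     # largest possible responder count
    numer = 0
    for k in range(resp_exp, k_max + 1):
        # tables with k responders among the experimental patients
        numer += comb(responders, k) * comb(non_responders, n_exp - k)
    return numer / denom               # exact integer ratio converted to float


# PASI-75 counts from the example above
p = fisher_one_sided_greater(resp_exp=42, n_exp=64, resp_ctl=31, n_ctl=63)
print(f"One-sided Fisher's exact P = {p:.3f}")
```

The printed value can be compared with the P value quoted above; integer arithmetic is used for the combinatorial terms so that no precision is lost before the final division.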


Example 2: Continuous endpoint

In a back pain trial, pain reduction at 1 month is measured using a visual analogue scale (VAS). The median changes from baseline were −59 in the experimental group (N=19) and −46 in the placebo group (N=18), where a more negative value indicates a greater reduction in pain.

The Wilcoxon rank sum (Mann–Whitney U) test is suitable here, particularly as the data appear skewed. A one-sided test is used to assess greater improvement with the experimental treatment.


The Wilcoxon test gives P = 0.282. Since P > 0.05, there is no evidence of a treatment difference at the 5% level.
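The Wilcoxon rank sum test replaces the raw values by their ranks in the pooled sample, which is why skewness matters less. The individual VAS values are not given in this blog, so the sketch below uses hypothetical data. It assumes a large-sample normal approximation with a continuity correction and no tie correction to the variance; with groups as small as these, statistical software may instead use the exact distribution (for example `scipy.stats.mannwhitneyu`), so results can differ slightly. The function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test_less(x, y):
    """One-sided Wilcoxon rank sum (Mann-Whitney U) test, normal approximation.

    H1: values in x tend to be smaller than values in y.
    Returns (U, p_value), where U is the Mann-Whitney statistic for x.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                     # Wilcoxon rank sum for x
    u = w - n1 * (n1 + 1) / 2               # Mann-Whitney U for x
    mu = n1 * n2 / 2                        # mean of U under H0
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # SD of U under H0
    z = (u - mu + 0.5) / sigma              # +0.5 continuity correction
    return u, NormalDist().cdf(z)           # small U supports H1


# Hypothetical VAS changes from baseline (more negative = more pain reduction)
experimental = [-72, -65, -61, -59, -50, -44, -38]
placebo = [-60, -52, -46, -46, -40, -31, -25]
u, p = rank_sum_test_less(experimental, placebo)
print(f"U = {u}, one-sided P = {p:.3f}")
```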

A Student’s t-test could also be used, but it requires assumptions of normality and equal variances. Given the skewed data, the Wilcoxon test is preferred.

Example 3: Time-to-event endpoint

In a cancer trial where the endpoint is time to death, overall survival in each treatment group can be estimated and displayed using Kaplan–Meier curves.
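A Kaplan–Meier curve is built by multiplying, at each observed death time, the conditional probability of surviving past that time given survival up to it; patients who are censored (still alive at last follow-up) stay in the risk set until their censoring time. A minimal sketch, assuming the data are given as lists of follow-up times and event indicators (1 = death, 0 = censored); the function name and example data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return a list of (t, S(t)) pairs at each distinct death time.

    events[i] is 1 if patient i died at times[i], 0 if censored.
    Patients censored at time t are still counted as at risk at t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all patients sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk  # conditional survival past t
            curve.append((t, surv))
        at_risk -= removed                # deaths and censorings leave the risk set
    return curve


# Hypothetical follow-up in months for one treatment group
months = [3, 5, 5, 8, 12, 12, 15, 20]
died = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s in kaplan_meier(months, died):
    print(f"S({t}) = {s:.3f}")
```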

The log-rank test is used to compare survival curves. In this example, the P value exceeds 5%, indicating no evidence of improved survival with the experimental treatment.
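The log-rank test compares, at every death time, the number of deaths observed in one group with the number expected if both groups shared the same survival curve, then sums these differences over time. The sketch below computes the usual two-sided version, whose chi-square statistic with 1 degree of freedom gives the same P value as 2 × (1 − Φ(|z|)). The function name, the 0/1 group coding and the example data are hypothetical.

```python
from statistics import NormalDist


def log_rank_test(times, events, groups):
    """Two-sample log-rank test.

    times: follow-up times; events: 1 = death, 0 = censored;
    groups: 0 or 1 for each patient. Returns (chi_square, two_sided_p).
    """
    death_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected deaths in group 1
    var = 0.0        # variance of o_minus_e under H0
    for t in death_times:
        n = sum(1 for ti in times if ti >= t)  # at risk overall
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("no information to compare the groups")
    chi2 = o_minus_e ** 2 / var
    z = abs(o_minus_e) / var ** 0.5
    return chi2, 2 * (1 - NormalDist().cdf(z))


# Hypothetical survival data in months (group 1 = experimental)
months = [4, 6, 9, 11, 14, 18, 5, 7, 12, 16, 21, 24]
died = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
arm = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
chi2, p = log_rank_test(months, died, arm)
print(f"chi-square = {chi2:.3f}, two-sided P = {p:.3f}")
```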

More complex survival analysis methods will be discussed in a future blog.

About the Author

Ann Yellowlees

Company Founder and Director of Statistics – With a degree in mathematics and a Master's in statistics from Oxford University, and a PhD in Statistics from Waterloo (Canada), Ann has spent her entire professional life helping clients with statistical issues. From 1991 to 1993 she was Head of the Mathematics and Statistics section of Shell Research, then joined the Information and Statistics Division of NHS Scotland (ISD). Starting as Head and Principal Statistician of the Scottish Cancer Therapy Network within ISD, she rose to become Assistant Director of ISD before establishing Quantics in 2002.

Ann has extensive experience in ecotoxicology, medical statistics, statistics within a regulatory environment and bioassay.
