Sample Size and Other Common Medical Device Trial Design Errors that Matter

Designing and running a clinical trial for a medical device is expensive, so it is important to get the design right rather than waste money and time, or risk the validity of the trial. To provide a little more insight, we thought we would discuss some recurring errors that Quantics has observed in trial synopses and protocols that have come our way.

Common, serious errors:

      1. Incorrect sample size calculations
      2. Choosing a primary endpoint that does not match the trial objective
      3. Adding an interim analysis for early stopping without considering the sample size implications
      4. Collecting too much (or not enough) data
      5. Including too many analyses
      6. Treating trials separately when they should be combined

1. Incorrect sample size calculations

Sample size depends on the interplay between prior knowledge of the device, how that knowledge relates to the trial objective, the available funding, and the number of subjects available.

If you are making a claim about a product rather than just reporting observations, you need to know how reliable the claim is. In general, the more subjects, the more reliable the reported result will be; but more subjects equal more cost.

The point of the sample size calculation is to prospectively ESTIMATE the number of subjects needed to provide a given confidence in the result. Estimating sample size is complex because, before the trial starts, you only have limited data about how the device will perform (that is the reason for the study!). If you get it wrong, any claim you make may have no validity, or you may waste a lot of time and money.

Example:

Many protocols we see have sample sizes that appear to be entirely unrelated to the objective and have a surprisingly consistent and round number (exactly 100).

If money or resources are tight, you can ask the reverse question of your statistician – “with x patients, what confidence would I have in the result?” and then decide if that is good enough.
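
To make this concrete, here is a minimal Python sketch of both directions of the calculation, using the standard normal-approximation formula for comparing two proportions. The event rates, alpha and power are illustrative placeholders, not recommendations – a real calculation should come from your statistician.

```python
# Minimal sketch (illustrative numbers only): sample size for comparing
# two proportions, using the standard normal-approximation formula.
import math
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Subjects per group to detect rates p1 vs p2 (two-sided test)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * var / (p1 - p2) ** 2)

def power_given_n(p1, p2, n, alpha=0.05):
    """The reverse question: with n per group, what power do we have?"""
    se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
    return norm.cdf(abs(p1 - p2) / se - norm.ppf(1 - alpha / 2))

# e.g. detecting a drop in complication rate from 20% to 10%:
print(n_per_group(0.20, 0.10))         # ~197 subjects per group
print(power_given_n(0.20, 0.10, 100))  # only ~52% power with 100 per group
```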

So what?

Don’t waste time and money, or risk the trial’s validity, by guessing the sample size or by basing it on the wrong endpoint. Ask your statistician – that’s what they are there for!

Note: Sample size is a prospective estimate. Once the trial is complete, the sample size you used is irrelevant to the analysis, because you now have actual data. We have been asked to help clients whose Notified Bodies tried to reject the result of a trial because they disagreed with the sample size calculation, even though the actual data collected demonstrated a good result with appropriate confidence.

Read more on this subject – How to Inform Sample Size Calculations for Clinical Trials


2. Choosing an endpoint that does not match the objective

Endpoints and objectives are often muddled up and not clearly linked.

An objective is just that, an objective… it is what you want to achieve.

An endpoint is a single trial outcome value that is measured for each subject. To address the trial objective, the individual subject endpoint data must be summarised over all subjects; this may be a simple or more complex statistical process.

Example:

Objective: “To assess the safety and efficacy of X when used in the following situation…”

Endpoint: “The primary endpoint for safety will be the number of serious adverse events reported in the 12-month period following surgery.” This data may then be presented for all subjects as, for example, an average number or rate.

Not: “The primary endpoint for safety will be to evaluate serious adverse events”

Endpoint: “The efficacy endpoint will be the change in distance achieved in the 6-minute walk test measured at the 12-month follow-up visit compared with the subject’s baseline walk test.” This data may then be presented for all subjects as, for example, an average distance walked 12 months after surgery.

Not: “The efficacy endpoint will be walking ability compared with baseline”
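
To illustrate the summarising step, here is a minimal Python sketch for the walk-test example above. The distances are made-up values, and the simple mean with a t-based 95% confidence interval stands in for whatever analysis the protocol actually pre-specifies.

```python
# Minimal sketch: summarising a per-subject endpoint over all subjects.
# The walk-test distances (metres) are made-up illustrative values.
import numpy as np
from scipy import stats

baseline = np.array([310, 285, 400, 350, 295, 330, 365, 280])
month_12 = np.array([355, 300, 430, 390, 310, 360, 400, 305])

change = month_12 - baseline  # the endpoint: change from baseline, per subject
mean = change.mean()
lo, hi = stats.t.interval(0.95, df=len(change) - 1,
                          loc=mean, scale=stats.sem(change))
print(f"Mean change at 12 months: {mean:.1f} m (95% CI {lo:.1f} to {hi:.1f})")
```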

So what?

Failing to properly define the endpoint may result in failing to address the objective, and therefore failing to satisfy the regulators.


3. Adding an interim analysis without considering the sample size

Interim analyses can be extremely helpful for reporting how things are going (provided this does not affect blinding or introduce bias).

Example:

If the aim of an interim analysis is to see if the trial can be stopped early because the primary endpoint has been achieved, then this is an adaptive trial design and MUST be pre-specified before the start.

So what?

The extra analysis to check for an early stop actually increases the sample size required for the study. It must therefore be planned and documented before the study starts; otherwise it increases the chance of a false-positive study conclusion. This would be picked up by regulators and is likely to result in rejection of the trial result.
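
A quick simulation makes this concrete. The sketch below (illustrative numbers only) simulates trials in which there is NO true effect, “peeks” at the data once halfway through, and tests again at the end; allowing a claim of success at either look at the nominal p < 0.05 pushes the overall false-positive rate to roughly 8%, not 5%.

```python
# Minimal sketch: why an unplanned interim "peek" inflates the type I error.
# Both arms are drawn from the SAME distribution, so any "significant"
# result is a false positive. All numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sims, false_positives = 200, 20_000, 0  # n subjects per arm

for _ in range(sims):
    a = rng.normal(0, 1, n)
    b = rng.normal(0, 1, n)
    p_interim = stats.ttest_ind(a[:n // 2], b[:n // 2]).pvalue  # halfway look
    p_final = stats.ttest_ind(a, b).pvalue                      # final look
    if p_interim < 0.05 or p_final < 0.05:
        false_positives += 1

print(f"Overall type I error: {false_positives / sims:.3f}")  # ~0.08, not 0.05
```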

Read more on this subject – Hidden Consequences of Interim Analyses & Adaptive Trial Options


4. Collecting too much (or not enough) data

Data collection and analysis are expensive, so it is particularly important to consider what data you need and how you are going to use it. Too much can be very wasteful and compromise the quality of the important data, but too little can compromise the entire study.

Examples:

  1. Don’t collect data that cannot be (easily) analysed.

For example, changes in medication use can be recorded but are very hard to relate to an outcome measure: does 10mg morphine provide more or less pain relief than 50mg tramadol with a non-steroidal?

  2. Don’t collect data that no one is going to look at.

Analysis costs are primarily related to numbers of tables, listings and figures (TLFs). Summarising/analysing data that no one is interested in will cost you a lot with no added benefit!

  3. Don’t collect data that repeats a previous collection. This is likely to lead to contradictions.

What happens if you get different answers each time? Which is correct, and how do you analyse that?

  4. Don’t collect data that is dependent on another piece of data.

For example, collect date of birth rather than age: age depends on the time point at which it is collected, whereas date of birth allows age to be calculated as required.
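
A minimal sketch of that last point, with illustrative dates – store date of birth once and derive age at whichever visit needs it:

```python
# Minimal sketch: derive age from date of birth instead of collecting
# "age" repeatedly. Dates are illustrative.
from datetime import date

def age_at(dob: date, visit: date) -> int:
    """Completed years between date of birth and a visit date."""
    return visit.year - dob.year - ((visit.month, visit.day) < (dob.month, dob.day))

dob = date(1958, 7, 14)
print(age_at(dob, date(2024, 3, 1)))  # age at baseline visit: 65
print(age_at(dob, date(2025, 3, 1)))  # age at 12-month follow-up: 66
```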

So what?

Getting this wrong can irritate the patients or clinicians and risks collecting conflicting data from the same subjects. One protocol we saw planned to ask patients to fill out 8 questionnaires containing a total of more than 100 questions, every week for 2 months! In practice this would very likely have resulted in a lot of missing or entirely unreliable data, with major consequences for the cost and success of the study.


5. Including too many analyses

Running too many analyses risks creating misleading and incorrect conclusions. Secondary endpoints and subgroup analyses can quickly mount up unless there is a clear focus on what you are trying to find out.

Example:

Examining results by age is not 1 analysis but (perhaps) 4: age 22-30, 31-50, 51-64, 65+. One protocol we recently looked at turned out to have more than 250 subgroups… in a study with only 170 patients!

Even if you don’t calculate statistical significance, the summary tabulations may be seriously misleading.

So what?

Famously, even in a highly positive trial such as ISIS-2, in which the overall statistical benefit of aspirin over placebo (in acute myocardial infarction) was extreme (P < 0.00001), division into just the 12 subgroups of astrological star sign threw up two (Gemini and Libra) for which aspirin apparently had no benefit.[1][2] If you believed in astrology, you would interpret this chance finding as positive proof.

If the secondary analyses or subgroups are reported with significance tests at p = 0.05, then in the absence of any true effect you can expect roughly 1 in 20 of your subgroups to be statistically significant simply by chance.
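
The arithmetic is simple but sobering. Assuming independent tests and no true effect, the chance of at least one subgroup reaching p < 0.05 grows quickly with the number of subgroups, as this minimal sketch shows:

```python
# Minimal sketch: probability of at least one "significant" subgroup at
# p < 0.05 when there is NO true effect, assuming independent tests.
def p_any_false_positive(k: int, alpha: float = 0.05) -> float:
    return 1 - (1 - alpha) ** k

for k in (1, 4, 12, 20):
    print(f"{k:>2} subgroups: {p_any_false_positive(k):.0%}")
# prints: 1 subgroup 5%, 4 subgroups 19%, 12 subgroups 46%, 20 subgroups 64%
```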


6. Treating trials separately when they should be combined

When deciding whether to run two trials or one, ask yourself “are these the same group of patients / subjects?” If yes, then it should be one trial because the results are not independent. Designing them as separate trials, rather than one larger trial with subgroups, makes it appear that they are independent which changes the statistics.

It is possible to issue reports at different stages within a trial; you do not have to wait until the end (but there can be consequences of this – see point 3 above).

Example: Time points

If you want to analyse data at 1 month and 6 months for the same set of subjects, this should be one trial. You will need to decide which is the primary endpoint because, in general, there should only be ONE primary endpoint. (Sometimes co-primaries are possible, but this means BOTH have to pass for trial success. Multiple primaries are also possible, but this, like adding an interim for early stopping, has an impact on sample size and “alpha sharing”. Both co-primaries and multiple primaries need careful thought, and guidance should be followed.[3])
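
As a rough illustration of alpha sharing, the sketch below applies the simplest adjustment (Bonferroni) to two primary endpoints and shows the knock-on effect on the sample size calculation from point 1 above. The event rates are placeholders, and real trials may use less conservative methods – follow the guidance.[3]

```python
# Minimal sketch: Bonferroni "alpha sharing" for two primary endpoints,
# and its knock-on effect on sample size. Illustrative rates only.
import math
from scipy.stats import norm

def n_per_group(p1, p2, alpha, power=0.80):
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(z ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

# Two primaries sharing an overall 5% alpha are each tested at 2.5%...
print(n_per_group(0.20, 0.10, alpha=0.05))   # one primary:   ~197 per group
print(n_per_group(0.20, 0.10, alpha=0.025))  # two primaries: ~238 per group
```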

Example: Subgroups

If you want to analyse a group of patients by various categories such as age, ethnicity, or walking ability, these are subgroups, and this should be one study. If the subjects in the subgroups do not overlap, you could run this as separate studies, but then you would not be able to look at how the device performed across all groups, i.e. you would waste useful information and lose the opportunity to make claims on the basis of the overall (larger) trial.

So what?

This really does matter because, if the subjects are not independent, the statistics will change. Reporting them as if they were independent gives the wrong results and could be regarded as an attempt at data fraud.


In summary:

These are common errors we have seen that can seriously compromise the scientific validity of a trial and therefore its regulatory acceptance. Do not get started and then discover that your trial could ultimately be rejected by regulators…

We strongly recommend that you get advice during the very early stages of trial design to avoid wasted time, money and effort. Be sure you get it from a group with statistical expertise in device trials such as Quantics (but there are others!). Compared with the cost of a trial, good advice is not expensive. Can you afford not to seek it?



References

[1] ISIS-2 Collaborative Group. Randomised trial of intravenous streptokinase, oral aspirin, both, or neither among 17,187 cases of suspected acute myocardial infarction. Lancet. 1988;ii:349–360.

[2] Sleight P. Debate: Subgroup analyses in clinical trials: fun to look at – but don’t believe them! Curr Control Trials Cardiovasc Med. 2000;1(1):25-27. doi:10.1186/cvm-1-1-025.

[3] https://www.fda.gov/regulatory-information/search-fda-guidance-documents/multiple-endpoints-clinical-trials-guidance-industry

About The Author

Ian Yellowlees has an engineering degree and experience in software engineering, and is also fully medically qualified, with 20+ years’ experience as an NHS consultant. He developed Quantics’ unique ISO9001 and GXP quality management system and provides business management and medical support to Quantics.