Charles M. Beasley, Jr., and Roy Tamura: What We Know and Do Not Know by Conventional Statistical Standards About Whether a Drug Does or Does Not Cause a Specific Side Effect (Adverse Drug Reaction)
3. Potential sampling error in an RCT and what we learn from the lack of occurrence of an AE in an RCT (Rule-of-3) and impact on sample size calculations
An RCT or set of RCTs samples only a subset of the entire population of interest as subjects. Interpretations of the results of RCTs are then extrapolated to the entire population of interest, and this is the very essence of clinical research. Even with the use of the best methods of random allocation of subjects to the treatments in an RCT, the observations in the RCT (within treatments and between-treatment differences) can differ from what would be observed if the entire population of interest were studied in the RCT. The statistical “Rule-of-3” (Eypasch 1995; Hanley 1983) addresses the potential difference between what is not observed in a subset sample compared to what would be observed if the entire population of interest (or another subset) were to be studied in an RCT.
The following is a simple example of the sampling problem and the “Rule-of-3”:
Let us say that we are interested in the entire human population and that the truth is that drug X does cause some ADR “Bad-Thing” in 1 in 1,000 persons (and nothing else but drug X causes the AE “Bad-Thing” that in this case is an ADR – background incidence of 0%). As of December 2017, the world’s population was estimated at 7.6 billion. If we could somehow study that entire population of 7.6 billion for a sufficient period to observe all occurrences of the ADR, we would observe 7.6 million cases of the AE “Bad-Thing,” with all these cases being ADRs. If we were to study only 1,000 subjects and we sampled the entire population perfectly, we would observe one case of this ADR. In practice, however, if we sample only 1,000 subjects, we are more likely than not to obtain a sample where the incidence of the ADR differs from the incidence in the entire population. While we might observe more than one case of this ADR, we are more likely to observe no cases of this ADR. The “Rule-of-3” addresses the lack of observation of an outcome.
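The sampling variation in this example can be made concrete with a small calculation. The sketch below assumes that each of the 1,000 sampled subjects independently has a 1-in-1,000 chance of the ADR (a binomial model, which is an assumption added here for illustration and is not stated in the text); the function and variable names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k events among n independent subjects,
    each with event probability p (binomial model, assumed here)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_subjects = 1000       # size of the hypothetical sample
true_incidence = 0.001  # 1 in 1,000, as in the example above

p_zero = binom_pmf(0, n_subjects, true_incidence)
p_one = binom_pmf(1, n_subjects, true_incidence)
p_two_or_more = 1 - p_zero - p_one

print(f"P(no cases)         = {p_zero:.3f}")
print(f"P(exactly one case) = {p_one:.3f}")
print(f"P(two or more)      = {p_two_or_more:.3f}")
```

Under this model, observing no cases and observing exactly one case are each roughly 37% likely, while observing two or more cases is roughly 26% likely, so a sample of 1,000 that shows no cases at all is far from unusual.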
The “Rule-of-3” has two variants relevant to this discussion:
· Precise interpretation: If we study 1,000 subjects and do not observe a single case of AE “Bad-Thing,” we can conclude with 95% probability that the true incidence of AE “Bad-Thing” is only <1/334 subjects (AE might or might not be an ADR). The incidence of AE “Bad-Thing” has a 95% probability of being between 0/1000 and 1/333, where 1/333 (that is, 3/1,000) is the approximate upper bound of the 95% confidence interval (CI) when 0 events have been observed in 1,000 observations.
· Extrapolation: We are studying only a subset of the population of interest and our sample might have an incidence of the ADR that differs from the incidence in the entire population. Therefore, we would need to study at least 3,000 subjects to have a 95% probability of observing even 1 case of the ADR “Bad-Thing” with a true incidence of 1 in 1,000 (with no cause of the AE other than it being an ADR).
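Both variants follow from the same short calculation. The derivation below is a sketch that assumes independent subjects with a common true incidence p and uses a one-sided 95% level; the symbols n, p and p_max are notation introduced here, not taken from the cited papers.

```latex
\begin{align*}
% Probability of observing zero events among n independent subjects
P(\text{0 events in } n \text{ subjects}) &= (1 - p)^{n} \\[4pt]
% Variant 1: largest incidence still compatible with zero events in n subjects
(1 - p_{\max})^{n} = 0.05
  \;&\Longrightarrow\;
  p_{\max} = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n} \\[4pt]
% Variant 2: sample size giving a 95% chance of at least one event
1 - (1 - p)^{n} \ge 0.95
  \;&\Longrightarrow\;
  n \ge \frac{\ln 0.05}{\ln (1 - p)} \approx \frac{3}{p}
\end{align*}
```

With n = 1,000 the first line gives an upper bound of about 3/1,000 (roughly 1/333), and with p = 1/1,000 the second gives a required sample size of about 3,000, matching the two variants above. The "3" arises because -ln 0.05 is approximately 2.996.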
This estimation only applies to cases of 0 observations (Ludbrook 2009) and the simple calculation of the upper bound of the CI is only valid with a relatively substantial number of observations (e.g., ≥100) (Jovanovic 1997).
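To see why the simple calculation needs a reasonably large number of observations, the 3/n shortcut can be compared with the exact bound it approximates. This is a minimal sketch assuming a one-sided 95% bound under a binomial model; the function names are hypothetical and the sample sizes are chosen only for illustration.

```python
def exact_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on the incidence when 0 events are seen
    in n independent observations: solves (1 - p)**n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)


def rule_of_three(n: int) -> float:
    """Shortcut approximation to the same bound."""
    return 3 / n


for n in (10, 30, 100, 1000):
    exact = exact_upper_bound(n)
    approx = rule_of_three(n)
    relative_error = (approx - exact) / exact
    print(f"n={n:>5}  exact={exact:.5f}  3/n={approx:.5f}  "
          f"relative error={relative_error:.1%}")
```

Under these assumptions the shortcut overstates the exact bound by well over 10% at n = 10 but by less than 2% at n = 100, which is in line with the guidance that the simple calculation is adequate only for larger samples.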
Note that the two variants of the “Rule-of-3” only address not observing a single case of AE “Bad-Thing” and not “proof” of presence or absence of “Bad-Thing” as an ADR.
The potential difference between what is observed in the studied subset of the population of interest and what would be observed if the whole population of interest were studied is important in understanding the results of the sample size computation that Beasley provided in his response to Blackwell (2018). Sample size computations consider the potential for what is observed (in this case the incidence of an AE) in the sample selected for an experiment to deviate from what would be observed if the entire population of interest was included in the experiment. The result of this adjustment for potential variation between the experimental subset and the entire population is that the sample size computed for any power greater than ~50% will result in p-values <0.05 if the experimenter:
1) was lucky enough to select a subset for which what is observed is equal to what would be observed in the entire population of interest (or greater than the incidence in the entire population);
2) was lucky enough to guess the incidences that would be observed;
3) used these incidences in the sample size calculations.
In other words, smaller sizes than those obtained with an 80% or 90% power sample size computation will be sufficient to “prove” that an AE is an ADR if one is lucky in guessing the observed outcome incidences and using these in the sample size calculations. However, one might not get lucky with sampling and miss “proving” that an AE is an ADR without a sample size that provides 80+% power even if one is lucky in guessing incidences in the entire population of interest.
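The relationship between power and the p-value obtained when the observed incidences happen to match the assumed ones can be illustrated numerically. The sketch below is not Beasley's computation: it assumes a two-sided test at alpha = 0.05, the normal approximation for comparing two proportions with unpooled variance, equal arm sizes, and purely hypothetical incidences of 1% (control) and 3% (drug). All function names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def n_per_arm(p_control: float, p_drug: float, power: float,
              alpha: float = 0.05) -> int:
    """Subjects per arm from the usual normal-approximation formula
    (unpooled variance, two-sided alpha), rounded up."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_drug * (1 - p_drug)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_drug - p_control) ** 2)


def p_value_if_observed_as_assumed(p_control: float, p_drug: float,
                                   n: int) -> float:
    """Two-sided p-value if the observed incidences equal the assumed ones.
    Idealized: the implied event counts need not be whole numbers."""
    variance = p_control * (1 - p_control) + p_drug * (1 - p_drug)
    z = abs(p_drug - p_control) / sqrt(variance / n)
    return 2 * (1 - STD_NORMAL.cdf(z))


# Hypothetical incidences, chosen only for illustration.
P_CONTROL, P_DRUG = 0.01, 0.03

for power in (0.40, 0.50, 0.80, 0.90):
    n = n_per_arm(P_CONTROL, P_DRUG, power)
    p_val = p_value_if_observed_as_assumed(P_CONTROL, P_DRUG, n)
    print(f"power={power:.0%}  n per arm={n:>5}  p-value={p_val:.4f}")
```

Under these assumptions, the 50% power sample size yields a p-value just under 0.05 when the observed incidences equal the assumed ones, sample sizes for higher power yield smaller p-values, and a sample size for power below 50% does not reach 0.05. The extra subjects required for 80% or 90% power are the margin against an unlucky sample.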
Beasley CM. Charles M. Beasley, Jr’s response to Blackwell’s reply. Corporate Corruption in the Pharmaceutical Industry. www.INHN.org. January 12, 2018.
Eypasch E, Lefering R, Kum CK, Troidl H. Probability of adverse events that have not yet occurred: statistical reminder. BMJ 1995; 311:619-620.
Hanley JA, Lippman-Hand A. If nothing goes wrong, is everything all right? JAMA 1983; 249:1743-1745.
Jovanovic BD, Levy PS. A look at the rule of three. Am Statistician 1997; 51:137-139.
Ludbrook J, Lew MJ. Estimating the risk of rare complications: is the ‘rule of three’ good enough? ANZ J Surg 2009; 79:565-570.
January 10, 2019