Michael Hoenig
In my May and June columns,1 I discussed some of the problems presented in litigation that essentially boils down to “trial by literature,” where experts rely on hearsay articles to create or fortify their opinions. I reported that many articles are not truly peer reviewed. Further, I described massive shortcomings in the peer review process even when it is performed.
The instant column demonstrates how painstaking a judge’s “gatekeeping” task is when an expert, even one well qualified in the scientific field, expresses opinions and uses methods that depart from the principles and methodology accepted by the relevant scientific community. The gatekeeping challenge is particularly onerous when the field in question has generated numerous technical articles or studies, which may allow the expert to “cherry-pick” here and there in order to slap together a litigation opinion. Everyone should be on “red alert” in “trial by literature” scenarios.
The case on which I report here is complex. It involves whether reliable opinions have been advanced that a particular drug causes birth defects. To make the case understandable, I include some explanatory details about the science involved (as the court described it). But the methodological issues, and the tactic of experts “cherry-picking” among articles, are common in many other litigation scenarios, so the lessons to be gleaned are broadly relevant.
On June 27, in multidistrict litigation (MDL) alleging that the antidepressant Zoloft caused birth defects in children born to mothers who took the drug during pregnancy, a federal judge in the Eastern District of Pennsylvania excluded the testimony of plaintiffs’ expert, Anick Berard, a Canadian perinatal pharmaco-epidemiologist.2 The order granting defendants’ motion was issued by Judge Cynthia M. Rufe months after a Daubert hearing at which testimony and evidence were presented in support of each side.
Berard has conducted research on the effects of drugs, including antidepressants, on human fetal development. She opined that Zoloft, when used at therapeutic dose levels during pregnancy, is capable of causing a range of birth defects and, thus, is a teratogen. Defendants did not challenge Berard’s academic qualifications but argued that she used unreliable methods and principles to reach her conclusion that Zoloft may cause birth defects.
The challenge was to the reliability of the expert opinion. The expert was opining on “general” causation, i.e., whether the drug may cause birth defects, and not on “specific” causation, which would ask whether Zoloft caused the particular malformations detected in any particular child. Under the framework of the U.S. Court of Appeals for the Third Circuit (the circuit in which Judge Rufe sits), the court’s reliability inquiry must focus on the expert’s methods, not her conclusions. Experts “must use good grounds to reach their conclusions, but not necessarily the best grounds or unflawed methods.” Nevertheless, where the scientific community considers the evidence to be inconclusive, a difference of opinion “may undermine the reliability of an expert’s conclusion that there is a causal link, and may justify excluding that expert.”3
Berard was asked whether she believes, to a reasonable degree of scientific certainty, that Zoloft may cause birth defects in children of exposed mothers. But to meet the Daubert standard she must demonstrate “good grounds” for her causation opinion, not mere subjective belief; there must be a reasonable degree of scientific certainty regarding that opinion. The Daubert reliability factors are (1) whether the expert’s theory can be tested; (2) whether it has been subjected to peer review and publication; (3) the potential rate of error of a technique used; and (4) the degree to which a technique or theory is generally accepted in the scientific community.4 The burden was on plaintiffs to demonstrate the requisite reliability.
Zoloft is a prescription antidepressant. It is used to treat depression, anxiety and other mental health conditions. Its active ingredient is sertraline. Zoloft belongs to the class of drugs called selective serotonin reuptake inhibitors (SSRIs). Serotonin is a neurotransmitter produced by the human body. SSRIs do not contain serotonin. Rather, they alter the availability in the nervous system of the serotonin the body produces. The Food and Drug Administration (FDA) categorizes Zoloft as a “Pregnancy Category C” drug (one of five categories). Category C means that animal reproduction studies have shown an adverse effect on the (animal) fetus, but there are no adequate and well-controlled studies in humans.
Teratology is the scientific field that deals with the cause and prevention of birth defects. The “gold standard” for epidemiological studies is the “double-blind, randomized control trial.” But there is a huge problem: such studies may not ethically be conducted on pregnant women. Therefore, epidemiologists must rely upon observational evidence. Epidemiological studies examining the effects of medication taken during pregnancy on birth defects calculate a so-called “relative risk” (RR) or “odds ratio” (OR). These ratios are calculated by dividing the risk or odds of a particular birth defect in children born to medication users by the risk or odds of that birth defect in children born without prenatal exposure. Where the incidence of birth defects is about the same in medication-exposed and unexposed women, the RR or OR value will be close to one. In other words, the ratio measures the increase in risk of the outcome “above and beyond the baseline risk.”5
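To make the arithmetic concrete, here is a minimal sketch computing both ratios from a simple two-by-two table. The counts are entirely hypothetical (they are not drawn from any Zoloft study), and the `TwoByTwo` class and function names are illustrative inventions, not anything used by the court or the experts.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical exposure/outcome counts (illustrative only)."""
    exposed_cases: int       # exposed births with the defect
    exposed_noncases: int    # exposed births without the defect
    unexposed_cases: int     # unexposed births with the defect
    unexposed_noncases: int  # unexposed births without the defect


def relative_risk(t: TwoByTwo) -> float:
    # Risk = cases / all births in the group; RR = risk(exposed) / risk(unexposed)
    risk_exposed = t.exposed_cases / (t.exposed_cases + t.exposed_noncases)
    risk_unexposed = t.unexposed_cases / (t.unexposed_cases + t.unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(t: TwoByTwo) -> float:
    # Odds = cases / non-cases; OR = odds(exposed) / odds(unexposed)
    odds_exposed = t.exposed_cases / t.exposed_noncases
    odds_unexposed = t.unexposed_cases / t.unexposed_noncases
    return odds_exposed / odds_unexposed


# Hypothetical: 12 defects among 1,000 exposed births,
# 10 defects among 1,000 unexposed births.
table = TwoByTwo(12, 988, 10, 990)
print(f"RR = {relative_risk(table):.2f}")  # RR = 1.20
print(f"OR = {odds_ratio(table):.2f}")     # OR = 1.20
```

Because a birth defect is a rare outcome, the odds ratio and relative risk nearly coincide here; an equal rate in both groups would yield a ratio of exactly one, the “baseline” case described above.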
There are certain suspected and measurable “confounding factors” such as maternal age, weight, smoking, alcohol use, folic acid use and others. These may themselves contribute to an increased risk of the particular birth defect. Thus, researchers often statistically control for such factors. When this is done, researchers will report an “adjusted” ratio. Relative risks and odds ratios are only estimates. They may be affected by confounding factors or by biases, sample sizes and study methods. Researchers also use statistical formulas to calculate a “confidence interval” around each estimate. Loosely speaking, a 95 percent confidence interval is a range produced by a method that would capture the “true” ratio in 95 percent of repeated studies; it is often described as a 95 percent chance that the true ratio falls within the range.
Some reports will identify a ratio that is a “statistically significant” correlation or association between the medication exposure and the birth defect at issue. A statistically significant result does not necessarily indicate a large increase in risk. “It simply indicates that the increased risk found is unlikely to result from chance alone.”6 Teratologists will not draw firm conclusions from a single study since apparent associations may reflect methodological flaws. In general, before concluding that there is a “true” association between a medication and an adverse outcome, the teratology community requires “repeated, consistent, statistically significant human epidemiological findings, and studies which address suspected confounders and biases.”
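The link between confidence intervals and statistical significance, and the point that significance says nothing about the size of a risk, can be sketched briefly. This example assumes Woolf’s logarithmic method, one common textbook approximation for an odds ratio’s confidence interval; nothing in the opinion indicates which method any particular study used, and the counts are again hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with an approximate 95% CI via Woolf's log method (an assumption).

    a, b = exposed cases / non-cases; c, d = unexposed cases / non-cases.
    """
    or_hat = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


def excludes_one(lower: float, upper: float) -> bool:
    # Conventional reading: a 95% interval that excludes 1.0 is "statistically significant"
    return lower > 1.0 or upper < 1.0


# Same hypothetical proportions, small study vs. a study 100 times larger.
small = odds_ratio_ci(12, 988, 10, 990)
large = odds_ratio_ci(1200, 98800, 1000, 99000)
for label, (or_hat, lo, hi) in (("small", small), ("large", large)):
    print(f"{label}: OR={or_hat:.2f}, CI=({lo:.2f}, {hi:.2f}), significant={excludes_one(lo, hi)}")
```

Both hypothetical studies estimate the same modest odds ratio of about 1.2. Only the larger one is “statistically significant,” which illustrates why significance indicates that a finding is unlikely to be due to chance alone, not that the increase in risk is large.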
Epidemiological scientists employ a well-established methodology, but Dr. Berard, though using it in her published, peer-reviewed work, departed from it in her litigation report here. The methods she advanced were not shown to have been “exposed to critical scientific scrutiny” or to have been “adopted by other scientists in the field.”7 Berard claimed that one can assume teratogenicity based upon multiple weak associations found across many studies. But an “equally plausible” conclusion is that the “true” association is “so weak that one cannot conclude that the risk is greater than that seen in the general population.” And that, in fact, is the conclusion most researchers in Berard’s field have reached regarding the association between Zoloft and birth defects.
The court relied on the famous Wade-Greaux decision,8 where the court excluded the opinion of a well-respected pediatric pathologist who, like Berard here, urged that repeated, consistent studies showing increased risk of malformations associated with use of a medicine were not needed. The Wade-Greaux court held the expert’s conclusions were not derived from methodology generally accepted by the teratology community. Similarly, Berard’s reliance on non-statistically significant findings has not been shown to be accepted within her scientific community.9
Berard did not explain why she relied upon her novel method of examining trends in odds ratios, nor did she provide objective, independent validation for that method. The expert simply presumed that SSRIs, although distinct from one another in chemical structure and pharmacokinetic properties, would have similar birth-defect-causing effects. The court disagreed. Even assuming a common biological mechanism that could possibly affect fetal development, such evidence would only give rise to the hypothesis of a class-wide effect. That hypothesis would have to be tested, since even small differences in chemical structure can produce different effects.
The court analyzed a large number of studies and concluded that the published literature and studies did not support Berard’s conclusions.10 With respect to the expert’s opinion regarding Zoloft specifically, the court found that Berard “selectively” discussed studies “most supportive of her conclusions” but failed “to account adequately for contrary evidence.” That methodology is not reliable or scientifically sound. A causal conclusion requires examination of the literature as a whole. Her rationale for not doing so was not convincing.
As the court put it, “cherry-picking” is always a concern, but it is of heightened concern in this case. Also of concern was the lack of scientific support for her altered opinion in the selected studies on which she did rely. “An opinion based on subjective belief, rather than grounded in science, is not admissible.”11
The court next discussed the well-established causation factors known as the “Bradford Hill criteria.” These include the strength of the association between the exposure and the outcome; the temporal relationship between the exposure and the outcome; the dose-response relationship; replication of findings; the biological plausibility of, or mechanism for, such an association; alternative explanations for the association; the specificity of the association; and consistency with other scientific knowledge. An expert need not consider or satisfy all these criteria in order to support a causal inference.
The court found that the strength of the associations between exposure to Zoloft and various birth defects “is weak, often not greater than one would expect by chance alone, and replications of statistically significant associations is rare.” Further, as to the temporal relationship between exposure and outcome, one study showed the same risk even in the absence of a temporal relationship. There were also questions concerning confounding factors and biological mechanisms. Berard did not address any of these issues. The other Bradford Hill criteria (dose-response effects, biological plausibility, alternative explanations, etc.) likewise presented questions and issues that Berard did not satisfactorily address.12
In summary, said the court, the expert “takes a position in this litigation which is contrary to the opinion she has expressed to her peers in the past, relies upon research which her peers do not recognize as supportive of her litigation opinion, and uses principles and methods which are not recognized by the relevant scientific community and are not subject to scientific verification.” Accordingly, the opinion did not satisfy Federal Rule of Evidence 702 and had to be excluded.13
The Zoloft decision provides analytical insights that seem relevant in a host of other litigation scenarios involving other scientific and technical fields. Cases involving asbestos, toxic torts, medical causation, psychology and other areas where hearsay literature and/or published studies abound are all settings where peer review weaknesses or expert “cherry-picking” among articles can distort the accepted science and may even sneak unreliable opinions past the “gatekeeper.” Vigilance is required.
1. “‘Unreliable’ Articles, ‘Trial By Literature’ Revisited,” New York Law Journal, May 12, 2014, p. 3; “‘Unreliable’ Articles: More on Peer Review’s Frailties,” NYLJ, June 9, 2014, p. 3.
2. In re Zoloft Products Liability Litigation, 2014 U.S. Dist. LEXIS 87592 (E.D. Pa. June 27, 2014).
3. Id. LEXIS at *3 and n. 3.
4. Id. LEXIS at *4 (citing Daubert v. Merrell Dow Pharms., 509 U.S. 579, 593-94 (1993)).
5. Id. LEXIS at *7.
6. Id. LEXIS at *9.
7. Id. LEXIS at *15.
8. Wade-Greaux v. Whitehall Lab., 874 F.Supp. 1441 (D.V.I. 1994), aff’d, 46 F.3d 1120 (3d Cir. 1994).
9. In re Zoloft, supra n. 2, LEXIS at *17 – *18.
10. Id. LEXIS at *18 – *29.
11. Id. LEXIS at *29 – *34.
12. Id. LEXIS at *34 – *42.
13. Id. LEXIS at *43.