Probabilistic reasoning and statistical inference have become centerpieces of many civil and criminal cases. Big Data will only accelerate the trend. Knowing how to navigate probabilistic reasoning can improve a lawyer’s ability to use it in trial or mount a defense against poor information.

A good way to start is with an introduction to the concept of statistical significance. It’s the least understood and most easily exploitable concept in learning from data.

The basic idea behind statistics is a simple one: A bunch of more or less similar things exist, whether industrial pumps in a refinery or lung cancer patients taking an experimental drug. But they’re not identical (e.g. some have longer work lives or poorer responses). What statement can a lawyer make about them, collectively, that will help the client’s case?

Think about it this way: What’s one number that succinctly expresses an attribute of interest about the items? That number is called the central tendency. It’s usually calculated as a mean (an average), although the median (the middle number in a sorted list of values) and the mode (the number that appears most often in a set of numbers) are also common measures.
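To make the three measures concrete, here is a minimal Python sketch. The pump lives in it are invented for illustration and do not come from any real shipment.

```python
import statistics

# Hypothetical working lives, in years, for nine pumps (illustrative only).
pump_lives_years = [2, 5, 5, 5, 6, 7, 8, 9, 16]

mean_life = statistics.mean(pump_lives_years)      # arithmetic average
median_life = statistics.median(pump_lives_years)  # middle value of the sorted list
mode_life = statistics.mode(pump_lives_years)      # value that appears most often

print("mean:", mean_life, "median:", median_life, "mode:", mode_life)
```

With these made-up numbers the three measures disagree: one long-lived pump pulls the mean above the median, which is why it matters which "average" a witness is quoting.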

The lawyer wants to know how the individual pumps’ useful work lives or the patients’ responses to the drug tend to vary from this central tendency. Do the pumps all tend to fail within a few weeks of each other, or do some run for a decade while others need replacing after a couple of years? Such information increasingly serves as the basis of models from which a manufacturer can make predictions about maintenance, replacement and cost. It’s also a model against which the manufacturer can compare a future shipment of pumps.
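A rough sketch of that idea, again with invented numbers: two hypothetical pump models share the same average life, but the standard deviation (a common measure of spread, used here as an assumption about how a manufacturer might summarize variation) tells them apart.

```python
import statistics

# Two hypothetical pump models (lives in years); both lists are illustrative only.
tight_model = [6.8, 6.9, 7.0, 7.1, 7.2]   # pumps fail within months of each other
wide_model = [2.0, 3.0, 7.0, 10.0, 13.0]  # some fail early, some run for a decade

for name, lives in (("tight", tight_model), ("wide", wide_model)):
    mean_life = statistics.mean(lives)
    spread = statistics.stdev(lives)  # sample standard deviation
    print(f"{name}: mean {mean_life:.1f} years, standard deviation {spread:.2f} years")
```

Both models average seven years, yet a buyer planning maintenance would treat them very differently.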

Now imagine that a client has a shipment of pumps that, on average, failed sooner than expected. Does that mean something is wrong with the pumps? More importantly, can the manufacturer or an opposing lawyer prove it? The answer depends on how unlikely it is that this presumably random sample of pumps should contain pumps that mostly failed on the early side of their expected working life.
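One simplified way to put a number on "how unlikely" is sketched below. Every figure in it is an assumption made for illustration: an expected life of 7 years with a standard deviation of 2 years, a shipment of 25 pumps averaging 6.2 years, and shipment averages that follow a bell curve. A real analysis would have to justify each of those assumptions.

```python
from math import sqrt
from statistics import NormalDist

# All values below are hypothetical, chosen only to illustrate the calculation.
expected_mean_years = 7.0   # assumed typical working life
expected_sd_years = 2.0     # assumed pump-to-pump variation
shipment_size = 25          # pumps in the client's shipment
shipment_mean_years = 6.2   # observed average life in that shipment

# Averages of 25 pumps vary less than individual pumps do.
standard_error = expected_sd_years / sqrt(shipment_size)

# Distribution of shipment averages if nothing is wrong with the pumps.
sampling_dist = NormalDist(mu=expected_mean_years, sigma=standard_error)

# Chance that a normal shipment would average 6.2 years or less.
p_this_early = sampling_dist.cdf(shipment_mean_years)
print(f"Chance of an average this low or lower: {p_this_early:.1%}")
```

Under these assumptions the chance comes out at roughly 2 percent, which is the kind of figure the rest of this article teaches a lawyer to treat with care.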

The concept of statistical significance comes from the attempt to answer the question: “Was this event so unlikely that it raises more than just a suspicion?” The usual answer is, “Yes, if it was statistically significant.”

But there’s a problem. Most people who give that answer don’t understand that statistical significance rests on the curious assumption that a shipment of pumps is just one of an infinite number of pump shipments and the belief that people won’t exploit the high risk of false positive results inherent in statistical significance testing. What could possibly go wrong?

Nitty-Gritty Numbers

Ninety years ago, British statistician R. A. Fisher quite arbitrarily proposed that a result was significant (i.e. so unlikely to be a fluke that it could form the basis of sound decision-making) if chance alone would produce a result at least that extreme no more than one time in 20 (typically encountered as p ≤ 0.05). Soon, and despite many early warnings, statistical significance became the imprimatur of a genuine scientific discovery. Unfortunately, it’s anything but that.

Consider the following bet: If I roll a one on a fair die with 20 sides marked 1 through 20, you pay me $100. But if I roll anything from 2 through 20, I pay you $100. You’d take that bet because I’d lose 19 times in 20.

But what if I got more than one chance to roll a one? The chance that I fail to roll a one on either of two tries is 19/20 x 19/20, or 90.25 percent; it’s still a good bet for you. However, if you gave me 14 chances to roll a one (19/20 multiplied by itself 14 times), the odds that I wouldn’t roll a single one drop below 50 percent. If you let me roll the die 100 times, there’s a 99.4 percent chance that I’ll get at least one one. See where this is leading?
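The arithmetic behind the bet can be checked in a few lines of Python. The function name is simply a label chosen for this sketch.

```python
# Chance that a single roll of a fair 20-sided die is NOT a one.
p_no_one = 19 / 20

def chance_of_at_least_one(rolls):
    """Chance of rolling at least one one in the given number of independent rolls."""
    return 1 - p_no_one ** rolls

for rolls in (1, 2, 14, 100):
    print(f"{rolls:>3} rolls: {chance_of_at_least_one(rolls):.2%} chance of at least one one")
```

The 14th roll is the first at which the roller's chance of success passes 50 percent, and by 100 rolls it reaches about 99.4 percent.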

Let’s say researchers run a clinical trial on a drug that they hope will reduce the risk of death in people suffering from heart failure. They collect data about the body mass index, blood pressure, triglycerides, HDL cholesterol and LDL cholesterol for each patient in the clinical trial and assign one of three possible ratings to each patient’s potential risk factors: low, medium and high. That means there are 243 (3x3x3x3x3) possible combinations of BMI, blood pressure, triglycerides, HDL and LDL.

In effect, those researchers get to roll the die 243 times, once for each combination. With enough patients in such a clinical trial, it’s almost impossible not to find an effect. Indeed, if each combination is tested separately and the tests are independent, there’s a 99.9996 percent chance of finding at least one statistically significant effect purely by chance.
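A short simulation gives a feel for that figure. It is only a sketch resting on stated assumptions: the drug does nothing, each of the 243 subgroups gets its own independent test, and under those conditions each test's p-value is equally likely to fall anywhere between 0 and 1. The constant names and the number of simulated trials are arbitrary choices for the example.

```python
import random

random.seed(0)    # fixed seed so the sketch is repeatable

SUBGROUPS = 243   # 3 x 3 x 3 x 3 x 3 risk-factor combinations
ALPHA = 0.05      # the conventional significance threshold
TRIALS = 2000     # number of simulated clinical trials (arbitrary)

def trial_has_false_positive():
    """Simulate one trial of a useless drug; True if any subgroup hits p <= ALPHA."""
    return any(random.random() <= ALPHA for _ in range(SUBGROUPS))

hits = sum(trial_has_false_positive() for _ in range(TRIALS))

# The same answer by direct calculation: 1 minus (19/20) to the 243rd power.
analytic = 1 - (1 - ALPHA) ** SUBGROUPS

print(f"Simulated trials with a 'significant' subgroup: {hits} of {TRIALS}")
print(f"Calculated chance of at least one: {analytic:.4%}")
```

Real subgroups overlap and real tests are rarely independent, so the true figure in any given study will differ, but the direction of the problem is the same.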

This business of rolling a die again and again until “rare” becomes “likely” has a name: the multiple comparison problem. It’s just one reason why “statistically significant” doesn’t mean what people think it means.

Gaming the system this way is known as “p-hacking.” That’s because what the researcher is doing is, in effect, rolling the die until reaching the magic number p ≤ 0.05.

Thanks to a vast increase in computing power, it’s possible to calculate in an instant not only the p-value of the measured effect but also how many more positive-effect examples must occur for the discovery to be statistically significant (or how many disappointing results need to be rationalized away).

The good news is that researchers have ways to detect p-hacking. One is to see what sort of p-values are being reported and ask, “Is having all these p-values land just under the magic number of 0.05 such a rare event as to raise the suspicion of p-hacking?” Another is to check all the data, including everything unpublished. If the study shows the miracle heart drug works wonders in one type of patient (say, overweight with high LDL and low HDL), what did it do for the other 242 types of patients?
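The first check can be sketched crudely in Python. The p-values below are invented for illustration, and the 0.04 cutoff is an arbitrary choice for this example, not an established forensic standard.

```python
# Hypothetical p-values reported across a body of studies (illustrative only).
reported_p_values = [0.049, 0.048, 0.044, 0.047, 0.012,
                     0.046, 0.049, 0.031, 0.045, 0.043]

# Keep the results that cleared the conventional threshold.
significant = [p for p in reported_p_values if p <= 0.05]

# Of those, count the ones that only barely cleared it.
just_under = [p for p in significant if p > 0.04]

share_just_under = len(just_under) / len(significant)
print(f"{share_just_under:.0%} of significant p-values fall between 0.04 and 0.05")
```

If significant p-values were spread evenly between 0 and 0.05, only about a fifth of them would land in that top slice, so a much larger share is a reason to ask more questions rather than proof of misconduct.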

Statistically significant doesn’t mean “important,” “large” or “unusual,” and it certainly doesn’t mean “scientifically verified.” Yet science remains educated guesswork: a process of tracking back and forth toward the truth. Sound statistical analysis makes for the best guesses, and more data means more guesses.

If lawyers are going to sail in those waters, they’re going to have to learn how to navigate.

David A. Oliver is a partner in Vorys Sater Seymour and Pease in Houston. He is board-certified in personal-injury trial law by the Texas Board of Legal Specialization. He litigates allegations of injuries due to exposure to chemicals or pharmaceuticals. He is the editor of the blog Mass Torts: State of the Art.