When it comes to litigation, the e-discovery tail is wagging the litigation dog. More than wagging, the tail is throwing the dog around like a judo master. Discovery costs represent a greater percentage of litigation budgets than anyone foresaw 20 years ago, greater than any other litigation cost, by far. The increase in budgets is directly related to the overwhelming volume of information that corporations create. We have gone from the megabyte era to the gigabyte era, skipped the terabyte era, and we’re now firmly in the petabyte era.
Our ability to put human eyes on all documents that need to be reviewed is limited by petabyte math: number of available reviewers multiplied by the number of available hours multiplied by the number of documents per hour per reviewer. But the numbers don’t add up. There are just too many documents, and we’ve reached the mathematical limit of human review. More than reached it, we’ve blown past it with nary a rearward glance. To give one recent example, a company under investigation disclosed over 300 million emails to its regulator. If every employee of the regulator—from the director down to the janitor—did nothing but review emails, every day, all day, it would take them more than four years to read them.
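The petabyte math above is simple enough to sketch in a few lines of Python. The 300 million figure comes from the example; the headcount, hours, and reading rate are invented for illustration and are not taken from the investigation, so treat the result as a rough feel for the scale and nothing more.

```python
def review_years(documents, reviewers, hours_per_day, docs_per_hour, days_per_year):
    """Years needed for a team to put human eyes on every document."""
    # Capacity = reviewers x hours x documents per hour per reviewer.
    docs_per_year = reviewers * hours_per_day * docs_per_hour * days_per_year
    return documents / docs_per_year

# 300 million emails is the figure from the example above. The headcount,
# hours, and reading rate below are HYPOTHETICAL, chosen only for illustration.
years = review_years(
    documents=300_000_000,
    reviewers=700,
    hours_per_day=8,
    docs_per_hour=50,
    days_per_year=250,
)
print(f"{years:.1f} years")
```

With those assumed inputs the answer comes out a little over four years. Doubling the document count doubles the time, while doubling the headcount is rarely an option.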
Enter the mathematicians. Sophisticated algorithms are able to “understand” the contents of a document, perform “find more like me” searches and then suggest documents to be reviewed. Humans look at the suggestions and feed their decisions back into the algorithm, which then suggests more documents. This iterative approach produces extraordinary efficiencies, both in the number of documents that have to be reviewed and in the accuracy of the review.
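The suggest, review, feed-back loop can be sketched as a toy in Python. This is not how any commercial predictive coding product works: it swaps in a plain word-overlap (cosine similarity) score for the sophisticated algorithms, and the function names, the sample documents, and the `truth` set standing in for the human reviewer are all hypothetical.

```python
from collections import Counter
import math

def vectorize(text):
    """Turn a document into a bag-of-words term-count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predictive_review(docs, seed_index, reviewer, batch_size=1, rounds=2):
    """Suggest documents similar to those the reviewer has marked relevant.

    reviewer is a callable(index) -> bool standing in for the human decision.
    Returns (relevant_indices, reviewed_indices).
    """
    vectors = [vectorize(d) for d in docs]
    relevant = {seed_index}
    reviewed = {seed_index}
    for _ in range(rounds):
        # Build a profile from everything marked relevant so far.
        profile = Counter()
        for i in relevant:
            profile.update(vectors[i])
        # "Find more like me": rank unreviewed documents against the profile.
        candidates = [i for i in range(len(docs)) if i not in reviewed]
        if not candidates:
            break
        candidates.sort(key=lambda i: cosine(vectors[i], profile), reverse=True)
        # The human looks at the top suggestions; decisions feed the next round.
        for i in candidates[:batch_size]:
            reviewed.add(i)
            if reviewer(i):
                relevant.add(i)
    return relevant, reviewed

# HYPOTHETICAL toy collection; documents 0, 2 and 4 are the ones we want.
docs = [
    "price increase agreed with competitors for widget",
    "lunch menu for friday cafeteria",
    "widget price increase timing discussed with competitors",
    "quarterly parking permit renewal",
    "competitors agreed to reduce widget supply",
    "friday cafeteria closed for cleaning",
]
truth = {0, 2, 4}
relevant, reviewed = predictive_review(docs, 0, lambda i: i in truth)
print(sorted(relevant), sorted(reviewed))
```

In this toy run the loop finds all three sought-after documents while the reviewer never opens the other three. Real collections are far messier, but the shape of the saving is the same.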
This workflow is called predictive coding. Independent tests have shown predictive coding can locate a high percentage of sought-after documents after reviewing, in one case, less than 1 percent of the total collection. Predictive coding is revolutionizing the e-discovery industry, slowly.
Enter the courts. Predictive coding was introduced years ago; it’s taken the courts time to catch up. After years of use by early adopters even absent judicial imprimatur, predictive coding has recently been the subject of three court cases. In two of the cases, courts have approved the use of predictive coding: in one case, the parties agreed on the principle but not the implementation; in the other, the court allowed its use over one party’s objection.
The third is the most interesting. The question is not whether predictive coding can be used; the question is whether it must be used. After years of waiting for judicial acceptance, the legal community has jumped directly to requesting a judicial mandate. This is the Kleen Products case, which is in front of Magistrate Judge Nan Nolan in Chicago. Judge Nolan knows her e-discovery; she’s one of the leaders of the 7th Circuit’s E-discovery Pilot Program.
The lawsuit, a run-of-the-mill antitrust action, was proceeding normally until the defendants turned over their documents. The plaintiffs objected to the production, claiming that it was faulty because their opponents didn’t use predictive coding (which the plaintiffs described as “content based advanced analytics”) and so didn’t take advantage of its enhanced accuracy and efficiency. They are asking the court to require a re-review, this time using predictive coding.
The legal requirement is that a party has to produce relevant documents. The search for and review of documents that are potentially relevant must be reasonable. The defendants used a keyword search to find relevant documents.
The problem is that keyword search, especially when used on a collection of unknown documents, is notoriously inaccurate and imprecise. It is also the current “gold standard.” Because the producing party used the current gold standard, that should have been the end of the story. But the judge has held two days of hearings, complete with expert testimony, about what kinds of advanced algorithmic review tools are available, and what they mean for e-discovery.
If Judge Nolan rules that the current “gold standard” is actually a “fool’s gold standard,” it will have a dramatic impact on the way litigants review documents, but not a negative impact. Greater efficiency and greater accuracy are good things. Using advanced technology will drive the cost of discovery down, and will bring the mathematical limit back within our capability. It will also create some ethical issues: can attorneys charge clients for time spent doing keyword searches? Is failure to use predictive coding malpractice?
These and other questions are the consequence of one potential ruling in the Kleen case. I don’t know if the world is quite ready for this: after all, it’s only the first time these issues have been raised in court. One thing I know for sure: it won’t be the last time.