THE FEAR: Shakespeare’s Dick the Butcher wanted to take an ax to the profession. Some lawyers fear that predictive coding could eliminate the jobs of lawyers who review documentary evidence. (British Library/Robana)

In one of the more debated lines in English literature, Dick the Butcher, a character in Shakespeare’s “Henry VI, Part II,” said, “The first thing we do, let’s kill all the lawyers.” Some argue the line shows Shakespeare’s disdain for lawyers; others argue that, because Dick was engaged in a nefarious plot, the line is actually commentary about how necessary lawyers are. Fast-forward four centuries, and technology brings a new twist to the necessity for lawyers: the development of assisted-review technologies, including predictive coding.

Predictive coding — the use of computer algorithms by lawyers and technologists to classify legal case documents in an iterative (repetitive) process — has triggered debates about the roles of technology in the law and of lawyers in society. Some lawyers see predictive coding as a modern Dick the Butcher — dangerous to the practice of law — as attorneys read headlines like: Will Computers Replace Your Lawyer? Others believe such assisted-review technologies are the only way to deal with the avalanche of digital evidence in an era of big data. Of course, that’s before one gets to the majority of lawyers who have never even heard of predictive coding.

In this article, we look back at where we’ve been, where things stand now and where they may be going — while considering whether predictive coding really is a 21st Century Dick the Butcher for the legal profession.

Controversies over predictive coding often begin with what to call it. Many use the terms “predictive coding” and “technology-assisted review” (TAR) interchangeably, but predictive coding is just one form of TAR, which has also been called computer-assisted review, machine-assisted review and a whole host of other names. Attempts to trademark predictive coding were unsuccessful, while patent disputes over predictive coding added confusion for just about anyone without a graduate degree in statistics.

As electronic-discovery vendors debated the relative merits and limitations of various patents, predictive coding remained somewhat under the radar. Meanwhile, U.S. Magistrate Judge Andrew Peck in New York wrote in an October 2011 article for NLJ affiliate Law Technology News, “To my knowledge, no reported case (federal or state) has ruled on the use of computer-assisted coding. While anecdotally it appears that some lawyers are using predictive coding technology, it also appears many lawyers (and their clients) are waiting for a judicial decision approving of computer-assisted review.”

Such a reported case would come from Peck himself only a few months later.


In a February 2012 opinion and order in an employment action alleging sex discrimination, Da Silva Moore v. Publicis Groupe S.A., No. 11-CV-1279 (S.D.N.Y. Feb. 24, 2012), Peck wrote, “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant [electronically stored information] in appropriate cases.”

In the 2 1/2 years since Da Silva Moore, several courts have addressed the use of predictive coding and other forms of technology-assisted review. Peck’s opinion in Da Silva Moore was followed by Kleen Prods. LLC v. Packaging Corp. of Am., No. 1:10-cv-05711 (N.D. Ill. filed Sept. 9, 2010); Global Aerospace Inc. v. Landow Aviation L.P., No. CL 61040 (Loudoun Co., Va., Cir. Ct. April 23, 2012); and, more recently, Progressive Cas. Ins. Co. v. Delaney, No. 2:11-cv-00678 (D. Nev. May 19, 2014).

Even before Da Silva Moore, the days of linear review — the process by which lawyers and paralegals review each and every document in an assembly-line process — were numbered, at least in complex commercial litigation. There may be a glut of lawyers, but all the law schools on the planet couldn’t produce enough attorneys to review all the documents being created as corporate data stockpiles increase exponentially.

Abandoning keyword searching has been trickier. Lawyers understand its limitations, and it’s a simple concept. For example, the word “orange” could refer to a color, a fruit, a county in several states, an erstwhile football stadium or even the Dutch royal family. In Da Silva Moore, the lead plaintiff’s name resulted in 201,179 search hits. Reviewing hundreds of thousands of documents takes time and costs money and, invariably, many of the documents have nothing to do with the case at hand. Despite these acknowledged limitations, old habits die hard, and it’s not just Luddite lawyers who question the leap into electronic discovery based on artificial intelligence.

“No machine on Earth has the reading comprehension of a human 10-year-old,” said Andrew Kraftsow, chief technology officer and chief scientist at Austin-based RenewData Corp., in advocating for his firm’s TAR offering, Language-Based Analytics, which the company markets as harnessing human language knowledge as an alternative to predictive coding. [See "Document Review Without Artificial Intelligence," Page 11.] At the same time, Houston-based BeyondRecognition LLC says its product addresses the inability of most TAR offerings to handle graphics and images.

However, predictive-coding advocates note the basic premise of machine learning is that lawyers, fact witnesses and experts are key participants in the iterative predictive-coding process and that the idea that computers take the place of lawyers in predictive coding is misguided.

Gareth Evans, a litigation partner at Gibson, Dunn & Crutcher, uses TAR in his practice. He believes technology-assisted review can be a valuable tool, and he’s an advocate for expanding its use in appropriate circumstances. Nevertheless, he’s aware of the reluctance of many litigators.

“Senior lawyers view litigation as an art, and they don’t always recognize the role of new technologies in the art of litigating,” Evans said, adding that some litigators have more specific concerns, including the concessions that opposing parties often seek as a condition for using predictive coding. [See "Tools Let Attorneys Follow the Breadcrumbs," Page 10.]

In addition, some litigants disagree over predictive-coding protocols. Some courts and the Sedona Conference have called for transparency and cooperation in the process, which might include sharing the seed sets of documents used to train predictive-coding software, but some parties have objected to sharing nonrelevant documents contained in seed sets or other information they feel might fall under the work-product shield. For instance, in In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., No. 3:12-MD-2391 (N.D. Ind., April 18, 2013), the court declined to force a party to disclose documents used in a seed set, noting that Sedona did not expand the authority of the federal courts. The court added that it was powerless to order discovery beyond that required by the Federal Rules of Civil Procedure.

Nevertheless, in what could be a sign of things to come, U.S. District Judge Robert Miller Jr. wrote that he found the party’s lack of cooperation “troubling” and encouraged the litigant to “rethink its refusal,” warning: “An unexplained lack of cooperation in discovery can lead a court to question why the uncooperative party is hiding something, and such questions can affect the exercise of discretion.”

Just how prevalent is the use of predictive coding? In the Fulbright & Jaworski (now Norton Rose Fulbright) 2013 Annual Litigation Trends Report, 40 percent of corporate respondents in the United States and 23 percent in the United Kingdom indicated they used the technologies. In addition, 21 percent of those surveyed said they used it in 100 percent of their matters.

Many legal observers believe those use rates are probably higher than actual rates in all cases, noting that the report’s definition of TAR included not only predictive coding but also “other data analytics.”

In addition, in a survey conducted by Kate Holmes and Mike Kinnaman of the FTI Technology unit of FTI Consulting Inc. and attorney Ari Kaplan, 7 percent of respondents had implemented predictive coding three years ago, 27 percent planned to implement it during 2014, but only 10 percent had plans for 2015. “Analytics and predictive coding may be hitting a tipping point,” the study said.

On the other hand, some industry observers believe predictive-coding use may be underreported, noting the difficulties in gauging actual use.

“Given where we are with pricing models and the variety of uses, trying to determine percentages where predictive coding is used or not used is really an exercise in Old World thinking,” said Drew Lewis, electronic-discovery counsel at Recommind Inc., noting his company no longer charges extra for predictive coding. “It’s now simply part of every e-discovery workflow,” he said.

Michele Lange and Kaitlin Shinkle of Kroll Ontrack Inc. agree with Lewis’ assessment, but they nevertheless estimate their clients’ use at about 25 percent. Kroll and other vendors, including Recommind and Symantec Corp., now offer TAR as an integrated part of their electronic-discovery platforms at no additional charge, which further complicates definitive demarcations.

The law of predictive coding is far from settled and its practice is still evolving, with new debates over review protocols developing during the past few weeks. Overall use may still be limited, and Peck did note in Da Silva Moore that his opinion didn’t mean TAR should be mandatory in all cases. However, predictive-coding jurisprudence has advanced already. “If a responding party wants to use TAR today — even over objection — courts should and will allow it,” Peck said.

In fact, he said, courts may take things a step further. “It’s dangerous to predict where case law will go in future years, but I think we’ll get to a point where if the requesting party wants to force the responding party to use TAR, the court may well order it — or at least use cost recovery as a way to force it,” Peck said. “In the future, judges may say, ‘If you don’t use computer-assisted review, don’t come to me for costs. You should take a very hard look at what you spend and what costs could be avoided.’”