The most expensive part of the discovery process in litigation is document review. In a case involving millions of documents, armies of law firm associates or contract attorneys must spend weeks (if not months) reviewing documents prior to producing them to the adversary. As one might expect, the cost of such human review can be exorbitant. And even with proper training and the use of experienced attorneys, the rate of error is oftentimes far too high, meaning that nonresponsive documents are produced, responsive documents are not produced, or, worst of all, privileged documents are produced. Simply put, large-scale document reviews are fraught with problems associated with human cost and error.

One potential solution to these problems is predictive coding. Predictive coding can code millions of documents with a minimal amount of human input and, some argue, with greater accuracy than human review. As such, it has the potential to save clients significant expense that would otherwise have to be devoted to human review. It is, however, not a perfect solution, and while it is being used more frequently in litigation, it has not yet gained widespread acceptance in the courts.

What is Predictive Coding?

Predictive coding is an iterative process that uses algorithms to predict how a human reviewer would code a document. It has its origins in the intelligence community. The National Security Agency has used the technology for years to monitor correspondence that contains certain patterns, filtering out for human review those communications it deems to be potential threats. Only recently has the technology been adapted for use in the legal world.

The process begins with a human reviewer, most likely a higher-level law firm associate with case expertise, who codes a “feeder set” of a few hundred sample documents, chosen at random, for responsiveness. In doing so, the human reviewer is essentially training the technology to code documents.

After the reviewer has reviewed those documents, they are plugged into the predictive coding technology, which generates a set of about 1,200 to 1,500 documents coded with one of three choices based on how the human reviewer coded the feeder set: responsive, nonresponsive or undetermined. The human reviewer is then given a second set of documents chosen at random, including those that have already been coded, to review and code. That set is then plugged back into the technology, which generates another set of coded documents. The process can then be repeated at will until the desired degree of accuracy is reached. “The more documents the human reviewer codes, the greater your accuracy will be with predictive coding,” said Samuel Morgan of eTERA Consulting. According to Morgan, for a set of approximately one million documents, three to four rounds of human coding, totaling 2,500 to 3,000 documents, can result in a 94 percent accuracy rate or higher.
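
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual algorithm: the word-count scoring, the function names train, predict and iterative_review, the batch size of 10, and the simulated reviewer passed in as human_review are all assumptions made for the example. It also assumes that accuracy is measured as the share of a freshly reviewed batch on which the technology's existing coding agrees with the reviewer, which is only one plausible reading of the process.

```python
import random
from collections import Counter


def train(coded):
    """Build per-word weights from (text, is_responsive) pairs coded by the reviewer."""
    weights = Counter()
    for text, is_responsive in coded:
        for word in set(text.lower().split()):
            weights[word] += 1 if is_responsive else -1
    return weights


def predict(weights, text, margin=1):
    """Code one document as responsive, nonresponsive or undetermined."""
    score = sum(weights[word] for word in set(text.lower().split()))
    if score >= margin:
        return "responsive"
    if score <= -margin:
        return "nonresponsive"
    return "undetermined"


def iterative_review(documents, human_review, batch_size=10,
                     target=0.94, max_rounds=10, seed=0):
    """Alternate human coding and retraining until the target accuracy is met.

    human_review is a hypothetical stand-in for the attorney: it takes a
    document's text and returns True if the document is responsive.
    """
    if not documents:
        raise ValueError("documents must not be empty")
    rng = random.Random(seed)
    coded, weights, history = [], Counter(), []
    for _ in range(max_rounds):
        # The reviewer receives a random batch, which may repeat earlier documents.
        batch = rng.sample(documents, min(batch_size, len(documents)))
        labels = [human_review(text) for text in batch]
        # Assumed accuracy measure: agreement between the current model and
        # the reviewer on this batch, checked before retraining on it.
        expected = ["responsive" if label else "nonresponsive" for label in labels]
        hits = sum(predict(weights, text) == want
                   for text, want in zip(batch, expected))
        history.append(hits / len(batch))
        if history[-1] >= target:
            break
        # Feed the newly coded batch back into the technology.
        coded.extend(zip(batch, labels))
        weights = train(coded)
    return weights, history
```

Calling iterative_review with a list of document texts and a reviewer function returns the trained weights together with the accuracy recorded in each round. The first round always scores zero in this sketch, because the untrained model codes every document as undetermined.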

The total time spent on human review and on running the predictive coding technology is about one to two weeks for a million-page set of documents. The only billable time is that which can be attributed to a small team of human reviewers. This stands in stark contrast to the one month or so it would take for a team of associates or contract attorneys to review the same set of documents.

Clearly, predictive coding represents a major departure from traditional, or linear, human document review, and even from reviews that use search terms or other technology to help with the process. But is it right for every document review?

Pros of Predictive Coding

• Inexpensive.

With predictive coding, one no longer needs the armies of associates or contract attorneys to review each and every document, meaning thousands of billable hours will be cut out of the process. Now an entire review can be performed at a fraction of the cost. The client only needs to pay for the minimal amount of associate time spent coding feeder sets and the cost of setting up and maintaining the predictive coding technology.

• Fast.

Even with the armies of associates and contract attorneys once required to review documents, the process, with its multiple levels of review, could still take weeks or months. The speed of a human review also depends not only on the ability of the reviewers, but also on the complexity and length of the documents. After all, one-page emails are much easier to go through than 300-page employee manuals. As with human review, the speed with which a predictive coding technology goes through a set of documents depends on the size of those documents, but the upside is that the technology can go through them in a fraction of the time required for human review. All told, a review that would otherwise take months can be cut down to weeks (or even days) using predictive coding.

• Accurate.

Although the accuracy of predictive coding versus human review is still subject to debate, several studies have shown that it is in fact more accurate than human review. According to Morgan, studies of human document reviews have consistently shown an error rate between 20 and 30 percent. “Using only one feeder set, you can achieve a high percentage of accuracy with predictive coding,” Morgan said, “which will be better than most traditional document reviews.” Moreover, the more feeder sets one codes, the more accurate the technology’s coding will be. “If you code enough feeder sets, eventually you’ll get 100 percent accuracy.”

Obviously, the more human review you devote to the process, the more expensive it becomes, and so achieving 100 percent accuracy may not always be optimal. One potential alternative is to negotiate an acceptable rate of accuracy with your adversary (who might also be using predictive coding) that would apply to both parties.
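
To put those percentages in perspective, the short Python calculation below applies them to a one-million-document review. It is a back-of-the-envelope sketch only: it assumes that Morgan's 94 percent accuracy figure and the 20 to 30 percent human error rate are measured the same way, which the figures quoted here do not establish, and the function name expected_miscoded is hypothetical.

```python
def expected_miscoded(total_documents, error_percent):
    """Documents expected to be coded incorrectly at a given whole-number error rate."""
    return total_documents * error_percent // 100


TOTAL_DOCUMENTS = 1_000_000

# 94 percent accuracy implies a 6 percent error rate.
predictive_coding = expected_miscoded(TOTAL_DOCUMENTS, 6)

# Human review at the 20 and 30 percent error rates cited by Morgan.
human_low = expected_miscoded(TOTAL_DOCUMENTS, 20)
human_high = expected_miscoded(TOTAL_DOCUMENTS, 30)

print(f"Predictive coding: {predictive_coding:,} miscoded")
print(f"Human review: {human_low:,} to {human_high:,} miscoded")
```

On those assumptions, predictive coding would miscode about 60,000 documents out of a million, against 200,000 to 300,000 for human review. A negotiated accuracy rate can be run through the same calculation to see how many miscoded documents each side is agreeing to tolerate.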

Cons of Predictive Coding

• Human error.

“Ultimately, a document review run through a predictive coding technology is only as accurate as the attorney that reviewed the feeder set,” Morgan said. In other words, if the human reviewer makes errors with the feeder set, the technology will make those same errors with the remainder of the documents. To avoid this problem, it is advisable to have a more experienced attorney, perhaps a senior associate or junior partner, review the feeder sets. The hourly rate for such a review is higher than that of a junior associate or contract attorney, but the overall cost is still much lower because the total billable hours spent by that senior attorney are but a fraction of what would be spent on a traditional review.

• Privilege.

Predictive coding is not as reliable for reviewing documents for privilege. Ultimately, no technology has been created yet that can replace human privilege review. This means that for most reviews, at least a subset of the documents to be produced must also be reviewed by either associates or contract attorneys for privilege. It is possible to segregate a certain portion of the documents based on search parameters, but that is by no means foolproof. Alternatively, attorneys can review only those documents that have been coded responsive. Either way, the number of documents subject to human review is still a fraction of what would be reviewed normally.

How Courts Have Ruled So Far

At this point, there are very few decisions dealing with the use of predictive coding. However, those few decisions tend to support the use of predictive coding in instances where the use of human review would be unduly burdensome.

The first decision on predictive coding was Da Silva Moore v. Publicis Groupe in the Southern District of New York. U.S. Magistrate Judge Andrew Peck, who is a vocal proponent of the use of predictive coding, held that the use of predictive coding was appropriate considering: “(1) the parties’ agreement [to use predictive coding], (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process proposed by MSL.” However, Peck also noted that predictive coding is not a one-size-fits-all approach and that it may not be appropriate in other cases.

About two months after the Da Silva Moore decision, a judge in Loudoun County, Va., approved the use of predictive coding by the defendant in Global Aerospace v. Landow Aviation, which concerns the collapse of several jet hangars at the Dulles Jet Center as a result of a 2010 snowstorm. That decision is significant because, unlike the Da Silva Moore case, where the parties had agreed to the use of predictive coding but had disagreed as to the parameters, the plaintiff in Global Aerospace opposed the use of predictive coding altogether, instead requesting that the defendant use human review.

The only instance in which a court has not approved predictive coding was where one party sought to force the other party to use the technology. In Kleen Products v. Packaging Corp. of America, U.S. Magistrate Judge Nan R. Nolan of the Northern District of Illinois declined to grant the plaintiff’s request that the defendant redo the entire production using predictive coding. In addition to the fact that the defendant had already produced one million pages of documents using alternative technology, Nolan also cited to Sedona Principle 6, which states that “responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.”

Can (and Should) You Use Predictive Coding?

Whether one can use predictive coding in a particular document production depends on the circumstances. The decisions cited above have allowed for the use of predictive coding in cases involving millions of documents, but it is unclear if courts would be amenable to its use in smaller-scale productions. The only definitive lesson to take away from those decisions is that a litigant cannot force its adversary to use predictive coding, or any other type of technology, without first proving that the technology used up until that point has failed to yield accurate results. Ultimately, a party can choose whatever technology it sees fit, but it must then show that the results are accurate and the production is complete.

Moreover, just because a party can use predictive coding does not mean it should in all cases. It may be better to use search terms or human review in cases involving fewer documents. Whether one technology should be used over another depends on the particular circumstances in each case. Litigants should consult their e-discovery counsel or e-discovery vendor to explore all available options. With that said, for large-scale document reviews, predictive coding has the potential to be a game-changer, allowing parties to resolve their disputes at a fraction of the cost and time required for human review.

Aaron L. Peskin is an associate in Obermayer Rebmann Maxwell & Hippel’s litigation department and e-discovery practice group. He resides in the firm’s Cherry Hill, N.J., office where he concentrates his practice on complex commercial litigation. He can be reached at aaron.peskin@obermayer.com.