The hottest development in the electronically stored information (ESI) world is predictive coding. It is a promising tool that, in non-technical terms, can automatically tag or designate documents. The process is fairly complicated. Using an array of mathematical algorithms, a predictive-coding tool analyzes the language used in a document and “predicts” how a live person would code it. An attorney must first manually review a small subset of the total number of documents potentially involved in the litigation. In theory, as the attorney codes or “tags” those documents as relevant or not, the predictive coding software is “trained” on the characteristics that make a document relevant and can then review and tag the entire set of documents itself.
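For readers curious about what “training” means in practice, here is a minimal sketch of the idea, not a description of any commercial product. It assumes a simple naive Bayes word-count model, and the sample documents, the tag names and the function names (`train`, `predict`) are all hypothetical. Real predictive-coding tools rely on far more sophisticated methods and on much larger seed sets.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(seed_set):
    """Learn word counts per tag from attorney-coded (text, tag) pairs."""
    word_counts = {}        # tag -> Counter of words seen under that tag
    doc_counts = Counter()  # tag -> number of seed documents with that tag
    vocab = set()
    for text, tag in seed_set:
        doc_counts[tag] += 1
        counts = word_counts.setdefault(tag, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the tag whose seed documents best match the words in text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_tag, best_score = None, -math.inf
    for tag, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][tag]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for word in tokenize(text):
            if word in model["vocab"]:  # ignore words never seen in training
                # Add-one smoothing so an unseen word does not zero out a tag.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


# Hypothetical seed set: four documents an attorney has already tagged.
seed_set = [
    ("Bonus and promotion decisions for female employees", "relevant"),
    ("Complaint about pay disparity and promotion denial", "relevant"),
    ("Lunch menu for the office holiday party", "not_relevant"),
    ("Parking garage closed for maintenance this weekend", "not_relevant"),
]
model = train(seed_set)

# The trained model then tags documents no attorney has looked at.
print(predict(model, "Questions about the promotion and pay decisions"))
print(predict(model, "Holiday party parking at the office garage"))
```

The attorney's tags are the only source of “knowledge” here: the model simply learns which words tend to show up in documents tagged relevant and scores new documents accordingly. That is why the quality and size of the seed set matter so much.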

Predictive coding has been the buzz of ESI conferences over the last few years, but no judge or court had officially signed off on it as a defensible way to review documents.

This finally happened a few weeks ago when Magistrate Judge Andrew J. Peck, a frequent speaker at ESI conferences and author of articles on the potential use of predictive coding in large data cases, heard arguments in Da Silva Moore v. Publicis Group, No. 11-CV-1279 (S.D.N.Y.). The case involved the potential review of 2.5 million documents. The parties were asked to submit an e-discovery protocol, which included predictive coding, as early as Feb. 16, 2012.

Last week, as anticipated, Peck issued an opinion detailing his reasons for using predictive coding. Da Silva Moore v. Publicis Group, No. 11-CV-1279 (S.D.N.Y. Feb. 25, 2012). In making this decision, he emphasized that manual document reviews are still the “gold standard,” but “computerized searches are at least as accurate, if not more so, than manual review.” Even keyword searches, in Peck’s opinion, are the equivalent of a game of “Go Fish” because the “requesting party guesses which keywords might produce evidence to support its case without having much, if any, knowledge of the responding party’s ‘cards’ (i.e., the terminology used by the responding party’s custodians).”

Peck listed five considerations when approving the use of predictive coding:

  1. The parties’ agreement
  2. The vast amount of ESI to be reviewed (over 3 million documents)
  3. The superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches)
  4. The need for cost effectiveness and proportionality under Rule 26(b)(2)(C)
  5. The transparent process proposed by [the defendant]

The last point is especially important. Peck stressed that he was allowing predictive coding, in part, because the defendants were willing to be transparent with the plaintiff in implementing their proposed ESI protocol. This is another great example of how transparency can be a powerful tool to save ESI costs and refocus the case on substantive issues.

Still, Peck recognized that predictive coding was not appropriate in every case. He suggested that predictive coding is best used for large-scale matters or “big data” cases, where a larger seed set can be developed and statistical sampling can be maximized. 

So what does this mean? Savvy trial counsel must be aware of the potential use of predictive coding. The courts do not demand perfection in collecting, reviewing and producing ESI. More than anything, they want parties to act reasonably and show that they are serious in complying with their ESI obligations. Predictive coding may not be a panacea for all of your ESI woes, but it can be a great tool for big data cases where it’s vital to reduce the time attorneys spend reviewing and tagging documents.