Producing electronically stored information (ESI) has unfortunately become one of the most expensive tasks in litigation. Typically, parties comply with obligations to produce ESI by identifying the witnesses most likely to possess relevant documents and files, creating forensic images of their stored ESI from computer hard drives and shared servers, and then using Boolean search terms to identify the electronic documents that may be relevant. Finally, armies of junior associates and contract attorneys review the culled set of documents for responsiveness and privilege. Depending on the number of witnesses and the volume of their files, this process often involves searching hundreds of gigabytes of data and may cost the producing party tens or even hundreds of thousands of dollars.

Because the costs imposed by this process often outweigh any benefit to the litigants, courts are actively experimenting with ways to limit the scope and expense of e-discovery. For example, the Federal Circuit and the Eastern District of Texas recently issued model discovery orders that seek to reduce e-discovery costs in patent cases by limiting such e-discovery parameters as the number of witnesses whose ESI must be produced, the scope of email production and the number of Boolean search terms that may be used. These model orders are discussed in prior articles in this series.

Two judges in the Southern District of New York recently broke new ground in the battle against e-discovery costs by approving a completely different technology: computer-assisted, predictive document coding. Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012) (Peck, M.J.); Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 58724, *5 (S.D.N.Y. Apr. 26, 2012) (Carter, J.). This process promises to reduce the time and money spent on e-discovery by using software to review large volumes of electronic documents, make preliminary relevance determinations and drastically reduce the number of potentially relevant documents that must be reviewed by human eyes.

Even though it depends on software algorithms, predictive coding still requires attorney involvement at the outset to train the software to recognize relevant documents. The first step is for the attorneys who will be producing the documents to manually code a small, random sample of documents for privilege, relevance and key issues. The opposing party’s attorneys then have the opportunity to review that coding, evaluate its accuracy and determine whether relevant documents are being properly identified. If the parties cannot agree on the coding parameters, they can enlist the magistrate judge’s assistance.

After the coding parameters are initially determined, the attorneys responsible for producing the documents upload the sample set and its associated coding to the software, which then identifies similar documents across the entire document set and automatically applies the same codes to them. The responding attorneys then pull 500 documents at random from those the software has identified as relevant and manually re-code them for relevance, issues and privilege.

The opposing attorneys are again given a chance to evaluate the results; the 500 re-coded documents are then uploaded, and the software re-codes the entire document set based on the refined information from this second sample. The parties repeat this process for a total of seven iterations, on the expectation that the coding parameters will be refined and improved with each cycle, yielding increasingly accurate identification of the most relevant documents. After the final iteration, assuming a high level of accuracy has been reached, the responding attorneys manually review only those documents the software has identified as likely to be relevant.

Thus, by iteratively reviewing a few thousand documents, the parties avoid having to review large volumes of irrelevant documents before production. Significantly, the parties in Publicis stipulated to this rather cumbersome procedure, and other parties should be able to cite Publicis as authority for similar protocols tailored to their own cases, whether more or less rigorous than the one described here.
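The training cycle described above can be made concrete with a small simulation. What follows is a minimal Python sketch under several stated assumptions. It does not describe the software actually used in Publicis. The underlying algorithm is not described here, so a simple naive Bayes text classifier (`NaiveBayes`) stands in for it. `attorney_codes` is a hypothetical function standing in for human review, and it records relevance only, not issues or privilege. The 500-document seed set (`seed_size`) is an assumption. The 500-document samples and seven rounds follow the protocol described above, read as seven train, code and re-sample cycles followed by a final pass. `toy_corpus` generates an invented collection.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing.

    Assumption: a stand-in classifier; the real product's algorithm is not given.
    """

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)  # documents per class (True = relevant)
        self.word_counts = {True: Counter(), False: Counter()}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        self.vocab_size = len(vocab)
        self.totals = {c: sum(self.word_counts[c].values()) for c in (True, False)}
        return self

    def is_relevant(self, doc):
        n = sum(self.doc_counts.values())
        score = {}
        for c in (True, False):
            # Smoothed log prior plus the log likelihood of every token
            s = math.log((self.doc_counts[c] + 1) / (n + 2))
            for w in tokenize(doc):
                s += math.log((self.word_counts[c][w] + 1)
                              / (self.totals[c] + self.vocab_size))
            score[c] = s
        return score[True] > score[False]


def iterative_review(corpus, attorney_codes, seed_size=500, sample_size=500, rounds=7, seed=0):
    """Simulate the seed-set and re-coding cycles.

    Returns (indices flagged for final human review, number of documents hand-coded).
    """
    rng = random.Random(seed)
    # Step 1: attorneys hand-code a random seed set
    coded = {i: attorney_codes(corpus[i]) for i in rng.sample(range(len(corpus)), seed_size)}
    for _ in range(rounds):
        # The software codes the whole collection from everything coded so far
        model = NaiveBayes().fit([corpus[i] for i in coded], list(coded.values()))
        flagged = [i for i, doc in enumerate(corpus) if model.is_relevant(doc)]
        # Attorneys re-code a random sample of flagged documents not yet hand-coded
        pool = [i for i in flagged if i not in coded]
        for i in rng.sample(pool, min(sample_size, len(pool))):
            coded[i] = attorney_codes(corpus[i])
    # After the last cycle, only the flagged documents go to full manual review
    model = NaiveBayes().fit([corpus[i] for i in coded], list(coded.values()))
    flagged = [i for i, doc in enumerate(corpus) if model.is_relevant(doc)]
    return flagged, len(coded)


def toy_corpus(n=5000, seed=42):
    """Hypothetical collection in which every tenth document is responsive."""
    gen = random.Random(seed)
    hot = ["promotion", "pay", "gender", "discrimination"]
    filler = ["lunch", "meeting", "budget", "travel", "report", "client"]
    corpus = []
    for i in range(n):
        words = gen.choices(filler, k=8)
        if i % 10 == 0:
            words += gen.choices(hot, k=3)
        corpus.append(" ".join(words))

    def attorney_codes(doc):
        # Hypothetical stand-in for an attorney's relevance call
        return any(w in hot for w in doc.split())

    return corpus, attorney_codes


if __name__ == "__main__":
    corpus, attorney_codes = toy_corpus()
    flagged, hand_coded = iterative_review(corpus, attorney_codes)
    print(f"{len(flagged)} of {len(corpus)} documents flagged; {hand_coded} hand-coded")
```

Run as a script, the sketch reports how many documents were flagged for final review and how many were hand-coded along the way. The toy collection is cleanly separable, so the classifier converges almost at once. Real collections are far noisier, which is one reason the protocol builds in repeated sampling and review by opposing counsel.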

Judge Andrew Peck justified the use of predictive coding in the case by citing studies showing predictive coding is just as effective as comprehensive linear manual review in locating responsive documents. In particular, one study concluded that “the myth that exhaustive manual review is the most effective—and therefore the most defensible—approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review.” That same study noted that technology-assisted reviews require attorney review of only approximately 2 percent of all documents.

Judge Peck also noted problems inherent in using keywords to cull non-responsive documents. For example, the parties usually do not know which keywords will appear in the most relevant documents, and search terms may return an over-inclusive set of documents, forcing the responding attorneys to manually review a large quantity of non-responsive material. Additionally, at least one study found that keyword searching identified only about 20 percent of responsive documents, which undermines its efficacy as a discovery tool.
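The two weaknesses of keyword searching described above correspond to two standard retrieval measures. Recall is the share of all responsive documents a search actually finds. Precision is the share of retrieved documents that are actually responsive. The short Python sketch below computes both. The document counts are hypothetical and were chosen only so that recall matches the roughly 20 percent figure cited above. The function name `recall_and_precision` is likewise illustrative.

```python
def recall_and_precision(retrieved, responsive):
    """Return (recall, precision) for a set of retrieved document IDs.

    recall    = responsive documents found / all responsive documents
    precision = responsive documents found / all documents retrieved
    """
    hits = retrieved & responsive
    return len(hits) / len(responsive), len(hits) / len(retrieved)


# Hypothetical collection: documents 0-999 are the responsive ones, and the
# keyword search returns documents 800-5799 (5,000 hits, 200 of them responsive).
responsive = set(range(1000))
retrieved = set(range(800, 5800))
print(recall_and_precision(retrieved, responsive))  # (0.2, 0.04)
```

On these invented figures, the search misses 80 percent of the responsive documents. It also still forces reviewers through 4,800 non-responsive hits, so it exhibits both problems at once.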

Judge Peck believed that computer-assisted review was the best method of review for the Publicis case because of:

  • The vast amount of ESI to be reviewed—3 million documents
  • The superiority of predictive coding to linear manual review or keyword searching
  • The goal that discovery be completed in a cost-effective manner and that the burdens imposed be proportional to the likely benefits to the litigants

The jury is still out, however, on the effectiveness of this approach, since the Publicis case has not advanced far enough to permit the court to review the efficacy of the predictive coding process employed in that case.

In combination with the other court-approved strategies, parties now have several available options to limit costs in responding to e-discovery requests. The possibilities range from strategies as simple as limiting the number of document custodians and search terms to those as complex as the predictive coding process described in Publicis.

With the volume of e-discovery and the technologies available to address it growing at a rapid clip, courts are clearly open to creative solutions that give litigants the discovery they need while reducing sometimes exorbitant costs. Parties are well advised to propose innovative methods for controlling e-discovery costs in every complex case.