A common dilemma faced by general counsel when their companies are confronted with litigation is handling the volume of electronically stored information (ESI), particularly the associated costs of collecting, reviewing and producing relevant ESI. Frequently this cost dominates the e-discovery process.

Although there are different ways to manage ESI, the default approach has typically been the time-honored manual review and coding of each individual document, either by the outside attorneys handling the litigation or by contract attorneys retained to conduct at least an initial screening.

Regardless of how the project is handled, the manual review of large volumes of ESI is time-consuming and expensive, especially in complex litigation, where a typical document production can run to millions of pages, if not millions of documents.

As companies struggle to manage the increasing burdens associated with reviewing large quantities of ESI, e-discovery vendors have developed a range of technology options designed to minimize these burdens. One such option is predictive coding.

Predictive coding is a software-based process designed to simplify the organization, coding and prioritization of entire sets of ESI according to their relation to outstanding discovery requests, privilege status and specific issues important to the case. It can be used both to review a party's own documents before production and to review documents produced by opposing counsel. The underlying goal of predictive coding is to permit an accurate, cost-effective and expedited document review.

The basic approach behind predictive coding is to have lawyers familiar with the case specify relevant criteria using a number of methods, such as keyword searches, key concept searches, Boolean searches, relevance ranking, clustering and/or reviewing relevant samples of the ESI. The software then uses an iterative search process and algorithms to automatically classify and retrieve a set of documents based on the criteria provided by the lawyers. The results are then measured for accuracy. This iterative process can be repeated multiple times if necessary to effectively “train” the computer to predict the relevancy of the remaining documents and eliminate any irrelevant documents.  Relevant documents can then be grouped in categories for purposes of prioritization and selection for review.
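The iterative loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular vendor's product works: the word-weighting scheme (a simple smoothed log-odds score), the function names and the `attorney_review` callback standing in for a lawyer's coding decision are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled):
    """Learn per-word weights from (text, is_relevant) pairs coded by attorneys.

    Assumption: a smoothed log-odds weight per word. Positive weights lean
    relevant, negative weights lean irrelevant.
    """
    rel, irr = Counter(), Counter()
    for text, is_relevant in labeled:
        (rel if is_relevant else irr).update(set(tokenize(text)))
    n_rel = sum(1 for _, r in labeled if r)
    n_irr = len(labeled) - n_rel
    return {
        w: math.log((rel[w] + 1) / (n_rel + 2)) - math.log((irr[w] + 1) / (n_irr + 2))
        for w in set(rel) | set(irr)
    }


def score(weights, text):
    """Sum the weights of the distinct words in a document."""
    return sum(weights.get(w, 0.0) for w in set(tokenize(text)))


def predictive_coding(corpus, seed_labels, attorney_review, rounds=3, batch=2):
    """Iteratively train on attorney-coded documents, then predict the rest.

    corpus: dict of doc_id -> text. seed_labels: dict of doc_id -> bool.
    attorney_review: hypothetical callback returning a lawyer's decision.
    """
    labels = dict(seed_labels)
    for _ in range(rounds):
        weights = train([(corpus[d], r) for d, r in labels.items()])
        unreviewed = [d for d in corpus if d not in labels]
        if not unreviewed:
            break
        # Send the documents the model is least sure about back for review.
        unreviewed.sort(key=lambda d: abs(score(weights, corpus[d])))
        for d in unreviewed[:batch]:
            labels[d] = attorney_review(d)
    weights = train([(corpus[d], r) for d, r in labels.items()])
    predictions = {
        d: score(weights, corpus[d]) > 0 for d in corpus if d not in labels
    }
    return labels, predictions


def precision_recall(predicted, truth):
    """Measure accuracy of predictions against a sample with known answers."""
    tp = sum(1 for d in predicted if predicted[d] and truth[d])
    fp = sum(1 for d in predicted if predicted[d] and not truth[d])
    fn = sum(1 for d in predicted if not predicted[d] and truth[d])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical six-document collection; "truth" stands in for attorney judgment.
corpus = {
    "d1": "merger pricing discussion with competitor",
    "d2": "lunch menu for friday",
    "d3": "pricing agreement draft for merger",
    "d4": "holiday party menu and schedule",
    "d5": "competitor pricing call notes",
    "d6": "parking schedule for friday",
}
truth = {"d1": True, "d2": False, "d3": True, "d4": False, "d5": True, "d6": False}
seed = {"d1": True, "d2": False}

labels, predictions = predictive_coding(
    corpus, seed, truth.__getitem__, rounds=2, batch=1
)
print(precision_recall(predictions, truth))
```

In this sketch attorneys code only four of the six documents (two seed documents plus one per round), and the software predicts the remainder. A real review would validate the predictions against a separately drawn, attorney-coded sample rather than against the full set of known answers used here for convenience.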

While predictive coding still requires some manual review, it takes significantly less time than a full manual review because attorneys evaluate only a small subset of the documents rather than the entire collection.

Despite its promise, predictive coding has not yet been adopted on a large scale.  There are a number of reasons for this:  

  1. There is a natural concern with turning over a traditional manual process conducted by attorneys to an essentially automated process that relies on complex computer algorithms not easily understood or explained by anyone other than a computer scientist.

  2. Predictive coding does not fully eliminate the potential for human error.  The iterative process is only as good as the input criteria, and a missed keyword or key concept could lead to entire categories of relevant or privileged documents being inadvertently missed.

Although legitimate arguments could be made that this potential for human error is equally present in a purely manual review, there is at least a perception that a full manual review allows for a greater degree of understanding and control over the process by the reviewing attorneys, as opposed to what many currently perceive as a “black box” process.

These two concerns naturally lead to a more important pair of concerns: defensibility if challenged in court, and the protection of privileged documents. Federal Rule of Civil Procedure 26(g) imposes a reasonable inquiry requirement in connection with document searches and production.

Thus, there is an open question whether predictive coding is sufficiently rigorous and transparent that a court would be satisfied that its use met this reasonableness standard, particularly in view of Federal Rule of Civil Procedure 37(a)(4), which provides that “an evasive or incomplete disclosure, answer, or response must be treated as a failure to disclose, answer, or respond.”

Similarly, Federal Rule of Evidence 502 governs inadvertent disclosures of material that would otherwise be protected by attorney-client privilege. Under Rule 502(b), the disclosure of information does not waive the privilege so long as the disclosure is inadvertent, the holder of the privilege or protection took reasonable steps to prevent disclosure and the holder promptly took reasonable steps to rectify the error.

Thus, there also is an open question whether use of predictive coding meets the requirement that reasonable steps to protect the privilege and rectify any error were taken.

Whether the technology underlying predictive coding can become sufficiently transparent and accurate to address these concerns remains to be seen. In the meantime, use of predictive coding is certainly an option to explore, either alone or in combination with a manual review, in the never-ending battle to manage the burden and cost of dealing with large volumes of ESI. In the right situation it just might be the right tool to streamline the e-discovery process.