There have been two important decisions within the last month concerning courts’ acceptance of a new technology that may help reduce the costs associated with, and increase the efficiency of, processing and reviewing electronically stored information (ESI) in e-discovery. The technology, known as “predictive coding,” allows attorneys to select a sample set, or “seed set,” of relevant and irrelevant documents, which are then coded as the baseline for the automated collection and review of documents. Using the seed set, the computer is able to quite literally learn what the criteria for relevance and irrelevance are, and automatically review data based on that knowledge, thereby potentially eliminating a significant portion of the burden and costs associated with the process of having attorneys perform the initial first-tier review—the stage at which the majority of wasteful e-discovery costs are incurred.1
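To make the idea of a seed set concrete, the following is a minimal sketch of a relevance classifier trained on attorney-coded examples. It is an illustration only, not the method used by any commercial predictive coding product or by the parties in the cases discussed here. The sample documents, the function names (`tokenize`, `train`, `relevance_score`) and the choice of a naive Bayes model are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical seed set: (document text, attorney's relevance coding).
SEED_SET = [
    ("pricing agreement with competitor on containerboard supply", True),
    ("meeting to discuss price increase and supply reduction", True),
    ("lunch order for the office party on friday", False),
    ("reminder to submit your vacation request form", False),
]

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(seed_set):
    """Count words per coding decision; assumes both codings appear in the seed set."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def relevance_score(model, text):
    """Return naive Bayes log-odds of relevance; a positive score leans relevant."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_prob = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        lp = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set are ignored
                # Add-one smoothing keeps unseen word/label pairs from zeroing out.
                lp += math.log((word_counts[label][word] + 1)
                               / (total_words + len(vocab)))
        log_prob[label] = lp
    return log_prob[True] - log_prob[False]

model = train(SEED_SET)
unreviewed = [
    "notes from call about price increase",
    "friday party lunch menu",
]
# Rank unreviewed documents so the likeliest relevant ones are reviewed first.
for doc in sorted(unreviewed, key=lambda d: relevance_score(model, d), reverse=True):
    print(round(relevance_score(model, doc), 2), doc)
```

In this toy example the first unreviewed document scores above zero and the second below it, so a review team could prioritize or set aside documents by score. In practice, attorneys would typically check samples of the machine's coding and feed corrections back into the seed set.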

As discussed below, the U.S. District Court for the Southern District of New York in Da Silva Moore v. Publicis Groupe, Case No. 1:11-cv-01279 (S.D.N.Y. April 26, 2012), and the Circuit Court for Loudoun County, Va., in Global Aerospace v. Landow Aviation, Case No. CL 61040 (Va. Cir. Ct. Loudoun Co. April 23, 2012), have endorsed the use of predictive coding as a viable and economical alternative to the traditional method of keyword searching and manual review of documents.

The debate over the use of predictive coding touches upon fundamental concerns regarding the costs and burdens associated with discovery. In recent years, e-discovery, particularly in complex cases, has become a threat to our legal system’s ability to see that justice is done in resolving disputes. Not only is e-discovery, as currently practiced, extraordinarily expensive, but it is also wasteful and inefficient. In complex cases, hundreds of thousands of dollars, and often millions of dollars, are required to search through irrelevant documents looking for the minority of relevant documents. In some cases, only 10 percent of the documents selected by search terms are relevant. Yet lawyers must review the other 90 percent as well, if only to confirm that they are irrelevant. This search for relevant documents often follows seemingly endless negotiations over search terms and custodians. The massive costs of discovery may drive litigants to settle for much less than they are entitled to, or, conversely, to pay much more than a case merits. It may also cause parties not to pursue legal remedies at all. In these situations, justice on the merits has been denied by the cost of discovery.

In addition to predictive coding, there are other methodologies aimed at improving the efficiency and minimizing the burden of document collection and review, including the use of iterative search term analysis and clustering of documents by concept, but these methodologies are time-consuming and expensive themselves, and do not alleviate the burdens of manual review. The concept of proportionality as found in Rules 1 and 26(b)(2) of the Federal Rules of Civil Procedure is underused by attorneys and may be helpful in limiting discovery in appropriate cases. A few courts, which recently have confronted e-discovery issues, have in large part indicated approval of predictive coding, which may begin to alleviate the concerns associated with the current e-discovery model.

‘Da Silva Moore’

On April 26, 2012, Judge Andrew L. Carter of the U.S. District Court for the Southern District of New York affirmed Magistrate Judge Andrew J. Peck’s decision to allow the use of predictive coding in Da Silva Moore. Prior to this decision, Peck (who has criticized “the ‘Go Fish’ model of keyword search” without “sufficient testing and quality control”)2 had been the only federal judge to issue a decision regarding the use of predictive coding, which he believes “should be used in those cases where it will help ‘secure the just, speedy, and inexpensive’ determination of cases in our e-discovery world.”3

In Da Silva Moore, the parties agreed to use predictive coding in connection with their collection and production of ESI, but the plaintiff took issue with the reliability of the method by which the defendant planned to use the technology. The plaintiff argued, among other things, that the defendant’s predictive coding method did not include a standard of relevance mutually agreed upon by the parties. Yet, the plaintiff was privy to the coding decisions regarding relevance that the defendant had input into the computer, and thus the plaintiff had full access to the defendant’s seed set to monitor the technology’s reliability.4

Enforcing the parties’ agreement, Peck “recognize[d] that computer-assisted-review is not a magic, Staples-Easy-Button, solution appropriate for all cases,” but also emphasized that predictive coding “is not a case of machine replacing humans.”5 Peck further noted that, while predictive coding “is not perfect, the Federal Rules of Civil Procedure do not require perfection,” and concluded that “the use of predictive coding was appropriate considering: (1) the parties’ agreement [to use predictive coding], (2) the vast amount of ESI to be reviewed (more than three million documents), (3) the superiority of computer-assisted review [compared] to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C), and (5) the transparent process [for conducting predictive coding] proposed by [the defendant].”6

Peck cautioned, however, that, when evaluating the utility of predictive coding in a particular case, parties should not consider cost “in isolation from the results of the predictive coding process and the amount at issue in the litigation.”7

Affirming Peck’s decision, Carter explained that “[t]here is simply no review tool that guarantees perfection,” and “there are risks inherent in any method of reviewing electronic documents.”8 Carter further noted that manual review, while costly, may be appropriate in certain situations, but such a process is also “prone to human error and marred with inconsistencies from the various attorneys’ determination of whether a document is responsive.”9

‘Global Aerospace’

Three days earlier, on April 23, 2012, the Circuit Court for Loudoun County, Va., issued what appears to be the first court order mandating the use of predictive coding even where the parties were not in agreement on its use. Unlike in Da Silva Moore (in which the parties agreed to use predictive coding), the parties in Global Aerospace contested the use of technology-assisted review. After the defendant requested that the court order the parties to use predictive coding or, in the alternative, require the plaintiff to reimburse the defendant for the costs it would incur in connection with a traditional form of manual review, the court ordered, over the plaintiff’s objection, that predictive coding be employed.

Also noteworthy is Kleen Products v. Packaging Corporation of America, Case No. 10 C 5711 (N.D. Ill.), a case pending in the U.S. District Court for the Northern District of Illinois. In Kleen, the parties are currently embroiled in a dispute as to whether the defendants should have employed predictive coding, rather than keyword searches and manual review, in their ESI collection and review process. The plaintiffs argue that, given the complexity of the issues in their antitrust case, the keyword search strategy employed by the defendants to find responsive documents was deficient because, among other things, it was incapable of identifying variations of key concepts relevant to the case.

Such an argument may resonate with practitioners who have faced the tedious and time-consuming task of selecting (and agreeing upon) effective search terms that will not only reliably gather all or most of the relevant ESI available, but also eliminate large swaths of wholly irrelevant material. For example, a traditional keyword search for a description of a product that is often referred to as “quick and easy” will not only return documents related to that product, but also random e-mails using that phrase on a wide variety of totally irrelevant and possibly personal topics. Predictive coding may help reduce the likelihood of attorneys having to spend large amounts of time (at massive cost to the client) reviewing irrelevant information, as the technology would be more discerning in its responsiveness analysis, analyzing words based on surrounding contextual clues, rather than just the word itself.
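The “quick and easy” problem can be sketched in a few lines. The example below compares a bare phrase search with a crude context check that keeps a hit only when product-related words appear nearby. The documents, the list of context terms and the function names are hypothetical, and the hand-picked term list is only a stand-in: a predictive coding tool would be expected to learn such contextual signals from the attorney-coded seed set rather than from a fixed list.

```python
import re

PHRASE = "quick and easy"

# Hypothetical documents: two concern the product, two merely use the phrase.
DOCUMENTS = [
    "The Quick and Easy mixer shipped with a defective motor, per the warranty claims.",
    "Marketing wants new packaging for the Quick and Easy product line.",
    "Found a quick and easy recipe for dinner tonight, want to try it?",
    "Parking downtown was quick and easy this morning.",
]

# Hypothetical context terms suggesting the phrase refers to the product.
CONTEXT_TERMS = {"product", "warranty", "shipped", "packaging", "defective"}

def keyword_hits(docs, phrase):
    """Traditional keyword search: keep every document containing the phrase."""
    return [d for d in docs if phrase in d.lower()]

def context_hits(docs, phrase, context_terms, window=8):
    """Keep a hit only if a context term occurs within `window` words of the phrase."""
    phrase_words = phrase.split()
    n = len(phrase_words)
    kept = []
    for doc in docs:
        words = re.findall(r"[a-z']+", doc.lower())
        for i in range(len(words) - n + 1):
            if words[i:i + n] == phrase_words:
                nearby = words[max(0, i - window): i + n + window]
                if context_terms & set(nearby):
                    kept.append(doc)
                    break  # one qualifying occurrence is enough
    return kept

all_hits = keyword_hits(DOCUMENTS, PHRASE)
filtered = context_hits(DOCUMENTS, PHRASE, CONTEXT_TERMS)
print(len(all_hits), "keyword hits;", len(filtered), "hits with product context")
```

Here the bare phrase search returns all four documents, while the context check keeps only the two that discuss the product. The same check would also miss a relevant document that happened to use none of the listed terms, which is one reason a learned model is preferable to a fixed list.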

The defendants in Kleen (echoing arguments made by skeptics of predictive coding) have countered that the keyword search method by which they collected and reviewed documents is “what courts regularly endorse in commercial litigation,” while the plaintiffs’ proposed content-based search method has not been ordered in any case where the traditional search-term method of ESI review has been employed.10 However, as Peck recently pointed out, “judicial decisions…[have been] highly critical of the keywords used by the parties,” and, contrary to the view of many practitioners, it is not necessarily true that the “judiciary has signed off on keywords, but has not on computer-assisted coding.”11 To date, no final ruling on the use of predictive coding has been issued by the court in Kleen.

Embracing the Technology

Given the difficulties in formulating and implementing effective and efficient search terms and reliable ESI collection parameters, some practitioners are beginning to embrace predictive coding as a cost-reducing, efficiency-improving e-discovery tool. Certainly, there are attorneys and judges who believe that no stone should be left unturned and that cost is irrelevant. They therefore conclude that allowing this new technology to remove categories of documents from human review runs the risk that relevant documents will be missed.

However, no matter what search system is used, documents will be missed, errors will be made, and irrelevant and/or privileged information may be inadvertently produced to the opposing side. Just as predictive coding may be susceptible to the occasional missed document or error, so too are human reviewers employing the traditional keyword search and manual review methodology. Simply put, there is an inherent risk in all discovery that not every document will be found, information will be missed, and documents will be mistakenly produced, regardless of the method of document collection and review employed.

The concern over high discovery costs has long been recognized by courts and practitioners. The Supreme Court in Twombly12 recognized a growing concern over high discovery costs, and this was later echoed by Seventh Circuit Judge Richard Posner in a comparable context in Stark Trading v. Falconbridge, 552 F.3d 568, 574 (7th Cir. 2009), where he emphasized that “[d]efendants [ought] not to be subjected to the costs of pretrial discovery in a case in which those costs…are likely to be great, unless the complaint makes sense.” The problem of the cost of discovery was also cited in the legislative history of the Private Securities Litigation Reform Act’s automatic stay of discovery, which was implemented as a way to control burgeoning and coercive discovery costs in private securities cases.13

And the Southern District of New York recently implemented a pilot program regarding case management techniques in complex civil cases, pursuant to which the court, among other things, will evaluate the complexity of the case and “make a proportionality assessment and limit the scope of discovery as it deems appropriate.”14 Additionally, Judge John Koeltl of the Southern District of New York is serving as chair of a subcommittee of the Civil Rules Advisory Committee, which is examining possible modifications to the Federal Rules of Civil Procedure relating to discovery with the goal of increasing efficiency and reducing costs.

Under the current model, wasteful costs incurred in e-discovery will continue to burden the litigant and partially be passed on to the consumer. As another negative result, there is a generation of junior litigators who are spending far too much of their time reviewing irrelevant documents, rather than learning the real skills of this great profession. The use of predictive coding or other new technology—particularly as it is refined, tested and improved—offers the hope that litigators can get back to focusing on the preparation of their cases, young lawyers can be trained in useful skills, and justice can be done on the merits, rather than cases being resolved on cost considerations. The recent decisions in Da Silva Moore and Global Aerospace suggest that practitioners should consider new e-discovery technology and the pros and cons of its use in their own cases, as it appears that predictive coding is gaining acceptance.

Gregory A. Markel is a partner at Cadwalader, Wickersham & Taft, and co-chairman of the firm’s litigation department. Erika B. Engelson is special counsel to the firm. Gregory D. Beaman, an associate at the firm, assisted in the preparation of this article.


1. See Ari Kaplan, Advice from Counsel: An Inside Look at Streamlining E-Discovery Programs, 1, 8, FTI Consulting Technology LLC (2012) (FTI Survey).

2. Andrew J. Peck, “Search, Forward: Will Manual Document Review and Keyword Searches Be Replaced by Computer-Assisted Coding?” 25, 29, L. Tech. News (2011) (Peck Article).

3. Id.

4. Da Silva Moore v. Publicis Groupe, 2012 WL 607412, *8 (S.D.N.Y. Feb. 23, 2012).

5. Id. at *9.

6. Id.

7. Id. at *12.

8. Case No. 1:11-cv-01279, Dkt. #175 at 4.

9. Id.

10. Case No. 10 C 5711, Dkt. #293 at 2-3.

11. Peck Article.

12. Bell Atlantic v. Twombly, 550 U.S. 544 (2007).

13. P.L. 104-67, PSLRA Legislative History Report at 14.

14. Report of the Judicial Improvements Committee Pilot Project Regarding Case Management Techniques for Complex Civil Cases at 2 (2011).