More than a year ago, Magistrate Judge Andrew Peck of the U.S. District Court for the Southern District of New York issued the first judicial opinion recognizing technology-assisted review (TAR), also referred to as predictive coding or computer-assisted review, as a legitimate discovery tool. Since that opinion, several other cases have demonstrated that not only do courts welcome (or even demand) the use of TAR, but requesting parties may prefer (or demand) that producing parties use it to identify electronically stored information (ESI) for production.
Moreover, there are no reported cases holding that TAR is an illegitimate means of document review, while a growing corpus of legal and scientific scholarship confirms its superiority over traditional means of ESI review, such as keyword searching and manual linear review. As the cases discussed below suggest, where the results of applying TAR can be validated satisfactorily through statistical testing, there is no reason to doubt the completeness of the document production.
The first widely discussed case indicating judicial recognition of TAR as a legitimate means of performing a document review was Magistrate Judge Peck’s ruling in Da Silva Moore v. Publicis Groupe & MSL Group. In Da Silva Moore, a gender discrimination action, the plaintiffs stated that they were not opposed to defendants’ use of TAR, but rather had “multiple concerns…on the way in which [defendant MSL] plan[ned] to employ predictive coding.” Specifically, the plaintiffs objected to the defendants’ selection of custodians whose emails would be searched and the phasing of discovery from those custodians. They also differed with the defendants on which sources of ESI the defendants should be required to search. The parties submitted differing protocols to the court indicating how these and other ESI discovery issues should be resolved.
The plaintiffs argued that the defendants had not provided sufficient information to determine whether the results of their application of TAR would be accurate. Judge Peck dispensed with this argument on the grounds that it was premature. He noted that defendants’ proposed protocol was “transparent” in that plaintiffs would be able to see how defendants coded each email in the “seed set.” He also stated that resolution of the parties’ differences could depend on proportionality considerations, which would be impossible to determine at that stage of discovery.
The opinion goes on to discuss research and commentary establishing the superior accuracy of TAR over traditional discovery methods such as keyword searches and manual linear review. Judge Peck stated that his determination that the use of TAR was appropriate was based on:
(1) the parties’ agreement, (2) the vast amount of ESI to be reviewed (over three million documents), (3) the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches), (4) the need for cost effectiveness and proportionality under Rule 26(b)(2)(C) and (5) the transparent process proposed by [defendants].
State courts have also approved the use of TAR. In Global Aerospace v. Landow Aviation, the defendants moved for a protective order, objecting to the breadth of the plaintiffs’ requests. The defendants argued that using TAR would be both less expensive and more reliable than either keyword searching or manual review.
In their motion, the defendants compared the estimated costs of various methods of review. They estimated that a first-pass manual review for relevance would cost $2 million and locate only 60 percent of the relevant documents. Keyword searching might be more cost-effective, according to the defendants, but they stated that it would likely retrieve only 20 percent of the potentially relevant documents. With respect to TAR, the defendants asserted that it:
…is capable of locating upwards of seventy-five percent of the potentially relevant documents and can be effectively implemented at a fraction of the cost and in a fraction of the time of linear review and keyword searching. Further, by including a statistically sound validation protocol, [defendants’] counsel will thoroughly discharge the “reasonable inquiry” obligations…
The court ordered that “defendants shall be allowed to proceed with the use of predictive coding for purposes of the processing and production of electronically stored information.”
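The percentages the defendants cited are recall figures: the share of all relevant documents that a review method actually finds. As a minimal sketch, assuming a hypothetical collection containing 10,000 relevant documents (the collection size, the per-method counts and the function name `recall` are illustrative assumptions chosen only to reproduce the 60, 20 and 75 percent rates quoted in the motion), the comparison can be expressed as:

```python
def recall(relevant_found: int, relevant_total: int) -> float:
    """Fraction of all relevant documents that a review method retrieves."""
    if relevant_total <= 0:
        raise ValueError("relevant_total must be positive")
    return relevant_found / relevant_total


# Hypothetical collection: assume it holds 10,000 relevant documents.
RELEVANT_TOTAL = 10_000

# Hypothetical counts chosen to match the rates quoted in the motion.
found_by_method = {
    "manual review": 6_000,
    "keyword search": 2_000,
    "TAR": 7_500,
}

recall_by_method = {
    method: recall(found, RELEVANT_TOTAL)
    for method, found in found_by_method.items()
}

for method, value in recall_by_method.items():
    print(f"{method}: recall {value:.0%}")
```

Recall says nothing about how many irrelevant documents are swept in along the way, which is why validation protocols typically look at more than one measure.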
Not only have judges accepted TAR, but in at least one case the use of TAR has been ordered sua sponte. In the Delaware Chancery Court, which regularly presides over complex corporate litigation, the judge in EORHB v. HOA Holdings LLC issued an order from the bench that the parties either use TAR or “show cause why this is not a case where predictive coding is the way to go.”
Federal courts have also included TAR protocols in case management orders. In In re Actos (Pioglitazone) Products Liability Litigation, Magistrate Judge Hanna Doherty ordered a detailed protocol to govern the use of TAR, to which the parties had stipulated. Specifically, the order included a “Search Methodology Proof of Concept” and stated that the parties “agree to meet and confer regarding the use of advanced analytics as a document identification mechanism for the review and production of data.”
Email from four key custodians was selected to help form the “seed set.” The parties were to work together to “train” the software and decide upon the appropriate statistical validation criteria. The order further provided that random sampling would be applied for “quality control” and that the defendants would still be able to perform a manual review before producing.
Requesting parties have demanded that producing parties use TAR from the outset to identify relevant ESI. In Kleen Products LLC v. Packaging Corp. of America, the plaintiffs asserted that the defendants’ proposed keyword search would find less than 25 percent of the responsive ESI, while TAR would find 70 percent at no additional cost. The defendants argued that they could employ quality control methods to ensure the necessary degree of accuracy and that, given how much work had already been done, the additional costs involved in applying TAR would be unduly burdensome. After an evidentiary hearing and several months of negotiation between the parties, the plaintiffs agreed to withdraw their demand.
Similarly, in In Re: Biomet M2a Magnum Hip Implant Products Liability Litigation, a hip implant products liability case, the plaintiffs argued that Biomet should have used TAR from the initial stages of its ESI review. Biomet started its efforts to filter its original universe of ESI by applying keyword searching, then de-duplicated the remainder and only then applied TAR to what was left. The court ruled that this process satisfied Biomet’s discovery obligations.
The parties had jointly identified a population of 19.5 million documents to search for relevance. They also agreed on protocols to “facilitate identification, retrieval and production of ESI.” Biomet asserted that its production method was consistent with these protocols.
The initial stage of keyword searching reduced the originally identified 19.5 million documents to 3.9 million, equivalent to 1.5 terabytes, constituting 16 percent of the original 19.5 million. De-duplication then lowered the population to 2.5 million. Finally, TAR culled relevant ESI from the 2.5 million documents that were left.
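The culling funnel described above comes down to simple ratios. The sketch below uses only the document counts stated in the text (the function name `share_of_original` is an illustrative assumption) and expresses each stage as a share of the original population. On these counts, 3.9 million of 19.5 million works out to 20 percent, so the 16 percent figure may rest on a different count than the raw ratio.

```python
# Document counts as stated in the text.
ORIGINAL = 19_500_000
AFTER_KEYWORD_SEARCH = 3_900_000
AFTER_DEDUPLICATION = 2_500_000


def share_of_original(count: int, original: int = ORIGINAL) -> float:
    """Return count as a fraction of the original document population."""
    return count / original


stage_shares = {
    "after keyword search": share_of_original(AFTER_KEYWORD_SEARCH),
    "after de-duplication": share_of_original(AFTER_DEDUPLICATION),
}

for stage, share in stage_shares.items():
    print(f"{stage}: {share:.1%} of the original population")
```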
Biomet performed a statistical analysis to validate its results. Sampling indicated, at a 99 percent confidence rate, that between 0.55 percent and 1.33 percent of the documents to which TAR was applied were responsive and, at the same confidence rate, that between 1.37 percent and 2.47 percent of the original 19.5 million documents were responsive. The costs associated with the production were $1.07 million at the time the motion papers were submitted, and Biomet estimated that when complete the costs would be between $2 million and $3.25 million.
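Ranges of this kind are confidence intervals around a rate observed in a random sample. The sketch below shows one common way to compute such an interval, the normal approximation for a proportion. It is not necessarily the method Biomet used, and the sample size and count of responsive documents are hypothetical values chosen for illustration, not figures from the case.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(hits: int, sample_size: int, confidence: float):
    """Normal-approximation confidence interval for a sampled proportion."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if sample_size <= 0 or not 0 <= hits <= sample_size:
        raise ValueError("need 0 <= hits <= sample_size and sample_size > 0")
    p = hits / sample_size
    # Two-sided critical value, e.g. about 2.576 for 99 percent confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sqrt(p * (1 - p) / sample_size)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical sample: 4,000 documents drawn at random, 36 found responsive.
low, high = proportion_interval(hits=36, sample_size=4_000, confidence=0.99)
print(f"99% interval: {low:.2%} to {high:.2%}")
```

For very low rates or small samples, the normal approximation can be rough, and an exact or Wilson interval may be a better choice.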
The plaintiffs averred that Biomet had produced only a fraction of the relevant documents. They argued that published research and scholarship have established the inaccuracy of keyword searching relative to TAR. Accordingly, they asserted that the keyword search had “tainted” the entire process. They declined Biomet’s offer to suggest additional keyword search terms, claiming that it would be impractical because the plaintiffs were unfamiliar with Biomet’s terminology. The only way the plaintiffs believed the process could be fixed was to start again from the beginning and apply TAR to the entire original universe of 19.5 million documents.
The court rejected the plaintiffs’ demands on, inter alia, proportionality grounds, in light of the results of Biomet’s sampling and the projected cost of complying with those demands:
Even in light of the needs of the hundreds of plaintiffs in this case, the very large amount in controversy, the parties’ resources, the importance of the issues at stake, and the importance of this discovery in resolving the issues, I can’t find that the likely benefits of the discovery proposed by the Steering Committee equals or outweighs its additional burden on, and additional expense to, Biomet. FED. R. CIV. P. 26(b)(2)(C).
Note that in this case, as in others, the requesting party, presumably the party with the strongest interest in receiving a complete production, was pushing for expanded use of TAR. Indeed, the requesting party argued that the failure to use TAR from the beginning “tainted” the entire process. This is a stark demonstration that TAR appeals not only to defendants with large volumes of ESI who are concerned with reducing cost, but also to requesting parties who are interested in receiving a complete and accurate production.
Clearly TAR is gaining traction among the judiciary as well as litigants. Given that so far the reported cases involving TAR all support its validity as a discovery tool, it seems inevitable that the use of TAR will increase. The next front may be what thresholds are acceptable for TAR-generated productions in terms of statistical measures such as confidence level and margin of error. However, the fact that the results of TAR can be measured through transparent, generally and judicially accepted statistical methodologies only further supports its defensibility.
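Confidence level and margin of error together determine how large a validation sample must be. As a rough sketch, assuming simple random sampling from a large population and the conservative worst case of a 50 percent responsiveness rate (the function name `required_sample_size` and the thresholds shown are illustrative assumptions, not standards drawn from any of the cases above), the relationship can be computed as:

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(confidence: float, margin_of_error: float,
                         expected_rate: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within the given margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / E^2 and ignores
    any finite-population correction.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n = (z ** 2) * expected_rate * (1 - expected_rate) / margin_of_error ** 2
    return ceil(n)


# Illustrative thresholds only.
for confidence, margin in [(0.95, 0.02), (0.99, 0.01)]:
    n = required_sample_size(confidence, margin)
    print(f"{confidence:.0%} confidence, +/-{margin:.0%} margin: {n} documents")
```

Tightening either threshold raises the required sample size quickly, which is one reason the choice of thresholds is likely to be contested.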
The views expressed in this article are those of the author and do not necessarily represent the views of Ernst & Young LLP.