ALM Properties, Inc.
Page printed from: Law Technology News
Technology-Assisted Review From the Plaintiffs' Side
For all the importance that lawyers place on being rational, we can be an awfully irrational bunch when it comes to technology. We accept that a doctor can suture a human heart using a computer-controlled robot in an operating room thousands of miles away. We know that computers enabled NASA to land a spaceship gently on Mars after a trip of 255 days traveling more than 40 million miles. We have seen an "intelligent" computer win at Jeopardy against the most successful human players. Yet we resist public acceptance of the notion that computer analytics can, at least in large cases with masses of electronic data, identify documents relevant to a lawsuit more effectively than lawyers composing lists of keywords. And despite abundant evidence, some lawyers do not want to accept that a computer running complex algorithms can locate key documents more reliably than a roomful of humans, a species that still considers perfect game play at tic-tac-toe to be a notable achievement.
That the legal profession is notoriously slow to adopt new technologies is hardly breaking news. However, the resistance among current practitioners to even consider the use of technology-assisted review (TAR),[FOOTNOTE 1] especially in large complex cases, is a particularly confounding episode of techno-legal disconnect.[FOOTNOTE 2] Even more confounding is the resistance to engage in open dialogue about the possibility of using TAR to facilitate cooperative, efficient, and expeditious discovery.
It is clear by now that TAR, implemented correctly in cases involving large-scale document review, can significantly reduce the costs and burdens of discovery. The failure to consider and, in certain cases, implement TAR can no longer be justified given the demonstrated efficacy of currently available TAR programs in comparison to human review.[FOOTNOTE 3] Attorneys who fail to inform themselves about TAR and consider its application in appropriate cases may impede rather than facilitate the just, speedy, and inexpensive administration of justice.[FOOTNOTE 4]
1. Document review processes including advanced computer analytics can produce more accurate results than reviews using only keyword search and human review.
The use of computers to assist in document review is not new. Initially, primitive databases were used to index the physical locations of paper documents. In the 1970s, large document collections could be indexed and sorted by date, author, recipient, and other manually input fields. Once computer storage and scanning technology advanced sufficiently, law firms began scanning documents to large computer "platters," and loaded the platters into electronic jukeboxes so that users could recall documents on their computers as opposed to pulling hard copy documents from storage. The review process itself, however, remained manual. Documents might be organized by virtually any criterion (date, author, custodian, department, subject, project, etc.), but there was no automated tool available to cull irrelevant documents from a collection, whether the documents were in a dusty warehouse or on a shiny computer disk.
The proliferation of desktop computers in the 1990s not only triggered the well-documented deluge of data volume but also dictated that future data collection would be largely custodian- or location-based, not subject-based, resulting in the urgent need for a method of culling huge data sets to a manageable size for document review. Advances in optical character recognition and database software led to the development of document review platforms that permitted complex Boolean keyword searches on the full text of documents as well as the linking of coding entries to document images to allow more efficient retrieval of documents on specific topics. The use of keyword search was adopted by litigants and endorsed by some courts as a method of reducing the volume of documents to be evaluated by human reviewers, despite widespread recognition that the method was far from perfect in locating responsive documents.[FOOTNOTE 5] Keyword search has been found to be both under-inclusive (up to 80 percent of the responsive documents in a collection may be missed) and over-inclusive (over 70 percent of the documents "hit" may, upon review, be deemed irrelevant).[FOOTNOTE 6] Under-inclusiveness risks important documents being overlooked; over-inclusiveness raises the cost of review unnecessarily.
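The two failure modes described above correspond to what the information-retrieval literature calls recall (how much of the responsive material a search finds) and precision (how much of what it finds is responsive). The following is a minimal sketch of that arithmetic. The document counts are hypothetical, chosen only to mirror the percentages cited above, and are not drawn from any study.

```python
def recall(responsive_found, responsive_total):
    """Share of all responsive documents that the search actually retrieved."""
    return responsive_found / responsive_total


def precision(responsive_found, total_hits):
    """Share of retrieved documents ("hits") that are actually responsive."""
    return responsive_found / total_hits


# Hypothetical counts, chosen only to mirror the percentages cited above.
responsive_total = 10_000   # responsive documents in the whole collection
total_hits = 8_000          # documents returned by the keyword search
responsive_found = 2_000    # hits that reviewers deem responsive

r = recall(responsive_found, responsive_total)
p = precision(responsive_found, total_hits)

# Low recall is under-inclusiveness; low precision is over-inclusiveness.
print(f"recall:    {r:.0%}  (missed {1 - r:.0%} of responsive documents)")
print(f"precision: {p:.0%}  (irrelevant share of hits: {1 - p:.0%})")
```

In this illustration the search is both under-inclusive and over-inclusive at once: most responsive documents are never seen, and most of the documents sent to reviewers did not need to be reviewed.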
A number of studies in recent years have shown that review teams utilizing TAR tools can achieve more accurate results than teams using only keyword search and manual review. In a recently published monograph entitled Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery[FOOTNOTE 7] (hereinafter, "RAND"), the RAND Institute for Civil Justice concluded that, "although no experimental setting to assess the relative qualities of human or computer-categorized review can be completely free of 'unrealism and artificiality,' the empirical evidence that is currently available does suggest that similar results in large-scale reviews would be achieved with either approach."[FOOTNOTE 8] Simply put, it can no longer be reasonably disputed that use of well-developed and tested TAR in a well-designed and implemented review process can improve the quality of the review.
2. TAR can provide major cost savings in the review process.
Corporations, business-funded lobbying groups, and the defense bar claim that "the discovery system is broken," and that e-discovery has "pushed the civil justice system to the brink."[FOOTNOTE 9] Some voices from the corporate sector go so far as to suggest that discovery costs are "sapping the competitiveness of our country,"[FOOTNOTE 10] and that drastic and immediate revisions to the Federal Rules are needed to narrow the scope of discovery itself.
Document review is, by far, the most expensive component of discovery. The RAND report concludes that review consumes about 73 percent of discovery costs (with collection and processing accounting for the rest). There are few published reports providing sufficient detail about the use of TAR in actual discovery productions to allow for direct comparisons to the cost of all-human review, and cost-saving claims in vendor marketing materials are suspect. The manner in which TAR is implemented can vary greatly, producing a wide range of estimated savings in the total cost of review. The RAND researchers found that estimates of actual cost savings in large-scale document reviews generally ranged from 20-30 percent at the low end to more than 70 percent, with one litigant reporting a reduction of about 80 percent in the number of attorney review hours. RAND concluded that, even accounting for the added cost of vendor services and the use of experienced counsel for the machine-learning tasks, the cost of a technology-assisted review is "likely to be substantially lower than the costs of human review." Commentator Ralph Losey opines that litigants can reasonably expect the use of TAR to reduce review costs by 50 to 75 percent.[FOOTNOTE 11]
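To make these percentages concrete, the sketch below combines the figures quoted above: review's roughly 73 percent share of discovery costs and the reported ranges of review savings. The baseline budget is a hypothetical number, and the calculation deliberately ignores the vendor and expert costs that RAND notes must be netted against any savings.

```python
REVIEW_SHARE = 0.73  # review's share of discovery costs, per the RAND figure above


def projected_review_cost(baseline_review_cost, review_savings):
    """Review cost remaining after a fractional saving (0.0 to 1.0)."""
    if not 0.0 <= review_savings <= 1.0:
        raise ValueError("review_savings must be between 0 and 1")
    return baseline_review_cost * (1.0 - review_savings)


def total_discovery_savings(review_savings, review_share=REVIEW_SHARE):
    """Fraction of the *total* discovery budget saved when only review gets cheaper."""
    return review_share * review_savings


# Hypothetical baseline: 1,000,000 spent on all-human review.
baseline = 1_000_000

scenarios = [
    ("RAND range, low end", 0.20),
    ("Losey estimate, low end", 0.50),
    ("RAND range, high end", 0.70),
    ("Losey estimate, high end", 0.75),
]

for label, savings in scenarios:
    cost = projected_review_cost(baseline, savings)
    overall = total_discovery_savings(savings)
    print(f"{label:26s} review cost {cost:>11,.0f}   total discovery saved {overall:.1%}")
```

Because review dominates the discovery budget, even the low-end savings estimates translate into a meaningful reduction in total discovery spending.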
Given the enormous burdens that some corporations claim e-discovery creates, it is somewhat surprising that corporate litigants have so rarely sought the consent of an adversary or the approval of a court for the use of TAR, even in large and complex cases. The failure is even more curious considering that, according to the RAND survey, companies appear to be using TAR tools for "internal analysis of electronically stored information (such as for developing litigation strategies or locating specific documents of interest)," but not for actual review and production.[FOOTNOTE 12] Another recent survey found that 54 percent of respondents reported using predictive coding, with the majority using the technology not for coding but to allow reviewers "to focus in on important materials faster" to facilitate early case assessment.[FOOTNOTE 13]
3. Courts have endorsed the use of TAR in the right circumstances.
For more than five years, courts have recognized that technologies more sophisticated than keyword search can (and perhaps should) be used to control costs and increase accuracy in civil litigation. In Disability Rights Council of Greater Washington v. Washington Metropolitan Transit Authority, 2007 WL 1585452 (D.D.C. June 1, 2007), Judge John Facciola directed the parties' attention to "recent scholarship that argues that concept searching, as opposed to keyword searching, is more efficient and more likely to produce the most comprehensive results." Id. at *9 (citing George L. Paul & Jason R. Baron, "Information Inflation: Can the Legal System Adapt?" 13 Rich. J.L. & Tech. 10 (2007)). Also in 2007, The Sedona Conference issued a set of principles explicitly stating that a party could satisfy its discovery obligations by "using electronic tools and processes, such as data sampling, searching or the use of selection criteria, to identify data reasonably likely to contain relevant information." In its "Best Practices" commentary on the Use of Search and Information Retrieval Methods in E-Discovery, also published in 2007, The Sedona Conference suggested the use of "conceptual searching [and] other machine learning and text mining tools that employ mathematical probabilities." In 2008, in Victor Stanley, 250 F.R.D. at 256-57, Judge Paul Grimm referred to the "growing body of literature that highlights the risks associated with conducting an unreliable or inadequate keyword search or relying exclusively on such searches," and went on to suggest a number of more advanced search methodologies that could be used in discovery, including concept searching and document clustering.
The use of TAR received another powerful endorsement in 2008 with the enactment of Federal Rule of Evidence 502, which provides that the inadvertent disclosure of privileged information does not operate as a waiver if (among other conditions) the disclosing party took "reasonable steps" to prevent the disclosure. The Judicial Conference Advisory Committee's Explanatory Note on Rule 502 explicitly states that "a party that uses advanced analytical software applications and linguistic tools in screening for privilege and work product may be found to have taken 'reasonable steps' to prevent inadvertent disclosure." Judge Grimm drove this point home in a recent article assessing the effectiveness of Rule 502:
[T]he Committee Note stresses how important it is that reviewing courts be receptive to the use of search and information retrieval methods that facilitate pre-production review of ESI via computer-based analytical methods, rather than the far more labor-intensive and expensive process of having lawyers review each digital document. Simply put, one of the "two major purposes" of Rule 502 was to bring down the cost of pre-production review of ESI by enabling lawyers and parties to use computer-based analytical methods to search for and identify privileged and protected information, as well as other analytical methods, such as sampling, that avoid the enormous expense associated with personal review of each digital document. The rule cannot achieve this goal if lawyers do not use these analytical methods, or if courts do not support their use by acknowledging that when the methods are properly used, they are reasonable.[FOOTNOTE 14]
In 2009, a district court in New Jersey held that privilege had not been waived when privileged documents were inadvertently produced due to errors arising out of a party's "commendable effort to employ a sophisticated computer program to conduct its privilege review."[FOOTNOTE 15] The court noted that although the party's implementation of the new application was the cause of the errors, "[t]he use of sophisticated analytical software should be encouraged."
In 2011, the Southern District of New York implemented a pilot program for complex civil cases specifically listing "concept search, machine learning, or other advanced analytical tools" among the approaches parties may consider for search and review of ESI.
Still, through 2011, the glacial pace of TAR-implementation was widely attributed to the absence of case law explicitly approving the use of advanced analytics in discovery.[FOOTNOTE 16] Craig Ball, a noted e-discovery authority, asked, "What are we waiting for? ... It's not as though we held off using keyword search until a judge gave it the nod. We just did it ... If you believe enhanced automated search is better and cheaper, have the courage and wisdom to lead the way in its use."[FOOTNOTE 17]
To date, in 2012, three courts have approved the use of TAR. In Moore v. Publicis Groupe, 2012 WL 607412 (S.D.N.Y. Feb. 24, 2012), adopted sub nom. Moore v. Publicis Groupe SA, 2012 WL 1446534 (S.D.N.Y. Apr. 26, 2012), Magistrate Judge Andrew Peck held that "computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases" and approved a detailed TAR protocol for use in a case involving 3.3 million documents. In terms of production accuracy, Peck wrote, "[c]omputer-assisted review appears to be better than the available alternatives, and thus should be used in appropriate cases." The parties in Moore had agreed on the use of predictive coding but disagreed on its implementation. The district court, in approving the use of TAR, pointed out that "if the predictive coding software is flawed or if Plaintiffs are not receiving the types of documents that should be produced, the parties are allowed to reconsider their methods and raise their concerns with the Magistrate Judge."
In In re Actos (Pioglitazone) Products Liability Litigation, 2012 WL 3899669 (W.D. La. July 30, 2012), a federal magistrate approved a detailed TAR protocol to which the parties had agreed. In both In re Actos and Moore, the agreed protocols included a significant degree of transparency, with the parties making joint decisions about the relevance or non-relevance of the documents used to train the predictive coding software.
In Global Aerospace Inc. v. Landow Aviation, L.P., No. CL 61040 (Va. Cir. Ct. Apr. 23, 2012), the defendant obtained the court's approval, over plaintiffs' objections, for the use of predictive coding on an estimated 2 million reviewable documents. The court's order was without prejudice to plaintiffs' right to later raise issues concerning the "completeness or the contents of the production or the ongoing use of predictive coding" as discovery progresses.
It is important to note that, in all three cases, TAR is being used to cull documents prior to human review, the same way keyword search has been used in the past. TAR is not being used as a complete substitute for human review. Every document identified by TAR as potentially responsive is expected to be manually reviewed and then produced if it is found to be responsive and not privileged.
In July 2012, Judge Shira Scheindlin voiced her approval of TAR, observing that "there is no guarantee that using keywords will always prove sufficient," even in the simplest of cases. "[B]eyond the use of keyword search, parties can (and frequently should) rely on latent semantic indexing, statistical probability models, and machine learning tools to find responsive documents." Nat'l Day Laborer Org. Network v. U.S. Immigration & Customs Enforcement Agency, 2012 WL 2878130, at *12 (S.D.N.Y. July 13, 2012) (citing Shira A. Scheindlin, Daniel J. Capra, & The Sedona Conference, Electronic Discovery and Digital Evidence: Cases and Materials, at 327 (2d ed. 2012)). Also of note in 2012, the Federal Trade Commission proposed revisions to its Rules of Practice providing that parties responding to Commission requests may "utilize one or more search tools such as advanced key word searches, Boolean connectors, Bayesian logic, concept searches, predictive coding, and other advanced analytics."[FOOTNOTE 18]
Most recently, a Delaware Chancery Court judge directed the parties in a contract action to consider the use of TAR. Vice Chancellor J. Travis Laster told the parties: "This seems to me to be an ideal non-expedited case in which the parties would benefit from using predictive coding. I would like you all, if you do not want to use predictive coding, to show cause why this is not a case where predictive coding is the way to go."[FOOTNOTE 19] The use of new technologies would be preferable to "burning lots of hours with people reviewing," the Court added.
The bottom line is that judges are ready and more than willing to accept the use of TAR in discovery. Lawyers who continue to advise their clients that the use of advanced analytic tools is risky because the technology is too new and court acceptance is uncertain (or who use the technology without disclosing its use to adversaries) may find that the real risk is to their own credibility.
4. TAR can be implemented with minimal risk to the producing party.
The collection and identification of relevant and responsive documents in discovery is a multistep process. Keyword search is not a review process; it is a tool most often used to separate potentially relevant documents, which will be submitted to human review, from documents considered so unlikely to be relevant that it is reasonable not to have them reviewed at all. TAR tools can be used in precisely the same way to eliminate documents from the review set. However, because the computer can apply a much finer set of criteria using complex analytics than the very blunt tool of keyword search, many more documents can be removed from the review set.
TAR carries no greater risk than keyword search in this context, provided that sufficient sampling and quality assurance testing is performed on the documents excluded from review to demonstrate that their removal was reasonable. Just as cooperation and transparency can effectively eliminate the risk that the use of keyword searching will be challenged by an adversary, a similar approach can minimize any possible risk in using TAR. A party that chooses to cull documents from review without the involvement of its adversary may well need to defend its process, whether the process employed is keyword search or advanced analytics.
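The sampling and quality assurance testing described above is sometimes called an elusion test: a random sample is drawn from the documents excluded from review, humans code the sample, and the share found responsive is used to estimate how much responsive material the cull left behind. The sketch below is one way to do that arithmetic, not a prescribed protocol. The function names, the sample size, and the choice of a Wilson score interval at roughly 95 percent confidence are all assumptions made for illustration.

```python
import math
import random


def draw_sample(excluded_doc_ids, sample_size, seed=0):
    """Draw a simple random sample (without replacement) from the excluded set."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    return rng.sample(list(excluded_doc_ids), sample_size)


def elusion_estimate(responsive_in_sample, sample_size, excluded_total, z=1.96):
    """Estimate the elusion rate with a Wilson score interval (z=1.96 is about 95%)."""
    p = responsive_in_sample / sample_size
    z2 = z * z
    denom = 1 + z2 / sample_size
    center = (p + z2 / (2 * sample_size)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / sample_size + z2 / (4 * sample_size ** 2)
    )
    return {
        "elusion_rate": p,
        "low": max(0.0, center - half_width),
        "high": min(1.0, center + half_width),
        "estimated_missed_docs": p * excluded_total,
    }


# Hypothetical example: 1,500,000 documents culled, 1,500 sampled,
# and human reviewers find 15 responsive documents in the sample.
excluded_ids = range(1_500_000)
sample = draw_sample(excluded_ids, 1_500)
result = elusion_estimate(responsive_in_sample=15,
                          sample_size=len(sample),
                          excluded_total=len(excluded_ids))

print(f"elusion rate: {result['elusion_rate']:.2%} "
      f"(interval {result['low']:.2%} to {result['high']:.2%})")
print(f"estimated responsive documents left behind: "
      f"{result['estimated_missed_docs']:,.0f}")
```

The same test applies whether the cull was performed by keyword search or by TAR, which is why the validation burden need not differ between the two methods.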
Consultation and collaboration with an adversary require time and effort and may, in some cases, give rise to disputes requiring resort to the court.[FOOTNOTE 20] However, with competent counsel on both sides, the airing of any disagreements about collection and review methodologies early in discovery is far less risky for the producing party than unilateral implementation, which leaves the party potentially open to severe criticism, challenges, cost increases, and maybe even sanctions if problems surface later in litigation. As Craig Ball has noted, whether a particular TAR process is court-approved is less important than whether it is accepted by the opposing party. "The most cost-effective method is one the other side accepts without a fight."[FOOTNOTE 21] A requesting party is far more likely to accept the absence of supportive ESI in a TAR-based production if the requesting party had some input into (or at least a clear view of) the machine training process.
Alternatively, TAR may be used simply to prioritize documents for review after the collection has been culled by keyword searches. As reviewers gain familiarity with the issues and vocabulary in their assigned document tranches, they gain speed and accuracy. Reviewers can also eliminate batches of nonresponsive documents as near-duplicates and closely related documents appear sequentially. Here, the user achieves some savings with no risk of having to validate TAR results or explain the technology to an adversary or a court. Every document that would have been manually reviewed in the absence of TAR is still being reviewed. As a result, the approval of (or disclosure to) the court or opponent is not a concern, but only a small fraction of the potential cost reduction is being achieved. This appears to be the primary context in which TAR is being utilized with any frequency.[FOOTNOTE 22]
Reliance on TAR as the sole means of identifying potentially relevant and responsive documents, or identifying privileged or confidential documents, may well become common practice in the near future, and standards may evolve to give litigants confidence that the tools and methods they choose are presumptively reasonable. At present, however, a litigant seeking to rely on advanced analytics without a layer of human review would be well-advised to obtain agreement from the receiving parties and/or approval from the court.
The decision to rely solely on TAR for coding without disclosure to the requesting party is a calculated risk that carries a significant downside if a production problem gives rise to a need for the producing party to defend its process. A good faith and timely attempt by the parties to agree on a TAR protocol should not lead to extensive litigation.[FOOTNOTE 23] Requesting parties have real incentives to agree to the use of TAR by their adversaries. First and foremost, the production should include significantly fewer documents of no or extremely low relevance.[FOOTNOTE 24] The requesting party can anticipate receiving production more quickly and the production can be prioritized by subject matter, making the requesting party's review of the documents more orderly and efficient. Identification and production of the most significant documents early in litigation also facilitate early case assessment, orderly motion practice and possible settlement.[FOOTNOTE 25] It is highly unlikely that a court would not approve an agreement between parties on the use of TAR, but such agreements can be reached only if the parties are willing and counsel is competent. Failure on either of these counts is no longer excusable.
5. So what's the problem?
In the RAND study of e-discovery costs, the RAND Institute asked why, given TAR's "potential to reduce review costs without compromising quality," the technology is not being used by more litigants. After extensive interviews with key legal personnel at major U.S. corporations, RAND identified major factors inhibiting adoption of TAR as (i) concerns about the adequacy of the tools to perform certain tasks, such as locating "smoking gun" documents or identifying privileged or confidential information, and (ii) the perceived risk of using an evolving technology in the absence of judicial guidance. Whether or not these concerns were valid when the RAND interviews were conducted between October 2010 and June 2011, we suggest that they are now outdated for the reasons explained above. There is, in addition, simple inertia leading some practitioners to continue doing what they have always done.
There are other factors that may be impeding the adoption of TAR in litigation. RAND reported, for example, that "there may be concerns that, with disclosure [of the use of TAR], opposing parties might enlarge the scope of the demand due to a perception of lower costs." Of course, a party eschewing TAR for this reason may find the court unreceptive to the argument that the requested discovery would entail undue burden and expense because the party is using antiquated tools.[FOOTNOTE 26]
The era of TAR is upon us; resistance is futile, not to mention counterproductive. Parties failing to use TAR in appropriate cases are squandering time and money. To be considered competent under the recently amended ethical rules, practitioners must keep abreast of emerging technologies. To provide excellent legal services in today's environment, deeper knowledge of TAR tools is essential, along with the practical experience to implement them properly. Not every litigator can be an expert in e-discovery technologies. But every litigator should know when to find one.
FN 1. The term "technology-assisted review" as used herein refers to a review process that combines human input with advanced computer analytics based on linguistic and/or mathematics-based content analysis.
FN 2. Our comments herein are intended to apply to cases with large document sets (25,000 or more), although we recognize that there is no "magic number" and the appropriateness of using TAR in a particular case depends on other factors as well.
FN 3. See, e.g., Maura R. Grossman & Gordon V. Cormack, "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review," XVII Rich. J.L. & Tech. 11 (2011).
FN 4. As of August 2012, the official comments accompanying the ABA's Model Rule of Professional Conduct 1.1 state that an attorney's ethical obligation to provide competent representation includes keeping abreast of "the benefits and risks associated with relevant technology."
FN 5. See, e.g., Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008).
FN 6. See, e.g., David C. Blair & M.E. Maron, "An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System," Communications of the ACM, Mar. 1985, at 289. More recent studies by TREC Legal Track indicate that 25 years of practice at keyword search has not brought results any closer to perfect (24 percent in the 2008 study, 22 percent in 2009). See the overview of the TREC 2009 Legal Track.
FN 8. Id. at 66 (footnote omitted). The RAND report cites Barnett, Thomas, Svetlana Godjevac, Jean-Michel Renders, Caroline Privault, John Schneider, and Robert Wickstrom, Machine Learning Classification for Document Review, paper presented at Workshop DESI at the 12th International Conference on Artificial Intelligence and Law (ICAIL 2009), June 8, 2009; Roitblat, Herbert L., Anne Kershaw, and Patrick Oot, "Document Categorization in Legal Electronic Discovery: Computer Classification vs. Manual Review," Journal of the American Society for Information Science and Technology, Vol. 61, No. 1, 2010; Equivio, Am Law 100 Firm Uses Equivio Relevance to Find More Relevant Documents and to Find Them Faster: an Epiq-Equivio Case Study, 2009a (Equivio 2009); and Grossman, Maura R., and Gordon V. Cormack, "Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review," Rich. J. L. & Tech., Vol. 17, No. 3, Art. 11, Spring 2011; Grossman, Maura R., and Gordon V. Cormack, "Inconsistent Assessment of Responsiveness in E-Discovery: Difference of Opinion or Human Error?" DESI IV: The ICAIL 2011 Workshop on Setting Standards for Searching Electronically Stored Information in Discovery Proceedings, June 6, 2011.
FN 9. Lawyers for Civil Justice, Comment to the Civil Rules Advisory Committee, Aug. 18, 2011.
FN 10. Thomas Y. Allman, "Amending the Federal Rules (Again): Finding the Best Path to an Effective Duty to Preserve," Journal of the Federalist Society Practice Groups, Vol. 11, Issue 2, Sept. 10, 2010.
FN 12. RAND at 69.
FN 13. Ari Kaplan and Joe Looby, "Advice From Counsel: Can Predictive Coding Deliver on Its Promise?"
FN 14. Paul W. Grimm, Lisa Yurwit Bergstrom & Matthew P. Kraeuter, "Federal Rule of Evidence 502: Has It Lived Up to Its Potential?", XVII Rich. J.L. & Tech. 8 at 36-37 (2011) (footnote omitted).
FN 15. United States v. Sensient Colors, Inc., 2009 WL 2905474 (D.N.J. Sept. 9, 2009).
FN 16. Magistrate Judge Andrew Peck, shortly before approving predictive coding in Moore, wrote in October 2011: "While anecdotally it appears that some lawyers are using predictive coding technology, it also appears that many lawyers (and their clients) are waiting for a judicial decision approving of computer-assisted review." Search, Forward, Law Technology News, October 2011.
FN 18. 77 Fed. Reg. 3191-01 (Jan. 23, 2012).
FN 19. Transcript of Oral Argument at 66-67, EORHB, Inc. v. HOA Holdings LLC, No. 7409-VCL (Del. Ch. Oct. 15, 2012).
FN 20. See, e.g., Moore, supra, at 8.
FN 22. See supra, n.8 and 9 and accompanying text.
FN 23. As has been widely reported, two days of hearings concerning the use of TAR were held in Kleen Products, LLC v. Packaging Corp. of America, No. 10-5711 (N.D. Ill.), where plaintiffs moved to compel defendants to use predictive coding after defendants had produced more than 3 million pages using traditional means. This approach is not recommended.
FN 24. Requesting parties have additional incentive to agree to TAR where a court may consider shifting the cost of review to the requesting party, as in Adair v. EQT Production Company, 2012 U.S. Dist. LEXIS 75132, at *11 (W.D. Va. May 31, 2012).
FN 25. Of course, with the receipt of productions of millions of pages of documents, requesting parties can also benefit by using TAR to prioritize their own reviews, expedite identification of the most relevant documents, and reduce overall costs.
FN 26. RAND and other sources also suggest that the use of TAR may be discouraged by outside counsel who stand to lose the significant revenue produced by large-scale manual document review. While the prospect of losing that revenue may cause some counsel to hesitate, the clients that stand to benefit most from TAR are sufficiently sophisticated and well-informed that they will soon adopt TAR with or without current counsel.
At Milberg, Henry J. Kelston is senior counsel, Ariana J. Tadler is partner, and Paul McVoy specializes in managing complex e-discovery issues.