Over the past two years, the issue of predictive coding—using computer algorithms to aid in the determination of discoverable material—has attracted a great deal of attention. Yet this new frontier of legal technology has also raised significant issues, including how the work-product doctrine and its protections will translate as parties and courts confront the use of technology in the discovery process.1 In this article, we briefly survey the recent decisions concerning the use of predictive coding, and explore the conflict over the extent to which the core processes of predictive coding are discoverable, including information about the “seed set” used to train the relevant computer algorithm.

TAR: Predictive Coding

Predictive coding is a specific aspect of Technology-Assisted Review (TAR), an effort to make the e-discovery process more efficient through the effective use of technology. (For example, one commonly utilized TAR function is the use of keyword searches to reduce the volume of material to be reviewed.) Specifically, predictive coding employs computer algorithms to sort through the documents collected in the e-discovery process and to select the relevant material for production. This is not a purely automated process, however; the computer algorithms “learn” which documents are likely responsive through interactions with a human reviewer.

The human reviewer codes a “seed set” of documents, marking which documents are responsive or non-responsive to specific issues. To be effective, the process will include identifying the “key” documents in a matter (to the extent known), as well as documents that are entirely irrelevant to the matter but likely to exist in the overall review population. The computer algorithm observes the properties of the documents in this “seed set” and, based on these observations, is subsequently able to classify the other documents in the set of potentially relevant material without recourse to a human reviewer.

This process does not stop with one round of analysis. Effective use of computer-assisted review requires an iterative process: after the seed set is developed, the computer algorithm provides tentative classifications, which are then confirmed or rejected by a human reviewer. This feedback loop, in which human reviewers re-code documents classified by the technology, serves to train the system.
The documents used in this iterative process are sometimes considered part of the underlying seed set, and are sometimes referred to as “training sets.” Some practitioners believe that, in an appropriate case, a properly devised computer algorithm can be faster, cheaper, and more accurate than other methods of document review.2
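For readers unfamiliar with how such a classifier operates, the seed-set and iterative-training workflow described above can be sketched, in greatly simplified form, as a toy nearest-centroid text classifier. This is purely illustrative: the document texts, labels, function names, and scoring method are invented for the example, and commercial predictive-coding tools use far more sophisticated statistical models.

```python
from collections import Counter
import math

def vectorize(text):
    """Represent a document as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

def train(seed_set):
    """Build centroids from a human-coded seed set of (text, responsive) pairs."""
    responsive, non_responsive = Counter(), Counter()
    for text, is_responsive in seed_set:
        (responsive if is_responsive else non_responsive).update(vectorize(text))
    return responsive, non_responsive

def classify(model, text):
    """Tentatively classify a document: True means 'likely responsive'."""
    responsive, non_responsive = model
    v = vectorize(text)
    return cosine(v, responsive) >= cosine(v, non_responsive)

def refine(seed_set, reviewer_feedback):
    """Iterative step: fold reviewer-confirmed labels back into the training set."""
    return train(seed_set + reviewer_feedback)

# A hypothetical seed set coded by a human reviewer.
seed = [
    ("draft merger agreement terms", True),
    ("board memo on merger pricing", True),
    ("cafeteria lunch menu", False),
    ("holiday party invitation", False),
]
model = train(seed)
print(classify(model, "revised merger agreement"))   # likely responsive
print(classify(model, "holiday lunch menu"))         # likely non-responsive
```

The `refine` step mirrors the article's iterative feedback process: each round of reviewer-confirmed or reviewer-rejected classifications is folded back into the training data, so the model's tentative classifications improve over successive rounds.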