While keyword search on Google, in online shopping and even in legal research is safely ensconced in the public consciousness, the use of keywords in e-discovery is drawing considerable fire from a range of opponents. As we enter 2012, keyword search is officially under siege.
U.S. Magistrate Judge Andrew Peck, in his recent endorsement of computer-assisted review in Da Silva Moore v. Publicis Groupe SA, launched what is becoming an increasingly typical salvo by stating “key words, certainly unless they are well done and tested, are not overly useful.” Judge Peck was rather tame compared to other noted e-discovery commentators, like Ralph Losey, who have been even more vociferous. Mr. Losey explains in his noted Secrets of Search blog series that:
The First Secret: Keywords Search Is Remarkably Ineffective at Recall. First of all, and let me put this in very plain vernacular so that it will sink in, keyword search sucks. It does not work, that is, unless you consider a method that misses 80% of relevant evidence to be a successful method. Keyword search alone only catches 20% of relevant evidence in a large, complex dataset, such as an email collection. Yes, it works on Google, it works on Lexis and Westlaw, but it sucks in the legal world of evidence gathering.
“Go Fish” has no place in e-discovery
In order to really understand the “evils” Mr. Losey is attacking, it’s critical to understand the exact scenario that is being vilified, which often is referred to as “blind keyword search.” A blind keyword search workflow has been compared to the “Go Fish” game of random guessing, where the requesting party attempts (in a vacuum) to divine keywords that they think will identify responsive ESI.
This method of shooting arrows in the dark is correctly criticized by many of the e-discovery cognoscenti, including the standards body The Sedona Conference. In its Best Practices Commentary on the Use of Search & Retrieval Methods in E-Discovery, one of the definitive works regarding information retrieval in a legal setting, Sedona notes a fundamental issue with basic keyword searching, which occurs because “simple keyword searches end up being both over- and under-inclusive in light of the inherent malleability and ambiguity of spoken and written English.”
And yet, there’s no reason to throw the baby out with the bathwater, since keyword search can be a uniquely successful tactic to examine large amounts of ESI. Commentary to Sedona Principle 11 notes:
The selective use of keyword searches can be a reasonable approach when dealing with large amounts of electronic data… This exploits a unique feature of electronic information – the ability to conduct fast, iterative searches for the presence of patterns of words and concepts in large document populations.
Iteration and other keyword search best practices
Not surprisingly, the Sedona Commentary does not prescribe a specific recipe parties should follow. It does suggest, however, that the following key components should be part of an effective and defensible search methodology.
- Testing. Searches need to be tested for efficacy, i.e., whether the search is producing over- or under-inclusive results. As Sedona states: “[m]ore advanced keyword searches using ‘Boolean’ operators and techniques borrowed from ‘fuzzy logic’ may increase the number of relevant documents and decrease the number of irrelevant documents retrieved.”
- Sampling. The primary way to test the efficacy of a search is through sampling. In Victor Stanley v. Creative Pipe, Magistrate Judge Paul Grimm states that “[t]he only prudent way to test the reliability of the keyword search is to perform some appropriate sampling of the documents determined to be privileged and those determined not to be in order to arrive at a comfort level that the categories are neither over-inclusive nor under-inclusive.”
- Iterative feedback. Last, and certainly not least, it is critical to create an iterative feedback loop in order to learn from previous over- or under-inclusive searches. Additional Boolean operators can then be used to refine the search prior to significant human review.
If deployed, this more advanced type of keyword search is no longer akin to the Go Fish game, because actual data is examined as part of the search strategy process. Testing, sampling and iteration together improve both precision (fewer irrelevant documents retrieved, i.e., less over-inclusiveness) and recall (fewer relevant documents missed, i.e., less under-inclusiveness) beyond what a basic, blind keyword search achieves.
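To make the testing, sampling and iteration loop concrete, here is a minimal Python sketch. It is only an illustration, not a method prescribed by Sedona or any court: the five-document sample, its responsiveness codes, the helper names (matches, precision_recall) and the search terms are all hypothetical. It assumes a sample has already been drawn at random from the collection and coded by a human reviewer, and it measures how over-inclusive (low precision) or under-inclusive (low recall) a Boolean keyword query is against that sample.

```python
import re

def matches(text, all_of=(), any_of=(), none_of=()):
    """Whole-word, case-insensitive Boolean match (AND / OR / NOT).

    Search terms are assumed to be given in lowercase.
    """
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if any(term not in words for term in all_of):
        return False          # a required (AND) term is missing
    if any_of and not any(term in words for term in any_of):
        return False          # none of the alternative (OR) terms is present
    return not any(term in words for term in none_of)  # NOT terms exclude

def precision_recall(sample, **query):
    """sample: list of (text, is_responsive) pairs coded by a human reviewer."""
    tp = fp = fn = 0
    for text, responsive in sample:
        hit = matches(text, **query)
        if hit and responsive:
            tp += 1           # retrieved and responsive
        elif hit:
            fp += 1           # retrieved but not responsive (over-inclusive)
        elif responsive:
            fn += 1           # responsive but missed (under-inclusive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical, human-coded sample (True = responsive).
sample = [
    ("Please review the promotion decision for the Boston team", True),
    ("Promotion: 20% off office supplies this week", False),
    ("She was passed over for advancement again this year", True),
    ("Lunch menu for Friday", False),
    ("Notes on the promotion panel and training budget", True),
]

# First attempt: a single blind keyword.
first = precision_recall(sample, any_of=("promotion",))

# Refined after looking at the misses and false hits in the sample.
refined = precision_recall(
    sample, any_of=("promotion", "advancement"), none_of=("supplies",)
)

print("first:   precision=%.2f recall=%.2f" % first)
print("refined: precision=%.2f recall=%.2f" % refined)
```

On this toy sample the single keyword both retrieves an irrelevant marketing email and misses a responsive document that uses a synonym; the refined query fixes both. A real sample would be far larger, and a refinement tuned on one sample should be checked against a fresh one before being relied on.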
The role of keyword searches in a predictive coding landscape
Ironically, several new advances under the umbrella of technology-assisted review (TAR), including predictive coding, are causing much of the increased scrutiny of the role of keyword search. And yet, as seen in the workflow in the recently minted Da Silva Moore case, Judge Peck acknowledges the role of keyword searches as part of a larger predictive coding protocol:
The remainder of the seed set was created by MSL reviewing “keyword” searches with Boolean connectors (such as “training and Da Silva Moore,” or “promotion and Da Silva Moore”) and coding the top fifty hits from those searches.
Similarly, keyword searches were likely used to confirm that the targeted custodians were the right individuals because these searches can be deployed much more quickly than the more involved (and conceivably more accurate) predictive coding workflow.
It’s useful to look to the Sedona Conference to provide a “North Star” in these quickly changing times. In Practice Point 8 (to the Search Best Practices), Sedona wisely warns:
Parties and the courts should be alert to new and evolving search and information retrieval methods. What constitutes a reasonable search and information retrieval method is subject to change, given the rapid evolution of technology. The legal community needs to be vigilant in examining new and emerging techniques and methods which claim to yield better search results.
So, while the future is bright with a range of advances in information retrieval, it’s important to recognize that keyword search is an important arrow in the quiver of any e-discovery practitioner. Is keyword search alone perfect? No, but neither is any other e-discovery approach. The critical distinction is recognizing the strengths and weaknesses of any given technology and utilizing best practices (like testing, sampling and iteration) to address those weaknesses.
Fortunately, e-discovery doesn’t require perfection. So, for the foreseeable future, reasonable keyword search can be a useful and cost-effective e-discovery tool.