A recent commentary in The Economist states that, according to one estimate, “mankind created 150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes.” “The Data Deluge,” The Economist, Feb. 25, 2010. To put these numbers in perspective, all the catalogued books in the Library of Congress total 15 terabytes, while five petabytes (approximately 5,000 terabytes) is roughly equal to all the letters delivered by the U.S. Postal Service in 2010. “All too much: Monstrous amounts of data,” The Economist, Feb. 25, 2010.

An exabyte is about 1,000 petabytes, and is estimated to be equal to all the printed material in the world. Not surprisingly, the exponential growth of information results in challenges to the justice system, since evaluating electronically stored information (ESI) is often one of the most important facets of litigation. “The Sedona Conference Best Practices Commentary on the Use of Search & Information Retrieval Methods in E-Discovery,” 8 Sedona Conf. J. 189, 194 (2007).

In recent years, legal teams have relied on basic keyword-search technologies to identify and exchange ESI related to a matter. Although leveraging technology to search for keywords was a great leap forward among lawyers who historically reviewed paper documents without the aid of technology, keyword-search technology alone is often inadequate to meet electronic-discovery challenges without resulting in unnecessary risk, expense or both.

The problem is that keyword searches tend to be both over- and underinclusive, in light of the inherent ambiguity of language. For example, the ambiguous nature of the word “strike” could result in a keyword search that identifies documents relating to labor unions, a military action, bowling or a baseball game, even though the matter in question deals only with a labor union strike. The overinclusive nature of the search results could have serious financial consequences, since the excess documents that are retrieved must typically be segregated as responsive or nonresponsive by legal teams paid hourly for document review. Alternatively, failing to guess the right keyword search terms could have the opposite underinclusive effect, since potentially relevant documents may be overlooked. The failure to retrieve these documents may affect not only the outcome of the matter; it could also lead to sanctions if the information is not identified and produced to the requesting party.
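The "strike" example can be made concrete with a minimal sketch. The five documents and their relevance labels below are invented for illustration, and `keyword_search` is a hypothetical helper rather than any product's API; it simply matches the term as a whole word.

```python
import re

# Hypothetical mini-corpus; the texts and relevance labels are invented.
DOCUMENTS = {
    "doc1": "The union voted to strike after contract talks collapsed.",
    "doc2": "He bowled a strike in the final frame.",
    "doc3": "The pitcher threw a third strike to end the inning.",
    "doc4": "Workers staged a walkout over the new shift schedule.",
    "doc5": "The air strike was ordered at dawn.",
}
RELEVANT = {"doc1", "doc4"}  # documents actually about the labor dispute


def keyword_search(documents, keyword):
    """Return the ids of documents that contain the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return {doc_id for doc_id, text in documents.items() if pattern.search(text)}


hits = keyword_search(DOCUMENTS, "strike")
false_positives = hits - RELEVANT  # overinclusive: bowling, baseball, military
missed = RELEVANT - hits           # underinclusive: the "walkout" document

# Precision: share of retrieved documents that are relevant.
precision = len(hits & RELEVANT) / len(hits)
# Recall: share of relevant documents that were retrieved.
recall = len(hits & RELEVANT) / len(RELEVANT)

print(sorted(false_positives), sorted(missed), precision, recall)
```

In this toy collection the single keyword retrieves three irrelevant documents and misses the one relevant document that never uses the word, which is the over- and underinclusive pattern described above.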


More intelligent search technology evolved to address the limitations of basic keyword-search technology. Although none of these technologies overcame every limitation of keyword searching, “traditional concept search” technology made strides in addressing the risk of underinclusive keyword searches by expanding search results to include conceptually related documents. The trade-off is that traditional concept-search technology tends to produce overinclusive search results that could significantly increase downstream processing and review costs.

Traditional concept searching reduces the risk of information being overlooked by leveraging technological advances to search more broadly than keyword-search technology, but the benefits are often overshadowed by the technology’s lack of precision. In other words, traditional concept searching reduces risk by retrieving a much larger number of potentially responsive documents than keyword searching to ensure thoroughness, but many of these documents are irrelevant or “false positives” that must be segregated from responsive documents through manual (linear) document review at great cost. According to industry estimates, attorneys typically review between 50 and 60 documents per hour when conducting a linear page-by-page document review. That means if law firm associates are billed out at an average of $100 per hour to review 50,000 documents, the review would take roughly 830 to 1,000 hours and cost somewhere between about $83,000 and $100,000. Bennett B. Borden, “The Demise of Linear Review,” Williams Mullen E-Discovery Alert, October 2010.
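The arithmetic behind that cost estimate can be checked with a short sketch. It assumes a constant review pace and a single pass over every document; `review_cost` is a hypothetical helper name, and the inputs are the figures quoted above.

```python
def review_cost(num_documents, docs_per_hour, hourly_rate):
    """Return (hours, cost) for one linear pass at a constant review pace."""
    hours = num_documents / docs_per_hour
    return hours, hours * hourly_rate


# Figures from the text: 50,000 documents, 50 to 60 documents per hour, $100 per hour.
slow_hours, slow_cost = review_cost(50_000, 50, 100)
fast_hours, fast_cost = review_cost(50_000, 60, 100)

print(f"{slow_hours:,.0f} hours, ${slow_cost:,.0f}")
print(f"{fast_hours:,.0f} hours, ${fast_cost:,.0f}")
```

At 50 documents per hour the review takes 1,000 hours; at 60 documents per hour it takes about 833 hours, which sets the lower end of the cost range.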

The problem stems from the fact that traditional concept searching is performed in a “black box.” A black-box search means users have no visibility or control over which concepts are included as part of a search, because every concept related to the chosen key term is automatically included whether or not the concept is relevant. This one-size-fits-all approach means every traditional concept search tends to be unnecessarily broad, since users cannot intelligently narrow the search results.

For example, although concept searching the term “strike” to investigate a labor dispute would likely recall relevant documents about labor union contracts, it might also recall a greater number of irrelevant documents than a basic keyword search. Since linear document review is one of the costliest facets of e-discovery, lawyers and their clients may choose to run the risk of providing incomplete document productions rather than using traditionally broad concept-search technology to help minimize these risks.


Despite initial limitations, advances in concept-search technology that add visibility into search construction can reveal a gold mine of more relevant search results. The key is to enable attorneys to refine the relevancy and enhance the precision of their searches to increase the accuracy of search results. In addition to reducing the risk that relevant documents are overlooked, next-generation concept-search technology can significantly reduce the time and expense resulting from overinclusive document retrieval. This is accomplished by providing a transparent view into the contents of the “black box,” and listing concepts related to a particular keyword. This transparency allows users to select (or deselect) concepts related to a particular term before the search is executed so search results will be more relevant.

The ability to choose which concepts will be included as part of a search introduces a new era in early case assessment that gives users early insight into documents that may be critical to the outcome of a case. For example, searching for the keyword “diamond” as part of an investigation into insider trading of Apple stock by Diamond Investment Co. would yield significantly different results depending on the technology used. Traditional black-box concept-search tools would automatically include every concept (such as other precious gems) related to the term “diamond,” and return a high number of irrelevant documents, thereby delaying the ability to assess the case thoroughly and quickly.

On the other hand, concept-search technology that lets users see the concepts related to the word “diamond” and select only those they intend to include in the search (such as “investment” and “insider”) will return far more relevant search results. The ability to search with precision enables legal teams to understand the case and assess strategy early, instead of following traditional approaches that tend to require painstaking review of irrelevant documents before the case can be properly assessed.
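The difference between black-box and transparent concept searching can be illustrated with a toy model. Everything in it is an assumption made for illustration: the concept map is hard-coded (a real tool would presumably derive related concepts from the document collection), the four memos are invented, and a document counts as a hit if it mentions at least one active concept.

```python
import re

# Hypothetical concept map, hard-coded for illustration only.
RELATED_CONCEPTS = {
    "diamond": ["gemstone", "jewelry", "carat", "investment", "insider"],
}

# Invented documents.
DOCUMENTS = {
    "memo1": "Diamond Investment Co. insider discussed Apple shares before the earnings call.",
    "memo2": "The jewelry store appraised a two carat diamond ring.",
    "memo3": "Our investment committee reviewed the Diamond account trades.",
    "memo4": "A flawless gemstone sold at auction.",
}


def tokens(text):
    """Lowercase a text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def list_concepts(keyword):
    """Transparent step: show the user which concepts the tool would add."""
    return list(RELATED_CONCEPTS.get(keyword, []))


def concept_search(documents, keyword, selected=None):
    """Return ids of documents mentioning at least one active concept.

    selected=None models the black box: every related concept is active.
    Otherwise only the concepts the user selected are active.
    """
    related = list_concepts(keyword)
    active = set(related) if selected is None else set(related) & set(selected)
    return sorted(doc_id for doc_id, text in documents.items()
                  if tokens(text) & active)


black_box = concept_search(DOCUMENTS, "diamond")
transparent = concept_search(DOCUMENTS, "diamond", selected={"investment", "insider"})

print(list_concepts("diamond"))
print(black_box)
print(transparent)
```

In this sketch the black-box search returns all four memos, including the two about jewelry and gemstones, while the search restricted to "investment" and "insider" returns only the two memos that concern the investment firm.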

Next-generation concept-search technology also expedites document review by providing tools to review ESI in a less expensive nonlinear fashion. Attorneys who review randomly organized documents are not as productive as they could be if the same set of documents were logically organized according to similar concepts and degrees of relevance. Studies have shown that reviewers using advanced techniques can review documents up to six times faster than traditional linear reviewers and with better accuracy. Applying a sixfold increase to the review rates mentioned earlier would result in an astounding reduction in review costs exceeding 80%. The ability to construct intelligent concept searches for documents with a degree of precision that is not possible in traditional concept-search tools allows documents to be organized and reviewed logically. The more logical the connection between the documents being reviewed by a person, the faster and more accurately that person will be able to review those documents.
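The claimed savings follow from the speedup alone if the hourly rate and the number of documents stay fixed, which is the assumption made in this short sketch. The baseline reuses the 50,000-document, 50-documents-per-hour, $100-per-hour figures cited earlier; `cost_reduction` is a hypothetical helper name.

```python
def cost_reduction(speedup):
    """Fraction of review cost saved when the same documents are reviewed
    `speedup` times faster at an unchanged hourly rate."""
    return 1 - 1 / speedup


baseline_cost = 50_000 / 50 * 100     # 1,000 hours at $100 per hour
accelerated_cost = baseline_cost / 6  # same documents reviewed six times faster
savings = cost_reduction(6)

print(f"${baseline_cost:,.0f} -> ${accelerated_cost:,.0f} ({savings:.1%} saved)")
```

A sixfold speedup leaves one-sixth of the original cost, so the saving is five-sixths, or a little over 83%, consistent with the "exceeding 80%" figure.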

No search technology is a “silver bullet” solution that can completely eliminate the risk and expense of manual document review. However, by giving users the flexibility to apply a wide variety of search technologies that provide transparency and control over searches, legal teams will be empowered to substantially reduce the risk and expense of e-discovery, and maximize strategic advantages.

Matthew Nelson is senior e-discovery counsel at Clearwell Systems, where he leverages his legal and technology background to help organizations address challenges related to e-discovery, compliance and records management.