Litigation and investigations create angst for business executives because of how they affect the bottom line. In large part, this comes from the cost and time required to analyze data during discovery (a process known as document review), which often takes up most of the discovery budget. However, a spectrum of tools exists that accelerates document review, reducing cost and time by as much as 70 to 80 percent. As a result, it is important to consider this spectrum before deciding how to conduct a review.

For now, the spectrum includes:

  • Email threading
  • Enhanced linear review
  • Clustering
  • Technology-assisted review

In this article, we will discuss this spectrum of solutions, explain their ideal use cases, and identify the strengths and weaknesses of each approach.

The old-school way of reviewing documents was linear: looking at each document in turn to assess it for relevance and privilege. This approach made sense because the cost and time it took to assess each document seemed justified by the risks it was perceived to manage. With the advent of computers, PDAs and other methods for conducting business electronically, the sheer volume of potentially relevant information in most cases has made this manner of review ineffective. The solutions addressed herein have also made it passé.

Most employees communicate electronically, making email a significant category of ‘documents’ in litigation. Email conversations are also often repetitive, so the first step in accelerating review is to identify these redundant emails and tag them. This is done by a process called email threading, whereby emails within a conversation are grouped and only the most inclusive email is reviewed. The tag applied to that email is then applied consistently across the entire thread. As a result, both the time needed to review this category of documents and its cost are reduced, sometimes by as much as 50 percent. Email threading can also be used with other technologies to accelerate the document review process further, particularly in cases that have other language redundancies as described below.
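The threading steps described above can be illustrated with a minimal sketch. It rests on two simplifying assumptions that are not part of any particular product: conversations are grouped by subject line with reply and forward prefixes stripped, and the most inclusive email is taken to be the one with the longest body. The function names and the sample emails are hypothetical; commercial tools generally rely on richer signals such as message headers and quoted-text comparison.

```python
import re
from collections import defaultdict

def normalize_subject(subject):
    """Strip leading RE:/FW:/FWD: prefixes so replies share a key with the original."""
    stripped = re.sub(r"^\s*((re|fw|fwd)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()

def thread_emails(emails):
    """Group emails into conversations keyed by normalized subject (an assumption)."""
    threads = defaultdict(list)
    for email in emails:
        threads[normalize_subject(email["subject"])].append(email)
    return dict(threads)

def most_inclusive(thread):
    """Assume the longest body quotes the earlier messages in the conversation."""
    return max(thread, key=lambda e: len(e["body"]))

def propagate_tag(thread, tag):
    """Apply the reviewer's decision to every email in the conversation."""
    for email in thread:
        email["tag"] = tag

# Hypothetical sample data.
emails = [
    {"id": 1, "subject": "Budget", "body": "Draft attached."},
    {"id": 2, "subject": "RE: Budget", "body": "Looks fine.\n> Draft attached."},
    {"id": 3, "subject": "Fwd: RE: Budget", "body": "FYI.\n> Looks fine.\n> > Draft attached."},
    {"id": 4, "subject": "Lunch", "body": "Noon?"},
]

threads = thread_emails(emails)
budget = threads["budget"]
reviewed = most_inclusive(budget)   # the reviewer reads only this message
propagate_tag(budget, "relevant")   # the decision is applied to the whole thread
```

In this sketch the reviewer reads one message out of three in the budget conversation, which is where the time savings come from.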

An option that addresses such redundancy is enhanced linear review. This solution presents documents to reviewers electronically, making it much easier to identify redundancy in them. A number of solutions offer this feature, and some are more intuitive than others. With the more intuitive ones, the time required to train your reviewers and get them started is minimal.

Another way of accelerating document review is clustering technology. This process uses text analytics to index all of the words in the data set and then assess how frequently these words appear. The system also assesses how close to one another these words appear, and it uses both measures to determine how the documents relate to one another and to group them into clusters. It then derives labels based on the predominant keywords within each grouping.
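A rough sketch of the clustering idea follows. It is a simplification under stated assumptions: it uses only word frequency (the proximity measure described above is left out), compares documents by cosine similarity, and assigns each document greedily to the closest existing cluster when the similarity clears a hypothetical threshold of 0.3. The function names, stopword list and sample documents are illustrative and do not describe any vendor's algorithm.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "for", "in", "on", "is", "was"}

def term_counts(text):
    """Index the words in a document and count how often each appears."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_documents(docs, threshold=0.3):
    """Greedy single pass: join the most similar cluster or start a new one."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [doc index]}
    for i, text in enumerate(docs):
        vec = term_counts(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # accumulate word counts for labeling
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters

def label_cluster(c, n=3):
    """Label a cluster with its most frequent words."""
    return [term for term, _ in c["centroid"].most_common(n)]

# Hypothetical sample documents.
docs = [
    "Quarterly revenue forecast and revenue targets",
    "Revised revenue forecast for the quarter",
    "Patent license agreement draft",
    "Comments on the patent license draft",
]
clusters = cluster_documents(docs)
```

Because no reviewer supplies search terms or categories in advance, the groupings and their labels come entirely from the documents themselves.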

One key benefit of clustering is that it organizes the data set objectively, without preconceptions, which is helpful in exposing unexpected information that can help or hurt a case. This factor is particularly helpful when reviewing production sets since it can give you a much quicker idea of what the opposing party has produced to you.

Last, but not least, is technology-assisted review (TAR). We have read a great deal about how TAR can hyper-accelerate document review, particularly in cases that involve a significant amount of data, require document review to be completed quickly, or both.

The two categories of TAR are artificial intelligence-based TAR and language-based TAR. The most significant benefit of these approaches is that they hyper-accelerate review, saving a great deal of time and money relative to linear review. Which approach is most appealing depends on the time available to train the application and the need for transparency across the process.

Since artificial intelligence-based TAR identifies relevant documents by computer, its ability to understand context is driven by the information provided to it. Hence, greater attention should be given to the training set used to identify these documents. Generally, this means that enough time must exist to read about 10,000 documents to train the system.
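To make the role of the training set concrete, here is a deliberately small sketch of the general idea: reviewers code a seed set, the system learns which words are associated with relevant documents, and the remaining documents are ranked by score. The scoring method (smoothed log-odds per word) is an assumption chosen for brevity and is not how any specific TAR product works. The function names and the four-document seed set are hypothetical, and a real training set would be far larger, as noted above.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def train(seed_set):
    """seed_set: list of (text, is_relevant) pairs coded by human reviewers.
    Returns a weight per word: positive leans relevant, negative leans not."""
    rel, non = Counter(), Counter()
    for text, is_relevant in seed_set:
        (rel if is_relevant else non).update(set(tokens(text)))
    n_rel = sum(1 for _, r in seed_set if r)
    n_non = len(seed_set) - n_rel
    vocab = set(rel) | set(non)
    # Add-one smoothing keeps unseen combinations from producing log(0).
    return {
        t: math.log((rel[t] + 1) / (n_rel + 2)) - math.log((non[t] + 1) / (n_non + 2))
        for t in vocab
    }

def score(weights, text):
    """Sum the weights of the distinct known words in a document."""
    return sum(weights.get(t, 0.0) for t in set(tokens(text)))

def rank(weights, documents):
    """Order unreviewed documents from most to least likely relevant."""
    return sorted(documents, key=lambda d: score(weights, d), reverse=True)

# Hypothetical seed set and unreviewed documents.
seed = [
    ("pricing agreement with competitor", True),
    ("competitor pricing call notes", True),
    ("lunch menu for friday", False),
    ("office party friday", False),
]
weights = train(seed)
unreviewed = ["friday lunch plans", "call about competitor pricing"]
ranked = rank(weights, unreviewed)
```

The sketch only knows the words it saw in the seed set, which illustrates why the quality and size of that set matter so much.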

Conversely, if the data set contains complex semantic patterns, use language-based TAR. Because it leverages human decision-making rather than a computer to identify relevant documents, this language-based approach is better able to recognize these semantic patterns and how they can contain relevant information. This process also provides greater transparency because the decisions are made by humans rather than by an artificial intelligence system. Lastly, language-based TAR is preferable when the decisions made in the initial review could be reused across subsequent, similar matters to further expedite the review process.

A best practice to accelerate document review is to include the entire spectrum in your toolkit. Few cases are similar enough in volume, data type and the like to justify always relying on one approach. As a consequence, make it a practice to consider these options with each case. Lastly, include a document review expert to facilitate this analysis, and task that expert with documenting the decision-making process as well. Each of these steps will enhance the likelihood that the document review is as time- and cost-effective as it can possibly be.