Those old enough to have watched TV in the early ’80s will undoubtedly remember the FRAM oil filter commercial in which the mechanic utters his iconic catchphrase: “You can pay me now, or pay me later.” The gist of the vintage ad was that the customer could either pay a small sum now to replace his oil filter, or a far greater sum later to replace the car’s entire engine.

This “pay me now/pay me later” scenario perplexes many of today’s organizations as they try to effectively govern (understand, discover and retain) electronically stored information (ESI). The challenge is similar to the oil filter conundrum: companies can make rather modest upfront investments in retention/deletion decisions in order to prevent monumental downstream e-discovery charges.

Fortunately, savvy organizations are starting to realize that the cost of storage shouldn’t be the main factor in determining if data is ever deleted. Given the nearly unlimited storage that the cloud now makes available, the question shouldn’t be, “What does it cost to keep data indefinitely?” Instead, the more germane question is, “How much will it cost to search through endless terabytes/petabytes of data when there’s a governmental inquiry, e-discovery event or internal investigation?”

A number of recent surveys have shed light on this issue by contrasting the low cost of storage with the much higher cost of conducting basic e-discovery tasks, such as preservation, collection, processing, review and production. The results are startling. In a recent Association for Information and Image Management webcast, it was noted that “it costs about 20 cents a day to buy one gigabyte of storage, but it costs around $3,500 to review that same gigabyte of storage.” It turns out that the $3,500 review estimate (which sounds prohibitively high, standing alone) may actually be on the conservative side.

A study by the Minnesota Journal of Law, Science and Technology noted that e-discovery costs range anywhere from $5,000 to $30,000 per gigabyte. The $30,000 figure is also roughly in line with other per-gigabyte e-discovery cost models, according to another survey by the RAND Corp. In an article titled “Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery,” authors Nicholas M. Pace and Laura Zakaras conducted an extensive analysis and concluded that “the total costs per gigabyte reviewed were generally around $18,000.”
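To put these per-gigabyte figures side by side, a back-of-the-envelope comparison helps. The sketch below uses the AIIM and RAND numbers cited above; the 100 GB matter size is purely hypothetical.

```python
# Back-of-the-envelope comparison of storage cost vs. review cost,
# using the per-gigabyte figures cited in the surveys above.
STORAGE_COST_PER_GB = 0.20      # AIIM webcast: ~$0.20 per gigabyte of storage
REVIEW_COST_PER_GB = 18_000     # RAND study: ~$18,000 per gigabyte reviewed

matter_size_gb = 100            # hypothetical collection size

storage_cost = matter_size_gb * STORAGE_COST_PER_GB
review_cost = matter_size_gb * REVIEW_COST_PER_GB

print(f"Storage: ${storage_cost:,.2f}")                # Storage: $20.00
print(f"Review:  ${review_cost:,.0f}")                 # Review:  $1,800,000
print(f"Ratio:   {review_cost / storage_cost:,.0f}x")  # Ratio:   90,000x
```

The roughly five-orders-of-magnitude gap between the cost of keeping a gigabyte and the cost of reviewing it is the crux of the “pay me now/pay me later” argument.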

It’s likely that these review costs have practically hit bottom given the twin realities of review speeds and hourly wage compression. To begin, significant improvements in review speed are unlikely. The RAND survey found that “given the trade-off between reading speed and comprehension, especially in light of the complexity of documents subject to discovery in large-scale litigation, it is unrealistic to expect much room for improvement in the rates of unassisted human review.”

If review rates are near their theoretical maximum, then the only other way to impact the per-gigabyte document review calculation is to decrease the labor rates for the review task. This too is unlikely to produce any demonstrable gains. According to the RAND survey, “the rates currently paid to such project attorneys during large-scale reviews in the United States may well have bottomed out, with further reductions of any significant size unlikely.”

Accordingly, the legal community is likely stuck with something in the range of the $18,000 per gigabyte metric for human-based document review. Given that data is doubling every 18 months, there are only two foreseeable ways to address the oncoming onslaught of e-discovery costs.

  1. Use technology-assisted review (TAR), including predictive coding, to make the review process less manual. Predictive coding refers to a type of machine learning technology that can be used to automatically predict how documents should be classified. While human reviewers are still involved, their efforts are highly leveraged by the review platform to break through the barriers of the traditional linear review process. To properly deploy this technology, a smaller team of attorney reviewers trains the system by classifying (or coding) statistically significant subsets of the overall corpus of ESI based on criteria such as relevance and privilege. Once properly trained, the machine is able to extrapolate from those training sessions to the rest of the ESI population, thereby minimizing attorney review costs.
  2. Create an information governance program to defensibly manage and delete ESI. By managing information better, organizations can attack direct costs, such as the $18,000 per gigabyte review figure, as well as address the latent information risk that comes with keeping too much data around. If you also add in the risk of sanctions due to spoliation, the true (albeit still murky) information risk portrait comes into focus. It is often this calculation that is missing when legal departments go to bat to argue for the necessity of information governance solutions, particularly when faced with the host of typical objections (“storage is cheap,” “keep everything,” “there’s no return on investment for proactive information governance programs”).
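The train-then-extrapolate workflow behind predictive coding can be illustrated with a toy sketch. The snippet below is not any vendor’s actual review platform; it uses a minimal naive Bayes text classifier, written from scratch for illustration, and entirely hypothetical documents and coding labels.

```python
import math
from collections import Counter

def tokenize(doc):
    return doc.lower().split()

def train(coded_docs):
    """coded_docs: (text, label) pairs coded by attorney reviewers."""
    counts = {"relevant": Counter(), "not_relevant": Counter()}
    totals = Counter()
    for text, label in coded_docs:
        totals[label] += 1
        counts[label].update(tokenize(text))
    return counts, totals

def predict(model, doc):
    """Classify an unreviewed document using the trained model."""
    counts, totals = model
    n = sum(totals.values())
    vocab = set(counts["relevant"]) | set(counts["not_relevant"])
    best, best_score = None, float("-inf")
    for label in counts:
        # log prior + log likelihoods with add-one smoothing
        score = math.log(totals[label] / n)
        denom = sum(counts[label].values()) + len(vocab)
        for tok in tokenize(doc):
            score += math.log((counts[label][tok] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Step 1: attorneys code a small sample of the corpus (hypothetical docs).
training_set = [
    ("merger agreement draft attached", "relevant"),
    ("board discussion of merger terms", "relevant"),
    ("lunch order for friday", "not_relevant"),
    ("office holiday party schedule", "not_relevant"),
]
model = train(training_set)

# Step 2: the model extrapolates those decisions to the unreviewed ESI.
unreviewed = ["revised merger agreement terms", "party schedule for friday lunch"]
for doc in unreviewed:
    print(doc, "->", predict(model, doc))
```

In practice the training sample must be statistically significant and the predictions validated against human judgments, but the economics are visible even in the toy version: a handful of coded documents drive classification decisions across the entire remaining population.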

Human review costs are significant and not likely to decrease. This means that exploding data rates will inevitably turn into higher e-discovery costs for most organizations. The good news is that as the e-discovery market continues to evolve, practitioners (legal and IT alike) will come to a better and more holistic understanding of the latent information risk costs that the unchecked proliferation of data causes. It will be this increased level of transparency that permits the budding information governance trend to become a dominant umbrella concept that unites legal and IT. Failure to “pay now” by defensibly managing and deleting data means that organizations won’t be able to free themselves from the yoke of ever-escalating litigation budgets.