When was the last time you sat at your computer and deleted old files? Yesterday? Never? Don’t remember? Before today’s ubiquitous search engines, there was practical value in being a filer rather than a piler — it was difficult to find a document in a filing cabinet without an index.
Today’s sophisticated search engines obviate the need to index manually. Search technology is wonderful if we know what we are looking for, but is it an information management panacea? Information is growing at such an astonishing rate that the numbers used to communicate growth projections are now so huge as to be almost meaningless.
Until recently, this unfettered growth was generally viewed as hazardous. It drives up storage costs, makes it difficult to find the wheat among the chaff, and increases electronic data discovery risk and cost, the argument goes. The resulting mantra: “We need to categorize it, control it, and clean it up!”
Companies have spent decades paralyzed by a near inability to adapt modernist paper records management programs to decidedly postmodern information systems. Today, no part of the organization (including IT) exerts centralized command and control over data, and we have yet to find an easy replacement for the file clerk.
Enter Big Data, where uncontrollable information growth is no longer viewed as evil, or even a necessary evil. In the Big Data world, system administrators now treat bursting databases and file shares not as a shameful secret shared sotto voce in committee meetings, but as something to brag about. In Big Data, information has no downside. It is exalted in Davos, where the World Economic Forum recently “declared data a new class of economic asset, like currency or gold.” It’s been profiled by The New York Times. Proponents call it “the new oil,” proclaiming it presents the biggest opportunities since the dawn of the Internet.
So why does Big Data matter to the legal community? Because it heralds a new battle, over a single question: Should we keep the information we create forever, or should we throw some of it away? The answer used to be simple. It was not feasible to keep everything. The cost was too high, the effort too great. Overburdened systems fail. Information overload reduces productivity. Data must be migrated from old to new systems, with great difficulty and expense.
The chance that you might have a smoking gun buried in the data creates too high a risk of liability. After all, if we learned one lesson from the seminal EDD cases that grew out of the bankruptcies of Enron (Arthur Andersen LLP v. United States, 544 U.S. 696, 704 (2005)) and Sunbeam (Coleman (Parent) Holdings, Inc. v. Morgan Stanley & Co., Inc., No. 502003CA005045XXOCAI (Fla. Cir. Ct., March 1, 2005)), it is that data skeletons in the closet can be spooky.
But Big Data changes the calculus. Apache Hadoop, open-source software modeled on techniques Google published for processing data at Internet scale and developed largely at Yahoo, brings that scale and speed to just about any organization, and it can run on cheap, off-the-shelf hardware. Tools to analyze the data (some first commercialized in EDD) are accessible and powerful, promising profound new business and societal insights drawn from the vast pools of data. The fundamental promise of Big Data is that it enables insights into business (and the world) that were not possible before. Proponents see Big Data creating a better world, one fulfilling the promise of the Internet itself.
But Big Data advocates downplay the downsides of data, and specifically, the EDD challenges. In the near-Nirvana contemplated by some Big Data proponents, all data is good and more data is better. In EDD, the opposite is usually true.
A recent study by the Pew Research Center about the future of Big Data was positive overall, but acknowledged concerns related to privacy, social control, misinformation, civil rights abuses, and the possibility of simply being overwhelmed by the deluge of data. The burden of finding, processing, and producing Big Data in EDD, all too familiar within the legal community, is a foreign concept to most Big Data advocates. Perhaps this is because the Big Data “hype cycle” has not yet reached the “trough of disillusionment,” where the hype meets the reality of corporate culture and complex legal and compliance requirements.
Records management doctrines specify that organizations should clearly define the business or legal purpose of a piece of information when created. That analysis determines whether, for how long, and in what form the data should be kept. Records retention schedules are intended to provide a measure of defensibility against spoliation claims, as they evince an intent to delete a record based on a proactive and standardized calculation of its value, rather than a reactive determination based on fears about bad evidence. Many organizations have attempted to play records management catch-up in advance of pending litigation and have paid the price.
Big Data advocates argue that the economies of scale now make it feasible and desirable to capture and store information that currently has no clear or definable business value. Although large organizations have long collected and analyzed data (using business intelligence software), proponents argue that Big Data is different. They posit that cheaper storage and technical innovations make it easier and faster than ever before to analyze that data, eliminating the need to identify the business purpose of data before it is collected and retained.
With Big Data, no rigid “schema” or organizational approach is necessary before capturing content (unlike in a traditional database). Data professionals now (or in the future) can ask open-ended questions of the data. That includes questions that may not be germane now, but may be critical in an unpredictable future.
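To make the "no schema up front" idea concrete, here is a minimal sketch in Python. It is an illustration only, not how Hadoop or any particular Big Data product works: the sample records, field names, and the helpers `load`, `ask`, and `mentions` are all hypothetical. The point is that differently shaped records are captured as-is, and the question is chosen only at the moment the data is read.

```python
import json

# Hypothetical raw records, captured as-is. Each has different fields,
# and no table definition was agreed on before they were stored.
RAW_LINES = [
    '{"type": "email", "from": "alice@example.com", "subject": "Q3 forecast"}',
    '{"type": "chat", "user": "bob", "text": "see the Q3 forecast deck"}',
    '{"type": "log", "host": "web01", "status": 500}',
]


def load(lines):
    """Parse each line independently; records need not share a structure."""
    return [json.loads(line) for line in lines]


def ask(records, predicate):
    """Apply a question (any predicate) chosen at read time, not capture time."""
    return [record for record in records if predicate(record)]


def mentions(term):
    """Build a predicate that matches any string field containing `term`."""
    term = term.lower()

    def predicate(record):
        return any(
            isinstance(value, str) and term in value.lower()
            for value in record.values()
        )

    return predicate


records = load(RAW_LINES)
hits = ask(records, mentions("q3 forecast"))
print([record["type"] for record in hits])  # ['email', 'chat']
```

The same flexibility is what complicates e-discovery: because nothing declared in advance what each record is for, nothing declared how long it should be kept either.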
As a result, more data will be kept longer, in a manner that is unmoored from records management tenets. Without a doubt, this philosophy will complicate the governance and e-discovery of data.
So, when was the last time you sat down in front of your computer and deleted old files? In the world of Big Data, this is not only unnecessary, it’s undesirable. And it’s a waste of time.
Should we keep everything forever? Absolutely not. Too much information still has a downside. It is a liability, as well as an asset. Information has risk. Information has real, unavoidable legal and regulatory requirements. Information has a bite that Big Data proponents ignore at their peril.
But the good news: The same tools and infrastructure that empower the potentially profound insights of Big Data can and should be employed to help organizations make informed decisions about data retention. A vast amount of unstructured data in many organizations (over half, according to some studies) is duplicate, outdated, transitory junk that has no business value. Getting rid of this information en masse, without dragging every employee into the process, is now possible.
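As a rough illustration of how duplicates can be found without involving every employee, the sketch below groups documents by a hash of their content. It is a simplified example under stated assumptions: the file paths and contents are invented, the helper `find_duplicates` is hypothetical, and it catches only byte-for-byte copies. Deciding what is outdated or transitory, and whether anything is subject to a legal hold, still requires policy and human judgment.

```python
import hashlib
from collections import defaultdict


def find_duplicates(documents):
    """Group document names by the SHA-256 digest of their content.

    Returns only the groups with more than one member, i.e. exact copies.
    """
    groups = defaultdict(list)
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical file-share contents (path -> raw bytes), for illustration only.
DOCUMENTS = {
    "shared/policy_v1.txt": b"Travel policy 2011",
    "users/bob/policy_copy.txt": b"Travel policy 2011",
    "shared/minutes.txt": b"Committee minutes",
}

duplicates = find_duplicates(DOCUMENTS)
print(duplicates)  # [['shared/policy_v1.txt', 'users/bob/policy_copy.txt']]
```

A list like this is a starting point for review, not a deletion order: one copy in each group may be the record that the retention schedule says to keep.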
E-discovery is where the cost of information management myopia becomes painfully visible, which is why EDD has consistently driven innovation in handling and understanding vast amounts of data. Even with these innovations, however, the risk and cost of information in EDD are undeniable, and they rise with the overall volume of information in the organization.
These are the contours of the coming battle between Big Data and e-discovery. It is a philosophical and cultural battle. It is the responsibility of EDD and information governance attorneys and practitioners to gird themselves for this battle. Learn about Big Data, and inform the discussion and decisions in your organization.