In 2009, A&E Television Networks began broadcasting a reality-based program, Hoarders. Each week the show examines the case of an individual who suffers from a symptom of obsessive compulsive disorder or obsessive compulsive personality disorder that causes him or her to experience extreme distress at the prospect of discarding specific items, or in some cases, anything at all.
According to the show's website: Hoarders not only captures the drama as experts work to help each person get on the road to recovery, but also highlights the individual's inner challenges and triumphs. Although cleaning marks the first step of tackling this disorder, success is not definite. For some individuals, throwing away even the tiniest object is so traumatizing that they are not able to allow the cleaning process to continue, no matter how it may impact their lives.
Extreme examples have included episodes chronicling a woman with 76 cats, a man who saved every copy of National Geographic ever printed, and a woman who could not part with a collection of 50,000 dolls. The audience watches in amazement as a person seems unable to make the rational decision to throw junk away in order to improve his or her life.
Unfortunately, this condition is not confined to individuals, but wreaks havoc on large organizations as well. Just as it carries significant consequences for people, the toll it exacts on corporations is equally destructive, even if not readily apparent. There are significant costs associated with the tendency to save nearly everything, regardless of its value.
Indeed, the failure to dispose of anything is itself a decision that all data is of equal value, imposes the same risk on an organization, and is justified by the costs imposed on the organization. Much like hoarders, an organization cannot be as productive in this state, because no one can find, use, or protect what is actually valuable to the organization.
DO THE MATH
A recent article in Science magazine, "The World's Technological Capacity to Store, Communicate, and Compute Information," stated that collectively we have accumulated 295 exabytes of information. While legal and corporate IT departments are finally getting a grip on managing terabytes and moving on to petabytes, exabytes are lurking and ready to be thrust into reality. According to the "Gartner IT Key Metrics Data 2012" report, the total cost of storing and managing a petabyte of information is nearly $5 million per year. Loosely, this translates to about $5,000 per terabyte per year. However, this is only part of the story. Assume that an organization storing 10 petabytes of data has about 1 petabyte of email spread throughout its IT infrastructure, including production email, PSTs, Lotus Notes files (NSFs), other loose email files on individual hard drives or file shares, and an email archive. (We are purposely avoiding the issue of backup tapes.)
Further assume that an organization of this size might pay upwards of $20 million a year for electronic data discovery. From this figure, it is possible to back into the EDD "tax" that must be assessed on a given terabyte of data from a target-rich environment such as email. This year the RAND Institute for Civil Justice issued a study, "Where the Money Goes: Understanding Litigant Expenditures for Producing Electronic Discovery," that showed median costs of $910 per gigabyte for collection, $2,931 per gigabyte for processing, and $13,636 per gigabyte for review.
Plugging these numbers into the $20 million spent by the corporation above, we arrive at a "probability of review" for a given message of about 0.1 percent. Therefore, for every terabyte of key data, we see an EDD "tax" of 1 gigabyte, or about $15,000 when the costs of collection, processing, and review are tallied. If we add the $5,000 in hard costs from the IT figure above, we arrive at about $20,000 per terabyte per year.
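For readers who want to check the arithmetic, the calculation can be sketched in a few lines of Python. This is a rough illustration, not the authors' own model: the variable names are invented here, decimal units are assumed (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes), and the only inputs are the figures quoted above. Run without rounding, the three RAND medians sum to about $17,500 per gigabyte, so the "about $15,000" in the text should be read as a loose round number.

```python
# Back-of-the-envelope check of the EDD "tax", using only figures quoted in the text.
# Variable names are illustrative; decimal units (1 PB = 1,000 TB) are an assumption.
TB_PER_PB = 1_000
GB_PER_TB = 1_000

# IT "hard cost": nearly $5 million per petabyte per year (Gartner figure in the text).
it_cost_per_pb = 5_000_000
it_cost_per_tb = it_cost_per_pb / TB_PER_PB

# RAND median costs per gigabyte: collection, processing, review.
collection, processing, review = 910, 2_931, 13_636
edd_cost_per_gb = collection + processing + review

# Assumed organization: $20 million a year on EDD, 1 petabyte of email.
annual_edd_spend = 20_000_000
email_gb = 1 * TB_PER_PB * GB_PER_TB

# How many gigabytes does the EDD budget pay for, and what share of email is that?
gb_reviewed = annual_edd_spend / edd_cost_per_gb
probability_of_review = gb_reviewed / email_gb

# With the probability rounded to 0.1 percent, 1 GB of every TB is reviewed.
edd_tax_per_tb = round(0.001 * GB_PER_TB * edd_cost_per_gb)

print(f"IT cost per terabyte:    ${it_cost_per_tb:,.0f}")
print(f"EDD cost per gigabyte:   ${edd_cost_per_gb:,}")
print(f"Probability of review:   {probability_of_review:.2%}")
print(f"EDD tax per terabyte:    ${edd_tax_per_tb:,}")
print(f"Combined per terabyte:   ${it_cost_per_tb + edd_tax_per_tb:,.0f}")
```

Dividing the $20 million directly by the 1,000 terabytes of email gives the same order of magnitude, about $20,000 of EDD cost per terabyte, so the conclusion is not sensitive to how the intermediate figures are rounded.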
However, for the purposes of this analysis let's set aside the EDD costs. Finance departments often struggle to properly account for projected costs that are probabilistic, discounting these costs to the "best case scenario." Considering solely the IT costs of $5,000 per terabyte, some rather ominous mathematical calculations begin to take shape.
According to the Compliance, Governance and Oversight Council (CGOC), the amount of data that an organization could defensibly dispose of is staggering. The Council's postulate is that information must be retained for one of three reasons: 1) it is subject to legal hold, 2) it is subject to a regulatory requirement, or 3) it is valuable for business purposes. According to CGOC, about 5 percent of information is subject to regulatory obligations, about 25 percent of corporate data is of business value, and only about 2 percent is subject to legal hold.
Those three categories add up to roughly 32 percent of corporate data. Assuming "safe margins," since it is somewhat difficult to separate wheat from chaff even with the highest level of will and technology, let's round the share that must be retained up to 50 percent. If the remaining 50 percent of corporate data is of no value and carries no obligation, it represents a tremendous opportunity for savings. In a company with 10 petabytes of data, 5,000 terabytes are candidates for disposal. When the cost per terabyte is juxtaposed against the percentage of data that must be retained, stark conclusions appear.
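The disposal estimate can be sketched the same way. Again this is only an illustration under stated assumptions: the names are invented here, the three CGOC categories are treated as non-overlapping (the text does not say whether they overlap), decimal units are used, and only the $5,000-per-terabyte IT cost is counted, with EDD costs set aside as above.

```python
# Rough sketch of the disposal opportunity, using the CGOC percentages quoted in the text.
# Assumption: the three retention categories do not overlap.
retention_shares = {
    "regulatory": 0.05,
    "business_value": 0.25,
    "legal_hold": 0.02,
}
must_retain = sum(retention_shares.values())

# "Safe margin": round the retained share up to 50 percent, as the text does.
padded_retain_share = 0.50

# Assumed organization: 10 petabytes, at 1,000 terabytes per petabyte.
total_tb = 10 * 1_000
disposable_tb = total_tb * (1 - padded_retain_share)

# IT hard cost only: about $5,000 per terabyte per year.
it_cost_per_tb = 5_000
annual_it_cost_of_disposable = disposable_tb * it_cost_per_tb

print(f"Share that must be retained (CGOC): {must_retain:.0%}")
print(f"Share retained after safe margin:   {padded_retain_share:.0%}")
print(f"Terabytes eligible for disposal:    {disposable_tb:,.0f}")
print(f"Annual IT cost of that data:        ${annual_it_cost_of_disposable:,.0f}")
```

Under these assumptions, the 5,000 disposable terabytes carry an IT cost on the order of $25 million a year, before any EDD exposure is counted.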