Lawyers are taking greater notice of the legal, ethical and technological issues around Big Data. In addition to the mountains of data that have always been part of the legal industry, the explosion of digital and social-media content has created a unique set of challenges. Law firms and attorneys are looking for new approaches to help them make sense of massive amounts of data. A promising application for Big Data analysis is in electronic discovery, where fast, high-performing data analytics can substantially reduce the time and cost of preparing for a case.

Discovery has always been an intensive process. The traditional approach was largely reactive, waiting until discovery was required before beginning the arduous task. A common sight was attorneys poring over boxes of documents.

Although technology has helped streamline the discovery process, it has also led to a huge increase in the volume of data that must be assessed. The amount of digital data an organization creates, stores, shares and receives is increasing exponentially, making e-discovery even more time-consuming and expensive. According to a study by analyst firm International Data Corp., the world’s data supply doubles every two years. John Gantz & David Reinsel, “Extracting Value from Chaos,” International Data Corp., June 2011, at 1. For litigators, sifting through and making sense of terabytes of data is a daunting process. What’s more, the costs and legal risks associated with e-discovery are greater than ever.

One of the primary reasons e-discovery expenses are soaring is the increased use of large, complex data sets. Businesses in health care, financial services and related industries are required to keep data for long periods of time to comply with federal regulations. This often includes large, image-only data, proprietary data and legacy data in nontypical data sets. Additionally, innovations in data storage are contributing to the overall increase in digital data. In an age of considerable legal and compliance risk, and with storage more affordable, organizations are inclined to keep everything. Easy-to-use Internet or cloud-based platforms make it possible for organizations to store extraordinary amounts of electronic information for long periods at relatively low cost.


The unprecedented growth of unstructured data represents a significant challenge for e-discovery. Structured data are fairly straightforward: digital information organized in a database, with data that are easily identifiable and relatable. Because they are ordered according to the parameters of the database, structured data are simple to access and sort during an e-discovery search. According to International Data Corp., however, structured data make up only 20 percent of all potentially usable business information. Gantz & Reinsel, supra, at 2.

Unstructured data comprise the vast majority of corporate data today. They include emails, Word documents, instant messages, tweets, blog posts and other digital communications. Employees in nearly every industry leave immense electronic trails of unstructured data from the digital office tools they use daily. In addition, organizations often have multiple copies of unstructured data, such as shared documents and emails sent to various people within a company and to external customers, partners and vendors. All of these unstructured data, shared and stored both inside and outside an organization, can be subject to e-discovery during corporate litigation. This staggering amount of data contains text and images that are not structured for a database and are much more difficult for e-discovery software to access and evaluate.
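To make the point about multiple copies concrete, here is a minimal sketch of how exact duplicates might be grouped before review. It assumes the documents have already been reduced to plain text. The function names and sample document IDs are hypothetical, and real e-discovery tools use more sophisticated near-duplicate and email-thread detection than this.

```python
import hashlib
from collections import defaultdict


def normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase the text so that
    # trivially different copies of the same message still match.
    return " ".join(text.split()).lower()


def group_duplicates(documents: dict[str, str]) -> list[list[str]]:
    """Return groups of document IDs whose normalized text is identical."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(doc_id)
    # Only groups with more than one member are duplicates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


# Hypothetical sample: two copies of one email and an unrelated memo.
sample = {
    "mail-001": "Quarterly forecast attached.  Please review.",
    "mail-002": "quarterly forecast attached. please review.",
    "memo-017": "Lunch is at noon.",
}
duplicates = group_duplicates(sample)
print(duplicates)
```

Grouping by a content hash means each distinct document needs to be reviewed only once, however many mailboxes or shared folders hold a copy.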

Firms and attorneys need to identify and extract meaning from all relevant data sources — structured and unstructured — to make the best possible decisions for their clients. While it is essential that any e-discovery search include both structured and unstructured data sets, not many organizations have the technical tools and expertise to apply the same degree of sophistication to the analysis of unstructured data. This puts organizations at risk of missing key information or, worse, facing charges of noncompliance or sanctions for not including all the information required in a particular case.

Lawyers need to work faster and smarter, leveraging new tools and resources to provide the best possible counsel. Attorneys must gain control over all types of data in a case so that they can determine what has to be processed and reviewed, and how to find information relevant to their case.

There are several best practices for conducting e-discovery more efficiently while controlling costs in the Big Data environment.

• Rules. Establish a clear set of rules for e-discovery from the start. Appoint a project manager to oversee and coordinate the process, and involve personnel from multiple departments, including legal, information technology and records management. Establish a procedure for documenting and extracting reports.

• Prioritize. Not all data are created equal. Prioritize by creating a schedule for running specific queries; this will help make the process more manageable. Perform a gap analysis to determine whether information is missing, and establish realistic expectations about the amount of time required to gather all necessary data.

• Strategy. Develop a strategy for information governance that addresses the creation, use, storage and deletion of corporate information aligned with the company’s business, legal, regulatory and data-privacy requirements. Information governance should include a records-management policy to deal with the retention, classification, archiving and destruction of digital and paper records. A typical policy might detail records categories, a retention schedule and a records-destruction program. Train staff in these guidelines. When executed properly, records-management and information-governance plans help improve productivity and ensure compliance with federal and state regulatory requirements.

• Technology. E-discovery tools should help attorneys determine where information is stored, what that information is and whether it should be retained or destroyed in accordance with legal standards. When evaluating e-discovery procedures, consider which tools will be most effective at improving workflow and which can be integrated with the organization’s existing information technology infrastructure.
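The retention schedule mentioned under Strategy, and the retain-or-destroy question under Technology, can be sketched in a few lines. This is an illustrative sketch only: the categories, the retention periods and the `disposition` function are hypothetical, and actual periods depend on the regulations that apply to a given organization.

```python
from datetime import date

# Hypothetical retention periods, in years, per records category.
RETENTION_YEARS = {"contract": 7, "email": 3, "invoice": 6}


def disposition(category: str, created: date, today: date,
                on_legal_hold: bool = False) -> str:
    """Return 'retain', 'destroy' or 'review' for a single record."""
    # A legal hold always overrides the retention schedule.
    if on_legal_hold:
        return "retain"
    years = RETENTION_YEARS.get(category)
    if years is None:
        # Unknown categories are flagged for a person to classify.
        return "review"
    try:
        expires = created.replace(year=created.year + years)
    except ValueError:
        # A record created on Feb. 29 expires on Feb. 28 in a non-leap year.
        expires = created.replace(year=created.year + years, day=28)
    return "destroy" if today >= expires else "retain"


today = date(2014, 1, 15)
print(disposition("email", date(2010, 1, 15), today))
print(disposition("email", date(2010, 1, 15), today, on_legal_hold=True))
print(disposition("contract", date(2010, 1, 15), today))
print(disposition("voicemail", date(2010, 1, 15), today))
```

Checking the legal hold first reflects the principle that records relevant to pending or anticipated litigation must be preserved even after their scheduled retention period has lapsed.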


Cloud-based analytics tools are emerging as an affordable way to tackle Big Data. Cloud offerings don’t require costly on-site hardware and infrastructure and are inherently flexible, giving organizations a scalable environment for dealing with increasing volumes of data.

Additionally, open-source analytics platforms, continually enriched through the contributions of developers, have proven extremely fast at processing data. These technologies are highly efficient when it comes to Big Data because they were designed to make sense of information chaos. They pull massive amounts of structured and unstructured data into a refinery system and break it down rapidly so attorneys have easy access to relevant information.

The accelerating volume of data creation is outpacing the ability to manage it. This is a genuine concern in the ultracompetitive legal industry. The explosion of data is having a direct effect on litigation, in particular, where conducting e-discovery is more complex and costly than ever. Attorneys must find a way to harness and make sense of countless terabytes of data from myriad disparate sources — thoroughly, expeditiously and cost-effectively.

Big Data analytics can improve a law firm’s efficiency and productivity by helping attorneys navigate huge amounts of seemingly unmanageable data while reducing risks and streamlining the time-intensive e-discovery process.

Brian Ingram is head of litigation technology consulting at LexisNexis Group. He has more than 20 years of experience in e-discovery and litigation support at corporations and leading law firms.