And you thought the issues swirling around electronically stored information (ESI) were difficult to understand and even harder to manage. Now throw big data into the mix, and it becomes clear that attorneys have their technological work cut out for them. To simplify as much as possible, there are seven core issues concerning the interplay of ESI and big data that attorneys today must study and comprehend: storage structure; balance between big data analytics and document destruction; importance of centralized data sources; use of big data tools with integrated litigation hold and ESI functionality; applicability of technology-assisted review and predictive coding; management of ESI created in real-time; and proportionality.

Storage structure

Finding the right system of record is particularly important in the context of big data, given the sheer quantity of information at issue. Attorneys should ask clients how data is stored and retained. Creating a map of client data will enable more effective action if and when litigation begins. As part of the initial review, attorneys should identify the “systems of record.” A system of record is the central, authoritative source for a given type of data; copies might be stored in other places as well, such as mobile phones, archive storage, or tablets. By locating the right system of record, you can streamline your preservation and collection efforts and set aside data sources that are truly redundant.
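A data map of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataSource` fields, the source names, and the rule that a source counts as redundant only when it is a complete copy of a system of record are all hypothetical, and whether a copy is truly redundant remains a judgment for counsel.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataSource:
    """One entry in a hypothetical client data map."""
    name: str
    data_type: str
    is_system_of_record: bool = False
    # Name of the source this one fully duplicates, if any (assumption:
    # someone has verified that the copy holds nothing unique).
    full_copy_of: Optional[str] = None


def collection_targets(data_map: List[DataSource]) -> List[str]:
    """Return sources to preserve and collect, skipping full copies of a system of record."""
    records = {s.name for s in data_map if s.is_system_of_record}
    return [s.name for s in data_map if s.full_copy_of not in records]


# Illustrative entries only; the names are invented for this sketch.
data_map = [
    DataSource("mail_server", "email", is_system_of_record=True),
    DataSource("mail_archive", "email", full_copy_of="mail_server"),
    DataSource("sales_phones", "text messages"),
]

targets = collection_targets(data_map)
print(targets)
```

In this sketch the archive is skipped because it is recorded as a complete copy of the mail server, while the phones stay on the list because nothing in the map says their contents exist elsewhere.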

Balance between big data analytics and document destruction

The big data revolution has created an incentive for clients to keep and store much more data. But by saving (or over-saving) fodder for big data analytics, companies are creating a new e-discovery challenge: keeping discovery costs manageable. The more data that exists, the more time-consuming and expensive it is to preserve, collect and review when litigation arises. With the help of counsel, companies need to reach a balance between retention and destruction. Savvy lawyers have started to advise their clients to delete as much unusable data as possible in a careful and systematic way that ensures no potentially relevant data is deleted once litigation has become reasonably foreseeable.
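As a rough illustration of "careful and systematic" deletion, the sketch below checks a record against a retention period and a litigation hold before allowing it to be destroyed. The function name, the parameters, and the idea of scoping a hold by custodian are assumptions made for the example; in practice the scope of a hold is set by counsel and may turn on subject matter, date range, or system rather than custodian alone.

```python
from datetime import date, timedelta
from typing import Set


def may_delete(created: date, custodian: str, today: date,
               retention_days: int, custodians_on_hold: Set[str]) -> bool:
    """Return True only if the record is past retention and not under a hold."""
    # A litigation hold always overrides the ordinary retention schedule.
    if custodian in custodians_on_hold:
        return False
    return today - created > timedelta(days=retention_days)


# Hypothetical values for illustration.
on_hold = {"bob"}
today = date(2024, 1, 1)

old_unheld = may_delete(date(2020, 1, 1), "alice", today, 365, on_hold)
old_held = may_delete(date(2020, 1, 1), "bob", today, 365, on_hold)
recent = may_delete(date(2023, 12, 1), "alice", today, 365, on_hold)
print(old_unheld, old_held, recent)
```

The hold check comes first on purpose: once litigation is reasonably foreseeable, the age of a record no longer decides whether it can be destroyed.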

Importance of centralized data sources

A key aspect of effective data preservation and collection is the use of centralized data sources. Having all data of a certain type contained in one place makes it easier to find, identify, preserve and collect. Especially in the era of big data and vast accumulation of ESI, lawyers should encourage clients to establish a centralized data source from the start. This will make it much easier to manage data in the event of litigation.

Use of big data tools with integrated litigation hold and ESI functionality

Many newer relational database tools now have “baked-in” litigation hold and ESI capabilities, which make it much easier to preserve and manage data during a lawsuit. Some creators of big data applications are likely to do the same, but not all. Many developers of big data applications are not thinking about having to produce the same data in a lawsuit. Their focus is on getting value from analyzing the data and doing it as quickly and inexpensively as possible. Thus, advise clients to use big data tools that have litigation hold and ESI capabilities “baked” into the application.

Applicability of technology-assisted review and predictive coding

Big data analytics are already being employed in technology-assisted review (TAR), otherwise known as predictive coding. When using TAR, an attorney or team codes a “seed set” of documents, and the TAR program uses the seed set to predict how a reviewer would classify the remaining documents. In the past year, prominent courts have approved the use of TAR to conduct discovery. Litigators must understand the advantages and disadvantages of using TAR in a given case. And TAR is not just for data collection and production anymore. One major TAR vendor is launching a new product that can be used for document retention and management, particularly for companies using big data. A seed set is created for the type of documents that an enterprise wants to retain, and the system independently searches internal networks for relevant documents.
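The seed-set idea can be made concrete with a toy example. The sketch below trains a very small naive Bayes text classifier on hand-coded documents and then predicts whether an unreviewed document is relevant. It is not how any particular TAR product works; commercial tools use their own methods, far larger seed sets, and statistical validation. The function names and the sample documents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Model = Tuple[Dict[bool, Counter], Counter]


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def train(seed_set: List[Tuple[str, bool]]) -> Model:
    """Count words per label from reviewer-coded (text, is_relevant) pairs.

    Assumes the seed set contains at least one document of each label.
    """
    word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    doc_counts: Counter = Counter()
    for text, relevant in seed_set:
        doc_counts[relevant] += 1
        word_counts[relevant].update(tokenize(text))
    return word_counts, doc_counts


def predict_relevant(text: str, model: Model) -> bool:
    """Predict the reviewer's coding using naive Bayes with add-one smoothing."""
    word_counts, doc_counts = model
    vocab = set(word_counts[True]) | set(word_counts[False])
    scores = {}
    for label in (True, False):
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / sum(doc_counts.values()))
        for word in tokenize(text):
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


# Hypothetical seed set coded by a reviewer.
seed_set = [
    ("pricing agreement with competitor", True),
    ("meeting to discuss competitor pricing", True),
    ("lunch order for friday", False),
    ("office party on friday", False),
]
model = train(seed_set)

likely_relevant = predict_relevant("competitor pricing call", model)
likely_irrelevant = predict_relevant("friday lunch", model)
print(likely_relevant, likely_irrelevant)
```

Even at this scale the sketch shows why the quality of the seed set matters: the program can only generalize from the words and coding decisions the reviewers gave it.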

Management of ESI created in real-time

One of the most difficult aspects of managing ESI during active litigation is preserving, collecting and reviewing new data that is created after the litigation hold has been issued. This difficulty is compounded by big data applications that introduce loads of new data into the system in real time. Lawyers need to understand what data is being fed into the system, its sources and format, what changes occur after it enters the database, and whether a big data database is the system of record. In some circumstances, a report from a database can be used instead of preserving and producing a huge amount of raw data.

Impact of big data on proportionality

The Federal Rules address proportionality and state, in essence, that the burden of a particular discovery request has to bear some relation to the size, scope or severity of the issues at stake in the litigation. Proportionality is a very hot issue in the ESI world, and new amendments to Rule 26 may expand the principle to the preservation of potentially relevant ESI. It is not difficult to imagine litigants trying to leverage the cost of preserving and producing big data to force early, unjustified settlements. Defense lawyers must be ready to challenge discovery of big data by arguing that the cost of producing this information is disproportionate to the nature of the case, the amount in controversy, and the issues at stake in the litigation. When faced with an asymmetrical big data-ESI burden, create a plan for using the principle of proportionality to fairly limit your client’s discovery obligations.

It’s an open secret that many lawyers have tried to avoid learning about the technical aspects of electronically stored information (ESI). But how can a lawyer help clients with document preservation if he doesn’t understand the structure of that client’s data? How can a lawyer negotiate format issues with adversaries if she doesn’t understand the specifics of the data being sought or produced? And how can a lawyer prevent an adversary from taking advantage of an asymmetrical burden, if she cannot apply Rule 26 of the Federal Rules of Civil Procedure to big data? In a world where data is king and the vast majority of data is electronic, ignorance is no longer an option.