Cloud computing, once a cutting-edge trend among high-tech startups, is transforming the way corporations nationwide do business. Virtually every company has data in the cloud, whether it knows it or not. And the trend isn’t limited to startups: Cisco Systems Inc., Dell Inc., and Motorola Solutions Inc. are just a few of the Fortune 500 companies that are increasing their use of cloud-based applications.

Cloud computing applications save companies money by using computer resources more efficiently. These applications also enable employees to work remotely and collaborate on files with colleagues with ease. But despite the enormous promise of cloud computing, the technologies are creating a host of headaches for IT and legal departments, which are grappling for the first time with conducting discovery in this new space.

Search and Collection

The paradigm in the legal services world is that the party conducting the discovery controls the data. For instance, a corporate litigant in a dispute with a competitor might collect data from its own servers, its employees’ hard drives and paper files. That data can then be transferred into a searchable database through which responsive documents can be identified. But with corporations and employees increasingly storing data in the cloud, companies no longer have that degree of control over the data they generate. 

The first challenge is figuring out how to collect the data from these remote sources. Many cloud technology providers never created mechanisms for users to retain or export the data that they create. For instance, some popular web-based spam filters lack ways for customers to change their retention schedule or delete data. Likewise, some networking sites don’t permit users to export data from their accounts to local machines.

That immediately raises a second thorny issue: how to search, within the providers’ systems, data that cannot be exported. Keyword searching is the most common way to collect and cull data in discovery. Typically, the parties agree on a set of search terms and produce the data that contains those terms. But many cloud-based systems do not support the sophisticated Boolean searches that are commonplace in discovery. Some may not distinguish an “and” operator from an “or,” and others cannot process searches that look for terms within a given proximity of one another. Moreover, many web-based applications may not search deeply enough within the data for litigation purposes. For instance, a search may review only the bodies of emails and not the data contained in email attachments.
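To make those distinctions concrete, here is a minimal sketch in Python of the three kinds of searches discussed above (Boolean "and," Boolean "or," and proximity), applied to both an email body and the text of its attachments. The function names and the sample message are hypothetical and purely illustrative; they do not represent any provider's or review platform's actual search syntax, and the sketch assumes attachment text has already been extracted.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def contains_all(text, terms):
    """Boolean AND: every term must appear in the text."""
    tokens = set(tokenize(text))
    return all(t.lower() in tokens for t in terms)

def contains_any(text, terms):
    """Boolean OR: at least one term must appear in the text."""
    tokens = set(tokenize(text))
    return any(t.lower() in tokens for t in terms)

def within(text, term_a, term_b, distance):
    """Proximity: the two terms occur within `distance` words of each other."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(a - b) <= distance for a in pos_a for b in pos_b)

def search_message(body, attachment_texts, predicate):
    """Apply the predicate to the body and to each attachment's text."""
    return any(predicate(part) for part in [body, *attachment_texts])

# Hypothetical message: the relevant language is only in the attachment.
body = "Please see the attached pricing schedule."
attachments = ["The merger agreement requires board approval before closing."]

body_only = contains_all(body, ["merger", "approval"])
with_attachments = search_message(
    body, attachments, lambda t: contains_all(t, ["merger", "approval"])
)
near = search_message(
    body, attachments, lambda t: within(t, "merger", "approval", 5)
)
```

In this example, a search limited to the email body misses the document, while the same search extended to the attachment finds it. A system that silently treats "and" as "or," or that ignores attachments, would return a different set of documents than the parties agreed to.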

The first and most important line of defense against these risks is to involve legal professionals in the process of selecting cloud-based providers. Businesspeople and IT professionals consider the cost, functionality and security of the service. Legal can have the foresight to consider the company’s needs should litigation arise, thus easing some of these discovery concerns before they become potential minefields for spoliation claims. Even providers that do not have sufficiently robust search capabilities internally can be workable from a litigation standpoint if they provide an efficient mechanism to get around those limitations by exporting the data. Raising those concerns before a company begins using a new provider can avoid issues down the line.

Social Media

The increased mingling of workers’ personal data with corporate data adds another layer of complexity to the discovery process. Social media has become an important source of data that can be relevant in many types of litigation. For instance, in a noncompete case in which an employer alleges that an ex-employee used LinkedIn to wrongfully solicit its customers and employees before leaving the company, those individuals’ interactions on social media sites may be discoverable.

Collecting this data can be problematic because social media sites are designed for access by the end user (i.e., the individual Facebook or Twitter subscriber) and not by an organization such as the user’s employer. Managing this problem takes proactive planning. Make sure you can access this data if the need arises. Have employees agree in writing, in advance, that they will cooperate in allowing the company to preserve and collect data from their social media and personal email accounts when necessary. While such a policy would not prevent a rogue employee from deleting data that he doesn’t want his bosses to see, it at least gives the employer a colorable defense to claims that the data destruction was deliberate or willful, which can be key to avoiding sanctions.

Cost Control

Civil litigants tend to make their document requests overly broad when dealing with electronic data, based on the erroneous assumption that because the data is stored digitally, it should be easy to obtain, review and produce. In reality, when files must be collected from multiple sources of data stored in the cloud, e-discovery can rapidly become a hugely burdensome undertaking, even in a case in which the claims are narrow and the potential damages are small.

Counsel can control the cost and scope of these reviews by negotiating with opposing counsel at the outset to determine what data sources will be searched and how those searches will be conducted. A corporate litigant can defensibly define a relevant dataset by considering the probability that a given source will yield material, nonduplicative documents and evaluating that probability against the cost of conducting the search—and in some cases, shifting that cost to the opposition.
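As a rough illustration of that weighing exercise, the following Python sketch ranks candidate data sources by estimated cost relative to the estimated probability that each will yield material, nonduplicative documents. The class, the function and every number in it are hypothetical placeholders; in practice the estimates would come from counsel, IT and the provider, and no formula substitutes for the legal judgment involved.

```python
from dataclasses import dataclass

@dataclass
class DataSource:
    name: str
    yield_probability: float  # estimated chance of material, nonduplicative documents (0 to 1)
    estimated_cost: float     # estimated cost to collect and review, in dollars

def cost_per_unit_probability(source):
    """Estimated cost divided by yield probability; lower is more favorable."""
    if source.yield_probability <= 0:
        return float("inf")
    return source.estimated_cost / source.yield_probability

def rank_sources(sources):
    """Order sources from most to least favorable cost-to-yield ratio."""
    return sorted(sources, key=cost_per_unit_probability)

# Hypothetical estimates for illustration only.
sources = [
    DataSource("web-based spam filter archive", 0.1, 8000.0),
    DataSource("corporate email server", 0.9, 5000.0),
    DataSource("employee social media accounts", 0.4, 12000.0),
]

ranked_names = [s.name for s in rank_sources(sources)]
```

Sources at the bottom of such a ranking are the natural candidates for exclusion from the search, or for a request that the opposing party bear the cost.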

Defensible Processes

A final challenge is creating a defensible process for identifying responsive documents from the massive amounts of data stored in the cloud.

Although no one wants to collect or review more data than necessary, the best practice when dealing with cloud-based data sources is to preserve broadly: extract more data than you will need in the litigation. Transfer that large dataset to an environment that is under the company’s control, such as the company’s own document review software. Then search for and cull responsive documents only from that dataset rather than returning to the cloud provider and conducting searches on its servers.

The advantages of that approach are twofold. First, you have a static dataset from which to cull responsive documents. Because the company does not control the data on cloud providers’ networks, there is nothing to stop a provider from changing the way it stores and retains data. As a result, the same search conducted within a cloud-based service provider’s system may produce different results on different days. Second, a search conducted on a static dataset is more easily defensible because it can be recreated and reviewed: the same search steps will always produce the same result. Being able to walk through each step of how you collected the data and ruled out certain documents will be essential if the process is ever called into question.
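One way to demonstrate that an exported dataset has remained static is to record a cryptographic hash of every file at the time of collection and to re-check those hashes before each search. The Python sketch below illustrates the idea under simplifying assumptions: the export is a folder of plain-text files, and the function names are hypothetical rather than drawn from any particular review platform.

```python
import hashlib
from pathlib import Path

def build_manifest(export_dir):
    """Map each file's relative path to the SHA-256 hash of its contents."""
    root = Path(export_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest

def verify_manifest(export_dir, manifest):
    """Return True only if the files on disk still match the manifest."""
    return build_manifest(export_dir) == manifest

def run_search(export_dir, terms):
    """Return the sorted relative paths of files containing any term."""
    root = Path(export_dir)
    lowered = [t.lower() for t in terms]
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore").lower()
            if any(t in text for t in lowered):
                hits.append(path.relative_to(root).as_posix())
    return hits
```

Because the files are hashed at collection and the search visits them in a fixed order, running the same terms against a verified dataset returns the same list every time, and a manifest mismatch flags any file that was altered, added or removed after collection.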