As companies realize the benefits of big data for their research and development, marketing, sales, branding, and revenue growth, they will increasingly have to reckon with its risks. Using and monetizing big data raises enormous legal questions and potential liabilities. The most salient of these legal issues, at least in the near term, revolve around privacy, regulatory compliance, and the duty to intervene.

When companies analyze extremely large pools of data, they often attempt to protect the privacy of individuals through “anonymization,” the process of removing or replacing individually identifying information in a communication or record. Communications and records can be made completely anonymous by removing all identifiers, or made pseudonymous by assigning each individual a replacement identifier, such as a 10-digit code.
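The distinction between the two approaches can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the secret key, and the choice of a keyed hash to generate the 10-digit code are hypothetical, not a description of any particular company's practice.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept apart from the data.
SECRET_KEY = b"example-key-for-illustration-only"

# Assumed names of the fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def anonymize(record: dict) -> dict:
    """Full anonymization: drop every direct identifier from the record."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable 10-digit replacement code from an identifier."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the hash to ten decimal digits, zero-padded on the left.
    return f"{int.from_bytes(digest[:8], 'big') % 10**10:010d}"


def pseudonymize(record: dict, id_field: str = "name") -> dict:
    """Pseudonymization: drop direct identifiers, keep a replacement code."""
    out = anonymize(record)
    out["pseudonym"] = pseudonym(record[id_field])
    return out


patient = {"name": "Jane Doe", "email": "jane@example.com", "zip": "02138"}
print(anonymize(patient))
print(pseudonymize(patient))
```

Because the same person always maps to the same code, a pseudonymous data set still links all of that person's records together. Note also that neither function touches the remaining fields, such as the ZIP code in this example.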

Of course, stories of incomplete or ineffective anonymization are rife. In one of the most infamous incidents, the Massachusetts Group Insurance Commission released “anonymized” data on state employees’ hospital visits in the mid-1990s as part of a study. To demonstrate the limitations of anonymization, then-graduate student Latanya Sweeney identified Governor William Weld’s records in the data without difficulty. Continuing her work on this topic, Sweeney showed in 2000 that 87 percent of all Americans could be identified using only three data points: birthdate, gender, and ZIP code.
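The three-data-point finding suggests a simple check that can be run on a data set before release: count how many records are the only one with their particular combination of birthdate, gender, and ZIP code. The sketch below is illustrative only; the function name, field names, and sample records are invented for the example and are not drawn from Sweeney's study.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers=("birthdate", "gender", "zip")):
    """Return the share of records whose combination of values is unique."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0


# Made-up sample: the first two records share a combination, the others do not.
sample = [
    {"birthdate": "1970-01-01", "gender": "F", "zip": "02138"},
    {"birthdate": "1970-01-01", "gender": "F", "zip": "02138"},
    {"birthdate": "1982-05-17", "gender": "M", "zip": "02139"},
    {"birthdate": "1990-11-30", "gender": "F", "zip": "02139"},
]
print(unique_fraction(sample))  # 0.5
```

A record that is unique on these three fields can be matched against any outside source that lists the same fields alongside a name, even though the record itself contains no name.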

In August 2006, AOL released three months of search queries by 650,000 of its users to the public, with the hope that this data would be useful for academic research. Despite AOL’s efforts to anonymize the data, many of the users could be identified based solely on the pattern and substance of their searches. This anonymization failure was widely reported by the media and sparked significant public backlash. In 2011, users of AOL’s website brought a class action suit against the Internet giant for disclosing search queries to the public. The action was settled on May 24, 2013, to the tune of nearly $6 million, along with a stipulation that AOL maintain policies and procedures to prevent future privacy leaks.

Similarly, in October 2006, Netflix released an “anonymized” database of 100 million movie ratings and offered $1 million to the first team that could use that data to “significantly improve” Netflix’s recommendation algorithm. Using publicly available user ratings in the Internet Movie Database (IMDb) for fifty Netflix members, researchers were able to identify, to a statistical near-certainty, two users in the Netflix database.

Although class actions based on data breaches and ineffective anonymization are exorbitantly expensive to pursue, litigation of this type will continue to mount. Companies should exercise the utmost caution when utilizing seemingly anonymized data or they might find themselves facing significant legal troubles.

One of the biggest challenges today for companies working with big data is that the regulatory regime is in a state of tremendous flux. Lawmakers and agency officials are trying to regulate technologies that are themselves changing on a daily basis, and they are trying to satisfy competing demands for privacy protection and commercial freedom. As the laws and regulations within the United States evolve, companies must be extremely attentive.  

And the United States is just one of many relevant jurisdictions. Every country has its own patchwork of laws and regulations that concern data and privacy. Keeping track of all of these laws in real time is nearly impossible. Merely keeping track of where the data resides is a job in and of itself. As data warehouses balance their loads, they can, without users’ knowledge, shift data from one data center to another. Those data centers may be located in completely different parts of the world, each governed by a different regulatory scheme.

The difficulty of tracking data, managing data, and protecting privacy in an international economy will only intensify over the coming years. As the Internet grows and more people gain access to mobile devices and broadband connections, data proliferation will increase. Workers and data will be utterly globalized. Governments will try to keep pace, and laws and regulations will abound. These issues will not be the province of privacy lawyers alone. Litigators in general will need to understand how to advise clients about privacy and data protection, how to access data that resides on foreign soil, and what rights they have to use “foreign” data in U.S.-based litigation.

Another interesting, although nascent, legal question for corporate users of big data is whether the predictive capabilities of big data analytics impose greater duties to identify risks and intervene before incidents occur. In other words, if companies use big data analytics to look at historical data to predict where problems, accidents, or financial irregularities are likely to arise, do they have a greater duty to act to prevent problems before they cause injury? If so, how will big data applications be used to prove that notice existed and that the company should have acted sooner? And if big data applications do not analyze the data correctly, are their providers liable for failing to identify the potential for injuries or unfortunate events?

Many of the questions raised here cannot be answered right now. And big data will undoubtedly give rise to other as-yet-unforeseen legal challenges over the next decade. It is important, however, for lawyers to be thinking about these issues and preparing clients for the legal realities of commerce in a big-data-driven world.