One summer, during my college years, I painted houses for an old hippie I’ll call Jake. Somehow, Jake came up with the idea that he should direct-market his painting services to owners of houses that were both old and valuable, as they were likely to be "heritage homes" with wood cladding. So Jake went down to City Hall to get access to property tax records.
The city allowed members of the public to view microfilmed records for free; alternatively, they could jump through a series of bureaucratic hoops and pay thousands of dollars to purchase a database. Jake couldn’t afford the database, and even if he could have, he would have refused to pay for public information on principle. But his direct-marketing plan wouldn’t work if he had to sit at the city’s microfilm reader and transcribe each record one at a time.
So, he decided to build his own database. The city microfilm facility was rarely supervised, so he would visit every few days and stuff his sweatpants with as much microfilm as they would hold. He then would waddle home to his newly purchased microfilm reader, where a newly hired student would transcribe the records into his new database.
Once transcribed, the microfilm would go back into his sweatpants, back to the city, and the process would repeat itself. This genius plan was still going on when I went back to college, so I never found out how the story ended. Maybe Jake is still ferrying microfilm around in his sweatpants.
I thought of Jake recently when reading about the controversy triggered by a newspaper’s decision to publish the names and addresses of handgun permit holders in suburban counties near Newtown, Conn., the site of the Sandy Hook Elementary School shootings. In response to the shootings, The Journal News of White Plains, N.Y., published an interactive map showing where permit holders live. The map was based on publicly available information: data that state law requires to be available.
However, as the paper found out — and as Jake knew — there is an important difference between information that is simply available and information that is easily accessible. The newspaper received many threats, and hired armed guards for its headquarters. Christopher Fountain, on his For What It’s Worth blog, punched back by creating a similar map showing where newspaper employees live, and some employees received death threats.
Interviewed by The New York Times, Steve Doig, a journalism professor at Arizona State University, said, "The Journal News, I personally think, should have rethought the idea as actually going so far to identify actual addresses." He continued: "This particular database ought to remain a public record. Just because it’s available and public record doesn’t mean we have to make it so readily available."
So, what does "readily available" mean in our age — an age when a huge chunk of all human knowledge can be accessed in seconds using a device you carry around in your pocket? Does better access to information simply offer speed and convenience, or does it offer something more profound?
Both these stories, of a house painter and a newspaper, show that making information easier to access can allow it to be used in ways never intended. In Jake’s case, it meant that property tax records could be used for direct marketing. In the newspaper’s case, it meant that gun-permit records could be used to make a political point. Neither use is the reason that the law requires governments to create and retain public property tax and gun-permit records. And navigating the questions raised by such repurposing will only grow more complex in the age of Big Data.
The promise of Big Data is based on a central assumption: that information will be easily, quickly and cheaply available, on a grand scale. The plumbing of Big Data — the technology infrastructure — is designed to bring Internet scale to enterprise data. Some of the surprising insights that data scientists hope to gain from Big Data analytics come from correlating information from disparate sources, in a context that was never imagined when the information was first created — such as correlating the type of computer used to book a trip with how much a traveler is willing to pay for a hotel room. Or using prescription drug history to screen health insurance applicants.
The problem of protecting privacy, intellectual property and other rights will only grow more complex as our ability to access and process information becomes more sophisticated. These stories illustrate what can happen when records created for one purpose are used for another. But what happens when we turn the Big Data machine toward information that was not created on purpose at all?
Most of us today generate large volumes of information accidentally, simply as a byproduct of our daily working and personal lives. Most of us do not sit down at the end of the day and write a comprehensive account of our day in a diary. Instead, the email messages, meeting requests, calendar appointments, geolocation data from our mobile phones, text messages, pictures, data feeds from our Internet-connected fitness wristbands, and dozens of other pieces of data create that diary for us — whether we know it or not.
Even the least connected among us use email. Email was explicitly designed to help us communicate informally with other people, not to negotiate contracts, create CYA records of our interactions with a troublesome supplier, document child custody snafus with an ex-spouse, or deliver records to a regulator. Yet, we use email for these and thousands of other important business and personal purposes.
Through the process of communicating, we incidentally rather than purposefully create a multitude of records. These and other incidental records have a clear downside, as litigant after litigant has discovered in court when a so-called "smoking gun" email emerges. But they also have potential value, especially if they can be analyzed en masse for meaning.
Social media are no different. The purpose of a tweet is to communicate something in 140 characters or fewer, but looking at millions of tweets together can reveal, and even predict, surprising patterns, such as the geographic progression of the flu across a nation or the progress of a national revolution, as was the case in the Arab Spring.
The volume and variety of incidental records logically can only be expected to grow as ever more of our interactions occur digitally, through mobile devices. Big Data business models based on analyzing and delivering insight from this information can be expected to grow alongside.
But what happens if users start to object? What happens if the creators of all this incidental information want to take back some control over who sees it, and how it is used and monetized? Over the past year, a class of software has emerged that is designed to give the user such control.
For example, Snapchat, a popular mobile application, helps users communicate by sending photos back and forth. But the images self-destruct — the user can select the self-destruct period, from one to 10 seconds. The application gained a reputation as a tool for teenagers to "sext" using photos, although a recent survey of users aged 18 to 29 found that only 13 percent indicated that they use the app for that purpose.
In any case, the appeal of Snapchat and similar applications — for example, Facebook recently launched a copycat application called "Poke" — is that users keep control of their data.
A Snapchat founder put it succinctly: "There is real value in sharing moments that don’t live forever." Yes, there are some workarounds: users can take a screenshot, although in Snapchat this generates an alert to the sender, and in some cases video files can be viewed using a phone file system browser. But a measure of control seems to be attracting users.
The idea of self-destructing messages is not new. In the 1990s, a company called Disappearing Inc. offered an email service with evaporating messages. A survivor from this era, called Hushmail, appears to still be operating.
And there appears to be some new momentum. Philip Zimmermann, the creator of one of the most widely used encryption protocols for secure messages, PGP, aka Pretty Good Privacy, recently launched a startup called Silent Circle. It promises to offer secure mobile communication, including "burn notices" for text messages.
Other recent startups offer similar services, such as:
• Burn Note (burnnote.com), self-destructing email.
• VaporStream (vaporstream.com), recordless messaging services for enterprises.
• Wickr (www.mywickr.com), self-destructing texts, pictures and video.
• Gryphn (http://gryphn.co), self-destructing text messages with screenshot capability disabled.
Do these tools represent a trend that threatens the promise of Big Data, by making the massive volumes of useful, incidental information simply disappear? While they may indicate an appetite for secrecy, it seems barely measurable compared with, say, the massive growth of Facebook and other social networks.
In addition, their use might come with a cost — a possible inference of guilt by association that could be exploited by the other side in litigation.
Barclay T. Blair is the principal of ViaLumina LLC, an information governance consultancy. A version of this article appeared in NLJ affiliate Law Technology News.