Illustration: James Steinberg
In the wee hours, a beat cop sees a drunken lawyer crawling around under a streetlight searching for something. The cop asks, "What's this now?" The lawyer looks up and says, "I've lost my keys." They both search for a while, until the cop asks, "Are you sure you lost them here?" "No, I lost them in the park," the tipsy lawyer explains, "but the light's better over here."
I told that groaner in court, trying to explain why opposing counsel's insistence that we blindly supply keywords to be run against the email archive of a Fortune 50 insurance company wasn't a reasonable or cost-effective approach to electronic data discovery. The "Streetlight Effect," described by David Freedman in his 2010 book Wrong, is a species of observational bias where people tend to look for things in the easiest ways. It neatly describes how lawyers approach e-discovery. We look for responsive electronically stored information only where and how it's easiest, with little consideration of whether our approaches are calculated to find it.
Easy is wonderful when it works; but looking where it's easy when failure is assured is something no sober-minded counsel should accept and no sensible judge should allow.
Consider "The Myth of the Enterprise Search." Counsel within and without companies and lawyers on both sides of the docket believe that companies have the ability to run keyword searches against their myriad silos of data: mail systems, archives, local drives, network shares, portable devices, removable media and databases. They imagine that finding responsive ESI hinges on the ability to incant magic keywords like Harry Potter. Documentum Relevantus!
Though data repositories may share common networks, they rarely share common search capabilities or syntax. Repositories that offer keyword search may not support Boolean constructs (queries using "AND," "OR" and "NOT"), proximity searches (Word1 near Word2), stemming (finding "adjuster," "adjusting," "adjusted" and "adjustable") or fielded searches (restricted to just addressees, subjects, dates or message bodies). Searching databases entails specialized query languages or user privileges. Moreover, different tools extract text and index such extractions in quite different ways, with the upshot being that a document found on one system may not be found on another using the same query.
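To make that last point concrete, here is a minimal sketch in Python. It assumes two hypothetical systems that index the same sentence under different rules: one splits on whitespace and trims punctuation from the edges of each word, the other keeps only runs of letters and digits. The function names, the sample sentence and both rule sets are invented for illustration and do not describe any real product.

```python
import re

def tokenize_whitespace(text):
    """Hypothetical system A: lowercase, split on whitespace, trim edge punctuation."""
    tokens = {tok.strip(".,;:!?\"'()") for tok in text.lower().split()}
    return tokens - {""}

def tokenize_alnum(text):
    """Hypothetical system B: lowercase, keep only runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def hit(index_terms, query):
    """Single-term lookup: a hit only if the query exactly equals an indexed token."""
    return query.lower() in index_terms

# Invented sample text; the same sentence is indexed by both systems.
DOCUMENT = "The adjuster's e-mail denied claim A-1234."

index_a = tokenize_whitespace(DOCUMENT)
index_b = tokenize_alnum(DOCUMENT)

for query in ("e-mail", "adjuster", "A-1234"):
    print(f"{query!r}: system A hit={hit(index_a, query)}, "
          f"system B hit={hit(index_b, query)}")
```

Under these assumed rules, "e-mail" and "A-1234" hit only on system A, while "adjuster" hits only on system B, because system A stored "adjuster's" as a single token. Same document, same keywords, different answers.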
But the Streetlight Effect is nowhere more insidious than when litigants use keyword searches against archives, email collections, and other sources of indexed ESI.
That Fortune 50 company (call it All City Indemnity) collected a gargantuan volume of email messages and attachments in a process called "message journaling." Journaling copies every message traversing the system into an archive where the messages are indexed for search. Keyword searches only look at the index, not the messages or attachments; so, if you don't find it in the index, you won't find it at all.
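A rough sketch of why the index, and not the message, decides what gets found. It assumes a toy archive that builds an inverted index (a map from each word to the messages containing it) and, like many indexers, skips a list of common words when indexing. The word list, the message IDs and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical list of words the indexer skips; real systems vary.
STOP_WORDS = {"a", "an", "and", "not", "of", "the", "to"}

def build_index(messages):
    """Map each indexed word to the set of message IDs that contain it."""
    index = defaultdict(set)
    for msg_id, text in messages.items():
        for token in text.lower().split():
            token = token.strip(".,;:!?\"'()")
            if token and token not in STOP_WORDS:
                index[token].add(msg_id)
    return index

def search(index, term):
    """Consult only the index; the message text itself is never re-read."""
    return sorted(index.get(term.lower(), set()))

# Invented sample messages.
MESSAGES = {
    "msg-1": "Do not pay the claim.",
    "msg-2": "Pay the claim today.",
}

index = build_index(MESSAGES)
print(search(index, "claim"))  # both messages are returned
print(search(index, "not"))    # nothing, although msg-1 plainly contains "not"
```

In this sketch the search for "not" comes back empty even though the word sits in plain view in the first message. The word never reached the index, so as far as the search tool is concerned it does not exist.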
All City gets sued every day. When a request for production arrives, they run keyword searches against their massive mail archive using a tool we'll call Truthiness. Hundreds of big companies use Truthiness or software just like it, and blithely expect their systems will find all documents containing the keywords. They're wrong … or in denial.
If requesting parties don't force opponents like All City to face facts, All City and its ilk will keep pretending their tools work better than they do, and requesting parties will keep getting incomplete productions.
To force the epiphany, consider an interrogatory like this:
For each electronic system or index that will be searched to respond to discovery, please state:
A. The rules employed by the system to tokenize data so as to make it searchable.
B. The stop words used when documents, communications or ESI were added to the system or index.