Is there any form of ESI production worse than .tiffs and load files? If you’ve experienced the ease of e-discovery with tools purpose-built for native review, you know what I’m talking about. Once you "go native," you’ll never go back!
By native, I mean data in the original electronic formats the producing party uses for, e.g., email, word processing, spreadsheets and presentations.
A native file is inherently electronically searchable and functional until it’s converted to .tiff images, when it loses both searchability and functionality. It’s like photographing a steak. You can see it, but you can’t smell, taste or touch it, you can’t hear the sizzle, and you surely can’t eat it.
Because converting to .tiff takes so much away, parties producing .tiff images attempt to restore a measure of electronic searchability by extracting text from the electronic document and supplying it in a load file accompanying the .tiff images. A recipient must then run searches against the extracted text file and seek to correlate the hits in the text to the corresponding page image. It’s clunky, costly and incomplete.
The irony of .tiff and load file productions is that what was once a cutting-edge technology has become an albatross around the neck of electronic data discovery. To understand how we got to this unenviable place requires a brief history lesson.
Before the turn of the century, when most items sought in discovery were paper documents, .tiff and load file productions made lawyers’ lives easier by grafting rudimentary electronic searchability onto unsearchable paper documents. Documents were scanned to .tiff images and coded by reviewers, and their text was extracted via optical character recognition (OCR) software. It was expensive and crude, but speedier than poring over thousands or millions of pieces of paper.
The coding and text had to be stored in separate files because .tiff images are just pictures of pages, incapable of carrying added content. So, in "single page .tiff" productions, each page of a document became its own image file, another file held aggregate extracted OCR text, and yet another held the coded data about the data, i.e., its metadata.
The metadata would include information about the content and origin of the paper evidence, along with names and locations of the various images and files on the media (e.g., CD or DVD) used to transmit them. Thus, adding a measure of searchability yielded a dozen or more electronic files (ten page images plus the text and metadata files) to carry the pieces of a 10-page document.
To put Humpty Dumpty back together again demanded a database and picture viewer capable of correlating the extracted text to its respective page image and running word searches. Thus was born a new category of document management software called "review platforms." Because the files holding the document’s OCR’ed text and metadata were destined to be loaded onto a review platform, they came to be called "load files."
Different review platforms used different load file formats to order and separate information according to guidelines called load file specifications. Load files are plain text files employing characters called delimiters to separate the various information items in the load file. Thus, a load file specification might require that information about a document be transmitted in the order: Box No., Beginning Bates No., Ending Bates No., Date, and Custodian. The resulting single line of text delimited by, e.g., commas, would appear: 57,ABC0003123,ABC0003134,19570901,Ball C.
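To make the idea of a delimited load file concrete, here is a minimal Python sketch that reads the sample line above back into named fields. The field names and the parse_load_file helper are illustrative assumptions, not part of any real load file specification; actual specifications differ from platform to platform and often use less common delimiter characters than the comma.

```python
import csv
import io

# Field order taken from the example specification in the text.
# The key names themselves are hypothetical labels chosen for this sketch.
FIELDS = ["box_no", "beg_bates", "end_bates", "date", "custodian"]

def parse_load_file(text, delimiter=","):
    """Return one dict per non-empty load file line, keyed by the assumed field order."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [dict(zip(FIELDS, row)) for row in reader if row]

# The sample line from the text.
records = parse_load_file("57,ABC0003123,ABC0003134,19570901,Ball C.\n")
print(records[0]["beg_bates"], records[0]["custodian"])  # ABC0003123 Ball C.
```

Every field depends on its position in the line, so a single stray delimiter inside a value shifts everything after it. That fragility is one reason load files earned their reputation as a headache.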
Load files were a headache. But we put up with the pain because adding searchability to unsearchable paper documents was worth it. A stone ax is better than no ax at all.
Because large document cases and attorney review pyramids were integral to law firm growth and profitability, lawyers invested in .tiff review platforms, and service providers emerged to compete for lucrative scanning and coding work. The electronic data discovery industry was born, circa 1987.
Fast forward to 2013, and hardly any documents are born on paper. Today, we seek electronically stored information, viz., email, word-processed documents, spreadsheets, presentations and databases. With paper, what you see is what you get.
By contrast, ESI divides its digital goodness between information readily seen and information requiring a mouse click or two to view. Documents are layered, multi-media and multi-dimensional, and much ESI defies characterization as a document. Replete with embedded formulae, appended comments, tracked changes, and animated text, ESI thumbs its nose at the printed page.
Despite a sea change in what we seek to discover, lawyers resolutely refuse to embrace modern forms of production. They cling to .tiff imaging and load files, downgrading ESI’s inherent searchability and eviscerating the multi-dimensional character of ESI. Thus, an obsolete technology that once made evidence easier to find now deep-sixes probative content.
Producing parties dismiss this lost content as "just metadata," as if calling it metadata makes it something you’d scrape off your shoe. In fact, they fear such "metadata" may reveal privileged attorney-client communications (which should clue you in that it’s more than just machine-generated minutiae). Producing parties have blithely and blindly been erasing this content for years without legal justification or disclosure in privilege logs.
When producing parties insist on converting ESI to .tiff images over a requesting party’s objection, they often rely on Federal Rule of Civil Procedure 34(b)(2)(E)(ii), which obliges parties to produce ESI in "a form or forms in which it is ordinarily maintained or in a reasonably usable form or forms." Courts have struggled with the notion of "reasonably usable," but haven’t keyed into the fact that .tiff imaging destroys user-generated content. Producing parties are happy to expunge content that may hurt their position and to postpone purchasing software supporting native review, so they’ve gotten good at making the case against native production.
Requesting parties seeking native production back down too easily because they’re desperate to get moving and uncertain how to make the case for native production. Courts tend to be swayed by the argument, "We’ve always done it this way," without considering why .tiff imaging came into wide use and why its use over objection has become unfair, unwise, and wasteful.
The case against native usually hinges on four claims:
1. You can’t Bates label native files.
2. Opponents will alter the evidence.
3. Native production requires broader review.
4. Redacting native files changes them.
Each claim carries a grain of truth swaddled in bunk. Let’s debunk them:
1. You can’t Bates label native files. Nonsense! It’s simple and cheap to replace, prepend or append an incrementing Bates-style identifier to the filename of all items natively produced. An excellent free file renaming tool is Bulk Rename Utility, available at www.bulkrenameutility.co.uk. You can include a protective legend, such as "Subject to Protective Order" in the name; and, no, renaming a file this way does not alter its content, hash value or last modified date. If the other side grouses that it’s burdensome to change file names to Bates numbers, remind them they’ve long used Bates numbers as file names in .tiff image productions.
It’s indeed difficult to emboss Bates numbers on every page of a native file until it’s printed or imaged. Yet many forms of ESI (e.g., email, spreadsheets, social networking content, video and sound files) don’t lend themselves to paged formats and will never be Bates-labeled.
We don’t put exhibit labels on every item produced in discovery because only a tiny fraction of production will be introduced into evidence. Likewise, little ESI produced in discovery is used in proceedings. When it is, simply agree that file names and page numbers will be embossed on images or printouts.
Sure, file names can be altered, but changing a Bates number or removing a protective legend from a .tiff image or printout is child’s play using software found on any computer. Demanding that Bates labeling for ESI be tamperproof is demanding more than was required of .tiff or paper productions.
2. Opponents will alter the evidence. Alteration of evidence is not a new hazard, nor one unique to ESI. We never objected to production of photocopies on the ground that paper is so easy to forge, rip and shuffle. Tiffs are just pictures, principally of black and white text. What could be easier to manipulate in the Photoshop era?
Though any form of production is prey to unscrupulous opponents, native productions support quick, reliable ways to prevent and detect alteration. Simply producing native files on read-only media (e.g., CDs and DVDs) guards against inadvertent alteration, and alterations are easily detected by comparing digital fingerprints of suspect files to the files produced.
Counsel savvy enough to seek native production should be savvy enough to refrain from poor evidence-handling practices like reviewing native files using native applications that tend to alter the evidence.
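As a rough illustration of comparing digital fingerprints, the Python sketch below records a hash for every file in a production folder and later reports any file whose hash no longer matches. The function names, the sample files and the choice of SHA-256 are assumptions made for this example; the parties could just as well agree on another hash algorithm.

```python
import hashlib
import tempfile
from pathlib import Path

def fingerprint(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def build_manifest(folder):
    """Map each file name in the folder to its fingerprint."""
    return {p.name: fingerprint(p) for p in sorted(Path(folder).iterdir()) if p.is_file()}

def find_altered(folder, manifest):
    """List names whose current fingerprint differs from, or is missing from, the manifest."""
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)

# Demonstration with throwaway files standing in for a production set.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "memo.txt").write_text("original memo")
    (folder / "notes.txt").write_text("original notes")
    manifest = build_manifest(folder)                 # recorded at production time
    (folder / "memo.txt").write_text("edited memo")   # simulate a later alteration
    altered = find_altered(folder, manifest)

print(altered)  # ['memo.txt']
```

A manifest like this, exchanged at the time of production, gives both sides a quick check on any suspect copy.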
3. Native production requires broader review. Native forms hold content (such as animated text in presentations and formulae in spreadsheets) added by users but not visible via .tiff. But animated text and formulae aren’t what concern your opponent.
The other side worries most about embedded commentary in documents — those candid communications between users and collaborators that are quietly stripped away when imaged. From an evidentiary standpoint, these aren’t different from Post-It notes or email between key custodians.
It’s crucial to help the court understand that the information stripped away is user-contributed content, and that a form of production isn’t reasonably usable if it destroys the information. If opposing counsel argues they put some of the excised content into load files, that’s disingenuous: If you cannot see a comment or alteration in context, its meaning is often impossible to divine.
Your opponents may also be reluctant to concede their obsolete tools don’t show contemporary content. Fearful that your tools might show content their tools miss, they jettison content rather than upgrade tools.
4. Redacting native files changes them. Indeed, that’s the whole idea. So the argument that the integrity of native productions will be compromised by removing privileged or protected content is silly! Instead, the form of production for items requiring redaction should be that form or forms best suited to efficient removal of privileged or protected content without rendering the remaining content wholly unusable.
Some native file formats support redaction brilliantly; others do not. In the final analysis, the volume of items redacted tends to be insignificant. Accordingly, the form selected for redaction shouldn’t dictate the broader forms of production when native forms have such distinct advantages.
Don’t let the redaction tail wag the production dog. If they want to redact in .tiff or PDF, let them, but only for the redacted items and only when they restore searchability after redaction.
Cast off the albatross! Tiff production had its day. Now, .tiff dumbs ESI down to the level of paper just so we can use old, familiar tools and workflows. Native production isn’t simply better, it’s cheaper, too. Why pay to convert native forms to .tiff and load files? Smaller native file sizes also trim the cost of ingestion and storage.
Tiff: You get less, pay more and destroy evidence to boot. Isn’t it time to go native?
Nowhere on "La Joconde," that most famous of all Leonardo da Vinci masterworks, does it say "Mona Lisa." Yet, despite theft and 100 years of efforts to "redact" her, using everything from acid to teacups, the world knows Mrs. Giocondo without stamping "Mona Lisa" across her enigmatic smile. Neither must we downgrade or deface electronic evidence produced in its native forms to retain the benefits once derived from Bates-stamping paper documents.
A native file can be given almost any name without altering its contents or changing its hash value (aka its digital "fingerprint"). So, it’s fast, free and easy to rename an electronic file to carry any Bates-style identifier — even a legend like "Produced Subject to Protective Order" — so long as the length of the name stays under 255 characters.
When replacing a file’s name (versus prepending or appending an identifier in the name), preserve and produce a record of the original and substitute names.
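A minimal Python sketch of that practice follows: it replaces each file name with a sequential identifier, confirms the hash is unchanged by the rename, and writes a cross-reference of original and substitute names. The rename_with_log helper, the crossref.csv log name, the "ABCD" prefix and the sample files are all hypothetical choices for illustration.

```python
import csv
import hashlib
import tempfile
from pathlib import Path

def md5_of(path):
    """Return the MD5 hex digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def rename_with_log(folder, prefix, log_name="crossref.csv"):
    """Replace each file name with a sequential identifier and log old and new names."""
    folder = Path(folder)
    files = sorted(p for p in folder.iterdir() if p.is_file() and p.name != log_name)
    rows = []
    for number, old in enumerate(files, start=1):
        new = old.with_name(f"{prefix}{number:09d}{old.suffix}")
        if new.exists():
            raise FileExistsError(f"refusing to overwrite {new.name}")
        before = md5_of(old)
        old.rename(new)
        # Renaming changes only the name, so the content hash must be identical.
        assert md5_of(new) == before
        rows.append((old.name, new.name, before))
    with open(folder / log_name, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["original_name", "produced_name", "md5"])
        writer.writerows(rows)
    return rows

# Demonstration with throwaway files standing in for native documents.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "budget.xlsx").write_bytes(b"example spreadsheet bytes")
    (Path(tmp) / "memo.docx").write_bytes(b"example memo bytes")
    log = rename_with_log(tmp, "ABCD")

for original, produced, digest in log:
    print(original, "->", produced)
```

Working on a copy of the production set, rather than the only copy, keeps the original names recoverable even if the log is lost.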
Establish an identification naming protocol where, e.g., the first four characters identify the producing party; the next nine are reserved for a unique, sequential numeric value (padded with leading zeroes); and the final five comprise a separator (e.g., a hyphen) and a four-digit page number. The page number need be embossed only when the file must be printed to paper or reduced to an image format for use in proceedings or as exhibits.
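One way such a protocol might be expressed in Python is sketched below. The bates_id and parse_bates_id helpers and the "ACME" party code are hypothetical; the layout (four-character party code, nine-digit number, optional hyphen and four-digit page) simply follows the example just described.

```python
import re

def bates_id(party, number, page=None):
    """Build an identifier: 4-character party code, 9-digit number, optional -NNNN page."""
    if len(party) != 4:
        raise ValueError("party code must be exactly four characters")
    if not 0 <= number <= 999_999_999:
        raise ValueError("number must fit in nine digits")
    base = f"{party}{number:09d}"
    if page is None:
        return base  # the form used as the native file's name
    if not 0 <= page <= 9999:
        raise ValueError("page must fit in four digits")
    return f"{base}-{page:04d}"  # the form embossed on a printed or imaged page

PATTERN = re.compile(r"(?P<party>.{4})(?P<number>\d{9})(?:-(?P<page>\d{4}))?")

def parse_bates_id(identifier):
    """Split an identifier back into (party, number, page or None)."""
    match = PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a valid identifier: {identifier!r}")
    page = match["page"]
    return match["party"], int(match["number"]), int(page) if page else None

print(bates_id("ACME", 3123))          # ACME000003123
print(bates_id("ACME", 3123, page=7))  # ACME000003123-0007
print(parse_bates_id("ACME000003123-0007"))  # ('ACME', 3123, 7)
```

Fixed-width fields keep the identifiers sortable as plain text, so a directory listing falls into production order without special tooling.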
If you include a truncated hash value in the filename (e.g., the first four and last four digits of the file’s MD5 hash value), all parties gain a portable, reliable means to confirm the electronic file is authentic, unchanged and properly paired with the right name cum Bates identifier. You can’t do that with printed Bates numbers!
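The truncated hash check might look like the following Python sketch. The helper names, the underscore separator between identifier and hash fragment, and the sample identifier are assumptions for illustration; any separator the parties agree on would serve.

```python
import hashlib

def short_hash(data):
    """Return the first four plus the last four hex digits of the MD5 digest."""
    digest = hashlib.md5(data).hexdigest()
    return digest[:4] + digest[-4:]

def name_with_hash(bates, data, extension):
    """Compose a file name carrying both the identifier and the hash fragment."""
    return f"{bates}_{short_hash(data)}{extension}"

def matches_name(file_name, data):
    """True if the hash fragment embedded in the name matches the file content."""
    stem = file_name.rsplit(".", 1)[0]       # drop the extension
    embedded = stem.rsplit("_", 1)[-1]       # fragment after the last underscore
    return embedded == short_hash(data)

content = b"example document bytes"
name = name_with_hash("ACME000003123", content, ".docx")
print(name)
print(matches_name(name, content))          # True
print(matches_name(name, content + b"!"))   # False, barring a rare fragment collision
```

Eight hex digits are enough to catch accidental mix-ups and casual edits at a glance. Where stronger assurance is needed, the full hash value can be compared.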