The Electronic Discovery Institute (EDI), a non-profit devoted to e-discovery issues, ran into a snag while conducting an in-depth study on the effectiveness of computer-assisted document review in 2007. The study required participating software vendors to load a control set of data onto their systems to benchmark the effectiveness of their products. However, each participant required its own load file format for its proprietary platform.

“The EDI team spent four months and more than $15,000 to structure data in a format that was acceptable to the study participants’ database platform,” says Patrick Oot, director of e-discovery at Verizon and a member of EDI’s advisory board. “Had the software providers been able to ingest a standardized data set, EDI would have saved significant resources.”

EDI’s problem is one that many legal departments face when transferring data from one type of software to another during e-discovery. It comes down to the lack of a common language and vocabulary. The data format for a review platform might be completely different from the format for a production application. As a result, counsel have no way to migrate data seamlessly from one e-discovery phase to the next, which wastes valuable time and money.

But all this might be changing. In late October, EDRM, a group of e-discovery thought leaders that includes vendors, consultants and legal departments, released a new standard that may revolutionize the transfer of data from one application to another during discovery. The new standard is based on XML (Extensible Markup Language), a markup language rather than a programming language.

“The idea is that if we have folks that are all capable of delivering material with a single schema, it will make it much easier, faster and less error-prone to move that data along,” says George Socha, e-discovery consultant and co-founder of EDRM.

One Language
XML is a markup language designed to let structured data be shared across different software and systems.

To accomplish this, XML uses tags to define fields for segments of data. For example, a person’s name in a document could be wrapped in an XML tag indicating that the enclosed text is a name.
XML tags typically remain hidden from the end user. This means that unless you’re a programmer, you’re not even going to notice them.
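To make the idea concrete, here is a minimal sketch using Python’s standard `xml.etree.ElementTree` module. The `<p>` and `<name>` tags are hypothetical, chosen only for illustration. It shows the two views of the same text: a program can pull the tagged name out directly, while a reader sees only the words with the tags stripped away.

```python
import xml.etree.ElementTree as ET

# A sentence in which a person's name is marked up with a
# hypothetical <name> tag (illustrative only).
marked_up = "<p>The memo was written by <name>Jane Doe</name> last week.</p>"

root = ET.fromstring(marked_up)

# What a program sees: the tagged field can be read directly.
author = root.find("name").text
print(author)  # Jane Doe

# What an end user sees: the same text with all tags removed.
plain_text = "".join(root.itertext())
print(plain_text)  # The memo was written by Jane Doe last week.
```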

In fact, many commonly used applications use XML, including Microsoft Outlook. In this application, the date an e-mail was sent is marked off with XML tags like this: <sent>January 1, 2008</sent>.

The advantage of using XML is that software that understands XML can easily pass data, or pieces of data, to other software that understands XML without anyone having to manipulate that data. For example, if someone wanted to compile a database of the dates on which e-mails were sent from Microsoft Outlook, it would be relatively easy to migrate all data tagged as “sent” into an XML-based database.
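A rough sketch of that extraction step, again with Python’s standard library, is below. The export format (`<emails>`, `<email>`, `<subject>`, `<sent>`) is an assumption made for illustration and is not Outlook’s actual file layout. Because the dates carry a consistent tag, collecting them takes a single pass over the document, with no one retyping or reinterpreting the values.

```python
import xml.etree.ElementTree as ET

# Hypothetical export of two e-mail records. The element names are
# illustrative only, not Outlook's real format.
export = """
<emails>
  <email><subject>Budget</subject><sent>January 1, 2008</sent></email>
  <email><subject>Q1 plan</subject><sent>January 3, 2008</sent></email>
</emails>
"""

root = ET.fromstring(export.strip())

# Collect every value tagged "sent", wherever it appears in the tree.
sent_dates = [el.text for el in root.iter("sent")]
print(sent_dates)  # ['January 1, 2008', 'January 3, 2008']
```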

However, there is one problem with XML: it is extensible, which means users can define and name their own tags. So even if two programs both understand XML, they might not use the same vocabulary.

For example, if program X identifies the date an e-mail was sent with the tag “sent” while program Y identifies the same information with the tag “date sent,” someone has to intercept the data between program X and program Y and change the tags manually.
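The translation step that someone has to perform between program X and program Y can be sketched as a tag-renaming pass. This is a hypothetical illustration, not any vendor’s tool. One assumption to flag: XML tag names cannot contain spaces, so program Y’s “date sent” tag is written here as `date_sent`. The mapping table `X_TO_Y` is also invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabularies: program X calls the field "sent",
# program Y calls it "date_sent" (XML names cannot contain spaces).
X_TO_Y = {"sent": "date_sent"}


def translate(xml_text: str, mapping: dict[str, str]) -> str:
    """Rename every element whose tag appears in `mapping`."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        if el.tag in mapping:
            el.tag = mapping[el.tag]  # renaming does not alter tree structure
    return ET.tostring(root, encoding="unicode")


from_x = "<email><sent>January 1, 2008</sent></email>"
print(translate(from_x, X_TO_Y))
# <email><date_sent>January 1, 2008</date_sent></email>
```

Even when this step is automated, someone still has to write and maintain a mapping like `X_TO_Y` for every pair of programs that exchange data, and any tag left out of the table is passed through unchanged, which is where errors can creep in.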

To solve this problem, EDRM has created a universal vocabulary for all e-discovery vendors, so that all systems tag the same fields within a document with the same tags and attribute the same meanings to those tags. For example, under EDRM’s XML standard, vendors will recognize and tag the date an e-mail was sent with the tag “DateSent.”
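With a shared vocabulary, the exporting and importing systems simply agree on the field name, and the translation step disappears. The sketch below assumes two hypothetical vendors. Only the `DateSent` field name comes from the article; the `<Document>` wrapper is a placeholder and does not reflect the actual structure of the EDRM XML schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

SHARED_DATE_TAG = "DateSent"  # field name cited for the EDRM standard


def export_record(date_sent: str) -> str:
    """Hypothetical vendor A: write a record using the shared tag."""
    doc = ET.Element("Document")  # placeholder wrapper, not the real schema
    ET.SubElement(doc, SHARED_DATE_TAG).text = date_sent
    return ET.tostring(doc, encoding="unicode")


def import_record(xml_text: str) -> Optional[str]:
    """Hypothetical vendor B: read the same field, no renaming needed."""
    el = ET.fromstring(xml_text).find(SHARED_DATE_TAG)
    return el.text if el is not None else None


payload = export_record("January 1, 2008")
print(payload)  # <Document><DateSent>January 1, 2008</DateSent></Document>
print(import_record(payload))  # January 1, 2008
```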

“By using the XML schema, we can allow people to build systems that will pass data along without requiring people to get hands-on with the data and make mistakes as they do it,” Socha says. Socha expects at least 20 companies to adopt the standard, which is voluntary, by February.

Freedom of Choice
Reducing the number of mistakes made through human intervention is one of the top goals of the XML standard project. Because the current method of migrating data requires people to convert files from one format to another, errors frequently occur. The standard would remove this human element, reducing the number of errors, many of which go undetected.

“The worst-case scenario is that something tagged privileged could lose that tag when information is being exported from one application to another,” says J.R. Jenkins, group product manager for Attenex, a legal technology company involved in creating the XML standard.
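One way to catch the failure Jenkins describes is to compare the set of privileged documents before and after a conversion. The following is a hypothetical check, not part of the EDRM standard or any vendor’s product; the `<Production>`, `<Document id="…">` and `<Privileged>` layout is invented for illustration.

```python
import xml.etree.ElementTree as ET


def privileged_ids(xml_text: str) -> set[str]:
    """Return IDs of documents flagged privileged (hypothetical layout)."""
    root = ET.fromstring(xml_text)
    return {
        doc.get("id")
        for doc in root.iter("Document")
        if doc.findtext("Privileged") == "true"  # None if the tag is missing
    }


before = """<Production>
  <Document id="D1"><Privileged>true</Privileged></Document>
  <Document id="D2"><Privileged>false</Privileged></Document>
</Production>"""

# Simulated faulty conversion: D1's privilege flag was dropped on export.
after = """<Production>
  <Document id="D1"></Document>
  <Document id="D2"><Privileged>false</Privileged></Document>
</Production>"""

lost = privileged_ids(before) - privileged_ids(after)
print(lost)  # {'D1'}
```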

The other benefit of the XML standard is that it will give legal departments more freedom in choosing which vendors to contract with. Currently, many vendors try to bundle their software by making their own products easily compatible with one another, but not necessarily with competitors’ products.

This means legal departments don’t always get the right product for the job or the best product for the lowest cost. But with all software sharing a common language, it might be the lawyers who finally get to call the shots.

“This standard means in-house counsel won’t be held hostage by one platform anymore,” says Michael Harnish, vice president, service delivery at Fios Inc., a legal technology consultancy.