When Pfizer Inc. began putting together an e-discovery strategy in 2003, it didn’t have many models to follow.

“At the time, there was very limited e-discovery going on,” says Laura Kibbe, senior counsel at Pfizer. “To the extent it was happening, parties would just use a simple exchange of keywords to search through relatively low volumes of data.”

But with talk of new federal rules governing e-discovery, Kibbe foresaw an explosion in the burden and cost of searching and retrieving electronic data. And as e-mail generated an ever-larger body of data to cull through, she realized existing search processes would quickly become obsolete.

So in 2004 she teamed up with SPi, an e-discovery consultancy and solution provider. “We partnered with SPi because of this data analytics capability they were developing,” she says.

Data analytics software is a powerful tool for making sense of large amounts of information. It organizes vast volumes of data into topical piles, so reviewers need to search and read only the documents in the relevant piles rather than spending hours digging through hundreds of disorganized and potentially irrelevant documents. This reduces both the time it takes to review a set of documents and the amount spent on attorney review.

Since adopting data analytics, Pfizer has reduced the number of documents it sends to review by nearly 70 percent on some of its largest matters.

“If you look at e-discovery today, 70 percent of your costs are dedicated to review,” Kibbe says. “To get that cost down, you have to make sure that the relevant, responsive stuff gets to the review room while the junk doesn’t. Data analytics helps you ensure that you’re sending the right stuff to be reviewed and leaving the wrong stuff out of the funnel.”

Guesswork

To do this, data analytics employs complex statistical analysis to automatically group related documents.

First, the entire body of documents is fed through the software, which scans each document’s content and compares it to every other document in the system. Documents with statistically similar content are grouped into the same pile, and the software automatically names each pile after the content its documents share, so the pile is easy to identify.
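The grouping step can be sketched in a few dozen lines. This is not how SPi, Stratify, Attenex or any other vendor actually does it; it assumes a common textbook approach instead. Each document becomes a TF-IDF term-weight vector (using the smoothed IDF formula that scikit-learn also uses), documents are compared by cosine similarity, and a single pass assigns each document to the most similar existing pile or starts a new pile when similarity falls below a threshold. Each pile is then named after its heaviest terms. The stopword list, the 0.3 threshold and all function names are illustrative assumptions.

```python
import math
import re

# Hypothetical stopword list for illustration; real products use far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to", "by", "about", "against"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = {}  # document frequency: how many documents contain each term
    for tokens in tokenized:
        for t in set(tokens):
            df[t] = df.get(t, 0) + 1
    vectors = []
    for tokens in tokenized:
        vec = {}
        for t in tokens:
            vec[t] = vec.get(t, 0.0) + 1.0  # raw term frequency
        for t in vec:
            vec[t] *= math.log((1 + n) / (1 + df[t])) + 1.0  # smoothed IDF
        vectors.append(vec)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.3, label_terms=2):
    """Single-pass clustering: join the most similar pile or start a new one."""
    piles = []
    for i, vec in enumerate(tfidf_vectors(docs)):
        best, best_sim = None, 0.0
        for pile in piles:
            sim = cosine(vec, pile["centroid"])
            if sim > best_sim:
                best, best_sim = pile, sim
        if best is None or best_sim < threshold:
            best = {"members": [], "centroid": {}}
            piles.append(best)
        best["members"].append(i)
        for t, w in vec.items():  # summed centroid; cosine ignores overall scale
            best["centroid"][t] = best["centroid"].get(t, 0.0) + w
    for pile in piles:
        # Name each pile after its highest-weighted terms (ties broken alphabetically).
        top = sorted(pile["centroid"].items(), key=lambda kv: (-kv[1], kv[0]))
        pile["label"] = "/".join(t for t, _ in top[:label_terms])
    return piles


if __name__ == "__main__":
    sample_docs = [
        "Invoice for office supplies, payment due in 30 days",
        "Overdue invoice payment reminder for office supplies",
        "Complaint about harassment and inappropriate comments by supervisor",
        "HR notes on harassment complaint against supervisor",
    ]
    for pile in cluster_documents(sample_docs):
        print(pile["members"], pile["label"])
    # prints [0, 1] invoice/office, then [2, 3] complaint/harassment
```

The single-pass design keeps the sketch short, but its results depend on document order; production tools typically use more robust clustering methods.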

This is a drastic change from traditional culling methods. Before data analytics arrived in 2003, lawyers had to rely on keyword search tools, which required users to type in search terms manually, an inherently flawed technique.

“With a Boolean search you’ll wind up with things you don’t care about and miss things that you do,” says Stephen Whetstone, vice president, client development and strategy for Stratify Inc., a developer of data analytic solutions. “It’s impossible to come up with a comprehensive list of terms and phrases on the front end of the process that will return only what you want.”

Take a sexual harassment case, for example, in which lawyers have to cull through thousands of documents, the majority of which are completely unrelated to the matter.

Using keyword searches, lawyers make educated guesses about which terms are likely to retrieve the relevant data. So as not to miss responsive material, the search has to cast a wide net. They may therefore choose words such as “harassment” and “sex,” which are likely to retrieve loads of irrelevant documents while still missing the more subtle responsive ones.
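Whetstone’s point about over- and under-inclusive keyword lists can be made concrete with two standard measures: precision (the share of hits that are actually responsive) and recall (the share of responsive documents the search found). The sketch below runs a hypothetical Boolean OR search for “harassment” and “sex” over five invented documents whose responsiveness is assumed to be known in advance; the documents, the labels and the function names are illustrative only.

```python
import re


def keyword_hits(docs, terms):
    """Return indices of documents matching any term (a Boolean OR search)."""
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    return {i for i, doc in enumerate(docs) if pattern.search(doc)}


def precision_recall(hits, responsive):
    """Precision: share of hits that are responsive. Recall: share of responsive docs hit."""
    true_hits = len(hits & responsive)
    precision = true_hits / len(hits) if hits else 0.0
    recall = true_hits / len(responsive) if responsive else 0.0
    return precision, recall


# Hypothetical mini-collection; documents 2 and 3 are assumed to be responsive.
docs = [
    "Sex education seminar sign-up sheet",
    "Harassment policy annual training reminder",
    "He keeps making explicit comments about my appearance",
    "Formal harassment complaint filed against J. Doe",
    "Quarterly invoice for printer toner",
]
responsive = {2, 3}

hits = keyword_hits(docs, ["harassment", "sex"])
print(sorted(hits), precision_recall(hits, responsive))
# prints [0, 1, 3] (0.3333333333333333, 0.5)
```

Two of the three hits are noise, and the subtly worded complaint in document 2 is missed entirely, which is the pattern Whetstone describes.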

Computerized Culling

Data analytics changes this process, applying traditional search methods only after the software has automatically sorted the vast body of information into manageable piles. Used on the same sexual harassment matter, for example, data analytics would remove much of the attorney guesswork.

“Data analytics will automatically compare documents on a contextual basis, understanding that words such as ‘explicit’ and ‘abuse’ are related to the word ‘harassment,’” says Michele Lange, staff attorney for legal technologies at Kroll Ontrack, an e-discovery consultancy and service provider. “On top of that, it will put all documents with related topics into folders marked accordingly. So it pre-sorts the data for you.”
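One simple way software can learn that “explicit” and “abuse” travel with “harassment” is to look at which terms appear in the same documents. The sketch below scores every term by the cosine similarity between its document-occurrence vector and that of a seed term. It is a deliberately simplified stand-in assumed for illustration, not a description of Kroll Ontrack’s or any other vendor’s method, which may rely on richer statistical techniques. The sample documents and function names are hypothetical.

```python
import math
import re
from collections import defaultdict


def term_occurrences(docs):
    """Map each term to the set of document indices it appears in."""
    occurs = defaultdict(set)
    for i, doc in enumerate(docs):
        for term in re.findall(r"[a-z]+", doc.lower()):
            occurs[term].add(i)
    return occurs


def related_terms(docs, seed, top_n=3):
    """Rank terms by cosine similarity of their 0/1 document vectors to the seed's."""
    occurs = term_occurrences(docs)
    seed_docs = occurs.get(seed, set())
    scores = []
    for term, term_docs in occurs.items():
        shared = len(term_docs & seed_docs)
        if term == seed or shared == 0:
            continue
        # For 0/1 vectors, cosine = shared / sqrt(|A| * |B|).
        scores.append((shared / math.sqrt(len(term_docs) * len(seed_docs)), term))
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties alphabetical
    return [term for _, term in scores[:top_n]]


# Hypothetical collection.
docs = [
    "harassment complaint cites explicit messages",
    "explicit remarks and abuse alleged in harassment claim",
    "abuse by supervisor described during harassment interview",
    "invoice payment overdue",
    "invoice payment received",
]
print(related_terms(docs, "harassment", top_n=2))
# prints ['abuse', 'explicit']
```

Terms that never share a document with “harassment,” such as “invoice,” score zero and are never suggested.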

What counsel are left with is a set of folders named for their contents. One folder might be labeled “invoices” while another is labeled “harassment.” Attorneys can then quickly rule out the documents in the “invoices” folder and concentrate their efforts on the contents of “harassment.”

“Rather than attempting to make search lists before the body of data is understood, you can become more informed about what kind of data is there and then start forming your search lists,” Whetstone says.

Once data analytics has sifted through the data, lawyers can search the documents using keywords. But instead of scanning through thousands of documents, they only have to search folders containing a few hundred documents that are likely to be responsive. “Data analytics is really good at separating the wheat from the chaff,” says Mike Kinnaman, vice president of marketing for Attenex Corp., a data analytics vendor. “You can remove all the information that has no bearing, allowing you to concentrate your downstream review efforts on only what is really potentially responsive.”
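The two-stage workflow Whetstone and Kinnaman describe (sort first, then run keywords only against the piles worth reading) might look like the following. It assumes the software has already produced labeled folders; the folder contents, the choice of which folders to keep and the function name are hypothetical.

```python
import re


def search_selected_folders(folders, keep, terms):
    """Run a keyword search only inside the folders counsel chose to keep.

    Returns the matching documents per kept folder and how many documents
    were actually scanned, so the saving over a full-collection search is visible.
    """
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    results, scanned = {}, 0
    for label in keep:
        folder = folders.get(label, [])
        scanned += len(folder)
        results[label] = [doc for doc in folder if pattern.search(doc)]
    return results, scanned


# Hypothetical folders produced by the clustering step.
folders = {
    "invoices": [
        "Invoice for printer toner",
        "Invoice from harassment-prevention training vendor",
        "Invoice for catering",
    ],
    "harassment": [
        "Complaint about supervisor comments",
        "Follow-up on harassment complaint",
    ],
}

results, scanned = search_selected_folders(folders, keep=["harassment"], terms=["complaint"])
total = sum(len(docs) for docs in folders.values())
print(results["harassment"], f"{scanned} of {total} documents scanned")
```

The vendor invoice that mentions “harassment” never reaches review, because counsel ruled out the whole “invoices” folder before any keywords were run.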

Vendor Assistance

Although data analytics software is available off the shelf, it’s fairly burdensome to install and implement. As a result, a number of vendors have sprung up over the past couple of years to help legal departments use data analytics for e-discovery searches. These providers charge roughly $300 an hour for their services.

This fee covers help with culling and reviewing the information. One of the most useful services vendors provide, though, is “non-hit” sampling: the vendor samples documents that keyword searches did not return to make sure no responsive information was missed.

“It’s good to up that level of defensibility, especially when there’s a high degree of risk,” says Peter McLaughlin, director of review management services at FIOS, an e-discovery consultancy. “So we’ll sample the material the keywords didn’t hit. The attorneys can keep the results in their back pocket if they ever need to present it in court.”
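A minimal sketch of non-hit sampling, assuming a simple random sample: draw documents at random from the set the keywords did not return, have reviewers judge them, and use the responsive share of the sample (often called the elusion rate) to estimate how many responsive documents the search missed. The function names, sample size and `is_responsive` reviewer callback are illustrative assumptions, and real sampling protocols also report a confidence interval, which this sketch omits.

```python
import random


def sample_non_hits(all_ids, hit_ids, sample_size, seed=None):
    """Return the non-hit set and a simple random sample drawn from it."""
    non_hits = sorted(set(all_ids) - set(hit_ids))
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible for the record
    return non_hits, rng.sample(non_hits, min(sample_size, len(non_hits)))


def estimate_missed(non_hits, sample, is_responsive):
    """Elusion rate in the sample, projected onto the whole non-hit set."""
    if not sample:
        return 0.0, 0
    found = sum(1 for doc_id in sample if is_responsive(doc_id))
    rate = found / len(sample)
    return rate, round(rate * len(non_hits))


# Hypothetical matter: 10,000 documents, of which keywords hit the first 2,000.
# The reviewer callback is a stand-in; here every 50th document id counts as responsive.
non_hits, sample = sample_non_hits(range(10_000), range(2_000), sample_size=400, seed=7)
rate, projected = estimate_missed(non_hits, sample, lambda doc_id: doc_id % 50 == 0)
print(f"elusion rate {rate:.1%}, roughly {projected} responsive documents missed")
```

A low projected number is the kind of result counsel can keep “in their back pocket” to show the keyword search was reasonable.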

Although provider fees aren’t cheap, Kibbe believes their results make financial sense.

“Where I save money is not by sending 10 documents for review to find out only three are relevant,” she says. “It’s when I’m sending 10 documents into the review room with seven or eight coming out responsive. That’s clearly more cost effective.”
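Kibbe’s arithmetic can be written out directly. The sketch below compares the cost per responsive document in her two scenarios; the $2-per-document review cost is a hypothetical figure chosen only to make the numbers concrete, not one reported by Pfizer, and the function name is illustrative.

```python
def review_cost(docs_reviewed, responsive_found, cost_per_doc):
    """Total review spend and the spend per responsive document found."""
    total = docs_reviewed * cost_per_doc
    return total, total / responsive_found


# Hypothetical $2 per document reviewed; both batches contain 10 documents.
for responsive in (3, 8):
    total, per_responsive = review_cost(10, responsive, cost_per_doc=2.0)
    print(f"{responsive} of 10 responsive: ${total:.2f} spent, "
          f"${per_responsive:.2f} per responsive document")
# prints:
# 3 of 10 responsive: $20.00 spent, $6.67 per responsive document
# 8 of 10 responsive: $20.00 spent, $2.50 per responsive document
```

The spend per batch is identical; what changes is how many responsive documents each dollar buys. With seven responsive documents instead of eight, the figure would be about $2.86.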