E-discovery system scalability and statistical sampling, computer security, and U.S. Supreme Court justices all garnered the attention of legal professionals at LegalTech New York on Tuesday.
Chicago-based kCura Corp. will make large-scale databases the focus of Relativity 8.0 (exhibit hall: 210), which will be announced this spring, CEO Andrew Sieja said. It's vital for kCura to make its document review software grow with the Big Data trend; the company has already seen one service provider running 80 million documents in a single workspace, Sieja noted.
To help that customer and future customers, "We're making improvements to our data model. We've got to overhaul things in the [database schema] to support these larger cases," he said. "We're playing a game of continuing to evolve our software to handle it."
As databases grow, it's equally important to give customers faster data searching tools, Sieja said. "We're working on a new distributed searching index ... and also concurrency," he explained; the latter will let customers execute simultaneous operations against a single index.
On the processing side, where kCura debuted its software in fall 2012, there have been "a handful of customers who have bought it and are starting to run real data through it," Sieja said. There aren't yet any third-party applications for the processing software, unlike kCura's review ecosystem, which has dozens of aftermarket plug-ins. But the company has had early discussions with partners about that possibility, he said.
It can be brain-bending, but statistical sampling has plenty of value to law firms, not just academics, experts here said.
DiscoverReady senior vice president Maureen O'Neill, Oracle Corp.'s director of e-discovery Pallab Chakraborty, University of Waterloo computer science professor Gordon Cormack, and Wachtell, Lipton, Rosen & Katz counsel Maura Grossman implored their panel audience to apply math to their electronic litigation processes.
"Statistical sampling is a method to estimate a characteristic of a large population by examining only a subset of it," Grossman explained. "Estimate" may sound like a dangerous word to attorneys and judges, but the goal is "a reasonably precise mathematical measurement," she explained.
The results of sampling, in technical terms, aim to calculate the recall, precision, and confidence level of a search result obtained from a set of documents. "There's a big trade-off among all of these things [precision, recall, and confidence]," Cormack said. "You can get any two of them, and the third is going to vary. You can't get all three," Grossman added. Statistics won't get you out of trouble if your document review is missing relevant data, but taking the time for a sampling process will help show clients and judges that your process was reasonable, the panelists noted.
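To make those quantities concrete, the short sketch below (again hypothetical, with made-up document IDs) shows how recall and precision are computed once a reviewed set of relevant documents is compared against what a search or review tool actually retrieved.

```python
def recall_precision(relevant, retrieved):
    """Compute recall and precision for a tool's retrieved set.

    `relevant` is the set of document IDs judged relevant (e.g., by human
    review of a sample); `retrieved` is the set the search tool returned.
    """
    true_positives = len(relevant & retrieved)
    recall = true_positives / len(relevant) if relevant else 0.0
    precision = true_positives / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical example: 10 truly relevant documents, 8 retrieved by the tool.
relevant_docs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
retrieved_docs = {1, 2, 3, 4, 5, 6, 11, 12}
r, p = recall_precision(relevant_docs, retrieved_docs)
print(f"Recall: {r:.0%}, Precision: {p:.0%}")  # Recall: 60%, Precision: 75%
```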
Sampling can also be used in early case assessment to confirm that privileged documents aren't included, they agreed. Grossman advised her audience that humans are not more accurate than computers, but that software will never achieve perfection. "If you go out on the [exhibit] floor and someone tells you they're going to find 95, 98 percent of the documents, I'd run," she joked. "That's not possible."
Cormack and Grossman are both involved in the U.S. government's annual Text Retrieval Conference Legal Track, known simply as TREC in the e-discovery field. The Legal Track ran each year from 2006 through 2011, but was cancelled in 2012 because a new data set was not ready in time and because the 2011 results were delayed. The 2013 edition has also been cancelled because of ongoing unspecified problems with the data set, Cormack said after the panel.