ALM Properties, Inc.
Page printed from: Corporate Counsel
Big Data Technology
Law Technology News
In the course of reporting our October cover story, "Defending Big Data," Law Technology News sent out a request for information to legal technology vendors that participated in LegalTech New York 2012. We acknowledged the emergence of Big Data and the tension that exists in mining, exploiting, and monetizing customer data versus the security and privacy of that data. We then asked the vendors "if they have launched or will be launching products and services addressing Big Data." LTN received a number of responses that show that, for the most part, legal technologies addressing Big Data are focused on the production of evidence in litigation and government investigations, and not the extraction of customer or consumer preference.
Paul Bond, partner at Reed Smith, said in the cover story that Big Data "is used to characterize the escalating accumulation of data, especially the data sets too large, too raw, or too unstructured for analysis by conventional techniques." Bond's reference to "conventional techniques" is to the use of relational databases, which do a great job managing uniform data in data sets that we thought were large until Big Data arrived, he said.
As the standard for big data sets gave way to Big Data, relational databases mostly couldn't keep pace, said Bond. "The profound coordination that relational databases offered turned from asset to liability. They couldn't scale fast enough [or] adapt quickly enough to the chaos that comprises Big Data." As a result, we see businesses and service providers turning to NoSQL and newer data management programs with less need for unified infrastructure, he said. "It's like we moved from a Newtonian universe of data organization into something more local and relativistic, and the story is just getting out."
As Bond said, Big Data is moving products and services to satisfy "local and relativistic" customer needs. And legal technology customers have different needs in Big Data than manufacturers and distributors of consumer products and services such as Amazon, Netflix, and Target.
Jeremy Pickens, senior applied research scientist at Catalyst Repository Systems, speaking for himself and not Catalyst, believes that electronic data discovery "is not amenable to many of the approaches that are currently in Big Data vogue."
"The whole premise behind Big Data is that you have thousands or millions of users engaged with a particular system," said Pickens. Each user repeats (perhaps with slight variation) the same task, he explained. "From this repetition and variation you can extract information and make predictions about what future users will do."
Brian Kawasaki, executive vice president of technology solutions at Advanced Discovery, which offers investigatory and litigation services, was not sure if his company "gravitated to truly non-EDD" opportunities to handle Big Data. Advanced Discovery manages ESI to output evidentiary data, said Kawasaki.
If Advanced Discovery's e-discovery technologies were reworked to capitalize on Big Data, the company "wouldn't necessarily have a good story there," said Kawasaki. But if Big Data is defined in terms of extracting evidence from volumes of data, said Kawasaki, then Advanced Discovery can build ways to do that and make corporations feel more in control of their data from an e-discovery perspective.
Although Big Data is handled differently in the legal sector than the consumer sector, the technologies used to manage and mine all that data can share the same challenges. According to Robert Miller, of the Rise Advisory Group, there is a growing interest among corporate buyers to understand and manage Big Data. The challenge to date, according to Miller, has been threefold: what technology to use; where to apply it; and whether the organization has sufficient expertise to implement and monitor the software.
Most Big Data products and services view the data from a reactionary position, said Miller. Organizations attempt to make sense of data in the context of an event such as litigation or an investigation. Miller, however, believes that there is an opportunity to move data classification processes to the far left of the Electronic Discovery Reference Model, using machine learning to classify and organize data into predefined business use cases near the point of creation. Then organizations can proactively manage information for "retention, security, audit, legal, and business intelligence purposes."
In LTN's July edition, in "Can Computers Predict Trial Outcomes from Big Data," Tam Harbert profiled Daniel Katz, an assistant professor at the Michigan State University College of Law, who has been exploring how corporate law departments can use Big Data not just to predict outcomes of disputes, but also to craft strategies and to decide "whether, how, and where" to file lawsuits. Tymetrix, part of Wolters Kluwer Corporate Legal Services, has accumulated $25 billion in legal spending data, and Tymetrix has been using analytics to mine that information. One product is already on the market: the $2,500 Real Rate Report, which benchmarks law firm rates and identifies the factors that drive them, wrote Harbert.
Zylab says its e-discovery and information management software has an application programming interface to tap Big Data analytics, but the company is staying "close to home" [legal industry]. We did, however, receive a number of responses that indicate legal technology can be used to meet Big Data customer needs. (Zylab has been heavily involved with international tribunals, such as the European Human Rights Court that just completed a major upgrade of its public database that included the adoption of SkyDox. See "The Right to Know," also in our October issue.)
E-DISCOVERY MEETS BIG DATA
Denver-based Catalyst, a provider of document repositories and case collaboration systems, manufactures the Insight e-discovery product, said CEO John Tredennick. It is based on the MarkLogic NoSQL Database platform, one of the leading "Big Data" search engines on the market, he continued. Customers routinely handle petabytes of data, he explained, and MarkLogic has tested searches as large as 1.5 million characters and clusters of data can exceed 50 million documents. With MarkLogic, Catalyst's e-discovery platform supports XML, which enables it to combine metadata, tags, and text in a unified, searchable data store. Catalyst's technology promises to combine Big Data with a management and analysis platform, Insight, that was not available in a meaningful way prior to Big Data consolidation, Tredennick said.
Shaheen Javadizadeh is vice president of product management at Datacert, a provider of e-billing products to corporate legal departments and their law firms. Datacert has released two products that address Big Data: Legal Data Warehouse and proactive predictive modeling, he said.
Datacert's Data Warehousing product consolidates case data, timelines, costs, and case outcomes to help law departments better understand the business of law and law office operations and use data to improve results, he explained. The warehousing product also presents law departments with reports and dashboards that aim to help managing partners make better decisions and negotiate better deals with their firms and the business units they serve. Proactive predictive modeling takes historical information and provides the law department staff with years and terabytes of relevant data as cases update and evolve, he continued. By presenting information in real time, law departments can make decisions that mitigate risk, reduce costs, or improve business results.
Ann Marie Gibbs, national director of consulting at Daegis, characterizes the Big Data problem "as many faceted." In the context of e-discovery, she continued, "we are faced with ever increasing data volumes, an increased diversity of data sources, and the demand to process data rapidly to meet unrelenting deadlines." Add to this the need to extract "meaningful" content from data, added Gibbs, which means identifying what you are obligated to produce and what you need to withhold for privilege or other protection.
A Big Data approach to today's data sets is one that, at its core, tackles problems from a rigorous scientific point of view and relies on statistics to validate results, said Gibbs. This approach is used in Daegis' Technology Assisted Review, which is currently in beta (due out this month). Underlying the TAR environment is a scalable Apache Hadoop platform optimized for demanding real-time calculations, according to Gibbs, which is needed to support the demands of machine learning.
Gibbs views Daegis' TAR product as the first, not the last, venture into the realm of Big Data. "We are actively detailing future products and features that will leverage this big data approach to assist our clients," she said. But, Gibbs cautioned, "this technology cannot be applied successfully in the absence of a well-thought-out process."
Ted Gary, Exterro's senior product marketing manager, said that a number of enhancements to Fusion, which is an integrated e-discovery, legal hold, and litigation management software platform, are focused on "greater control and visibility into all of the 'big data' that's created on a daily basis in the enterprise." Gary added that the upgrades were in direct response to customers who want to ensure compliance with foreign as well as U.S. federal and state laws and regulations, especially those governing data privacy.
Nathan Swenson, director of software-as-a-service development at HotDocs, said the company's application, called HotDocs Document Services, has some Big Data "tie-ins." The application lets firms put forms online and sends links to customers. The customers click on the links and they are presented with a HotDocs interview where the users can answer questions and the answers are fed back into the system to publish the story.
The current version of HotDocs does not have many analytical features built into the document creation process. However, said Swenson, the version due out this fall will let the firm or content publisher see metrics about how the client filled out the form. A firm can review a report on how far clients got before they abandoned their efforts. Firms will be able to see which dialogs took clients the longest to complete. The goal, assured Swenson, "is to let these people see areas that are difficult or painful for clients so they can have tools to improve their forms and workflow process."
Jennifer Frost Hennagir, director of public relations and investor communications at Huron Consulting, said the consultancy will soon expand its data analytics offering and open its data storage facility in Charlotte, N.C., to address Big Data challenges that corporations are facing. Huron's data analytics software is designed to further reduce the number of documents in e-discovery and streamline the process that gets at the most relevant documents sooner, rather than later, she said.
At the end of July, StoredIQ announced its newest data intelligence application, DataIQ. Jacqui Galow, director of marketing, said the new product is designed to serve as a Big Data 'start button' for enterprises. The company pursued the new product after hearing from customers who were not comfortable beginning an e-discovery project because they did not know the type and extent of their unstructured data without moving it into a repository. (StoredIQ is designed to provide customers an understanding of their data so they can engage in e-discovery and defensible deletion.)
Attorney Sean Doherty is LTN's technology editor.