In 2006 Maura Grossman was promoted to counsel at Wachtell, Lipton, Rosen & Katz, after seven years as a litigation associate. Most Wachtell counsel develop a specialty, and Grossman was considering legal ethics. Her mentor, partner Meyer Koplow, suggested something else: electronic discovery.

“He said: ‘Trust me on this. Electronic discovery is going to be a big deal,’” Grossman says.

With an atypical background for a lawyer—she has a Ph.D. in psychology—Grossman was accustomed to studying scientific methods and research techniques, and she immersed herself in the science of data retrieval. In 2009 she met Gordon Cormack, a computer scientist at the University of Waterloo, a leading Canadian university for technology, when they were working on a research project at the National Institute of Standards and Technology’s Text Retrieval Conference. Cormack was taking supervised machine-learning algorithms developed for eliminating spam from email and applying them to other information retrieval tasks. Grossman had an insight: “It seemed to me that spam filtering was not all that different from sorting relevant from nonrelevant documents in discovery.”

She and Cormack developed a process they call continuous active learning, in which a computer uses machine learning techniques to get better at identifying the right documents. Simply put, machine learning uses computer algorithms to organize information by analyzing features in data. By showing the machine relevant documents, a person can train the machine to identify others that fit the pattern. Grossman and Cormack got three related patents on the process; they have other patents pending and have applied for a trademark on the term “continuous active learning.” (They’re also engaged to be married.)

Wachtell started using Grossman and Cormack’s method for client matters in 2010, and since then the firm has used this technology—Grossman calls it “the sauce”—on more than 80 matters, mostly litigation and internal investigations. “I haven’t sold everyone at the firm,” says Grossman. “But we have a critical mass.”

In the annals of innovation, Grossman’s story wouldn’t stand out. But in the world of Big Law, her feat is exceptional. While firms have made some progress in their use of technology, most have been slow to explore machine learning, a process that’s transforming how knowledge is managed and used. Wachtell and a few other firms, meanwhile, are taking the initiative to gain a competitive advantage.

“We’re constantly looking for ways to relieve lawyers, and particularly associates, from fairly tedious work,” says Koplow. “As [Grossman and her team] continue to refine and develop their system, it’s going to make it even more possible for lawyers to analyze data, rather than sift through it.”

Cutting into profits?

In the business world, companies are increasingly exploring the possibilities of machine learning. International Business Machines Corp. has committed more than $1 billion to its much-publicized Watson technology, a platform that uses machine learning to process natural language and analyze large amounts of data. Others are also making huge investments. Tech giants with access to enormous amounts of data—Google Inc., Microsoft Corporation, Facebook Inc., and Inc.—are more quietly developing machine learning applications.

Meanwhile, Big Law lags far behind. “The most mature apps [using machine learning] are in finance by far,” says Daniel Katz, associate professor at Illinois Tech-Chicago Kent College of Law, who sits on the American Bar Association’s task force on Big Data and the Law and co-founded the legal analytics company LexPredict. “If you have an advantage in trading, you don’t have to convince anyone [of the value]. Machine learning there is light years ahead of where it is in the law.”

Big Law has long been slow to adopt technologies, even compared with other service providers. In a survey of law firms of all sizes conducted last fall by the Computing Technology Industry Association, only 26 percent described themselves as early adopters, compared with 41 percent for accounting firms and 37 percent for marketing firms.

A few firms besides Wachtell have taken the plunge to develop applications for machine learning technologies. Three years ago Drinker Biddle & Reath hired Bennett Borden, a former analyst at the Central Intelligence Agency, as its chief data scientist. Borden heads the firm’s Tritura Information Governance subsidiary, which does e-discovery and other data analytics. He says the firm has built a “predictive compliance” product using Watson tools that has detected corporate misconduct by reviewing emails and other data. Last year his group was so busy that he hired 12 new technologists; Tritura’s revenue has grown tenfold in two years. “We’re very popular within the firm,” he says.

At Fenwick & West, partner Stuart Meyer is helping to develop a computer tool that uses analytics to identify new patents that might be vulnerable to challenge under the America Invents Act. Says Meyer: “We’re trying to get out in front as more and more knowledge tools become available.”

But Drinker Biddle, Fenwick and Wachtell are exceptions. “Law firms are extremely risk-averse,” says Kyla Moran, senior industry consultant at the IBM Watson Group. “When we say our technology will give you more effective results, [they worry] that could eat into their profits.”

IBM’s Moran knows this from experience. Last May, the company contacted large firms in the U.S. and the U.K. and started offering a 30-day free trial of the tools on its Watson “ecosystem,” which would allow law firms to try to build useful applications. Moran estimates that about 50 firms have logged on to the site to investigate the offer, and some are working on rough prototype solutions. So far, only two small businesses are partnering with IBM through this ecosystem on Watson legal applications: ROSS Intelligence, which is developing an application for bankruptcy matters, and which is partially funded by Dentons; and Legal OnRamp, which is working on a process to analyze contracts.

The chief information officer of one major firm says he didn’t move forward on the Watson applications. “The free part was great, but the opportunity cost was too high for what they would do,” he says. He thought it would take too much time and effort to identify data that might be analyzed by Watson, and figure out a useful application that might emerge. He also suspected that the firm wouldn’t have a large enough data set to allow Watson to work effectively. This led him to wonder about the benefit to his firm, especially if the applications reduced legal work. “It seems IBM’s strategy was to get law firms to train Watson and then sell it to in-house departments,” he says. “That didn’t sound like the most appealing thing from a law firm’s perspective.”

Several law firms have contacted IBM to offer their data to be analyzed by Watson, IBM’s Moran says, hoping to follow the model that IBM struck with Memorial Sloan Kettering Cancer Center. There, the Manhattan medical center turned over historical cancer patient data to IBM (stripped of identifying details), and Watson analyzed it for free to help develop some preliminary medical applications. But the IBM Watson group wasn’t willing to strike a similar deal with law firms. “[The Watson group] wouldn’t do it for free,” says Moran, adding that the Sloan Kettering deal is a special case. “It’s hard to argue with using the technology to help cure cancer,” she says. “That was the motivation for doing that for free.

“The Watson group within IBM is only 2,000 people,” she notes. “We have limited resources. At this point we’re going with those who are building business relationships.”

IBM also offers a “cognitive value assessment,” a service in which IBM, for a large fee, helps clients identify opportunities for using Watson. One law firm has signed up. “We’re looking to build a first-of-a-kind solution with them,” says Moran. IBM may identify the firm in the first half of this year, she says, depending on the project’s progress; she describes it as an influential firm that “prides itself on being tech-forward.” She won’t reveal the fee paid for this assessment, but notes that IBM typically charges from $250,000 to $450,000.

Law firm consultants Bruce MacEwen and Janet Stanton of Adam Smith Esq. say that most law firms are waiting for IBM or someone else to develop applications. Last spring, the two arranged for leaders from roughly a dozen law firms to see a demonstration at IBM’s Watson showcase center in New York City. Managing partners and law firm technology experts watched a video showing Watson being used to help diagnose illnesses and to solve other problems, but none of the examples involved a legal application. “Law firms are saying that IBM hasn’t invested in law-specific tools,” says MacEwen. “They say, ‘We, the law firms, would have to train people [to develop Watson applications], and we’re not interested in doing that.’”

“We encourage law firms to set up R&D programs,” says Stanton. “That is absolute standard operating procedure for other businesses. But law firms strip out all the profits at the end of the year. We don’t see enough experimentation.” She’s come to believe that most firm partners are making so much money that they don’t have enough incentive to change. “On a macro basis the industry is not in enough pain to make change,” she says. Adds MacEwen: “Maybe the motivation will come from clients.”

Fifty times more efficient

At Wachtell, Grossman was allowed to pursue her own R&D project because Wachtell has a markedly different business model than most firms. It handles many matters for (very large) flat or contingency fees, so it isn’t as reliant on hourly billings. And it has just 1.7 associates for every equity partner, so it doesn’t make much money from associates toiling away on relatively mundane work. (Many of its peer firms in New York have three to four associates per partner.) The firm, which consistently leads The Am Law 200 in profits per partner ($5.5 million last year), has crafted a business model centered on charging premium rates for partners’ expertise.

“Many firms have been making lots of money with the old discovery model, using junior associates to do manual review,” says Grossman. “Had I been working at one of those firms, my efforts might have cut into the firm’s revenue stream.” Even at Wachtell, Grossman ran into skepticism. “Lawyers initially didn’t believe it could be done,” she recalls.

In the process developed by Grossman and Cormack, a lawyer provides the computer algorithm with a small set of relevant and nonrelevant documents (the “seed set”) to teach it how to identify each. The computer then ranks all documents in the entire data set from the most likely to be responsive to the least likely. Lawyers take a small group of documents from the “most likely” pile (Grossman likens this to skimming the cream), code those for relevance and feed them back to the computer along with the original seed set, to improve the computer’s ability to identify the right documents. The process is repeated until the computer isn’t finding many more relevant documents in each pass, and lawyers decide that enough documents have been found. (One way it differs from some other machine learning products offered by outside vendors is in how it selects the seed set and how often the machine is “retrained” during each project.) Grossman says she can take 2 million documents on a Friday afternoon and have the vast majority of the relevant items identified by lunchtime on Monday.
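The review loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not Grossman and Cormack's patented method: the word-level log-odds scorer is a simple stand-in for whatever learning algorithm a real tool would use, and every name here (`train`, `score`, `continuous_active_learning`, the toy corpus, the batch size and the stopping rule) is hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled):
    """labeled: list of (text, is_relevant). Returns smoothed per-word log-odds weights."""
    rel, non = Counter(), Counter()
    n_rel = n_non = 0
    for text, is_rel in labeled:
        if is_rel:
            rel.update(set(tokenize(text)))
            n_rel += 1
        else:
            non.update(set(tokenize(text)))
            n_non += 1
    return {
        w: math.log((rel[w] + 1) / (n_rel + 2)) - math.log((non[w] + 1) / (n_non + 2))
        for w in set(rel) | set(non)
    }

def score(weights, text):
    # Higher score = more likely relevant; unseen words contribute nothing.
    return sum(weights.get(w, 0.0) for w in set(tokenize(text)))

def continuous_active_learning(corpus, seed_labels, review, batch_size=2):
    """corpus: {doc_id: text}; seed_labels: {doc_id: bool};
    review(doc_id, text) -> bool stands in for the lawyer's coding decision."""
    labels = dict(seed_labels)
    while True:
        unreviewed = [d for d in corpus if d not in labels]
        if not unreviewed:
            break
        # Retrain on everything coded so far, seed set included.
        weights = train([(corpus[d], labels[d]) for d in labels])
        # Rank the remaining documents and take the top of the pile.
        ranked = sorted(unreviewed, key=lambda d: score(weights, corpus[d]), reverse=True)
        batch = ranked[:batch_size]
        hits = 0
        for d in batch:
            labels[d] = review(d, corpus[d])
            hits += labels[d]
        # Assumed stopping rule: quit once a batch turns up nothing relevant.
        if hits == 0:
            break
    return {d for d, is_rel in labels.items() if is_rel}, len(labels)

corpus = {
    1: "merger price negotiation memo", 2: "lunch menu friday",
    3: "merger price draft terms", 4: "office party invitation",
    5: "negotiation terms merger board", 6: "parking garage notice",
    7: "friday lunch order", 8: "board memo merger price",
    9: "holiday schedule update", 10: "printer toner request",
}
truth = {1, 3, 5, 8}  # what a human reviewer would mark relevant
found, reviewed = continuous_active_learning(
    corpus, seed_labels={1: True, 2: False}, review=lambda d, text: d in truth
)
print(f"found {len(found)} relevant documents after reviewing {reviewed} of {len(corpus)}")
```

In practice the stopping decision is a judgment call made by the lawyers, as the article notes; the fixed "one empty batch" rule here is just the simplest thing that makes the sketch terminate.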

The firm also supported Grossman as she and Cormack designed and ran empirical studies that compared sophisticated e-discovery tools. In 2011 they published an influential article showing that technology-assisted review can be 50 times more efficient than human review, meaning that it found just as many relevant documents with humans reviewing just 2 percent of the document collection, compared with humans reviewing 100 percent. They followed up with a 2014 peer-reviewed article indicating that a continuous active learning process like theirs was superior in most cases to other machine learning protocols.
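The "50 times" figure follows directly from the 2 percent review rate, assuming (as the article states) that both approaches find the same number of relevant documents. A short worked example, using a hypothetical collection of 2 million documents chosen only for illustration:

```python
collection_size = 2_000_000      # hypothetical collection size, not from the study
fraction_reviewed = 0.02         # humans review 2 percent under technology-assisted review

reviewed_with_tar = round(collection_size * fraction_reviewed)
reviewed_manually = collection_size          # exhaustive manual review reads everything

# Efficiency here means documents read manually per document read with assistance.
efficiency_ratio = reviewed_manually / reviewed_with_tar

print(reviewed_with_tar)   # 40000
print(efficiency_ratio)    # 50.0
```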

Although discovery is the first area in the law to take advantage of machine learning, these processes still aren’t widely used. “It is still by no means universal that advanced machine learning techniques are used as they could be [in discovery],” says Michael Mills, a former head of technology at Davis Polk & Wardwell and now the CEO of Neota Logic Inc., a company that makes compliance software. “I’m mystified. We’re 10 years into machine learning for e-discovery, the benefits are proven, yet both law firms and clients still need persuading.”

“There are many reasons it hasn’t moved ahead as much as one might like,” Grossman says of the adoption of technology. For one thing, not all e-discovery products that claim to use machine learning are equally effective, and some customers have been soured by bad experiences, such as failing to find important relevant documents. Also, there’s the nature of the legal profession. “Lawyers are generally conservative by nature,” she says. “They don’t want to be the first guinea pig out there experimenting with a new technology.”