Catalyst’s Andrew Bye talks machine learning, hype deflation and how TAR has grown over his time in the e-discovery market.
As the e-discovery market matures, new technologies supporting cheaper, quicker and more in-depth analysis have emerged to help companies set themselves apart from the pack. Artificial intelligence, process analytics and automated document review have all found their way into technology-assisted review (TAR), sometimes making it difficult to know what kind of technology a given company actually means when it touts its unique, high-tech strategies.
Andrew Bye, who recently joined Catalyst’s team as director of machine learning and analytics, has been working in the industry long enough to know where machine learning can boost efficiency and where it operates more as a marketing ploy. He discussed his trajectory through the e-discovery market and how he’s seen machine learning grow in TAR processes:
Hometown: I grew up in Newport Beach, California, and have bounced back and forth between southern and northern California since college. I’ve been in San Francisco for about 12 years now and have no intention of leaving.
Why did you join Catalyst?: The people. It was honestly a very easy choice for me. Every single person I talked to, from initial phone interviews to in-person meetings in Denver, was fantastic. Not only were these people experts in their fields, but I could’ve spent the entire day with any one of them just chatting about our lives and what we’ve been up to outside of work. And the more I learned about the technology Catalyst has developed for Insight Predict, the more appealing the job became.
Entry into the e-discovery world: I was working in natural language search during the initial tech boom, but that industry took quite a beating during the early 2000s. I thought about going to law school and decided to take a job in a law firm first, to see if it was a career path I really wanted to go down before sinking money into a degree. After about a year of litigation support work, I realized that working in a big firm wasn’t right for me, but shortly thereafter I heard that H5 was looking for linguists in San Francisco to help classify documents for e-discovery. It seemed like a pretty good match, and I got an amazing education in e-discovery during my early years there.
How have you seen TAR grow and change throughout your career?: Effective machine learning and the steadily falling cost of processing and storing data have really changed the business since I’ve been in the game. Clients don’t have to worry as much about how much data winds up in a matter: platforms with good machine learning can get at the responsive material quickly, even when a collection contains millions of documents and the prevalence of relevant material is low, and the costs aren’t as prohibitive as they were a decade ago.
Where do you see AI fitting into (or not) TAR strategies in coming years?: There will undoubtedly be all sorts of marketing publicity surrounding hot new AI developments in the e-discovery industry, like the attention deep learning is getting now, and most of these will probably prove ineffective in a TAR application. I see review teams getting savvier in their use of analytics, smarter with validation tactics, and simply becoming more comfortable using TAR with continuous active learning because it’s proven to be so effective.
How do you help firms figure out when and if TAR is a key strategy in a given matter?: I have only encountered one matter in my entire career where TAR wasn’t a good fit, and that’s because the responsiveness criteria were changed in the middle of the review to only produce documents that were hit by very simple keyword terms. For nearly every other situation, no matter what size collection you have, TAR with a reasonable workflow will be more efficient by prioritizing the responsive material to the beginning of the review. Even if you have to review every single document, why wouldn’t you want to get to the good stuff first?
Favorite and least favorite e-discovery jargon words: Favorite: efficiency, sampling, hot docs (because I always hear it as “hot dogs”), “de-dupe” is fun to say. Least favorite: full-family review, TAR 1.0.
One trend you expect to see emerge in coming years around TAR and e-discovery work?: This might be wishful thinking on my part, but I’d like to see the industry view machine learning as a prioritization mechanism, just like keyword searching or reviewing a certain custodian’s data first because that person seems integral to the matter. Machine learning is a very powerful prioritization tool, but it’s just suggesting that you take a look at certain documents first because those are the types of documents you’ve trained it to look for. It’s really not that scary. I’d be far more afraid that the keyword culling I’ve agreed to is eliminating a huge chunk of important data.
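The prioritization idea Bye describes can be sketched in a few lines of code. This is a deliberately simplified toy (not Catalyst’s Insight Predict or any real TAR platform): documents a reviewer has already marked responsive train a crude term-overlap score, and the unreviewed collection is re-ranked so the most likely responsive documents come first. All names and sample documents here are hypothetical.

```python
# Toy sketch of prioritized review: rank unreviewed documents by term
# overlap with documents already marked responsive, so reviewers see
# the likeliest "hot docs" first. Real TAR platforms use far richer
# machine-learning models; this only illustrates the prioritization idea.
from collections import Counter

def score(doc, responsive_terms):
    """Higher score means more overlap with terms from responsive docs."""
    words = doc.lower().split()
    return sum(responsive_terms[w] for w in words) / max(len(words), 1)

def prioritize(unreviewed, reviewed_responsive):
    """Return unreviewed docs ordered most-likely-responsive first."""
    terms = Counter()
    for doc in reviewed_responsive:
        terms.update(doc.lower().split())
    return sorted(unreviewed, key=lambda d: score(d, terms), reverse=True)

# Hypothetical usage: two docs already coded responsive train the ranking.
responsive = ["contract breach payment dispute", "payment schedule breach"]
collection = [
    "lunch menu for friday",
    "notice of breach of payment terms",
    "company picnic photos",
]
ranked = prioritize(collection, responsive)
# ranked[0] is "notice of breach of payment terms"
```

As Bye notes, nothing here removes documents from the review; the model only suggests an order, which is why prioritization is a less risky lever than keyword culling.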
Copyright Legaltech News. All rights reserved. This material may not be published, broadcast, rewritten, or redistributed.