Part one of this series introduced the following best-practice strategy for deploying e-discovery software solutions: on-premises software for the left side of the Electronic Discovery Reference Model (EDRM), and cloud-based technology for the right-side processes. In part two, we’ll look at how this model applies to collection and preservation, and to processing.

Keeping the left side of the EDRM on-premises means using behind-the-firewall technology to build an organization’s own customized process for pre-collection analytics, issuing legal hold notices, collecting and preserving electronically stored information (ESI), processing ESI and first-pass review of ESI. Part one discussed the reasons for using on-premises software for pre-collection analytics and legal holds. Here, we look at the key issues for collection and preservation, and for processing.

Collection and preservation

The idea of conducting targeted collection and preservation, rather than collecting everything up-front in a scorched-earth manner, has taken hold in most organizations. The devil, however, is in the details of how a given technology accomplishes the task.

One methodology that continues to be touted is dubbed the “put everything into a magic box” approach. The general premise is that unstructured data sources, such as laptops, desktops and file shares, are “bad,” and that the best way to manage data is to lock it all into a structured repository where, in theory, it can be managed, accessed and deleted as needed. Email archiving, a subset of this category, gained early momentum as a way for organizations to try to meet their e-discovery needs.

While organizations may need some degree of archiving for records management purposes, many have found that rather than storing just “business data” or even litigation-related data, these repositories become a redundant copy of all data because they cannot separate relevant from non-relevant information. Such an approach requires constant migration and assessment of data just to be ready “in case” the organization needs to retrieve information from the repository. Similar issues exist for solutions that propose to index an organization’s entire IT infrastructure.

The ideal on-premises collection and preservation solution avoids this duplication of data by searching original data stores for relevant ESI, without the interim step of either migrating the data to a repository or indexing the data source up-front. Collection of ESI should also be a largely automated process in which every relevant data source is searched for responsive data at the time the preservation obligation begins: nothing more and nothing less. The organization goes into the original data source, captures and preserves relevant ESI in a defensible manner and then moves on, allowing the end-user to continue business activities without disruption or delay.
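To make the “search in place, preserve only what is responsive” idea concrete, here is a minimal sketch, assuming a file share of plain-text files, simple case-insensitive substring matching on search terms, and SHA-256 hashes as the defensibility record. The function names `collect_responsive` and `sha256_of` are hypothetical and do not correspond to any particular product’s API.

```python
import hashlib
import os
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_responsive(source_dir, dest_dir, terms):
    """Search an original data source in place and preserve only responsive files.

    Hypothetical sketch: there is no repository migration and no up-front index.
    Each file is read once, checked against the search terms, and copied only
    if it matches. Returns a manifest of (original, preserved copy, SHA-256).
    """
    source, dest = Path(source_dir), Path(dest_dir)
    lowered = [t.lower() for t in terms]
    manifest = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            original = Path(root) / name
            try:
                text = original.read_text(encoding="utf-8", errors="ignore").lower()
            except OSError:
                continue  # unreadable file: skipped here; a real tool would log it
            if not any(t in text for t in lowered):
                continue  # not responsive: left untouched in its original location
            # Mirror the original folder structure inside the preservation area.
            target = dest / original.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, target)  # copy2 also copies timestamps where possible
            digest = sha256_of(original)
            # Verify the preserved copy matches the original byte for byte.
            if sha256_of(target) != digest:
                raise OSError(f"hash mismatch for {original}")
            manifest.append((str(original), str(target), digest))
    return manifest
```

The manifest is what makes the collection repeatable and auditable: each preserved file can be tied back to its original location and shown to be unaltered. Real collection tools also handle container formats, email stores and chain-of-custody logging, which this sketch deliberately omits.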

The need to conduct targeted collections from original data sources militates in favor of on-premises software for this capability.

Processing

Processing ESI, i.e., de-duplicating and further culling data, has traditionally been the province of review tools and vendor-hosted solutions. We often think of processing as a specific step in the linear e-discovery process, sitting between the collection of ESI and review.

With an in-house processing capability, however, processing can become an iterative step that takes place at multiple points during discovery. For example, at the outset of litigation, an organization can take a sample set of data, such as data from a key witness’s computer, run some initial processing on the files to isolate the relevant file types, and then begin crafting and testing search terms to cull data for the remainder of the collection process from other data sources.
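Testing search terms against a sample set largely comes down to measuring how many documents each term hits, so over-broad or ineffective terms can be revised before they are applied to other custodians. A minimal sketch, assuming the sample has already been reduced to extracted text keyed by a document ID, and assuming whole-word, case-insensitive matching (real tools usually support richer Boolean and proximity syntax). `term_hit_report` is a hypothetical name.

```python
import re
from collections import Counter


def term_hit_report(sample_docs, terms):
    """Count how many sample documents each candidate search term hits.

    sample_docs maps a document ID to its extracted text. Terms are matched
    as whole words, case-insensitively. Returns a tuple of
    (per-term document hit counts, number of documents hit by any term).
    """
    patterns = {
        t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms
    }
    hits = Counter()
    any_hit = 0
    for text in sample_docs.values():
        matched = [t for t, p in patterns.items() if p.search(text)]
        hits.update(matched)  # each term counts at most once per document
        if matched:
            any_hit += 1
    return {t: hits[t] for t in terms}, any_hit


sample = {
    "doc1": "Falcon launch pricing",
    "doc2": "Team lunch on Friday",
    "doc3": "falcon budget review",
}
counts, total = term_hit_report(sample, ["falcon", "pricing", "lunch"])
# counts == {"falcon": 2, "pricing": 1, "lunch": 1}, total == 3
```

In this toy sample, a term like “lunch” pulls in a document unrelated to the matter; spotting that kind of noise on one witness’s data is cheaper than discovering it after collecting from every source.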

Another important benefit of an in-house solution is the ability to conduct different types of processing at different steps in the e-discovery process, making effective cuts at data when and where they make the most sense. At the initial stages of a collection, the easiest cuts are the ones that do not require substantive review, such as file types and date ranges. Ideally, organizations can make these cuts without having to index the data sets or move them into an archiving or enterprise content management repository.
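File-type and date-range cuts can be made from file-system metadata alone, without opening, indexing or moving the files. A minimal sketch, assuming file type is judged by extension and “date” means the last-modified timestamp; real processing tools typically use richer signals (such as file signatures or an email’s sent date). `metadata_cull` is a hypothetical name.

```python
import os
from datetime import datetime, timezone
from pathlib import Path


def metadata_cull(paths, keep_extensions, start, end):
    """Keep files whose extension and last-modified date are in scope.

    Only file-system metadata is read, so nothing is opened, indexed or moved.
    start and end are timezone-aware datetimes; the range is inclusive.
    """
    keep = {e.lower() for e in keep_extensions}
    kept = []
    for p in map(Path, paths):
        if p.suffix.lower() not in keep:
            continue  # out-of-scope file type, e.g. executables or system files
        modified = datetime.fromtimestamp(os.path.getmtime(p), tz=timezone.utc)
        if start <= modified <= end:
            kept.append(p)
    return kept
```

Because these checks are cheap, they can run before any content-based processing, shrinking the set that later steps such as search-term culling have to touch.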

A more advanced processing technique is “rolling de-duplication.” This type of enhanced processing moves processing up to the collection stage and allows organizations to compare files they have already collected with the files being scanned for collection. Rather than de-duplicating all files after collection, files are de-duplicated during the collection stage, so organizations collect only a single instance of each relevant file while maintaining a record of where the duplicates exist. This technique can greatly reduce the number of files collected. In one large case, an organization reviewing ESI for more than 800 custodians used this process to reduce an initial data set of 146 terabytes to 17 terabytes.
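The core bookkeeping behind rolling de-duplication can be sketched as follows, assuming exact duplicates are identified by a SHA-256 hash of file content (one common approach; the column does not say which method the organization in the example used). `RollingDeduper` and `offer` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RollingDeduper:
    """Hash-based de-duplication applied while files are being collected.

    The first copy of each unique file is collected; later copies are not
    collected again, but their locations are recorded against the hash.
    """
    collected: dict = field(default_factory=dict)   # hash -> first location seen
    duplicates: dict = field(default_factory=dict)  # hash -> later locations

    def offer(self, location: str, content: bytes) -> bool:
        """Return True if this file should be collected, False if it is a duplicate."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self.collected:
            self.duplicates.setdefault(digest, []).append(location)
            return False
        self.collected[digest] = location
        return True


dedupe = RollingDeduper()
scan = [
    ("alice/laptop/budget.xlsx", b"Q3 budget v2"),
    ("bob/share/budget.xlsx", b"Q3 budget v2"),
    ("carol/desktop/notes.txt", b"meeting notes"),
]
to_collect = [loc for loc, data in scan if dedupe.offer(loc, data)]
# to_collect == ["alice/laptop/budget.xlsx", "carol/desktop/notes.txt"]
```

The `duplicates` map is what preserves the record of where every copy lives, so custodian-level questions can still be answered even though only one instance was collected.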

Although processing is often necessary at the full-review stage, this capability also needs to be on-premises to maximize its effectiveness in the earlier stages of discovery.


Organizations understand that they need a mix of technology solutions to address the rising costs and risks of producing ESI. What has hindered progress is the tendency to focus on one step in the e-discovery process without a broader view of how best to accomplish all of the other steps in a seamless and effective manner. Fortunately, enough time has passed for a uniform approach to emerge. It allows an organization to tailor a solution to its needs and infrastructure on-premises where that matters most, in the early stages of the e-discovery process, while taking advantage of best-of-breed review and production capabilities with the performance, features, security and control needed. There is a role for cloud-based e-discovery solutions, though, and we’ll discuss that in our concluding column.