The Dilemma of Duplicates
Law Technology News
Large-scale electronic discovery projects can waste considerable time and money because as many as 70 percent of e-mails and documents may be duplicates. The same e-mail may exist in sent and received folders, on multiple backup tapes and on the users' hard drives.
The challenge of dealing with all of these duplicates is how to reconcile the three Cs of document review: context, consistency and cost. An effective e-discovery system should resolve these three critical issues. When choosing an electronic data discovery vendor, be sure to discuss the nuances of duplication. Here are some factors to consider.
If you remove duplicates and keep just one copy of an electronic document, you remove the possibility of reviewing that document in context, which can be critical to understanding its true content.
If you keep the duplicates, you run the risk of coding some of the copies differently, perhaps marking one copy privileged while marking another responsive.
If you keep the copies, you must bear the cost of reviewing the same document multiple times.
Electronic duplicates quickly prove to be a much bigger problem than duplicate paper documents ever were. It may be difficult even to identify all of the places an e-mail may be stored, and more difficult still to delete all of the copies.
Simply identifying duplicate files can present a challenge. One technique is to examine the metadata. If you count as duplicates those documents that were sent from the same person and have the same date and subject, you will generate unacceptable levels of false positives -- decisions that two documents are identical when they are not. Many e-mails can be sent from the same person on the same date with the same subject, but still have different content.
A much more effective technique is to combine the metadata with additional information about the content of the e-mail and only consider two e-mails to be duplicates if they are identical in all of these regards. Very small differences in a document's content can have important implications. Imagine a contract in which you change one of the 0's to a 9. One character difference can cost you thousands of dollars.
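As a rough illustration of the difference between the two techniques, here is a minimal Python sketch. The field names (sender, date, subject, body) and the use of a SHA-256 digest to stand in for "information about the content" are assumptions made for this example; vendors may use other fields and other methods.

```python
import hashlib

def metadata_key(email):
    """Metadata-only key: sender, date and subject (hypothetical field names)."""
    return (email["sender"], email["date"], email["subject"])

def content_key(email):
    """Metadata plus a SHA-256 digest of the body text (an assumed choice of hash)."""
    digest = hashlib.sha256(email["body"].encode("utf-8")).hexdigest()
    return metadata_key(email) + (digest,)

# Two e-mails that differ by a single character in the body.
a = {"sender": "pat@example.com", "date": "2005-03-01",
     "subject": "Contract", "body": "The price is $10,000."}
b = dict(a, body="The price is $19,000.")
c = dict(a)  # a true duplicate of a

print(metadata_key(a) == metadata_key(b))  # True: a metadata-only false positive
print(content_key(a) == content_key(b))    # False: the content differs
print(content_key(a) == content_key(c))    # True: identical in every regard
```

Under these assumptions, the metadata-only key treats the $10,000 and $19,000 messages as the same document, while the combined key separates them and still matches the true copy.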
Context is critical in evaluating the communicative intent in an e-mail. Some e-mails may simply say things like "I agree with your assessment." Without knowing the e-mail to which the person was presumably responding, one cannot understand the information value.
E-mails in response to earlier e-mails are part of a thread of communication. Earlier e-mails in the thread provide the necessary background for evaluating a later message's value. Similarly, a sarcastic e-mail taken out of context may be interpreted very differently from the same statements made in the middle of a dozen other sarcastic e-mails.
Even the identity of the person who sent or received the e-mail may affect how it is interpreted. One may be able to read isolated e-mails, but understanding their importance requires context. If deduplication strips a document of its context, evaluating the document becomes much harder, and without proper care, reconstructing that context may be impractical or even impossible.
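One way to picture a thread of communication is as a chain of messages in which each reply points back to the message it answers. The sketch below assumes that every message carries an "id" and an optional "reply_to" field; these names are hypothetical, and real collections do not always preserve this information.

```python
def thread_context(messages, message_id):
    """Return the chain of messages leading up to message_id, oldest first."""
    by_id = {m["id"]: m for m in messages}
    chain = []
    seen = set()  # guards against a malformed, circular reply chain
    current = by_id.get(message_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])
        chain.append(current)
        current = by_id.get(current.get("reply_to"))
    return list(reversed(chain))

messages = [
    {"id": "m1", "reply_to": None, "body": "My assessment: we should settle."},
    {"id": "m2", "reply_to": "m1", "body": "I agree with your assessment."},
]

for m in thread_context(messages, "m2"):
    print(m["id"], m["body"])
```

Read alone, the second message says very little; shown after the first, its meaning is clear.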
Understanding a document in context also plays an important role in how the document is coded. Whether an e-mail is relevant to a case issue may depend on the other documents with which it is associated. A given document may be privileged or not depending on who received it. In addition, two reviewers looking at the same document could come to different conclusions about whether it is privileged.
Privilege can be waived if the reviewer examining the sender's mailbox makes one judgment about the privilege status of an e-mail while another reviewer, examining a different copy, makes a different judgment. Without the proper treatment of duplicates, there might be no way to recognize that this e-mail was treated inconsistently.
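Once duplicates have been identified, inconsistent treatment can be detected mechanically. The following sketch assumes that each reviewed copy is recorded as a (duplicate ID, location, coding) triple; the record layout and the coding labels are hypothetical.

```python
from collections import defaultdict

def find_inconsistent_codings(records):
    """Return duplicate IDs whose copies received more than one coding."""
    codings = defaultdict(set)
    for dup_id, _location, coding in records:
        codings[dup_id].add(coding)
    return {dup_id: sorted(c) for dup_id, c in codings.items() if len(c) > 1}

records = [
    ("doc-1", "sender/Sent Items", "privileged"),
    ("doc-1", "recipient/Inbox", "responsive"),
    ("doc-2", "sender/Sent Items", "responsive"),
    ("doc-2", "backup tape 3", "responsive"),
]

conflicts = find_inconsistent_codings(records)
print(conflicts)  # {'doc-1': ['privileged', 'responsive']}
```

A report of this kind could be run before production so that conflicting calls on the same e-mail are resolved by a single decision.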
Inadequate planning for duplicates not only can raise the cost of the discovery process but can also compromise the integrity of the entire case. Duplicates need to be identified with high accuracy. Information about their context must be maintained so that they can be examined in every "virtual" context in which they appear, but, at the same time, reviewers must be aware that they are looking at a duplicate.
Reviewers should also have ready access to information about every location in which the duplicate appeared and be able to navigate easily to each of those locations. Finally, reviewers should have access to communication threads so that they can evaluate the context of each document in a chain of communication.
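The requirements above suggest keeping an index of every location for each duplicate rather than discarding copies. The sketch below is one possible arrangement, not a description of any particular product; the function names and location strings are assumptions.

```python
from collections import defaultdict

def group_by_duplicate(copies):
    """Map each duplicate key to the list of locations where a copy was found."""
    locations = defaultdict(list)
    for key, location in copies:
        locations[key].append(location)
    return dict(locations)

def review_view(locations, key, current_location):
    """Summarize what a reviewer should see when opening one copy."""
    others = [loc for loc in locations[key] if loc != current_location]
    return {
        "is_duplicate": len(locations[key]) > 1,
        "current": current_location,
        "other_locations": others,
    }

copies = [
    ("doc-1", "sender/Sent Items"),
    ("doc-1", "recipient/Inbox"),
    ("doc-1", "backup tape 3"),
    ("doc-2", "recipient/Inbox"),
]

index = group_by_duplicate(copies)
view = review_view(index, "doc-1", "recipient/Inbox")
print(view["is_duplicate"])     # True
print(view["other_locations"])  # ['sender/Sent Items', 'backup tape 3']
```

With an index like this, the document is reviewed once, yet the reviewer is told it is a duplicate and can still move to each mailbox or tape in which it appeared.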
Stephanie Sabatini, based in Austin, Texas, is a litigation support consultant.