Every day I talk with attorneys who must produce e-discovery. Sometimes the conversations are global in scope, other times narrowly focused, but they are all about answering the same question: How can the client produce all of the discovery required in the most cost-efficient and quickest way possible and without leaving itself vulnerable to legal challenges?

My answers differ in their specifics, of course, but they all follow the same approach: If you understand the underlying facts in the matter, have a basic understanding of how electronically stored information (ESI) is housed, collected, processed and reviewed, and use that knowledge to present reasonable positions to the requesting party and seek its cooperation and approval, your e-discovery production will be defensible, either because you are following protocols agreed to by the parties or because your position will (or, at least, should) prevail if it becomes the subject of litigation.

Here, I will try to summarize, step by step, how to follow that approach. I will discuss pricing, preservation and collection, and limiting the scope of processing and production.

Know everything: In his first great book on screenwriting, “Adventures in the Screen Trade,” screenwriter William Goldman, who penned “Butch Cassidy and the Sundance Kid,” “The Princess Bride” and dozens more, made the often-quoted observation about Hollywood, “Nobody knows anything,” meaning that studio executives talk with great confidence about why this or that film in production will be the Next Big Thing, but then it flops and some other film that no one even noticed in production becomes The Hit of the Summer.

With e-discovery, however, the rule must be, “Know everything.” Below, I will discuss what “everything” is, and why it is important to understand it: because that is how you prevail, either when trying to come to agreement with opposing counsel or before the court.

Pricing: First, a few words about pricing. Obviously, you do not need a columnist to tell you to try to get the lowest pricing. But you may need a columnist to explain what you’re being charged for. To illustrate: E-discovery processing must include the removal of systems files and other files not of interest to the reviewer, often referred to as “deNISTing.” Let’s imagine that a 260-gigabyte desktop hard drive has 40 GBs of data on it. Thirty GBs are systems files, of no interest to the reviewer, while 10 GBs are emails and e-docs. Vendor One charges $100 per GB to process. Vendor Two charges $250 to extract the emails and e-docs from the 40 GBs of data and $300 per GB to process. On its face, $100 per GB is much lower than $300 per GB. However, Vendor One will charge $100 per GB for all 40 GBs, for a total of $4,000, while Vendor Two will charge $250 to “deNIST” and $3,000 ($300 per GB times the 10 GBs of emails and e-docs extracted), for a total of $3,250. So, when you are comparing prices, make sure you understand not just the price per unit but how the vendor counts units. In other words, know everything.
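For readers who like to check the arithmetic, the two-vendor comparison above can be sketched in a few lines of Python. This is only an illustration of the hypothetical in this column: the function names are made up for the example, and the break-even figure is an added calculation that assumes both vendors bill exactly as described and nothing else.

```python
def flat_rate_cost(total_gb, rate_per_gb):
    """Vendor One model: every gigabyte collected is billed at the processing rate."""
    return total_gb * rate_per_gb


def filtered_cost(reviewable_gb, extraction_fee, rate_per_gb):
    """Vendor Two model: a fixed fee to strip systems files, then per-GB billing on what remains."""
    return extraction_fee + reviewable_gb * rate_per_gb


def break_even_reviewable_gb(total_gb, flat_rate, extraction_fee, filtered_rate):
    """Reviewable volume at which the two pricing models cost the same (illustrative only)."""
    return (total_gb * flat_rate - extraction_fee) / filtered_rate


# Figures from the hypothetical: 40 GBs on the drive, 10 GBs of emails and e-docs.
vendor_one = flat_rate_cost(total_gb=40, rate_per_gb=100)
vendor_two = filtered_cost(reviewable_gb=10, extraction_fee=250, rate_per_gb=300)
break_even = break_even_reviewable_gb(40, 100, 250, 300)

print(f"Vendor One: ${vendor_one:,}")
print(f"Vendor Two: ${vendor_two:,}")
print(f"Break-even reviewable volume: {break_even:.1f} GB")
```

On these assumptions, Vendor Two stays cheaper only while the emails and e-docs come to less than about 12.5 GBs of the 40; above that, the lower per-GB rate wins. That is why the question to ask is how the vendor counts units, not just what each unit costs.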

Preservation and collection: To know what to preserve, you must know your case well enough to know who is involved in it, and you must know your client’s IT infrastructure well enough to know where those custodians kept their ESI. Did they store it locally, i.e., on their desktops or laptops? Did they store it on email and file servers? Are there project folders or other shared folders and where are they? Are databases involved and, if so, where do they reside?

You must get this information from your client’s IT staff and from the users, and you must find out whether the answers provided to you are policy answers, practical answers or both. For example, many companies have policies that files must be stored only on the servers, but some companies are much better at enforcing those policies than others. If the policy is to store on servers only but the reality is that users store locally, you must preserve and collect from both locations.

Clients go back and forth about whether to “preserve in place,” i.e., to send out litigation hold memos and leave the data where it sits until it must be collected. The great benefit of preservation in place is that if the case goes away before collection, you save that expense, and if the case changes or simply becomes so narrowly defined that the universe to be collected shrinks over time, you will spend less by collecting only a fraction of what you first thought you had to collect.

There are, however, several drawbacks to this strategy.

First, it takes a great deal of effort, and the larger the number of users involved, the greater the effort. A “no deletion” policy can quickly lead to IT issues, with mailboxes growing too large and system performance slowing down.

Second, proving that the policy was not simply issued and forgotten about, but actively monitored, is not easy. Even a “banner” that states the terms of the litigation hold, and which every user must accept before gaining access to the company’s network, is not particularly persuasive: Every computer user — and most judges are — knows what it’s like to accept a multipage terms-of-service agreement in a second without reading a word of it.

Third, the strategy is premised on what is more a hope than a probability, i.e., that the case will go away. Sometimes cases do, but rarely before discovery begins, and usually cases go away when, and because, you are prepared, not when you have put off your preparation.

This last point leads to the second-biggest drawback with the strategy: If you put off collection, you put off knowing both what your case is all about and how you can position yourself to produce discovery in the cheapest way possible.

Your ESI will tell you what your case is about or, at a minimum, what it will look like to the trier of fact. Zubulake, the mother of all e-discovery cases, illustrates this point perfectly. No one would dispute that UBS Warburg’s attorneys would have proceeded differently had they had, prior to or when they spoke to the employees involved, the benefit of all of the emails and e-docs subsequently uncovered through the years of judicial wrangling.

Moreover, knowing your e-discovery will give you the information needed to make the arguments you will want to make, as Pippins et al. v. KPMG, 2012 U.S. Dist. LEXIS 1768, 81 Fed. R. Serv. 3d (Callaghan) 955 (S.D.N.Y. 2012), teaches so well.

In Pippins, the plaintiffs, who sought class certification, alleged violations of the Fair Labor Standards Act and related statutes. KPMG moved for relief from having to retain 2,500 hard drives (on which evidence of the violations was thought to reside), at a cost of $1.5 million, proposing that it retain only 100 randomly chosen hard drives. KPMG, however, refused to share any sample drives with the plaintiffs or to do any sampling on its own to establish that the 100-drive set would be representative of the 2,500-drive set. The court, then, was forced to deny KPMG’s motion. Had KPMG taken the initiative and looked into its own ESI, it could have supported its motion with facts and prevailed.

With preservation, as with everything else, informed positions tend to win, and uninformed ones almost always lose.
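To make the idea of “100 randomly chosen hard drives” concrete, here is a minimal Python sketch of drawing a simple random sample that can be reproduced and documented. It is not a description of what KPMG did or of what the court would have required; the drive labels, the seed value and the function name are all hypothetical, and whether any sample is actually representative is a question for a statistician and the parties, not for a script.

```python
import random


def draw_sample(drive_ids, sample_size, seed):
    """Draw a simple random sample without replacement.

    A fixed seed means the same list of drives can be regenerated later,
    which makes the selection easy to document and to show the other side.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(drive_ids, sample_size))


# Hypothetical labels for the 2,500 drives at issue.
all_drives = [f"DRIVE-{n:04d}" for n in range(1, 2501)]

sample = draw_sample(all_drives, sample_size=100, seed=2012)
sampling_fraction = len(sample) / len(all_drives)

print(f"Selected {len(sample)} of {len(all_drives)} drives ({sampling_fraction:.0%})")
print("First five selected:", sample[:5])
```

The point of the sketch is only that selecting and recording a sample is cheap; the real work is examining the sampled drives and bringing what they show to opposing counsel or the court.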

Finally, the biggest drawback with preservation in place is that it exposes the client to potential litigation regarding spoliation. Despite best efforts, data preserved in place will be deleted and overwritten, and even more of it will be accessed so that key metadata dates will change.

If challenged, the case that data has been preserved will boil down to one or more witnesses, whose attitude toward production presumptively ranges from indifference to hostility, telling the court, “Trust me.” The cost of litigating such a motion will eclipse the cost of forensic data collection with no guarantee that you will prevail on the motion. So, forensic collection as preservation can be thought of as spoliation insurance.

There are middle-ground choices between preservation in place and forensically imaging everything. For example, if your client has a standard IT infrastructure in which users use their desktops or laptops to access email and file servers, your client’s IT staff can make backup tapes of the servers to preserve all of the data on them and then not have to worry about preservation in place.

The prime drawback with that strategy is that it does not account for preservation of data on the desktops and laptops, which will pose a problem if your client’s users save data locally or, at a minimum, if your client cannot convince the court that they do not. If, however, it is demonstrably true that they save data only to the servers, running special backup tapes can preserve everything easily without having to deal with the problems of preservation in place. Once again, if you know everything, you can advise wisely as to how to proceed, and present strong arguments if challenged.

Limiting the scope of processing and production: As Pippins illustrates, limiting the scope of what you have to collect, process, review and produce is how you limit costs. But Pippins is an outlier case because there, the cost of preservation and collection was astronomical. Usually, the cost of collection is a relatively small percentage of the total cost of producing e-discovery. It does not cost a lot to collect terabytes of data. Usually, the issue is whether to collect the ESI for the 10 custodians who were deeply involved in the matter or to play it safe and collect from the additional 10 whose involvement ranged from somewhat to peripheral.

What does cost a lot is to process the collected data into a searchable database format so that you can review it. The processing should include deNISTing (the removal of systems files and any other file types not of interest to a reviewer), deduplication and, if you so choose, searching using keywords and search strategy variations, such as proximity searching (“X” within five words of “Y”), Boolean searching (“X” but not “Y”) and other strategies familiar to anyone who has searched LexisNexis or any other large database.
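The processing steps just described can be sketched in miniature. The Python below is a toy illustration, not how any particular vendor’s platform works: it assumes that deNISTing and deduplication are both done by comparing content hashes (deNISTing takes its name from the reference list of known software files published by NIST, for which the small set here is a hypothetical stand-in), and it assumes that “within five words” means the two terms sit no more than five word positions apart. All file names and contents are invented.

```python
import hashlib
import re


def digest(content: bytes) -> str:
    """Hash of the file content; identical content yields an identical hash."""
    return hashlib.sha256(content).hexdigest()


def filter_and_dedupe(files, known_system_hashes):
    """Drop files whose hash is on the known-file list, then drop exact duplicates."""
    seen = set()
    kept = []
    for name, content in files:
        h = digest(content)
        if h in known_system_hashes or h in seen:
            continue
        seen.add(h)
        kept.append((name, content))
    return kept


def words(text):
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(text, x, y, distance):
    """Proximity search: True if x appears within `distance` word positions of y."""
    tokens = words(text)
    x_pos = [i for i, t in enumerate(tokens) if t == x]
    y_pos = [i for i, t in enumerate(tokens) if t == y]
    return any(abs(i - j) <= distance for i in x_pos for j in y_pos)


def but_not(text, x, y):
    """Boolean search: True if the text contains x but not y."""
    tokens = set(words(text))
    return x in tokens and y not in tokens


# Invented collection: one systems file, one memo, an exact copy of it, one note.
files = [
    ("kernel.dll", b"operating system library"),
    ("memo.txt", b"The board approved the merger price on Friday."),
    ("memo_copy.txt", b"The board approved the merger price on Friday."),
    ("note.txt", b"Lunch on Friday? The merger can wait."),
]
known_system_hashes = {digest(b"operating system library")}  # hypothetical stand-in list

review_set = filter_and_dedupe(files, known_system_hashes)
names = [name for name, _ in review_set]
proximity_hits = [n for n, c in review_set if within(c.decode(), "merger", "board", 5)]
boolean_hits = [n for n, c in review_set if but_not(c.decode(), "merger", "board")]

print("Left for review:", names)
print("'merger' within five words of 'board':", proximity_hits)
print("'merger' but not 'board':", boolean_hits)
```

Four files go in and two come out for review, and each search narrows those two to one. Scaled up to terabytes, that narrowing is where the savings come from.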

How to reduce processing costs, then, is one of the two big issues in e-discovery productions; the second, and perhaps more important, is how to reduce the volume of data that has to be reviewed. The two are intertwined.