Virtual servers, desktops and storage are reaching the point of ubiquity. If you haven’t already encountered your first electronic data discovery collection involving virtualization, it’s only a matter of time until you do.
When you engage in collection in a virtualized environment, much of your job as an attorney remains the same: assess the situation, determine what needs to be collected and processed, and talk with your client and EDD vendor about what needs to get done. But realize that the underlying targets for discovery are radically different. If you stick to your old habits, you could be blindsided by high costs, hidden data and preservation issues associated with virtual machines. This article presents the information you need to ensure an efficient, complete and virtually painless collection.
Virtualization can affect your case in three main ways:
1. Increase costs and collection scope: Virtualization means the end of the “one computer per box” generation. If you estimate the size of an electronically stored information (ESI) collection merely by counting the physical computers or servers, virtualization can throw your estimate way off. It is now commonplace for multiple computers to run on hardware that used to be reserved for one.
2. Cause you to overlook evidence: If certain forms of virtualization have been implemented, and an examiner is not made aware of it, they might miss crucial evidence. When searching a user’s hard drive, for instance, certain files contained within encapsulated virtual machines may not respond to keyword searches. The virtual machine files may have to be “opened” prior to the search to ensure accurate results.
3. Increase the risk of collection issues and/or spoliation: Virtualization involves separating computers and data storage from their physical hardware. This technology brings with it new features that may increase the possibility of losing or destroying ESI. Examples include the ability to:
- “roll back” a computer to a previous “snapshot” and inadvertently lose newer data;
- move computers and data from one piece of physical hardware to another and accidentally misplace or compromise data; and
- delete entire machines with a single click and completely erase data.
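Because files inside an encapsulated virtual machine may not respond to keyword searches, it helps to know which files on a drive are likely VM containers before searching. The sketch below is a minimal, hypothetical triage helper that flags files by extension. The extension list is an illustrative assumption covering common VMware, Microsoft and VirtualBox formats, not an exhaustive catalog, and the function names are invented for this example.

```python
from pathlib import Path, PurePath
from typing import List, Optional, Tuple

# File extensions commonly used by VM disk, configuration and snapshot files.
# Illustrative only: extend this for the products a given client actually uses.
VM_EXTENSIONS = {
    ".vmdk": "VMware virtual disk",
    ".vmx": "VMware VM configuration",
    ".vmsn": "VMware snapshot state",
    ".vhd": "Microsoft virtual hard disk",
    ".vhdx": "Microsoft Hyper-V virtual hard disk",
    ".vdi": "VirtualBox virtual disk",
}


def classify_vm_file(name: str) -> Optional[str]:
    """Return a description if the file name's extension suggests a VM file, else None."""
    return VM_EXTENSIONS.get(PurePath(name).suffix.lower())


def find_vm_files(root: str) -> List[Tuple[Path, str]]:
    """Walk a (mounted, read-only) directory tree and list candidate VM files."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            description = classify_vm_file(path.name)
            if description is not None:
                hits.append((path, description))
    return hits
```

Extension matching is easy to defeat (a renamed file slips through), so an examiner would confirm candidates by file signature; any VM files found would then be mounted or expanded before keyword searching.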
Your ability to overcome these issues in any case involving virtualization depends on how well you do the following:
1. Understand what virtualization is and how it is used in corporate environments.
2. Ask the right virtualization questions when meeting with your client.
3. Effectively communicate what you need from your client and e-discovery vendor.
“It is not sufficient to notify all employees of a litigation hold and expect that the party will then retain and produce all relevant information. Counsel must take affirmative steps to monitor compliance so that all sources of discoverable information are identified and searched.” Zubulake v. UBS Warburg LLC, No. 02 Civ. 1243 (SAS), 2004 U.S. Dist. LEXIS 13574 (S.D.N.Y. July 20, 2004).
The Zubulake opinions hold attorneys accountable for understanding all the intricacies of ESI involved in legal holds and collections. Understanding how companies use new technologies, the hottest of which right now is virtualization, is the first step in ensuring proper compliance and collections.
Before I define virtualization and explain its different uses, note that there are many types of virtualization out there (including a few that are not covered here). Unfortunately, when people refer to virtualization technologies, they often use different terms to talk about the same thing. I will try to use the terms that most precisely describe the technology while still helping you learn the industry buzzwords. Although this technology may seem complicated, the ideas and uses behind it are very simple.
Virtualization, broadly, is the abstraction of computers from the hardware on which they run. In many cases, this means sliding a software layer between an operating system and its hardware. This layer (called a hypervisor or virtual machine monitor) acts as a middleman, managing access to hardware. Because this middleman has the ability to allocate hardware resources to the computers running on it (called virtual machines or VMs), new technological opportunities arise. Here are just a few examples of what virtualization can help accomplish:
- move entire computers (virtual machines) from one physical piece of hardware to another;
- reboot machines without actually shutting down the hardware;
- share physical pieces of hardware with other computers. See Figure 1, below.
Figure 1. Basic virtual machine architecture; three virtual computers “sharing” one physical computer.
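To make the “middleman” idea concrete, here is a deliberately simplified toy model of the arrangement in Figure 1: one physical host whose hypervisor hands out memory to several virtual machines. The class and method names are hypothetical and do not correspond to any vendor’s API; real hypervisors manage CPU, storage and network as well as memory.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Host:
    """One physical server running a hypervisor (hypothetical toy model, not a vendor API)."""
    name: str
    memory_gb: int
    vms: Dict[str, int] = field(default_factory=dict)  # VM name -> GB of memory granted

    def free_memory(self) -> int:
        # Memory not yet handed out to any virtual machine.
        return self.memory_gb - sum(self.vms.values())

    def start_vm(self, vm_name: str, memory_gb: int) -> None:
        # The hypervisor is the middleman: a VM gets only what the hardware can spare.
        if memory_gb > self.free_memory():
            raise ValueError(f"{self.name} cannot give {memory_gb} GB to {vm_name}")
        self.vms[vm_name] = memory_gb
```

Starting a mail server, a file server and a Web server on one 32 GB `Host` leaves `len(host.vms)` at 3: counting boxes yields one machine, while a collection must account for three computers.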
Now that you have a general idea of what virtualization is, let’s look a little more closely at the different types of virtualization and how they are used.
Server virtualization is what the current buzz is all about, and it probably sees more corporate use today than any other virtualization technology. This type of virtualization involves consolidating multiple servers onto a single piece of server hardware. The main reason for doing this is that most server hardware is not being used to its full potential: today’s server hardware is far more powerful than most server software requires. By putting multiple operating systems and services on one physical machine, companies can save money on server hardware and conserve space and energy.
Desktop virtualization encompasses two different types of virtualization. The first type, known as virtual desktop infrastructure (VDI), involves “delivering” desktop operating systems (such as Windows XP) over a network to your computer. Instead of the operating system being stored on your personal computer, it’s stored as a VM on a server, where most of the actual “computing” is done. Your computer (which, depending on the setup, may be called a “thin” or “dumb” client because of its lack of computing power) merely communicates what needs to be done to the server, which does the actual work. You might also hear this sort of idea referred to as server-based computing (SBC). SBC is similar to VDI, but rather than giving each user a separate virtual machine, it typically has many users share sessions on a single server operating system.
VDI users do not notice a significant difference in how their operating system is running — it is as if they were running a standard desktop or laptop computer. VDI saves money on unnecessarily powerful hardware at the desktop level, and saves time because properly implemented virtualized desktops are easier to create, maintain and centrally manage.
The second type of virtualization encompassed by the term “desktop virtualization” doesn’t really have a universal name; it’s best defined by an example. Let’s say, for instance, that you’re running Windows Vista on your laptop, but you also need to run Windows XP. Traditionally, you’d need to do one of two things: either create a “dual-boot” system (a computer that, on startup, lets you select which operating system to run until the next boot-up sequence), or use two physically separate systems. With virtualization, however, you can simply run one operating system within the other. You can, for instance, run XP within Vista much as you’d run a regular program within Vista. This way, you can run both operating systems (the XP “guest” OS and the Vista “host” OS) at the same time without buying a second physical computer. Whether the goal is to test code, to see how an update or virus affects an operating system, or simply to give a user the functionality of multiple operating systems, this “guest-host OS virtualization” can save a great deal of time (and hardware).
Storage virtualization occurs when a computer is abstracted from the medium on which it stores data. A group of networked hard drives can be combined or “pooled,” for instance, and space on that group of hard drives doled out to different users as needed. Virtualized storage can also be moved whenever necessary without disrupting users’ ability to access their data. The reasons for using this type of virtualization technology include better utilization of storage hardware, better backup abilities and more centralized management of data.
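Pooling has a practical consequence for collection: a user’s “drive” may not correspond to any single physical disk. The toy model below is a hypothetical sketch, assuming a pool that fills physical drives in order. It allocates space to users and reports which physical drives hold each user’s data. Real storage virtualization products use far more sophisticated placement, striping and migration.

```python
from typing import Dict, List, Tuple


class StoragePool:
    """Toy model: several physical drives presented as one pool (illustration only)."""

    def __init__(self, drives: Dict[str, int]) -> None:
        self.free = dict(drives)  # physical drive name -> free GB
        self.extents: Dict[str, List[Tuple[str, int]]] = {}  # user -> [(drive, GB)]

    def allocate(self, user: str, size_gb: int) -> None:
        """Give a user space from the pool, spilling across drives as needed."""
        if size_gb > sum(self.free.values()):
            raise ValueError(f"pool cannot satisfy {size_gb} GB for {user}")
        remaining = size_gb
        for drive in self.free:
            take = min(remaining, self.free[drive])
            if take:
                self.free[drive] -= take
                self.extents.setdefault(user, []).append((drive, take))
                remaining -= take
            if remaining == 0:
                break

    def drives_holding(self, user: str) -> List[str]:
        """Physical drives that contain at least part of a user's allocation."""
        return sorted({drive for drive, _ in self.extents.get(user, [])})
```

In a pool of three 100 GB drives, allocating 80 GB to one user and then 50 GB to another leaves the second user’s data split across `disk1` and `disk2`, so pulling a single physical disk would capture only part of it.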
ASKING THE RIGHT QUESTIONS
As in any collection, communicating with your client to get the “big picture” is very important. Getting an overview of how and where they store their ESI will cut down on both the money you spend keeping your EDD vendor on site and the time you spend worrying about where data is located and whether you’ve collected everything.
Now that you’ve got an overview of the common virtualization technologies, here’s a list of questions for you to bring to your next collection to help accomplish these goals.
Does the organization use any type of virtualization?
Start the “virtualization conversation” with this simple question. If they do use virtualization, get ready to take a few notes.
How does the organization use virtualization?
Get a basic understanding of how they’re using virtualization. You don’t need to know all the specifics, but knowing the basics of the common types of virtualization will help you understand what you are dealing with. With this knowledge, you can begin to see whether virtualization will have an effect on your collection.
If the client is using server virtualization, which servers are virtualized?
Finding out which servers are virtualized will help your EDD vendor when they’re speaking with your client for the first time. It can also obviate the need for further questions. If you’re only interested in the e-mail server, for instance, and they’ve only virtualized their file and Web server, you’ve just saved yourself from having to worry about server virtualization altogether.
If they are using VDI, which employees or groups use virtual machines? Where is user data stored?
While many companies are making the move to implement different types of virtualization within their organization, very few are completely virtualized. Your client, for instance, may have only a few employee groups using VDI. This could save you time if the only virtualized computers belong to a small group of employees who are not relevant to the case.
It is also important to find out where user data for their virtualized desktops is stored. Often, especially when less-powerful computers access virtualized desktops stored on servers, user data may be stored not locally but on a remote server or storage device.
If the client is using guest-host OS virtualization, which employee groups use VMs? For what purpose do they use them? Where is user data stored?
As with other types of virtualization, note how limited the use of guest-host virtualization is and whether that use affects your collection. It may be just the software engineers using this type of virtualization. If they are not of interest, that’s another item you can remove from your list of things to think about.
After finding out the purpose for which these types of VMs are used, you can begin to determine if the user data stored on these VMs is worth collecting. If users are solely testing code in the virtual environment, for instance, and you are interested only in Web surfing or e-mail habits, these machines may not be relevant.
Just as with other forms of virtualization (and general computing, for that matter), data may not be stored in the first place you’d think. Find out whether user data for the VM is stored within the guest OS, the host OS or on remote storage.
Also keep in mind that guest-host desktop operating system virtualization makes it relatively easy to conceal or destroy user data and evidence of user habits. If users have full control over the VMs on their computers, deleting a virtual machine’s container file (the file, or small set of files, that holds all of the VM’s information and possibly user data) may destroy the user-created files and traces of user habits it contained. If the data is stored within the VM, everything the user did on it (e.g., Internet use, document editing) will be lost. There will be no obvious signs that anything is missing, since the host operating system will continue to boot normally. Traces of the virtual machine may still be evident to computer forensic examiners, but obvious evidence pointing to a mass deletion of files may not exist.
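Because deleting a single container file can erase an entire VM, documenting that file early is one way to show it existed and what it contained at a given time. The sketch below records a file’s size and SHA-256 hash. It is a minimal illustration, assuming read access to the file; examiners would normally hash forensic images with dedicated tools rather than run ad hoc scripts on a live system, and the function names here are hypothetical.

```python
import hashlib
import os
from datetime import datetime, timezone
from typing import Dict, Iterable

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so memory use stays small for multi-GB images


def sha256_of_chunks(chunks: Iterable[bytes]) -> str:
    """Feed a stream of byte chunks into SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def fingerprint_vm_file(path: str) -> Dict[str, object]:
    """Record the size, SHA-256 hash and time of hashing for one VM container file."""
    with open(path, "rb") as f:
        # iter() with a b"" sentinel keeps calling f.read() until end of file.
        digest = sha256_of_chunks(iter(lambda: f.read(CHUNK_SIZE), b""))
    return {
        "path": path,
        "size_bytes": os.path.getsize(path),
        "sha256": digest,
        "hashed_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Hashing in chunks produces the same digest as hashing the whole file at once, so the approach works for multi-gigabyte virtual disks without loading them into memory.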
Finally, just as with any other case, you’ll want to find out where data created within the virtual machine is stored so that it can be properly collected. This data is usually stored on the same hard drive on which the VM is located, but it can be stored elsewhere.
What virtualization software does the organization use?
Common virtualization vendors include VMware, Microsoft and Citrix. Each vendor has a multitude of different product offerings, so be sure to ask which specific product they’re using and for what purpose, if possible.
This is useful information for your EDD vendor for planning purposes; take down as much as you can and pass it along to the vendor with the other information you’ve gathered. Aside from assisting you with your collection, this information can also help you decide which EDD vendor to hire.
EFFECTIVELY COMMUNICATING WITH YOUR CLIENT AND EDD VENDOR
When you work with your client’s IT staff to determine what data needs to be collected, they can help pin down exactly what it is that you want. Often, however, IT staff may not believe that you need to be told about certain systems. This omission can occur for a number of reasons, some malicious, others not. While on site, I routinely stumble across computer systems that IT failed to mention to legal counsel because they did not regard those systems as “important.” Once informed about the systems, however, attorneys often have a different view of what is important.
Another motive for IT’s reticence is a desire not to have their equipment “messed with,” especially critical systems that must have zero downtime, or those they’ve spent hours tweaking to get just right. The bottom line is that if you don’t ask questions about the technology being used, your IT guide may not mention it. Your case depends on how well you communicate your collection and preservation needs. Keep in mind that a good EDD vendor can help you translate legalese to “tech” if you are experiencing a disconnect.
Also ensure that your client (and your client’s IT department) understands the implications of a legal hold, if applicable. Regarding virtual machines specifically, there are many possibilities for inadvertent or malicious destruction of ESI. Virtual machines can be rolled back (reverted to a previously saved version, or snapshot, of that machine), which may result in the loss of all changes made to the machine since the date of the previous snapshot. Note that snapshots are not exclusive to virtualization and that many technologies and IT departments use them — so keep the concept in mind when performing all collections.
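The rollback risk is easy to see in a toy model. The sketch below treats a VM’s disk as a simple mapping of file names to contents and shows that reverting to a snapshot silently discards every file created or changed after it. This is an illustrative assumption only: real snapshots operate at the disk-block level, and the class is hypothetical rather than any product’s API.

```python
import copy
from typing import Dict, List


class VirtualMachine:
    """Toy VM whose 'disk' is a dict of file name -> contents (illustration only)."""

    def __init__(self) -> None:
        self.disk: Dict[str, str] = {}
        self.snapshots: Dict[str, Dict[str, str]] = {}

    def write(self, filename: str, contents: str) -> None:
        self.disk[filename] = contents

    def take_snapshot(self, label: str) -> None:
        # Freeze a copy of the disk as it exists right now.
        self.snapshots[label] = copy.deepcopy(self.disk)

    def revert(self, label: str) -> List[str]:
        """Roll back to a snapshot and return the files whose newer versions were lost."""
        saved = self.snapshots[label]
        # Anything created or changed after the snapshot differs from the saved copy.
        lost = sorted(name for name, data in self.disk.items() if saved.get(name) != data)
        self.disk = copy.deepcopy(saved)
        return lost
```

If a memo is edited and new mail arrives after a January snapshot, reverting to that snapshot returns the memo to its January draft and removes the new mail entirely, with nothing on the restored machine to show that either change ever happened.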
Virtual machines can also be moved from one physical system to another with relative ease. See Figure 2, below. This can result in VMs (and data) being misplaced, compromised or accidentally destroyed. You must ensure that the organization understands what ESI needs to be preserved and the consequences of failing to preserve it.
Figure 2. Moving a virtual machine.
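Because VMs can move between physical hosts, a preservation plan should be able to answer “where was this machine on a given date?” The sketch below is a hypothetical inventory log that records every registration and move so that question can be answered later. The class and field names are assumptions made for illustration; virtualization management software may keep its own task and event records, which your EDD vendor can advise on preserving.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class VMInventory:
    """Hypothetical log of which physical host each VM lives on, and when it moved."""
    location: Dict[str, str] = field(default_factory=dict)
    history: List[Tuple[datetime, str, Optional[str], str]] = field(default_factory=list)

    def register(self, vm: str, host: str, when: datetime) -> None:
        self.location[vm] = host
        self.history.append((when, vm, None, host))

    def move(self, vm: str, new_host: str, when: datetime) -> None:
        # Record where the VM came from as well as where it went.
        old_host = self.location[vm]
        self.location[vm] = new_host
        self.history.append((when, vm, old_host, new_host))

    def host_at(self, vm: str, when: datetime) -> Optional[str]:
        """Return the host the VM was on at a given moment, according to the log."""
        host = None
        for ts, name, _from, to_host in sorted(self.history, key=lambda entry: entry[0]):
            if name == vm and ts <= when:
                host = to_host
        return host
```

For example, a mail VM registered on `host-a` on January 1 and moved to `host-b` on March 1 is reported on `host-a` for any date in February, and on no host before it was registered.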
Remember that a competent EDD vendor will do almost all of the technical legwork involved in the actual collection of data. The vendor can also answer any questions you have about the client’s technology and how it affects ESI. Attorneys are not expected to be computer experts. Your focus should be on deciding which systems should be marked for collection, ensuring that your client understands its preservation responsibilities and passing along to the EDD vendor the basic information, outlined above, that you’ve gathered from your client.
The time it takes you to ask the right questions on virtualization and get an overview of system use will be worth it. Understanding these new technologies allows you to provide a value-added service, improves the chances of a positive case outcome and ensures the most complete ESI collection possible in a virtualized environment.
Jason Briody is an associate at Jones Dykstra & Associates, a Maryland-based consulting firm. Jones Dykstra & Associates specializes in e-discovery, computer forensics, expert witness testimony and computer intrusion response services. Jason graduated summa cum laude from Champlain College in 2008, where he received a B.S. in Computer & Digital Forensics.