Sometimes people want to compare the degree of clustering or spread between two sets of numbers that have completely different units: banana weights and heights of trees, say. A sophisticated (and slightly hypothetical) manager might want to know how the various sizes of all the law firms used by a law department compare in their distribution to the third-year annuity costs it pays for all its patents. How do you compare two unrelated data sets? A statistical measure called the coefficient of variation (CV) does the trick.
The CV is the ratio of the standard deviation of a set of numbers to the average of those numbers. Excel calculates both figures instantly, so all you need to do is divide the first by the second.
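For readers who work outside Excel, here is a minimal Python sketch of that division. The function name and the five lawyer counts are made up for illustration, and the sketch assumes the sample standard deviation (the one Excel's STDEV function returns); using the population version instead would change the result slightly.

```python
import statistics

def coefficient_of_variation(values):
    """Return the sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        # The ratio is undefined when the average is zero.
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Hypothetical lawyer counts for five departments (illustrative only).
lawyer_counts = [2, 4, 4, 6, 9]
cv = coefficient_of_variation(lawyer_counts)
print(round(cv, 2))
```

The guard against a zero average matters because the CV only makes sense for data whose mean is not zero.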
For example, the 549 law departments that have so far provided data for the General Counsel Metrics benchmark survey total 14,951 lawyers, with an average of 27.3 lawyers per department and a standard deviation of 84. If the counts were normally distributed, about 68 percent of the departments would fall within one standard deviation on either side of that average. (Bear in mind that the lowest number of lawyers in a department is zero, but the largest departments are near 1,000, so the distribution skews far to the right and that rule of thumb is only a rough guide here.)
When you divide the standard deviation by the average, you have the coefficient of variation: 3.1. The same calculation for paralegals and for all other legal staff gives 2.3 and 2.5, respectively. The higher the CV, the greater the dispersion in the variable relative to its average, so by this measure lawyer counts are about one third more dispersed than paralegal counts. But how does the lawyer CV compare with the CV for inside spending, which is measured in dollars (and lots of them), not people?
The CV for inside spending by these law departments is 2.5. Thus, even though the units and absolute amounts vary hugely (the average inside spend is $7.7 million), the CV tells us that the degree of dispersion in spending is quite similar to that of the staffing figures.
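The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch using the rounded numbers quoted in this post; the variable names are illustrative, and the standard deviation of inside spending is not reported here, so the last line merely backs out the value implied by the rounded CV and average.

```python
# Figures quoted in this post (all rounded).
lawyer_sd = 84.0
lawyer_mean = 27.3
paralegal_cv = 2.3
spend_cv = 2.5
spend_mean_musd = 7.7  # average inside spend, in millions of dollars

# CV for lawyers: standard deviation divided by the average.
lawyer_cv = lawyer_sd / lawyer_mean
print(round(lawyer_cv, 1))  # 3.1

# How much larger the lawyer CV is than the paralegal CV.
ratio = lawyer_cv / paralegal_cv
print(round(ratio, 2))  # 1.34

# Implied (not reported) standard deviation of inside spending,
# in millions of dollars: CV times the average.
implied_spend_sd_musd = spend_cv * spend_mean_musd
print(round(implied_spend_sd_musd, 2))  # 19.25
```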
The standard deviations of two variables both measure dispersion, but they cannot be compared meaningfully to decide which variable is more dispersed, because the variables may differ greatly in their units and in the averages around which their values spread. The standard deviation and the mean of a variable are expressed in the same units, however, so taking the ratio of the two cancels the units.
The CV is useful because the standard deviation of data must always be understood in the context of the average value (mean) of the data. In the example, the standard deviations are expressed in numbers of lawyers, paralegals or other legal staff on the one hand and in dollars of inside spending on the other.
The CV is independent of the unit in which the measurement has been taken, so it is what is called a dimensionless number. For comparison between data sets with different units or widely different means, it is better to use the coefficient of variation instead of the standard deviation.
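A quick way to see that unit independence is to convert a data set from one unit to another and recompute. The sketch below uses made-up tree heights and a hypothetical helper named cv. It also shows one caveat: adding a constant to every value, unlike changing units, does change the CV, so the measure suits quantities with a true zero, such as counts, weights or dollars.

```python
import math
import statistics

def cv(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical tree heights in meters (illustrative only).
heights_m = [3.2, 4.5, 5.1, 6.8, 12.0]

# The same heights expressed in centimeters.
heights_cm = [h * 100 for h in heights_m]

# Changing units rescales the standard deviation and the mean alike,
# so their ratio stays the same.
same_cv = math.isclose(cv(heights_m), cv(heights_cm))
print(same_cv)  # True

# Shifting every value by a constant leaves the standard deviation
# unchanged but raises the mean, so the CV falls.
shifted = [h + 10 for h in heights_m]
shift_lowers_cv = cv(shifted) < cv(heights_m)
print(shift_lowers_cv)  # True
```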