When reporting recently on the battle of the citation manuals, I alluded to the difficulties of citing articles on the ever-changing World Wide Web. The manuals take the reasonable position that the writer should give the last date on which he or she saw the cited material at its Web address, the Uniform Resource Locator, or URL. After the column was published, it was suggested that there may be a better way: the Wayback Machine.

TRAVEL THROUGH TIME ON THE WEB

Astronomers, looking at light sources billions of light years away, tell us they are looking, as if through a time machine, at things that happened billions of years ago. The Internet Archive Wayback Machine is, in at least that sense, a time machine for the World Wide Web. The facility, not yet a year old, begins with the Alexa Web crawler that pulls in a Web page and stores it at a particular URL on its own site, noting the specific date upon which the page was gathered. If you later want to retrieve that page, you can, whether or not it is still on the original Web site.

If you want to see what was on President Bush’s Web site on Sept. 12, 2001, go to the Wayback Machine and enter www.whitehouse.gov into the search box. Check the dates for which the site was archived, and click on the closest subsequent date, Sept. 13, 2001, to go to the version of the site archived on that day. The archived page has at least mostly live links and has been assigned a ‘permanent’ address, web.archive.org/web/20010913035103/www.whitehouse.gov, independent of the original URL. If you ever want to cite and link to that page as it existed on that date, you can do so. Days or years later, your reader will be able to paste that URL into the address box of his Web browser and surf directly to that page and read the material that you cited.
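For readers who like to see how such an address is put together, here is a minimal Python sketch. It assumes, from the example above, that the long number in the archived address is simply the capture date and time written as year, month, day, hour, minute and second; the Internet Archive may well do things differently behind the scenes. The function names are my own, for illustration only.

```python
from datetime import datetime
from typing import Tuple

# Prefix taken from the archived White House address quoted above.
WAYBACK_PREFIX = "web.archive.org/web/"

# Assumption: the 14-digit number is YYYYMMDDHHMMSS.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def build_archive_url(original_url: str, captured: datetime) -> str:
    """Combine a capture time and an original address into an archived address."""
    return f"{WAYBACK_PREFIX}{captured.strftime(STAMP_FORMAT)}/{original_url}"


def parse_archive_url(archive_url: str) -> Tuple[datetime, str]:
    """Split an archived address into its capture time and the original address."""
    if not archive_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not an archived address of the expected form")
    rest = archive_url[len(WAYBACK_PREFIX):]
    stamp, _, original = rest.partition("/")
    return datetime.strptime(stamp, STAMP_FORMAT), original


captured, original = parse_archive_url(
    "web.archive.org/web/20010913035103/www.whitehouse.gov"
)
print(captured.date(), original)
```

Read this way, the address itself records both the original URL and the moment the page was gathered, which is exactly what a citation needs.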

Or at least that’s the theory. In practice, things may not work quite that way. First, the Web publisher that originally published the page you want to access may not wish to have the material archived. It could have been published on a paid subscription basis, and therefore not be generally accessible to the Web crawlers, or even if it was originally free, the publisher may deem the page in question to be saleable on an archive basis. (Major newspapers do a good business in charging for archive material on Lexis, for example, while giving the same stuff away on a current basis.) Or, presumably, the page in question may be embarrassing, given events occurring since its original publication. Unless the material on the page is in the public domain, the copyright owner merely has to notify the Internet Archive folks to not archive the page, or to pull the page from the archive if it has been already incorporated into the archive, and the copyright owner’s request will be honored. (The Web publisher can also place a file into its Web space that tells the Web crawler software to not pull anything on the site.) Further, the Internet Archive site itself notes that the further a Web page departs from plain HTML, the more difficult the page may be to archive correctly.
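The file that tells a crawler to stay away is conventionally named robots.txt. The sketch below, in Python, uses the standard library's robots.txt parser to show how such a file might be read. The crawler name "ia_archiver" and the example.com addresses are assumptions for illustration; check the Internet Archive's own instructions for the name its crawler actually answers to.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that shuts one crawler out of the whole site.
# "ia_archiver" is an assumed crawler name, used here only as an example.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def crawler_may_fetch(robots_text: str, user_agent: str, url: str) -> bool:
    """Report whether the given crawler is permitted to fetch the given page."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


page = "http://www.example.com/articles/old-story.html"
print(crawler_may_fetch(ROBOTS_TXT, "ia_archiver", page))   # the excluded crawler
print(crawler_may_fetch(ROBOTS_TXT, "SomeOtherBot", page))  # a crawler not named in the file
```

Note that the file is a request, not a lock: it works only because well-behaved crawlers choose to honor it.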

Or something else may be going on. The White House site, for example, has many listings for Clinton-era material that doesn’t seem to be quite there. Further, that site, like a dozen other sites I checked in a fast, very unscientific review (chosen not particularly at random), shows a last archive date of Jan. 24, 2002, and nothing since then. An Internet Archive representative told me that they get information from Alexa only every six months, and that a new data load is expected in the immediate future. Apparently, the computers and mass storage to house what may be 50 terabytes or more of data have also yet to arrive, but they are “on order.”

It is clear that the Wayback Machine can be useful; it is also clear that it is a source that is going to have to prove itself. Bookmark the URL, and if you need information it may have, give it a try. And if it turns out to have archived an article that you need, make a note of the archived URL as well as the original one. But it still is a good idea to note the date when you visited either the archive or original site, just in case.

When you visit, by the way, take a look at the listings of e-books, pre-1964 motion pictures and other material archived at the same site. It will be worth the trouble.

ALEXA

A couple of years ago I downloaded and tested an Alexa toolbar for Microsoft’s Internet Explorer that was supposed to revolutionize my research on the Web. I really don’t remember exactly what the software did, but I was underwhelmed, deleted it from my computer, and didn’t bother to mention it in a column. But while researching the Wayback Machine, I took another look at the Alexa Web site. Alexa is now “powered by Google,” and, among other things, lets the user check out details about “every site on the Web.”

I decided to test Alexa with the site of the large, Chicago-based international law firm Sidley Austin Brown & Wood, www.sidley.com. According to Alexa, the Sidley Austin site has an “Avg. Traffic Rank” of 120,127, and people who visited the site also visited the Web sites of Skadden, Arps, Slate, Meagher & Flom; McDermott, Will & Emery; Clifford Chance; Heller Ehrman White & McAuliffe; Fredrikson & Byron; Dorsey & Whitney; Cozen O’Connor; and Baker & McKenzie, to name those at the top of the list. That sounded reasonable, but I was more skeptical of Alexa’s statement that people interested in the Sidley Austin firm also bought Stone Cold Steve Austin Skateboards and a variety of Austin Powers memorabilia, and I wondered if the “other sites visited” information was real, or merely a listing of other well-known law firms.

In addition, Alexa reports daily traffic over the past year, sites that link to the site in question and user “reviews” of the site. (As of this writing, Sidley.com had no reviews.)

The information presented by Alexa is fascinating, answering such important questions as: How does your site compare with that of similar firms? Fascinating, that is, until you read the background information published on the site and realize that all of the statistics gathered refer to users of the Alexa toolbar. The same one, I would guess, that I tossed away. Is the sample of Alexa users representative of all Web users? Probably not, meaning that the “rankings” and such should be taken, if at all, with several grains of salt. But they are sort of fun to look at, and surely provide at least a rough comparative guide as to what is going on with various sites.

NEWS FROM GOOGLE

Then there’s Google, at www.google.com, and yet another new feature, a beta edition of Google’s version of the news. It seems that Google now “continually” monitors 4,000 news sources. Computer algorithms digest the “most relevant” stories, generate a headline and display a summary, plus links to what seem to be related articles in other publications, and the number of such links. Each reference states when the information was gathered: 17 hours ago, say, or five minutes ago. The site itself notes the number of minutes since the particular page was “Auto-Generated.”

If you would like to see articles on major topics — World, U.S., Business, Sci/Tech, Sports, Entertainment or Health — click on that heading. A search box lets you enter keywords, to search for articles currently in the news database. Try “court” and the state in which you practice, or simply the name of your favorite client, for potentially interesting results. The default hit list is ranked for “relevance,” but you can sort by date, if you wish, to bring the latest article to the top.

I previously used Google several times a day, as I surfed the Web. With its new News facility, Google becomes even more valuable.

SUMMARY

The Internet Archive Wayback Machine, webdev.archive.org/index.php, can let you look at a Web site as it existed at some time in the past, serving as a more or less permanent Web archive. It may not have what you need, but then again, it’s certainly a good place to find things that are no longer where they used to be.

Alexa, www.alexa.com, boasts a series of interesting facts and statistics about sites on the World Wide Web, compiled mostly from users of its downloadable Internet Explorer toolbar.

And finally, try the news option from www.google.com for the most useful compilation of current news reports that you’ve ever seen.


Barry D. Bayer practices law and writes about computers from his office in Illinois. You may send comments or questions to him at [email protected] or write c/o Law Office Technology Review, P.O. Box 2577, Homewood, IL 60430.