Three Tools for Web Research

Barry Bayer tests several of the options available to researchers who want to find information on the Internet: Alexa, which features users' rankings of other sites on the Web; Google's recently debuted news site; and the Wayback Machine, for archived site data as it existed "then." Which is the best site? It all depends on what you're looking for.

When reporting recently on the battle of the citation manuals, I alluded to the difficulties of citing articles on the ever-changing World Wide Web. The manuals take the reasonable position that the writer should indicate the last date upon which the cited material was noted by the writer at the particular logical Web address, the Uniform Resource Locator, or URL. After the column was published, it was suggested there may be a better way, the Wayback Machine.

TRAVEL THROUGH TIME ON THE WEB

Astronomers, looking at light sources billions of light years away, tell us they are looking, as if through a time machine, at things that happened billions of years ago. The Internet Archive Wayback Machine is, in at least that sense, a time machine for the World Wide Web. The facility, not yet a year old, begins with the Alexa Web crawler that pulls in a Web page and stores it at a particular URL on its own site, noting the specific date upon which the page was gathered. If you later want to retrieve that page, you can, whether or not it is still on the original Web site.

If you want to see what was on President Bush’s Web site on Sept. 12, 2001, go to the Wayback Machine and enter www.whitehouse.gov into the search box. Check the dates for which the site was archived, and click on the closest subsequent date, Sept. 13, 2001, to go to the version of the site archived on that day. The archived page has, for the most part, live links and has been assigned a ‘permanent’ address, web.archive.org/web/20010913035103/www.whitehouse.gov, independent of the original URL. If you ever want to cite and link to that page as it existed on that date, you can do so. Days or years later, your reader will be able to paste that URL into the address box of his Web browser, surf directly to that page and read the material that you cited.
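For readers who keep citation notes in a script, here is a minimal Python sketch of how an archived address like the one above can be taken apart and put back together. It assumes that the 14-digit segment is the capture time written as year, month, day, hour, minute and second. That is a reading of this one example, consistent with the Sept. 13, 2001 archive date, and not a format confirmed here. The function names are hypothetical.

```python
from datetime import datetime

# Prefix taken from the archived address quoted in the column.
ARCHIVE_PREFIX = "web.archive.org/web/"

# Assumption: the 14 digits read as YYYYMMDDhhmmss.
STAMP_FORMAT = "%Y%m%d%H%M%S"


def build_archive_url(captured, original):
    """Join a capture time and an original address into an archive address."""
    return ARCHIVE_PREFIX + captured.strftime(STAMP_FORMAT) + "/" + original


def parse_archive_url(url):
    """Split an archive address into (capture time, original address)."""
    # Drop a leading scheme if the reader pasted one.
    for scheme in ("http://", "https://"):
        if url.startswith(scheme):
            url = url[len(scheme):]
    if not url.startswith(ARCHIVE_PREFIX):
        raise ValueError("not an archive address: " + url)
    stamp, _, original = url[len(ARCHIVE_PREFIX):].partition("/")
    return datetime.strptime(stamp, STAMP_FORMAT), original


captured, original = parse_archive_url(
    "web.archive.org/web/20010913035103/www.whitehouse.gov"
)
print(captured.date(), original)
```

Keeping the capture time and the original address as separate fields makes it easy to record both in a citation, along with the date of your own visit.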

Or at least that’s the theory. In practice, things may not work quite that way. First, the Web publisher that originally published the page you want to access may not wish to have the material archived. It could have been published on a paid subscription basis, and therefore not be generally accessible to the Web crawlers, or even if it was originally free, the publisher may deem the page in question to be saleable on an archive basis. (Major newspapers do a good business in charging for archive material on Lexis, for example, while giving the same stuff away on a current basis.) Or, presumably, the page in question may be embarrassing, given events occurring since its original publication. Unless the material on the page is in the public domain, the copyright owner merely has to notify the Internet Archive folks to not archive the page, or to pull the page from the archive if it has been already incorporated into the archive, and the copyright owner’s request will be honored. (The Web publisher can also place a file into its Web space that tells the Web crawler software to not pull anything on the site.) Further, the Internet Archive site itself notes that the further a Web page departs from plain HTML, the more difficult the page may be to archive correctly.
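The file a publisher places in its Web space is conventionally a plain-text robots.txt file. The short Python sketch below shows how crawler software could read such a file and decide whether it may fetch a page. The crawler name ia_archiver and the example.com address are assumptions for illustration; the column does not name either. The parser is the one in Python's standard library.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out one named crawler entirely.
# The agent name "ia_archiver" is an assumption, not taken from the column.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def crawler_may_fetch(robots_txt, agent, url):
    """Return True if the rules in robots_txt let agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


page = "http://www.example.com/articles/old-story.html"
print(crawler_may_fetch(ROBOTS_TXT, "ia_archiver", page))   # False
print(crawler_may_fetch(ROBOTS_TXT, "SomeOtherBot", page))  # True
```

The file is only a request. It works because well-behaved crawlers check it before pulling pages, not because it blocks access.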

Or something else may be going on. The White House site, for example, has many listings for Clinton-era material that doesn’t seem to be quite there. Further, that site, like a dozen other sites I checked in a fast, very unscientific review (chosen not particularly at random), shows a last archive date of Jan. 24, 2002, and nothing since then. An Internet Archive representative told me that they get information from Alexa only every six months, and that a new data load is expected in the immediate future. Apparently, the computers and mass storage needed to house what may be 50 terabytes or more of data have also yet to arrive, but they are “on order.”

It is clear that the Wayback Machine can be useful; it is also clear that it is a source that is going to have to prove itself. Bookmark the URL, and if you need information it may have, give it a try. And if it turns out to have archived an article that you need, make a note of the archived URL as well as the original one. But it still is a good idea to note the date when you visited either the archive or original site, just in case.

When you visit, by the way, take a look at the listings of e-books, pre-1964 motion pictures and other material archived at the same site. It will be worth the trouble.

ALEXA

A couple of years ago I downloaded and tested an Alexa toolbar for Microsoft’s Internet Explorer that was supposed to revolutionize my research on the Web. I really don’t remember exactly what the software did, but I was underwhelmed, deleted it from my computer, and didn’t bother to mention it in a column. But while researching the Wayback Machine, I took another look at the Alexa Web site. Alexa is now “powered by Google,” and, among other things, lets the user check out details about “every site on the Web.”

I decided to test Alexa with the site of the large, Chicago-based international firm Sidley Austin Brown & Wood, www.sidley.com. According to Alexa, the Sidley Austin site has an “Avg. Traffic Rank” of 120,127, and people who visited the site also visited the Web sites of Skadden, Arps, Slate, Meagher & Flom; McDermott, Will & Emery; Clifford Chance; Heller Ehrman White & McAuliffe; Fredrikson & Byron; Dorsey & Whitney; Cozen O’Connor; and Baker & McKenzie, to name those at the top of the list. That sounded reasonable, but I was more skeptical of Alexa’s statement that people interested in the Sidley Austin firm also bought Stone Cold Steve Austin Skateboards and a variety of Austin Powers memorabilia, and I wondered if the “other sites visited” information was real, or merely a listing of other well-known law firms.

In addition, Alexa reports daily traffic over the past year, sites that link to the site in question and user “reviews” of the site. (As of this writing, Sidley.com had no reviews.)

The information presented by Alexa is fascinating, answering such important questions as: How does your site compare with that of similar firms? Fascinating, that is, until you read the background information published on the site and realize that all of the statistics gathered refer to users of the Alexa toolbar. The same one, I would guess, that I tossed away. Is the sample of Alexa users representative of all Web users? Probably not, meaning that the “rankings” and such should be taken, if at all, with several grains of salt. But they are sort of fun to look at, and surely provide at least a rough comparative guide as to what is going on with various sites.

NEWS FROM GOOGLE

Then there’s Google, at www.google.com, and yet another new feature, a beta edition of Google’s version of the news. It seems that Google now “continually” monitors 4,000 news sources. Computer algorithms digest the “most relevant” stories, generate a headline and display a summary, plus links to what seem to be related articles in other publications, and the number of such links. Each reference states how long ago the information was gathered: 17 hours ago, or five minutes ago. The site itself notes the number of minutes since the particular page was “Auto-Generated.”

If you would like to see articles on major topics — World, U.S., Business, Sci/Tech, Sports, Entertainment or Health — click on that heading. A search box lets you enter keywords, to search for articles currently in the news database. Try “court” and the state in which you practice, or simply the name of your favorite client, for potentially interesting results. The default hit list is ranked for “relevance,” but you can sort by date, if you wish, to bring the latest article to the top.

I previously used Google several times a day, as I surfed the Web. With its new News facility, Google becomes even more valuable.

SUMMARY

The Internet Archive Wayback Machine, webdev.archive.org/index.php, can let you look at a Web site as it existed at some time in the past, serving as a more or less permanent Web archive. It may not have what you need, but then again, it’s certainly a good place to find things that are no longer where they used to be.

Alexa, www.alexa.com, boasts a series of interesting facts and statistics about sites on the World Wide Web, compiled mostly from users of its downloadable Internet Explorer toolbar.

And finally, try the news option from www.google.com for the most useful compilation of current news reports that you’ve ever seen.

Barry D. Bayer practices law and writes about computers from his office in Illinois. You may send comments or questions to him at [email protected] or write c/o Law Office Technology Review, P.O. Box 2577, Homewood, IL 60430.
