Circle of 13

Tuesday, November 20, 2007

The Wayback Machine

The Time Machine
19 Nov 2007

In 1996, I opened Libtech, the UK's main exhibition for librarians. I had an urgent message to deliver. I'd noticed that web sites were not just appearing in growing numbers, they were disappearing, too. People were updating pages all the time and nobody was keeping any sort of record - not even the people running the sites.

Libraries had preserved much of our past on paper, but our digital history was almost blank, and could never be retrieved. Who would save it, if not librarians?

Fortunately someone else had not only thought about it, he had decided to do it. In 1996, Brewster Kahle of Alexa Internet in San Francisco started keeping snapshots of the web to create the Internet Archive. In 2001, he launched the Wayback Machine, making about 10 billion archived pages publicly available.

Even at this early stage, Kahle's data took up about 100 terabytes, making it about five times as large as a digital US Library of Congress. Today, it has 85 billion pages and takes up 1.5 petabytes.

Anytime and forever

Kahle's ultimate aim is even more ambitious. In a CNet interview in Second Life, he said: "We're out to help build the Library of Alexandria version 2, starting with humankind's published works, books, music, video, web pages, software, and make it available to everyone anywhere at anytime, and forever."

Kahle is also working to make this a global system, starting with another copy of the archive at the Library of Alexandria in Egypt.

As a result, anyone can now go to the Wayback Machine, paste in a web address, click the button that says Take Me Back, and see what the page looked like in the otherwise dim but not so distant past.

What you get is a list of the dates when a snapshot was taken, arranged by year. Clicking a link brings up a cached copy of the page. The record is necessarily incomplete, but it's enough to track things like changes in design.
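For readers who would rather script this than click through it, the listing described above can be roughly sketched in Python. This is only an illustration under stated assumptions: the 14-digit timestamp format and the `web.archive.org/web/` address scheme reflect how the archive addresses snapshots today rather than anything stated in this article, the function names are hypothetical, and the sample timestamps are made up, not real archive records.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: present-day address scheme for archived copies.
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(timestamp: str, address: str) -> str:
    """Build the address of one archived copy from a YYYYMMDDhhmmss timestamp."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{address}"


def group_by_year(timestamps):
    """Arrange snapshot timestamps by year, oldest first within each year."""
    by_year = defaultdict(list)
    for ts in sorted(timestamps):
        taken = datetime.strptime(ts, "%Y%m%d%H%M%S")
        by_year[taken.year].append(taken)
    return dict(by_year)


# Made-up timestamps for illustration only.
sample = ["19970412093000", "19980105120000", "19970101000000"]
listing = group_by_year(sample)

for year, dates in listing.items():
    print(year, [d.date().isoformat() for d in dates])
```

Counting the entries under each year gives the kind of per-year snapshot tally the site displays, and `snapshot_url` turns any one timestamp into a link to that cached copy.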

In the Guardian's case, the first page is dated a month after my talk. There are only three snapshots for 1997, and four for 1998. There are 202 for this year, so far.

History in the making

In some cases, the archive's curators have put together packages that cover historical events. These Web Collections include US elections, hurricanes Katrina and Rita, and the December 2004 Asian Tsunami.

The Wayback Machine can be useful. The web is a boulevard of broken links, and the archive is the only way you can see what a site looked like before the creator lost interest, the company went bust, or a once-popular address was taken over by spammers. But even when it doesn't have what you want, it's a wonderful place to browse.

It may be frustrating when pages or vital pictures are missing, or the Wayback Machine seems to be taking an hour off. But this is a not-for-profit organisation heroically tackling an impossible task. If Kahle had left it to publishers, librarians, corporations or even governments, it wouldn't exist at all.



