We explore the need for digital archives like the Wayback Machine, and why they're crucial to build a record of facts on the web

February 20, 2024

Digital archives: a time machine for the web

 

In the summer of 2023, the New York Times ran an article titled “Ways You Can Still Cancel Your Federal Student Loan Debt.” 

 

The article outlined six ways to cancel student debt, with the final being:

"Death

This is not something that most people would choose as a solution to their debt burden."

At least, that was the sixth way until the New York Times revised the article with a stealth edit. When you read the article today, choosing death as a solution to a debt burden has been replaced, but nothing indicates that the article was revised. The timestamp still shows the day it was originally published.

 

If not for the Internet Archive’s Wayback Machine, this discrepancy wouldn’t have been caught. The Wayback Machine is a digital archive of the internet, and as such, it captured multiple previous versions of the article.

 

The internet is constantly being revised in ways that allow history to be rewritten and a shared sense of truth to be questioned. With AI-generated disinformation, the potential to exert control over the future by rewriting the past has never been greater.

 

This week we’re exploring why digital archives are crucial to building a record of truth on an ever-changing web.

// The need for digital archives

Mark Graham, Director of the Wayback Machine, spoke with the Project Liberty Foundation and shared the key reasons why there’s an even greater need for digital archives:

  • The importance of the internet. So much of what humanity publishes and makes available lives only on the internet. Given how much time we spend online, the internet has become a central medium of human expression, history, and culture.
  • The fragile and ephemeral nature of the internet. Graham shared two stats that underscore how fragile today’s internet is:
    1. A study found that of the two million hyperlinks in New York Times articles from 1996 to 2019, 25% were broken (a phenomenon known as link rot).
    2. The Wayback Machine has fixed 20 million broken links in Wikipedia articles by replacing them with correct ones.

“The web itself is a living thing. Webpages change. They go away on quite a frequent basis. There's no backup system or version control system for the web,” Graham explained. That is, except for archives like the Wayback Machine.

// The Wayback Machine

The Wayback Machine is a “time machine for the web,” in Graham’s words. It allows users to trace the evolution (or disappearance) of a webpage over time, enabling them to establish a record of what happened on the internet.

  • For example, the Apple.com URL has been archived 539,000 times since its first archived page in October 1996.
  • The Wayback Machine has archived over 866 billion webpages in its 28-year history. Today, it archives hundreds of millions of webpages every day and has become one of the most important archives of online content in the world.

// How it works

  • The Wayback Machine “crawls” the web and downloads publicly accessible information. Webpages, documents, and data are stored with a time-stamped URL.
  • For information that’s not publicly accessible, the Internet Archive offers web archiving services through Archive-It for 1,200 organizations in 24 countries around the world (from libraries to research institutions).
  • The Wayback Machine lets everyday people help archive the internet. Anyone can go to Save Page Now to archive a webpage or article.
  • The Wayback Machine partners with 1,200 fact-checking organizations globally to help it reference material on the web that was the source of disinformation. It has built a library of more than 200,000 examples where a claim has been made, and the Wayback Machine has provided additional context on whether that claim is true (known as a review of the claim).
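For readers who want to see what a time-stamped URL looks like in practice, here is a minimal Python sketch. It only builds addresses and makes no network requests. The URL patterns (web.archive.org/web/, web.archive.org/save/, and archive.org/wayback/available) reflect our understanding of the Wayback Machine's public addresses and are not something Graham described in our conversation. The function names and the example date are hypothetical, chosen for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed public base address of the Wayback Machine.
WAYBACK_BASE = "https://web.archive.org"


def wayback_timestamp(moment: datetime) -> str:
    """Format a datetime as a 14-digit YYYYMMDDhhmmss stamp.

    Timezone-aware values are converted to UTC first; naive values
    are assumed to already be in UTC.
    """
    if moment.tzinfo is not None:
        moment = moment.astimezone(timezone.utc)
    return moment.strftime("%Y%m%d%H%M%S")


def snapshot_url(original_url: str, moment: datetime) -> str:
    """Build the time-stamped address of an archived copy of a page."""
    return f"{WAYBACK_BASE}/web/{wayback_timestamp(moment)}/{original_url}"


def save_page_now_url(original_url: str) -> str:
    """Build the Save Page Now address that requests a fresh capture."""
    return f"{WAYBACK_BASE}/save/{original_url}"


def availability_query(original_url: str, moment: datetime) -> str:
    """Build a query asking which capture is closest to a given moment."""
    params = {"url": original_url, "timestamp": wayback_timestamp(moment)}
    return "https://archive.org/wayback/available?" + urlencode(params)


if __name__ == "__main__":
    # Illustrative date only: an arbitrary day in October 1996.
    when = datetime(1996, 10, 1)
    print(snapshot_url("https://www.apple.com/", when))
    print(save_page_now_url("https://www.apple.com/"))
    print(availability_query("https://www.apple.com/", when))
```

As far as we know, the requested timestamp does not have to match a capture exactly: the Wayback Machine serves the archived copy nearest to it. That is why an approximate date is usually enough to start tracing how a page has changed.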

// Archive of facts

Fixing links, archiving webpages, and fact-checking digital articles are part of a deeper, more important project to chronicle digital history and establish a record of facts.

  • Last month, the archive of press releases from a sitting member of Congress, New York’s Elise Stefanik, vanished after she came under scrutiny. The Wayback Machine documented this erasure and provided a time-stamped record of past versions of her website and press releases.
  • In 2018, a US appeals court ruled that the Wayback Machine’s archive of webpages can be used as legitimate legal evidence.
  • The Internet Archive has countless examples of the press citing the Wayback Machine to correct disinformation and dispel rumors. In one example from last year, the Associated Press relied on the Wayback Machine to set the record straight: the CDC did not say that the polio vaccine gave millions of Americans a “cancer virus.”

With the rise of AI-generated disinformation, there’s reason to believe that such attempts at rewriting history (even if that history is just yesterday) will become more prevalent, and that the social contract that has governed web crawlers is coming to an end.

     

// A citizen-powered web

Building digital archives is a bulwark against those attempting to rewrite history and spread misinformation. An archived, time-stamped webpage is not just unimpeachable evidence; it’s a foundational building block of a shared sense of reality.

In 2014, when Malaysia Airlines Flight 17 went down over Ukraine, the Wayback Machine captured evidence that a pro-Russian group was behind the missile attack. But it wasn’t the Wayback Machine’s algorithms that captured the evidence by crawling the internet; it was an individual who found an obscure blog post from a Ukrainian separatist leader touting the shooting down of a plane. That individual judged the blog post important enough to archive, and it became a critical piece of evidence, even after the post disappeared from the internet.

As Graham said, “You don't know what you got until it's gone. If you see something, save something.”

What pages can you help archive? Archive them with the Wayback Machine on Save Page Now.

Project Liberty Foundation roles

// Project Liberty Foundation is seeking a Research and Governance Program Manager to provide cover for six months, from March 2024 to September 2024. Learn more and apply here.

Other notable headlines

// 🏛 An article in Tech Policy Press asks, can democracy survive artificial general intelligence?

// 🕵 The New York Times reported that hackers working for China, Russia, and North Korea have used OpenAI’s systems in the creation of their cyberattacks.

// 🗳 According to an article in the Wall Street Journal, a new era of AI deepfakes will complicate the 2024 elections.

// 🤖 What will happen when AI starts training itself? An article in The Atlantic explored the implications of AI training on AI.

// 📝 An article in WIRED explored what would happen if 26 words in Section 230 were removed.

// 🖥 An article in the Wall Street Journal explored how AI will lead to the end of the internet as we know it.

// 🦺 AI doesn’t have to be a job destroyer. It could help rebuild the middle class, according to an article in Noema Magazine.

// 🧑‍🤝‍🧑 It won’t be long before you know someone with an AI significant other. An article in Fast Company explored the rise of romantic chatbot apps.

// 🇪🇺 Big Tech companies signed an accord in Europe to combat AI-generated election disinformation, according to an article in Euronews.

Partner news & opportunities

// Virtual event on online governance

February 28th at 12 p.m. ET

Nathan Schneider, founder of the Media Economies Design Lab at the University of Colorado, is releasing a book, Governable Spaces: Democratic Design for Online Life. In a book launch seminar with Metagov, he will explore why governance in our everyday online spaces matters. Register here.

// Data Empowerment Fund: $50,000 & $100,000 grants available

The Data Empowerment Fund is open to proposals. The goal of the fund, which is powered by the Omidyar Network and other partners, is to support initiatives that enable greater individual agency or community control over data. Learn more and apply here.

/ Project Liberty Foundation is advancing responsible development of the internet, designed and governed for the common good. /

Thank you for reading.
