Tedium - Make Digital Preservation Easier 💾

Rethinking corporate motivations for preserving digital content.

Hunting for the end of the long tail • April 07, 2021

Today in Tedium: Look, I’m not going to tell you that Yahoo Answers was the height of cultural artifacts. But the thing is, it had value, thanks to how long it was online, the sheer number of its answers, and its public-facing nature. Sites do not stay stationary, encased in amber, however, and there is significant financial motivation for large companies to only play the hits. After all, it’s why Top 40 radio isn’t all Dishwalla, all the time. But after seeing yet another situation where a longstanding Yahoo-owned website is shutting down, I’m left to wonder whether the motivations for maintaining sites built around user-generated content simply do not favor preservation, and never will without outside influence. How can we change that motivation? Today’s Tedium, in a follow-up to the post we wrote as Yahoo Groups was getting shut down, ponders the issue from the corporate perspective. — Ernie @ Tedium

Today’s Tedium is sponsored by Refind. More from them below.

“I understand your usage of groups is different from the majority of our users, and we understand your frustration. However, the resources needed to maintain historical content from Yahoo Groups pages is cost-prohibitive, as they’re largely unused.”

That statement was sent to an archivist in 2019, as Verizon took steps to shut down the vast majority of existing Yahoo Groups, the last major element of Yahoo’s user-generated content apparatus to be dismantled; Groups met its maker a little over a year ago. It’s worth keeping in mind that at the scale Verizon operates (the company makes billions of dollars per year), the cost of continuing to host such content would have been relatively minimal, especially given the fact that, uh, it owns a big chunk of the network through which that content is distributed.


(Denny Muller/Unsplash)

The problem with corporate motivations is that they aren’t the same as the user’s, even when the user made the content

Whether Google, Verizon, Disney, Nintendo, or Sony, the corporate motivations for keeping content available online for long periods differ greatly from the motivations that drive external visitors.

Users very much expect permanence, just as they did with physical media, but in the context of online distribution, these companies have competing interests driving their decision-making that discourage them from taking steps to protect historic or vintage content.

And in the case of user-generated content, there might be outside considerations at play. Perhaps they are concerned that something within an old user agreement might come back to bite them if they leave a website online past its sell-by date, opening them up to liability. Perhaps the concern is old, outdated code that may look like a novelty on the outside but is effectively a potential attack surface in the wrong hands. After all, if they’re not keeping an eye on it, who’s to say someone can’t take advantage of that?

And then there are reasons that are a little more consumer-hostile. Nintendo recently ended sales of a bunch of old Mario content in both digital and physical form, which evokes the way Disney used to gate its home video releases in an effort to keep its old content fresh and make more money from it.

When it comes to websites, though, much of that content is user-generated, even if a technology company technically maintains it. I have to imagine there’s an expectation that a company has only a limited budget for maintenance, and limited motivation to spend it.

But on the other hand, as digital preservationist David Rosenthal has pointed out, in the grand scheme, preservation is not really all that expensive. The Internet Archive has a budget, soup to nuts, of around $20 million or less per year, around half of which goes to staff salaries. And while it doesn’t capture all of it (in part because it can’t!), it covers a significant portion of the entire internet, literally millions of websites. It has a fairly complex infrastructure, with some of its 750 servers online for as long as nine years and storage capacity in the hundreds of petabytes, but given that it is trying to store decades’ worth of digitized content, including entire websites that were long ago forgotten, that’s pretty impressive!

So the case that it costs too much to continue to simply publicly host a site that contains years of historically relevant user-generated content is bunk to me. It feels like a way of saying “we don’t want to shoulder the maintenance costs of this old machine,” as if content generated by users can be upgraded in the same way as a decade-old computer.

One thought I have is that this issue repeatedly comes up because the motivations for corporations naturally lean in favor of closure once the financial motivation has dried up. Legislation could be one way to tilt the balance back toward preservation, but it could be difficult to pass. (This was the crux of my case for trying to make the legislation for the National Register of Historic Places apply to websites.)

In my frustration about this issue last night on Twitter, I found myself arguing for legislation that balances liability in favor of preserving public-facing content. But I’m a realist: a law like that would have many moving parts and may be a tough sell. So, if we can’t encourage a law, maybe we need to build strategies that make maintaining a historic website a lighter lift.

Refind

Refind — Get a little bit smarter every day. You're eager to learn new things, but overwhelmed with too much content? Give Refind a try. Get a daily selection of links that move you forward, tailored to your interests. The best from all around the web, curated by experts and our algorithm. Sign up for free here.

2012

The year that the genealogy platform Ancestry.com launched a new site, Newspapers.com, to offer paid newspaper archives to interested parties. The company, which charges about $150 per year for access to the archive, has helped maintain access to the historical record for researchers who need it. (I’m a subscriber, and it is worth it.) With the exception of paid Usenet services like Giganews, this model has not really been tried for vintage digital-only content, which seems like a major missed opportunity for companies, like Yahoo/Verizon, that raise concerns about the financial cost of maintaining old platforms. Certainly I would prefer it to be free, but if I had to choose between paid and nonexistent, I’d pay money to access old content. Just throwing that out there.


(Ethan Hoover/Unsplash)

A middle ground: An “analog nightlight” mode for websites

In some ways, I think that part of the motivation for taking down old or outdated websites is the expectation that the internal systems must also stay online.

But I think archivists and historians would be more than happy if public-facing content—that is, content that appeared on search engines, or was a part of the main experience when logged in at a basic level—was prioritized and protected in some way, which would at least keep the information alive even if its value was limited.

There’s something of a comparison here that I’d make: When the U.S. dropped the vast majority of its analog signals in favor of digital broadcasting, it led to something called the “analog nightlight,” in which very minimal, basic information was presented on analog stations during the period before they were turned off. A TV host relayed basic information to viewers about the transition and told them what to do next. It didn’t entirely work (TV stations in smaller markets didn’t actually air the analog nightlight), but it helped give a sense of continuity as a new medium found its footing.

This approach, to me, feels like a path forward that could minimize the crushing pain of losing historic content while removing much of the risk that comes with continuing to host a site that may no longer be popular but still has value in a long-tail sense.

In the case of an “analog nightlight” equivalent for websites, the goal would essentially be to shut down any sort of attack surface through good design and planning. Before the site is taken offline in its original form, users are given the chance to download their old content or remove it from the website over a period of, say, 60 days. This is not too dissimilar to the warnings site operators currently offer when they shut down, and it looks like what Yahoo Answers is doing.
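As a rough illustration of that timeline, here’s a minimal sketch in Python, assuming a single announcement date and the 60-day window mentioned above. The phase names and the `phase_on` helper are hypothetical, not anything Yahoo or another operator actually uses, and the dates are purely illustrative.

```python
from datetime import date, timedelta
from enum import Enum


class Phase(Enum):
    LIVE = "live"              # normal operation, before any announcement
    WIND_DOWN = "wind_down"    # users may export or delete their content
    NIGHTLIGHT = "nightlight"  # read-only static archive, no logins


def phase_on(day: date, announced: date, window_days: int = 60) -> Phase:
    """Return which phase of a hypothetical shutdown plan applies on `day`."""
    deadline = announced + timedelta(days=window_days)
    if day < announced:
        return Phase.LIVE
    if day < deadline:
        return Phase.WIND_DOWN
    return Phase.NIGHTLIGHT


if __name__ == "__main__":
    announced = date(2021, 4, 5)  # illustrative date only
    for d in (date(2021, 4, 1), date(2021, 5, 1), date(2021, 6, 10)):
        print(d, phase_on(d, announced).value)
```

Keeping the window as a parameter makes it easy to give users more than 60 days if a site has an especially large or long-lived community.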

But once the deadline is hit, the site operators launch a minimal version of the original platform, with no way to log in or comment. The information is static, and there’s no directly accessible backend. That’s actually the important part of this—the site needs to be untethered from its original content-management system so no new content can be added. Instead, the content would be served up as a barebones static site (perhaps with advertising, if they roll that way), so as to minimize the “attack surface” left by a site that is not actively being maintained.
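To make the static-archive idea concrete, here’s a minimal sketch in Python that turns an exported list of posts into plain HTML files, skipping anything users asked to have removed during the wind-down window. The `Post` record, the page template, and `export_static_site` are hypothetical; a real export would start from whatever database the original platform used. Because the output is just files, any static host could serve it with no login, comment, or database layer left running.

```python
import html
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Post:
    """One piece of exported user-generated content (hypothetical schema)."""
    post_id: int
    title: str
    body: str


# A bare page: no forms, no scripts, no login links, nothing that talks to a backend.
PAGE = """<!DOCTYPE html>
<html lang="en">
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{content}
<p><em>Read-only archive. Logins and comments are disabled.</em></p>
</body>
</html>
"""


def render_post(post: Post) -> str:
    # Escape user text so nothing in an old post can inject markup or scripts.
    paragraphs = "\n".join(
        f"<p>{html.escape(chunk)}</p>"
        for chunk in post.body.split("\n\n")
        if chunk.strip()
    )
    return PAGE.format(title=html.escape(post.title), content=paragraphs)


def export_static_site(posts, removed_ids, out_dir):
    """Write one HTML file per surviving post plus an index; return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Honor removal requests made during the wind-down window.
    kept = [p for p in posts if p.post_id not in removed_ids]
    written = []
    for post in kept:
        path = out_dir / f"{post.post_id}.html"
        path.write_text(render_post(post), encoding="utf-8")
        written.append(path)
    links = "\n".join(
        f'<li><a href="{p.post_id}.html">{html.escape(p.title)}</a></li>' for p in kept
    )
    index = out_dir / "index.html"
    index.write_text(
        PAGE.format(title="Archive", content=f"<ul>\n{links}\n</ul>"), encoding="utf-8"
    )
    written.append(index)
    return written


if __name__ == "__main__":
    posts = [
        Post(1, "Why is the sky blue?", "Rayleigh scattering.\n\nAlso <b>magic</b>."),
        Post(2, "Please delete me", "This user asked for removal."),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for path in export_static_site(posts, removed_ids={2}, out_dir=tmp):
            print(path.name)
```

Escaping every piece of user text, rather than trusting the old markup, is the design choice doing the security work here: the archive can show what people wrote without executing anything they wrote.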

This reflects relatively recent best practice in the content-management space. Platforms like Netlify have gained popularity in recent years because they actively separate the form of distribution from the means of production, which minimizes security risks. This is a great approach for live production sites, and for sites that are intentionally meant to stay static, it removes one of the biggest risk factors that might discourage a content owner from continuing to maintain the work.

As far as liability concerns go, language could be included on the page allowing users to remove old content if they so choose, along the lines of the “right to be forgotten” measure of the European Union’s General Data Protection Regulation (GDPR), though that measure includes a carve-out for purposes of historical research, which an archived version of a website would presumably cover. But the thing is, sites driven by user-generated content are generally protected by Section 230 of the Communications Decency Act in the United States anyway, so liability for the content itself falls on the users who posted it.

And if, even after these steps, a company still feels uncomfortable about hosting a dead website, they should reach out to librarians and archivists to donate the collection for maintenance purposes—perhaps with a corresponding donation to said nonprofit so they can cover the hosting costs. The Internet Archive actually offers a service like this!

The one site that makes me think that a model like this could work is Gawker. The news and gossip site, which was taken offline by the combination of a lawsuit and a corporate asset sale that specifically excluded it, remains online nearly five years after its closure in a mode very similar to this. Comments are closed and not visible to end users, which is a true shame as those comments often fed into the writing. But the content—the part that was truly valuable and important—is still out there, accessible and readable, even if you can’t do anything with it other than read it.

There are no ads. It’s a shrine to a platform that a lot of people cared about, even if others found it controversial. And there’s no reason what Gawker did couldn’t work in an equivalent way for Yahoo Answers.

Look, I’ll be the first to admit that the motivation to protect publicly accessible user-generated content exists only if the owner of that content feels “nice” about it.

And even then it feels like a bit of a surprise.

Space Jam Website

It’s still online, but it moved.

Over the weekend, Warner Bros. got a little bit of flak for replacing its long-online Space Jam website, which dated back a quarter-century in its original form, with a site for the sequel. But I think what the company did was actually shockingly noble. They not only left the old site online, but made it accessible from the new one. The work done to maintain it was not perfect (I think they should do archivists a solid by putting 301 redirects on the old URLs of the vintage site, so they point to the new location), but the fact that they showed any initiative at all is incredibly impressive given what we’ve seen of corporate motivations when it comes to preservation.
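For the 301 redirects suggested above, here’s a minimal sketch in Python written as a plain WSGI app. The paths in `LEGACY_PATHS` are made up for illustration, and the `/1996` prefix is an assumption about where the vintage copy now lives; in practice, a redirect like this would more likely be a rule in the web server or CDN configuration than application code.

```python
# Made-up examples of URLs from the vintage site before it moved (not real paths).
LEGACY_PATHS = {
    "/press/index.html",
    "/games/jump.html",
}
# Assumption: the archived copy now lives under this prefix on the same host.
ARCHIVE_PREFIX = "/1996"


def redirect_app(environ, start_response):
    """Tiny WSGI app: 301-redirect known legacy paths to their archived location."""
    path = environ.get("PATH_INFO", "/")
    if path in LEGACY_PATHS:
        location = ARCHIVE_PREFIX + path
        # 301 tells browsers, search engines, and crawlers the move is permanent.
        start_response(
            "301 Moved Permanently",
            [("Location", location), ("Content-Type", "text/plain; charset=utf-8")],
        )
        return [f"Moved permanently to {location}\n".encode("utf-8")]
    # Anything else would be handled by the new site; this sketch just returns 404.
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not found\n"]


if __name__ == "__main__":
    # Exercise the app directly, without starting a server.
    def show_status(status, headers):
        print(status, dict(headers).get("Location", ""))

    redirect_app({"PATH_INFO": "/games/jump.html"}, show_status)
    redirect_app({"PATH_INFO": "/new-movie/"}, show_status)
```

To try it locally, the standard library’s `wsgiref.simple_server.make_server("", 8000, redirect_app).serve_forever()` would serve it on port 8000. The permanent status code is what matters for archivists: crawlers like the Wayback Machine can follow a 301 and connect the old URLs to the new ones.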

Honestly, part of this was the result of people associated with the website’s creation still being at the company years later and being willing to speak up for preserving it. A 2015 Rolling Stone article explains that the site was actually briefly taken down after it went viral in 2010, only for employees involved in its creation (now in leadership roles at the company) to swoop in and save it after some executive made the call to shut it down.

“If we had left the company, the site probably would not exist today,” said Andrew Stachler, one of the employees involved in the effort to save it. “It would’ve gone down for good at that time.”

But imagine if they weren’t there. We’d be telling a different story right now.

And perhaps that’s what many companies need: someone who is willing to go to bat for archiving and protecting historic content.

In the digital age, preservation is the act of doing nothing but minimal upkeep and being comfortable with that fact. As proven time and time again, companies are more than comfortable with killing services entirely rather than leaving well enough alone.

Perhaps the way to save user-generated content is by making it as painless as possible to keep the status quo.

--

So yep, another rant from me on preserving internet history. Find this one an interesting read? Share it with a pal!

And thanks again to Refind for sponsoring.


Copyright © 2015-2021 Tedium, all rights reserved.

Disclosure: From time to time, we may use affiliate links in our content—but only when it makes sense. Promise.
