Web Archiving Framework

Overview

The purpose of the web archiving program at Bowling Green State University Libraries ("UL") is to digitally capture, preserve, and provide access to websites documenting the history and culture of BGSU and other subjects falling within the collecting missions of the UL’s Special Collections units.

The UL's web archiving program focuses in particular on the following content:

  • Websites within the BGSU web domain;
  • Websites outside the BGSU domain that document the history and culture of BGSU; and
  • At-risk websites outside the BGSU domain that document other subjects within the collecting missions of the Special Collections units. "At-risk" websites can include those of time-limited interest or purpose, of unknown ownership, published by defunct organizations, no longer maintained or managed, or subject to government censorship.

Procedures

Responsibilities

The UL's Special Collections department is responsible for administering the UL's web archiving program. Designated staff members within each Special Collections unit carry out the specific web capture procedures.

Tools

The UL uses the Internet Archive's Heritrix web crawler, administered through its Preservica digital preservation software tenancy, to capture content for its web archives. The UL provides public access to its archived websites through Preservica's Universal Access module.

Notifications

To the fullest extent possible, the UL will notify website owners of its intent to capture their sites for its web archives.

The UL will obey website-specific instructions concerning archiving, whether expressed in machine-readable format using the robots exclusion standard or in reasonably discoverable human-readable text. In cases where these directives would prevent the archiving of content, the UL will seek permission from the site owners before proceeding.
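As an illustration of the machine-readable case, a robots exclusion check can be sketched with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not a description of how Heritrix or Preservica evaluates robots.txt files. The sample robots.txt content, the `example-archiver` user agent, the `example.org` URLs, and the `may_capture` helper are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_capture(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.org/about.html",
        "https://example.org/private/report.html",
    ):
        allowed = may_capture(SAMPLE_ROBOTS_TXT, "example-archiver", url)
        print(url, "->", "capture" if allowed else "seek permission first")
```

A check like this covers only the machine-readable directives; human-readable instructions on a site still need to be reviewed by staff.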

Certain categories of websites are not subject to the notification and public access embargo requirements. These include websites hosted by BGSU or the United States Federal Government, public domain sites, sites under an open copyright license, and sites governed by individual agreements. The UL may also collect websites outside these excepted categories without first notifying the site owners if the sites are deemed to be particularly at-risk. In such cases, the UL will still notify the site owners before enabling public access.
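The notification rules above can be read as a small decision procedure. The sketch below is one possible encoding, offered only as an aid to reading the policy. The names `SiteCategory`, `NotificationPlan`, and `plan_notification` are hypothetical, and deciding whether a site is "particularly at-risk" remains a staff judgment that the code simply takes as an input.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SiteCategory(Enum):
    """Hypothetical labels for the categories named in the policy."""
    BGSU_HOSTED = auto()
    US_FEDERAL_GOVERNMENT = auto()
    PUBLIC_DOMAIN = auto()
    OPEN_COPYRIGHT_LICENSE = auto()
    INDIVIDUAL_AGREEMENT = auto()
    OTHER = auto()


# Every category except OTHER is excepted from the notification requirements.
EXCEPTED_CATEGORIES = frozenset(SiteCategory) - {SiteCategory.OTHER}


@dataclass(frozen=True)
class NotificationPlan:
    notify_before_capture: bool
    notify_before_public_access: bool


def plan_notification(category: SiteCategory,
                      particularly_at_risk: bool = False) -> NotificationPlan:
    """Map a site's category and at-risk status to the notifications owed."""
    if category in EXCEPTED_CATEGORIES:
        # Excepted sites: no notification requirement applies.
        return NotificationPlan(False, False)
    if particularly_at_risk:
        # Capture may proceed first; notification must precede public access.
        return NotificationPlan(False, True)
    # Default case: notify the owner before capturing the site.
    return NotificationPlan(True, False)


if __name__ == "__main__":
    print(plan_notification(SiteCategory.PUBLIC_DOMAIN))
    print(plan_notification(SiteCategory.OTHER, particularly_at_risk=True))
    print(plan_notification(SiteCategory.OTHER))
```

The sketch treats sites under individual agreements as simply excepted; in practice the terms of each agreement would govern what notice, if any, is owed.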

Preservation

All websites captured under the UL's web archiving program are preserved in accordance with the UL's digital preservation framework. To the fullest extent possible, specific preservation procedures will comply with those outlined in the Curation Lifecycle Model, including but not limited to ingest, metadata creation, storage, preservation management, migration, access and use, and transformation.

Takedown Requests

For all websites selected for inclusion in the UL’s web archives (and outside the excepted categories noted above), the UL will provide an opportunity for site owners to opt out of inclusion. Site owners may alternatively request that public access to archived versions of their sites be disabled.

The UL acknowledges that website owners have agency over their content. If you believe the UL may have harvested your website in error, or that a site maintained in our web archives does not adequately reflect your organization, please contact the appropriate Special Collections unit.

Acknowledgements

Parts of this framework are adapted from web archiving policies and procedures at Montana State University and Stanford University.