Preserving Digital Materials
Ross Harvey and Jaye Weatherburn
ROWMAN & LITTLEFIELD
Lanham • Boulder • New York • London
Published by Rowman & Littlefield
A wholly owned subsidiary of The Rowman & Littlefield Publishing Group, Inc.
4501 Forbes Boulevard, Suite 200, Lanham, Maryland 20706
Unit A, Whitacre Mews, 26-34 Stannary Street, London SE11 4AB
Copyright © 2018 by Rowman & Littlefield
All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical
means, including information storage and retrieval systems, without written permission from the
publisher, except by a reviewer who may quote passages in a review.
British Library Cataloguing in Publication Information Available
Library of Congress Cataloging-in-Publication Data
Names: Harvey, D. R. (Douglas Ross), 1951–, author. | Weatherburn, Jaye, author.
Title: Preserving digital materials / Ross Harvey and Jaye Weatherburn.
Description: Third edition. | Lanham, Maryland : Rowman & Littlefield, 2018. | Includes bibliographical
references and index. | Description based on print version record and CIP data provided by publisher;
resource not viewed.
Identifiers: LCCN 2017047110 (print) | LCCN 2017047496 (ebook) | ISBN 9781538102985 (electronic) |
ISBN 9781538102961 (hardcover : alk. paper) | ISBN 9781538102978 (pbk. : alk. paper)
Subjects: LCSH: Digital preservation.
Classification: LCC Z701.3.C65 (ebook) | LCC Z701.3.C65 H37 2018 (print) | DDC 025.8/4—dc23
LC record available at https://lccn.loc.gov/2017047110
The paper used in this publication meets the minimum requirements of American National Standard
for Information Sciences—Permanence of Paper for Printed Library Materials, ANSI/NISO Z39.48-1992.
Printed in the United States of America
This book is dedicated to the collaborative and generous
international digital preservation community,
and to the diverse new participants
staking a welcome claim in this field.
List of Tables
Part I: Why Do We Preserve Digital Materials?
Chapter 1 Preservation in the Digital Age
What Exactly Are We Trying to Preserve?
How Long Are We Preserving Digital Materials?
Reshaping Preservation Practice
Chapter 2 The Need for Digital Preservation
Why Preserve Digital Materials?
Expanding the Pool of Stakeholders
How Much Data Have We Lost?
Current State of Awareness
Needs and Responsibilities
Part II: What Digital Materials Are We Preserving?
Chapter 3 Digital Artifacts, Digital Objects, Storage
Modes of Digital Death
Digital Storage Media
Digital Objects—More Than Digital Artifacts
Commercial or Cultural Heritage Imperatives?
Chapter 4 Selection for Preservation
Professional Practice and Selection for Preservation
Traditional Selection Criteria
Intellectual Property Rights, Context, Stakeholders, and Lifecycle Models
Developing Selection Frameworks for Digital Materials
How Much to Select?
The Ever-Changing Nature of Selection
Chapter 5 Requirements for Successful Digital Preservation
Digital Objects, Technology, and Data
The Importance of Preserving Context
The OAIS Reference Model
The Role of Metadata
Trustworthy Digital Repositories
The Essence of Digital Materials
Part III: How Do We Preserve Digital Materials?
Chapter 6 Digital Preservation Strategies I
Historical Overview of Digital Preservation Strategies
Categorizing Digital Preservation Strategies
Procedures and Guidelines
“Preserve Technology” Approaches
“Preserve Objects” Approaches
Interim Measures for Long-Term Solutions
Chapter 7 Digital Preservation Strategies II
Digital Archaeology and Digital Forensics
Standard Data Formats
Combining Strategies for Effective Digital Preservation
Chapter 8 Case Studies
Research Data Curation
Personal Digital Archiving
Preservation of New Media Art
Part IV: Collaboration and the Future
Chapter 9 Digital Preservation Initiatives
A Brief History of Digital Preservation Initiatives
Increasingly Collaborative and International
Chapter 10 The Future of Digital Preservation
What Have We Learned So Far?
Digital Preservation Is Maturing
About the Authors
Storage and Handling of Digital Storage Media
Open-Source Compared to Proprietary Software
Open and Proprietary Formats
Table 9.1 Initiatives Noted in Chapter 9 of Each Edition of Preserving Digital Materials
The first edition of Preserving Digital Materials was published at a time when there was too little
recognition and too little urgency about the threats posed to digital materials. It was immensely
welcome at that time; however, the need has changed because the good guidance provided in
the first edition and efforts from organizations like the Digital Preservation Coalition (DPC) mean
that we are closer than we were then to providing meaningful solutions for digital preservation.
That is to be celebrated. But the digital preservation challenge is not static. The root causes of
obsolescence are embedded within enduring structures of the information technology sector, so
as new technologies continue to emerge, the challenges we face grow more complicated. We
also have to recognize that solutions can become inadvertent barriers, as new jargon and acronyms mean we now too frequently speak an obscure dialect that can stand in the way of progress.
All of this means our messages have become more subtle and more complex, and thus more
difficult to communicate. And all of this means that the third edition of this book is as welcome
and as necessary as ever.
The immediate audience for this book is that brave but growing readership who take upon themselves—whether by choice, statute, or business need—the delivery of digital preservation on a
practical basis. These people are found in many different sectors and different types of organizations: global corporations, national and local memory institutions, higher education and research
institutions, broadcasters, strategic investors and funding agencies, and professional bodies. The
digital preservation community continues to grow in diversity and in geographical range, but each
segment of this growing community brings its own requirements. We run the risk of failing to
communicate effectively about the risks that have already been faced and the solutions that have
already been developed. With an emerging community, there is a significant challenge of fragmentation. This growth in numbers and diversity is welcome, but the weight of numbers does not
mean that the digital preservation challenge is getting easier. It can also have the perverse effect
of making the challenge greater.
A recent analysis of digital preservation agencies undertaken in the context of the DPC’s strategic plan gives insight into the context where this book will be most effective. The context has
many components, and some might recognize the basis of a PESTLE analysis1 as I describe the
1. Digital preservation happens in a constantly dynamic context of political volatility. The coming
years are likely to be a period of political instability in the United Kingdom in particular, but also
for DPC members around the world who may experience different but equally disruptive challenges. Governments and senior policy makers may have little regard for digital preservation, so
we must continue to advocate for its importance. Awareness raising is a constant and fluctuating
challenge (this is true inside as well as outside of large organizations). As a result, advocacy for
digital preservation should be seen as a process, not an event. So we must be ready to explain
in clear terms the value of what we do, and the impact of our actions, as a contribution to the
effectiveness of an organization.
2. Digital preservation is not immune to economic pressures. And if there is economic uncertainty,
then the opportunities to develop digital preservation solutions will be uncertain too. But if we
could only be better at explaining it, digital preservation has a good story to pass on. Global business and industry seek to refine smarter and more responsive products and services to deliver
higher value results on fixed or reduced budgets. The right data to the right people at the right
time in a format they can use—that not only sounds like digital preservation, but it’s hard to see
how anyone could dispute the return on investment.
3. A significant change to our context in recent years has been not simply dealing with the “data deluge”
but recognizing how the proliferation of sources has created the conditions for the erosion of authority
in previously trusted sources. So our communities face a new challenge to ensure that the continuously expanding digital legacy can be validated as an authentic and authoritative record.
4. Our generation is digital by default. Whereas the first edition of this book was published in
2005—in a decade that saw a transition from paper to electronic recordkeeping—the “digital
first” transition has now been completed in many sectors. Where digital is the norm, digital preservation needs to become a mainstream activity. Thus, digital preservation skills need to be embedded across entire workforces, and digital preservation actions need to be embedded within
5. The last decade has seen a growth in commercial digital preservation tools and services, delivering
great benefit in ensuring the longevity of digital resources. So whereas previously the DPC emphasized the need for solutions, we now need to ensure that solutions are tightly coupled to needs,
both now and in the future. The growth of the digital preservation community has been a significant success, but it has been at the price of fragmentation. Tool developers can find it hard to
reach a market, meaning their solutions are underdeployed and their investment underexploited;
standards developers can struggle to achieve consensus, meaning their approaches are underconsulted or ignored; and problem owners can find it hard to locate fitting solutions, increasing
the short-term risks of data loss and the long-term costs of deployment.
6. The proliferation of data and associated challenges of capture, management, access, and security are
set against fixed or falling budgets. David Rosenthal has shown that the data bubble means that
organizations will barely be able to retain the vast amounts of data they create. So, on one hand,
digital preservationists need to work smarter year on year simply to stand still, and, on the other,
a disposal strategy is empowered when preservation is assured.
7. Digital preservation happens in an increasingly sophisticated regulatory environment. The upward trends in data creation and consumption are further complicated by changing requirements in information governance, as politicians, security services, and courts adapt to emerging
expectations about data retention, privacy, and intellectual property. The publication of this
book coincides in particular with the implementation in Europe of the General Data Protection
Regulation (GDPR). There is a risk that ill-informed interpretation of this regulation could provoke
ill-judged disposal. However you view this trend, it will be a game-changer in how data are produced and retained.
8. The last decade has seen significant progress toward reduced carbon consumption. Although information and communication technology is only a small contributor to overall carbon emissions,
greener information technology based on consolidated data storage is not a threat to digital preservation, but an opportunity.
Data and systems now form a distinctive element of corporate and personal identity. In the context of burgeoning digital resources, a determined effort to identify, document, and retain data of
enduring value means that the right data are available to the right people, at the right time, in the
right format: such an effort brings efficiencies of scale and scope to corporations, agencies, and
individuals. It enables planned disposal and deletion. Digital preservation enables the consolidation of legacy systems: without it, agencies are forced to maintain and repair a profusion of redundant systems, which adds cost and reduces effectiveness. In the last decade, we have learned
that digital preservation is not simply an investment in data: it is an investment in distinctiveness,
competence, and competitiveness.
executive director, Digital Preservation Coalition
1. The mnemonic PESTLE signifies Political, Economic, Social, Technological, Legal, and Environmental.
An analysis of an environment from these perspectives is applied during planning processes to keep
track of relevant factors. For further information, see “What Is PESTLE Analysis? A Tool for Business
Analysis,” 2017, pestleanalysis.com/what-is-pestle-analysis.
Digital preservation comes with tights and a cape; it’s here for the greater good.1
The preservation of digital materials is one of the most critical issues to emerge in an increasingly
digital age. As digital ways of living, working, and playing increase in every aspect of our lives,
the need to preserve digital materials is becoming more pressing and relevant to a wide range of
people. Equally pressing is the importance of understanding why we need to preserve digital materials and understanding the best methods for doing so, especially with an increase in complex
digital materials that operate with many technical dependencies.
In this book we offer an overview of the digital preservation landscape. Digital preservation has
grown significantly since 2005, when the first edition of Preserving Digital Materials was published. The basic concepts and the practices that were developed in the 1990s are being challenged and sometimes found wanting, and new groups of people—no longer just information
professionals, but increasingly the general public—are participating in digital preservation. New
approaches are developing and being applied.
This book provides a framework for digital preservation, one that helps readers make sense of a
rapidly expanding, increasingly diverse field. We preserve some historical elements from earlier
editions to help readers appreciate how the field has evolved, and we have updated the contents
to include a wider range of national and international activities. This edition is heavily referenced
to provide a survey of the literature of this field. But we do not claim comprehensiveness with
these references: instead, we point to further sources and topics for readers to investigate. This
book can be used as a guide for teaching digital preservation and for informing practice. Our synthesis of current information, research, and perspectives about digital preservation, drawn from diverse
sources across many areas of practice, will be of interest to a wide range of information
professionals and to others interested in learning more about digital preservation.
The first edition of Preserving Digital Materials was published in 2005,2 and a second edition appeared in 2012.3 A second edition was needed because of the significant changes in the field
since 2005, as practitioners expanded their practice and researchers presented significant new
findings. The second edition had the same aims as the first: to provide an introduction to the
preservation of digital materials in order to inform practice in cultural heritage institutions, and
to provide a framework within which to reflect on digital preservation issues. It was intended for
an audience similar to that of the first edition—information professionals who sought a reference
text, practitioners who wanted to reflect on the issues, and students in the field of digital preservation. The second edition differed from the first by providing a more international perspective,
including examples from areas other than libraries and recordkeeping organizations, updating the
text to reflect activities and changes since 2005, and taking account of significant publications
since 2005. This third edition is different again:
Significant publications since 2012 are taken into account.
We have used social media (especially Twitter and blogs) to inform our thinking.
We showcase the significant international expertise available in digital preservation in case
studies and examples throughout the book.
We note the increasing importance of new stakeholders and the general public in digital
preservation: this theme runs throughout the book.
The digital preservation field is maturing, so we note the reflections that are appearing, take
stock of past achievements, and consider where the field of digital preservation is heading.
Both the library community and the recordkeeping community, as well as an increasing number
of other groups, are energetically seeking solutions to the challenges of digital preservation.
Over the last decade the outcomes of research and practice have been shared more and more.
Developments in one community have considerable potential to assist practice in other information and heritage communities. This book goes some way toward meeting this need for cross-community sharing by
providing examples from different communities.
The range of stakeholders who have an interest in maintaining digital materials for use into the
future is wide. A 2003 report begins with these words:
The need for digital preservation touches all our lives, whether we work in commercial or public sector
institutions, engage in e-commerce, participate in e-government, or use a digital camera. In all these
instances we use, trust and create e-content, and expect that this content will remain accessible to
allow us to validate claims, trace what we have done, or pass a record to future generations.4
These words remain as relevant now as when they were written fifteen years ago. No longer
does digital preservation solely involve traditional heritage keepers, the information professionals
working in the GLAM (Galleries, Libraries, Archives, and Museums) sector, in research institutions, and in public, business, and government contexts. The preservation of digital materials
increasingly concerns everyone working with and depending on digital modes of creation. Digital
materials encompass born-digital objects and also the information about them that is needed to
manage them long term, in a wide variety of contexts and situations, involving simple or complex combinations of software and hardware. Digital materials can also be digitized materials for
which there may or may not be a physical representation. The act of preserving digital materials
is a constant and enduring one and requires time, effort, and resources devoted to it, for as long
as the materials are required.
While information about digital preservation is available in print and, increasingly, on the web,
and the quantity is increasing significantly, many people do not have the time to evaluate and
synthesize it. Information about the need for digital preservation is becoming more widely
available for general audiences, often driven by social change agendas, and by new stakeholders
bringing these to the fore. Issues such as trust, authenticity, and context are highlighted in journalistic pieces, using language from the past to help paint a picture of a new digital future,
seen in titles such as “Meet the Digital Librarians Saving Social Media Posts to Protect Human
Rights.”5 As we increasingly turn to digital-only creation of important societal resources, an
inbuilt awareness of what keeping digital materials entails must become the norm. Bringing
journalists and mainstream media on board to help raise awareness will continue to increase
the public’s understanding about the importance of digital preservation and the many issues
associated with it. Books such as this one aim to be available for people to turn to when seeking
We believe there is significant value in publishing single-volume overviews and surveys of the
field of digital preservation, and in particular publishing single-authored (or in this case joint-authored) texts to provide a cohesive result that multiauthored collections can sometimes lack.
Books about aspects of digital preservation have appeared with increasing frequency since the
second edition of this book was published in 2012. However, most of them focus on specific kinds
of digital materials or are practical manuals: for example, Gillian Oliver and Ross Harvey’s Digital
Curation (2016)6 is based on the Digital Curation Centre’s Digital Curation Lifecycle Model, and
Adrian Brown’s Practical Digital Preservation (2013)7 aims to demonstrate that the practices can
be implemented in all institutions. Our book is different because it provides a conceptual framework for reflection on digital preservation, not a practically oriented how-to manual or a book
focused on a specific genre. It achieves this by taking stock of what we know about the principles,
strategies, and practices that prevail and by describing the outcomes of recent and current research. The book is structured around this framework.
The field of preserving digital materials continues to change rapidly. We write this third edition
combining two views of the field: as an established academic with over thirty years’ experience
observing the field, and as a new practitioner entering an increasingly complex arena, seeking to
learn from the history of the field and its rich research and development activities and bringing an
outsider perspective from a previous career to apply to new challenges. We hope this book goes
some way toward building a bridge from our collective experience and knowledge of the field to
the lived experience of new stakeholders and their increasing needs—including those working in
digital spaces, those capturing digital information around social change agendas, and those collecting family materials and other personal collections that have digital dependencies. Because
efforts to date have not completely succeeded in encouraging more effective digital preservation
by individuals who assemble personal collections, the institutions that will in time acquire these
collections will inherit their preservation issues—it is this situation, among others facing individuals in the digital age, that we must strive to improve.
We acknowledge that this book presents a Western view of preservation, based on material in the
English language read and understood from Anglo-American and European perspectives. We recognize that this view is not necessarily embraced by all cultures. Despite this perspective, we keep
in mind the words in Article 9 of the UNESCO Charter on the Preservation of the Digital Heritage:
The digital heritage is inherently unlimited by time, geography, culture or format. It is culture-specific,
but potentially accessible to every person in the world. Minorities may speak to majorities, the individual to a global audience. The digital heritage of all regions, countries and communities should be preserved and made accessible, creating over time a balanced and equitable representation of all peoples,
nations, cultures and languages.8
Finally, please note that unless specified otherwise, National Archives refers to the British National
1. William Kilbride, “RAG 2017: Change Is Here to Stay—What Does It Mean and What Do We Need to
Do about It?,” YouTube, posted August 7, 2017, by ExLibrisLtd, 8:42, www.youtube.com/watch?v=2Ox
2. Ross Harvey, Preserving Digital Materials (Munich: K. G. Saur, 2005).
3. Ross Harvey, Preserving Digital Materials, 2nd ed. (Berlin: De Gruyter Saur, 2012).
4. NSF-DELOS Working Group on Digital Archiving and Preservation, Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation (National Science
Foundation and the European Union, 2003), i.
5. Anthony Funnell, “Meet the Digital Librarians Saving Social Media Posts to Protect Human Rights,” ABC
News, August 29, 2017, web.archive.org/web/20170830205657/http://www.abc.net.au/news/2017
6. Gillian Oliver and Ross Harvey, Digital Curation, 2nd ed. (Chicago: ALA Neal-Schuman, 2016).
7. Adrian Brown, Practical Digital Preservation: A How-to Guide for Organizations of Any Size (London: Facet,
8. UNESCO, Charter on the Preservation of the Digital Heritage (Paris: UNESCO, 2004), portal.unesco.org/
The idea for a third edition of Preserving Digital Materials developed because of our perception
that the field of digital preservation has matured and changed significantly since the previous
edition in 2012. Charles Harmon from Rowman & Littlefield was enthusiastic about the proposal
we submitted, so we began the task of updating this book, aiming to produce a single cohesive
text that reflects on past achievements in the field and captures its many dynamic developments.
We thank Charles for his initial enthusiasm and for his assistance throughout the process.
We owe a great deal of gratitude to many people who helped us get this book together. Many
thanks go to Michelle Borzi, for tirelessly working to improve the flow and make sense of our
digital preservation terminology, providing a welcome outsider perspective; to Katherine Howard,
Gillian Oliver, and Peter Neish for reading and improving various sections for us; to Paul Wheatley,
for his erudite reflections; and to William Kilbride, for producing pithy contemplations that capture the big picture. The book is much richer for these contributions.
We thank the many authors, presenters, practitioners, researchers, bloggers, tweeters, and
conference-goers who constantly expand our understanding of digital preservation. The digital
preservation community is a vibrant, collaborative one that we are grateful to be part of. We also
acknowledge the value of mentorship and all that it can enable: this joint authorship has proven
to be a rewarding journey, and we hope that we have delivered a useful contribution to the digital
preservation community as a result.
Jaye Weatherburn gives special thanks to the powerhouse of guidance and generosity that is
Michelle Borzi; Ross Harvey, for taking a chance on an excitable neophyte; her archives women
(Annelie de Villiers, Michaela Hart, Nicola Laurent, Rachel Tropea) for the encouragement, wine,
and laughs; Gavan McCarthy, Leo Konstantelos, and Donna McRostie, for the opportunity to participate in implementing a digital preservation strategy; Michaela Hart, for the Fortitude Stash of
Baked Goods; and Buster, for all that he does.
Ross Harvey would especially like to acknowledge that the first and second editions of this book
indicate his indebtedness to many people, and these debts still remain.
This third edition of Preserving Digital Materials provides a survey of the digital preservation landscape and a conceptual framework for reflection on digital preservation. We intend it as a concise
handbook and reference for anyone who wants to learn how digital preservation works. It updates
the first and second editions (published in 2005 and 2012) to indicate the changes in this dynamic, expanding, and increasingly diverse field. The preservation of digital materials has traditionally been a domain overseen by information professionals working in the Galleries, Libraries,
Archives, and Museums (GLAM) sector, in research institutions, and in public, business, and
government contexts. They remain the primary audience for this book, but we also intend that
this survey of the field is of interest to a wider audience—to anyone who wants to participate in
digital preservation or needs to understand how preservation works in the digital world.
It is important to note what this book does not do. It is not a how-to manual. It is also not a resource
for learning how to apply the technical procedures of digital preservation. It makes little distinction
between information that is born digital and information that is digitized from physical media.
We have intentionally structured the first three parts of the book as questions, to prompt a reflective state of mind, before looking to the future in the fourth part:
Part I (Chapters 1–2): Why Do We Preserve Digital Materials?
Part II (Chapters 3–5): What Digital Materials Are We Preserving?
Part III (Chapters 6–8): How Do We Preserve Digital Materials?
Part IV (Chapters 9–10): Collaboration and the Future
In the first part of the book (Why Do We Preserve Digital Materials?) we investigate the reasons
preservation is valued, where it takes place, and why it needs to be reconceptualized in this
Chapter 1 sets the scene for the book by examining key definitions of preservation and their
relationship to ways of thinking about digital preservation. This chapter explores the effect of
digital information processes on “traditional” librarianship and recordkeeping paradigms, noting
changing preservation models in an environment that is dynamic and has many more stakeholders, often with competing interests.
Chapter 2 considers some of the reasons preservation is a strong professional imperative for
librarians, recordkeepers, scholars, and scientists, demonstrates why it is also of importance to
individuals, and indicates the extent of the preservation problem for digital materials. We also
examine why we preserve digital materials and who is responsible for preserving them. The established, traditional preservation roles of libraries and archives are giving way to contributions
from diverse new audiences and stakeholders, driven by increasing awareness of preservation
problems by the general public.
The second part (What Digital Materials Are We Preserving?), consisting of chapters 3 through
5, tackles questions about the nature of digital materials, how we decide which of them to preserve, what information about them we need to keep, and what structures and mechanisms have
developed to assist in their preservation.
Chapter 3 examines why a digital preservation problem exists and why it is difficult to preserve
digital materials: for example, storage media deteriorate because of how they are structured and manufactured, and how they are stored and handled. Technological obsolescence and its effects are noted. Preservation
storage is examined: the storage landscape has changed significantly since the second edition, so we address the increasing use of cloud computing and cloud storage.
Chapter 4 considers the selection of digital materials for preservation, addressing questions such as what selection is and why it is important. We examine the factors that need to be
weighed when selecting digital materials and the importance of preserving context. The effect
of intellectual property law on digital materials is sometimes complex, often because
of differences in legislation across jurisdictions, and we note these challenges. We consider why
stakeholder input is essential for selecting materials for long-term digital preservation, and we
comment on how much data to select and why keeping everything is not feasible.
Chapter 5 considers the attributes of digital materials we need to preserve. The OAIS Reference
Model, a key standard, and other models are examined, as are trustworthy digital repositories,
the importance of metadata, and persistent identifiers. We also explore the characteristics of
digital materials that we want to keep, such as significant properties, and the implications and
importance of this for maintaining authentic and trusted digital materials.
The third part (How Do We Preserve Digital Materials?), which includes chapters 6, 7, and 8,
examines the strategies and methods used to preserve digital materials and presents case studies of their application.
The principles, strategies, and practices applied in digital preservation are explored in chapter
6, as is the role of policies, procedures, and guidelines. The common approaches and methods, which we
categorize as “nonsolutions,” “preserve technology,” and “preserve objects,” are retained from
the previous edition, but we have updated the discussion about them to highlight the increasing
need for combining principles, strategies, and practices, particularly for complex digital objects,
which are proliferating.
Chapter 7 describes specific strategies. Noted are the main strategies applied in digital preservation: digital archaeology and digital forensics, emulation, standard data formats (file-format registries, standardizing formats, restricting the range of formats, archival file formats), and migration.