Big Data Shocks

Marta Mestrovic Deyrup, Ph.D.
Acquisitions Editor, Library and Information Technology Association, a division of
the American Library Association
The Library and Information Technology Association (LITA) Guides provide information and
guidance on topics related to cutting-edge technology for library and IT specialists.
Written by top professionals in the field of technology, the guides are sought after by
librarians wishing to learn a new skill or to become current in today’s best practices.
Each book in the series has been overseen editorially since conception by LITA and
reviewed by LITA members with special expertise in the specialty area of the book.
Established in 1966, LITA is the division of the American Library Association (ALA) that
provides its members and the library and information science community as a whole with
a forum for discussion, an environment for learning, and a program for actions on the
design, development, and implementation of automated and technological systems in the
library and information science field.

Approximately 25 LITA Guides were published by Neal-Schuman and ALA between
2007 and 2015. Rowman & Littlefield took over publication of the series beginning in late
2015. Books in the series published by Rowman & Littlefield are:
Digitizing Flat Media: Principles and Practices
The Librarian’s Introduction to Programming Languages
Library Service Design: A LITA Guide to Holistic Assessment, Insight, and Improvement
Data Visualization: A Guide to Visual Storytelling for Librarians
Mobile Technologies in Libraries: A LITA Guide
Innovative LibGuides Applications
Integrating LibGuides into Library Websites
Protecting Patron Privacy: A LITA Guide
The LITA Leadership Guide: The Librarian as Entrepreneur, Leader, and Technologist
Using Social Media to Build Library Communities: A LITA Guide
Managing Library Technology: A LITA Guide
The LITA Guide to No- or Low-Cost Technology Tools for Libraries
Big Data Shocks: An Introduction to Big Data for Librarians and Information Professionals

Big Data Shocks
An Introduction to Big Data for
Librarians and Information Professionals
Andrew Weiss

Lanham • Boulder • New York • London

Published by Rowman & Littlefield
An imprint of The Rowman & Littlefield Publishing Group, Inc.
4501 Forbes Boulevard, Suite 200, Lanham, Maryland 20706
Unit A, Whitacre Mews, 26-34 Stannary Street, London SE11 4AB
Copyright © 2018 by American Library Association
All rights reserved. No part of this book may be reproduced in any form or by any
electronic or mechanical means, including information storage and retrieval systems,
without written permission from the publisher, except by a reviewer who may quote
passages in a review.
British Library Cataloguing in Publication Information Available
Library of Congress Cataloging-in-Publication Data Available
ISBN 9781538103227 (hardback : alk. paper) | ISBN 9781538103234 (pbk. : alk. paper) | ISBN
9781538103241 (electronic)
The paper used in this publication meets the minimum requirements of American
National Standard for Information Sciences Permanence of Paper for Printed Library
Materials, ANSI/NISO Z39.48-1992.

Printed in the United States of America

To Akiko, Mia, and Cooper for their love, support, and patience






Preface: Big Data Shocks




Part I: First Shocks
1 What Is Data?
2 The Birth of Big Data
3 Approaches and Tools for Analyzing and Using Big Data: The
Application of Data in Real-Life Situations


Part II: Reality Shocks
4 Privacy, Libraries, and Big Data
5 Big Data and Corporate Overreach
6 Liberty and Justice for All: The Surveillance State in the Age of
Big Data
7 The Shock of Information Overload and Big Data


Part III: Library Shocks
8 Big Data, Libraries, and Collection Development
9 Data Management Planning Strategies for Libraries in the Age
of Big Data
10 Academic Disciplines, Their Data Needs, and How Libraries
Can Cater to Them








Part IV: Future Shocks
11 Libraries and the Culture of “Big Assessment”
12 Building the “Smart Library” of the Future




About the Author



Fig. 1.1 Detail of a page excerpted from Sidereus Nuncius (published in 1610), aka The Starry Messenger, showing Galileo’s data created to visualize his planetary observations.
Fig. 1.2 Turning a Chinese character into a code
Fig. 1.3 A diagram of the data cycle, demonstrating data as discrete objects to be used and reused over an unspecified time period.
Fig. 1.4 A sample information hierarchy (similar to Maslow’s hierarchy), with data as the rawest form of information and moving upward to arrive at wisdom.
Fig. 1.5 The new model of library, research, and data development; parallel life cycles: scholarly and information life cycles.
Fig. 1.6 Screenshot from Google’s Ngram viewer showing the prevalence of the terms goggle, google, and googol from the years AD 1800 to
Fig. 2.1 Graph depicting the capacity of information storage and growth of telecom capacity.
Fig. 2.2 Strange attractors or the “butterfly effect” demonstrating a visualization of small factors aggregating to impact later developments but in clear, predictable ways.
Fig. 2.3 “Intelligence Is in the Connections” diagram; depicts the development of the web over time as a function of improved connections between information resources and between people.
Fig. 3.1 Visualization of flu data showing four countries’ variations in flu frequency.
Fig. 3.2 Frequency of the term “Tiananmen” in both Roman letters and 天安門 in Chinese characters from 1950 to 2000, showing the impact of the Tiananmen Square incident in 1989.
Fig. 4.1 Digitized book, Meiji Kyoikushi by Yoshio Noda, showing card with patron’s name and checkout date.
Fig. 4.2 Screenshots showing the comments section in Forbes magazine online for the article “Privacy Is the New Money, Thanks to Big Data.”
Fig. 5.1 Chart showing the differences in internet use among various groups in the United States broken down by gender, race/ethnicity, age, economic class/income, education, and locale.
Fig. 5.2 Screenshot of a cookie notice appearing in the header of a web page; clear information is provided about the use of cookies and the rationale for it; people are also allowed to opt out or accept.
Fig. 7.1 Ranking system of wine, by letting users evaluate information in meaningful ways, reducing stress and anxiety about too many
Fig. 7.2 Decline of decision accuracy as a function of information load.
Fig. 8.1 Comparison of collections in HathiTrust digital library to US Census data; comparisons show a skewed representation of books (4.5 percent of collection) in Spanish compared to speakers of the language (12 percent).
Fig. 8.2 Screenshot of Google Analytics, providing the number of users, sessions, and duration of time spent at CSUN’s Institutional Repository at
Fig. 9.1 Agency-funder mandates.
Fig. 9.2 An open-access citation advantage is also visible when data sets are provided online.
Fig. 9.3 The data life cycle as divided by drivers and advocacy, and activities and stakeholders.
Fig. 10.1 Adoption of open access by STEM discipline; the black bars show the amount of first-tier open-access journals compared to all open-access journals in a discipline.
Fig. 10.2 The growth of chemical structure data from 1972 to 2016 increased from almost nothing to nearly 900,000 items.
Fig. 11.1 A typical diagram that demonstrates the concept of “closing the loop.”
Fig. 11.2 This data visualization produced by Utah State University depicts all the possible categories of student activity in one online course.
Fig. 11.3 One method of calculating the return on investment (ROI) for an academic library providing support for grant awards.
Fig. 12.1 A visualization of how “fog computing” intermediary services provide more responsive and comprehensive data gathering.

Table 2.1 “5V” criteria for defining and identifying big



Preface: Big Data Shocks

It is said that modernity stems from a state of mind, from a sense that we are
somehow different—and cut off as a result—from those who lived before us.
Modernity is more than just an accumulation of the technological advancements we incorporate into our daily lives. Instead, modernity always exists in the fragile here and now, dependent upon purely subjective perceptions and a lingering feeling of separation. It can be argued that the concept
of modernity has existed in the Western world for the past several hundred
years, since the age of the Enlightenment (Kirsch, 2016). Perhaps that is
why, despite living in this supposedly postmodern world that no longer always speaks directly to futurism or the hope of science and technology
(leaving us, it is argued, with fragmented collections of nonuniversal facts, or
in the hands of fundamentalism of all stripes), the word modernity nevertheless persists and even resists the irony of its supposed demise. So, where else
to begin a discussion of the impact of “big data” than with the concept of
modernity itself and the shock and sense of lost connection to the past that is
often associated with it?
The title of this book, Big Data Shocks, speaks to that moment when we
become aware of such changes. These shocks play out across the world in
different patterns—some in violence, some in internal change, some in decline of traditional values, and some in a reactionary adherence or even
reversion to fundamentalist values. We speak—when we speak of such
shocks—in the language of culture clash, of tradition and modernity, of new
worlds and stone ages, of utopias and dystopias, and of Luddites and futurists. These and more are inherent in the concept of change, and the shocks
they portend affect all of us.
The amount of information now generated in the digital age is mind-boggling. According to David Farrier (2016), “Humans created 5 billion gigabytes of digital information in 2003; in 2013 it took only 10 minutes to produce
the same amount of data.” It is indeed one of the great shocks of our time.
But even though the scale and rapidity of the information age have grown
beyond the human capacity to visualize them, the fundamental shock
brought on by the growth of information has always been with us.
Scholars and writers have wrestled with the problem of information overload since the written word became commonplace. Even those in our earliest
written histories and literature, as noted in Ecclesiastes, have known about
the dangers of excessive information: “And further, by these, my son, be
admonished: of making many books there is no end; and much study is a
weariness of the flesh” (Ecclesiastes 12:12). This is not an idea isolated to
religious dogma either. Advances in writing have always come with a price
that modern society seems to see as a minor issue. The problem with writing,
according to Plato, was that it discouraged memorization. The mind could no
longer be made stronger through memorization techniques common in the
preliteracy era and in our oral traditions. Instead, scholars and scribes would
come to control the facts of a republic, causing problems in a society’s
collective memory and giving rise to the erasure of facts, the revision of
experiences, and tyranny itself. No longer would collective memory be humanized. Instead, it would become externalized and abstracted from the human condition. The human in us would be exiled from the information that
resided outside of the “memory palaces” the mind had constructed.
The dilemma remains to this day. It may even be exacerbated by the
proliferation of mobile and social media technology, which promises that we
never need to remember anything so long as it is stored in a smartphone,
tablet, or other wearable device. Our own sense of memory becomes more
malleable, and facts less reliable as anchors to reality as they are infinitely
edited and altered. Additionally, the mind is able to remember what it deems
to be necessary so long as the scale is manageable, yet these days the increased rapidity and scale of the data we encounter daily all but disqualify
memorization as a viable mechanism for incorporating new information.
Indeed, information has perhaps moved beyond the human scale and may now be
useful only to machines designed to parse it.



Ultimately, as we move to an era of information that exists and functions
beyond our own capacities and rely ever more on machinery to cope, will we
lose something? Will we lose our humanity even more? Will the benefits
outweigh the drawbacks, and how will we know this? Such concepts and
essential questions will be examined in depth in each of the four main sections of this book, entitled “First Shocks,” “Reality Shocks,” “Library
Shocks,” and “Future Shocks.” In some ways, it is hoped that the reader
comes away with as many questions about the impact of big data on our lives
as answers. Since the era of big data is really just taking off, much of what is
discussed is in flux and may change by the time the book is published.
Section 1: First Shocks
The first section of this book, entitled “First Shocks,” delves into the definition of data itself, tracing how it shifted from a medieval term related to
biblical exegesis to the contemporary usage of computation-based numerical quantities of information that provide digital and computer modeling for
many disciplines. But the initial shock factor of big data begins with scale.
As a result, an outline of the transformation from data to big data will be
examined. The volume and rapidity of data generation also meet the realization that this is just the beginning: the exponential growth of information will
surely continue for decades to come. This section will also look at some of
the approaches and tools being developed to harness the growth in data—
especially through data mining and tracking—and the positive impacts that
these have on various disciplines. Finally, the section will round out the discussion by looking at how big data is utilized specifically in the wider segments
of our society, including government and politics, education, the media,
STEM (science, technology, engineering, and mathematics) fields, the humanities, and of course libraries.
Section 2: Reality Shocks
The second section, “Reality Shocks,” examines the issues of big data as they
relate to the world at large. There is a growing realization that privacy can be
easily lost, despite being a core tenet of librarianship, and our own expectations of privacy have been altered and compromised by the rise of big data.
Additionally, the source for this large amount of data is user online behavior
monitored through social media, the growing internet of things, and voluntary opt-in surveillance accepted for token benefits. This overall erosion of
privacy has led to a greater awareness of the widespread practice of political
monitoring and spying. Information overload, another issue related to the
proliferation of big data, will also be examined. As the scale of data and
information increases, we need to analyze how that scale can impact human
cognitive ability. Ultimately, it comes down to developing and preserving
methods, strategies, and resolutions necessary to ensure that privacy and
personal freedom endure even as the tools for big data are utilized for public
and private good.
Section 3: Library Shocks
The third section looks at how big data directly impacts libraries. In particular, the chapters examine the growing area of open science, public funding
mandates for open data, and the potential impact that widespread access to
this information might have. Funder mandates have the potential to positively impact societies, especially in relation to climate science, public
health, and the problems that might be solved through crowdsourcing or
crowd sharing. Data management is also important as data is generated at large scale by more researchers and students. Harnessing and
preserving it become priorities for modern libraries. The humanities have
embraced the era of big data as well. Because digital humanities and libraries have
many important overlapping areas and goals, information literacy becomes
ever more important. Finally, the section will examine a number of
library case studies related to information literacy, data management, and
assessment of student data for the sake of examining important student learning outcomes.
Section 4: Future Shocks
Finally, the fourth section speculates on what’s to come for libraries as they
adapt to the rigors and promises of the big data era. The first part will
examine how libraries can develop tools, policies, and methods for harnessing big data and, to coin a new term, “big assessment,” which would help
them track users, especially students, and monitor their progress. The chapter
will also examine the proficiencies that all current and future librarians need
to understand as big data and big assessment become more prevalent in
theory and practice. The second chapter will examine the future of the library
itself, speculating upon the feasibility of “smart libraries” and how they
would fit within the development of big-data-driven smart societies.
On a final note, although no definitive answers exist for many of the
issues raised by this book, clear trends do seem to be opening up for us. The
technology is still new, and the amount of growing data is still in its early
stages, but it is hoped that enough light will be shed on these problems to
help readers anticipate future problems and arrive at viable solutions.



Farrier, D. (2016). Deep time’s uncanny future is full of ghostly human traces. Aeon. https://
Kirsch, A. (2016). Are we really so modern? New Yorker. http://www.newyorker.com.


The author would like to acknowledge the following people and organizations for their generous cooperation in the creation of this book: Dr. Martin
Hilbert, Dr. Nova Spivak, Dr. Courtney Stewart, Dr. Elisabetta Poltrioni,
Katy McKen, Suzanna Ward at the Cambridge Structural Database, and Emily Rogers at the Franklin Institute.
Special acknowledgement also goes to the editorial staff at Rowman &
Littlefield for their help in shaping and developing the book, especially Katie
O’Brien, Darren Williams, and, “last-but-not-least,” Charles Harmon.
Among colleagues at CSUN, many thanks to Mark Stover, dean of the
Oviatt Library, for allowing me to freely pursue subjects and areas of inquiry, wherever they might lead; to Ahmed Alwan and Eric Garcia for their
discussions regarding assessment and diversity that have ultimately helped to
shape parts of this book; to Luiz Mendes for his conscientious guidance and
encouragement; and to Steve Kutay for his always interesting and thought-provoking conversations that spurred me to write about many of the topics in
this book.
Last, a special acknowledgement to Akiko, for keeping me focused on
what matters most.


Part I

First Shocks
