For libraries, archives, and museums, access to collections is a primary function, and with the advent of the digital age that access has been growing. Yet while the availability of digitally accessible objects is on the rise, one barrier remains: information, and a lot of it. The written records of the past cannot be easily read or recognized by optical character recognition software, so these items take an incredible amount of work to transcribe, to collect metadata from, and to transfer into a collection management system and online database.
Quite recently, institutions have begun to look to their user bases to assist in this laborious process. These crowdsourcing efforts have brought historic, cultural, and scientific collections to people around the world. In this post we will examine two crowdsourced projects, one successfully completed and the other ongoing and expanding, that cover the varied collections within the LAM professions. We will examine the usability of the project sites, overall participation and quality, sustainability, and how well each project fulfills its institutional mission.
The CalBug project began in 2010, with a grant from the National Science Foundation, as an effort to digitize the entomological collections of natural history museums across California. The Essig Museum at UC Berkeley took the lead, joined by eight other organizations across the state, with a goal of digitizing one million specimens and making them available online within five years. Each institution was allowed to establish its own workflow to accommodate staffing constraints. The Essig was the best equipped, with one full-time person and four part-time assistants; the group's combined output was 5,500 digitized and transcribed specimen cards every two weeks. At that pace the project would only reach about seventy-five percent completion in five years. The digitization wasn't the fly in the ointment, as they were producing twelve thousand images a week; it was the transcribing of metadata from the cards each bug was pinned to. Therefore, in 2012, the lead at the Essig Museum began to discuss a crowdsourcing effort with a group named Zooniverse that had successfully run natural history crowdsourcing projects in the UK and southeast United States. In 2013 the CalBug project became part of the Notes from Nature crowdsourcing effort, which also included a herbarium project to identify plants, Seafloor Explorer to locate various creatures in photos of the deep, and even an opportunity to search high-resolution images of space for planets! Notes from Nature encourages "citizen scientists" to participate in aiding research efforts, and students and professionals to learn more about existing collections around the country, by transcribing the specimen card fields. Once the CalBug project joined Notes from Nature, registration and participation skyrocketed, and so did the database. Approximately four thousand registered participants completed half of the one-million-item goal within the first few months. The CalBug project goal was completed a year ahead of schedule, and another batch of hundreds of thousands of images has been processed between late 2014 and now.
Specimen Sample from Essig Museum
The CalBug project was actually my first experience participating in a crowdsourced project, introduced to me by a coworker who saw it as a "productive waste of time." While I do not agree with his assessment, I am glad I participated in it for a few months. The interface for Notes from Nature is very easy to use and, most importantly, it gives you the choice of registering or just transcribing a few cards anonymously. This makes it hard to track distinct users, but ultimately 8,000 users participated in CalBug in some respect. The advantage to signing up is that as you transcribe more and more you receive badges of distinction. These badges have no real value other than making one feel accomplished, while also acting as a lure to keep a participant going; this is a wonderful plan. To determine the quality of entries, four transcriptions are done for each item and a sequence alignment algorithm helps to weed out the "gaps" that exist between the records. The CalBug site on Notes from Nature also contained a small write-up about the collection and why it was important. This, I feel, is a key characteristic, as you are adding more value to what the participant is doing. You say you have a bunch of pictures of bugs? I'll pass. Tell me that I am doing this to further education for people of all ages around the world, that researchers and scientists can use the nearly two billion natural history samples around the world to study climate and ecological change to help save the future of our planet, and I get a neat but ultimately useless badge? SIGN ME UP! The idea of added value cannot be overstated, but some in the field feel it makes a game out of our work and that the only incentive should be the work itself. This will be a key point in the other project we discuss.
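Notes from Nature's actual reconciliation code isn't described in this post, but the core idea of comparing redundant transcriptions can be sketched. The snippet below is a simplified, hypothetical stand-in: instead of a full sequence alignment, it takes a majority vote across the four transcriptions of a single field and flags disagreements for staff review (the function name and threshold are my own, purely for illustration).

```python
from collections import Counter

def reconcile(transcriptions, min_agreement=0.5):
    """Hypothetical stand-in for transcription reconciliation:
    normalize the redundant transcriptions of one field, take a
    majority vote, and flag the field when volunteers disagree."""
    normalized = [" ".join(t.split()).lower() for t in transcriptions]
    value, votes = Counter(normalized).most_common(1)[0]
    if votes / len(normalized) >= min_agreement:
        return value      # consensus reached
    return None           # no consensus -> send to manual review

# Four volunteers transcribe the same specimen-label field:
print(reconcile(["Yosemite Valley", "yosemite  valley",
                 "Yosemite Valley", "Yosemite Vally"]))  # yosemite valley
print(reconcile(["1932", "1952", "1982", "1902"]))       # None
```

A real pipeline would align the strings character by character (hence the "gaps" mentioned above) so that near-matches like a single-letter typo still contribute to the consensus; the majority vote here is just the simplest version of the same idea.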
The project itself has a goal but no specific mission statement, so I compared it to the goals of its largest participating institution, the Berkeley Natural History Museums, which "aim to lead and excel in creative, innovative ways in these areas: education and public programs, building knowledge networks for scientific research, digitization and access of museum specimens and ancillary materials, research." The CalBug project complies with all of those directives. It will also remain a viable and sustainable effort as long as it remains in the crowdsourcing partnership with Zooniverse and Notes from Nature. They are able to remain in the agreement without an active collection to work on, which gives them time to build up a batch of images or find the funding to do so. With the continued success and visibility of CalBug, I would not be surprised if this led to a completely digitized collection in the next few years.
Notes from Nature Data Entry Form
I also want to close this section with a note about Notes from Nature. It is a brilliant site that clearly explains and leads the user through the transcription process no matter the collection's subject matter. Each collection's entry page lists, along with the description of its importance, a difficulty level and the average time per record, so the user can pick a collection within their comfort zone. Currently there are three collections with over 110,000 images up, and all are over fifty percent complete.
The second project comes out of the University of Iowa Libraries and, while still ambitious, has a much smaller scope than the CalBug project. What is now known as the DIY History project began in the spring of 2011, when the Library decided to run a small crowdsourcing pilot program to digitize and transcribe its collection of Civil War diaries and letters, making them accessible in time for the then-approaching sesquicentennial. By the fall of 2012, the Civil War Diaries and Letters Transcription project had completed fifteen thousand items from the University's collection using a simple web form that the Digital Library staff had quickly programmed. The project still involved a heavy amount of staff time, as the simplicity of the web form meant that Library staff still had to check each entry for quality and then move the metadata over to CONTENTdm. However, the project was a success and warranted more funding to grow and address other parts of the University's diverse collections.
With more funding and staff time dedicated to the project, the collections staff was able to work with the University's Digital Scholarship & Publishing Studio and Information Services to construct a new interface and directly link the CMS with the completed web transcription form. They powered the new endeavor with Omeka 2 and customized Scripto plug-ins to make the display more ergonomic and navigable. Another benefit of this crowdsourcing project was the added ability to comment on and tag the Digital Library image collection. The head of Special Collections stated that the goal was to improve access while at the same time raising participation from those on campus and far removed.
The newly expanded transcription project, now called DIY History, was launched in October of 2012, and a post on Reddit entitled "TIL [Today I Learned] how to participate in history while sitting on my ass by transcribing Civil War diaries online" led to such interest that the library servers crashed and were down for a few days. While this may seem like a promising sign, there is evidence that all that traffic was general interest rather than participation. I say this because, as of now, there are only about a thousand registered users to date, closer to two thousand if you count the 900 who provided an optional email back when registration wasn't required. Also, if you want to transcribe you must register. The Digital Scholarship Librarian, Jennifer Wolfe, says that a small, dedicated user group is responsible for most of the work; one user has over 1,300 pages to their credit. This user base is responsible for quality control as well: once an item is fully completed, the same or another user can proofread it and determine the entry's validity and worthiness. As of now the project has successfully transcribed over sixty-six thousand pages of documents on a wide range of Iowa and national history. The collection that has garnered the most recent attention is the University's extensive collection of recipe books from the 18th century to the present.
The DIY History site is very easy and straightforward to use; the biggest hassle is that registration is required. Mandatory registration has been seen as a hindrance to many crowdsourcing projects and is a driving reason why the New York Public Library has kept entries anonymous on similar projects. Some see registration as a barrier to access, and I have to agree, especially considering that the University of Iowa gives no reason for mandating it other than tracking how much each user does. I was able to navigate the system quite easily, but I can see another reason why the participation numbers are low: most of the good stuff is already done. A registered user can cherry-pick which documents they want to do, which will usually be the easiest, most recent, or most interesting. It should be no surprise that the German collection and the Medieval Manuscripts are less than ten percent finished while the World War, Women's History, and Cookbooks collections are all nearly complete. However, these numbers could change as the more "popular" collections are completed and potentially less interesting items are added in the future. This is plausible because this is a very sustainable project and the University's interest in supporting it is very clear; it was started as a cheap pilot program with little support, and now it's a growing database of nearly fifty thousand items. And why shouldn't the University continue its support? The mission of the University of Iowa Libraries "advances direct engagement in learning, research, creative work, and clinical care through staff expertise and exceptional collections on our campus and worldwide." The DIY History project fulfills all of these to a "T," and its reach is evident in the fact that its largest contributor is a retired gentleman in the United Kingdom.
DIY History Data Entry Screen
These two projects cover very different topics, and naturally each has its distinct differences, but they also share much with other crowdsourced initiatives. One commonality is the dedicated user and the demographic that makes up that group. With these two projects, and others around the world like the Your Paintings Tagger in the UK, the user base tends to be over 55 years old and motivated overwhelmingly by a general interest in the subject matter (for the Your Paintings project it makes up eighty-five percent of reported motivation). This is important to note because a large debate around these crowdsourcing efforts concerns motivation and reward systems. The projects covered here each took different approaches; DIY History shares the view of the NYPL, the Your Paintings Tagger, and the majority of projects in believing that the reward is the work itself and the contribution being made. The "gamification" of crowdsourcing projects worries some in the profession, as it appears to make light or entertainment out of materials folks in the field take very seriously. It is this gravity of the information, which some professionals hold on to, that can be a serious impediment to progress. Projects like these of course need a user base to be interested, but there also need to be professionals who are willing to trust the people. The Archival Commons wrestled with this "imagined radical professional transformation" and found that the profession needs to view itself as any other profession: one that is evolving and in constant flux.
The idea of allowing people outside the profession to transcribe, and to be responsible for correctly presenting the information we are the caretakers of, can be daunting. It is true that some in the field view these crowdsourced projects as unnecessary or untrustworthy, but the reality is that they are immensely helpful. In the days of instant demand, more product/less processing, and evolving technology, crowdsourced projects will be a necessary part of the library, archive, and museum professions.
References
"2013 Award for Access." Center for Research Libraries. Spring 2013. Accessed April 11, 2016. http://www.crl.edu/focus/article/9265.

Ball Damerow, Joan. "Catching the Bug." The Berkeley Science Review. November 22, 2013. Accessed April 11, 2016. http://berkeleysciencereview.com/article/catching-the-bug/.

Ball Damerow, Joan. "Checking Notes from Nature Data." Notes from Nature. January 14, 2014. Accessed April 11, 2016. https://blog.notesfromnature.org/2014/01/14/checking-notes-from-nature-data/.

"Berkeley Natural History Museums - About." Berkeley Natural History Museums. 2016. Accessed April 11, 2016. http://bnhm.berkeley.edu/about/.

Dick_long_wigwam. "TIL How to Participate in History While Sitting on My Ass by Transcribing Civil War Diaries Online." Reddit, /r/todayilearned. 2012. Accessed April 11, 2016. https://my.reddit.com/r/todayilearned/comments/humy3/til_how_to_participate_in_history_while_sitting.

Eccles, Kathryn, and Andrew Greg. "Your Paintings Tagger: Crowdsourcing Descriptive Metadata for a National Virtual Collection." In Crowdsourcing Our Cultural Heritage, edited by Mia Ridge. New York: Routledge, 2014.

Eveleigh, Alexandra. "Crowding Out the Archivist? Locating Crowdsourcing within the Broader Landscape of Participatory Archives." In Crowdsourcing Our Cultural Heritage, edited by Mia Ridge. New York: Routledge, 2014.

Ferro, Shaunacy. "Decode Darwin's Handwriting To Help Science." Popular Science. May 24, 2013. Accessed April 11, 2016. http://www.popsci.com/science/article/2013-05/decode-darwins-handwriting-help-science?dom=PSC.

Lascarides, Michael, and Ben Vershbow. "What's on the Menu? Crowdsourcing at the New York Public Library." In Crowdsourcing Our Cultural Heritage, edited by Mia Ridge. New York: Routledge, 2014.

Lee, J. Hannah. "Researchers Enlist Public to Sort More than 1 Million Bugs Online." The Daily Californian. May 29, 2013. Accessed April 11, 2016. http://www.dailycal.org/2013/05/29/researchers-enlist-public-to-sort-over-one-million-bugs/.

Miller, Greg. "Enormous Museum Collection of Insects Needs Your Help." Wired. May 24, 2013. Accessed April 11, 2016. http://www.wired.com/2013/05/museum-collection-insects/.

Saylor, Nicole, and Jen Wolfe. "Experimenting with Strategies for Crowdsourcing Manuscript Transcription." Research Library Issues, no. 277 (December 2011): 10. Accessed April 11, 2016. http://publications.arl.org/rli277/10.

Schwartz, Meredith. "Cooking Up a Crowdsourced Digitization Project That Scales." Library Journal. October 22, 2012. Accessed April 11, 2016. http://lj.libraryjournal.com/2012/10/academic-libraries/cooking-up-a-crowdsourced-digitization-project-that-scales/#_.

"The University of Iowa Libraries Strategic Plan 2015-2018." Accessed April 11, 2016. http://www.lib.uiowa.edu/about/strategic.

Zou, Jie Jenny. "Civil War Project Shows Pros and Cons of Crowdsourcing." The Chronicle of Higher Education, Wired Campus. June 11, 2011. Accessed April 11, 2016. http://chronicle.com/blogs/wiredcampus/civil-war-project-shows-pros-and-cons-of-crowdsourcing.