Monday, April 11, 2016

CalBug and DIY History: Crowd sourced transcriptions

            For libraries, archives, and museums, providing access to collections is a primary function, and with the advent of the digital age that access has been growing.  While the availability of digitally accessible objects is on the rise, one barrier remains in the form of information - and a lot of it.  The written records of the past cannot be reliably read by optical character recognition (OCR) software, so these items take an incredible amount of work to transcribe, capture as metadata, and transfer into the collection management system and online database.  Quite recently, institutions have begun to look to their user base to assist in this laborious process.  These crowdsourcing efforts have brought historic, cultural, and scientific collections to people around the world.  In this post we will examine two crowdsourced projects - one successfully completed and the other ongoing and expanding - that cover the varied collections within the LAM profession.  We will examine the usability of the project sites, overall participation and quality, sustainability, and how well each project fulfills its institutional mission.
            The CalBug project began in 2010, with a grant from the National Science Foundation, as an effort to digitize the entomological collections of natural history museums across California.  The Essig Museum at UC Berkeley took the lead, joined by eight other organizations across the state, with a goal of digitizing one million specimens and making them available online within five years.  Each institution was allowed to establish its own workflow to accommodate staffing constraints.  The Essig was the best equipped, with one full-time staffer and four part-time assistants; the group's combined output was 5,500 digitized and transcribed specimen cards every two weeks.  At that pace the project would reach only about seventy-five percent completion in five years.  The digitization itself wasn't the fly in the ointment, as they were producing twelve thousand images a week; it was the transcription of metadata from all the cards each bug was pinned to.  Therefore, in 2012 the project lead at the Essig Museum began to discuss a crowdsourcing effort with Zooniverse, a group that had successfully run natural history crowdsourcing projects in the UK and the southeastern United States.  In 2013 the CalBug project became part of the Notes from Nature crowdsourcing effort, which also included herbarium sheets for identifying plants, Seafloor Explorer for locating various creatures in photos of the deep, and even an opportunity to scan high-resolution images of space for planets!  Notes from Nature encourages "citizen scientists" to aid research efforts, and students or professionals to learn more about existing collections around the country, by transcribing the specimen card fields.  Once the CalBug project joined Notes from Nature, registration and participation skyrocketed, and so did the database.  Approximately four thousand registered participants completed half of the one-million-item goal within the first few months.
The CalBug project goal was completed a year ahead of schedule, and the team has since processed another batch of hundreds of thousands of images between late 2014 and now.


Specimen Sample from Essig Museum

            The CalBug project was actually my first experience participating in a crowdsourced project; I was introduced to it by an employee who saw it as a "productive waste of time."  While I do not agree with his assessment, I am glad I participated for a few months.  The interface for Notes from Nature is very easy to use and, most importantly, it gives you the choice of registering or just transcribing a few cards anonymously.  This makes it hard to track distinct users, but ultimately 8,000 users participated in some respect on CalBug.  The advantage of signing up is that as you transcribe more and more, you receive badges of distinction.  These badges have no real value other than making one feel accomplished, while also acting as a lure to keep a participant going; it is a wonderful plan.  To control quality, four transcriptions are collected for each item, and a sequence alignment algorithm helps weed out the "gaps" that exist between the records.  The CalBug page on Notes from Nature also contained a small write-up about the collection and why it was important.  This, I feel, is a key characteristic, as it adds value to what the participant is doing.  You say you have a bunch of pictures of bugs? I'll pass.  Tell me that I am furthering education for people of all ages around the world, that researchers and scientists can use the nearly two billion natural history samples around the world to study climate and ecological change to help save the future of our planet, and that I get a neat but ultimately useless badge? SIGN ME UP!  The idea of added value cannot be overstated, but some in the field feel it makes a game out of our work and that the only incentive should be in the work itself.  This will be a key point in the other project we discuss.
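To make the quality-control idea concrete, here is a minimal sketch of how multiple volunteer transcriptions might be reconciled. This is not the project's actual sequence alignment algorithm; it is a simpler stand-in that takes a majority vote per field and flags fields with no clear winner for human review. The field names and values are hypothetical.

```python
from collections import Counter

def consensus(transcriptions):
    """Reconcile several volunteer transcriptions of one specimen card.

    Each transcription is a dict of field -> text. We take the majority
    answer per field; fields without a majority are flagged for review.
    (A toy stand-in for a real sequence-alignment approach.)
    """
    fields = set().union(*(t.keys() for t in transcriptions))
    result, flagged = {}, []
    for field in fields:
        votes = Counter(t.get(field, "").strip() for t in transcriptions)
        value, count = votes.most_common(1)[0]
        if count > len(transcriptions) / 2:
            result[field] = value          # clear majority wins
        else:
            flagged.append(field)          # no majority: needs human review
    return result, flagged

# Four volunteer transcriptions of the same (hypothetical) specimen label
entries = [
    {"locality": "Berkeley, CA", "date": "12 Jun 1931"},
    {"locality": "Berkeley, CA", "date": "12 Jun 1931"},
    {"locality": "Berkeley CA",  "date": "12 Jun 1931"},
    {"locality": "Berkeley, CA", "date": "12 Jan 1931"},
]
record, needs_review = consensus(entries)
```

With three of four volunteers agreeing on each field, the majority reading survives the one stray comma and the one "Jan"/"Jun" slip, which is exactly why collecting several independent transcriptions pays off.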
The project itself has a goal but no specific mission statement, so I compared it to the goals of the largest participating institution - the Berkeley Natural History Museums, which "aim to lead and excel in creative, innovative ways in these areas: education and public programs, building knowledge networks for scientific research, digitization and access of museum specimens and ancillary materials, research."  The CalBug project complies with all of those directives.  It will also remain a viable and sustainable effort as long as it stays in the crowdsourcing partnership with Zooniverse and Notes from Nature.  The partners can remain in the agreement without an active collection to work on, which gives them time to build up a batch of images or find the funding to do so.  With the continued success and visibility of CalBug, I would not be surprised if this led to a completely digitized collection in the next few years.


Notes from Nature Data Entry Form

I also want to close with a note about Notes from Nature.  This is a brilliant site that clearly explains and leads the user through the transcription process no matter the subject matter of the collection.  The entry page for each collection lists, along with the description of its importance, a difficulty level and the average time per record, so the user can pick a collection within their comfort zone.  Currently there are three collections with over 110,000 images up, and all are over fifty percent complete.
The second project comes out of the University of Iowa Libraries and, while still ambitious, has a much smaller scope than the CalBug project.  What is now known as the DIY History project began in the spring of 2011, when the library decided to run a small crowdsourcing pilot program to digitize and transcribe its collection of Civil War diaries and letters in time for the then-approaching sesquicentennial.  By the fall of 2012 the Civil War Diaries and Letters Transcription project had completed fifteen thousand items from the University's collection using a simple web form that the Digital Library staff had quickly programmed.  The project still involved a heavy amount of staff time, as the simplicity of the web form meant that library staff had to check each entry for quality and then move the metadata over to CONTENTdm.  However, the project was a success and warranted more funding to grow and address other parts of the University's diverse collections.  With more funding and dedicated staff time, the collections staff worked with the University's Digital Scholarship & Publishing Studio and Information Services to construct a new interface and directly link the CMS with the completed web transcription form.  They powered the new endeavor with Omeka 2 and customized Scripto plug-ins to make the display more ergonomic and navigable.  Another benefit of this crowdsourcing platform was the added ability to comment on and tag the Digital Library image collection.  The head of Special Collections stated that the goal was to improve access while raising participation from those on campus and far removed.



The new, expanded transcription project, now called DIY History, was launched in October of 2012, and a post on Reddit entitled "TIL [Today I Learned] how to participate in history while sitting on my ass by transcribing Civil War diaries online" led to such interest that the library servers crashed and were down for a few days.  While this may seem like a promising sign, there is evidence that most of that traffic was general interest rather than participation.  I say this because there are only about a thousand registered users to date - closer to two thousand if you count the 900 who provided an optional email back when registration wasn't required.  And if you want to transcribe, you must register.  The Digital Scholarship Librarian, Jennifer Wolfe, says that a small, dedicated user group is responsible for most of the work; one user has over 1,300 pages to their credit.  This user base is responsible for quality control as well: once an item is fully completed, the same or another user can proofread it and determine the entry's validity and worthiness.  As of now the project has successfully transcribed over sixty-six thousand pages of documents on a wide range of Iowa and national history.  The most recent collection to garner attention has been the University's extensive collection of recipe books from the 18th century to the present.
            The DIY History project is easy and very straightforward to use; the biggest hassle is that registration is required.  Mandatory registration has been seen as a hindrance to many crowdsourcing projects and is a driving reason why the New York Public Library has kept entries anonymous on similar projects.  Some see registration as a barrier to access, and I have to agree, especially considering that the University of Iowa gives no reason for mandating it other than tracking how much each user does.  I was able to navigate the system quite easily, but I can see another reason why the participation numbers are low - most of the good stuff is already done.  A registered user can cherry-pick which documents to work on, which will usually be the easiest, most recent, or most interesting.  It should be no surprise that the German collection and the medieval manuscripts are less than ten percent finished while the World War, women's history, and cookbook collections are all nearly complete.  However, these numbers could change as the more "popular" collections are completed and potentially less interesting items are added in the future.  This is plausible because the project is very sustainable and the University's interest in supporting it is clear; it started as a cheap pilot program with little support and is now a growing database of nearly fifty thousand items.  And why shouldn't the University continue its support?  The mission of the University of Iowa Libraries "advances direct engagement in learning, research, creative work, and clinical care through staff expertise and exceptional collections on our campus and worldwide."  The DIY History project fulfills all of these to a "T," and its reach is evident in the fact that its largest contributor is a retired gentleman in the United Kingdom.


DIY History Data Entry Screen

            These two projects cover very different topics, and naturally each has its distinct differences, but they also share much with other crowdsourced initiatives.  One commonality is the dedicated user and the demographic that makes up that group.  With these two projects, and others around the world like the Your Paintings Tagger in the UK, the base tends to be over 55 years old and motivated overwhelmingly by a general interest in the subject matter (for the Your Paintings project, general interest accounts for eighty-five percent of stated motivation).  This is important to note because a large debate around these crowdsourcing efforts concerns motivation and reward systems.  The projects I covered each took different approaches; DIY History shares the view of the NYPL, the Your Paintings Tagger, and the majority of projects in believing that the reward is the work itself and the contribution being made.  The "gamification" of crowdsourced projects worries some in the profession, as it appears to make light or entertainment out of materials people in the field take very seriously.  It is this gravity with which some professionals treat the information that can be a serious impediment to progress.  Projects like these of course need an interested user base, but there also need to be professionals willing to trust the people.  The Archival Commons wrestled with this "imagined radical professional transformation" and found that the profession needs to view itself more as any other profession - one that is evolving and in constant flux.
            The idea of allowing people outside the profession to transcribe, and to be responsible for correctly presenting the information we are the caretakers of, can be daunting.  It is true that some in the field view these crowdsourced projects as unnecessary or untrustworthy, but the reality is that they are immensely helpful.  In the days of instant demand, "more product, less process," and evolving technology, crowdsourced projects will be a necessary part of the library, archive, and museum professions.

Project Links


DIY History Project - https://diyhistory.lib.uiowa.edu/

References

"2013 Award for Access." Center for Research Libraries. Spring 2013. Accessed April 11, 2016. http://www.crl.edu/focus/article/9265.

Ball Damerow, Joan. "Catching the Bug." The Berkeley Science Review. November 22, 2013. Accessed April 11, 2016. http://berkeleysciencereview.com/article/catching-the-bug/.

Ball Damerow, Joan. "Checking Notes from Nature Data." Notes from Nature. January 14, 2014. Accessed April 11, 2016. https://blog.notesfromnature.org/2014/01/14/checking-notes-from-nature-data/.

"Berkeley Natural History Museums - About." Berkeley Natural History Museums. 2016. Accessed April 11, 2016. http://bnhm.berkeley.edu/about/.

Dick_long_wigwam. "TIL How to Participate in History While Sitting on My Ass by Transcribing Civil War Diaries Online." Reddit, /r/todayilearned. 2012. Accessed April 11, 2016. https://my.reddit.com/r/todayilearned/comments/humy3/til_how_to_participate_in_history_while_sitting.

Eccles, Kathryn and Andrew Greg. “Your Paintings Tagger: Crowdsourcing Descriptive Metadata for a National Virtual Collection.” In Crowdsourcing our Cultural Heritage, ed. Mia Ridge. New York: Routledge, 2014.

Eveleigh, Alexandra. “Crowding Out the Archivist?: Locating Crowdsourcing within the Broader Landscape of Participatory Archives.” In Crowdsourcing our Cultural Heritage, ed. Mia Ridge. New York: Routledge, 2014.

Ferro, Shaunacy. "Decode Darwin's Handwriting To Help Science." Popular Science. May 24, 2013. Accessed April 11, 2016. http://www.popsci.com/science/article/2013-05/decode-darwins-handwriting-help-science?dom=PSC.

Lascarides, Michael and Ben Vershbow. “What’s on the Menu?: Crowdsourcing at the New York Public Library.” In Crowdsourcing our Cultural Heritage, ed. Mia Ridge. New York: Routledge, 2014.

Lee, J. Hannah. "Researchers Enlist Public to Sort More than 1 Million Bugs Online | The Daily Californian." The Daily Californian. May 29, 2013. Accessed April 11, 2016. http://www.dailycal.org/2013/05/29/researchers-enlist-public-to-sort-over-one-million-bugs/.

Miller, Greg. "Enormous Museum Collection of Insects Needs Your Help." Wired.com. May 24, 2013. Accessed April 11, 2016. http://www.wired.com/2013/05/museum-collection-insects/.

Saylor, Nicole, and Jen Wolfe. "Experimenting with Strategies for Crowdsourcing Manuscript Transcription." Research Library Issues, No. 277 (Dec. 2011) Page 10. December 2011. Accessed April 11, 2016. http://publications.arl.org/rli277/10.

Schwartz, Meredith. "Cooking Up a Crowdsourced Digitization Project That Scales." Library Journal. October 22, 2012. Accessed April 11, 2016. http://lj.libraryjournal.com/2012/10/academic-libraries/cooking-up-a-crowdsourced-digitization-project-that-scales/#_.

"The University of Iowa Libraries." Strategic Plan 2015-2018. Accessed April 11, 2016. http://www.lib.uiowa.edu/about/strategic.


Zou, Jie Jenny. "Civil War Project Shows Pros and Cons of Crowdsourcing." Wired Campus, The Chronicle of Higher Education. June 11, 2011. Accessed April 11, 2016. http://chronicle.com/blogs/wiredcampus/civil-war-project-shows-pros-and-cons-of-crowdsourcing.
