Dan Cohen and Sean Takats of the Center for History and New Media (CHNM) at Mason have been awarded a $100,000 grant from the National Endowment for the Humanities as part of the Digging into Data Challenge competition.
Cohen and Takats’s project, “Using Zotero and TAPoR on the Old Bailey Proceedings: Data Mining with Criminal Intent,” will develop tools and models for comparing, visualizing, and analyzing the history of crime, using the Old Bailey Online, which contains extensive court records of more than 197,000 individual trials held over a period of 240 years in Great Britain.
The two are part of a team that includes scholars from the University of Hertfordshire in the U.K. and the University of Alberta in Canada. The team was awarded a total of approximately $300,000 for its project.
The Old Bailey Proceedings comprise 120 million words of structured text, representing the largest body of printed descriptions of behavior ever published, either in print or online. Scholars have gravitated toward this collection as a rare and expansive storehouse of human activity, made more important by its unique record of the activities of ordinary citizens who are otherwise absent from the official record.
Given their emphasis on crime, and hence on behaviors thought unacceptable and deserving of punishment, the Old Bailey Proceedings are filled with unusual and compelling stories. Historians have picked through these stories to illustrate the social, cultural, and intellectual history of a given era; half a dozen book-length academic case studies have been based on events most fully documented in single trials recorded in the proceedings.
And yet, despite this wealth of scholarship, Cohen says, the way historians use these legal records remains little different from how a solicitor or attorney might have approached them centuries ago. “Scholars still sift through these documents one at a time, hoping to discover an unusual or indicative case, or relying on rough, partially manual keyword hit counts to discern patterns of criminal behavior that might support a thesis,” he says.
The project will use tools such as Zotero, developed by the Center for History and New Media, to analyze the types of language used in court and how they changed over time. The team will also compare these “data mined” patterns to those found in tagged data, both to create a new way of charting change in crime reporting and prosecution and to benchmark a new methodology for the consistent discovery of related descriptions.
“The significance of this project therefore runs beyond the discipline of the history of crime, and addresses historical scholarship more broadly,” says Cohen. “Through this project we will show that there are not only new insights to be gained, but also a greater historical rigor to be achieved by moving from the anecdotal to the comprehensive; by moving from the single trial or narrow run of relevant examples to an analysis of statistically significant textual patterns found in this source as a single, massive whole.”
The team was one of eight international research teams that were awarded grants for projects that promote innovative humanities and social science research using large-scale data analysis. Four leading research agencies sponsor the international competition: the Joint Information Systems Committee from the United Kingdom, the National Endowment for the Humanities and the National Science Foundation from the United States, and the Social Sciences and Humanities Research Council from Canada.
“Trying to manage a deluge of data and turn bits of information into useful knowledge is a problem that affects almost everyone in today’s digital age,” said NEH Chairman Jim Leach. “With this international grant program, NEH is hoping to seed projects that will not only benefit researchers in the humanities, but also lead to shared cultural understanding.”