Viral Texts Team, Northeastern University (Ryan Cordell, Elizabeth Maddock Dillon, David Smith)
Reconstructing Nineteenth-Century Virality from Unstructured Newspaper Archives
The Viral Texts project aims to elucidate several important aspects of nineteenth-century print culture: the fluid working relationships between authors, editors, and others involved in print production; the malleability of genres; and the close connection of publishing to partisan politics and to demographic changes in reading publics. To do so, the project draws on scanned newspaper archives that lack many of the modern conveniences scholars have come to expect in textual editions, such as divisions of issues into stories, author attributions, collations of variants, or even accurate transcriptions of the printed text. We will discuss case studies showing how, given a set of reasonable assumptions, an inductive, bottom-up approach can not only recover information about particular texts but also support inferences about entire systems of publication and readership. Using statistical language modeling, sequence alignment, and network and geospatial analysis, we can begin to construct systematic catalogues and narratives of culture. Indeed, by jettisoning many of the expectations afforded by edited textual editions, these methods foreground the bibliographic messiness inherent to nineteenth-century newspapers. Identifying (partially) duplicated texts is both a computational challenge—in which we experiment iteratively to identify texts of different lengths, structures, or kinds—and a literary-historical challenge—as we confront questions about what constitutes “a text” through series of reprints, quotations, parodies, and other textual engagements.
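To make the duplicate-detection problem concrete, here is a minimal sketch of word n-gram "shingling" with Jaccard overlap, one simple signal for partial reprinting. This is an illustrative stand-in, not the project's actual pipeline, which relies on more robust local-alignment methods tolerant of OCR noise; the function names and parameters are assumptions.

```python
def shingles(text, n=5):
    """Return the set of word n-grams ("shingles") in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def reuse_score(a, b, n=5):
    """Jaccard overlap of shingle sets: a crude signal of partial reprinting.

    1.0 means identical shingle sets; 0.0 means no shared n-grams (or texts
    too short to yield any shingles at all).
    """
    sa, sb = shingles(a, n), shingles(b, n)
    if not (sa | sb):
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

Two nineteenth-century reprints with editorial trimming would score somewhere between these extremes, which is why the project treats reuse detection as a matter of degree rather than exact matching.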
Matthew Wilkens, University of Notre Dame
Literature and Economic Change in the Twentieth Century
Literary critics have long posited a central role for the market in shaping literary production. They’re surely right, but what does this relationship look like in practice? Does literature respond to changes in economic output in real time? Is literary attention a leading or lagging indicator of economic growth? How closely does the fictional world align with the shifting geography of the global economy? Computational analysis of 10,000 twentieth-century novels suggests that literary-geographic attention is highly and surprisingly stable across both space and time, even in the face of major economic realignments. This fact has lent recent fiction a notable implicit conservatism as new texts continue to reflect the economic world as it was in the 1950s more closely than as it is today. Fiction’s relative unresponsiveness to economic developments at the level of geographic content also raises important questions about the nature of the broader relationship between economic and cultural production.
Global Literary Networks Team, U. of Chicago & Cornell (Hoyt Long, Tom McEnaney, Richard Jean So)
Patterns Taken for Wonder: Computational Approaches to Global Modernism
This paper represents part of a larger book project that aims to provide a new history of modernism by combining computational and critical methods. A major claim of this project is that quantitative modeling of large (and small) corpora of modernist texts allows us to discern latent aesthetic qualities and recover the sociological patterns of authorship, meaning, and readership in which they were embedded. In this paper, we focus specifically on the narrative technique of “stream of consciousness” (SOC) and its patterns of circulation across generic and national boundaries. Critics and writers find the technique at once easy to recognize but difficult to define and taxonomize consistently. To better understand this paradox, we consider what SOC looks like when modeled as a quantifiable set of stylistic and grammatical features. We hypothesize that these features correspond to larger textual patterns that are statistically distinct from other kinds of writing. This analysis then facilitates a broader historical and sociological inquiry into how and why this form became so popular and influential, diffusing not only through America and England, but also to Japan, where it had a brief, transformative impact.
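As a toy illustration of what "modeling SOC as a quantifiable set of stylistic and grammatical features" might look like, the sketch below turns a passage into a small feature vector. The three features chosen here (first-person pronoun rate, mean sentence length, ellipsis density) are hypothetical guesses for demonstration only, not the project's actual feature set.

```python
import re

def soc_features(text):
    """Toy stylistic profile of a passage (illustrative features only)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.lower().split()
    n = max(len(words), 1)
    return {
        # Rate of first-person pronouns, a common marker of interior monologue.
        "first_person_rate": sum(w.strip(".,;:!?") in {"i", "me", "my"} for w in words) / n,
        # Average sentence length in words.
        "mean_sentence_len": n / max(len(sentences), 1),
        # Density of ellipses, often used to render drifting thought.
        "ellipsis_rate": text.count("...") / n,
    }
```

Vectors like this, computed over many passages, are the kind of input a statistical model could use to test whether SOC passages are distinguishable from other narration.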
Natalie Houston, University of Houston
Material Form: Towards a Sociological Poetics
The new nineteenth-century archive should be understood as encompassing material artifacts, their digital surrogates, and the information generated in preserving, creating, and analyzing them. My work draws on the large scale view of Victorian poetry’s cultural field now available in this archive to develop a computational sociological poetics. Printed poetry is distinguished from prose in its bibliographic materiality, its visual design, and its linguistic style. Examining these three kinds of signifying codes contributes to a new understanding of poetry’s production, circulation, and cultural function in the Victorian period. This talk takes Pre-Raphaelite poetry as its case study, discarding its traditional biographical definition in order to explore an algorithmic aesthetics.
Meredith Martin, Meagan Wilson, Jean Bauer (Center for Digital Humanities), Princeton University
Princeton Prosody Archive
The Princeton Prosody Archive (PPA) charts the unstable enterprise of poetic discourse in the long nineteenth century. Collecting digitized monographs from 1570-1923, the PPA allows users to search and display the full text and page images of more than 5,500 works across four broad categories, limited mostly to English language and literature: (1) Poetry, containing versification manuals and scholarly introductions to single-author studies that elucidate some aspect of the author’s versification; (2) Grammar, containing grammar books and discourse about grammar (prosody was a subset of grammar until the late nineteenth century); (3) History, containing histories of both language and literature; and (4) Speech, containing guides to rhetoric, oratory, and elocution, as well as guides to phonetics, linguistics, and phonemics. As with many other large archive projects working in collaboration with the HathiTrust Research Center, a central challenge has been tracking and eliminating duplicates (as distinct from reprints and editions), along with thinking in the broadest possible sense about how these texts – already fundamentally devoted to analyzing and quantifying culture in the form of poetry – present a limit case to theories of poetic genre, authorial importance, and the history of the study of poetry and linguistics. In analyzing a federated library catalog, how far can we push algorithmic deduplication, and when should we (re)turn to bibliographic histories? Can we develop full-text searches that allow us to reconstruct scholarly conversations before the widespread use of footnotes but including complicated, footnote-style discursive notes? Can we treat visual representations of meter as an early form of markup, and if so, how do we extract those visual elements from the OCR to make them computationally available?
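One simple stand-in for the "algorithmic deduplication" question the abstract raises is fuzzy matching of catalog records, sketched below with Python's standard-library `SequenceMatcher`. The record fields, the 0.9 threshold, and the same-year requirement are illustrative assumptions, not the PPA's actual policy; real deduplication must still fall back on bibliographic judgment to distinguish duplicates from reprints and new editions.

```python
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase and strip punctuation so trivial variants compare equal."""
    cleaned = "".join(c for c in title.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def likely_duplicates(rec_a, rec_b, threshold=0.9):
    """Flag two catalog records as probable duplicates.

    Requires matching publication years and near-identical normalized titles.
    """
    same_year = rec_a.get("year") == rec_b.get("year")
    ratio = SequenceMatcher(None, normalize(rec_a["title"]),
                            normalize(rec_b["title"])).ratio()
    return same_year and ratio >= threshold
```

Note what such a heuristic cannot do: a 1816 printing and an 1842 revised edition would (correctly) not be flagged, but neither would two scans of the same volume cataloged under different years, which is exactly where the abstract suggests a return to bibliographic histories.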
Ted Underwood, University of Illinois, Urbana-Champaign
The Unreasonable Influence of Poetry
Causality is a vexed topic in literary history; cases of direct literary influence are often hard to disentangle from less direct structural causes. Quantitative methods don’t exactly solve this problem, but they may at least sharpen our understanding of it. The puzzling, dubiously causal pattern I’ll struggle with here is that changes in poetic diction (1700-1900) turn out to be unreasonably effective as predictors of changes that will occur in fiction about twenty years later. In other words, there’s evidence of what economists call “Granger causality” between these genres. This evidence struck me initially as unreasonable because my materialist training makes me skeptical of the once-popular notion that poets lead the vanguard of linguistic change. In an attempt to discover alternative explanations, I look more closely at the demographics of authors and the distribution of literary prestige in both genres.
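For readers unfamiliar with the term, a series x "Granger-causes" a series y when x's past values improve predictions of y beyond what y's own past provides. The numpy sketch below compares the two nested regressions at the heart of that idea; it is a bare-bones illustration on synthetic data, not the author's analysis, and a full test (such as `statsmodels`' `grangercausalitytests`) would add F-statistics and significance levels.

```python
import numpy as np

def granger_improvement(x, y, lag=1):
    """Return (restricted_sse, full_sse): squared error predicting y
    from its own lag alone, versus its own lag plus x's lag.

    A markedly smaller full_sse is the Granger-style signal that x's
    history helps forecast y.
    """
    y_t = y[lag:]        # values to predict
    y_lag = y[:-lag]     # y's own history
    x_lag = x[:-lag]     # the other series' history
    # Restricted model: y_t ~ intercept + y_lag
    A_r = np.column_stack([np.ones_like(y_lag), y_lag])
    sse_r = np.sum((y_t - A_r @ np.linalg.lstsq(A_r, y_t, rcond=None)[0]) ** 2)
    # Full model: y_t ~ intercept + y_lag + x_lag
    A_f = np.column_stack([np.ones_like(y_lag), y_lag, x_lag])
    sse_f = np.sum((y_t - A_f @ np.linalg.lstsq(A_f, y_t, rcond=None)[0]) ** 2)
    return sse_r, sse_f
```

In the abstract's terms, x would be a yearly measure of poetic diction and y the corresponding measure for fiction, with a lag on the order of twenty years rather than one.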
Marisa Gemma, Max Planck Institute
Narrative Talk: A Historical Study of Speech-Based Forms in Fictional Narration
Historians of English and literary critics have broadly agreed that over the course of the nineteenth century, the “standard” idiom of written English down-shifted into a more colloquial, speech-based register: diction became more concrete, sentences shorter, syntax less complex. While the historical and ideological reasons for this shift have been extensively explored, its concrete impact on fictional style has remained understudied, in part because literary scholars’ traditional methods—focused on a small handful of texts—did not lend themselves to examining such large-scale historical shifts. This paper will offer empirical and quantitative data about this shift, based on stylistic analysis of a corpus of over 1,200 English and American novels. Specifically, I will focus on “narrative talk,” or speech-based forms used in fictional narration. How do such forms, traditionally associated with dialogue and represented speech, behave in fictional narration—how does “narrative talk” change over time?
Matthew Jockers, University of Nebraska-Lincoln
A Computational Morphology of Plot
Jockers will discuss his most recent research charting plot structures in a corpus of 50,000 narratives. He will describe how he leveraged tools and techniques from natural language processing, sentiment analysis, signal processing, and machine learning to develop an R package that extracts latent plot arcs from narrative fiction. The presentation will include an overview of the method, the “syuzhet” R package, and the major conclusions of the research thus far.
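The core idea behind a syuzhet-style plot arc can be sketched in a few lines: score each sentence against a sentiment lexicon, then smooth the noisy sentence-level signal into a low-frequency "arc." The Python version below is a deliberately tiny illustration; the toy lexicon and the moving-average smoother are stand-ins, since the actual "syuzhet" R package offers much richer sentiment lexicons and signal-processing transformations.

```python
import numpy as np

# Toy valence lexicon for demonstration only.
LEXICON = {"joy": 1, "love": 1, "hope": 1, "grief": -1, "death": -1, "fear": -1}

def sentence_scores(sentences):
    """Sum lexicon valences over the words of each sentence."""
    return [sum(LEXICON.get(w, 0) for w in s.lower().split()) for s in sentences]

def plot_arc(sentences, window=3):
    """Smooth raw sentence scores with a simple moving average,
    yielding one value per sentence: a crude emotional trajectory."""
    scores = np.array(sentence_scores(sentences), dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="same")
```

Run over a whole novel, the smoothed series traces rises and falls in emotional valence, which is the latent "plot arc" the abstract describes extracting.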
Andrew Piper, McGill University
This project starts from a question that appears to have achieved something of a high point in the 1970s and has assumed renewed currency today: what makes a text “fictional”? If we understand “fictionality” in its narrowest sense as writing that is not true, how do we know when a text isn’t “true”? Are there unique properties to untrue texts versus true ones? A great deal of recent computational work has started to identify what distinguishes fictional texts from non-fictional ones. The aim of my project is to draw on this longer history of theoretical inquiry into the nature of fiction to specify with greater clarity the differences that define “fictionality” as a form of textual untruth (or irreality). Can we identify the ways in which works of fiction are more or less true, more or less real, rather than just different from “non-fiction”? How do ideas of “realism” and “fictionality” intersect? How might we think about classifying novels according to a scale of their proximity/distance to the “world” (however that may be defined)? Using a series of small, curated data sets ranging from the nineteenth century to the present, my aim is to better understand the nature of fictionality in all of its potentially diverse aspects.
Stanford Literary Lab, Stanford University (Mark Algee-Hewitt & Ryan Heuser)
Between Suspense and Fear: Two Digital Approaches to Recovering Traces of Textual Affect
Among the accusations that have been leveled against the computational study of literary texts is that the approach is rigidly formalistic, leaving little room for critical interventions into other ways of understanding texts, such as affect theory. Yet, we argue, the emerging methods of this field are uniquely able to investigate affect in literature through a combination of algorithmic and social scientific approaches. In this paper, we present two projects of the Stanford Literary Lab that attempt to recover the conditions of possibility for readerly affect through Digital Humanities methodologies. In the first, The Emotions of London, we combine natural language processing with crowdsourcing techniques to recover the affective experience of place in eighteenth and nineteenth-century novels. In the second, Suspense: Language, Narrative, Affect, we combine social-psychological methods with data mining to uncover the narrative patterns that are shared between texts that create the experience of suspense. Between these two projects, we seek to open up a new area of study in our field, one that combines the richly formalistic approach of quantitative analysis with a deeper understanding of the text as a uniquely affective communicative medium.
Tanya Clement, University of Texas-Austin
HiPSTAS: High Performance Sound Technologies for Access and Scholarship
Scholars have limited chances to do new kinds of research (what Jerome McGann calls “imagining what you don’t know”) with oral text collections. At this time, even though we have digitized hundreds of thousands of hours of culturally significant audio artifacts and have developed increasingly sophisticated systems for computational analysis of text, there is very little provision for discovering how prosodic features change over time and space, how tones differ between groups of individuals and types of speech, or how one poet or storyteller’s cadence might be influenced by or reflected in another’s. In response to this lack, and in order to encourage new research opportunities with sound collections, the High Performance Sound Technologies for Access and Scholarship (HiPSTAS) project has been exploring the use of spectral analysis and machine learning to analyze folklore recordings from the University of Texas Folklore Center Archives and poetry from PennSound. This talk will explore the evolution of the HiPSTAS project, from first interviews with poets and scholars to assess what, given the perfect tools and all points of access, they would want to do with sound collections, to recent results using software to locate patterns across PennSound based on date, speaker, and venue and across the folklore collection based on genre (instrumental, spoken, and sung). Developing systems based on the “close listening” practices that literary scholars and scholars interested in folk and poetry collections employ, alongside the “distant listening” practices that automated discovery, classification, and visualization encourage, is essential for building critically productive tools for scholars.
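To give a flavor of the "spectral analysis" mentioned above, the numpy sketch below computes a single classic spectral feature, the spectral centroid, a rough measure of a sound's "brightness" that machine-listening pipelines build on. This is a generic illustration rather than HiPSTAS's actual toolchain, which works with much richer spectrogram features and trained classifiers; the synthetic sine-wave "recording" in the usage example is purely illustrative.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of a signal's spectrum, in Hz.

    Higher values correspond to "brighter" sounds; tracked over time,
    features like this help characterize a speaker's timbre or cadence.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    if spectrum.sum() == 0:
        return 0.0
    return float(np.sum(freqs * spectrum) / np.sum(spectrum))
```

For a pure 440 Hz tone the centroid sits at 440 Hz; for recorded speech or song, frame-by-frame centroids (alongside many other features) become the numeric material that clustering and classification then operate on.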