ViTA: Visualization for Text Alignment

As a preliminary outcome of our Commonplace Cultures Digging into Data project, we have developed ViTA: Visualization for Text Alignment. Hosted by the Oxford e-Research Centre at the University of Oxford, ViTA is a web-based visual analytics interface that enables domain experts to construct a text alignment pipeline, visualize the components and connections for any given method (i.e., an alignment model) using image processing techniques, and then test assumptions about the corresponding inputs and outputs. Many available alignment packages visualize alignment results only post hoc; ViTA’s interactive pipeline editing facility instead serves as a visual programming interface in which users can iteratively build and export more efficient text alignment methods.

ViTA Editor panel

Screen shot of a ViTA text alignment

We hope to use the ViTA interface to refine our existing PhiloLine-PAIR alignment algorithms, with the goal of identifying ‘commonplaces’ and other forms of large-scale text reuse in the Gale-Cengage Eighteenth Century Collections Online (ECCO) database. A classic ‘big data’ humanities collection, ECCO currently contains more than 32 million digitized pages from 182,898 titles in 205,639 volumes.

Digging into Data

I am very pleased to be one of the co-investigators for a winning project in the third round of the Digging into Data Challenge, an international grant scheme that brings together teams working in computer science and the humanities in the US, Canada, the UK, and the Netherlands. Our project, “Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics”, aims to explore 18th-century literary culture through the lens of the early modern practice of commonplacing. The project builds on our previous work on data mining and automatic classification of Enlightenment texts (link), machine learning approaches to textual borrowings and source criticism in the 18th century (link), sequence alignment techniques for identifying intertextuality (link), and citation practices in the Encyclopédie (link). We plan to apply these same approaches to commonplaces and to visualise their deployment across the largest collection of 18th-century works ever assembled.

This project is a partnership between the ARTFL Project and Computation Institute (CI) at the University of Chicago and the University of Oxford’s e-Research Centre (OeRC) and Voltaire Foundation (VF). Bringing together world-class centres for Enlightenment studies (ARTFL, VF) and multi-disciplinary computing applications (CI, OeRC), the team consists of 18th-century scholars Robert Morrissey (PI, Chicago) and Nicholas Cronk (Co-I, Oxford); computer scientists Min Chen (PI, Oxford) and Ian Foster (Co-I, Chicago); and digital humanists Mark Olsen (Chicago) and me (ANU), among other participants.

See the new Project Website for more updates.