As a preliminary outcome of our Commonplace Cultures Digging into Data project, we have developed ViTA (Visualization for Text Alignment), a web-based visual analytics system hosted by the Oxford e-Research Centre at the University of Oxford. ViTA enables domain experts to construct a text alignment pipeline, visualize the components and connections for any given method (i.e., an alignment model) using image processing techniques, and then test assumptions about the corresponding inputs and outputs. Rather than visualizing alignment results post hoc, as many available alignment packages do, ViTA’s interactive pipeline editing facility serves as a visual programming interface with which users can iteratively build and export more efficient text alignment methods.
We hope to use the ViTA interface to refine our existing PhiloLine-PAIR alignment algorithms, with the goal of identifying ‘commonplaces’ and other forms of large-scale text reuse in the Gale-Cengage Eighteenth Century Collections Online (ECCO) database. A classic ‘big data’ humanities collection, ECCO currently contains more than 32 million digitized pages from 182,898 titles in 205,639 volumes.
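To give a concrete sense of what detecting text reuse can involve, the following is a minimal sketch of one generic technique: matching overlapping word n-grams ("shingles") between two passages. It is an illustration only and does not reproduce the PhiloLine-PAIR algorithms or any ViTA component; the function names, the trigram window, and the sample passages are all assumptions made for this example.

```python
import re
from typing import List, Set, Tuple

Shingle = Tuple[str, ...]


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep alphabetic word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def shingles(tokens: List[str], n: int = 3) -> Set[Shingle]:
    """Return the set of overlapping word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_shingles(a: str, b: str, n: int = 3) -> Set[Shingle]:
    """Return the n-grams that occur in both passages (candidate reuse)."""
    return shingles(tokenize(a), n) & shingles(tokenize(b), n)


# Hypothetical sample passages, chosen only for illustration.
source = "The quality of mercy is not strained; it droppeth as the gentle rain"
reuse = "As the poet says, the quality of mercy is not strained in such cases"

matches = shared_shingles(source, reuse)
for gram in sorted(matches):
    print(" ".join(gram))
```

A real alignment method would typically add steps such as spelling normalization, stopword handling, and merging adjacent matches into longer aligned passages; choices of that kind are the sort of pipeline components an interface like ViTA is meant to let users inspect and adjust.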