Modelling Enlightenment: ERC Consolidator Grant

I am thrilled to announce the ModERN project: Modelling Enlightenment – Reassembling Networks of Modernity through data-driven research. This five-year project, funded by the generous support of the European Research Council, seeks to write a new data-driven literary history of the French Enlightenment and its subsequent reception. See the project abstract below – more information to follow.

The Enlightenment has long been associated with the rise of modern Europe, and more generally with the concept of a typically European Modernity that took root in its wake. What it means to be ‘modern’ is indelibly bound up with our understanding of the Enlightenment’s core concepts: reason, religious toleration, civic virtue, political liberty, and scientific progress, to name but a few. For some, the Enlightenment is an essentially philosophical matter; for others it was and remains deeply political. Whatever the case may be, one thing is certain: for better or worse, it is widely accepted that the Enlightenment ushered in a new, modern era in both politics and philosophy, beginning in the 1790s and continuing today. The role of 18th-century ideas in this modernising process, and by proxy, the books that came to embody them, has long been the subject of intense scholarly debate, primarily concerned with the social and intellectual causes of the French Revolution. As a result, the field of Enlightenment studies today continues to privilege a relatively small canon of writers – primarily those who participated in the more ‘radical’ strains of Enlightenment thought in France. This is only one version, or vision, of the Enlightenment, however, albeit one that tends to dominate contemporary discourse. This project aims to fundamentally re-evaluate this interpretation of the Enlightenment and its actors by expanding the knowledge base on which these previous claims have been made, not only in terms of the diversity of authors and texts included, but also in the development of new digital techniques for identifying and analysing 18th-century information networks and their subsequent reception.
In so doing, ModERN will move Enlightenment studies in a decidedly new direction; one that is both more comprehensive and more systematic in terms of its relationship to the existing digital cultural record, and one that likely challenges subsequent narratives of European Modernity.

Voltaire’s Correspondence – Digital Readings

‘La lettre au fil du temps: philosophe.’

A stamp produced by the French post office in 1998 celebrates the art of letter-writing by depicting Voltaire writing letters with both hands. It’s true that Voltaire wrote a lot of letters – over 15,000 are known, and more turn up all the time – but even so it’s not altogether clear that an ambidextrous letter-writer is someone we entirely want to trust. Voltaire’s correspondence is full of difficulties and traps, and faced by such a huge corpus, it is hard to know where to start. Without question, the Besterman ‘definitive’ edition (1968-77), digitised in Electronic Enlightenment, has had a major impact on Enlightenment scholarship: historians and literary critics make frequent use of these letters, but usually in an instrumental way, adducing a single passage in a letter as evidence in support of a date or an interpretation.

Nicholas Cronk and Glenn Roe, Voltaire’s correspondence: digital readings (CUP, 2020).

Voltaire’s letters can be notoriously ‘unreliable’, however, and they really need to be read and interpreted – like all his texts – as literary performances. Few critics have attempted to examine the corpus of the correspondence in its entirety and to understand it as a literary whole. In our new book, Voltaire’s correspondence: digital readings, we have experimented with a range of digital humanities methods, to explore to what extent they might help us identify new interpretative approaches to this extraordinary correspondence. The size of the corpus seems intimidating to the critic, but it is precisely this that makes these texts a perfect test-case for digital experimentation: we can ask questions that we would simply not have been able to ask before.

For example, we looked at the way Voltaire signs off his letters – and were surprised to find that only 13% of the letters are actually signed ‘Voltaire’; while over a third of the letters are signed with a single letter, ‘V’. Then Voltaire is hugely inventive in the way he plays with the rules of epistolary rhetoric, posing as a marmot to the duc de Choiseul. And if you want to know why in a letter (D18683) to D’Alembert he signs off ‘Miaou’, the answer is to be found in a fable by La Fontaine…

We studied Voltaire as a neologist. Critics have usually described Voltaire as an arch-classicist adhering rigorously to the norms of seventeenth-century French classicism. True, yet at the same time he is hugely energetic in coining new words, an aspect of his literary style that has been insufficiently studied. Here, corpus analysis tools, coupled with available lexicographical digital resources, allow us to consider Voltaire’s aesthetic of lexical innovation. In so doing, we can test the hypothesis that Voltaire uses the correspondence as a laboratory in which he can experiment with new formulations, ideas, and words – some of which then pass into his other works. We identified 30 words first coined by Voltaire in his letters, and another 36 words first used in his other works, many of which are then reused in the correspondence. Emmanuel Macron has encouraged the description of himself as a ‘président jupitérien’, so it’s good to discover that ‘jupitérien’ is one of the words first coined by Voltaire.

A letter in Voltaire’s hand, sent from the city of Colmar to François Louis Defresnay (D5612, dated 1753/1754).

A reader of Voltaire’s letters cannot fail to be struck by the frequency of his literary quotations. We explore this phenomenon through the use of sequence alignment algorithms – similar to those used in bioinformatics to sequence genetic data – to identify similar or shared passages. Using the ARTFL-Frantext database of French literature as a comparison dataset, we attempt a detailed quantification and description of French literary quotations contained in Voltaire’s correspondence. These citations, taken together, give us a more comprehensive understanding of Voltaire’s literary culture, and provide invaluable insights into his rhetoric of intertextuality. No surprise that he quotes most often the authors of ‘le siècle de Louis XIV’, though it was a surprise to find that Les Plaideurs is the Racine play most frequently cited. And who expected to find two quotations from poems by Fontenelle (neither of them identified in the Besterman edition)?! Quotations in Latin also abound in Voltaire’s letters, many of these drawn, predictably enough, from the famous poets he would have memorised at school, Horace, Virgil, and Ovid – but we also identified quotations, hitherto unidentified, from lesser poets, such as a passage from Manilius’ Astronomica. By examining as a group the correspondents who receive Latin quotations, and assigning to them social and intellectual categories established by colleagues working at Stanford, we were able to establish clear networks of Latin usage throughout the correspondence, and confirm a hunch about the gendered aspect of quotation in Latin: Voltaire uses Latin only to his élite correspondents, and even then, with notably rare exceptions such as Emilie Du Châtelet, only to men.
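As a rough illustration of the idea behind this kind of quotation detection, the sketch below finds word sequences shared between two texts by indexing overlapping four-word n-grams. It is a deliberately simplified stand-in, not the sequence alignment software used for the book: the function names (tokenize, shared_passages), the n-gram length, and the invented letter fragment are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a text and split it into word tokens (accented letters kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def shared_passages(source, target, n=4):
    """Return (n-gram, source position, target position) for every word
    n-gram of the target that also occurs in the source."""
    src_tokens = tokenize(source)
    tgt_tokens = tokenize(target)
    # Index every n-gram of the source by its starting position.
    index = defaultdict(list)
    for i in range(len(src_tokens) - n + 1):
        index[tuple(src_tokens[i:i + n])].append(i)
    matches = []
    for j in range(len(tgt_tokens) - n + 1):
        gram = tuple(tgt_tokens[j:j + n])
        if gram in index:
            matches.append((" ".join(gram), index[gram][0], j))
    return matches


# Toy data: a line of Virgil set against an invented (hypothetical) letter fragment.
verse = "Macte nova virtute, puer, sic itur ad astra"
letter = "Courage, mon cher ami: sic itur ad astra, comme dit le poète."
print(shared_passages(verse, letter))
```

Exact n-gram matching of this kind misses quotations with spelling variation or small insertions and deletions, which is precisely what fuller alignment methods are designed to tolerate.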

The woman on the left, a trainee pilot in the Brazilian air force, is an unwitting beneficiary of Voltaire’s bravura use of Latin quotation. The motto of the Air Force Academy is a stirring (if slightly macho) Latin quotation: ‘Macte animo, generose puer, sic itur ad astra’ (Congratulations, noble boy, this is the way to the stars). The quotation is one that Voltaire uses repeatedly in some dozen letters, and it is found later, for example in Chateaubriand’s Mémoires d’outre-tombe. On closer investigation it turns out that this piece of Latin is an amalgam of quotations from Virgil and Statius – in effect, a piece of pure Voltairean invention.

In the end, Voltaire’s correspondence is undoubtedly one of his greatest literary masterpieces – but it is arguably one that only becomes fully legible through the use of digital resources and methods. Our intention with this book was to affirm the simple postulate that digital collections – whether comprised of letters, literary works, or historical documents – can, and should, enable multiple reading strategies and interpretative points of entry; both close and distant readings. As such, digital resources should continue to offer inroads to traditional critical practices while at the same time opening up new, unexplored avenues that take full advantage of the affordances of the digital. Not only can digital humanities methods help us ask traditional literary-critical questions in new ways – benefitting from economies of both scale and speed – but, as we show in the book, they can also generate new research questions from historical content; providing interpretive frameworks that would have been impossible in a pre-digital world.

The size and complexity of Voltaire’s correspondence make it an almost ideal corpus for testing the two dominant modes of (digital) literary analysis: on the one hand, ‘distant’ approaches to the corpus as a whole and its relationship to a larger literary culture; on the other, fine-grained analyses of individual letters and passages that serve to contextualise the particular in terms of the general, and vice versa. The core question at the heart of the book is thus one that remains largely untreated in the wider world: how can we use digital ‘reading’ methods – both close and distant – to explore and better understand a literary object as complex and multifaceted as Voltaire’s correspondence?

– Nicholas Cronk & Glenn Roe, Co-directors of the Voltaire Lab at the VF

Digitizing the Enlightenment

As country after country has gone into COVID-19 lockdown, we have all had to learn to communicate, network, teach, study and relate online in ways unimaginable a few short years – or even months – ago. This phenomenon is just the latest stage in the information-technology revolution and part and parcel of the ongoing development of an increasingly digital society. This revolution has touched almost every aspect of our lives, from how we work, study, shop and relax to how we make and maintain personal relationships. But it is also transforming scholarship and the way we conduct and communicate academic research. Thus, it is perhaps apt, and with consummate good timing, that Oxford University Studies in the Enlightenment has chosen to subject tag our new volume as ‘History of Scholarship (Principally of Social Sciences and Humanities)’. Yet this is certainly not how we and our collaborators envisaged our project at the outset, nor can any single tag capture the content of our volume and its collaborative agenda in its entirety.

Ironically, as we write, Digitizing Enlightenment is also a living movement – or at least a loose network of scholars who meet annually in pursuit of a common agenda. That agenda was born in a series of conversations that took place from 2010, culminating in Dan Edelstein’s post-panel suggestion at the American Historical Association conference at Montreal in April 2014 that we should hold periodic meetings between like-minded digital projects relating to the Enlightenment. The aim of these meetings would be to establish common conventions and digital standards, with a view to linking our resources and realising the enormous and still largely untapped potential of Linked Open Data. Those present for Dan’s suggestion – Simon Burrows, Jeff Ravel, Sean Takats and Dan himself – have all provided chapters for our book, but much of the energy behind Digitizing Enlightenment since has come from Glenn Roe, whom Simon had first encountered a month earlier in Australia, where they had both recently taken up academic positions.

It was this fortuitous coincidence, underpinned by the fertile combination of Simon’s professorial establishment funds and Glenn’s energy, together with their mutual contact books, that led to Western Sydney University hosting the first Digitizing Enlightenment symposium in July 2016. Among the projects discussed there, and in our book, were large-scale treatments of Enlightenment correspondences, theatre attendance records, and textual corpora including the mid-eighteenth century Encyclopédie; bibliometric projects on the production and dissemination of literature; and presentations on mapping and data visualization growing out of these projects. The symposium was so well received that it has been an annual event ever since. It was held at Radboud University in Nijmegen (2017), Oxford (2018), and Edinburgh (2019). In 2020, but for COVID-19, it would have been held in Montpellier.

It was not entirely by chance that such a project coalesced around the guiding notion of the ‘Enlightenment’. For the long eighteenth century has been blessed by a number of high-profile and long-established digital projects. These include ground-breaking commercial datasets such as Gale-Cengage’s Eighteenth-Century Collections Online (ECCO), which features in several of our chapters, semi-commercial projects such as the Electronic Enlightenment and large academic consortiums such as the Franco-American ARTFL project. This made the Enlightenment a natural laboratory for exploring the possibilities and achievements of the Digital Humanities for transforming scholarship on a single historical era. Further, as our book emphasises, our discussions built on a long tradition of digital innovation in eighteenth-century studies that can be traced back at least as far as the twin Livre et société dans la France du XVIIIe siècle volumes produced by a team led by François Furet in 1965 and 1970. It might further be added that our over-arching subject material lends itself to digital-historical analysis; the Enlightenment might after all be viewed as the long-run culmination of the intellectual turmoil and – as several contributors point out – information overload unleashed by a previous technological and communications revolution.

With this in mind, then, we offer up Digitizing Enlightenment: Digital Humanities and the Transformation of Eighteenth-Century Studies as rather more than a contribution to the history of scholarship. Certainly, we have offered a sample of Digital Humanities c. 2016-2020, as it relates to the technologies available and their application to Enlightenment studies broadly construed. In addition, the first half of the book offers detailed accounts of the origins and development of key Enlightenment digital projects up until that point, accompanied by valuable and sometimes disarming insights on the dangers and delights of digital research from foremost practitioners in the field. These chapters, as well as some later contributions, are helping to reshape some dominant meta-narratives of the Enlightenment, not least by hinting simultaneously at the enduring aristocratic leadership of the French Enlightenment and the extent to which Enlightenment literary production and consumption was infused with religious content. However, our contributors also showcase other ways that Digital Humanities scholarship is in the process of changing the field through the transparency, methodological rigour, and collaborative imperatives that are necessary concomitants of this new kind of research. Finally, the book offers a collaborative roadmap for future digital research – at a moment when, as our final contributor, Sean Takats, points out, the Enlightenment is fast losing its privileged position as the most richly digitized century of the modern era. As a corollary, we hope that our volume may be as useful to scholars of other periods as to Enlightenment scholars themselves.

– Simon Burrows (Western Sydney University) and Glenn Roe (Sorbonne University)


Voltaire Lab

As part of collaborative efforts with the Voltaire Foundation to establish the Voltaire Lab as a virtual research centre, we are pleased to announce a major update of the TOUT Voltaire database and search interface, expanding links between the ARTFL Encyclopédie Project and several new research databases made available for the first time. Working in close collaboration with the ARTFL Project at the University of Chicago – one of the oldest and best known North American centres for digital humanities research – we have rebuilt the TOUT Voltaire database under PhiloLogic4, ARTFL’s next-generation search and corpus analysis engine.

New Search interface for TOUT Voltaire

PhiloLogic4 is a powerful research tool, allowing users to browse Voltaire’s works dynamically by date or title, along with further faceted browsing using the ‘title’, ‘year’ and ‘genre’ facets, combined with word and phrase searching. Word searches are greatly improved for flexibility and ease of display and now include four primary result reports:

  • Concordance, or search terms in their context
  • KWIC, or line-by-line occurrences of the search term
  • Collocation, or terms that co-occur most with the search term
  • Time Series, which displays search term frequency over time

The new search interface will allow users to formulate complex queries with relatively little effort, following lines of enquiry in a dynamic fashion that moves from ‘distant reading’ scales of exploration to more fine-grained close textual analysis.
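To make two of these report types concrete, here is a minimal sketch of a KWIC display and a collocation count over a tokenised text. It is not PhiloLogic4’s implementation; the function names (kwic, collocates), the context widths, and the sample sentence are assumptions chosen purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def kwic(tokens, term, width=3):
    """Return one line per occurrence of `term`, with `width` words of
    context on each side and the keyword bracketed."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == term:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, term, window=3):
    """Count the words found within `window` tokens of each occurrence of `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Invented sample text, for illustration only.
sample = "Il faut cultiver notre jardin. Le jardin de Candide est un jardin modeste."
toks = tokenize(sample)
for line in kwic(toks, "jardin", width=2):
    print(line)
print(collocates(toks, "jardin", window=2).most_common(3))
```

A production engine would also filter function words out of the collocation counts and work from a prebuilt index rather than scanning the token list on each query.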

TOUT Voltaire search results

Also in collaboration with ARTFL, we have just released the Autumn Edition 2017 of the ARTFL Encyclopédie, a flagship digital humanities project that for the past almost twenty years has made available online the full text of Diderot and d’Alembert’s great philosophical dictionary. This new release offers many new features, functionalities and improvements. The powerful new faceted search and browse capabilities offered by PhiloLogic4 allow users better to leverage the organisational structure of the Encyclopédie – classes of knowledge, authors, headwords, volumes, and the like. Further it gives them the possibility of exploring the interesting alternatives offered by algorithmically or machine-generated classes. The collocation search generates word-clouds or word lists that are clickable to obtain concordances for any of the words immediately. Further improvements include new author attributions, various text corrections, and better cross-referencing functionality.

New ARTFL Encyclopédie interface

This release also contains a beautiful new set of high-resolution plate images. Clickable thumbnail versions lead to larger images that can be viewed in much greater detail than was previously possible.

New high resolution plate images, ‘Imprimerie en taille douce’

Close up of plate image

Thanks to the Voltaire Foundation, full biographies of the encyclopédistes are directly accessible from within the ARTFL Encyclopédie simply by clicking on the name of the author of any given article. This information is drawn directly from Frank and Serena Kafker’s The Encyclopedists as Individuals: A Biographical Dictionary of the Authors of the Encyclopédie (SVEC 257, 1988) – still the standard reference for biographical information on the Encyclopédie’s 139 contributors. Our hope is that this first experiment will demonstrate the value of linking digital resources openly in ways that can add value to existing projects and, at the same time, increase the visibility of the excellent works contained in the Oxford University Studies in the Enlightenment back catalogue.

Finally, we have begun the work of establishing new research collections that will form the basis of the Voltaire Lab’s textual corpus. For example, working with files provided by Electronic Enlightenment, we have combined all of Voltaire’s correspondence with TOUT Voltaire. This new resource, which we are for the moment calling ‘TV2’, contains over 22,000 individual documents and more than 13 million words, making it one of the largest single-author databases available for research. Due to copyright restrictions in the correspondence files we cannot make the full dataset publicly available; however, we are keen to allow researchers access to this important resource on a case-by-case basis. Students and scholars who wish to access the PhiloLogic4 build of TV2 should contact me here.

Performing Transdisciplinarity

This Australian Research Council Discovery Project is a cross-institutional collaboration between ANU (Glenn Roe and Robert Wellington), The University of Melbourne (Erin Helyard), The University of Sydney (Mark Ledbury), and Oxford University (Nicholas Cronk). Through the study of a unique and ambitious eighteenth-century songbook – Jean-Benjamin de Laborde’s Choix de Chansons (1773) – our project provides a workable solution to these questions by way of the notion of ‘transdisciplinarity’. First described by the developmental psychologist and philosopher Jean Piaget as a superior stage of interdisciplinary relationships, the transdisciplinary approach implies a total system of interrelated knowledge without established disciplinary boundaries; a system that has much in common with that imagined by Diderot and d’Alembert in the Encyclopédie. We propose that the Choix de Chansons is also in many ways a quintessential transdisciplinary object. As such, it requires a new methodological approach that operates at the interface of interdisciplinary collaboration, rich historical contextualization, and new media dissemination.

This is the first project of its kind to address the complex transdisciplinary and transmedial nature of both Laborde’s Choix de Chansons, and of eighteenth-century print culture more generally. The complementary disciplines of musicology, art history, and French literature will create a unique transdisciplinary matrix in which our team will ‘perform’ Laborde’s text in order to recreate its original modes of reception, evoking the ways eighteenth-century participants appreciated, decoded, and debated the intersections of music, visual art, and literature.

Engraving from Laborde’s Choix de chansons

Most often, this kind of cultural consumption was enacted publicly and sensationally at the opera. But, unlike such multimedia events, and based on the quasi-democratic principles of Masonic culture, Laborde’s text is meant for a small community of like-minded individuals who commingle their performative experiences in the intimacy of a salon around a single instrument: harp or harpsichord. In many respects, Laborde’s project aims to reproduce – albeit, in miniature – the operatic experience by simulating the close connections between image, music and text. The novel aspect in Laborde’s scenarios is that these connections take place not in the public arena of the opera box, where spectators perform only as audience members, but rather in the chamber, where the participants are no longer merely spectators but themselves performers.

This emphasis on individual expression as a meaningful component of a close engagement with others in a culture of sociability and sensibilité echoes contemporaneous musical, philosophical and social trends. By implication, any attempt at recapturing the creative, receptive, and performative complexity of Laborde’s songbook – or any other complex cultural artefact for that matter – today requires new models of cross-disciplinary collaboration and multimedia dissemination. Our project will provide one such model, aimed at reproducing digitally the cultural context of the Chansons, both as an object of transdisciplinary communication – one that actively speaks from the nexus of image, music, and text – and as a product of the cultural and intellectual networks of the time, from courtly and salon culture to the more progressive sociability of the Masonic societies.

By moving the Chansons from print to digital media we can not only incorporate multiple layers of remediation (image, music, text), but also shed greater light on the various strata – social, intellectual, political, philosophical – that informed the work’s production and its relationship to the cultural networks mentioned above. We will develop a new digital edition of the Chansons that will present high-resolution scans of its pages and engravings alongside transcriptions of the poetry and recordings of the songs. This juxtaposition will expose the image-music-text relationship inherent to the illustrated songbook.

Text by Glenn Roe, Erin Helyard, Mark Ledbury, and Robert Wellington.

Related conference presentations:

“Performing Transdisciplinarity: Image, Music, and Text in Eighteenth-Century Print Culture” (with Robert Wellington) in Digital Approaches in the Study of Early Modern Culture, ANU, Canberra, November 2015; and Recasting the Question: Digital Approaches in Art History and Museums, The University of Sydney, November 2015.

Digitizing Raynal

A collaborative digital research project

On the heels of Cecil Courtney and Jenny Mander’s recent publication, Raynal’s ‘Histoire des deux Indes’: colonialism, networks and global exchange (OSE, 2015), I am pleased to announce a new international research project aimed at further exploring Raynal’s monumental work and its impact on Enlightenment thought. Thanks to the generous support of the Consortium for the Study of the Premodern World at the University of Minnesota, the Centre for Digital Humanities Research at the Australian National University, Stanford University Libraries, and The ARTFL Project at the University of Chicago, we have recently completed the digitization and text encoding (in TEI-XML) of the three primary editions of the Histoire philosophique et politique des établissements et du commerce des Européens dans les deux Indes. These editions – the first edition of 1770, the second of 1774, and the third of 1780 – were those that Raynal himself oversaw during his lifetime.

Our digital editions are based on high quality PDFs provided by the BNF’s Gallica online library (1770 and 1780 editions) and the Bodleian’s Oxford Google Books Project (1774 edition). A preliminary search interface has been built using the ARTFL Project’s PhiloLogic software and can be accessed here: Raynal search form. Users can query one or all of the above editions, which represent the first publicly available full-text digital edition(s) of the Histoire des deux Indes. In the coming months we will release a new version of the database running on ARTFL’s state-of-the-art PhiloLogic4 system, along with a preliminary ‘intertextual interface’ that will aim to incorporate the text of the three separate editions into one reading interface.

Title page and frontispiece of the 1780 edition of Raynal’s Histoire des deux Indes (Gallica).

Diderot, Hornoy, and the 1780 edition

What is perhaps most exciting about these new digital resources is the inclusion of a unique 1780 edition of the Histoire des deux Indes recently made available by the BNF. Acquired at public auction in March 2015, this particular edition had been conserved since the late 18th century in the private library of Alexandre Marie Dompierre d’Hornoy (1742-1828). A lawyer at the Parlement de Paris and great-nephew of Voltaire – he in fact inherited Jean-Baptiste Pigalle’s infamous nude statue of Voltaire upon his great-uncle’s death – Hornoy corresponded with many of the philosophes, Diderot included. His copy of the Histoire contains pencil marks in the margins of some passages, an unremarkable fact, perhaps, were it not for a note written by Hornoy just above a three-page insert at the beginning of the first tome. The handwritten tables included in the insert list all the sections marked in pencil over the four volumes of text: ‘mourceaux qui sont de M. Diderot’, Hornoy writes, ‘marqués en crayon par Mme de Vandeul’. Madame de Vandeul was, of course, Diderot’s daughter.

Handwritten insert of the 1780 edition (Gallica)

The existence of such an annotated volume of the Histoire was posited in the 19th century, notably by Joseph Marie Quérard in his Supercheries littéraires dévoilées (5 vols., 1845-1856). Quérard claimed that there existed a copy of the 1780 edition on which Diderot himself had marked in pencil all the passages that belonged to him [1]. According to Quérard, this copy became the property of Madame de Vandeul shortly after Diderot’s death. Whether or not the copy acquired by the BNF is the same as that owned by Vandeul we cannot say for sure, but Herbert Dieckmann, in his inventory of the ‘fonds Vandeul’, also mentions the hypothetical existence of a copy of the in-4° edition (i.e. 1780) that was purportedly annotated by hand, but that had since been lost [2].

Some preliminary experiments

While consensus as to the validity of Hornoy’s assertion that the marked sections are in fact those authored by Diderot will most likely take years to accrue, we can begin, using the new digital edition, to ask some basic questions as to the authorship claims indicated in the text. Thanks to extensive markup in TEI-XML notation, sections purportedly belonging to Diderot are clearly indicated, and perhaps more importantly, can be extracted as one test corpus. Using some basic statistical measures drawn from authorship attribution studies, or Stylometry, we can begin to think about how the ‘Diderot’ sections may, or may not, differ stylistically – i.e. in terms of comparative word usage over the most common words, an established metric of ‘authorship’ in stylometry and forensic linguistics – from the rest of the text.
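The extraction step can be sketched as follows, under a loudly stated assumption: the post does not say how the pencil-marked passages are encoded, so the example pretends they are wrapped in a TEI seg element with type="diderot". That element choice, the attribute value, the function name split_by_marking, and the sample XML are all hypothetical.

```python
import xml.etree.ElementTree as ET

# The TEI namespace is standard; the <seg type="diderot"> convention is an
# assumption made for this example, not the project's documented encoding.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Premier passage du texte. <seg type="diderot">Passage marqué au crayon.</seg> Suite du texte.</p>
    <p><seg type="diderot">Autre passage marqué.</seg></p>
  </body></text>
</TEI>"""


def split_by_marking(xml_string, seg_type="diderot"):
    """Return (marked, unmarked) lists of text fragments from each paragraph."""
    root = ET.fromstring(xml_string)
    marked, unmarked = [], []
    for p in root.iter(TEI_NS + "p"):
        # Text before the first child element belongs to the unmarked corpus.
        if p.text and p.text.strip():
            unmarked.append(p.text.strip())
        for child in p:
            content = "".join(child.itertext()).strip()
            if child.tag == TEI_NS + "seg" and child.get("type") == seg_type:
                marked.append(content)
            elif content:
                unmarked.append(content)
            # Text following a child element is outside the marked span.
            if child.tail and child.tail.strip():
                unmarked.append(child.tail.strip())
    return marked, unmarked


marked, unmarked = split_by_marking(sample)
print(len(marked), "marked fragments;", len(unmarked), "unmarked fragments")
```

The two resulting lists correspond to the ‘Diderot’ test corpus and the remaining text respectively. A real edition would also need to handle marked spans that cross paragraph or page boundaries.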

Page from 1780 edition with ‘Diderot’ section marked in pencil (Gallica)

Working with the Centre for Literary and Linguistic Computing at the University of Newcastle (Australia), and in particular with their Intelligent Archive software for stylistic and statistical text analysis, we extracted the top 200 words for each ‘author’ (i.e. those drawn from sections putatively by Diderot, and the remaining ‘Raynal’ sections). As a result, we were left with 4 ‘Diderot’ tomes (containing all of the text marked in pencil) and 4 ‘Raynal’ tomes (containing the remainder), representing their unique word lists over the entire edition. For a first preliminary test, we ran a cluster analysis on the 8 tomes to see if they would cluster together or separately:

Cluster analysis of ‘Diderot’ tomes vs. ‘Raynal’ tomes, based on top 200 word lists
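The word-list step that feeds an analysis of this kind can be sketched in a few lines: take the most frequent words across the texts, then describe each text by the relative frequency of those words. This is an illustrative reconstruction, not the Intelligent Archive’s own procedure; the function names (top_words, profile) and the toy sentences are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def top_words(texts, n=200):
    """Return the n most frequent words across all the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def profile(text, vocabulary):
    """Relative frequency of each vocabulary word in one text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on an empty text
    return [counts[word] / total for word in vocabulary]


# Toy texts standing in for the tomes (invented for the example).
texts = {
    "A": "le roi et la reine et le peuple",
    "B": "la loi et la nature",
}
vocab = top_words(texts.values(), n=3)
for name, text in texts.items():
    print(name, profile(text, vocab))
```

Using relative rather than raw counts keeps texts of different lengths comparable, which matters when one corpus is much smaller than the other.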

Cluster analysis works by grouping (or clustering) the most similar texts first and the most distinct last, in this case into 2 branches. A division like the one above, clearly separated into two distinct ‘trees’, is a very clear indication that the texts in each of the two branches are highly likely to be those of two different authors.
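A minimal sketch of this bottom-up grouping is given below. The post does not specify the distance measure or linkage rule used, so Euclidean distance with average linkage is an assumption here, as are the function names and the invented frequency profiles.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b, vectors):
    """Mean Euclidean distance between all cross-cluster pairs of texts."""
    total = sum(dist(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(vectors, k=2):
    """Merge the two closest clusters repeatedly until only k remain."""
    clusters = [[name] for name in vectors]
    while len(clusters) > k:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], vectors),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(sorted(a + b))
    return sorted(clusters)


# Invented word-frequency profiles for four hypothetical text sections.
profiles = {
    "D1": [0.050, 0.031, 0.012],
    "D2": [0.052, 0.030, 0.011],
    "R1": [0.040, 0.022, 0.025],
    "R2": [0.041, 0.020, 0.027],
}
print(agglomerate(profiles, k=2))
```

With these toy numbers the most similar pairs merge first, leaving one branch per hypothetical author; real data is rarely this tidy.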

Principal component analysis (PCA) provides another method of examining our corpora. PCA is a procedure for deriving a smaller number of uncorrelated variables, called ‘principal components’, from a large set of data. The goal of PCA is to explain the maximum amount of variance with the fewest principal components. In our case, it is a technique that allows for the first two principal components of our two sets of texts, i.e. their word variance, to be plotted on a bi-axial or two-dimensional graph. One of these plots (using the 100 most frequent words of the full text), with both text corpora divided into 10,000 word blocks, is shown below.

Principal component analysis using 10,000 word blocks and 100 most frequent words

The disparity in size of our two test corpora meant that while there were 68 text sections for Raynal (in green), there were only 14 for Diderot (in blue). Nonetheless, the separation between the two authorial sets is almost complete, with just two of the Diderot sections located in the outer fringes of the Raynal set. Since the word variables underlying this plot were the 100 most frequent words of the whole text, this is a convincing stylistic division, one that suggests a strong distinction in terms of authorship signal between the two sets.
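For readers curious about the mechanics, the following Python sketch projects a handful of text blocks onto their first two principal components. It is an illustration under stated assumptions, not the software used for the plots above: the components are found by power iteration with deflation in plain Python (a library routine would normally be used instead), and the six rows of numbers are invented toy values, not project data.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    # Subtract each column's mean so the data are centred on the origin.
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    cols = list(zip(*centered))
    n = len(centered)
    return [[dot(ci, cj) / (n - 1) for cj in cols] for ci in cols]


def top_eigenvector(matrix, iterations=200):
    """Power iteration: dominant eigenvector/eigenvalue of a symmetric matrix."""
    vec = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary start
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm < 1e-12:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the matching eigenvalue.
    value = dot(vec, [dot(row, vec) for row in matrix]) / dot(vec, vec)
    return vec, value


def pca_2d(rows):
    """Return (PC1, PC2) scores for each row."""
    centered = center(rows)
    cov = covariance(centered)
    pc1, val1 = top_eigenvector(cov)
    size = len(cov)
    # Deflation: remove the first component, then find the next one.
    deflated = [[cov[i][j] - val1 * pc1[i] * pc1[j] for j in range(size)]
                for i in range(size)]
    pc2, _ = top_eigenvector(deflated)
    return [(dot(row, pc1), dot(row, pc2)) for row in centered]


# Toy word-frequency rows: three 'Diderot' blocks, then three 'Raynal' blocks.
blocks = [
    [5.0, 1.0, 2.0], [5.2, 0.8, 2.1], [4.8, 1.2, 1.9],
    [1.0, 5.0, 2.0], [1.2, 4.8, 2.2], [0.8, 5.2, 1.8],
]
scores = pca_2d(blocks)
for pc1_score, pc2_score in scores:
    print(round(pc1_score, 2), round(pc2_score, 2))
```

In this toy case the two groups fall on opposite sides of the first component. The sign of a component is arbitrary, so only the relative positions of the points carry meaning.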

To account for the size discrepancy between the two corpora, we ran another PCA test, this time increasing the number of Diderot sections by segmenting his text into 5,000-word blocks and running these against the previous 10,000-word Raynal sections. This plot is shown below:

Principal component analysis on 5,000 word blocks (Diderot) and Raynal, using 100 most frequent words

Here we see the same sort of authorial/stylistic separation as we saw above, but this time (with the Diderot sections halved in size) the distinction is even stronger, as there is only one section located within the Raynal set of entries, indicating an even greater likelihood that the sections marked in pencil were written by a different author than the rest of the 1780 edition.
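The re-segmentation step itself is simple to sketch. The Python function below cuts a token list into fixed-size blocks; dropping a short final block is an assumption made here for illustration (how leftover text was handled in the tests above is not stated), and the 23-token list is a stand-in, not real data.

```python
def split_into_blocks(tokens, block_size, keep_remainder=False):
    """Cut a token list into consecutive blocks of block_size tokens."""
    blocks = [tokens[i:i + block_size]
              for i in range(0, len(tokens), block_size)]
    # Assumption: a short final block is dropped unless asked for.
    if blocks and len(blocks[-1]) < block_size and not keep_remainder:
        blocks.pop()
    return blocks


tokens = ["mot"] * 23  # stand-in for a tokenized corpus
print(len(split_into_blocks(tokens, 10)))  # larger blocks, fewer of them
print(len(split_into_blocks(tokens, 5)))   # halving the size about doubles the count
```

Halving the block size roughly doubles the number of data points for the smaller corpus, at the cost of noisier word frequencies within each block.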

These are obviously very rudimentary experiments, but they nonetheless indicate several promising future avenues of exploration. Moving forward, we intend to apply a full suite of computational and stylistic approaches to the 1780 edition and its predecessors, including sequence alignment tools developed by ARTFL, text collation software, and the MEDITE system developed by the labex OBVIL at the Sorbonne for computational genetic criticism. All of these approaches will allow us to explore the textual evolution of the Histoire from 1770 to 1780 in an unprecedented manner, as well as its relationship to other Enlightenment texts and text collections such as Electronic Enlightenment, TOUT Voltaire, and the Encyclopédie.

*I would especially like to thank Alexis Antonia and the Centre for Literary and Linguistic Computing at Newcastle for their generous help with the above stylistic analyses.

[1] See Michèle Duchet, Diderot et l’Histoire des deux Indes ou l’écriture fragmentaire, Paris, Nizet, 1978, p. 22.

[2] Herbert Dieckmann, Inventaire du fonds Vandeul et inédits de Diderot, Genève, Droz, 1951.

ViTA: Visualization for Text Alignment

As a preliminary outcome of our Commonplace Cultures Digging into Data project, we have developed a system called ViTA: Visualization for Text Alignment. Hosted by the Oxford e-Research Centre at the University of Oxford, ViTA is a web-based visual analytics interface that enables domain experts to construct a text alignment pipeline, visualize the components and connections for any given method (i.e., an alignment model) using image processing techniques, and then test assumptions about the corresponding inputs and outputs. Rather than visualizing the alignment results in a post hoc manner, as is often the case with available alignment packages, ViTA’s interactive pipeline editing facility essentially becomes a visual programming interface from which users can iteratively build and export more efficient text alignment methods.

ViTA Editor panel

Screen shot of a ViTA text alignment

We are hoping to use the ViTA interface to refine our existing PhiloLine-PAIR alignment algorithms, with the goal of identifying ‘commonplaces’ and other forms of large-scale text reuse in the Gale-Cengage Eighteenth Century Collections Online (ECCO) database. A classic ‘big data’ humanities collection, ECCO currently contains more than 32 million digitized pages from 182,898 titles in 205,639 volumes.

Digging into Data

I am very pleased to be one of the co-investigators for a winning project in the third round of the Digging into Data Challenge, an international grant scheme that brings together teams working in computer science and the humanities in the US, Canada, the UK, and the Netherlands. Our project, “Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics”, aims to explore 18th-century literary culture through the lens of the early modern practice of commonplacing. Building on previous work on data mining and automatic classification of Enlightenment texts, machine learning approaches to textual borrowings and source criticism in the 18th century, sequence alignment techniques for identifying intertextuality, and citation practices in the Encyclopédie, we plan to use these same approaches to examine commonplaces and to visualise their deployment over the largest collection of 18th-century works ever assembled.

This project is a partnership between the ARTFL Project and Computation Institute (CI) at the University of Chicago and the University of Oxford’s e-Research Centre (OeRC) and Voltaire Foundation (VF). Bringing together world-class centres for Enlightenment studies (ARTFL, VF) and multi-disciplinary computing applications (CI, OeRC), the team consists of 18th-century scholars: Robert Morrissey (PI, Chicago) and Nicholas Cronk (Co-I, Oxford); computer scientists: Min Chen (PI, Oxford) and Ian Foster (Co-I, Chicago); and digital humanists: Mark Olsen (Chicago), and me (ANU), among other participants.

See the new Project Website for more updates.

TOUT Voltaire…

The Voltaire Foundation, in collaboration with the ARTFL Project, is pleased to announce the public release of the TOUT VOLTAIRE online database. This database brings you in fully searchable form all of Voltaire’s works apart from his correspondence (which can be searched separately, in Electronic Enlightenment).

Currently publishing the Complete works of Voltaire in print, the Voltaire Foundation plans to unveil an online version of this definitive critical edition sometime after 2018. In the meantime, this plain text version of Voltaire’s writings (without critical apparatus or notes) is the most reliable version available anywhere on the web.

The various editions used to establish this database are clearly marked: from the Voltaire Foundation’s own Complete works of Voltaire to nineteenth-century editions by Beuchot and Moland, among others. When possible we have included Voltaire’s notes, as well as some textual variants depending on the edition. Pagination, however, is often not representative of the print editions, so if you wish to cite Voltaire for scholarly purposes, you should always consult the list of the best critical editions currently available.

The TOUT VOLTAIRE database is built using ARTFL’s full-text search and retrieval engine PhiloLogic, one of the oldest and most successful text analysis systems in the digital humanities. With a wide variety of search and reporting functions, users can look for words, groups of words, or phrases over Voltaire’s entire corpus, or in individual works (and even parts of works). Results can be displayed in context, as frequency reports (by title, by decade, etc.), or as a collocation table and word cloud.

Example searches could include:

For more search tips, please visit the PhiloLogic user manual.

This research tool is made available free of charge by the Voltaire Foundation (University of Oxford) and the ARTFL Project (University of Chicago). If you wish to make a contribution to our work, please contact the Voltaire Foundation.