Digitizing Raynal

A collaborative digital research project

On the heels of Cecil Courtney and Jenny Mander’s recent publication, Raynal’s ‘Histoire des deux Indes’: colonialism, networks and global exchange (OSE, 2015), I am pleased to announce a new international research project aimed at further exploring Raynal’s monumental work and its impact on Enlightenment thought. Thanks to the generous support of the Consortium for the Study of the Premodern World at the University of Minnesota, the Centre for Digital Humanities Research at the Australian National University, Stanford University Libraries, and The ARTFL Project at the University of Chicago, we have recently completed the digitization and text encoding (in TEI-XML) of the three primary editions of the Histoire philosophique et politique des établissements et du commerce des Européens dans les deux Indes. These editions (the first edition of 1770, the second of 1774, and the third of 1780) were those that Raynal himself oversaw during his lifetime.

Our digital editions are based on high-quality PDFs provided by the BNF’s Gallica online library (1770 and 1780 editions) and the Bodleian’s Oxford Google Books Project (1774 edition). A preliminary search interface has been built using the ARTFL Project’s PhiloLogic software and can be accessed here: Raynal search form. Users can query one or all of the above editions, which together constitute the first publicly available full-text digital edition(s) of the Histoire des deux Indes. In the coming months we will release a new version of the database running on ARTFL’s state-of-the-art PhiloLogic4 system, along with a preliminary ‘intertextual interface’ that will aim to bring the text of the three separate editions together in a single reading interface.

Title page and frontispiece of the 1780 edition of Raynal’s Histoire des deux Indes (Gallica).

Diderot, Hornoy, and the 1780 edition

What is perhaps most exciting about these new digital resources is the inclusion of a unique copy of the 1780 edition of the Histoire des deux Indes recently made available by the BNF. Acquired at public auction in March 2015, this particular copy had been kept since the late 18th century in the private library of Alexandre Marie Dompierre d’Hornoy (1742-1828). A lawyer at the Parlement de Paris and great-nephew of Voltaire (he in fact inherited Jean-Baptiste Pigalle’s infamous nude statue of Voltaire upon his great-uncle’s death), Hornoy corresponded with many of the philosophes, Diderot included. His copy of the Histoire contains pencil marks in the margins of some passages, an unremarkable fact, perhaps, were it not for a note written by Hornoy just above a three-page insert at the beginning of the first volume. The handwritten tables in the insert list all the sections marked in pencil across the four volumes of text: ‘morceaux qui sont de M. Diderot’ [passages that are by M. Diderot], Hornoy writes, ‘marqués en crayon par Mme de Vandeul’ [marked in pencil by Mme de Vandeul]. Madame de Vandeul was, of course, Diderot’s daughter.

Handwritten insert of the 1780 edition (Gallica)

The existence of such an annotated copy of the Histoire was posited in the 19th century, notably by Joseph Marie Quérard in his Supercheries littéraires dévoilées (5 vols., 1845-1856). Quérard claimed that there existed a copy of the 1780 edition in which Diderot himself had marked in pencil all the passages that belonged to him [1]. According to Quérard, this copy became the property of Madame de Vandeul shortly after Diderot’s death. We cannot say for sure whether the copy acquired by the BNF is the one owned by Vandeul, but Herbert Dieckmann, in his inventory of the ‘fonds Vandeul’, also mentions the hypothetical existence of a copy of the in-4° edition (i.e. 1780) that was purportedly annotated by hand but had since been lost [2].

Some preliminary experiments

While consensus as to the validity of Hornoy’s assertion that the marked sections were in fact authored by Diderot will most likely take years to accrue, we can already use the new digital edition to ask some basic questions about the authorship claims indicated in the text. Thanks to extensive markup in TEI-XML, sections purportedly belonging to Diderot are clearly indicated and, perhaps more importantly, can be extracted as a single test corpus. Using some basic statistical measures drawn from authorship attribution studies, or stylometry, we can begin to think about how the ‘Diderot’ sections may, or may not, differ stylistically from the rest of the text, i.e. in terms of comparative word usage over the most common words, an established metric of ‘authorship’ in stylometry and forensic linguistics.
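To make ‘comparative word usage over the most common words’ concrete, here is a minimal Python sketch of how such a word-frequency profile might be built. It is not the Intelligent Archive’s implementation: the regular-expression tokenizer, the lowercasing, and the helper names `tokenize`, `most_frequent_words` and `relative_frequencies` are all assumptions made for illustration, and the two snippets in the usage lines are invented stand-ins for the real corpora.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens.

    Assumption: any run of Unicode word characters counts as a word, so
    elided forms such as "l'homme" become "l" and "homme".
    """
    return re.findall(r"\w+", text.lower())


def most_frequent_words(texts, n):
    """Return the n most frequent words across all texts combined."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word within a single text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on an empty text
    return [counts[word] / total for word in vocabulary]


# Hypothetical usage with two tiny invented "corpora".
diderot = "la nature et la liberté de l'homme"
raynal = "le commerce des nations et le commerce des Indes"
vocab = most_frequent_words([diderot, raynal], 5)
profile = relative_frequencies(diderot, vocab)
```

Each text then becomes a vector of relative frequencies over a shared list of common words, which is the kind of input the cluster and principal component analyses below operate on.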

Page from 1780 edition with ‘Diderot’ section marked in pencil (Gallica)

Working with the Centre for Literary and Linguistic Computing at the University of Newcastle (Australia), and in particular with their Intelligent Archive software for stylistic and statistical text analysis, we extracted the top 200 words for each ‘author’ (i.e. from the sections putatively by Diderot and from the remaining ‘Raynal’ sections). This left us with 4 ‘Diderot’ tomes (containing all of the text marked in pencil) and 4 ‘Raynal’ tomes (containing the remainder), each represented by its own word list across the entire edition. For a first preliminary test, we ran a cluster analysis on these 8 tomes to see whether they would cluster together or separately:

Cluster analysis of ‘Diderot’ tomes vs. ‘Raynal’ tomes, based on top 200 word lists

Cluster analysis of this kind works by grouping the most similar texts first and joining the most distinct ones last, in this case producing two main branches. A division like the one above, cleanly separated into two distinct ‘trees’, is a strong indication that the texts in each of the two branches are likely the work of two different authors.
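The grouping logic behind such a tree can be illustrated with a small agglomerative clustering routine. This is a sketch under stated assumptions, not the procedure used by the Intelligent Archive: it z-scores each word’s frequency across the texts, measures Euclidean distance, merges clusters by average linkage, and stops when two clusters remain (the two top-level branches). The function names, tome labels and frequency values are invented for illustration.

```python
import math
from itertools import combinations


def zscore_columns(rows):
    """Standardize each word column to mean 0 and standard deviation 1."""
    n = len(rows)
    standardized = []
    for column in zip(*rows):
        mean = sum(column) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n) or 1.0
        standardized.append([(x - mean) / sd for x in column])
    return [list(row) for row in zip(*standardized)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(profiles, k=2):
    """Average-linkage agglomerative clustering down to k clusters.

    `profiles` maps a text name to its list of word frequencies.
    """
    names = list(profiles)
    vectors = zscore_columns([profiles[name] for name in names])
    clusters = {i: [i] for i in range(len(names))}
    next_id = len(names)
    while len(clusters) > k:
        # Find the pair of clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(clusters, 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(euclidean(vectors[i], vectors[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters under a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return sorted(sorted(names[i] for i in members) for members in clusters.values())


# Invented frequencies for two words in four "tomes".
tomes = {
    "Diderot 1": [0.050, 0.010],
    "Diderot 2": [0.051, 0.011],
    "Raynal 1": [0.020, 0.040],
    "Raynal 2": [0.021, 0.041],
}
branches = agglomerate(tomes)
```

Recording the distance at each merge, rather than keeping only the final k groups, would give the full dendrogram; the sketch stops at the top-level split because that is the division discussed above.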

Principal component analysis (PCA) provides another method of examining our corpora. PCA is a procedure for deriving a smaller number of uncorrelated variables, called ‘principal components’, from a large set of correlated ones. Its goal is to explain the maximum amount of variance with the fewest principal components. In our case, each text block is described by the frequencies of its most common words, and PCA allows every block to be plotted on a two-dimensional graph according to its scores on the first two principal components, i.e. the two directions of greatest variance in word usage. One of these plots (using the 100 most frequent words of the full text), with both text corpora divided into 10,000-word blocks, is shown below.

Principal component analysis using 10,000-word blocks and the 100 most frequent words

The disparity in size of our two test corpora meant that while there were 68 text sections for Raynal (in green), there were only 14 for Diderot (in blue). Nonetheless, the separation between the two authorial sets is almost complete, with just two of the Diderot sections located in the outer fringes of the Raynal set. Since the word variables underlying this plot were the 100 most frequent words of the whole text, this is a convincing stylistic division, one that suggests a strong distinction in terms of authorship signal between the two sets.
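For readers curious about what a PCA plot of this kind computes, the sketch below projects each text block onto the first two principal components using only the Python standard library. It is an illustration under assumptions, not the software we used: it centres the word frequencies and works from their covariance matrix (many stylometric studies standardize first, i.e. use the correlation matrix), and it finds the components by power iteration, which is adequate for a small demonstration but slower and less robust than a library eigen-solver. The function names and the sample values are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(rows):
    """Subtract each column's mean so the cloud of points sits on the origin."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(X):
    """Sample covariance matrix of already-centred rows (needs 2+ rows)."""
    n, p = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(C, exclude=(), iterations=1000, seed=0):
    """Power iteration on C, kept orthogonal to the unit vectors in `exclude`."""
    def orthogonalize(v):
        for u in exclude:
            c = dot(v, u)
            v = [x - c * y for x, y in zip(v, u)]
        return v

    rng = random.Random(seed)
    v = orthogonalize([rng.uniform(-1.0, 1.0) for _ in C])
    norm = math.sqrt(dot(v, v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = orthogonalize([dot(row, v) for row in C])
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # no variance left in the remaining directions
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Scores of each row (e.g. one text block's word frequencies) on PC1 and PC2."""
    X = center(rows)
    C = covariance(X)
    pc1 = leading_eigenvector(C)
    pc2 = leading_eigenvector(C, exclude=[pc1])
    return [(dot(x, pc1), dot(x, pc2)) for x in X]


# Invented frequencies of three words in four text blocks.
blocks = [[1.0, 0.0, 0.0], [1.1, 0.0, 0.1], [0.0, 1.0, 0.0], [0.1, 1.1, 0.0]]
for x, y in pca_2d(blocks):
    print(f"{x:+.3f} {y:+.3f}")
```

The sign of each axis is arbitrary in PCA, so only the relative positions of the points matter: blocks that end up on opposite sides of the first component differ most in their use of the common words.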

To account for the size discrepancy between the two corpora, we ran another PCA test, but this time we increased the number of Diderot sections by segmenting his text into 5,000-word blocks and running these against the previous 10,000-word Raynal sections. This plot is shown below:

Principal component analysis on 5,000-word blocks (Diderot) and 10,000-word blocks (Raynal), using the 100 most frequent words

Here we see the same sort of authorial/stylistic separation as we saw above, but this time (with the Diderot sections halved in size) the distinction is even stronger, as there is only one section located within the Raynal set of entries, indicating an even greater likelihood that the sections marked in pencil were written by a different author than the rest of the 1780 edition.
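The re-segmentation used in this second test amounts to cutting each corpus into consecutive fixed-length word blocks. A minimal sketch, assuming regex tokenization and assuming that a trailing block shorter than the target size is simply discarded (one could equally merge it into the previous block); neither choice is documented for our actual runs, and the function names and sample text are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def word_blocks(tokens, size):
    """Cut a token list into consecutive, non-overlapping blocks of `size` words.

    Any leftover words at the end that do not fill a whole block are dropped.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


# Hypothetical usage on an invented 30,000-word sample: halving the block
# size doubles the number of samples drawn from the same corpus.
sample_tokens = tokenize("le " * 30_000)
print(len(word_blocks(sample_tokens, 10_000)), len(word_blocks(sample_tokens, 5_000)))  # prints: 3 6
```

Smaller blocks give more data points for the smaller corpus, at the cost of noisier frequency estimates within each block.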

These are obviously very rudimentary experiments, but they nonetheless indicate several promising future avenues of exploration. Moving forward, we intend to apply a full suite of computational and stylistic approaches to the 1780 edition and its predecessors, including sequence alignment tools developed by ARTFL, text collation software, and the MEDITE system developed by the Labex OBVIL at the Sorbonne for computational genetic criticism. All of these approaches will allow us to explore the textual evolution of the Histoire from 1770 to 1780 in an unprecedented manner, as well as its relationship to other Enlightenment texts and text collections such as Electronic Enlightenment, TOUT Voltaire, and the Encyclopédie.

*I would especially like to thank Alexis Antonia and the Centre for Literary and Linguistic Computing at Newcastle for their generous help with the above stylistic analyses.

[1] See Michèle Duchet, Diderot et l’Histoire des deux Indes ou l’écriture fragmentaire, Paris, Nizet, 1978, p. 22.

[2] Herbert Dieckmann, Inventaire du fonds Vandeul et inédits de Diderot, Genève, Droz, 1951.

ViTA: Visualization for Text Alignment

As a preliminary outcome of our Commonplace Cultures Digging into Data project, we have developed a web-based visual analytics system called ViTA: Visualization for Text Alignment. Hosted by the Oxford e-Research Centre at the University of Oxford, ViTA is an interface that enables domain experts to construct a text alignment pipeline, to visualize the components and connections of any given method (i.e., an alignment model) using image processing techniques, and then to test assumptions about the corresponding inputs and outputs. Rather than visualizing alignment results post hoc, as is often the case with available alignment packages, ViTA’s interactive pipeline editor essentially becomes a visual programming interface with which users can iteratively build and export more efficient text alignment methods.

ViTA Editor panel

Screen shot of a ViTA text alignment

We are hoping to use the ViTA interface to refine our existing PhiloLine-PAIR alignment algorithms, with the goal of identifying ‘commonplaces’ and other forms of large-scale text reuse in the Gale-Cengage Eighteenth Century Collections Online (ECCO) database. A classic ‘big data’ humanities collection, ECCO currently contains more than 32 million digitized pages from 182,898 titles in 205,639 volumes.
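PhiloLine-PAIR’s own matching procedure is not described in this post, but the basic idea of flagging shared passages can be sketched with a generic technique: compare the word n-grams (‘shingles’) of two texts and merge runs of overlapping matches into passages. The sketch below is exactly that generic approach, not ARTFL’s algorithm; the trigram size, the minimum run length, the function names and the two example snippets are all assumptions.

```python
import re


def tokenize(text):
    """Lowercase and split into word tokens (simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """All consecutive n-word sequences ('shingles') in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def shared_passages(source, target, n=3, min_ngrams=2):
    """Passages of `target` built from runs of n-grams that also occur in `source`.

    Overlapping or adjacent matching n-grams are merged into a single passage;
    passages made of fewer than `min_ngrams` matches are discarded as noise.
    """
    source_grams = set(ngrams(tokenize(source), n))
    tokens = tokenize(target)
    spans = []  # each entry: [start index, end index (exclusive), match count]
    for i, gram in enumerate(ngrams(tokens, n)):
        if gram not in source_grams:
            continue
        if spans and i <= spans[-1][1]:
            spans[-1][1] = i + n  # extend the current passage
            spans[-1][2] += 1
        else:
            spans.append([i, i + n, 1])  # start a new passage
    return [" ".join(tokens[start:end]) for start, end, count in spans if count >= min_ngrams]


# Hypothetical usage on two invented snippets.
source = "Il faut cultiver notre jardin, dit Candide."
target = "Comme on sait, il faut cultiver notre jardin, voilà tout."
print(shared_passages(source, target))  # prints ['il faut cultiver notre jardin']
```

At the scale of ECCO, exact set lookups like this would need indexing and some tolerance for spelling variation and OCR errors, which is where an interactive tool for tuning the pipeline becomes useful.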

The Passion of Charles Péguy

I am delighted to announce that my first book, The Passion of Charles Péguy: Literature, Modernity, and the Crisis of Historicism is now available from Oxford University Press. Those of you in Australia & New Zealand can purchase the book at a 30% discount using this promotional flyer.
Summary: In many ways, the development of twentieth-century literary criticism and theory can be seen as a prolonged struggle against the pervading influence of nineteenth-century positivist historicism. Anglo-American New Criticism and later French Post-structuralism and Deconstruction are the best-known instances of this conflict. Less widely known, but no less important to contemporary literary studies, are Charles Péguy’s earlier debates with French academic historicism in the years leading up to World War One. First examined by Antoine Compagnon in his ground-breaking La Troisième République des lettres (1983), this period in French literary and cultural history remains, some thirty years later, largely untreated in English. This book thus addresses an important, albeit relatively unexplored, moment in the development of twentieth-century literary history and theory. By way of Péguy’s foundational polemics with modernity and his role in the related ‘crisis of historicism’, we gain a better understanding of the critical basis from which similar anti-positivist and anti-historicist critiques were later enacted on both sides of the Atlantic. In situating Péguy’s passions and polemics within the larger cultural and historical context, Glenn H. Roe invites us to reconsider and re-evaluate Péguy’s place among twentieth-century literary figures. Beyond its literary-critical aspects, The Passion of Charles Péguy provides a general view of early twentieth-century debates related to the role of literary studies in modern society, the reform of the French educational system, and the formation of literary history as an academic discipline both in France and abroad.

Digging into Data

I am very pleased to be one of the co-investigators for a winning project in the third round of the Digging into Data Challenge, an international grant scheme that brings together teams working in computer science and the humanities in the US, Canada, UK, and Netherlands. Our project, “Commonplace Cultures: Mining Shared Passages in the 18th Century using Sequence Alignment and Visual Analytics”, aims to explore 18th-century literary culture through the lens of the early modern practice of commonplacing. Leveraging previous work on data mining and automatic classification of Enlightenment texts, machine learning approaches to textual borrowings and source criticism in the 18th century, and sequence alignment techniques for identifying intertextuality and citation practices in the Encyclopédie, we plan to apply these same approaches to examining commonplaces and visualising their deployment over the largest collection of 18th-century works ever assembled.

This project is a partnership between the ARTFL Project and Computation Institute (CI) at the University of Chicago and the University of Oxford’s e-Research Centre (OeRC) and Voltaire Foundation (VF). Bringing together world-class centres for Enlightenment studies (ARTFL, VF) and multi-disciplinary computing applications (CI, OeRC), the team consists of 18th-century scholars: Robert Morrissey (PI, Chicago) and Nicholas Cronk (Co-I, Oxford); computer scientists: Min Chen (PI, Oxford) and Ian Foster (Co-I, Chicago); and digital humanists: Mark Olsen (Chicago) and me (ANU), among other participants.

See the new Project Website for more updates.

TOUT Voltaire…

The Voltaire Foundation, in collaboration with the ARTFL Project, is pleased to announce the public release of the TOUT VOLTAIRE online database. This database brings you in fully searchable form all of Voltaire’s works apart from his correspondence (which can be searched separately, in Electronic Enlightenment).

Currently publishing the Complete works of Voltaire in print, the Voltaire Foundation plans to unveil an online version of this definitive critical edition sometime after 2018. In the meantime, this plain text version of Voltaire’s writings (without critical apparatus or notes) is the most reliable version available anywhere on the web.

The various editions used to establish this database are clearly marked: from the Voltaire Foundation’s own Complete works of Voltaire to nineteenth-century editions by Beuchot and Moland, among others. When possible we have included Voltaire’s notes, as well as some textual variants depending on the edition. Pagination, however, is often not representative of the print editions, so if you wish to cite Voltaire for scholarly purposes, you should always consult the list of the best critical editions currently available.

The TOUT VOLTAIRE database is built using ARTFL’s full-text search and retrieval engine PhiloLogic, one of the oldest and most successful text analysis systems in the digital humanities. With a wide variety of search and reporting functions, users can look for words, groups of words, or phrases over Voltaire’s entire corpus, or in individual works (and even parts of works). Results can be displayed in context, as frequency reports (by title, by decade, etc.), or as a collocation table and word cloud.
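As a rough illustration of what a collocation table counts, the sketch below tallies the words that occur within a fixed window on either side of a search term. It is a generic co-occurrence count, not PhiloLogic’s implementation; the window size, the stopword handling, the function name `collocates` and the sample sentence are assumptions.

```python
import re
from collections import Counter


def collocates(text, keyword, window=5, stopwords=frozenset()):
    """Count words occurring within `window` tokens on either side of `keyword`.

    `keyword` is matched against lowercased tokens, so pass it in lowercase.
    """
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != keyword:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] not in stopwords:
                counts[tokens[j]] += 1
    return counts


# Hypothetical usage on an invented sentence.
table = collocates("la raison et la liberté et la raison", "raison", window=1)
print(table.most_common())
```

In practice, excluding very common function words via `stopwords` keeps the table focused on the more meaningful collocates.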

For more search tips, please visit the PhiloLogic user manual.

This research tool is made available free of charge by the Voltaire Foundation (University of Oxford) and the ARTFL Project (University of Chicago). If you wish to make a contribution to our work, please contact the Voltaire Foundation.

Australian Society for French Studies

21st Annual conference of the Australian Society for French Studies

9–11 December 2013
University of Queensland
Brisbane, Australia

Distance/proximité:

Keynote speakers:

  • Professor Marc Augé
    The anthropological gaze and fieldwork of Marc Augé have focussed on societies from the Ivory Coast to Paris. The celebrated author of Non-Places: Introduction to an Anthropology of Supermodernity, In the Metro and Oblivion, Augé coined the term “non-places” to designate ambivalent transit spaces (airport lounges, hotel rooms, supermarkets) that do not inspire feelings of belonging or lasting social relations among the majority of those who pass through.
     
  • Dr Charlotte Dejean-Thircuir of Université Stendhal – Grenoble 3 is an expert in the fields of teaching French as a foreign language (FLE) and distance education. She is the director of Stendhal’s two FLE programmes taught in distance mode. She has researched and published on online student-tutor interaction, online learner communities and peer-guided online learning.

  • Emeritus Professor Peter Cryle, founding director of the Centre for the History of European Discourses at the University of Queensland, is a scholar of intellectual and cultural history. His current work focuses on the historical emergence of the idea of the normal in nineteenth-century European thinking, especially in France and Italy. This research is focussed on medical and anthropological texts, and is funded by an ARC grant shared with Elizabeth Stephens. He also has a strong interest in French fiction, including middle-brow fiction of the nineteenth century and libertine literature of the eighteenth.