The ARTFL Project

By Sebastian Hierl

WESS Newsletter
Spring 2005
Vol. 28, no. 2


ARTFL, or the Project for American and French Research on the Treasury of the French Language, is a cooperative enterprise of Analyse et Traitement Informatique de la Langue Française (ATILF) of the Centre National de la Recherche Scientifique (CNRS) and the Division of the Humanities at the University of Chicago, with the support of the University of Chicago Library.

The history of ARTFL began in 1957, when the French government initiated the creation of the Trésor de la Langue Française. In order to provide access to a large body of word samples, it was decided to create an extensive corpus of digitized texts. In 1981, ARTFL was established by the CNRS and the University of Chicago to provide access to FRANTEXT, a corpus representing a broad range of written French – from novels and poetry to biology and mathematics – stretching from the 17th to the 20th centuries.

Over the years, the goals of ARTFL have remained the same:

ARTFL Subscription Databases

An ARTFL subscription provides access to a number of databases and projects. First and foremost is FRANTEXT. Informally known as the “main ARTFL database,” FRANTEXT provides access to nearly 2,000 standard scholarly editions, ranging from classic works of French literature to various kinds of non-fiction prose and technical writing. The 18th, 19th, and 20th centuries are about equally represented, with a smaller selection of 17th-century texts as well as some medieval and Renaissance texts. Genres include novels, verse, theater, journalism, essays, correspondence, and treatises. Subjects include literary criticism, biology, history, economics, and philosophy. Both the CNRS and the University of Chicago are committed to the future growth of the ARTFL Project, which includes the continuous expansion of FRANTEXT (as with Encore: New Additions to the Main ARTFL Database).

To complement FRANTEXT, ARTFL has made available to subscribers the digital version of one of the greatest achievements of the Enlightenment, the Encyclopédie. The project provides full text access to the 72,000 articles published between 1751 and 1772 and continues to expand, most recently with the addition of the four volumes of Robinet’s Supplément à l'Encyclopédie. While the full text for both projects has been made available for immediate access, ARTFL is continuously improving the data capture by eliminating typographical errors that occurred during digitization.

Another important addition to the ARTFL Project is Dictionnaires d’autrefois, which provides full text access to Jean Nicot's Thresor de la langue française (1606; in conjunction with the University of Toronto), to Jean-François Féraud's Dictionaire critique de la langue française (Marseille, Mossy 1787-1788; in conjunction with GEHLF at the École Normale Supérieure), and to the 1st (1694), 4th (1762), 5th (1798), 6th (1835), and 8th (1932-5) editions of the Dictionnaire de L'Académie française (in conjunction with the University of Toronto and Éditions Champion). The full text of the third edition (1552) of Robert Estienne's Dictionarium latinogallicum has also recently been added (in cooperation with the University of Toronto), and a project to capture the full text of the 1740 edition of Pierre Bayle’s Dictionnaire historique et critique is currently underway. Other titles under consideration for full text capture, for which ARTFL would like to collaborate with interested institutions, are the 20th and last edition (1759) of Louis Moréri's Grand dictionnaire historique, Pierre Larousse’s Grand dictionnaire universel du XIXe siècle, and the Dictionnaire de la conversation et de la lecture.

Also included in the ARTFL subscription are the French Women Writers project (currently 99 texts); Provençal Poetry, developed by the University of Minnesota and the ARTFL Project (38 texts); the Textes de Français Ancien, developed by the University of Ottawa and the ARTFL Project (103 texts); and – expanding the subscription to the Italian language – the Opera del Vocabolario Italiano.

The ARTFL subscription price has remained the same over the years: $500 for Ph.D. granting institutions and $250 for other universities and colleges. Access is unlimited and controlled by IP recognition, and there are no maintenance fees.

Additional Access Provided to ARTFL subscribers

Beyond the subscription databases, ARTFL provides full text access by IP recognition to the following databases, some of which are otherwise available only on CD-ROM. To receive access to these, institutions must both be ARTFL subscribers and have purchased access to the individual databases/CD-ROMs directly from the publisher. These include Voltaire électronique, based upon the Voltaire Foundation Oxford edition of the Complete Works of Voltaire; B.A.S.I.L.E.: Le Corpus de la littérature narrative du Moyen Age au XXe siècle: Romans, Contes, Nouvelles, by the Éditions Honoré Champion; the Teatro Español del Siglo de Oro, published by Chadwyck-Healey; and Art Theorists of the Italian Renaissance, also by Chadwyck-Healey. Furthermore, ARTFL has been providing full text access to the Bibliothèque des lettres, formerly published by the Editions Bibliopolis.

Collaborations

ARTFL provides access to a large number of collaborative projects. Access arrangements for these vary and are indicated on the ARTFL web page, under “Collaborations”. For access to restricted databases, contact the publisher or institution directly.

Together with the Electronic Full Text Services (EFTS) at the University of Chicago Library, ARTFL is working with Alexander Street Press on providing the ARTFL full text search engine, PhiloLogic™, to the majority of Alexander Street Press databases. Additional projects developed with EFTS include the freely accessible Italian Women Writers project and the PhiloLogic implementation of EEBO-TCP (in conjunction with the Text Creation Partnership; access is restricted to members), as well as Lincoln/Net (with Northern Illinois University) and the Image of France (with Binghamton University).

ARTFL also works directly with a number of institutions on French language projects: with the University of Chicago’s Department of Romance Languages and Literatures on the Montaigne Project; with the Groupe international de Recherches balzaciennes and the Maison de Balzac on the recently developed online critical edition of Balzac’s La Comédie humaine; with Fabula on the Artamène Project; and with the Center for Research Libraries on the Pamphlets and Periodicals of the French Revolution of 1848. All of these are freely accessible.

Furthermore, ARTFL collaborates with CRL on providing full text searching for projects pertaining to the Digital South Asia Library and the Digital Dictionaries of South Asia, as well as with the ItalNet Consortium.

Future Developments

The ARTFL Project welcomes contributions and proposals for providing access to new corpora of texts. Users at member institutions will continue to play an important role in providing direction to the ARTFL Project and are invited to contact ARTFL to discuss possible collaborations.

PhiloLogic

PhiloLogic™ is the full-text search, retrieval, and analysis tool developed by the ARTFL Project and the University of Chicago Library. Because its performance and versatility have placed PhiloLogic at the heart of so many ARTFL collaborations, it is briefly described in this section.

The latest version of PhiloLogic has been developed to manage large TEI-Lite document collections in XML and SGML, using the Unicode character specification. Originally implemented to support FRANTEXT, PhiloLogic has been extended to support a wide variety of textual and hypermedia databases in collaboration with numerous academic institutions and publishers, such as Alexander Street Press. These collaborations have made it possible to develop PhiloLogic into a language-independent and Unicode-compliant search engine (databases are currently running in French, English, Italian, Spanish, German, Hindi, Urdu, Pushto, Tamil, and a number of other South Asian languages) that permits the inclusion of page images, streaming video and audio, and other multimedia formats.

PhiloLogic supports searches for single terms and phrases, as well as proximity searches for terms or phrases occurring within the same sentence or paragraph. For proximity searches, the user may stipulate the number of words separating the search terms. Wildcards can be used and combined with Boolean operators such as “OR” (the vertical bar "|") and “AND” (a space). This full text searching can be coupled with a large number of limits on the associated metadata. For example, one may combine a complicated full text search with biographical data on the author or bibliographic information. The advanced search interface of the Italian Women Writers database provides an example of this at http://www.lib.uchicago.edu/efts/IWW/search.advanced.html.
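The query conventions described above can be illustrated with a minimal Python sketch. It is not PhiloLogic's implementation: the function names are hypothetical, and the "*" wildcard symbol is an assumption, since the article does not say which wildcard characters PhiloLogic accepts. Only the use of a space for AND and of the vertical bar for OR comes from the description above.

```python
import re
import unicodedata


def tokenize(text):
    """Lowercase the text and split it into words, keeping accented letters."""
    normalized = unicodedata.normalize("NFC", text.lower())
    return re.findall(r"\w+", normalized)


def compile_term(term):
    """Compile one query term into a regex.

    '|' separates alternatives (OR), as in the article.
    '*' as a wildcard is an assumption made for this sketch.
    """
    term = unicodedata.normalize("NFC", term.lower())
    alternatives = [a for a in term.split("|") if a]
    patterns = [re.escape(a).replace(r"\*", r"\w*") for a in alternatives]
    return re.compile("(?:" + "|".join(patterns) + ")")


def matches_all(query, sentence):
    """True if every space-separated term (AND) matches some word."""
    words = tokenize(sentence)
    terms = [compile_term(t) for t in query.split()]
    return all(any(rx.fullmatch(w) for w in words) for rx in terms)


def within_distance(term_a, term_b, sentence, max_between):
    """True if the two terms occur with at most max_between words between them."""
    words = tokenize(sentence)
    rx_a, rx_b = compile_term(term_a), compile_term(term_b)
    pos_a = [i for i, w in enumerate(words) if rx_a.fullmatch(w)]
    pos_b = [i for i, w in enumerate(words) if rx_b.fullmatch(w)]
    return any(0 < abs(i - j) <= max_between + 1 for i in pos_a for j in pos_b)


sentence = "La liberté et l'égalité sont des droits"
print(matches_all("libert* égalité|fraternité", sentence))
print(within_distance("liberté", "égalité", sentence, 2))
```

With the sample sentence, the first query succeeds because "libert*" matches "liberté" and one of the two alternatives is present; the proximity check succeeds because exactly two words separate "liberté" and "égalité". A real search engine would run such queries against a prebuilt word index rather than scanning sentences one at a time.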

Rather than searching the full text, one may also search specific sections of the texts, such as forewords, introductions, conclusions, notes, scenes, acts, epilogues, covers, chapters, etc. In this manner, one may quickly compare introductions or openings of novels or plays contained in a corpus and, again, limit one’s search by the associated metadata: http://www.lib.uchicago.edu/efts/IWW/find.divs.html.

Results are provided within their context, as a concordance report, or within a Keyword In Context (KWIC) report, displaying all occurrences of the word(s) with results highlighted. The user may browse through the full context of any result, examining sentences or paragraphs around the target of the search. PhiloLogic displays the bibliographic information and page number for each occurrence, and results may also be sorted by frequency per author, year, or title. Furthermore, a collocation table provides information on which words are most commonly used with the search term.
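To make the two report types concrete, here is a small Python sketch of a KWIC display and a collocation count. It is only an illustration of the general techniques under simple assumptions (whitespace-and-punctuation tokenization, a fixed window of neighbouring words); the function names and parameters are hypothetical and do not reflect how PhiloLogic computes its reports.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def kwic(text, keyword, width=3):
    """Return one line per occurrence, with `width` words of context per side."""
    words = tokenize(text)
    lines = []
    for i, word in enumerate(words):
        if word == keyword:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Square brackets stand in for the highlighting of the hit.
            lines.append(f"{left} [{word}] {right}")
    return lines


def collocates(text, keyword, window=3, stopwords=frozenset()):
    """Count words appearing within `window` words of each keyword occurrence."""
    words = tokenize(text)
    counts = Counter()
    for i, word in enumerate(words):
        if word == keyword:
            neighbours = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
            counts.update(
                n for n in neighbours if n != keyword and n not in stopwords
            )
    return counts.most_common()


text = "the king spoke and the king left the hall"
for line in kwic(text, "king", width=2):
    print(line)
print(collocates(text, "king", window=2, stopwords=frozenset({"the"})))
```

Passing a stopword list keeps very frequent function words from dominating the collocation table; whether and how such filtering is applied in any given database is a design choice this sketch merely assumes.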

ARTFL continues to make improvements to PhiloLogic, which are announced at http://philologic.uchicago.edu/. Released as open source software, PhiloLogic can be freely downloaded at http://philologic.uchicago.edu/download.php. Further information, such as a user manual, is also available at the site.


Editor: Sarah G. Wenzel

Association of College & Research Libraries
©American Library Association
