Online Seminar DHAI

Fostered by the creation of new algorithms, growing computation power and the development of deep learning techniques, Artificial Intelligence constantly needs to confront new issues and data sets in order to deepen its methodologies and broaden its range of scientific applications. Digital humanities, which develop digital methodologies for the study of the humanities and bring the critical approaches of the humanities to the analysis of the contemporary 'digital revolutions', are constantly in search of new tools to explore ever more complex and diversified data sets.

The coupling of AI and DH is emerging globally as a key interface for both domains and will probably prove to be a deeply transformative trend in tomorrow's intellectual world.

The ambition of this seminar is to be one of the places where this coupling is shaped, fostered and analyzed. It intends to offer a forum where both communities, understood in a very inclusive way, exchange on emerging issues, ongoing projects, and past experiences in order to build a common language and a shared space, and to encourage innovative cooperation in the long run.

October 22, 2019, Emily L. Spratt (Columbia) 
Title: Exhibition Film Screening of 'Au-delà du Terroir, Beyond AI Art', and Discussion with Curator Emily L. Spratt
Abstract: 'Au-delà du Terroir, Beyond AI Art' is the art exhibition for the Global Forum on AI for Humanity, which is being hosted by President Macron and sponsored by the Government of France; it is being held at the Institut de France, Quai Conti, Paris, October 28-30, 2019. The forum is a direct outcome of the last G7 meeting regarding the responsible use of AI. This exhibition shapes the visual tenor of the meeting and features the brilliance of some of the leading artists and cultural heritage specialists of our time who are working with AI: Hito Steyerl, Mario Klingemann, Refik Anadol, Robbie Barrat, AICAN (Ahmed Elgammal), ICONEM, and even one project in collaboration with the inimitable Chef Alain Passard. In this limited screening of the exhibition, which takes place in the form of a digital projection, curator Emily L. Spratt will discuss the issues involved with 'AI art' and the implications of the creative applications of machine learning for images and videos. A recent Columbia article was published on the exhibition.

October 22, 2019, Emily L. Spratt (Columbia) 
Title: Art, Ethics, and AI: Problems in the Hermeneutics of the Digital Image
Abstract: In the last five years, the nature of historical inquiry has undergone a radical transformation as the use of AI-enhanced search engines has become the predominant mode of knowledge investigation, consequentially affecting our engagement with images. In this system, the discovery of responses to our every question is facilitated as the vast stores of digital information that we have come to call the data universe are conjured to deliver answers that are commensurate with our human scale of comprehension, yet often exceed it. In this digital interaction it is often assumed that queries are met with complete and reliable answers, and that data is synonymous with empirical validity, despite the frequently changing structure of this mostly unsupervised repository of digital information, which in actuality projects a distortion of the physical world it represents. In this presentation, the role of vision technology and AI in navigating, analyzing, organizing, and constructing our art and art historical archives of images will be examined as a shaping force on our interpretation of the past and projection of the future. Drawing upon the observations made by Michel Foucault in The Archaeology of Knowledge that the trends toward continuity and discontinuity in descriptions of historical narratives and philosophy, respectively, are reflections of larger hermeneutic structures that in and of themselves influence knowledge formation, the question of the role of image-related data science in our humanistic interpretation of the world will be explored. Through the examples of preservationists and artists using machine learning techniques to curate and create visual information, and in consideration of the information management needs of cultural institutions, the machine-learned image will be posited as a new and radical phenomenon of our society that is altering the nature of historical interpretation itself. By extension, this presentation brings renewed attention to aesthetic theory and calls for a new philosophical paradigm of visual perception to be employed for the analysis and management of our visual culture and heritage in the age of AI, one which incorporates and actively partakes in the development of computer vision-based technologies. 

September 17, 2019, Alexei Efros (UC Berkeley) 
Title: Finding Visual Patterns in Large Photo Collections for Visualization, Analytics, and Artistic Expression
Abstract: Our world is drowning in a data deluge, and much of this data is visual. Humanity captured over one trillion photographs last year alone. 500 hours of video are uploaded to YouTube every minute. In fact, there is so much visual data out there already that much of it might never be seen by a human being! But unlike other types of 'Big Data', such as text, much of the visual content cannot be easily indexed or searched, making it the Internet's 'digital dark matter' [Perona 2010]. In this talk, I will first discuss some of the unique challenges that make visual data difficult to handle compared to other types of content. I will then present some of our work on navigating, visualizing, and mining for visual patterns in large-scale image collections. Example data sources will include user-contributed Flickr photographs, Google StreetView imagery of entire cities, a hundred years of high school student portraits, and a collection of paintings attributed to Jan Brueghel. I will also show how recent progress in using deep learning to find visual patterns and correlations can be used to synthesize novel visual content via the 'image-to-image translation' paradigm. I will conclude with examples of contemporary artists using our work as a new tool for artistic visual expression.

October 7, 2019, Léa Saint-Raymond and Béatrice Joyeux-Prunel for the DHAI organizing members (ENS) [Slides]
Title: When Digital Humanities meet Artificial Intelligence, an Introduction
Abstract: Introductory and methodological session on the themes of the seminars

November 5, 2019, Thierry Poibeau, Mathilde Roussel, Matthieu Raffard and Tim Van De Cruys (Lattice (CNRS) / IRIT (Toulouse)) [video] [Slides]
Title: Oupoco, l'ouvroir de poésie potentielle (Thierry Poibeau)
Abstract: The presentation will cover the Oupoco project, largely inspired by Raymond Queneau's 'Cent mille milliards de poèmes', published in 1961. In that book, Queneau offers 10 sonnets whose lines all rhyme with one another, so that the lines can be freely combined into new poems that still respect the sonnet form. In Oupoco, Queneau's poems have been replaced by 19th-century sonnets, which are both in the public domain and more varied in form and structure. An analysis module (overall structure, rhyme type, etc.) has been implemented, and the information it extracts feeds a generator that produces sonnets respecting the rules of the genre. Beyond its playful side, the project raises questions about the status of the author and about the coherence and relevance of the generated poems. It also arouses curiosity, often leading readers back to the source sonnets to check the original meaning of a given line. Finally, some recent extensions of the project will be presented, such as the 'boîte à poésie', a portable version of the Oupoco generator.
Title: Presentation of the Boîte à poésie (Mathilde Roussel and Matthieu Raffard)
Abstract: The Boîte à poésie is a portable, low-power poetry generator developed within the Oupoco project, following a collaboration with the Atelier Raffard-Roussel.
Title: Automatic poetry generation with neural networks (Tim Van De Cruys)
Abstract: Automatic poetry generation is a difficult task for a computer system. For a poem to make sense, both linguistic and literary aspects need to be taken into account. Neural network language models have improved the state of the art in predictive language modeling, but when trained on general text corpora they obviously do not generate poetry as such. This presentation will explore how such models, trained on general texts, can be adapted to capture the linguistic and literary aspects required for poetry generation. The framework is applied to the generation of poems in French and assessed through human evaluation. The Oupoco project is supported by the labex Transfers and the EUR Translitterae.
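As a deliberately simple illustration of steering a generic language model toward a literary constraint (a much cruder mechanism than the neural approach presented here), the sketch below samples lines from a tiny bigram model and keeps only pairs whose final words rhyme orthographically; the training snippet is an invented placeholder.

```python
import random
from collections import defaultdict

TEXT = ("la nuit descend sur la ville et le vent "
        "chante sur la mer et la nuit").split()

def train_bigram(tokens):
    """Map each word to the list of words observed after it."""
    model = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        model[a].append(b)
    return model

def sample_line(model, length=6):
    word = random.choice(list(model))
    line = [word]
    for _ in range(length - 1):
        word = random.choice(model.get(word) or list(model))
        line.append(word)
    return line

def rhymes(w1, w2, k=2):
    # Crude orthographic stand-in for a real phonetic rhyme test.
    return w1 != w2 and w1[-k:] == w2[-k:]

def rhyming_couplet(model, tries=1000):
    """Rejection sampling: generate second lines until the endings rhyme."""
    first = sample_line(model)
    for _ in range(tries):
        second = sample_line(model)
        if rhymes(first[-1], second[-1]):
            return " ".join(first), " ".join(second)
    return None  # may fail on such a tiny corpus

print(rhyming_couplet(train_bigram(TEXT)))
```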

December 2, 2019, Mathieu Aubry (ENPC) 
Title: Machine learning and text analysis for digital humanities
Abstract: I will present key concepts and challenges of deep learning approaches, and in particular their application to images for digital humanities. The presentation will introduce these concepts and challenges through three concrete examples: artwork price prediction, historical watermark recognition, and pattern recognition and discovery in artwork datasets.

January 6, 2020, Alexandre Guilbaud and Stavros Lazaris (Université Pierre et Marie Curie / CNRS) 
Title: The circulation of scientific illustration in the Middle Ages and the early modern period
Abstract: We live surrounded by images. They carry us, charm us, or disappoint us, and this was also the case, to varying degrees of course, for people of the Middle Ages and the early modern period. How did images shape their thinking in the sciences, and to what extent are images representative of that thinking? What was the nature of scientific illustrations, and how did the actors of those periods develop and use them? The medieval and modern periods are particularly well suited to research on the formation of a visual mode of thought linked to scientific knowledge. This talk will present a Digital Humanities research project that contributes to this question by examining how current developments in AI and computer vision open up new approaches to the historical analysis of the circulation of scientific illustration across these two periods. We will present the corpora selected for this study (manuscripts containing the Physiologus and Dioscorides' De Materia medica for the Middle Ages; natural history and mathematical plates in the corpus of 18th-century dictionaries and encyclopedias) and show, through examples, how the modes of circulation at work in these corpora call in particular for the development of new techniques based on pattern recognition and on linking texts to images.

January 6, 2020, Jean-Baptiste Camps (ENC) 
Title: Philology, old texts and machine learning
Abstract: I will give an introduction to machine learning techniques applied to old documents (manuscripts) and texts, ranging from text acquisition (e.g. handwritten text recognition) to computational data analysis (e.g. authorship attribution).

February 3, 2020, Emmanuelle Bermès and Jean-Philippe Moreux (BnF) 
Title: From experimentation to community building: AI at the BnF
Abstract: Artificial intelligence has been present at the BnF for more than 10 years, at least in its 'machine learning' guise, through R&D projects conducted with the image and document analysis community. But we can imagine that the rise and fall of expert systems at the beginning of the 1990s also raised questions for the BnF, as it did for our American colleagues: 'Artificial Intelligence and Expert Systems: Will They Change the Library?' (Linda C. Smith, F. W. Lancaster, University of Illinois, 1992).
Today, the democratization of deep learning makes it possible to experiment and carry out projects in near-complete autonomy, but above all it makes interdisciplinary projects possible, in which expertise on content, data, and processing is all required. This talk will be an opportunity to present the results of such a project, dedicated to the visual indexing of Gallica's iconographic content, to share our feedback, and to consider a common dynamic driven by the needs and achievements of digital humanities practice.
The presentation will place these experiments within the BnF's overall strategy for services to researchers, but will also broaden the scope by addressing the overall positioning of libraries with regard to AI.

March 2, 2020, Matteo Valleriani (Technische Universität Berlin)
Title: The Sphere. Knowledge System Evolution and the Shared Scientific Identity of Europe 
Abstract: On the basis of the corpus of all early modern printed editions of commentaries on the Sphere of Sacrobosco, the lecture shows how to reconstruct the transformation process, and its mechanisms, undergone by the treatise, and thus to explore the evolutionary path, between the fifteenth and the seventeenth centuries, of the scientific system centered on cosmological knowledge: the shared scientific identity of Europe. The sources are analyzed on three levels: text, images, and tables. From a methodological point of view, the lecture will also show how data are extracted by means of machine learning and analyzed with an approach derived from the physics of complex systems and from network theory.
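As a hedged sketch of the kind of network analysis mentioned above (not the project's actual pipeline), the snippet below links editions that share extracted parts and ranks them by weighted degree; the edition names and shared parts are invented placeholders.

```python
import networkx as nx

# Hypothetical editions, each with the set of text/image/table parts
# that (in the real project) machine learning would have extracted.
editions = {
    "Venice_1488": {"text_A", "figure_sphere", "table_climates"},
    "Paris_1500": {"text_A", "figure_sphere"},
    "Wittenberg_1531": {"figure_sphere", "table_climates", "text_B"},
}

G = nx.Graph()
G.add_nodes_from(editions)
names = list(editions)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        shared = editions[a] & editions[b]
        if shared:  # editions sharing material are linked
            G.add_edge(a, b, weight=len(shared))

# Weighted degree hints at which editions acted as hubs of reuse.
print(sorted(G.degree(weight="weight"), key=lambda kv: -kv[1]))
```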

March 30, 2020, Aaron Hershkowitz (Institute for Advanced Study) 
Title: The Cutting Edge of Epigraphy: Applying AI to the Identification of Stonecutters
Abstract: Inscriptions are a vital category of evidence about the ancient world, providing a wealth of information about subjects and geographical regions outside the scope of surviving literary texts. However, to be most useful, inscriptions need to be situated within a chronological context: the more precise, the better. This kind of chronological information can sometimes be gleaned from dating formulae or events mentioned in the inscribed texts, but very often no such guideposts survive. In these cases, epigraphers can attempt to date a given text on a comparative basis with other, firmly dated inscriptions. This comparative dating can be done on the basis of socio-linguistic patterns or of the physical shape of the letter forms present in the inscription. In the latter case, a very general date can be derived from the changing popularity of particular letter forms and shapes in a particular geographic context, or a more specific date can be achieved if the 'handwriting' of a stonecutter can be identified. Such a stonecutter would have a delimited period of activity, so that if any of his inscriptions have a firm date, a range of about thirty years or less can be assigned to all other inscriptions made by him. Unfortunately, very few scholars have specialized in the ability to detect stonecutter handwriting, but as was shown by an early attempt (see Panagopoulos, Papaodysseus, Rousopoulos, Dafi, and Tracy 2009, Automatic Writer Identification of Ancient Greek Inscriptions), computer vision analysis has significant promise in this area. The Krateros Project to digitize the epigraphic squeezes of the Institute for Advanced Study is actively working to pursue this line of inquiry, recognizing it as critical for the future of epigraphy generally.
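A hedged sketch of the general attribution idea only (the 2009 paper's actual method differs): treat each inscribed letter as a small image and train a classifier to assign it to a known cutter. Real work would use curated letter crops, e.g. from digitized squeezes; the random arrays below are placeholders so the sketch runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder 'letter images': 50 flattened 16x16 patches per cutter,
# drawn from distributions that differ per (hypothetical) cutter.
rng = np.random.default_rng(0)
n_per_cutter, n_pixels = 50, 16 * 16
X = np.vstack([rng.normal(loc=c, size=(n_per_cutter, n_pixels))
               for c in range(3)])
y = np.repeat(["cutter_A", "cutter_B", "cutter_C"], n_per_cutter)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC().fit(X_train, y_train)  # attribute letters to cutters
print("held-out attribution accuracy:", clf.score(X_test, y_test))
```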

June 4, 2020, Sietske Fransen & Leonardo Impett (Max-Planck-Institut für Kunstgeschichte) 
Title: Print, Code, Data: New Media Disruptions and Scientific Visualization
Abstract: This paper discusses changes in scientific diagramming in response to new media disruptions: the printing press, and online data/research code. In the first case, the role of handwritten documents and the visual forms of scientific diagramming re-align in response to the circulational economics and medial accessibility of the printing press in early modern Europe. In the second, published research code unsettles the principle, common in the second half of the twentieth century, that a peer-reviewed article in computer science ought to outline its methods with enough detail to enable repeatability.
The printing press brought benefits as well as restrictions to the inclusion of diagrams in scientific works. One downside was that not every printer was able to manufacture the separate wood blocks and/or copper plates that could render a diagram as if hand-drawn; instead, diagrams were often set entirely in typeface. On the other hand, the rapid spread of printed books alongside manuscripts opened up new roles for the manuscript as a medium of creativity. In the early days of print, it is therefore in manuscripts that we can find the visualization of scientific processes, which forms the background to printed material.
The information sufficient for 'replicability' in computer science (which in the physical sciences has meant 'formal experimental methodology', but in computer science is epistemically closer to the research itself) had most often been included in tables, schematizations and heavily-labelled diagrams, sometimes augmented by so-called 'pseudocode' (a kind of software caricature, which cannot itself be run on a machine). The inclusion of research code thus dramatically displaces the role of scientific diagrams in machine learning research: from a notational system which ideally contains sufficient information to reproduce an algorithm (akin to electrical circuit diagrams) to a didactic visualization technique (as in schoolbook diagrams of the Carbon Cycle). In Badiou's (1968) terminology, diagrams shift from symbolic formal systems to synthetic spatializations of non-spatial processes. The relationship between 'research output' (as the commodity produced by computer-science research groups) and its constituent components (text, diagram, code, data) is further destabilized by deep learning techniques (which rely on vast amounts of training data): no longer are algorithms published on their own, but rather trained models, assemblages of both data and software, again shifting the onus of reproducibility (and, therefore, the function of scientific notation). The changed epistemological role of neural network visualizations allows for a far greater formal instability, leading to the rich ecology of visual solutions (AlexNet, VGG, DeepFace) to the problem of notating multidimensional neural network architectures.
By comparing the impact of new media on the use, form and distribution of diagrams in the early modern period, with the impact of code on the role of diagrams in computer science publications, we are opening up a conversation about the influence of new media on science, both in history and in current practice.

June 8, 2020, Antonio Casilli (Paris School of Telecommunications (Telecom Paris)) 
Title: The last mile of inequality: What COVID-19 is doing to labor and automation
Abstract: The ongoing COVID-19 crisis, with its lockdowns, mass unemployment, and increased health risks, has been described as an automation-forcing event, poised to accelerate the introduction of automated processes replacing human workers. Nevertheless, a growing body of literature has emphasized the human contribution to machine learning. Platform-based digital labor in particular, performed by global crowds of underpaid micro-workers or extracted as data from cab-hailing drivers and bike couriers, turns out to play a crucial role. Although the pandemic has been presented as the triumph of 'smart work', telecommuting during periods of lockdown and closures concerned only about 25 percent of workers. A class gradient seems to be at play: platform-assisted telework is common among higher income brackets, while people on lower rungs of the income ladder are more likely to hold jobs that involve physical proximity, are deemed essential, and cannot be moved online or interrupted. These include two groups of contingent workers performing what can be described as 'the last mile of logistics' (delivery, driving, maintenance and other gigs at the end of the supply chain) and 'the last mile of automation' (human-in-the-loop tasks such as data preparation, content moderation and algorithm verification). Indeed, during lockdowns both logistics and micro-work platforms have reported a rise in activity, with millions signing up as couriers, drivers, moderators, and data trainers. The COVID-19 pandemic has thus given unprecedented visibility to these workers, but without increased social security. Whether their activities are carried out in public spaces, in offices, or from home, they generally expose workers to higher health risks, with poor pay, no insurance, and no sick leave. Last-mile platform workers shoulder a disproportionate share of the risk associated with ensuring economic continuity. Emerging scenarios include the use of industrial action to gain recognition and improve working conditions. COVID-19 has opened spaces of visibility for organizing workers across Europe, South America, and the US: since March 2020, Instacart walkouts, Glovo and Deliveroo street rallies, and Amazon 'virtual walkouts' have demanded health measures or protested pay cuts.

October 9, 2020, Karine Gentelet (Université du Québec en Outaouais (Canada)) [video]
Title: Reflections on decolonization processes and data sovereignty, based on the digital and AI strategies of Indigenous Peoples in Canada
Abstract: This presentation will focus on the digital strategies developed by Indigenous Peoples to reaffirm their information sovereignty, and on how these strategies contribute to the decolonization of data. The information that represents Indigenous Peoples is tainted by colonization and by systemic practices of informational discrimination. Their initiatives for informational sovereignty and data decolonization allow data to be collected by and for them, and therefore to be much more accurate, diversified, and representative of their realities and needs. The principles developed by Indigenous Peoples not only testify to an asserted digital agency but also induce a paradigm shift through the inclusion of ancestral knowledge and traditional modes of governance. They allow a new balance of power within the digital ecosystem.

October 20, 2020, The DHAI team (DHAI) [video]
Title: Heads and Tails: When Digital Humanities and Artificial Intelligence Meet
Abstract: Joint presentation by the DHAI team, serving as an introduction to this second season of DHAI.

November 24, 2020, Philippe Gambette (Université Paris-Est Marne-la-Vallée) [video]
Title: Alignment and text comparison for digital humanities
Abstract: This talk will present several algorithmic approaches based on alignment or text comparison, at different scales, with applications in digital humanities. We will present an alignment-based approach to the modernisation of 16th- and 17th-century French texts and show the impact of this normalisation process on automatic geographical named-entity recognition.
We will also show several visualisation techniques that are useful for exploring text corpora by highlighting similarities and differences between texts at different levels. In particular, we will illustrate the use of Sankey diagrams at different levels to align various editions of the same text, such as the poetry books of Marceline Desbordes-Valmore published from 1819 to 1830 or the Heptaméron of Marguerite de Navarre. This visualisation tool can also be used to contrast the most frequent words of two comparable corpora in order to highlight their differences. We will also illustrate how word trees, built with the TreeCloud software, help identify trends in a corpus by comparing the trees built for subsets of the corpus.
We will finally focus on stemmatology, where the analysed texts are assumed to derive from a single initial manuscript. We will describe a tree-reconstruction algorithm designed to take linguistic input into account when building a tree describing the history of the manuscripts, together with the list of observed variants supporting its edges.
Contributors of these works include Delphine Amstutz, Jean-Charles Bontemps, Aleksandra Chaschina, Hilde Eggermont, Raphaël Gaudy, Eleni Kogkitsidou, Gregory Kucherov, Tita Kyriacopoulou, Nadège Lechevrel, Xavier Le Roux, Claude Martineau, William Martinez, Anna-Livia Morand, Jonathan Poinhos, Caroline Trotot and Jean Véronis.
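As a toy illustration of the alignment step described above (not the talk's actual tools), the sketch below aligns two versions of a line word by word with Python's standard difflib, exposing the preserved and changed spans that a Sankey diagram would then visualize; the verse fragments and their mock archaic spellings are invented.

```python
import difflib

edition_a = "ie ne scais quel chemin me porte vers toy".split()
edition_b = "je ne sais quel chemin me porte vers toi".split()

# SequenceMatcher yields (operation, a_start, a_end, b_start, b_end)
# spans: 'equal' spans are preserved text, 'replace' spans are variants.
matcher = difflib.SequenceMatcher(a=edition_a, b=edition_b)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    print(f"{op:8} {' '.join(edition_a[i1:i2]):24} -> "
          f"{' '.join(edition_b[j1:j2])}")
```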

December 15, 2020, Pierre-Carl Langlais (Paris-IV Sorbonne)
Title: Redefining the cultural history of newspapers with artificial intelligence: the experiments of the Numapresse project
Abstract: Over the last twenty years, libraries have developed massive digitization programs. While this shift has significantly enhanced the accessibility of cultural digital archives, it has also opened up unprecedented research opportunities. Innovative projects have recently attempted to apply large-scale quantitative methods borrowed from computer science to ambitious historical questions. The Numapresse project proposes a new cultural history of French newspapers from 1800 onward, notably through the distant reading of detailed digitization outputs from the French National Library and other partners. It has recently become a pilot project of the future data labs of the French National Library. This presentation features a series of 'operationalizations' of core concepts of the cultural history of the news, in a continuous methodological dialog with statistics, data science, and machine learning. Classic text-mining methods have been supplemented with spatial analysis of pages in order to deal with the complex and polyphonic editorial structure of newspapers and to retrieve specific formats such as signatures or news dispatches. The project has created a library of 'genre models' that makes it possible to retrieve large collections of texts belonging to leading newspaper genres in different historical settings. This approach has been extended to large collections of newspaper images through the retraining of deep learning models. The automated identification of text and image reprints also makes it possible to map the transforming ecosystem of the French press and its connections to other publication formats. The experimental work of Numapresse aims to foster a modeling ecosystem among the research and library communities working on cultural heritage archives.
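A hedged sketch of what a 'genre model' can look like in its simplest supervised form (the project's actual models are not described here): a text classifier mapping article text to a newspaper genre. The training snippets and labels are invented placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training articles, one per newspaper genre.
articles = [
    "le cours de la rente a fléchi hier à la bourse de paris",
    "notre correspondant télégraphie que la flotte a quitté le port",
    "grand roman inédit elle ouvrit la lettre en tremblant",
    "au théâtre ce soir première représentation de la nouvelle pièce",
]
genres = ["bourse", "dépêche", "feuilleton", "spectacles"]

# TF-IDF features + logistic regression: a minimal 'genre model'.
genre_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
genre_model.fit(articles, genres)
print(genre_model.predict(["la bourse de londres reste ferme"]))
```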

January 19, 2021, Daniel Stoekl (École Pratique des Hautes Études)
Title: From automatic transcription of medieval Hebrew manuscripts, via scholarly editing, to intertextuality analysis: tools and praxis around eScriptorium
Abstract: Following a brief introduction to our open-source HTR infrastructure eScriptorium cum kraken, I will demonstrate its application to automatic layout segmentation, handwritten text segmentation, and the paleography of Hebrew manuscripts. Using its rich (and still growing) internal functionalities and API as well as a number of external tools (Dekker et al. 2011, Shmidman et al. 2018, and my own), I will deal with automatic text identification, alignment, and crowdsourcing (Kuflik et al. 2019, Wecker et al. 2019), and with how these procedures can be used to create different types of generic models for segmentation and transcription. I will show first ideas for automatically passing from the document hierarchy resulting from HTR to a text-oriented model with integrated interlinear and marginal additions that can be displayed in tools like TEI-Publisher. While the methods presented are generic and applicable to most languages and scripts, special attention will be given to problems arising from non-Latin scripts, right-to-left writing, and morphologically rich languages. A minimal code sketch of the segmentation-and-transcription pipeline is given after the bibliography.
Bibliography:
– Dekker, R. H., Middell, G. (2011). Computer-Supported Collation with CollateX: Managing Textual Variance in an Environment with Varying Requirements. Supporting Digital Humanities 2011, University of Copenhagen, Denmark.
– Kuflik, T., Lavee, M., Ohali, A., Raziel-Kretzmer, V., Schor, U., Wecker, A., Lolli, E., Signoret, P., Stökl Ben Ezra, D. (2019). Tikkoun Sofrim – Combining HTR and Crowdsourcing for Automated Transcription of Hebrew Medieval Manuscripts. DH2019.
– Lapin, H., Stökl Ben Ezra, D. eRabbinica.
– Meier, W., Turska, M. (2016). TEI Processing Model Toolbox: Power To The Editor. DH 2016, 936.
– Meier, W., Turska, M. TEI-Publisher.
– Shmidman, A., Koppel, M., Porat, E. (2018). Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus. Journal of Data Mining and Digital Humanities.
– Wecker, A., Raziel-Kretzmer, V., Schor, U., Kuflik, T., Ohali, A., Elovits, D., Lavee, M., Stevenson, P., Stökl Ben Ezra, D. (2019). Tikkoun Sofrim: A WebApp for Personalization and Adaptation of Crowdsourcing Transcriptions. UMAP'19 Adjunct, Larnaca. New York: ACM Press.
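The minimal sketch announced above, assuming kraken's documented Python API (which varies between versions); the page image and the recognition model path are hypothetical placeholders.

```python
from PIL import Image
from kraken import blla, rpred
from kraken.lib import models

im = Image.open("page.png")  # placeholder manuscript page image

# Baseline layout analysis: detect text regions and lines on the page.
baseline_seg = blla.segment(im)

# Apply a trained recognition model (placeholder path) line by line.
model = models.load_any("hebrew_htr.mlmodel")
for record in rpred.rpred(network=model, im=im, bounds=baseline_seg):
    print(record.prediction)  # transcription of one detected line
```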

February 16, 2021, Marc Smith, Oumayma Bounou (École nationale des chartes) 
Title: Filigranes pour tous: historical watermark recognition
Abstract: We developed a web application that identifies a watermark from a simple smartphone photograph by matching it to the corresponding watermark design in a database of more than 16,000 designs. After describing the deep-learning-based recognition method, built upon the approach of Shen et al., we will present our web application, which can not only be used for watermark recognition but can also serve as a crowdsourcing platform, designed to be enriched by its users. A minimal sketch of the underlying retrieval idea is given after the bibliography.
Bibliography:
– Xi Shen, Ilaria Pastrolin, Oumayma Bounou, Spyros Gidaris, Marc Smith, Olivier Poncet, Mathieu Aubry, Large-Scale Historical Watermark Recognition: dataset and a new consistency-based approach, ICPR 2020
– Oumayma Bounou, Tom Monnier, Ilaria Pastrolin, Xi Shen, Christine Bénévent, Marie-Françoise Limon-Bonnet, François Bougard, Mathieu Aubry, Marc Smith, Olivier Poncet, Pierre-Guillaume Raverdy, A Web Application for Watermark Recognition, JDMDH 2020
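A hedged sketch of the retrieval principle only; the method of Shen et al. adds a dedicated training stage and a consistency-based matching step not shown here. It embeds the query photograph and each reference design with a generic pretrained CNN and ranks the database by cosine similarity; all file names are placeholders.

```python
import torch
from PIL import Image
from torchvision import models

# Generic pretrained backbone used as a feature extractor.
weights = models.ResNet18_Weights.DEFAULT
cnn = models.resnet18(weights=weights)
cnn.fc = torch.nn.Identity()  # keep the 512-d global feature
cnn.eval()
preprocess = weights.transforms()

def embed(path):
    """L2-normalized CNN embedding of one image file."""
    with torch.no_grad():
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        return torch.nn.functional.normalize(cnn(x), dim=1)

query = embed("photo_watermark.jpg")  # smartphone photograph
designs = ["design_0001.png", "design_0002.png"]  # reference database
scores = {name: float(query @ embed(name).T) for name in designs}
print(max(scores, key=scores.get))  # best-matching reference design
```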

March 16, 2021, Stavros Lazaris, Alexandre Guilbaud, Tom Monnier and Mathieu Aubry (CNRS / Sorbonne Université / École des Ponts ParisTech)
Title: [see below]
Abstract: Stavros Lazaris (CNRS, UMR Orient & Méditerranée): Seeing is knowing. We live surrounded by images. They carry us, charm us, or disappoint us, and this was also the case, to varying degrees of course, for medieval people. How did images shape their thought? What was the nature of those images, and how could medieval people use them? The medieval period is particularly well suited to research on the formation of a visual mode of thought linked to scientific knowledge, with the help of AI. This presentation will review a Digital Humanities research project based on pattern recognition and the solutions AI can bring to current research on medieval visual representations.
Alexandre Guilbaud (Sorbonne Université, Institut de mathématiques de Jussieu – Paris Rive Gauche, Institut des sciences du calcul et des données): AI and the making of 18th-century plates. Many dictionaries, encyclopedias, and treatises of the early modern period contain plates illustrating the texts, with varying degrees of autonomy from them. These plates are sometimes, indeed often in 18th-century dictionaries and encyclopedias, the result of partial borrowings and of recomposition from possibly diverse earlier sources. Identifying these sources, and then reconstructing from them how the plates were made (what is called their manufacture), makes it possible to characterize how their authors updated and adapted iconographic content and modes of representation, and then to help characterize, on a larger scale, certain processes of knowledge circulation through images in this type of corpus. We will give concrete examples of this current research problem and of the new possibilities opened up by AI, drawing on the Encyclopédie of Diderot and D'Alembert (1751-1772) and on the critical edition currently being prepared within the ENCCRE project.
Tom Monnier and Mathieu Aubry (Imagine, LIGM, École des Ponts ParisTech): Automatic extraction and matching of images. We will present the main challenges to be overcome in order to identify image reuse fully automatically. The first challenge, which we will detail, is the automatic extraction of images from historical documents. We will then discuss the problems involved in matching the extracted images.

April 13, 2021, Christophe Tuffery and Grazia Nicosia (Inrap / EUR Paris Seine Université Humanités)
Title: Heritage and digital humanities: crossed perspectives between archaeology and the conservation-restoration of cultural property
Abstract: The use of digital technologies has had a strong impact on heritage data, whether produced by archaeological excavations or resulting from monitoring the condition of works in a museum context. Drawing on ongoing research, two heritage professionals will show the impact of digital technology on the processes of excavating, documenting, and materially preserving heritage. This dialogue will address, from both a scientific and an operational angle, the conditions under which data on archaeological and museum heritage are produced and used.