Création d’un graphe de connaissances géohistorique à partir d’annuaires du commerce parisien du xixe siècle : application aux métiers de la photographie
Revue Ouverte d'Intelligence Artificielle, Post-actes de la conférence Ingénierie des Connaissances (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 179-200

Les annuaires professionnels anciens, édités à un rythme soutenu dans de nombreuses villes européennes tout au long des xixe et xxe siècles, forment un corpus de sources unique par son volume et la possibilité qu’ils donnent de suivre les transformations urbaines à travers le prisme des activités professionnelles des habitants, de l’échelle individuelle jusqu’à celle de la ville entière. L’analyse spatio-temporelle d’un type de commerces au travers des entrées d’annuaires demande cependant un travail considérable de recensement, de transcription et de recoupement manuels. Pour pallier cette difficulté, cet article propose une approche automatique pour construire et visualiser un graphe de connaissances géohistorique des commerces figurant dans des annuaires anciens. L’approche est testée sur des annuaires du commerce parisien du xixe siècle allant de 1798 à 1914, sur le cas des métiers de la photographie.

Business directories were published at a high frequency in many European cities throughout the 19th and 20th centuries. This corpus of historical sources is unique in its volume and in the opportunity it gives to follow urban transformations through the professional activities of the inhabitants, from the individual scale to that of the entire city. However, the spatio-temporal analysis of a given type of business through directory entries requires considerable manual work to inventory, transcribe and cross-reference the entries. To overcome this difficulty, this article proposes an automatic approach to construct and visualise a geohistorical knowledge graph of the businesses listed in old directories. The approach is tested on 19th-century Parisian trade directories from 1798 to 1914, using the photography trades as a case study.

DOI : 10.5802/roia.98
Mots-clés : Graphe de connaissances géohistorique, annuaires anciens, reconnaissance et résolution d’entités nommées, bruit OCR, visualisation spatio-temporelle.
Keywords: Geohistorical knowledge graph, old directories, named entity recognition and linking, OCR noise, spatio-temporal visualization.

Solenn Tual 1 ; Nathalie Abadie 1 ; Bertrand Duménieu 2 ; Joseph Chazalon 3 ; Edwin Carlinet 3

1 LASTIG, Université Gustave Eiffel, IGN-ENSG, 73 Avenue de Paris, 94165 Saint-Mandé Cedex (France)
2 Centre de Recherches Historiques, EHESS, 54 Boulevard Raspail, 75006 Paris (France)
3 LRE, EPITA, 14-16 rue Voltaire, 94270 Le Kremlin-Bicêtre (France)
License: CC-BY 4.0
Copyright: The authors retain their rights
@article{ROIA_2025__6_1-2_179_0,
     author = {Solenn Tual and Nathalie Abadie and Bertrand Dum\'enieu and Joseph Chazalon and Edwin Carlinet},
     title = {Cr\'eation d{\textquoteright}un graphe de connaissances g\'eohistorique \`a partir d{\textquoteright}annuaires du commerce parisien du $\protect \textup {XIX}^{\protect \textup {e}}$ si\`ecle~: application aux m\'etiers de la photographie},
     journal = {Revue Ouverte d'Intelligence Artificielle},
     pages = {179--200},
     year = {2025},
     publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
     volume = {6},
     number = {1-2},
     doi = {10.5802/roia.98},
     language = {fr},
     url = {https://roia.centre-mersenne.org/articles/10.5802/roia.98/}
}
Solenn Tual; Nathalie Abadie; Bertrand Duménieu; Joseph Chazalon; Edwin Carlinet. Création d’un graphe de connaissances géohistorique à partir d’annuaires du commerce parisien du xixe siècle : application aux métiers de la photographie. Revue Ouverte d'Intelligence Artificielle, Post-actes de la conférence Ingénierie des Connaissances (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 179-200. doi: 10.5802/roia.98
