Historical business directories were published at a sustained pace in many European cities throughout the nineteenth and twentieth centuries. They form a corpus of sources that is unique in its volume and in the opportunity it offers to follow urban transformations through the professional activities of the inhabitants, from the scale of the individual to that of the entire city. However, the spatio-temporal analysis of a given type of business through directory entries requires considerable manual work to inventory, transcribe and cross-reference the entries. To overcome this difficulty, this article proposes an automatic approach to construct and visualise a geohistorical knowledge graph of the businesses listed in historical directories. The approach is tested on nineteenth-century Parisian trade directories covering 1798 to 1914, using the photography trades as a case study.
Keywords: Geohistorical knowledge graph, old directories, named entity recognition and linking, OCR noise, spatio-temporal visualization.
Solenn Tual; Nathalie Abadie; Bertrand Duménieu; Joseph Chazalon; Edwin Carlinet
CC-BY 4.0
@article{ROIA_2025__6_1-2_179_0,
author = {Solenn Tual and Nathalie Abadie and Bertrand Dum\'enieu and Joseph Chazalon and Edwin Carlinet},
title = {Cr\'eation d{\textquoteright}un graphe de connaissances g\'eohistorique \`a partir d{\textquoteright}annuaires du commerce parisien du $\protect \textup {XIX}^{\protect \textup {e}}$ si\`ecle~: application aux m\'etiers de la photographie},
journal = {Revue Ouverte d'Intelligence Artificielle},
pages = {179--200},
year = {2025},
publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
volume = {6},
number = {1-2},
doi = {10.5802/roia.98},
language = {fr},
url = {https://roia.centre-mersenne.org/articles/10.5802/roia.98/}
}
Solenn Tual; Nathalie Abadie; Bertrand Duménieu; Joseph Chazalon; Edwin Carlinet. Création d’un graphe de connaissances géohistorique à partir d’annuaires du commerce parisien du XIXe siècle : application aux métiers de la photographie. Revue Ouverte d'Intelligence Artificielle, Post-actes de la conférence Ingénierie des Connaissances (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 179-200. doi: 10.5802/roia.98