Étude de transférabilité de clés pour un liage explicable de données entre graphes de connaissances (A study of key transferability for explainable data linking between knowledge graphs)

Thibaut SOULARD; Fatiha SAÏS; Joe RAAD

doi:10.5802/roia.91

Thibaut SOULARD ¹ ; Fatiha SAÏS ¹ ; Joe RAAD ¹

¹ LISN, CNRS (UMR 9015), Université Paris Saclay (France)

Revue Ouverte d'Intelligence Artificielle, Post-actes de la conférence Ingénierie des Connaissances (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 9-33

Abstract

Data linking in knowledge graphs is a crucial, long-standing problem: it consists in determining identity links between entity descriptions in two graphs that refer to the same real-world entity (e.g., the same person, the same book, the same protein). Keys, which are subsets of properties that identify each instance in a graph, play an important role in discovering these identity links. The classical key-based approach to data linking discovers a set of keys in each graph and then merges the two sets of keys. However, this approach can reduce the number of keys that are valid in both graphs and can sometimes be very computationally expensive. In this work, we propose a new data linking approach that transfers keys discovered on a source graph to a target graph involved in the data linking task. The transfer relies on a property alignment known a priori and on metrics that validate the quality of the keys transferred to the target graph. We conducted experiments on several datasets extracted from the web of data (DBpedia, Wikidata, and YAGO) to evaluate the quality of the data linking and the gain in computation time obtained through key transfer.
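The idea of transferring a key through a property alignment and then checking it on the target graph can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy graph, the property names (`s:title`, `t:label`, and so on), the helper names `transfer`, `exceptions`, and `coverage`, and the simplified key semantics (two instances conflict when they share at least one value on every key property). The abstract does not name the validation metrics, so `coverage` is only a hypothetical stand-in, not the authors' measure.

```python
from itertools import combinations

# Toy target graph (invented data): instance -> {property: set of values}.
target = {
    "t:b1": {"t:label": {"Dune"}, "t:published": {"1965"}},
    "t:b2": {"t:label": {"Dune"}, "t:published": {"1984"}},
    "t:b3": {"t:label": {"Emma"}},
}

# Property alignment assumed to be known a priori (source -> target).
alignment = {"s:title": "t:label", "s:year": "t:published"}


def transfer(key, alignment):
    """Rewrite a source key with target properties; None if a property is unaligned."""
    if not all(p in alignment for p in key):
        return None
    return frozenset(alignment[p] for p in key)


def exceptions(graph, key):
    """Pairs of distinct instances sharing at least one value on every key property."""
    # Only instances that have a value for every key property are compared.
    described = [(i, d) for i, d in graph.items() if all(d.get(p) for p in key)]
    return [
        (a, b)
        for (a, da), (b, db) in combinations(described, 2)
        if all(da[p] & db[p] for p in key)
    ]


def coverage(graph, key):
    """Hypothetical quality metric: share of instances described by all key properties."""
    return sum(all(d.get(p) for p in key) for d in graph.values()) / len(graph)


source_key = frozenset({"s:title", "s:year"})
target_key = transfer(source_key, alignment)

print(target_key, exceptions(target, target_key), coverage(target, target_key))
print(exceptions(target, frozenset({"t:label"})))
```

In this toy setting the transferred key `{t:label, t:published}` has no conflicting pair on the target graph, whereas `{t:label}` alone does, because two instances share the same label. Each conflicting pair names the instances and properties involved, which is what makes a key-based decision easy to explain. The pairwise comparison is quadratic and is meant only to show the principle, not to scale.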

Published: 2025-11-21

Keywords: data linking, key discovery, knowledge graphs, key transfer, explainability

License: CC-BY 4.0

Copyright: The authors retain their rights

@article{ROIA_2025__6_1-2_9_0,
     author = {Thibaut SOULARD and Fatiha SA\"IS and Joe RAAD},
     title = {\'Etude de transf\'erabilit\'e de cl\'es pour un liage explicable de donn\'ees entre graphes de connaissances},
     journal = {Revue Ouverte d'Intelligence Artificielle},
     pages = {9--33},
     year = {2025},
     publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
     volume = {6},
     number = {1-2},
     doi = {10.5802/roia.91},
     language = {fr},
     url = {https://roia.centre-mersenne.org/articles/10.5802/roia.91/}
}
