Découverte de cardinalités maximales significatives dans des bases de connaissances

Arnaud Giacometti; Béatrice Markhoff; Arnaud Soulet

doi:10.5802/roia.30

Arnaud Giacometti ¹ ; Béatrice Markhoff ¹ ; Arnaud Soulet ¹

¹ Université de Tours - LIFAT, Blois, France

Revue Ouverte d'Intelligence Artificielle, Post-actes de la Conférence Nationale en Intelligence Artificielle (CNIA 2018-2020), Volume 3 (2022) no. 3-4, pp. 223-251.

Abstract (translated from the original French)

Semantic web knowledge bases are generated from collaborative platforms or by integrating diverse sources. This inevitably leads to missing information, errors, and inconsistencies. Moreover, programs that extract knowledge from these sources cannot treat the absence of a piece of information from the knowledge base as evidence that it does not exist. The queried source must therefore be equipped with complementary information that makes it possible to determine when a queried relation can be considered complete. The large volume of some knowledge bases allows us to use Hoeffding’s inequality to extract significant cardinality rules from them. Experiments conducted on DBpedia and on a numismatic knowledge base demonstrate the feasibility of the approach and the relevance of the extracted constraints.

Abstract (authors' English version)

Large semantic web knowledge bases (KBs) are generated from collaborative platforms or by integrating various sources. This naturally leads to missing information and inconsistencies. Moreover, missing data must not be treated as nonexistent data. Applications that query a KB’s content need complementary information to decide whether the queried data is complete. The volume of a KB makes it possible to discover this kind of information. We present an algorithm for extracting significant maximum cardinality rules from a knowledge base. We use Hoeffding’s inequality to define the likelihood that a constraint is significant. Experiments conducted on DBpedia and on a numismatic knowledge base resulting from an integration process demonstrate the feasibility of the approach and the relevance of the discovered contextual constraints.
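To illustrate why the volume of a knowledge base matters when Hoeffding's inequality is used, the following minimal sketch tests a candidate maximum cardinality for one relation. It is not the authors' algorithm, which the abstract does not detail. The function names, the `delta` and `min_confidence` parameters, and the toy triples are hypothetical, and the sketch assumes that subjects can be treated as independent observations and ignores missing facts.

```python
import math
from collections import Counter


def hoeffding_lower_bound(p_hat: float, n: int, delta: float) -> float:
    """One-sided Hoeffding bound for a proportion estimated from n samples.

    With probability at least 1 - delta, the true proportion is at least
    p_hat - sqrt(ln(1 / delta) / (2 * n)).
    """
    if n <= 0:
        return 0.0
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps)


def significant_max_cardinality(triples, relation, delta=0.05, min_confidence=0.95):
    """Return the smallest k such that 'at most k objects per subject' holds
    with a Hoeffding-corrected confidence >= min_confidence, or None.

    Assumption (illustrative only): each subject is one independent sample.
    """
    # Deduplicate (subject, object) pairs, then count objects per subject.
    pairs = {(s, o) for s, p, o in triples if p == relation}
    counts = Counter(s for s, _ in pairs)
    n = len(counts)
    if n == 0:
        return None
    for k in range(1, max(counts.values()) + 1):
        # Observed share of subjects that respect the candidate bound k.
        p_hat = sum(1 for c in counts.values() if c <= k) / n
        if hoeffding_lower_bound(p_hat, n, delta) >= min_confidence:
            return k
    return None


# Toy data: 1000 subjects, 5 of which have two birth places.
big = [(f"person{i}", "birthPlace", "cityA") for i in range(1000)]
big += [(f"person{i}", "birthPlace", "cityB") for i in range(5)]

# Same pattern with only 10 subjects: too few to conclude anything.
small = [(f"person{i}", "birthPlace", "cityA") for i in range(10)]

print(significant_max_cardinality(big, "birthPlace"))
print(significant_max_cardinality(small, "birthPlace"))
```

With 1000 subjects the correction term is about 0.039, so a bound of one birth place per subject passes the 0.95 threshold. With 10 subjects the correction term is about 0.39, so no bound is accepted even though every observed subject respects it.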

Received: 2018-03-01
Revised: 2019-02-08
Accepted: 2019-02-08
Published: 2022-04-08

Mots-clés : Découverte de cardinalité, contraintes contextuelles, bases de connaissances.
Keywords: Cardinality Mining, Contextual Constraints, Knowledge Base.

License: CC-BY 4.0

Copyright: The authors retain their rights

@article{ROIA_2022__3_3-4_223_0,
     author = {Arnaud Giacometti and B\'eatrice Markhoff and Arnaud Soulet},
     title = {D\'ecouverte de cardinalit\'es maximales significatives dans des bases de connaissances},
     journal = {Revue Ouverte d'Intelligence Artificielle},
     pages = {223--251},
     publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
     volume = {3},
     number = {3-4},
     year = {2022},
     doi = {10.5802/roia.30},
     language = {fr},
     url = {https://roia.centre-mersenne.org/articles/10.5802/roia.30/}
}

Arnaud Giacometti; Béatrice Markhoff; Arnaud Soulet. Découverte de cardinalités maximales significatives dans des bases de connaissances. Revue Ouverte d'Intelligence Artificielle, Post-actes de la Conférence Nationale en Intelligence Artificielle (CNIA 2018-2020), Volume 3 (2022) no. 3-4, pp. 223-251. doi : 10.5802/roia.30. https://roia.centre-mersenne.org/articles/10.5802/roia.30/
