Semantic web knowledge bases are generated from collaborative platforms or by integrating diverse sources. This inevitably leads to missing information as well as errors and inconsistencies. Moreover, programs that extract knowledge from these sources cannot assume that the absence of a piece of information from the knowledge base means that it does not exist. The queried source must therefore be equipped with complementary information that makes it possible to determine when a queried relation can be considered complete. The large volume of some knowledge bases allows us to use Hoeffding's inequality to extract significant cardinality rules from them. Experiments conducted on DBpedia and on a numismatic knowledge base demonstrate the feasibility of the approach and the relevance of the extracted constraints.
Large semantic web knowledge bases (KBs) are generated from collaborative platforms or by integrating various sources. This naturally leads to missing information and inconsistencies. Moreover, missing data must not be treated as non-existent data. Applications that query the content of these KBs need complementary information to decide whether the queried data is complete. Given the volume of a KB, it is possible to discover this kind of information. We present an algorithm for extracting significant maximum cardinality rules from a knowledge base. We use Hoeffding's inequality to define the likelihood that a constraint is significant. Experiments conducted on DBpedia and on a numismatic knowledge base resulting from an integration process demonstrate the feasibility of the approach and the relevance of the discovered contextual constraints.
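To make the role of Hoeffding's inequality concrete, here is a minimal sketch of how a maximum cardinality could be tested for significance. It is not the algorithm presented in the article: the function names, the `min_conf` and `delta` parameters, and the decision rule are assumptions chosen for illustration. The sketch treats the subjects of a relation as independent samples, which a real knowledge base only approximates, and it ignores the fact that missing facts make observed counts underestimate true cardinalities.

```python
import math
from collections import Counter


def hoeffding_lower_bound(p_hat, n, delta):
    """One-sided Hoeffding lower bound on a true proportion.

    With probability at least 1 - delta, the true proportion is at least
    p_hat - sqrt(ln(1/delta) / (2n)) for n independent samples in [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - eps)


def significant_max_cardinality(counts, min_conf=0.95, delta=0.05):
    """Return the smallest M whose coverage is significant, or None.

    counts holds, for each subject, the number of objects observed for one
    relation. M is accepted (hypothetical rule) when the Hoeffding lower
    bound on the share of subjects with at most M objects reaches min_conf.
    """
    n = len(counts)
    if n == 0:
        return None
    histogram = Counter(counts)
    covered = 0
    for m in sorted(histogram):
        covered += histogram[m]  # subjects with at most m objects
        if hoeffding_lower_bound(covered / n, n, delta) >= min_conf:
            return m
    return None


# Large sample: 995 subjects with one value, 5 subjects with two values.
large = [1] * 995 + [2] * 5
# Small sample: same shape, but too few subjects for the bound to be tight.
small = [1] * 9 + [2]

print(significant_max_cardinality(large))
print(significant_max_cardinality(small))
```

The correction term shrinks as the number of subjects grows, which is why the same observed distribution can yield a significant bound on a large relation and none on a small one.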
Keywords: Cardinality Mining, Contextual Constraints, Knowledge Base.
Arnaud Giacometti; Béatrice Markhoff; Arnaud Soulet
@article{ROIA_2022__3_3-4_223_0,
  author = {Arnaud Giacometti and B\'eatrice Markhoff and Arnaud Soulet},
  title = {D\'ecouverte de cardinalit\'es maximales significatives dans des bases de connaissances},
  journal = {Revue Ouverte d'Intelligence Artificielle},
  pages = {223--251},
  publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
  volume = {3},
  number = {3-4},
  year = {2022},
  doi = {10.5802/roia.30},
  language = {fr},
  url = {https://roia.centre-mersenne.org/articles/10.5802/roia.30/}
}
Arnaud Giacometti; Béatrice Markhoff; Arnaud Soulet. Découverte de cardinalités maximales significatives dans des bases de connaissances. Revue Ouverte d'Intelligence Artificielle, Volume 3 (2022) no. 3-4, pp. 223-251. doi : 10.5802/roia.30. https://roia.centre-mersenne.org/articles/10.5802/roia.30/