In this article, we present a knowledge graph built from a corpus of scientific articles on genomic selection methods for wheat breeding. The main purpose of these methods is to improve the agronomic profile and quality of wheat varieties, and the scientific literature on the subject has been growing steadily over the last twenty years. First, an NLP tool enabled us to extract and normalize various named entities by linking them to concepts previously defined in relevant domain ontologies. These entities are names of wheat genes, traits, phenotypes, markers and varieties (cultivars). The graph presented in this work structures and integrates these entities based on the W3C Web Annotation Ontology (OA), which we use to formalize the description of the context in which each entity appears in the text. In this way, the graph captures where these entities appear within the corpus and provides indications of the links and frequent associations between them. Using a set of competency questions formulated by a domain expert, we validated the relevance of the proposed model and, consequently, of the generated knowledge graph. To make the graph accessible to a wide range of users, we developed several search and visualization interfaces for exploring the contexts in which several entities (of different types) appear in the same article. This work contributes to the structuring, understanding and exploration of knowledge in the field of wheat genomic selection by providing a formal framework for discovering and analyzing relationships between relevant entities of the domain. We propose an end-to-end knowledge engineering methodology that is both generic and adaptable, designed to facilitate the exploration and analysis of scientific literature corpora across diverse disciplinary domains.
Keywords: Linked Data, ontologies, semantic annotation, knowledge graphs, text mining.
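The abstract does not spell out the RDF modelling, so the following is only a minimal Python sketch, assuming that each entity mention is stored as an oa:Annotation whose body (oa:hasBody) is the normalized ontology concept and whose target (oa:hasTarget) is a character span of an article, selected with an oa:TextPositionSelector. The oa: terms come from the W3C Web Annotation Vocabulary; the ex: URIs, the helper names (annotate, articles_by_concept, co_occurring, frequent_pairs) and the in-memory list of triples are hypothetical stand-ins for the authors' actual identifiers and triple store. The sketch shows how finding entities of different types that appear in the same article, and counting their frequent associations, reduces to a join from an annotation's body to its target's source.

```python
# Minimal sketch (assumption): entity mentions modelled with the W3C Web
# Annotation Vocabulary (oa:), plus a co-occurrence lookup over them.
# All ex: URIs and helper names are hypothetical.
from collections import Counter, defaultdict
from itertools import combinations

# A triple is (subject, predicate, object); strings/ints stand in for RDF terms.
Triple = tuple[str, str, object]


def annotate(ann_id: str, concept: str, article: str,
             start: int, end: int) -> list[Triple]:
    """Describe one entity mention: body = ontology concept,
    target = character span [start, end) of the source article."""
    ann = f"ex:annotation/{ann_id}"
    target = f"{ann}/target"
    selector = f"{ann}/selector"
    return [
        (ann, "rdf:type", "oa:Annotation"),
        (ann, "oa:hasBody", concept),
        (ann, "oa:hasTarget", target),
        (target, "rdf:type", "oa:SpecificResource"),
        (target, "oa:hasSource", article),
        (target, "oa:hasSelector", selector),
        (selector, "rdf:type", "oa:TextPositionSelector"),
        (selector, "oa:start", start),
        (selector, "oa:end", end),
    ]


def articles_by_concept(triples: list[Triple]) -> dict[str, set[str]]:
    """Map each concept to the articles it appears in by following
    annotation -> oa:hasBody and annotation -> oa:hasTarget -> oa:hasSource."""
    body_of: dict[str, str] = {}
    target_of: dict[str, str] = {}
    source_of: dict[str, str] = {}
    for s, p, o in triples:
        if p == "oa:hasBody":
            body_of[s] = str(o)
        elif p == "oa:hasTarget":
            target_of[s] = str(o)
        elif p == "oa:hasSource":
            source_of[s] = str(o)
    index: dict[str, set[str]] = defaultdict(set)
    for ann, concept in body_of.items():
        index[concept].add(source_of[target_of[ann]])
    return index


def co_occurring(index: dict[str, set[str]], a: str, b: str) -> set[str]:
    """Articles in which both concepts appear (empty set if either is unknown)."""
    return index.get(a, set()) & index.get(b, set())


def frequent_pairs(index: dict[str, set[str]]) -> Counter:
    """Number of articles in which each (sorted) pair of concepts co-occurs."""
    concepts_in: dict[str, set[str]] = defaultdict(set)
    for concept, articles in index.items():
        for article in articles:
            concepts_in[article].add(concept)
    counts: Counter = Counter()
    for concepts in concepts_in.values():
        counts.update(combinations(sorted(concepts), 2))
    return counts


# Hypothetical example data: placeholders, not identifiers from the paper's graph.
triples = (
    annotate("1", "ex:gene/GeneA", "ex:article/A1", 120, 125)
    + annotate("2", "ex:trait/TraitB", "ex:article/A1", 300, 311)
    + annotate("3", "ex:gene/GeneA", "ex:article/A2", 42, 47)
)
index = articles_by_concept(triples)
print(sorted(co_occurring(index, "ex:gene/GeneA", "ex:trait/TraitB")))
# → ['ex:article/A1']
print(frequent_pairs(index).most_common(1))
# → [(('ex:gene/GeneA', 'ex:trait/TraitB'), 1)]
```

In a deployed graph, the same join would more typically be expressed as a SPARQL query over oa:hasBody, oa:hasTarget and oa:hasSource; plain Python is used here only to keep the example self-contained.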
Nadia Yacoubi Ayadi¹; Catherine Faron²; Franck Michel³; Robert Bossy⁴; Arnaud Barbe²
CC-BY 4.0
@article{ROIA_2025__6_1-2_107_0,
author = {Nadia Yacoubi Ayadi and Catherine Faron and Franck Michel and Robert Bossy and Arnaud Barbe},
title = {Construction et exploitation d{\textquoteright}un graphe de connaissances sur la litt\'erature scientifique en sciences de la vie},
journal = {Revue Ouverte d'Intelligence Artificielle},
pages = {107--129},
year = {2025},
publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
volume = {6},
number = {1-2},
doi = {10.5802/roia.95},
language = {fr},
url = {https://roia.centre-mersenne.org/articles/10.5802/roia.95/}
}
TY  - JOUR
AU  - Nadia Yacoubi Ayadi
AU  - Catherine Faron
AU  - Franck Michel
AU  - Robert Bossy
AU  - Arnaud Barbe
TI  - Construction et exploitation d’un graphe de connaissances sur la littérature scientifique en sciences de la vie
JO  - Revue Ouverte d'Intelligence Artificielle
PY  - 2025
SP  - 107
EP  - 129
VL  - 6
IS  - 1-2
PB  - Association pour la diffusion de la recherche francophone en intelligence artificielle
UR  - https://roia.centre-mersenne.org/articles/10.5802/roia.95/
DO  - 10.5802/roia.95
LA  - fr
ID  - ROIA_2025__6_1-2_107_0
ER  - 
%0 Journal Article
%A Nadia Yacoubi Ayadi
%A Catherine Faron
%A Franck Michel
%A Robert Bossy
%A Arnaud Barbe
%T Construction et exploitation d’un graphe de connaissances sur la littérature scientifique en sciences de la vie
%J Revue Ouverte d'Intelligence Artificielle
%D 2025
%P 107-129
%V 6
%N 1-2
%I Association pour la diffusion de la recherche francophone en intelligence artificielle
%U https://roia.centre-mersenne.org/articles/10.5802/roia.95/
%R 10.5802/roia.95
%G fr
%F ROIA_2025__6_1-2_107_0
Nadia Yacoubi Ayadi; Catherine Faron; Franck Michel; Robert Bossy; Arnaud Barbe. Construction et exploitation d’un graphe de connaissances sur la littérature scientifique en sciences de la vie. Revue Ouverte d'Intelligence Artificielle, Post-proceedings of the Ingénierie des Connaissances conference (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 107-129. doi: 10.5802/roia.95