Construction et exploitation d’un graphe de connaissances sur la littérature scientifique en sciences de la vie
Revue Ouverte d'Intelligence Artificielle, Post-actes de la conférence Ingénierie des Connaissances (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 107-129

In this article, we present a knowledge graph built from a corpus of scientific articles on genomic selection methods for wheat cultivation. The main purpose of these methods is to improve the agronomic profile and quality of wheat varieties, and the scientific literature on the subject has grown steadily over the last twenty years. First, a natural language processing (NLP) tool enabled us to extract and normalize named entities by linking them to concepts previously defined in relevant domain ontologies. These entities refer to the names of genes, traits, phenotypes, markers and wheat varieties (cultivars). The graph integrates, structures and describes these entities using the W3C Web Annotation Ontology (OA), which lets us formalize the description of the context in which each entity appears in the text. In this way, the graph captures where entities occur within the corpus and provides indications of the links and frequent associations between them. Using a set of competency questions formulated by a domain expert, we validated the relevance of the proposed model and, consequently, of the generated knowledge graph. To make the graph accessible to a large number of users, we developed several search and visualization interfaces for exploring the contexts in which several entities (of different types) appear in the same article. This work contributes to the structuring, understanding and exploration of knowledge in the field of wheat genomic selection by providing a formal framework for discovering and analyzing relationships between relevant entities in the domain. We propose an end-to-end knowledge engineering method that is generic and adaptable to the exploration and analysis of scientific literature corpora in any scientific domain.
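
The idea of annotating entity mentions and then asking which entities of different types appear in the same article can be illustrated with a minimal sketch. It is not the authors' implementation: the graph described above is expressed with the OA ontology, whereas this sketch swaps RDF for plain Python records. The `Annotation` fields, the `cooccurring_pairs` helper and every identifier in the toy data are assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One entity mention, loosely mirroring an annotation's body and target."""
    article: str      # hypothetical article identifier (where the mention occurs)
    exact: str        # surface text of the mention
    concept: str      # hypothetical concept identifier the mention is normalized to
    entity_type: str  # e.g. "gene", "trait", "phenotype", "marker", "variety"


def cooccurring_pairs(annotations, type_a, type_b):
    """Count the articles in which a type_a concept and a type_b concept both appear."""
    # Group normalized concepts by article; a set removes repeated mentions.
    by_article = {}
    for ann in annotations:
        by_article.setdefault(ann.article, set()).add((ann.entity_type, ann.concept))

    counts = Counter()
    for concepts in by_article.values():
        left = sorted(c for t, c in concepts if t == type_a)
        right = sorted(c for t, c in concepts if t == type_b)
        for a in left:
            for b in right:
                counts[(a, b)] += 1  # one count per article, not per mention
    return counts


# Toy data: all article, gene and trait identifiers are invented placeholders.
ANNOTATIONS = [
    Annotation("article-1", "G1", "gene:G1", "gene"),
    Annotation("article-1", "trait one", "trait:T1", "trait"),
    Annotation("article-2", "G1", "gene:G1", "gene"),
    Annotation("article-2", "first trait", "trait:T1", "trait"),
    Annotation("article-2", "G2", "gene:G2", "gene"),
    Annotation("article-3", "trait two", "trait:T2", "trait"),
]

if __name__ == "__main__":
    pairs = cooccurring_pairs(ANNOTATIONS, "gene", "trait")
    for (gene, trait), n in pairs.most_common():
        print(f"{gene} / {trait}: {n} article(s)")
```

Counting once per article rather than once per mention keeps a long article that repeats the same gene name from dominating the association counts. A competency question such as "which genes are mentioned together with a given trait?" then reduces to filtering the returned pairs.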

DOI: 10.5802/roia.95
Keywords: linked data, ontologies, semantic annotation, knowledge graphs, text mining.

Nadia Yacoubi Ayadi 1 ; Catherine Faron 2 ; Franck Michel 3 ; Robert Bossy 4 ; Arnaud Barbe 2

1 Université de Lyon 1, CNRS, LIRIS, UMR 5205 (France)
2 Université Côte d’Azur, Inria, CNRS, I3S, UMR 7271 (France)
3 Université Côte d’Azur, CNRS, Inria, I3S, UMR 7271 (France)
4 MaIAGE, INRAE, Université Paris-Saclay, 78350 Jouy-en-Josas (France)
License: CC-BY 4.0
Copyright: The authors retain their rights
@article{ROIA_2025__6_1-2_107_0,
     author = {Nadia Yacoubi Ayadi and Catherine Faron and Franck Michel and Robert Bossy and Arnaud Barbe},
     title = {Construction et exploitation d{\textquoteright}un graphe de connaissances sur la litt\'erature scientifique en sciences de la vie},
     journal = {Revue Ouverte d'Intelligence Artificielle},
     pages = {107--129},
     year = {2025},
     publisher = {Association pour la diffusion de la recherche francophone en intelligence artificielle},
     volume = {6},
     number = {1-2},
     doi = {10.5802/roia.95},
     language = {fr},
     url = {https://roia.centre-mersenne.org/articles/10.5802/roia.95/}
}
Nadia Yacoubi Ayadi; Catherine Faron; Franck Michel; Robert Bossy; Arnaud Barbe. Construction et exploitation d’un graphe de connaissances sur la littérature scientifique en sciences de la vie. Revue Ouverte d'Intelligence Artificielle, Post-actes de la conférence Ingénierie des Connaissances (IC 2021-2022-2023), Volume 6 (2025) no. 1-2, pp. 107-129. doi: 10.5802/roia.95
