Romain David (1), Jean-Pierre Féral (1), Sophie Archambeau, Fanny Arnaud (2), David Auber (3), Nicolas Bailly (4), Loup Bernard (5), Laure Berti-Équille (6, 7), Cyrille Blanpain (8), Vincent Breton (9), Anne Chenuil-Maurel (1), Anna Cohen Nabeiro, Alrick Dias (1), Aurélie Delavaud (10), Robin Goffaud, Sophie Gachet (1), Karina Gibert (11), Manuel Herrera Fernandez, Luc Hogie (12), Dino Ienco (13), Romain Julliard (14), Yvan Le Bras (14), Julien Lecubin (8), Yannick Legre (9), Michelle Leydet (1), Grégoire Lois (14), Bénédicte Madon (15), François Marchal (16), Víctor Méndez Muñoz (17), Jean-Charles Meunier (18), Jean-Baptiste Mihoub (14), Isabelle Mougenot (19), Sophie Pamerlon, Eric Peletier (20), Geneviève Romier (21), Dad Roux-Michollet (22), Alison Specht (23), Christian Surace (18), Jean-Claude Raynal (24), Thierry Tatoni (1)
Abstract: Data produced within marine and terrestrial biodiversity research projects that evaluate and monitor Good Environmental Status have high potential for use by stakeholders involved in environmental management. However, environmental data, especially in ecology, are not readily accessible to the various users who need them. The specific scientific goals of each project, together with the way projects are organized and gather information, lead to decentralized data distribution. In such a heterogeneous system, spread across different organizations and data formats, it is difficult to harmonize the outputs efficiently. A few tools are available to help: for instance, standards and specific protocols can be applied to interconnect databases, and such semantic approaches greatly increase data interoperability. This communication presents recent results and the activity of the IndexMEED consortium (Indexing for Mining Ecological and Environmental Data), which aims to build new approaches, based on graph theory, to investigate complex research questions and support the emergence of new scientific hypotheses (Auber et al. 2014). Current developments in graph-based data mining, their potential contributions to environmental research (particularly to strategic decision-making), and new ways of organizing data will be presented (David et al. 2015). In particular, the consortium is determining how i) to analyze heterogeneous data distributed across different databases, combining molecular and habitat characteristics data [3]; ii) to create matches between records and incorporate approximations; iii) to identify statistical relationships between observed data and the emergence of contextual patterns, using a computation library and a distributed computing center at the European level; and iv) to encourage openness and data sharing in compliance with the general FAIR principles (Findable, Accessible, Interoperable, Re-usable and citable), in order to enhance the value and use of the data.
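To make point ii) more concrete, here is a minimal sketch, under stated assumptions, of how records from two separate databases might be matched approximately and linked as edges of a graph. It is not the IndexMEED implementation: the record layout, the identifiers (`mol:S1`, `hab:A`, ...), the coordinates and the 1 km tolerance are all hypothetical, and geographic proximity (great-circle distance via the haversine formula) is only one possible matching criterion.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt

# Hypothetical records from two independent databases (invented values).
molecular = [
    {"id": "mol:S1", "lat": 43.175, "lon": 5.390},
    {"id": "mol:S2", "lat": 43.004, "lon": 6.390},
]
habitat = [
    {"id": "hab:A", "lat": 43.176, "lon": 5.391},
    {"id": "hab:B", "lat": 43.210, "lon": 5.450},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    p1, p2 = radians(lat1), radians(lat2)
    dlat = p2 - p1
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def build_links(left, right, max_km):
    """Return an undirected adjacency map linking records closer than max_km.

    The tolerance is the "approximation": records need not share an exact
    site key, only lie within max_km of each other.
    """
    graph = defaultdict(set)
    for a in left:
        for b in right:
            if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_km:
                graph[a["id"]].add(b["id"])
                graph[b["id"]].add(a["id"])
    return graph


links = build_links(molecular, habitat, max_km=1.0)
for node, neighbours in sorted(links.items()):
    print(node, "->", sorted(neighbours))
```

The quadratic double loop keeps the idea visible; a real system handling large databases would likely use a spatial index, and could combine several criteria (taxonomy, sampling date, habitat codes) into the matching rule.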
IndexMEED participants are now exploring, through several case studies, the ability of two scientific communities (ecology sensu lato and computer science) to work together. The ECOSCOPE project aims to meet the need for access to structured and complementary omics datasets in order to better understand the state of biodiversity and its dynamics. To this end, the ECOSCOPE case study aims to visualize, through the graph approach, the links between datasets and databases from genetics to ecosystems. Another case study, which displays anthropological fossil data and omics data on the same graph, will also be presented. The European projects DEVOTES (DEVelopment Of innovative Tools for understanding marine biodiversity and assessing good Environmental Status) and CIGESMED (Coralligenous based Indicators to evaluate and monitor the "Good Environmental Status" of the MEDiterranean coastal water), conducted by IMBE, focus on photo quadrats, cartography and omics data from marine hard bottoms in order to discover contextual patterns that can help build decision support systems. In the case study of the French project “65 Millions d’observateurs”, AskOmics is being tested to provide a graph-based querying interface using RDF (Resource Description Framework) and SPARQL technologies. Scientific questions can be addressed by new data mining approaches that offer new ways to investigate heterogeneous environmental data through graph mining (Muñoz et al. 2017). Uses of data from biodiversity research demonstrate the prototype's functionalities (David et al. 2016) and open new perspectives for analyzing environmental and societal responses, including large-scale decision-making, at the information system and observing system levels as well as at the observed system level.
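To illustrate what a graph-based query over RDF-style data involves, the sketch below emulates, in plain Python, how a SPARQL engine evaluates a basic graph pattern: each triple pattern is matched against the data and the resulting variable bindings are joined. This is an assumption-laden illustration, not AskOmics or any RDF library: the `ex:` vocabulary and the triples are hypothetical, and a real deployment would use an actual triple store and SPARQL endpoint.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical observation data as (subject, predicate, object) triples.
TRIPLES: List[Triple] = [
    ("ex:obs1", "ex:species", "ex:speciesA"),
    ("ex:obs1", "ex:site", "ex:site42"),
    ("ex:site42", "ex:habitat", "ex:urban"),
    ("ex:obs2", "ex:species", "ex:speciesB"),
    ("ex:obs2", "ex:site", "ex:site7"),
    ("ex:site7", "ex:habitat", "ex:rural"),
]

# Roughly equivalent SPARQL (hypothetical vocabulary):
#   SELECT ?obs ?sp ?site WHERE {
#     ?obs ex:species ?sp . ?obs ex:site ?site . ?site ex:habitat ex:urban .
#   }
URBAN_QUERY: List[Triple] = [
    ("?obs", "ex:species", "?sp"),
    ("?obs", "ex:site", "?site"),
    ("?site", "ex:habitat", "ex:urban"),
]


def match(pattern: Triple, triple: Triple,
          binding: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check its existing value
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended


def query(patterns: Iterable[Triple],
          triples: List[Triple]) -> List[Dict[str, str]]:
    """Join triple patterns left to right, as in a basic graph pattern."""
    bindings: List[Dict[str, str]] = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


for row in query(URBAN_QUERY, TRIPLES):
    print(row)
```

Only `ex:obs1` satisfies all three patterns, because its site is the only one whose habitat is `ex:urban`. The nested-loop join is deliberately naive; real SPARQL engines use indexes and query planning, but the binding-and-join semantics are the same idea.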
