Presentation

Institutional context

The team is made up of staff from INRA and AgroParisTech. It forms part of the Met@risk Research Unit, which in turn belongs to the Applied Mathematics and Informatics Division (MIA) of INRA (http://www.inra.fr/mia/) and the Mathematical Modeling, Informatics and Physics Department (MMIP) of AgroParisTech (http://www.agroparistech.fr/spip.php?rubrique652).

General context and field of application

Our research lies at the interface between artificial intelligence and databases. We are particularly interested in the representation of heterogeneous and imprecise data, their validation, and their extended querying using techniques borrowed from fuzzy logic. We are also studying the automatic completion of databases using information extracted from the Web.
The field of application of our work is the prevention of microbiological and chemical risks in foods. In this context, our team is building a thematic data warehouse intended to hold data from different sources: scientific literature, information from industrial partners, etc.

Our research themes are as follows:

Data integration

We have chosen to handle data heterogeneity by building a data warehouse that integrates three separate data sources behind a single query interface (cf. the figure showing the architecture of the MIEL++ query system): a relational database containing stable data, a conceptual graph database containing semi-structured data, and an XML/RDF database containing data extracted semi-automatically from the Web. The three databases are indexed using the same ontology, which holds knowledge about the application domain (here, the prevention of microbiological and chemical risks in foods). A MIEL++ query is executed simultaneously by the three sub-systems. Their responses are presented to the user in a single format (a table), so the user does not need to know the internal organization of the warehouse.
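
As a rough illustration of the single query interface, the sketch below sends one query to three stand-in sub-systems in parallel and merges their answers into one table. It is not the MIEL++ implementation: the names `in_memory_subsystem` and `run_query`, the column names, and the rows are all hypothetical, and each store is reduced to an in-memory list.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, str]
Criteria = Dict[str, str]
SubSystem = Callable[[Criteria], List[Row]]


def in_memory_subsystem(rows: List[Row]) -> SubSystem:
    """Stand-in for one store (relational, conceptual graph or XML/RDF)."""
    def search(criteria: Criteria) -> List[Row]:
        # Keep the rows whose attributes equal every selection criterion.
        return [r for r in rows
                if all(r.get(k) == v for k, v in criteria.items())]
    return search


def run_query(subsystems: List[SubSystem], criteria: Criteria,
              columns: List[str]) -> List[Row]:
    """Send the same query to every sub-system and merge into one table."""
    with ThreadPoolExecutor(max_workers=max(1, len(subsystems))) as pool:
        partial_results = list(pool.map(lambda s: s(criteria), subsystems))
    # Project every answer onto the same columns so the caller sees one
    # format, whatever store the row came from.
    return [{c: row.get(c, "") for c in columns}
            for rows in partial_results for row in rows]


# Placeholder rows, invented for illustration only.
relational = in_memory_subsystem(
    [{"product": "cheese", "contaminant": "X", "value": "1.2"}])
graphs = in_memory_subsystem(
    [{"product": "cheese", "contaminant": "Y"}])
web = in_memory_subsystem(
    [{"product": "cheese", "contaminant": "X", "value": "0.9"},
     {"product": "bread", "contaminant": "X"}])

table = run_query([relational, graphs, web], {"product": "cheese"},
                  ["product", "contaminant", "value"])
for row in table:
    print(row)
```

The caller only sees `table`; which store produced a given row is deliberately hidden, mirroring the idea that the user does not need to know how the warehouse is organized.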

Architecture of the MIEL++ query system

Representation of imprecise data by means of fuzzy sets

We propose to represent imprecise data in the three databases of our warehouse in the form of possibility distributions. To achieve this, we have based our work on Zadeh's theory of fuzzy sets. In particular, we propose extending the conceptual graph model to the representation of imprecise data.
In the context of our application, imprecision appears in predictive microbiology data in several forms: (i) variability linked to the intrinsic complexity of biological processes; (ii) the detection limits of sensors; (iii) imprecise reporting of results in the data sources (scientific publications).
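
One common way to encode a possibility distribution over a numeric domain is a trapezoidal fuzzy set. The sketch below assumes that encoding; the class name `Trapezoid`, its fields, and the numeric values are illustrative and are not taken from the warehouse itself. A precise value, a crisp interval, and a graded "about 7" value all fit the same structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal possibility distribution (an assumed encoding).

    [a, d] is the support (values that are at all possible),
    [b, c] is the core (values that are fully possible).
    """
    a: float
    b: float
    c: float
    d: float

    def __post_init__(self) -> None:
        if not (self.a <= self.b <= self.c <= self.d):
            raise ValueError("expected a <= b <= c <= d")

    def degree(self, x: float) -> float:
        """Possibility degree of x, between 0.0 and 1.0."""
        if self.b <= x <= self.c:
            return 1.0
        if x <= self.a or x >= self.d:
            return 0.0
        if x < self.b:
            # Rising edge between a and b.
            return (x - self.a) / (self.b - self.a)
        # Falling edge between c and d.
        return (self.d - x) / (self.d - self.c)


# Hypothetical measurements, for illustration only.
precise = Trapezoid(7.0, 7.0, 7.0, 7.0)    # a single exact value
interval = Trapezoid(6.0, 6.0, 7.0, 7.0)   # "between 6 and 7"
about_7 = Trapezoid(6.0, 6.5, 7.5, 8.0)    # "about 7", with graded edges

print(about_7.degree(7.0), about_7.degree(6.25), about_7.degree(8.5))
```

Treating precise values and intervals as degenerate trapezoids means that a single matching routine can handle precise and imprecise data alike.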

Flexible mechanisms for database queries to make up for incompleteness

To compensate for the incompleteness of our data warehouse, we let users express preferences (represented by fuzzy sets) in the selection criteria of their queries. Each answer to a query is associated with a degree of relevance, which measures how well the answer matches the fuzzy selection criteria. We have focused in particular on extending this query mechanism to data whose domain of definition is a specialization hierarchy.
In the context of our application, the incompleteness of the warehouse stems from the scarcity of experimental data on contamination; it is unrealistic to expect the database to hold information for every pair of food product and contaminant of interest.
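
The sketch below shows one way such a degree could be computed, under stated assumptions. It uses the standard possibility and necessity measures from possibility theory to compare a fuzzy preference with a possibly imprecise stored value, and it assumes that a degree given to a term of the hierarchy is inherited by its specializations unless they carry their own degree. The function names, the toy hierarchy, and the sample answers are hypothetical; the propagation rule used by the team may differ.

```python
from typing import Dict, List, Tuple

Fuzzy = Dict[str, float]          # symbolic value -> degree in [0, 1]
Hierarchy = Dict[str, List[str]]  # term -> its direct specializations


def expand(pref: Fuzzy, children: Hierarchy) -> Fuzzy:
    """Propagate each degree down the hierarchy (assumed to be a tree)."""
    expanded = dict(pref)

    def visit(term: str, degree: float) -> None:
        for child in children.get(term, []):
            if child in pref:
                continue  # an explicit degree wins over an inherited one
            expanded[child] = degree
            visit(child, degree)

    for term, degree in pref.items():
        visit(term, degree)
    return expanded


def possibility(data: Fuzzy, pref: Fuzzy) -> float:
    """sup over x of min(pi_data(x), mu_pref(x)); data must be non-empty."""
    return max(min(p, pref.get(v, 0.0)) for v, p in data.items())


def necessity(data: Fuzzy, pref: Fuzzy) -> float:
    """inf over x of max(1 - pi_data(x), mu_pref(x)); data must be non-empty."""
    return min(max(1.0 - p, pref.get(v, 0.0)) for v, p in data.items())


def rank(answers: List[Tuple[str, Fuzzy]], pref: Fuzzy,
         children: Hierarchy) -> List[Tuple[str, float, float]]:
    """Score each answer, drop impossible ones, and sort best first."""
    expanded = expand(pref, children)
    scored = [(name, possibility(d, expanded), necessity(d, expanded))
              for name, d in answers]
    kept = [s for s in scored if s[1] > 0.0]
    return sorted(kept, key=lambda s: (s[1], s[2]), reverse=True)


# Toy hierarchy and answers, invented for illustration only.
children = {"dairy product": ["milk", "cheese"],
            "milk": ["whole milk", "skimmed milk"]}
preference = {"milk": 1.0, "cheese": 0.25}
answers = [("r1", {"whole milk": 1.0}),
           ("r2", {"cheese": 1.0}),
           ("r3", {"whole milk": 1.0, "cheese": 0.5}),
           ("r4", {"fish": 1.0})]

for name, poss, nec in rank(answers, preference, children):
    print(name, poss, nec)
```

Because the preference on "milk" is inherited by "whole milk", the answer `r1` is returned even though the query never mentions that term, which is how a flexible query can still produce ranked answers when exact matches are scarce.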

Automatic enrichment of the data warehouse with data extracted from the Web

This automatic enrichment of our warehouse offers another answer to its incompleteness. We are working on extracting information from scientific publications that contain data tables. These documents, which are mainly available in PDF or HTML format, are transformed into XML/RDF documents that are semantically annotated using the warehouse ontology so that they can be queried.
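
To make the idea of semantic annotation concrete, here is a deliberately simplified sketch. It assumes the table has already been extracted from the PDF or HTML document, matches column headers against a tiny hypothetical ontology by label, and emits plain (subject, predicate, object) tuples as a stand-in for RDF triples. The ontology fragment, the `has...` predicate names, and the function names are assumptions made for illustration; the actual AQWEB pipeline is not described at this level of detail in the text.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ontology fragment: concept -> labels it may appear under.
ONTOLOGY: Dict[str, Set[str]] = {
    "FoodProduct": {"food", "food product", "product"},
    "Microorganism": {"microorganism", "organism"},
    "Temperature": {"temperature", "temp"},
}


def concept_for(header: str) -> Optional[str]:
    """Map a column header to an ontology concept, or None if unknown."""
    # Drop a trailing unit such as "(°C)" before comparing labels.
    key = header.split("(")[0].strip().lower()
    for concept, labels in ONTOLOGY.items():
        if key in labels:
            return concept
    return None


def annotate(headers: List[str], rows: List[List[str]],
             doc_id: str) -> List[Triple]:
    """Turn each recognized cell into a triple about its table row."""
    concepts = [concept_for(h) for h in headers]
    triples: List[Triple] = []
    for i, row in enumerate(rows):
        subject = f"{doc_id}#row{i}"
        for concept, cell in zip(concepts, row):
            if concept is not None:  # unrecognized columns are skipped
                triples.append((subject, f"has{concept}", cell))
    return triples


# Made-up table content, for illustration only.
triples = annotate(["Product", "Temp (°C)", "Notes"],
                   [["cheese", "4", "n/a"]],
                   "doc1")
for t in triples:
    print(t)
```

Columns that match no ontology label are simply skipped here; a real system would more likely use approximate matching and keep a confidence score for each annotation.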

Architecture of the WEB data acquisition system, AQWEB

Data validation

Our work has also addressed the syntactic and semantic validation of a database expressed as conceptual graphs. Semantic validation is performed against a set of constraints that we treat as expert knowledge external to the database, supplied only for validation purposes. These constraints fall into two categories: (i) negative constraints, which are used to check database consistency, and (ii) positive constraints, which are used to check database completeness.
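
The two categories of constraints can be illustrated with a small sketch. It replaces conceptual graphs with flat records and graph patterns with Python predicates, which is a strong simplification. A negative constraint describes a situation that must never occur; a positive constraint says that whenever a trigger holds, some required information must also be present. All names, fields, and rules below are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]
Check = Callable[[Record], bool]


@dataclass
class NegativeConstraint:
    """A pattern that must never hold; a match signals an inconsistency."""
    name: str
    forbidden: Check


@dataclass
class PositiveConstraint:
    """If `trigger` holds then `required` must hold; otherwise data is missing."""
    name: str
    trigger: Check
    required: Check


def validate(records: List[Record],
             negatives: List[NegativeConstraint],
             positives: List[PositiveConstraint]) -> List[Tuple[int, str, str]]:
    """Return (record index, kind of problem, constraint name) tuples."""
    report: List[Tuple[int, str, str]] = []
    for i, rec in enumerate(records):
        for neg in negatives:
            if neg.forbidden(rec):
                report.append((i, "inconsistent", neg.name))
        for pos in positives:
            if pos.trigger(rec) and not pos.required(rec):
                report.append((i, "incomplete", pos.name))
    return report


# Hypothetical expert rules, kept outside the data they validate.
negatives = [NegativeConstraint(
    "min temperature above max temperature",
    lambda r: "t_min" in r and "t_max" in r and r["t_min"] > r["t_max"])]
positives = [PositiveConstraint(
    "a growth rate needs a temperature",
    lambda r: "growth_rate" in r,
    lambda r: "t_min" in r or "t_max" in r)]

# Made-up records, for illustration only.
records = [{"t_min": 4.0, "t_max": 8.0, "growth_rate": 0.5},
           {"t_min": 10.0, "t_max": 8.0},
           {"growth_rate": 0.25}]

for problem in validate(records, negatives, positives):
    print(problem)
```

Keeping the constraints in separate lists, apart from the records, reflects the idea that they are external expert knowledge supplied only for validation.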

Writing: Unité Mét@risk, P. Buche
Creation date: 23 August 2008
Update: 06 March 2009