Presentation
Institutional context
The team is made up of staff from INRA and AgroParisTech. It forms part of the Met@risk Research Unit, which in turn belongs to the Applied Mathematics and Informatics Division (MIA) of INRA (http://www.inra.fr/mia/) and the Mathematical Modeling, Informatics and Physics Department (MMIP) of AgroParisTech (http://www.agroparistech.fr/spip.php?rubrique652).
General context and field of application
Our research lies at the interface between artificial intelligence and databases. We are particularly interested in the representation of heterogeneous and imprecise data, their validation and their extended querying through the use of techniques borrowed from fuzzy logic. We are also studying the automatic completion of databases using information extracted from the Web.
Our research themes are as follows:
Data integration
We have chosen to deal with the heterogeneity of data by constructing a data warehouse that integrates three separate data sources behind the same query interface (cf. Figure showing the architecture of the MIEL++ query system): a relational database that contains stable data, a conceptual graph database containing semi-structured data, and an XML/RDF database containing data extracted semi-automatically from the Web. These three databases are indexed using the same ontology, which contains knowledge on the area of application studied (in this case, the prevention of microbiological and chemical risks in foods). A MIEL++ query is executed simultaneously by the three sub-systems. The responses from these sub-systems are presented in a single format (a table) to the user, who does not need to know the internal organization of the warehouse.
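As an illustration only, the idea of one query answered by three sub-systems and returned as a single table can be sketched as follows. This is a minimal sketch and not the MIEL++ implementation: the names `make_subsystem` and `run_query`, the in-memory rows, and the attribute names are all hypothetical, and the sub-systems are called one after the other here rather than simultaneously.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Query = Dict[str, str]
# Hypothetical interface: a sub-system takes a query and returns matching rows.
SubSystem = Callable[[Query], List[Row]]


def make_subsystem(source: str, rows: List[Row]) -> SubSystem:
    """Build a toy sub-system that filters an in-memory list of rows.

    A real sub-system would translate the query into SQL, a conceptual
    graph projection, or an XML/RDF query; that part is not modelled here.
    """
    def run(query: Query) -> List[Row]:
        return [
            {**row, "source": source}  # tag each answer with its origin
            for row in rows
            if all(row.get(attr) == value for attr, value in query.items())
        ]
    return run


def run_query(query: Query, subsystems: List[SubSystem]) -> List[Row]:
    """Send the same query to every sub-system and merge the answers."""
    table: List[Row] = []
    for subsystem in subsystems:
        table.extend(subsystem(query))
    return table


# Invented sample data, one toy sub-system per kind of database.
relational = make_subsystem(
    "relational", [{"food": "milk", "microorganism": "Listeria"}]
)
conceptual_graphs = make_subsystem(
    "conceptual_graph", [{"food": "cheese", "microorganism": "Listeria"}]
)
xml_rdf = make_subsystem(
    "xml_rdf", [{"food": "milk", "microorganism": "Salmonella"}]
)

table = run_query(
    {"microorganism": "Listeria"}, [relational, conceptual_graphs, xml_rdf]
)
for row in table:
    print(row)
```

The caller only sees `table`; which sub-system produced a row is visible solely through the added `source` field, which mirrors the point that the user does not need to know the internal organization of the warehouse.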
Representation of imprecise data by means of fuzzy sets
We propose to represent imprecise data in the three databases of our warehouse in the form of possibility distributions. To achieve this, we have based our work on Zadeh's theory of fuzzy sets. In particular, we propose extending the conceptual graph model to the representation of imprecise data.
Flexible mechanisms for database queries to make up for incompleteness
To make up for the incompleteness of our data warehouse, we provide users with an opportunity to express preferences (represented by fuzzy sets) in the selection criteria of their queries to the data warehouse. A degree of pertinence is associated with each of the answers corresponding to their query. This degree measures the level of adequacy of the answer to the fuzzy selection criteria of the query. We have focused in particular on extending this query mechanism to data whose definition domain is a specialization hierarchy.
Automatic enrichment of the data warehouse with data extracted from the Web
This automatic enrichment of our warehouse also provides a solution to its incompleteness. We are working on extracting information from scientific publications containing tables of data. These documents, which are mainly available in PDF or HTML format, are transformed into XML/RDF documents labeled semantically using the ontology of the warehouse so that they can be interrogated.
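To make the fuzzy-set representation and the flexible querying described above more concrete, here is a minimal sketch for numeric data. It rests on several assumptions that are not stated in the text: fuzzy sets are taken to be trapezoidal, the degree of pertinence is computed as a possibility degree (the supremum of the minimum of the two membership functions, a standard measure in possibility theory), and the supremum is approximated on a regular grid. The names `Trapezoid` and `pertinence` and the pH values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trapezoid:
    """Trapezoidal fuzzy set with support [a, d] and kernel [b, c].

    Assumes a <= b <= c <= d. A crisp value v is Trapezoid(v, v, v, v)
    and a crisp interval [u, v] is Trapezoid(u, u, v, v).
    """
    a: float
    b: float
    c: float
    d: float

    def membership(self, x: float) -> float:
        if self.b <= x <= self.c:
            return 1.0
        if x <= self.a or x >= self.d:
            return 0.0
        if x < self.b:
            return (x - self.a) / (self.b - self.a)
        return (self.d - x) / (self.d - self.c)


def pertinence(preference: Trapezoid, datum: Trapezoid, steps: int = 1000) -> float:
    """Approximate sup_x min(preference(x), datum(x)) on the datum's support.

    The datum is read as a possibility distribution and the preference as
    a fuzzy selection criterion. This is one possible adequacy measure,
    assumed here for illustration.
    """
    lo, hi = datum.a, datum.d
    if lo == hi:  # crisp datum: a single point to evaluate
        return min(preference.membership(lo), datum.membership(lo))
    xs = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    return max(min(preference.membership(x), datum.membership(x)) for x in xs)


# Hypothetical preference: pH ideally in [6, 7], tolerated down to 5 and up to 8.
preferred_ph = Trapezoid(5.0, 6.0, 7.0, 8.0)

inside = pertinence(preferred_ph, Trapezoid(6.5, 6.5, 6.8, 6.8))  # interval datum
border = pertinence(preferred_ph, Trapezoid(7.5, 7.5, 7.5, 7.5))  # crisp datum
outside = pertinence(preferred_ph, Trapezoid(9.0, 9.0, 10.0, 10.0))
print(inside, border, outside)
```

Answers can then be ranked by decreasing degree, so that a datum only partly compatible with the preference is still returned, but below fully compatible ones. The extension to domains organized as a specialization hierarchy is not covered by this sketch.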
Figure: Architecture of the Web data acquisition system, AQWEB
Data validation
Our work has also focused on the syntactic and semantic validation of a database expressed in terms of conceptual graphs. The semantic validation of a database is achieved relative to a series of constraints that we consider as expert knowledge external to the database, only supplied for the purposes of validation. These constraints are placed in two categories: (i) negative constraints that allow the study of database coherence, and (ii) positive constraints that allow the study of database completeness.
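The distinction between the two categories of constraints can be illustrated with a deliberately simplified sketch. It assumes facts stored as flat dictionaries rather than conceptual graphs, reads a negative constraint as a pattern that must never occur, and reads a positive constraint as "whenever a condition holds, an obligation must hold too". The function names, the sample facts, and both example constraints are hypothetical.

```python
from typing import Callable, Dict, List

Fact = Dict[str, object]
Predicate = Callable[[Fact], bool]


def check_negative(facts: List[Fact], forbidden: Predicate) -> List[Fact]:
    """Return the facts matching a forbidden pattern (coherence violations)."""
    return [fact for fact in facts if forbidden(fact)]


def check_positive(
    facts: List[Fact], condition: Predicate, obligation: Predicate
) -> List[Fact]:
    """Return the facts that meet the condition but lack the obligation
    (completeness violations)."""
    return [fact for fact in facts if condition(fact) and not obligation(fact)]


# Invented sample facts.
facts: List[Fact] = [
    {"id": 1, "microorganism": "Listeria", "ph": 6.5, "temperature": 4.0},
    {"id": 2, "microorganism": "Salmonella", "ph": 15.2, "temperature": 37.0},
    {"id": 3, "microorganism": "Listeria", "ph": 7.0},
]

# Negative constraint: a pH value outside [0, 14] must not appear.
incoherent = check_negative(
    facts, lambda f: "ph" in f and not (0 <= f["ph"] <= 14)
)

# Positive constraint: a fact naming a microorganism must give a temperature.
incomplete = check_positive(
    facts,
    condition=lambda f: "microorganism" in f,
    obligation=lambda f: "temperature" in f,
)

print([f["id"] for f in incoherent])
print([f["id"] for f in incomplete])
```

The constraints are passed in from outside and never stored with the facts, which matches the view of them as expert knowledge supplied only for validation.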
Written by:
Unité Mét@risk, P. Buche
Creation date: 23 August 2008. Updated: 06 March 2009.