17th Italian Symposium on Advanced Database Systems

June 21st - 24th 2009, Camogli (Genova), Italy

Tutorials

Querying Large-Scale Ontologies

Ralf Möller, Hamburg University of Technology, Germany

Terabytes of data will be a common size to be managed on personal computers in the not-too-distant future, and database technology has matured in such a way that this amount of data is manageable to a large extent even if queries are defined in an ad hoc manner. In particular, a common assumption in database applications is that the conceptual schema is used for deriving the implementation schema (e.g., a relational schema), which is then used for storing data. Views might relate notions from the conceptual schema to the implementation schema, and if updates are neglected, views can easily be used to manage a database w.r.t. the conceptual schema. In almost all practical database systems, data is considered to be complete, with a corresponding impact on query answering. In new application contexts such as the semantic web, however, software agents migrate to new sites, and they use their own conceptual schema for querying large data repositories found at different sites. With new schemata being used, data can hardly be seen as complete, and thus query answering w.r.t. incomplete information becomes more and more important. Using ontologies, query answering with respect to views and incomplete data descriptions becomes possible. In the tutorial we present recent advances in query answering techniques for large sets of data descriptions w.r.t. large and expressive ontologies (large sets of axioms specified in an expressive language) under the assumption that data descriptions are incomplete. We present query languages which can be practically used in combination with ontologies of varying expressiveness.

About the Speaker

Ralf Möller has been Professor for Computer Science at Hamburg University of Technology since 2003. From 2001 until 2003 he was Professor for Computer Science at the University of Applied Sciences in Wedel, Germany. In 1996 he received the degree Dr. rer. nat. from the University of Hamburg, and in 2001 he successfully submitted his Habilitation thesis, also at the University of Hamburg. His research interests include software technology for distributed systems as well as the application and theory of conceptual modeling and knowledge representation languages. His research goals encompass the development of practical inference algorithms for embedding description logic systems into software engineering and web technology. Together with Prof. Volker Haarslev (Concordia Univ. Montreal) and Michael Wessel he is the principal architect of the description logic reasoner Racer, which is being used as a core engine for building ontology development tools as well as agent systems for the semantic web by many research groups all around the world. Racer includes an abduction component which is used in the BOEMIE project to formalize multimedia content interpretation.

Prof. Möller was the co-organizer of several international workshops on description logics and is the author of numerous workshop and conference papers as well as several book and journal contributions in this research area. From 2001 to 2004 he was the co-project leader of a DFG project for developing description logic inference systems, in particular for supporting Aboxes and spatial applications (the project was organized in collaboration with B. Neumann and V. Haarslev). Prof. Möller leads the TUHH part of the DFG project PRESINT (PREference-based Scene INTerpretation). He also leads the TUHH group of the EU-funded research projects TONES (FP6-7603), BOEMIE (FP6-027538) and CASAM (FP7-217061).

Data Mining in Drug Discovery

Luca Sartori, Computational Science Group, DAC s.r.l (Genextra Group), Italy

The pharmaceutical industry faces varied needs and high data volumes coming from the most diverse sources. Historically, it all started with chemical structure databases, built to store, search and browse all the compounds and related data in a chemically aware manner. In this environment data integration and data mining are far more complex than in other industries, and the underlying data models, functions and procedures require a major integration effort, both for content and for data types. This lecture will mainly focus on the following topics:

  • Databases in the Pharmaceutical Industry: history, types, needs
  • Data Mining in Drug Discovery
  • Examples from various drug discovery projects, following their main steps: Target Identification (usually a protein or a gene), Hit Identification (usually a small chemical structure), Hit to Lead (optimization of its activity), Lead Optimization (drug profile enhancements), Drug Candidate Selection

About the Speaker

Luca Sartori is a chemist by education; he studied at the University of Milan from 1980 to 1986. At Pharmacia, as Head of the Research Informatics Group, Luca reported to Pieter Stouten, Head of the Computational Sciences Unit, and he was a member of the Global Discovery Database Steering Committee and of the GDD/ChemLink Team. He joined Genextra in March 2006 to build the Computational Science Unit in the Chemistry Department. His main focus is on the design and implementation of chemical, analytical and biological data registration systems and of informatics systems for lab automation in Genextra. His main interest is the support of Genextra projects with computational techniques, in close collaboration with Medicinal Chemistry, Biology and Structural Biology. Luca has a strong background in chemometrics, database design and software development project management. He has given about fifteen invited lectures on modelling and data mining.