Dmitry KOVALEV, Institute of Informatics Problems of the Russian Academy of Sciences, Moscow
Abstract:
Hypotheses remain the core unit of scientific experiment and knowledge discovery. Recent advances in hypothesis management have enabled their deep integration with scientific models and observed data. Together with the advent of data-intensive sciences and the “big data” movement, such integration opens new ways to automate scientific research. We investigate novel approaches to implementing the “hypotheses – models – data” triangle on big data platforms. Our research infrastructure combines ideas from database management, ontologies, knowledge representation and the Semantic Web. We seek to generalize approaches to hypothesis-driven scientific experiments across different data-intensive domains (e.g., astronomy, neuroscience, finance). To this end, for each domain we collect hypothesis representations (mathematical equations, logic formulas, database relations, etc.), data, experiment workflows, statistical tests to evaluate hypotheses, methods to constrain or estimate models and their parameters, simulations, and details of the underlying infrastructure. We illustrate our ideas with several use cases and hope that the Besançon Galaxy Model may become one of the cornerstones of the generalized infrastructure.