
Mehrdad Farokhnejad

Friday, October 15, 2021

Human-guided exploration of data collections


Data exploration aims to guide the understanding of data collections and to define the types of questions that can be asked of them, often through interactive exploration processes. It deals with raw digital data collections and must cope with uncertainty about data content and analysis: query results are not necessarily correct and complete (i.e., consisting of all the data tuples that satisfy the requirements expressed by a question). Data exploration engines will be next-generation systems promoting a new querying philosophy, in which queries gradually converge toward ones that exploit raw data collections and meet the expectations of data explorers (i.e., users).
This thesis proposes HILDEX, a human-in-the-loop data exploration system that enables users to explore textual data collections by gradually refining queries and their associated results. Textual data collections are pre-processed using machine learning and artificial intelligence text-processing algorithms.
HILDEX implements the exploration algorithms proposed in this work (query morphing, query-by-example, queries-as-answers), which refine an initial query by taking into account the content of the collections being explored, so that the data can be explored more effectively. HILDEX thus proposes a workflow for exploring texts by analysing data samples returned by queries that can be refined through human-in-the-loop tasks. Partial exploration results are assessed through metrics (precision, similarity) and through information that explains why given documents appear in these results. By examining the documents in partial results, together with the explanations and metrics, the user can decide to keep interacting with HILDEX to rewrite queries until she is satisfied with both the queries and the results. The algorithms and HILDEX have been evaluated experimentally on data related to crises in urban computing and on the exploration of information about COVID-19.

Date and Venue

Friday, October 15 at 16:00


Principal Scientist, CNRS-LIRIS
Postdoctoral Scientist, Université Lumière Lyon 2-ERIC

Jury Composition

Laurent D’ORAZIO
Professor, Université de Rennes, France, Reviewer
Professor, UVSQ-Université Paris-Saclay, France, Reviewer
Professor, Université de Sorbonne, France, Examiner
Assistant Professor, Politecnico di Torino, Italy, Examiner
Professor, ENSEA, France, Examiner
Professor, Grenoble INP, France, Examiner
María Esther VIDAL
Professor, TIB Technische Informationsbibliothek, Germany, Examiner

Published on October 7, 2021

Updated on October 7, 2021