Friday 5 February 2021
Incorporation of prior knowledge for Neural Textual Information Retrieval
This thesis lies at the intersection of textual information retrieval (IR) and deep learning with neural networks. It is motivated by the observation that neural networks have proven effective for textual IR under certain conditions, but that their use still presents several limitations that can greatly restrict their application in practice.
In this thesis, we propose to incorporate prior knowledge to address three limitations of neural networks for textual IR: (1) the need for large amounts of labeled data, (2) text representations based only on statistical analysis, and (3) a lack of efficiency.
We focus on three types of prior knowledge to address these limitations: (1) knowledge from a semi-structured resource, Wikipedia; (2) knowledge from structured resources, namely semantic resources such as ontologies and thesauri; (3) knowledge from unstructured text.
First, we propose WIKIR, an open-access toolkit that automatically builds IR collections from Wikipedia. Neural networks first trained on these automatically created collections subsequently need less labeled data to achieve good performance. Second, we develop neural networks for IR that use semantic resources; integrating these resources allows the networks to achieve better retrieval performance in the medical domain. Finally, we present neural networks that use knowledge from unstructured text to improve the performance and efficiency of non-learning baseline IR models.
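To make the collection-building idea concrete, here is a minimal sketch of one common way to derive an IR collection from Wikipedia-like data: an article's title serves as the query and the article itself as a relevant document. The `build_collection` helper and the toy in-memory articles are illustrative assumptions, not the actual WIKIR pipeline or its API.

```python
# Sketch: turn Wikipedia-like articles into an IR collection
# (queries, documents, relevance judgments). Assumption: the
# article title is used as the query and the source article is
# marked relevant to it, which is one simple self-supervision scheme.

def build_collection(articles):
    """Map a list of {title, body} dicts to (queries, documents, qrels)."""
    queries, documents, qrels = {}, {}, {}
    for i, article in enumerate(articles):
        qid, did = f"q{i}", f"d{i}"
        queries[qid] = article["title"].lower()   # title acts as the query
        documents[did] = article["body"]
        qrels[qid] = {did: 1}                     # source article is relevant
    return queries, documents, qrels

articles = [
    {"title": "Information retrieval",
     "body": "Information retrieval is the activity of obtaining relevant resources."},
    {"title": "Neural network",
     "body": "A neural network is a model inspired by biological neurons."},
]

queries, documents, qrels = build_collection(articles)
print(len(queries), qrels["q0"])  # 2 {'d0': 1}
```

Labeled pairs produced this way can serve as weak supervision to pre-train a neural ranker before fine-tuning it on a small amount of human-labeled data.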
Updated 28 January 2021