Minh Tien Nguyen - Detection of Automatically Generated Texts

Minh Tien Nguyen

Jury:

  • Catherine Berrut, professor, Université Grenoble Alpes, chair
  • Jacques Savoy, professor, Université de Neuchâtel, reviewer
  • Guillaume Cabanac, associate professor, Université Toulouse 3 - Paul Sabatier, reviewer
  • Sylvie Calabreto, professor, LIRIS-INSA Lyon, member
  • Cyril Labbé, associate professor, Université Grenoble Alpes, thesis supervisor
  • Jeff Iezzi, Springer-Nature, Berlin, invited member



Automatically generated text has been used on numerous occasions with distinct intentions. Its uses range from generated comments in an online discussion to far more mischievous tasks, such as manipulating bibliographic information. This thesis therefore first introduces different methods of generating free text on a given topic and shows how such text can be used.
The thesis then tackles several research questions. The first is: what is the best method to detect a fully generated document? We then go one step further and address the detection of just a few sentences or a short paragraph of automatically generated text, proposing a new method that computes sentence similarity from grammatical structure. The last question is how to detect an automatically generated document without any samples, which covers the case of a new generator or a generator from which it is impossible to collect samples.
This thesis also deals with the industrial aspect of development. It presents a simple overview of the publishing workflow of a high-profile publisher, then analyzes how best to incorporate our detection method into that production workflow.
In conclusion, this thesis sheds light on several important research questions about the possibility of detecting automatically generated texts in different settings. Beyond the research aspect, substantial engineering work was carried out in a real-life industrial environment, demonstrating the importance of pairing real applications with fundamental research.