Capuozzo Pasquale

Curriculum
Neuroscience, Technology, and Society, XXXIII series
Grant sponsor
UNIPD
Supervisor 
Giuseppe Sartori
Co-supervisor
Mauro Conti

Project description
The detection of deceptive behaviour has recently attracted growing interest, both in scientific research and in real-world applications. Review-based companies (e.g. Amazon, Airbnb, Booking) can be harmed by people who intentionally write fake reviews, for instance a competitor seeking to damage a rival. Furthermore, a recent report states that social media has overtaken television as the main news source, so systems able to spot fake news are needed to prevent false beliefs from taking hold in the population. Since humans are poor at identifying cues of deception and detecting lies, tools that improve deception detection are in pressing demand.
The main goal of this research project is to identify the verbal cues that best differentiate truthful from deceptive narratives. The research plan is divided into three phases: 1) building a new, large Italian dataset that will be made available to the international scientific community; 2) using the data collected in the first phase to extract general-domain features indicative of deception through Natural Language Processing analysis, and then building a Machine Learning model that classifies a narrative as truthful or deceptive; 3) testing whether our results generalize to other freely available online datasets.
It is known that when the same topic appears in both the training and the testing phase of a Machine Learning model, accuracy is significantly higher, but the model is unable to generalize those results to a new classification task on a different topic. For this reason, our goal is to find general-domain features (e.g. syntactic features) indicative of deception.
From a theoretical point of view, using general-domain features should allow us to build a Machine Learning classification model that generalizes to other datasets the system has never seen before.
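The cross-topic evaluation idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the three stylistic features, the English word lists (an Italian corpus would need its own), the nearest-centroid classifier used as a stand-in for the Machine Learning model, and the invented toy narratives with their labels. The only point it shows is the protocol: features that do not depend on topic vocabulary, and a split in which the test topic never appears in training.

```python
import re
from collections import defaultdict

# Illustrative English placeholders (assumptions, not the project's feature set).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our"}
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but", "of", "to",
                  "in", "on", "at", "it", "is", "was"}


def style_features(text):
    """Map a narrative to stylistic features that ignore topic vocabulary."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens = max(len(tokens), 1)
    return [
        len(tokens) / max(len(sentences), 1),                 # mean sentence length
        sum(t in FIRST_PERSON for t in tokens) / n_tokens,    # first-person rate
        sum(t in FUNCTION_WORDS for t in tokens) / n_tokens,  # function-word rate
    ]


def leave_one_topic_out(samples):
    """Yield (held_out_topic, train, test); the test topic never occurs in train."""
    for topic in sorted({s["topic"] for s in samples}):
        train = [s for s in samples if s["topic"] != topic]
        test = [s for s in samples if s["topic"] == topic]
        yield topic, train, test


def fit_centroids(train):
    """Average the feature vectors of each label (nearest-centroid stand-in)."""
    grouped = defaultdict(list)
    for s in train:
        grouped[s["label"]].append(style_features(s["text"]))
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def predict(centroids, text):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    features = style_features(text)
    return min(
        centroids,
        key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(features, centroids[label])
        ),
    )


def cross_topic_accuracy(samples):
    """Accuracy per held-out topic, training only on the remaining topics."""
    scores = {}
    for topic, train, test in leave_one_topic_out(samples):
        centroids = fit_centroids(train)
        hits = sum(predict(centroids, s["text"]) == s["label"] for s in test)
        scores[topic] = hits / len(test)
    return scores


# Invented toy narratives and labels, present only to make the sketch runnable.
TOY_SAMPLES = [
    {"topic": "hotel", "label": "truthful",
     "text": "I stayed two nights. The room was small but clean."},
    {"topic": "hotel", "label": "deceptive",
     "text": "Best hotel ever, amazing staff, perfect location!"},
    {"topic": "holiday", "label": "truthful",
     "text": "We drove to the coast. It was cold and we left early."},
    {"topic": "holiday", "label": "deceptive",
     "text": "Incredible trip, fantastic weather, total paradise!"},
]
```

Calling `cross_topic_accuracy(TOY_SAMPLES)` returns one accuracy value per held-out topic. With four invented examples the numbers carry no empirical meaning; a real study would replace the toy data with the collected corpus and the hand-written features with the output of a proper Natural Language Processing pipeline.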