Curriculum
Computer Science and Innovation for Societal Challenges, XXXVI series
Grant sponsor
CSC (China Scholarship Council)
Supervisor
Mauro Conti
Co-supervisor
Anna Spagnolli
Project: Machine Learning for Phishing Website Detection
Full text of the dissertation book can be downloaded from: https://www.research.unipd.it/handle/11577/3511376
Abstract: Phishing attacks are on the rise and phishing websites are everywhere, denoting the brittleness of security mechanisms reliant on blocklists. Prior work proposed enhancing Phishing Website Detectors (PWD) to mitigate this threat with data-driven techniques powered by Machine Learning (ML). The main advantage of ML models is their intrinsic ability to notice weak patterns in the data that a human would overlook, and then to leverage such patterns to devise ‘flexible’ detectors that can counter even adaptive attackers. This dissertation addresses three significant aspects arising from the interaction between machine learning and phishing website detection: (i) adversarial attacks against machine learning-based phishing website detection (ML-PWD), (ii) user perceptions of phishing webpages, and (iii) phishing website detection in multi-language environments (i.e., Chinese and Western).

The first part examines the security of ML-based phishing website detection. Existing literature on adversarial ML focuses either on showing attacks that break every ML model, or on defenses that withstand most attacks. Unfortunately, little consideration is given to the actual cost of the attack or the defense. We formalize the “evasion space” in which an adversarial perturbation can be introduced to fool an ML-PWD, and we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage. Our contribution paves the way for a much-needed re-assessment of adversarial attacks against ML systems for cybersecurity.

The second part of the dissertation presents a study of user perceptions of phishing and adversarial phishing webpages. Adversarial phishing webpages containing perturbations can easily fool ML-based PWD, but it remains uncertain whether these perturbations enhance individuals’ ability to identify phishing webpages. Our study indicates that adversarial phishing webpages containing typos are more likely to be noticed by users.

The third and last part of the dissertation reveals the gap between Chinese and Western ML-based PWD. It urges future work on PWD to account for applicability in multilingual environments, paving the way for PWD systems that can protect users with different backgrounds.