Mobile Encrypted Traffic Classification Using Deep Learning: Experimental Evaluation, Lessons Learned, and Challenges 论文

2019IEEE Transactions on Network and Service Management引用 492
Internet Traffic Analysis and Secure E-votingNetwork Security and Intrusion DetectionAdvanced Malware Detection Techniques

详细信息

发表期刊/会议
IEEE Transactions on Network and Service Management
发表日期
2019-02-12
发表年份
2019

关键词

Internet Traffic Analysis and Secure E-votingNetwork Security and Intrusion DetectionAdvanced Malware Detection Techniques

摘要

The massive adoption of hand-held devices has led to the explosion of mobile traffic volumes traversing home and enterprise networks, as well as the Internet. Traffic classification (TC), i.e., the set of procedures for inferring (mobile) applications generating such traffic, has become nowadays the enabler for highly valuable profiling information (with certain privacy downsides), other than being the workhorse for service differentiation/blocking. Nonetheless, the design of accurate classifiers is exacerbated by the raising adoption of encrypted protocols (such as TLS), hindering the suitability of (effective) deep packet inspection approaches. Also, the fast-expanding set of apps and the moving-target nature of mobile traffic makes design solutions with usual machine learning, based on manually and expert-originated features, outdated and unable to keep the pace. For these reasons deep learning (DL) is here proposed, for the first time, as a viable strategy to design practical mobile traffic classifiers based on automatically extracted features, able to cope with encrypted traffic, and reflecting their complex traffic patterns. To this end, different state-of-the-art DL techniques from (standard) TC are here reproduced, dissected (highlighting critical choices), and set into a systematic framework for comparison, including also a performance evaluation workbench. The latter outcome, although declined in the mobile context, has the applicability appeal to the wider umbrella of encrypted TC tasks. Finally, the performance of these DL classifiers is critically investigated based on an exhaustive experimental validation (based on three mobile datasets of real human users' activity), highlighting the related pitfalls, design guidelines, and challenges.