Acta Informatica Pragensia 2015, 4(3), 206-225 | DOI: 10.18267/j.aip.70

It Leaks More Than You Think: Fingerprinting Users from Web Traffic Analysis

Xujing Huang
School of Electronic Engineering and Computer Science, Queen Mary University of London, Mile End Road, E1 4NS, London, United Kingdom

We show how, in real-world web applications, confidential information about user identities can be leaked through "non-intuitive communications", in particular web traffic that appears to be unrelated to the user information. Our experiments on Google users demonstrate that even Google accounts are vulnerable to traffic attacks against user identities that use only packet sizes and directions. This work also shows that this kind of non-intuitive communication can leak even more information about user identities than traffic that explicitly uses confidential information. Our work highlights possible side-channel leakage through cookies and, more generally, discovers fingerprints in web traffic that can improve the probability of correctly guessing a user identity. Our analysis builds on Hidden Markov Models, a distance metric and guessing probability to analyse and evaluate these side-channel vulnerabilities.
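As a rough illustration of two of the ingredients named in the abstract, the sketch below compares packet traces with a distance metric and computes a one-try guessing probability. It is not the paper's implementation: the choice of Levenshtein edit distance, the signed-size encoding (positive for outgoing, negative for incoming packets), the user names, the packet sizes and the joint distribution are all assumptions made up for this example, and the Hidden Markov Model step is not covered.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two sequences of signed packet sizes."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_user(trace, fingerprints):
    """Return the user whose stored fingerprint is closest to the trace."""
    return min(fingerprints,
               key=lambda user: edit_distance(trace, fingerprints[user]))


def guessing_probability(joint):
    """Chance of guessing the user in one try after seeing an observation.

    `joint` maps (user, observation) to a probability; for each
    observation the attacker picks the most likely user.
    """
    best = defaultdict(float)
    for (user, obs), p in joint.items():
        best[obs] = max(best[obs], p)
    return sum(best.values())


# Hypothetical fingerprints: signed packet sizes, sign encodes direction.
fingerprints = {
    "alice": [310, -1460, -1460, -523],
    "bob": [310, -1460, -871],
}
observed = [310, -1460, -1460, -530]

# Hypothetical joint distribution over (user, observation class).
joint = {
    ("alice", "o1"): 0.25,
    ("alice", "o2"): 0.25,
    ("bob", "o2"): 0.5,
}

if __name__ == "__main__":
    print(nearest_user(observed, fingerprints))
    print(guessing_probability(joint))
```

In this toy setting the attacker's best blind guess succeeds with probability 0.5 (the larger prior), so any guessing probability above that value after observing traffic indicates leakage through the side channel.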

Keywords: Side-channel leakages, User identities, Web applications, Google accounts

Received: October 7, 2015; Revised: November 25, 2015; Accepted: November 30, 2015; Published: December 31, 2015

Huang, X. (2015). It Leaks More Than You Think: Fingerprinting Users from Web Traffic Analysis. Acta Informatica Pragensia, 4(3), 206-225. doi: 10.18267/j.aip.70

This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.