Monday, November 21, 2011

Tudor Dumitras of Symantec

Tudor Dumitras of Symantec's Research Lab in Herndon, VA (near Dulles airport) gave a talk on Friday morning about Symantec's WINE (Worldwide Intelligence Network Environment) dataset.  


Slides for a longer version of the talk are available, but here's a brief synopsis.


WINE's data sets are collected primarily by Symantec’s anti-virus products installed on millions of hosts worldwide.  The collected data includes malware samples, A/V and IPS results (e.g., which signatures matched, which IP address the matching activity came from, and which process it targeted), spam samples, and binary and URL reputation data.  The reputation data is gathered by tracking unknown binaries installed on a user's machine, or URLs visited, and then correlating those installs and visits with the negative or positive behavior that ensues.  They also gather URL data with a web crawler that looks for web sites engaged in attacks such as drive-by downloads (the results are also accessible at safeweb.norton.com).  They are in the process of adding data from Norton on Android devices, too, and have included some open source data sets in the collection (e.g., the Open Source Vulnerability Database).
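To make the reputation idea concrete, here is a minimal sketch of correlating installs with later detections.  It is only my illustration, not Symantec's method: the record layout, the host and hash names, and the `reputation_scores` helper are all hypothetical, and the talk did not describe WINE's schema or scoring.

```python
from collections import defaultdict

# Hypothetical telemetry: (host, binary hash) install events.
# The layout is an assumption for illustration, not WINE's actual schema.
installs = [
    ("host-a", "hash-1"),
    ("host-b", "hash-1"),
    ("host-c", "hash-2"),
    ("host-a", "hash-2"),
]

# Hypothetical (host, binary hash) pairs where a detection fired after the install.
later_detections = {("host-a", "hash-1"), ("host-b", "hash-1")}


def reputation_scores(installs, later_detections):
    """Return, per binary hash, the fraction of installs followed by a detection."""
    total = defaultdict(int)
    bad = defaultdict(int)
    for host, binary in installs:
        total[binary] += 1
        if (host, binary) in later_detections:
            bad[binary] += 1
    return {binary: bad[binary] / total[binary] for binary in total}


scores = reputation_scores(installs, later_detections)
print(scores)
```

A score near 1 means most installs of that binary were followed by bad behavior, and a score near 0 means few were.  A real system would presumably also weigh time windows, prevalence, and positive signals.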

All of this data is available for periods of several years, enabling historical comparisons to be made.

Researchers access the data at Symantec's site in Herndon (or at another site in California) and set up experiment scripts that run on virtual machines in tightly controlled environments, with the aim of making the experiments reproducible.  So far the data has been used for a variety of purposes, e.g., tracking the dissemination of Stuxnet and one of its variants, and developing a "cyber benchmark" (work in progress with Leyla Bilge), among other applications that he didn't discuss.

If you are interested in this data set and potential collaborative research with Symantec Labs, please let me know.  I think this is an interesting opportunity.
