Salvatore J. Stolfo

Salvatore J. Stolfo is a tenured professor of computer science at Columbia University in New York and a leading expert in computer security. He is known for his research in machine learning applied to computer security, intrusion detection systems, anomaly detection algorithms and systems, fraud detection, and parallel computing.

Early life and education

Born in Brooklyn, New York, Stolfo received a Bachelor of Science degree in Computer Science and Mathematics from Brooklyn College in 1974. He received his Ph.D. from the NYU Courant Institute in 1979 and has been on the faculty of Columbia ever since,[1] where he has taught courses in Artificial Intelligence, Intrusion and Anomaly Detection Systems, Introduction to Programming, Fundamental Algorithms, Data Structures, and Knowledge-Based Expert Systems.[2]


While at Columbia, Stolfo has received close to $50M in funding[3] for research that has broadly focused on security, intrusion detection, anomaly detection, and machine learning, and that includes early work in parallel computing and artificial intelligence.[4] He has published or co-authored over 250 papers and has over 21,000 citations with an H-index of 67.[5] He pioneered research in several areas of computer security whose techniques are widely used today. In 1996 he proposed a project with DARPA that applied machine learning to behavioral patterns to detect fraud or intrusion in networks.[6] This approach to security has recently emerged within the industry as user behavior analytics.[7] His earlier research on machine learning algorithms applied to credit card fraud was adopted throughout the financial industry.[8]

Intrusion Detection Systems (IDS) Lab

With a research grant from DARPA’s Cyber Panel program, Stolfo established the Intrusion Detection Systems (IDS) lab at Columbia University in 1996.[9] The lab builds next-generation tools to detect stealthy and malicious intruders in computer systems. This includes research into anomaly detection, collaborative intrusion detection, attacker modeling, malicious code, and secure wireless networks. The lab has also pioneered the use of data analysis and machine learning techniques for the adaptive generation of novel sensors and anomaly detectors for a variety of tasks in computer security. To date, Stolfo has graduated 28 PhD students,[10] who have gone on to make significant contributions to the field, including Wenke Lee, Eleazar Eskin, Daniel Miranker, and Ang Cui.


Parallel Computing for Artificial Intelligence

DADO Parallel Computer

Stolfo and students Dan Miranker, Mike van Biema, Alexander Pasik and Steve Taylor designed the architecture and software systems for the DADO parallel computer,[11] an example of a "fifth generation computer" sponsored by DARPA's high performance parallel computing initiative in the mid-1980s. In a lab at Columbia University, the DADO research group designed and built a fully functional 1023-processor version of the machine, which was the first parallel machine to provide large-scale commercial speech recognition services.[12] The DADO occupied about 2 cubic feet of cabinet space. It was tested at sea aboard a Navy research vessel to evaluate its capabilities for related acoustic analysis and detection tasks. A parallel broadcast and resolve/report function introduced by the DADO machine apparently influenced part of the design of the IBM Blue Gene parallel computer.[13]

The DADO technology was the first invention claimed by Columbia University for ownership of a faculty member's intellectual property under the 1980 Bayh-Dole Act. A company called Fifth Generation Computer was formed by Columbia and outside investors to commercialize the DADO machine. The company subsequently developed a commercially deployed speech recognition system operated by Qwest. A dispute among the small company, a large telecommunications provider, and Columbia University led to a six-year detour through the US court system, in which Stolfo ultimately prevailed.[14]

DADO introduced the parallel computing primitive “Broadcast, Resolve, Report”, a mechanism implemented in hardware that today is called MapReduce.[15][16]
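
As a rough software analogy only, one broadcast, resolve, report cycle can be sketched as follows. DADO implemented the primitive in hardware; the function name, the scoring function, and the pairwise tournament used here for the resolve step are assumptions made for illustration and are not taken from the DADO design documents.

```python
from typing import Callable, List, Optional, Tuple


def broadcast_resolve_report(
    local_data: List[List[int]],
    score: Callable[[int, int], int],
    query: int,
) -> Optional[Tuple[int, int]]:
    """Simulate one broadcast / resolve / report cycle (illustrative only).

    local_data[i] stands for the private memory of processing element i.
    Returns (element id, winning item), or None if no element has data.
    """
    # Broadcast: every processing element receives the same query and
    # picks its best local candidate.
    candidates = []
    for pe_id, items in enumerate(local_data):
        if items:
            best = max(items, key=lambda item: score(query, item))
            candidates.append((score(query, best), pe_id, best))
    if not candidates:
        return None

    # Resolve: a pairwise tournament, level by level, loosely mirroring
    # a reduction up a binary tree of processors.
    level = candidates
    while len(level) > 1:
        level = [
            max(level[i:i + 2], key=lambda c: c[0])
            for i in range(0, len(level), 2)
        ]

    # Report: the single surviving candidate is returned to the caller.
    _, pe_id, item = level[0]
    return pe_id, item


# Hypothetical example: find the stored value closest to the query 10.
memories = [[3, 9], [14, 1], [], [7]]
closest = broadcast_resolve_report(memories, lambda q, x: -abs(q - x), 10)
```

In this toy run the first element holds the value 9, which is nearest to the query, so it wins the tournament and is the one reported.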

Data Mining of Big Data

ACE Expert system: the First Deductive Database System and Application

Among his earliest work, Stolfo, along with colleague Greg Vesonder of Bell Labs, developed a large-scale expert data analysis system called ACE (Automated Cable Expertise) for the nation's phone system. AT&T Bell Labs distributed ACE to a number of telephone wire centers to improve the management and scheduling of repairs in the local loop.[17] ACE is likely to have been the first system to combine rule-based inference (an AI expert system) with a relational database management system, the AT&T CRAS system, and it serves as a model for the deductive database systems that were a subject of research for many years in the database community. ACE was the first expert system of its kind to be commercialized and widely distributed.[18]

Merge/Purge, De-duplication of large datasets

In other work related to the "merge/purge" problem (sometimes referred to as “record linkage” or “data deduplication”), an algorithm developed by Stolfo and student Mauricio Hernandez has been used in large-scale commercial systems for data cleansing.[19] Identifying and purging duplicates from large data sets is a very important part of large-scale data analysis systems, especially in commercial data analytics. The algorithms provided a means of scaling to very large data sets while balancing the requirement to produce accurate results in the presence of arbitrary noise and error in the database. The patented technology was licensed by Informix, a company that was later acquired by IBM.[20]
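
The published merge/purge work by Hernandez and Stolfo is commonly associated with the sorted-neighborhood method: sort the records on a key, then compare only records that fall within a small sliding window. The following is a minimal sketch of that idea, not the patented or commercial implementation; the record fields, the key function, and the matching rule are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Dict[str, str]


def sorted_neighborhood(
    records: List[Record],
    key: Callable[[Record], str],
    is_match: Callable[[Record, Record], bool],
    window: int = 3,
) -> Set[Tuple[int, int]]:
    """Return index pairs judged to be duplicates.

    Each record is compared only with the next (window - 1) records in
    sorted-key order, instead of with every other record.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs: Set[Tuple[int, int]] = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            if is_match(records[i], records[j]):
                pairs.add((min(i, j), max(i, j)))
    return pairs


def last_name(record: Record) -> str:
    return record["name"].split()[-1].lower()


# Hypothetical key: zip code followed by the first three letters of the surname.
def example_key(record: Record) -> str:
    return record["zip"] + last_name(record)[:3]


# Hypothetical matching rule: same zip code and same surname.
def example_match(a: Record, b: Record) -> bool:
    return a["zip"] == b["zip"] and last_name(a) == last_name(b)


example_records = [
    {"name": "Jon Smith", "zip": "10027"},
    {"name": "Mary Jones", "zip": "11201"},
    {"name": "John Smith", "zip": "10027"},
    {"name": "Mary Jones", "zip": "94305"},
]
duplicates = sorted_neighborhood(example_records, example_key, example_match, window=2)
```

The window size trades accuracy for speed: a larger window catches duplicates whose keys sort further apart, at the cost of more comparisons.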

KDD CUP Data set

The DARPA IDS evaluation datasets were constructed by Lincoln Labs in 1998 and 1999 for the DARPA Cyber Panel program.[21] These network trace data sets were used to evaluate the performance of different intrusion detection systems; they were the only network trace data with ground truth available to the open research community. The data, however, were difficult for a wider community of data mining researchers to use directly. Stolfo and his associates in the IDS lab, including Wenke Lee, created the KDD Cup dataset derived from the DARPA IDS datasets.[22] The DARPA network trace data were converted to "connection records", making the data more suitable for data mining researchers to test various machine learning algorithms. These data, created as a community service, are still extensively used in IDS research.[23]
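
To illustrate what converting a packet trace into connection records can look like, here is a simplified sketch. The field names duration, src_bytes and dst_bytes echo feature names in the KDD Cup 1999 data, but the Packet type and the grouping logic are assumptions made for illustration; the actual KDD Cup preprocessing computed many more features.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Packet:
    time: float
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    size: int


@dataclass
class ConnectionRecord:
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    start: float
    duration: float
    src_bytes: int  # bytes sent by the side that opened the connection
    dst_bytes: int  # bytes sent back by the other side


def to_connection_records(packets: Iterable[Packet]) -> List[ConnectionRecord]:
    """Group packets by endpoint pair and summarize each group as one record."""
    records: Dict[Tuple[str, int, str, int, str], ConnectionRecord] = {}
    for p in sorted(packets, key=lambda pkt: pkt.time):
        forward = (p.src, p.sport, p.dst, p.dport, p.proto)
        reverse = (p.dst, p.dport, p.src, p.sport, p.proto)
        if forward in records:
            rec = records[forward]
            rec.src_bytes += p.size
        elif reverse in records:
            rec = records[reverse]
            rec.dst_bytes += p.size
        else:
            # First packet seen between these endpoints opens a new record.
            rec = ConnectionRecord(p.src, p.sport, p.dst, p.dport, p.proto,
                                   start=p.time, duration=0.0,
                                   src_bytes=p.size, dst_bytes=0)
            records[forward] = rec
        rec.duration = p.time - rec.start
    return list(records.values())


# Hypothetical trace: one two-way web connection and one single-packet probe.
trace = [
    Packet(0.0, "a", 1000, "b", 80, "tcp", 100),
    Packet(0.5, "b", 80, "a", 1000, "tcp", 400),
    Packet(1.5, "a", 1000, "b", 80, "tcp", 50),
    Packet(2.0, "c", 2000, "b", 22, "tcp", 10),
]
connections = to_connection_records(trace)
```

Each resulting record is a fixed-length row, which is the shape most machine learning algorithms expect as input.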

Machine Learning Applied to Cybersecurity

Improved Credit Card Fraud Detection

Stolfo consulted for the CTO of Citibank for several years and conducted research on machine learning algorithms applied to the credit card fraud problem. Much of that work, conducted with students Phil Chan and Andreas Prodromidis and published as "meta-learning"-based strategies, demonstrated how to improve the accuracy of fraud detectors and substantially reduce loss due to fraud.[24][25]


Stolfo was an early proponent of collaborative security and distributed IDS technology and systems. Stolfo and students Ke Wang and Janak Parekh developed a fully functional IDS alert exchange system that introduced a new means of sharing sensitive data in a privacy-preserving manner. The technique involved communicating network packet content found to be anomalous, or verified as an attack, after converting the raw packet content into a statistical representation that allows accurate correlation of common attacks across sites.[26] The method invented by Stolfo and his students to share and correlate content across administrative domains without disclosing sensitive information introduced the use of Bloom filters storing n-gram content of network packet datagrams.[27] The method was extensively studied and continues to be used in several ongoing experiments. The method also formed the basis of a recent project with colleagues Steve Bellovin and Tal Malkin for the secure querying of encrypted document databases without requiring the insecure decryption of any document when searching for relevant content.[28][29]
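
The idea of sharing packet content as n-grams stored in a Bloom filter can be sketched as follows. This is a minimal illustration under stated assumptions, not the published system: the filter size, the number of hash functions, the n-gram length, and the use of SHA-256 as the hash are all choices made here for the example.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """A small Bloom filter; parameters are illustrative, not tuned."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive several bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def ngrams(payload: bytes, n: int = 5) -> List[bytes]:
    return [payload[i:i + n] for i in range(len(payload) - n + 1)]


def build_filter(payload: bytes, n: int = 5) -> BloomFilter:
    """Site A: store the n-grams of a suspicious payload; share only the filter."""
    shared = BloomFilter()
    for gram in ngrams(payload, n):
        shared.add(gram)
    return shared


def match_fraction(payload: bytes, shared: BloomFilter, n: int = 5) -> float:
    """Site B: fraction of a local payload's n-grams present in the shared filter."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    return sum(gram in shared for gram in grams) / len(grams)
```

A site receiving the filter sees only bit positions, not the raw payload, yet a high match fraction against its own suspicious traffic suggests that both sites observed the same attack content.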

Decoys and FOG Computing

Stolfo coined the term FOG computing where technology is used “to launch disinformation attacks against malicious insiders, preventing them from distinguishing the real sensitive customer data from fake worthless data.”[30] Stolfo’s proposed approach is to confuse and confound a traitor by leveraging uncertainty, to reduce the knowledge they ordinarily have of the systems and data they now gain access to without authorization. FOG computing systems integrate bait information with systems that generate alerts when a decoy is misused.[31][32]

After being embraced by Cisco and other startups, the term's meaning has shifted somewhat: “Similar to Cloud, Fog provides data, compute, storage, and application services to end-users. The distinguishing Fog characteristics are its proximity to end-users, its dense geographical distribution, and its support for mobility.”[33]
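
In the original, decoy-based sense of the term, the pairing of bait information with alerting can be sketched in a few lines. The class and method names below are hypothetical and the example is deliberately simplified; it is not drawn from any published FOG computing system.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecoyRegistry:
    """Tracks planted bait credentials and records an alert when one is used."""

    decoys: Dict[str, str] = field(default_factory=dict)  # token -> description
    alerts: List[str] = field(default_factory=list)

    def plant(self, description: str) -> str:
        # Generate a random bait credential that no legitimate user is given.
        token = secrets.token_hex(8)
        self.decoys[token] = description
        return token

    def observe(self, user: str, credential: str) -> bool:
        # Any use of a bait credential is treated as evidence of misuse.
        if credential in self.decoys:
            self.alerts.append(
                f"ALERT: {user} used decoy '{self.decoys[credential]}'"
            )
            return True
        return False
```

Because legitimate users have no reason to touch a decoy, any hit on one is a strong signal with few false positives, which is the property the approach relies on.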

The Insider Threat: RUU?

In 2005 Stolfo received funding from the Army Research Office to conduct a workshop bringing together a group of researchers to help define a research program focused on insider threats.[34] Since then the IDS group at Columbia, working with other researchers at I3P, has developed several demonstration systems that detect evidence of insider malfeasance. The work includes user profiling techniques, especially for masquerader detection, studied by Stolfo and student Malek Ben Salem ("RUU" is a spoken acronym for "Are You You?"), and a number of decoy generation facilities studied jointly with co-PI Angelos Keromytis and student Brian Bowen.[35]

Email Mining Toolkit (EMT)

The EMT system, sponsored by DARPA contracts, was among the first machine learning systems to incorporate social network analyses in important security problems, including spam detection and virus propagation.[36] The extensive set of analyses in EMT, developed by Stolfo, student Shlomo Hershkop and others, gave analysts, forensics experts, students and researchers the opportunity to explore large corpora of email messages and discover a wide range of important derivative knowledge about the communication dynamics of a user or an organization. Among its applications, EMT models user behavior to identify uncharacteristic email flows indicative of spam bots and viral propagations.[37] The toolkit has been downloaded by well over 100 users, and elements of the analyses introduced by EMT serve as a model for other analytical systems. The entire body of analyses demonstrated a general description of all IDS network and communication analysis systems, conveniently described by the acronym CV5.[38]

Embedded Device Security

Symbiotic Embedded Machines (SEM) and Insecure Embedded Systems

Student Ang Cui, working with Stolfo in the IDS lab, invented a concept to embed arbitrary code into legacy embedded devices. The symbiotic embedded machine technology has been demonstrated to provide a direct means to inject security features into operational Cisco IOS routers in situ without any significant performance degradation and without any negative impact on the router's primary function.[39] The Symbiote technology is being explored for use on a number of different platforms and devices (ARM architecture, X86, MIPS instruction set) and in several interesting applications, especially for the large set of existing insecure embedded devices found on the internet.[40] This line of work is supported by the DARPA CRASH program, which has brought together a very large number of computer science researchers focused on clean slate design for a new generation of safe and secure computer systems.[41] In preliminary work, Cui and Stolfo performed a wide area scan of the internet in the IDS lab, counting the number of vulnerable devices. To date over 1.1 million have been found.[42]

Service to the US Government

· High Tech Subcommittee of the New York City Partnership, 1987 (chaired by J. Lederberg of Rockefeller Univ.).[43]

· NSF Experimental System Program Oversight Committee, July 7, 1989.

· New York State Science and Technology Foundation, New Business Evaluation, consultant 1989.[43]

· NSF site visit reviewer for Electrical and Computer Engineering Department of the University of Puerto Rico, 1995.

· DARPA IPTO Futures Panel, 2007, 2008.[44]

· Cyber Security Research Roadmap Invitational Workshop, Oct 7-9, 2008.

· National Academies National Research Council/Naval Studies Board Committee on Information Assurance for Network-Centric Naval Forces, 2008, 2009.[45]

· NSA R6 Steering Committee on Analytics for Cyber Defense, a CNCI Research Workshop, 2009.

· DARPA TCTO Office CyberBio Idea Summit Member, March 2010.

· Testimony before the DNI Cybersecurity Research Commission, Jan, 2013

· National Academies Panel on Information Science at the Army Research Laboratory, 2012-2014, 2015-2016.

· Member of the National Academies of Sciences, Engineering, and Medicine Intelligence Science and Technology Experts Group (ISTEG) to support the Office of the Director of National Intelligence (ODNI), 2015-present


Spin out companies

Red Balloon Security

Red Balloon Security (RBS) is a cyber security company founded in 2011 by Dr. Sal Stolfo and Dr. Ang Cui. A spinout from the IDS lab, RBS developed a Symbiote technology called FRAK as a host defense for embedded systems under the sponsorship of DARPA's Cyber Fast Track program. FRAK is a system that provides the core capability to automatically unpack, modify and repack embedded system firmware to install Symbiote defenses. The company is currently developing products and services based on the Software Symbiote technology.[46]

Allure Security Technology

Dr. Sal Stolfo and Dr. Angelos Keromytis founded Allure Security Technology on the basis of their IDS lab research for the DARPA Active Authentication and Anomaly Detection at Multiple Scales programs. Using active behavioral authentication and decoy technology that Stolfo pioneered and patented in 1996,[47][48][49][50][51] Allure brought those technologies together into Novo, an active user behavior analytics security solution that protects devices from data loss and intrusion. Allure’s research has been supported by Columbia University, the National Science Foundation, DARPA, DHS, and others.[52][53][54]

Allure Security Technology was founded in 2009. The underlying work was done under DARPA sponsorship in Columbia’s IDS lab, in response to DARPA calls for research on how to detect hackers once they are inside an organization's perimeter and how to continuously authenticate a user without a password.

Acquired companies/technologies

Electronic Digital Documents

Stolfo’s company Electronic Digital Documents produced a “DataBlade” technology, which Informix marketed during its strategy of acquisition and development in the mid-1990s.[55] Stolfo’s patented merge/purge technology, called EDD DataCleanser DataBlade, was licensed by Informix.[56][57] Since its acquisition by IBM in 2001, IBM Informix has been one of the world’s most widely used database servers, with users ranging from the world’s largest corporations to startups.

System Detection Inc

System Detection was one of the companies founded by Prof. Stolfo to commercialize the Anomaly Detection technology developed in the IDS lab. The company ultimately reorganized and was rebranded as Trusted Computer Solutions. That company was recently acquired by Raytheon.[58][59]

Media/Popular Culture

In 2013, The Washington Post interviewed Dr. Stolfo about his technology that uses decoy data to mislead hackers, a product soon to be sold by Allure Security Technology.[60] In 2013, The New York Times reported that Dr. Stolfo and his advisee Ang Cui had intercepted the operating system of Cisco’s VoIP phones in order to spy remotely, enabling them to transcribe conversations using Google’s voice-to-text translation.[61] In 2012, Scientific American covered the pair’s new “symbiote” program that would detect invasions of firmware code without slowing down a computer’s speed.[62]

In 2011, MSNBC broke the story that, while spearheading Columbia University’s Intrusion Detection Systems Laboratory, Dr. Stolfo worked with Cui on a budget of $2,000 to orchestrate an attack on printers. By remotely installing malware onto LaserJet devices, they exposed a security flaw.[63] Reports of the research went viral and, according to Dr. Stolfo, "many persist in thinking printers can be commanded to burn."[64] However, Dr. Stolfo's research shows just the opposite.[65] On March 25, 2015, the fourth episode of CSI: Cyber made allusions to Dr. Stolfo’s orchestrated attack on printers. Titled “Fire Code,” the episode follows FBI Special Agent Avery Ryan (Patricia Arquette), whose home printer catches fire after an attacker uses malicious printer firmware to break into her Wi-Fi account.[66]


Patents Issued as of February 2016
Patent Number Description
9,275,345 System level user behavior biometrics using feature extraction and modeling
9,253,201 Detecting network anomalies by probabilistic modeling of argument strings with markov chains
9,143,518 Systems, methods, and media protecting a digital data processing device from attack
9,009,829 Methods, systems, and media for baiting inside attackers
9,003,528 Apparatus method and medium for tracing the origin of network transmissions using N-gram distribution of data
9,003,523 Systems, methods, and media for outputting data based upon anomaly detection
8,931,094 System and methods for detecting malicious email transmission
8,893,273 Systems and methods for adaptive model generation for detecting intrusions in computer systems
8,887,281 System and methods for adaptive model generation for detecting intrusion in computer systems
8,844,033 Systems, methods, and media for detecting network anomalies using a trained probabilistic model
8,819,825 Systems, methods, and media for generating bait information for trap-based defenses
8,789,172 Methods, media, and systems for detecting attack on a digital processing device
8,769,684 Methods, systems, and media for masquerade attack detection by monitoring computer user behavior
8,763,103 Systems and methods for inhibiting attacks on applications
8,694,833 Methods, media, and systems for detecting an anomalous sequence of function calls
8,667,588 Systems and methods for correlating and distributing intrusion alert information among collaborating computer systems
8,644,342 Apparatus method and medium for detecting payload anomaly using N-gram distribution of normal data
8,601,322 Methods, media, and systems for detecting anomalous program executions
8,544,087 Methods of unsupervised anomaly detection using a geometric framework
8,528,091 Methods, systems, and media for detecting covert malware
8,489,931 Methods, media, and systems for detecting an anomalous sequence of function calls
8,468,445 Systems and methods for content extraction
8,448,242 Systems, methods, and media for outputting data based upon anomaly detection
8,443,441 System and methods for detecting malicious email transmission
8,407,785 Systems, methods, and media protecting a digital data processing device from attack
8,407,160 Systems, methods, and media for generating sanitized data, sanitizing anomaly detection models, and/or generating sanitized anomaly detection models
8,381,299 Systems, methods, and media for outputting a dataset based upon anomaly detection
8,381,295 Systems and methods for correlating and distributing intrusion alert information among collaborating computer systems
8,239,687 Apparatus method and medium for tracing the origin of network transmissions using n-gram distribution of data
8,135,994 Methods, media, and systems for detecting an anomalous sequence of function calls
8,074,115 Methods, media and systems for detecting anomalous program executions
7,996,288 Method and system for processing recurrent consumer transactions
7,979,907 Systems and methods for detection of new malicious executables
7,962,798 Methods, systems and media for software self-healing
7,913,306 System and methods for detecting intrusions in a computer system by monitoring operating system registry accesses
7,818,797 Methods for cost-sensitive modeling for intrusion detection and response
7,784,097 Systems and methods for correlating and distributing intrusion alert information among collaborating computer systems
7,779,463 Systems and methods for correlating and distributing intrusion alert information among collaborating computer systems
7,752,665 Detecting probes and scans over high-bandwidth, long-term, incomplete network traffic information using limited memory
7,657,935 System and methods for detecting malicious email transmission
7,639,714 Apparatus method and medium for detecting payload anomaly using n-gram distribution of normal data
7,487,544 System and methods for detection of new malicious executables
7,448,084 System and methods for detecting intrusions in a computer system by monitoring operating system registry accesses
7,424,619 System and methods for anomaly detection and adaptive learning
7,277,961 Method and system for obscuring user access patterns using a buffer memory
7,225,343 System and methods for adaptive model generation for detecting intrusions in computer systems
7,162,741 System and methods for intrusion detection with dynamic window sizes
5,920,848 Method and system for using intelligent agents for financial transactions, services, accounting, and advice
5,748,780 Method and apparatus for imaging, image processing and data compression
5,717,915 Method of merging large databases in parallel
5,668,897 Method and apparatus for imaging, image processing and data compression merge/purge techniques for document image databases
5,563,783 Method and system for securities pool allocation
5,497,486 Method of merging large databases in parallel
5,363,473 Incremental update process and apparatus for an inference system
4,860,201 Binary tree parallel processor
4,843,540 Parallel processing method


  1. "Professor Salvatore J. Stolfo". 2015-02-09. Retrieved 2015-06-26. 
  2. "Recent Courses". Retrieved 2015-06-26. 
  3. "Salvatore J. Stolfo CV" (PDF). Retrieved 2015-06-26. 
  4. "Salvatore Stolfo - Google Scholar Citations". Retrieved 2015-07-01. 
  5. "Salvatore Stolfo - Google Scholar Citations". Retrieved 2015-06-26. 
  6. "The JAM Project: Fraud and Intrusion Detection Using Meta-learning Agents". Retrieved 2015-06-26. 
  7. CiteSeerX — Agent-based fraud and intrusion detection in financial information systems
  8. "United States Patent: 5920848". Retrieved 2015-06-26. 
  9. "Sponsors | The Columbia University Intrusion Detection Systems Lab". Retrieved 2015-06-26. 
  10. "Ph.D. Students". Retrieved 2015-06-26. 
  11. "The DADO Parallel Computer (PDF Download Available)". 2015-01-07. Retrieved 2015-08-05. 
  12. "Columbia Engineering Magazine - Fall 2014 by Columbia Engineering School". ISSUU. Retrieved 2015-08-05. 
  13. Zeveloff, Julie (2008-10-30). "IBM Faces Patent Suit Over Supercomputer". Law360. Retrieved 2015-08-05. 
  14. Getting Up to Speed: The Future of Supercomputing, by Committee on the Future of Supercomputing, Computer Science and Telecommunications Board, Division on Engineering and Physical Sciences, National Research Council
  15. Strategic Computing: DARPA and the Quest for Machine Intelligence, 1983-1993, By Alex Roland, Philip Shiman, Pages 173-175.
  16. "DADO: A Parallel Processor for Expert Systems - Academic Commons". Retrieved 2015-08-05. 
  17. "Are maintenance expert systems practical now? - Academic Commons". Retrieved 2015-07-01. 
  18. "ACE: An Expert System Supporting Analysis and Management Decision Making - Academic Commons". Retrieved 2015-07-01. 
  20. "Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem." (PDF). ResearchGate. Retrieved 2015-07-01. 
  21. "MIT Lincoln Laboratory: DARPA Intrusion Detection Evaluation". Retrieved 2015-07-01. 
  22. "KDD-CUP-99 Task Description". Retrieved 2015-07-01. 
  23. "Handbook of Statistical Analysis and Data Mining Applications - Gary Miner, Robert Nisbet, John Elder IV - Google Books". 2009-05-14. Retrieved 2015-07-01. 
  25. "CiteSeerX — On the Accuracy of Meta-learning for Scalable Data Mining". Retrieved 2015-07-01. 
  28. Janak J. Parekh, Ke Wang, Salvatore J. Stolfo; "Privacy-Preserving Payload-Based Correlation for Accurate Malicious Traffic Detection;" SIGCOMM Workshop on Large Scale Attack Defence; 2006.
  29. Mariana Raykova, Ang Cui, Binh Vo, Bin Liu, Tal Malkin, Steven Bellovin, Salvatore J. Stolfo; "Usable Secure Private Search;" IEEE Security and Privacy; 2011/07/01
  30. Stolfo, Salvatore J. (2012-05-25). "Fog Computing: Mitigating Insider Data Theft Attacks in the Cloud - Academic Commons". doi:10.1109/SPW.2012.19. Retrieved 2015-07-01. 
  32. "Insider Attack and Cyber Security - Beyond the Hacker". Springer. Retrieved 2015-07-01. 
  33. IoT, from Cloud to Fog Computing
  34. "Research in Attacks, Intrusions, and Defenses: 16th International Symposium ... - Google Books". 2013-10-23. Retrieved 2015-07-01. 
  36. [1]
  37. [2]
  38. [3]
  40. Choi, Charles Q. (2012-11-26). "Auto-Immune: "Symbiotes" Could Be Deployed to Thwart Cyber Attacks". Scientific American. Retrieved 2015-07-01. 
  42. Kim Zetter (2009-10-23). "Scan of Internet Uncovers Thousands of Vulnerable Embedded Devices". WIRED. Retrieved 2015-07-01. 
  43. Salvatore Joseph Stolfo - Nomination and Bio
  44. Committee: Panel on Information Science at the Army Research Laboratory
  45. Information Assurance for Network-Centric Naval Forces
  46. Mark Piesing. "Hacking attacks on printers still not being taken seriously | Technology". The Guardian. Retrieved 2015-07-01. 
  47. "Patent US8528091 - Methods, systems, and media for detecting covert malware - Google Patents". Retrieved 2015-07-01. 
  49. DARPA - Open Catalog
  50. Patent US8769684 - Methods, systems, and media for masquerade attack detection by monitoring ... - Google Patents
  52. Sponsors | The Columbia University Intrusion Detection Systems Lab
  53. DARPA-BAA-13-16 Active Authentication (AA) Phase 2 - Federal Business Opportunities: Opportunities
  54. DARPA PM Richard Guidorizzi, Brief Overview of Active Authentication - YouTube
  55. Matching Records in Multiple Databases Using a Hybridization of Several ... - Google Books
  56. "Salvatore Joseph Stolfo - Nomination and Bio". Retrieved 2015-06-26. 
  57. Data Mining and Knowledge Discovery Handbook - Google Books
  58. "CounterStorm, Inc.: Private Company Information - Businessweek". 2008-09-05. Retrieved 2015-06-26. 
  59. Raytheon Company : Investor Relations : News Release
  60. To thwart hackers, firms salting their servers with fake data - The Washington Post
  62. New "Symbiote" May Protect Microchips from Cyber Attack - Scientific American
  63. CSI:Cyber and the Sexy, Spooky Killer Hacker Printer Fire
  65. Exclusive: Millions of printers open to devastating hack attack, researchers say - NBC News
  66. CSI: Cyber Recap: Wi-Fi Will Burn Down Your Home - Vulture