As the Internet continues its rapid growth as the primary medium for
communications and commerce, and as telecommunication networks and
systems expand their global reach, digital information has become the
most popular and important information resource, and our dependence on
the underlying cyber infrastructure has increased significantly.
Unfortunately, as this dependence has grown, so has the threat to the
cyber infrastructure from spammers, attackers, and criminal
enterprises. In this paper, we propose a new machine-learning-based
network intrusion detection framework for cyber security. The
detection process of the framework consists of two stages: model
construction and intrusion detection. In the model construction stage,
a semi-supervised machine learning algorithm is applied to a
collected set of network audit data to generate a profile of normal
network behavior. In the intrusion detection stage, incoming network
events are analyzed and compared with the patterns captured in the
profile, and events that lie sufficiently far from the expected normal
behavior are flagged as anomalies. The proposed framework is
particularly applicable to situations where only a small amount of
labeled network training data is available, which is typical of
real-world network environments.
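
The two-stage process described above can be illustrated with a minimal sketch. It is not the algorithm used in this paper, which the abstract does not specify. As an assumption, it builds the normal profile as a single centroid seeded from a few labeled normal events and refined with nearby unlabeled events, and it flags an event when its Euclidean distance to that centroid exceeds a threshold. The function names, the `radius` and `threshold` parameters, and the toy feature vectors are all hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def build_profile(labeled_normal, unlabeled, radius, rounds=5):
    """Stage 1 (model construction), under the assumptions stated above.

    Start from the centroid of the labeled normal events, then repeatedly
    re-estimate it using the unlabeled events that fall within `radius`.
    """
    centroid = mean(labeled_normal)
    for _ in range(rounds):
        nearby = [p for p in unlabeled if distance(p, centroid) <= radius]
        centroid = mean(labeled_normal + nearby)
    return centroid


def detect(events, centroid, threshold):
    """Stage 2 (intrusion detection): True marks an event as an anomaly."""
    return [distance(e, centroid) > threshold for e in events]


# Toy data: two labeled normal events and three unlabeled events,
# one of which lies far from the others.
labeled_normal = [[1.0, 1.0], [1.2, 0.8]]
unlabeled = [[0.9, 1.1], [1.1, 1.0], [8.0, 9.0]]

profile = build_profile(labeled_normal, unlabeled, radius=2.0)
flags = detect([[1.0, 0.9], [7.5, 8.5]], profile, threshold=2.0)
print(flags)
```

In this toy run the distant unlabeled event never enters the profile, so only the second test event is flagged. A realistic profile would typically use several clusters and a threshold chosen from validation data.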