Abstract— Botnets represent one of
the most aggressive cyber security threats faced by organizations, as they
provide a platform for many illegal activities such as distributed denial
of service attacks, click fraud, phishing and malware dissemination. A variety of
techniques that use different feature sets have been proposed for effective botnet
traffic classification and analysis, but several challenges remain unaddressed,
such as the effect of the feature set of the network flow exporter. In this paper, we explore
an open source network traffic flow exporter (with a set of features) using different
protocol filters. Our evaluation shows that the choice of flow exporter and protocol
filters indeed affects the performance of botnet traffic classification.
Keywords: cyber security, flow exporter, protocol filter, traffic classification.
A botnet is a collection of compromised computers
connected over the Internet and remotely controlled by a botmaster. The individual compromised
machines are called bots. Botnets are created to conduct different malicious
activities such as distributed denial of service (DDoS) attacks, click-fraud
scams, spreading spam, stealing victims' personal information and exploiting
users' significant computational resources [1].
The bots keep updating themselves and are controlled by the botmaster to carry out
malicious instructions for different illegal activities. Hence, with a significantly
increasing rate of reported infections and illegal activities, botnets
pose a serious threat to cyber security.
A significant aspect of botnet architecture is the
communication scheme, which has evolved considerably over
the years to enhance botnet functionality and evade detection. The architecture includes the compromised
bots that communicate with a command and control (C&C) server to fetch
instructions from the botmaster. Botnets used the Internet Relay Chat (IRC)
protocol for communication until the early 2000s. However, IRC-based bots are highly
vulnerable because they use a centralized topology. The complete botnet
can be disrupted just by shutting down the IRC server. Also, the messages can easily be revealed by
continuous monitoring of network traffic, and the messages captured from packets can be
analyzed further. Since 2003, botnets have evolved and started using more
sophisticated techniques that involve decentralized topologies
such as peer-to-peer (P2P) and different ubiquitous protocols such
as DNS and HTTP. In the P2P communication scheme, individual bots act
as both client and server, making the botnet more effective because there is no fixed
centralized point that could be exploited. However, the P2P botnet topology
also has its limitations, including higher latency in command
and control transmission, which in turn impacts bot synchronization. The
use of various techniques such as encryption and fluxing has also helped botnets
to avoid detection. Therefore, botnet identification and detection have become
highly challenging. Many botnet detection approaches have been proposed that
involve the analysis and classification of network traffic. Some of the research in this
category focuses on building a generalized model for botnet detection, whereas
other work focuses on specific types of botnets. In the early 2000s, most of the proposed
systems specifically targeted botnets using IRC [2]. However, more recent
research is focused on P2P and HTTP based botnets [3], [4]. The
monitoring and detection techniques used for botnet classification should be
active and continuous, as botnets use automatic update mechanisms. Continuous monitoring also potentially
enables these techniques to learn new patterns and helps them adapt to any changes in botnet
evolution. Therefore, machine learning techniques (i.e., classification and
clustering) are an apt solution. Clustering and classification enable
automatic pattern recognition and a meaningful representation of network traffic
for analysis. Hence, the most
significant component of these systems is the extraction of meaningful features (attributes)
from network traffic. Extracting these features is very
challenging. To this end, various botnet
detection and analysis systems have proposed their own feature sets to
represent network traffic, which consists of network packets. The network
packet is mainly divided into two major parts: 1) the packet header, which contains
control information for the protocols being used over the network, and 2) the packet
payload, which contains the application data being carried over the network.
Some botnet detection and analysis approaches use network packet headers
[4], whereas others use packet payload methods [5]. Flow based feature
extraction methods are commonly used by the approaches that rely on packet
headers [4]. In these approaches, the packets of a communication are
aggregated into flows and statistics are then computed over each flow. Flow exporters
are used for generating flows and extracting such features. However, various
botnets use encryption to hide their identity and evade detection
systems that analyze the packet payload for embedded communication
information. Thus, flow exporters are very effective because they summarize
the traffic using only network packet headers. Hence, an open source flow
exporter is used along with a machine learning technique to perform effective
botnet traffic classification.
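The aggregation of packets into flows described above can be sketched as follows. This is a minimal illustration and not the flow exporter used in this paper: the `Packet` record, its field names, and the use of a five-tuple as the flow key are assumptions made for the example, and only header-level fields are touched. Real exporters also apply flow timeouts and direction handling, which are omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical header-only packet record; all field names are assumptions.
    timestamp: float   # capture time in seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str      # e.g. "TCP" or "UDP"
    length: int        # packet size in bytes


def aggregate_flows(packets):
    """Group packets by five-tuple and compute simple per-flow statistics."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0, "first": None, "last": None})
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        f = flows[key]
        f["packets"] += 1
        f["bytes"] += p.length
        f["first"] = p.timestamp if f["first"] is None else min(f["first"], p.timestamp)
        f["last"] = p.timestamp if f["last"] is None else max(f["last"], p.timestamp)
    # Duration is derived once all packets of a flow have been seen.
    return {
        key: {"packets": f["packets"], "bytes": f["bytes"], "duration": f["last"] - f["first"]}
        for key, f in flows.items()
    }


# Illustrative packets using documentation-range addresses (made-up values).
packets = [
    Packet(0.0, "10.0.0.5", "192.0.2.10", 40000, 80, "TCP", 60),
    Packet(0.5, "10.0.0.5", "192.0.2.10", 40000, 80, "TCP", 500),
    Packet(0.7, "10.0.0.5", "198.51.100.7", 53000, 53, "UDP", 80),
]
flows = aggregate_flows(packets)
```

Because the flow key and statistics rely only on header fields, the same procedure applies whether or not the payload is encrypted.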
BACKGROUND AND RELATED WORK
Bots are vulnerable hosts that have been infected by
self-propagating malware called a bot program and are made to perform
various malicious activities. The botmaster controls the network of infected bots, known
as a botnet. The infected bots receive commands from the botmaster
through the C&C channel and perform malicious operations such as DDoS, phishing,
spamming, identity theft and stealing users’ important information [1].
A bot passes through five stages in the creation and
maintenance of a botnet [1]. The first stage is the infection stage, where the
attacker infects the victim by exploiting existing vulnerabilities with
different exploitation techniques. The second stage is the secondary
injection, where shell code is executed on the infected machine to fetch the
image of the bot binary. This bot binary then installs itself on the infected
machine, which thereby becomes a bot. The third stage is the connection stage, in which
the bot binary establishes the C&C channel that is used by the botmaster. In the
fourth stage, once the connection is established, the malicious stage
starts and the botmaster sends commands to the botnet. The fifth stage
is the updating and maintenance of the bots by the botmaster.
Although a significant amount of research has been
done on botnet detection, detection techniques based on network traffic flow
analysis have only emerged in the last few years.
Gu et al. developed BotMiner, which detects botnets
using a group behavior analysis approach. It uses clustering to
find similar C&C communication behavior and form clusters, and later employs Snort
[6]. The data set included non-malicious data from the campus network and
malicious data from running bot binaries in a sandbox environment. The captured
traffic files were converted into flows, and the flow exporter provided features
such as the total number of packets per flow, average number of bytes per
packet and average number of bytes per second. The results showed that
BotMiner could detect botnets with detection rates (DRs) between 75% and 100%.
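The three flow features named above can be derived directly from per-flow totals. The following is a minimal sketch, assuming each flow is summarized by its packet count, byte count, and duration; the function and parameter names are hypothetical and this is not BotMiner's implementation.

```python
def flow_features(total_packets, total_bytes, duration_seconds):
    """Return the three per-flow features named in the text."""
    if total_packets <= 0:
        raise ValueError("a flow must contain at least one packet")
    bytes_per_packet = total_bytes / total_packets
    # A single-packet flow has zero duration; reporting 0.0 instead of
    # dividing by zero is an assumption made for this sketch.
    bytes_per_second = total_bytes / duration_seconds if duration_seconds > 0 else 0.0
    return {
        "packets_per_flow": total_packets,
        "avg_bytes_per_packet": bytes_per_packet,
        "avg_bytes_per_second": bytes_per_second,
    }


# Illustrative totals only (made-up values).
features = flow_features(total_packets=4, total_bytes=2000, duration_seconds=2.0)
```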
Strayer et al. proposed an IRC botnet detection system
that used machine learning techniques (classification and clustering) [2]. First,
classification was used to filter chat-like traffic, and
then clustering was used to find group activities in the filtered
traffic. Lastly, an analyzer was applied to the clusters for botnet detection. The
data set was gathered from a controlled testbed running a bot binary. They evaluated
the classifiers against a proposed multidimensional flow correlation
technique.
Zeidanloo et al. developed a detection system that focused
on P2P and IRC-based botnets [5]. Using filtering, classification, and
clustering, it aimed to detect botnet group behavior in a given traffic
file. A flow based technique was used to analyze traffic, and payload inspection
was deployed for traffic filtering.
Zhao et al. investigated a botnet detection system based
on flow intervals [3]. Flow features of the captured traffic were
used with Bayesian network and decision tree classifiers to detect
botnets. They evaluated and analyzed normal and malicious attack traffic. The
results showed DRs over 90% with false positive rates (FPRs) under 5%.
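For reference, these two metrics are commonly computed from confusion-matrix counts as DR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below uses these standard definitions; the counts are made-up illustrative values and are not results from any of the cited studies.

```python
def detection_rate(tp, fn):
    """Fraction of botnet flows correctly flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Fraction of normal flows wrongly flagged: FP / (FP + TN)."""
    return fp / (fp + tn)


# Illustrative counts only (not taken from any cited paper).
tp, fn, fp, tn = 90, 10, 5, 95
dr = detection_rate(tp, fn)
fpr = false_positive_rate(fp, tn)
```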
Haddadi et al.
proposed a botnet detection approach based on botnet traffic analysis [4]. Normal
and malicious traffic was generated by establishing HTTP and DNS communication
with publicly available domain names of botnet C&C servers and legitimate web
servers. NetFlow together with a machine learning algorithm was proposed to
detect the botnets. The results showed a 97% DR and a 3% FPR.
The recent literature on botnet detection focuses
more on the P2P and HTTP protocols [4]. This work uses different data
mining or machine learning techniques, such as neural networks, decision trees,
or statistical methods, applied to flow features. In most cases, normal traffic files
are merged with attack traffic files to evaluate the performance of the
proposed botnet detection systems.
Finally, this paper aims to use the features exported
by an open source flow exporter and to analyze the flow exporter’s effect on the
performance of botnet classification.
Earlier botnet traffic analysis work in the literature used
network flow information derived from packet headers. Most of these works focus on
certain types of protocols, such as HTTP and DNS, which indicates the use of protocol
filtering in analyzing traffic data. No packet payload information is
incorporated in this analysis. The possibility of detecting botnets using only features
extracted from traffic flows is explored.
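Protocol filtering of this kind can be illustrated with a simple port-based filter over flow records. This is a sketch under stated assumptions: the flow record layout and the mapping of HTTP to port 80 and DNS to port 53 are illustrative choices, not the filters of the flow exporter used in this work.

```python
# Well-known ports used as a stand-in for protocol identification (an assumption).
PROTOCOL_PORTS = {"HTTP": {80}, "DNS": {53}}


def filter_flows(flows, protocol):
    """Keep flows whose source or destination port matches the given protocol."""
    ports = PROTOCOL_PORTS[protocol]
    return [f for f in flows if f["src_port"] in ports or f["dst_port"] in ports]


# Hypothetical flow records (made-up values).
flows = [
    {"src_port": 40000, "dst_port": 80, "bytes": 560},
    {"src_port": 53000, "dst_port": 53, "bytes": 80},
    {"src_port": 41000, "dst_port": 6667, "bytes": 120},
]
http_flows = filter_flows(flows, "HTTP")
dns_flows = filter_flows(flows, "DNS")
```

A port-based filter is only an approximation, since traffic on port 80 is not guaranteed to be HTTP, but it needs no payload inspection.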
Traffic Data Set
Traffic files obtained from botnets that use the HTTP
protocol for communication, or an HTTP based P2P topology that looks
like normal HTTP traffic, are used for analysis. The botnet traffic files publicly
available at the NETRESEC [7] and Snort [8] websites are employed for carrying out