
Advanced DNS Analysis with ArcSight

Entropy

This is part one in a series on extending the functionality of ArcSight connectors to analyze DNS requests.

DNS requests are a high-volume event source, but there is value in sending them to the SIEM beyond seeing which workstations are hitting giphy all day. Along the buzzwordy cyber kill chain, DNS requests can be used maliciously at the delivery and command & control (C2) stages in ways that are difficult to detect. Domain Generation Algorithms (DGA) and DNS tunneling provide a dynamic means of delivery, control, and exfiltration via DNS, so the signature-based detection methods popular in many IDS/IPS and firewall solutions are poorly suited to stopping malicious DNS traffic.

In this scenario the SIEM may be a better tool for detecting DGA and DNS tunnels. Splunk provides extensions for calculating the entropy of a given field (a useful method for detecting both techniques), and in ArcSight it is possible to write custom token operators (Java classes) that can be used at the SmartConnector level to do additional analysis on fields. By using a custom token operator, the Shannon entropy of a DNS request can be calculated to indicate how likely it is that a request was made to a DGA-defined endpoint or is actually traffic tunneling through DNS.

Domain Generation Algorithms (DGA) are used by several families of malware (including Conficker, Locky, and Gameover ZeuS) to generate thousands of potential domains per day, typically based on timestamps or hard-coded seed values. The malware will reach out to candidate domains until it is able to contact the controller. Security researchers rely on reversing the DGA used by the malware in order to know ahead of time which domains may be used, and thus block them. This process takes time, and is offset by malware authors' ability to push out a new variant of the malware.
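To make the idea concrete, here is a toy Python sketch of a date-seeded generator. It is not the algorithm of any real malware family: the function name `toy_dga`, the seed string, the hashing scheme, and the reserved `.example` TLD are all assumptions made purely for illustration.

```python
import hashlib
from datetime import date


def toy_dga(seed, day, count=5, length=12):
    """Derive `count` pseudo-random names from a seed and a date (toy example)."""
    domains = []
    for index in range(count):
        material = f"{seed}-{day.isoformat()}-{index}".encode("ascii")
        digest = hashlib.sha256(material).hexdigest()
        # Map successive pairs of hex digits onto a-z to build the label.
        label = "".join(
            chr(ord("a") + int(digest[i:i + 2], 16) % 26)
            for i in range(0, 2 * length, 2)
        )
        domains.append(label + ".example")
    return domains


# Hypothetical seed and date; the same inputs always yield the same names.
candidates = toy_dga("hypothetical-seed", date(2017, 1, 1))
for name in candidates:
    print(name)
```

Because both sides can compute the same list from the seed and the date, nothing needs to be hard-coded in the malware, which is exactly why a static blocklist struggles to keep up.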

DNS tunneling can be used to exfiltrate data from a compromised host by encoding data and breaking it into chunks transmitted as DNS requests to a server controlled by the malicious actor. DNS traffic is typically not stringently filtered, which makes it an attractive channel.

In both cases, the (sub)domains will often appear as though a cat walked across the keyboard (real examples):

  • 3ipp3xuzn2
  • myxmilto
  • qxpmhnrvrkqewurq
  • ad9535b080d1ab5ef38def4384321c05

These all stand out to humans, but it is not practical for an analyst to examine a list of every DNS request made across an enterprise. However, the Shannon entropy of these strings is often atypical compared with that of common domains:

| DGA | Shannons | Popular Domain | Shannons |
|---|---|---|---|
| myxmilto | 2.75 | google | 1.92 |
| 3ipp3xuzn2 | 2.92193 | twitter | 2.13 |
| qxpmhnrvrkqewurq | 3.40564 | youtube | 2.52 |
| ad9535b080d1ab5ef38def4384321c05 | 3.67923 | facebook | 2.75 |

Shannon entropy represents the minimum average number of bits per symbol needed to encode a given message in binary. In other words, it measures how varied, and therefore how unpredictable, the message is. This is defined as:

$$H(x) = -\sum_{i=1}^{n} P(x_i) \log_2 P(x_i)$$

Where:

  • H = entropy in Shannons (e.g. for banana, H = 1.45915)
  • x = a given message (e.g. banana)
  • n = count of discrete values in x (e.g. for banana, n = 3)
  • x_i = the i-th discrete value in x (e.g. for banana, the values are b, a, n)
  • P(x_i) = relative frequency of x_i in x (e.g. for banana, b occurs 1 time, a 3 times, and n 2 times out of 6 characters, so P(b) = 1/6, P(a) = 3/6, P(n) = 2/6)
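The definition translates almost directly into code. The following is a minimal Python sketch of the calculation only, not the Java token operator itself; the function name `shannon_entropy` is an assumption chosen for this example. It uses the equivalent form P · log2(1/P), which avoids a negative zero for single-character strings.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Shannon entropy of a string, in Shannons (bits) per character."""
    if not message:
        return 0.0
    total = len(message)
    # Sum of P * log2(1 / P), which equals -sum(P * log2(P)).
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(message).values()
    )


print(round(shannon_entropy("banana"), 5))    # 1.45915
print(round(shannon_entropy("google"), 2))    # 1.92
print(round(shannon_entropy("facebook"), 2))  # 2.75
```

The printed values match the banana example and the popular-domain figures quoted above.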

Thus, variability increases the entropy of a message, and length raises the ceiling by leaving room for more distinct characters. DGA and DNS tunneling names typically feature variability, and often length. By calculating the entropy of a DNS request it is therefore possible to flag likely DGA and DNS tunneling traffic without relying on signature-based detection.
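A quick way to see how length and variability interact is to score a few hand-picked strings. This is a small Python sketch with an assumed helper name (`shannon_entropy`); the strings are made up for illustration. Repeating a string does not change its score, but a longer string has room for more distinct characters, and the score can never exceed log2 of the number of distinct characters.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Shannon entropy of a string, in Shannons (bits) per character."""
    if not message:
        return 0.0
    total = len(message)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(message).values()
    )


samples = ["aaaa", "abab", "abcd", "abcdabcd", "abcdefgh"]
results = {text: shannon_entropy(text) for text in samples}
for text, value in results.items():
    print(f"{text:<10} {value}")
# aaaa       0.0
# abab       1.0
# abcd       2.0
# abcdabcd   2.0
# abcdefgh   3.0
```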

To add this functionality to ArcSight, a new token operator, __shannons, was created to calculate and return the entropy in Shannons of any string passed to the operator. In ArcSight, the entropy values of the subdomain, domain, and DNS request as a whole (excluding the TLD) are then queried to create a baseline of DNS request entropy as a whole and on a per-device basis. This allows us to detect abnormally high-entropy DNS requests and any devices that make a lot of these requests, in addition to any single request that may exceed a pre-set threshold.
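The three values can be sketched outside ArcSight as follows. This is Python rather than the Java operator, and it rests on stated assumptions: the helper names `shannon_entropy` and `score_query` are hypothetical, the TLD is taken to be the final label only (so multi-label suffixes such as co.uk are not handled), and the dot separators are kept when scoring the request as a whole.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Shannon entropy of a string, in Shannons (bits) per character."""
    if not message:
        return 0.0
    total = len(message)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(message).values()
    )


def score_query(query):
    """Split a DNS name and score each part; the TLD is never scored."""
    labels = query.rstrip(".").lower().split(".")
    if len(labels) > 1:
        labels = labels[:-1]  # assumption: the TLD is the final label only
    domain = labels[-1]
    subdomain = ".".join(labels[:-1])  # empty string when there is none
    whole = ".".join(labels)
    return {
        "subdomain": shannon_entropy(subdomain),
        "domain": shannon_entropy(domain),
        "request": shannon_entropy(whole),
    }


scores = score_query("ad9535b080d1ab5ef38def4384321c05.example.com")
print({part: round(value, 2) for part, value in scores.items()})
```

Scoring the parts separately matters because a tunnel usually hides its payload in the subdomain, while a DGA name usually sits in the domain label.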

Testing It Out

To validate the hypothesis that high Shannon-entropy DNS requests are likely to be associated with DGA and DNS tunneling, over 1 million DGA domains [1] were run through a flat-file connector, along with 222 DNS queries made by the iodine DNS tunneling tool [2], and the 500 top domains on the Internet.

| Sample set | Minimum Shannons | Maximum Shannons | Average Shannons |
|---|---|---|---|
| Conficker DGA | 0.20 | 3.91 | 2.74 |
| Zeus DGA | 0.27 | 4.63 | 3.47 |
| Iodine tunnel | 0.29 | 5.67 | 3.21 |
| Top Domains | 0.20 | 3.87 | 2.76 |

The efficacy of the approach can be evaluated by considering the percentage of DGA DNS requests with entropy greater than the average entropy of a request to a top domain. Using this measure, 73% of Conficker DGA requests and 93% of Zeus DGA requests were detected. The test indicates that the particular DGA in use strongly impacts how reliably entropy alone can detect a malicious domain; however, it does appear that a DNS tunneling session can be reliably detected because of the volume of high-entropy requests made from a single source to a single domain.
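Both measures can be sketched in a few lines of Python. The data here is a handful of strings from earlier in this post plus made-up source addresses and a made-up tunnel.example zone, so the output does not reproduce the 73% and 93% figures; the function names `detection_rate` and `noisy_pairs`, and the choice to group by the last two labels, are assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Shannon entropy of a string, in Shannons (bits) per character."""
    if not message:
        return 0.0
    total = len(message)
    return sum(
        (count / total) * math.log2(total / count)
        for count in Counter(message).values()
    )


def detection_rate(suspect_labels, benign_labels):
    """Share of suspect labels scoring above the benign average."""
    baseline = sum(map(shannon_entropy, benign_labels)) / len(benign_labels)
    hits = sum(1 for label in suspect_labels if shannon_entropy(label) > baseline)
    return baseline, hits / len(suspect_labels)


def noisy_pairs(events, threshold, min_hits):
    """Count high-entropy queries per (source, domain) pair."""
    hits = Counter()
    for source, query in events:
        labels = query.rstrip(".").lower().split(".")
        if len(labels) < 3:
            continue  # no subdomain to score
        subdomain = "".join(labels[:-2])
        domain = ".".join(labels[-2:])  # assumption: last two labels
        if shannon_entropy(subdomain) > threshold:
            hits[(source, domain)] += 1
    return {pair: n for pair, n in hits.items() if n >= min_hits}


benign = ["google", "twitter", "youtube", "facebook"]
suspect = ["myxmilto", "3ipp3xuzn2", "qxpmhnrvrkqewurq",
           "ad9535b080d1ab5ef38def4384321c05"]
baseline, rate = detection_rate(suspect, benign)
print(round(baseline, 2), rate)  # 2.33 1.0

# Hypothetical events: (source address, queried name).
events = [
    ("10.0.0.5", "3ipp3xuzn2.tunnel.example"),
    ("10.0.0.5", "qxpmhnrvrkqewurq.tunnel.example"),
    ("10.0.0.5", "ad9535b080d1ab5ef38def4384321c05.tunnel.example"),
    ("10.0.0.9", "www.google.com"),
    ("10.0.0.9", "mail.google.com"),
]
flagged = noisy_pairs(events, threshold=baseline, min_hits=3)
print(flagged)  # {('10.0.0.5', 'tunnel.example'): 3}
```

A single high-entropy name is weak evidence on its own, whereas a pile of them from one source to one domain is much harder to explain away.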

In order to improve DGA detection, part two in the series will examine additional methods of statistical analysis that can be implemented through custom token operators.

Test Sample Sets:
