Friday, December 17, 2021
Traffic Measurements and Data Analysis for DNS Security


The Domain Name System (DNS) protocol maps easy-to-remember domain names to the numeric IP addresses assigned to each Internet-connected device that uses the Internet Protocol. DNS is one of the most critical yet largely unheralded protocols: without it, Internet users would need to memorize the IP addresses of every Internet application, including banking sites, email, and social media.

In the early days of the Internet, as highlighted by Dr. Paul Vixie, scientists invested all their efforts in facilitating communications because they believed that "something like the Internet could become humanity's collective digital nervous system." When the DNS principles and specifications were designed nearly four decades ago, security was not a primary consideration because the Internet was a network of trusted users. Danny Hillis, an American inventor and scientist, thought when registering the third domain name on the Internet that he should register a few more just in case, but he felt that "it wouldn't be nice." This example illustrates the trust within the community; the same trust was also built into the protocols of the Internet, including DNS.

Today's Internet is not only "humanity's collective digital nervous system" but also a place where cybercriminals exploit technical vulnerabilities and human weaknesses for financial gain. Spammers, phishers, malware creators, speculators, and organized e-crime groups widely abuse the DNS protocol and domain names. DNS has become as critical for their operations as it is for regular users.

Preventing registration of malicious domains is challenging because it requires assessing the (bad) intentions of domain owners. Prompt removal of domain names directly involved in e-crime requires collecting evidence or verifying evidence provided by trusted notifiers of malicious activity. DNS and hosting providers do not have the financial incentives to effectively confront domain name abuse.

The DNS infrastructure itself remains vulnerable to attacks because the protocols designed in the early days of the Internet rested on a threat model that made overly permissive assumptions about adversaries. Newly discovered vulnerabilities inherent to the DNS design drive the development and deployment of new extensions to the DNS protocol. However, their uptake has been very slow. Adoption has become less of a technology issue than an economic incentive problem, i.e., whether implementing such security technologies can be profitable for the operators deploying them.

The distributed nature and architecture of the DNS protocol also allow for increased Internet security and stability. One example in which DNS plays an important role is in email security protocols: the Sender Policy Framework (SPF) and Domain-based Message Authentication, Reporting, and Conformance (DMARC). While the Simple Mail Transfer Protocol (SMTP), designed for email distribution, is inherently insecure, SPF and DMARC provide a set of rules stored in DNS 'TXT' resource records that, when correctly configured, can eliminate the problem of domain spoofing. Cybercriminals, however, also abuse the DNS protocol architecture and its features to enhance the resilience of malicious infrastructures, amplify attacks, and avoid detection. Examples include Automatically Generated Domains (AGD) combined with fast-flux networks, and Distributed Reflective Denial-of-Service (DRDoS) attacks that leverage open DNS resolvers.
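As a rough illustration of how SPF and DMARC rules live in 'TXT' records, the following minimal sketch parses two record strings and applies a simplified strictness check. The function names (`parse_spf`, `parse_dmarc`, `is_strict`), the sample records, and the rule "SPF ends in `-all` and DMARC policy is `quarantine` or `reject`" are assumptions made for illustration; they are not the methodology of the dissertation, and the sketch performs no DNS lookups.

```python
from typing import Dict, Optional


def parse_spf(txt: str) -> Optional[Dict[str, object]]:
    """Parse an SPF TXT record; return None if it is not an SPF record."""
    parts = txt.strip().split()
    if not parts or parts[0].lower() != "v=spf1":
        return None
    all_qualifier = None
    for term in parts[1:]:
        t = term.lower()
        if t in ("all", "+all", "-all", "~all", "?all"):
            # A bare "all" carries the default "+" (pass) qualifier.
            all_qualifier = t[0] if t[0] in "+-~?" else "+"
    return {"terms": parts[1:], "all": all_qualifier}


def parse_dmarc(txt: str) -> Optional[Dict[str, str]]:
    """Parse a DMARC TXT record into its tag/value pairs."""
    tags: Dict[str, str] = {}
    for item in txt.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        return None
    return tags


def is_strict(spf_txt: str, dmarc_txt: str) -> bool:
    """Simplified check (an assumption): hard-fail SPF plus enforcing DMARC."""
    spf = parse_spf(spf_txt)
    dmarc = parse_dmarc(dmarc_txt)
    if spf is None or dmarc is None:
        return False
    return spf["all"] == "-" and dmarc.get("p", "").lower() in ("quarantine", "reject")


# Hypothetical records for a placeholder domain.
SPF_RECORD = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.net -all"
DMARC_RECORD = "v=DMARC1; p=reject; rua=mailto:dmarc@example.com"
print(is_strict(SPF_RECORD, DMARC_RECORD))
```

In practice the SPF record is published at the domain itself and the DMARC record at the `_dmarc` subdomain; a permissive record such as `v=spf1 +all` would fail this check because it authorizes any sender.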

Motivated by the problems of DNS security and domain name abuse, this dissertation is devoted to DNS security: to make communications more selective and more difficult for malicious actors so that the "collective digital nervous system", the Internet, stays less affected, more secure, and trusted by its benign users. The first three contributions present DNS measurement studies related to weaknesses inherent to Internet protocols and domain names that can lead to the exploitation of DNS infrastructure and domain names. The following three contributions present statistical and machine learning approaches related to domain name abuse based on traffic measurements and inferential analysis from DNS-related data.

The first contribution illuminates the problem of non-secure DNS dynamic updates, which allow a miscreant to manipulate DNS entries in the zone files of authoritative name servers. We refer to this type of attack as zone poisoning. In its simplest version, a malicious actor could replace an existing 'A' or 'MX' resource record (RR) in a zone file of an authoritative server and point the domain name to an IP address under the attacker's control, thus effectively hijacking the domain name. We present the first measurement study of the vulnerability. The vulnerable domains include those of governments, health care providers, and banks, demonstrating that the threat impacts important services. With this study and subsequent notifications to affected parties, we aim to improve the security of the DNS ecosystem.

Source Address Validation (SAV) is a standard aimed at discarding packets with spoofed source IP addresses. The absence of SAV for outgoing traffic is a root cause of DRDoS attacks and has received widespread attention. While less obvious, the absence of inbound filtering enables an attacker to appear as an internal host of a network and may reveal valuable information about the network infrastructure. It may also enable other attack vectors such as DNS cache poisoning. As a second contribution, we present the results of the Closed Resolver Project, which aims at mitigating the problem of inbound IP spoofing. We perform the first Internet-wide active measurement study to enumerate networks that do not filter incoming packets based on their source addresses. To achieve this goal, we identify closed and open DNS resolvers that accept spoofed requests coming from outside their network. Our work implies that the absence of inbound SAV makes DNS resolvers vulnerable to several types of attacks, including DNS cache poisoning, DNS zone poisoning, NXNSAttack, and zero-day vulnerabilities in the DNS server software.
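The core decision behind inbound SAV can be sketched in a few lines: a packet that arrives on an external interface but claims a source address inside the network's own prefixes should be dropped. The function name `violates_inbound_sav` and the documentation prefixes below are hypothetical; real filtering happens in routers and firewalls, not in application code, so this is only a sketch of the rule itself.

```python
import ipaddress
from typing import Iterable


def violates_inbound_sav(src_ip: str, internal_prefixes: Iterable[str]) -> bool:
    """Return True if a packet arriving from OUTSIDE claims an internal source.

    Such a packet is necessarily spoofed (or misrouted) and would be
    discarded by a network that enforces inbound SAV.
    """
    src = ipaddress.ip_address(src_ip)
    for prefix in internal_prefixes:
        net = ipaddress.ip_network(prefix)
        # Compare only within the same address family (IPv4 vs IPv6).
        if src.version == net.version and src in net:
            return True
    return False


# Hypothetical internal address space (documentation prefixes).
INTERNAL = ["192.0.2.0/24", "2001:db8::/32"]

print(violates_inbound_sav("192.0.2.10", INTERNAL))    # claims to be internal
print(violates_inbound_sav("198.51.100.7", INTERNAL))  # ordinary external source
```

A closed resolver that answers a query whose source address falls in its own network, although the query arrived from outside, indicates that this check is not being applied at the network edge.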

Sending forged emails by taking advantage of domain spoofing is a common technique used by attackers. The lack of appropriate email anti-spoofing schemes, or their misconfiguration, leads to successful phishing attacks or spam dissemination. In the third contribution, we evaluate the coverage of SPF and DMARC deployment in two large-scale campaigns measuring their global adoption rate and deployment by high-profile domains. We propose a new algorithm for identifying defensively registered domains and enumerating the domains with misconfigured SPF rules. We define, for the first time, new threat models involving subdomain spoofing and present a methodology for preventing domain spoofing that combines good practices for managing SPF and DMARC records with the analysis of DNS logs. Our measurement results show that a large share of domains do not correctly configure their SPF and DMARC rules, which enables attackers to deliver forged emails to user inboxes. Finally, we report on remediation and its effects by presenting the results of notifications sent to Computer Security Incident Response Teams responsible for affected domains.

To enhance competition and choice in the domain name system, the Internet Corporation for Assigned Names and Numbers (ICANN) introduced the new generic Top-Level Domain (gTLD) program, which added hundreds of new gTLDs (e.g., .nyc, .top) to the root DNS zone. While the program arguably increased the range of domain names available to consumers, it has also created new opportunities for cybercriminals. To investigate this issue, in the fourth contribution, we present the first comparative study of abuse in the domains registered under the new gTLD program and legacy gTLDs (e.g., .com, .org). We combine historical datasets from various sources, including DNS zone files, WHOIS records, passive and active DNS and HTTP measurements, and reputable domain name blacklists to study abuse across gTLDs. We find that the new gTLDs appear to have diverted abuse from the legacy gTLDs: while the total number of domains abused for spam remains stable across gTLDs, we observe a growing number of spam domains in new gTLDs, which suggests a shift from legacy gTLDs to new gTLDs. We also analyze the relationship between DNS abuse, operator security indicators, and the structural properties of new gTLDs. The results indicate that there is an inverse correlation between abuse and stricter registration policies. Our findings suggest that cybercriminals increasingly prefer to register, rather than hack, domain names and some new gTLDs have become a magnet for malicious actors. As the presented state of the art in gTLD abuse is in clear need of improvement, we have developed cases for modifying the existing safeguards and proposed new ones. ICANN is currently using these results to review the existing anti-abuse safeguards, evaluate their joint effects, and introduce more effective safeguards before an upcoming new gTLD rollout.

Malicious actors abuse thousands of domain names every day by launching large-scale attacks such as phishing or malware campaigns. While some domains are solely registered for malicious purposes, others are benign but get compromised and misused to serve malicious content. Existing detection methods can either predict malicious domains at the time of registration or identify indicators of ongoing malicious activity, conflating maliciously registered and compromised domains into common blacklists. Since the mitigation actions for these two types of domains are different, in the fifth contribution, we propose COMAR (Classification of Compromised versus Maliciously Registered Domains), an approach to differentiate between compromised and maliciously registered domains, complementary to previously proposed domain reputation systems. We start with a thorough analysis of the domain life cycle to determine the relationship between each step and define its associated features. Based on the analysis, we define a set of 38 features that are costly to evade. We evaluate COMAR using phishing and malware blacklists and show that it can achieve high accuracy (97% accuracy with a 2.5% false-positive rate) without using any privileged or non-publicly available data, which makes it suitable for use by any organization. We plan to deploy COMAR at two registry operators of European country-code TLDs and set up an early notification system to facilitate the remediation of blacklisted domains.
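For readers less familiar with the two reported metrics, the sketch below shows how accuracy and false-positive rate are derived from a binary confusion matrix. The counts are invented for illustration and do not reproduce the COMAR evaluation; which class is treated as "positive" is likewise an assumption, since the text above does not state it.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all domains that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_positive_rate(fp: int, tn: int) -> float:
    """Fraction of actual negatives that were wrongly labeled positive."""
    return fp / (fp + tn)


# Hypothetical counts for 200 labeled domains (NOT the dissertation's data).
tp, fp, tn, fn = 90, 5, 95, 10

print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")           # 185 / 200
print(f"false-positive rate = {false_positive_rate(fp, tn):.3f}")  # 5 / 100
```

The two numbers answer different questions: accuracy summarizes all decisions, whereas the false-positive rate isolates the error that would lead to the wrong mitigation action for one class of domains.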

In 2016, law enforcement dismantled the infrastructure of the Avalanche bulletproof hosting service, the largest takedown of a cybercrime operation so far. The malware families supported by Avalanche use Domain Generation Algorithms (DGAs) to generate random domain names for controlling their botnets. The takedown proactively targeted these presumably malicious domains; however, because coincidental collisions with legitimate domains are possible, investigators first had to classify the domains to prevent undesirable harm to website owners and botnet victims. The constraints of this real-world takedown (proactive decisions without access to malware activity, no bulk patterns, and no active connections) mean that approaches based on the state of the art cannot be applied. The problem of classifying thousands of registered DGA domain names therefore required an extensive, painstaking manual effort by law enforcement investigators. To significantly reduce this effort without compromising correctness, we develop a model that automates the classification. Through a synergetic approach, we achieve an accuracy of 97.6% with ground truth from the 2017 and 2018 Avalanche takedowns. For the 2019 takedown, this translates into a reduction of 76.9% in manual investigation effort. Furthermore, we interpret the model to provide investigators with insights into how benign and malicious domains differ in behavior, which features and data sources are the most important, and how the model can be applied according to the practical requirements of a real-world takedown. Finally, we assisted law enforcement agencies by applying our approach to the 2019 Avalanche takedown iteration.
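To make the notion of a DGA concrete, here is a deliberately simplified toy generator. It is not the algorithm of any malware family mentioned above; the seed string, the hashing scheme, and the reserved `.example` suffix are all assumptions chosen for illustration. The point is only that the output is a deterministic function of a shared seed and the date, so the full list can be computed in advance, by defenders as well as by the botnet.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5, length: int = 10,
            tld: str = ".example") -> List[str]:
    """Derive `count` pseudo-random domain names from a seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map each of the first `length` hex digits (0..15) to a letter a..p.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:length])
        domains.append(label + tld)
    return domains


# The same inputs always yield the same list of names.
for name in toy_dga("hypothetical-seed", date(2019, 1, 1)):
    print(name)
```

Because the generator only emits strings, nothing prevents one of them from matching a name that an unrelated party registered legitimately, which is why each registered DGA domain has to be classified before action is taken.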

It is beyond doubt that selective and secure DNS communication is the basis for a more secure and stable Internet. Armed with the experience of the early days of the Internet and technological advances providing several missing security blocks in DNS, our work contributes to the implementation of security protocols, the identification of new (old) security problems overlooked by the community, as well as the development of statistical and machine learning methods to help intermediaries more effectively mitigate domain name abuse.

Updated on December 10, 2021