S&P 2012

A Framework to Eliminate Backdoors from Response-Computable Authentication

Response-computable authentication (RCA) is a two-party authentication model widely adopted by authentication systems, where an authentication system independently computes the expected user response and authenticates a user if the actual user response matches the expected value. Such authentication systems have long been threatened by malicious developers who can plant backdoors to bypass normal authentication, as often seen in insider-related incidents. A malicious developer can plant backdoors by hiding logic in source code, by planting delicate vulnerabilities, or even by using weak cryptographic algorithms. Because of the common usage of cryptographic techniques and code protection in authentication modules, it is very difficult to detect and eliminate backdoors from login systems. In this paper, we propose a framework for RCA systems to ensure that the authentication process is not affected by backdoors. Our approach decomposes the authentication module into components: components with simple logic are verified by code analysis for correctness, while components with cryptographic/obfuscated logic are sandboxed and verified through testing. The key component of our approach is NaPu, a native sandbox that ensures pure functions and protects the complex, backdoor-prone part of a login module. We also use a testing-based process to either detect backdoors in the sandboxed component or verify that the component has no backdoors that can be used practically. We demonstrated the effectiveness of our approach in real-world applications by porting several popular login modules into this framework and verifying them.

Shuaifu Dai, Tao Wei, Chao Zhang, Tielei Wang, Yu Ding, Zhenkai Liang, and Wei Zou
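
The testing half of the framework is easy to picture. The sketch below is a hedged illustration, not NaPu itself: it assumes the sandbox has already forced the component to behave as a pure function, so cross-checking it against an independently computed reference over random challenges becomes meaningful; the `module_response` name and the HMAC construction are invented stand-ins.

```python
# Minimal sketch, assuming a sandboxed pure function `module_response`;
# HMAC stands in for a real login module's response computation.
import hmac, hashlib, os

def reference_response(challenge: bytes, secret: bytes) -> bytes:
    # Independently computed expected response, used for cross-checking.
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def module_response(challenge: bytes, secret: bytes) -> bytes:
    # Stand-in for the sandboxed, possibly backdoor-prone component.
    return hmac.new(secret, challenge, hashlib.sha256).digest()

def consistent_on_sampled_inputs(trials: int = 10_000) -> bool:
    secret = os.urandom(32)
    for _ in range(trials):
        challenge = os.urandom(16)
        if module_response(challenge, secret) != reference_response(challenge, secret):
            return False  # divergence: a possible backdoor trigger
    return True  # purity (enforced by the sandbox) makes sampling meaningful

print(consistent_on_sampled_inputs())
```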

Safe Loading - A Foundation for Secure Execution of Untrusted Programs

The standard loader (ld.so) is a common target of attacks. The loader is a trusted component of the application, and faults in the loader are problematic, e.g., they may lead to local privilege escalation for SUID binaries. Software-based fault isolation (SFI) provides a framework to execute arbitrary code while protecting the host system. A problem of current approaches to SFI is that fault isolation is decoupled from the dynamic loader, which is treated as a black box. The sandbox has no information about the (expected) execution behavior of the application and the connections between different shared objects. As a consequence, SFI is limited in its ability to identify devious application behavior. This paper presents a new approach to running untrusted code in a user-space sandbox. The approach replaces the standard loader with a security-aware trusted loader. The secure loader and the sandbox cooperate to allow controlled execution of untrusted programs. A secure loader makes security a first-class concept and ensures that the SFI system does not allow any unchecked code to be executed. The user-space sandbox builds on the secure loader, dynamically checks for malicious code, and ensures that all control flow instructions of the application adhere to an execution model. The combination of the secure loader and the user-space sandbox enables the safe execution of untrusted code in user space. Code injection attacks are stopped before any unintended code is executed. Furthermore, additional information provided by the loader can be used to support additional security properties, e.g., inlining of Procedure Linkage Table calls reduces the number of indirect control flow transfers and therefore limits jump-oriented attacks. This approach implements a secure platform for privileged applications and applications reachable over the network that anticipates and confines security threats from the beginning.

Mathias Payer, Tobias Hartmann, and Thomas R. Gross

Flash Memory for Ubiquitous Hardware Security Functions: True Random Number Generation and Device Fingerprints

We demonstrate that unmodified commercial Flash memory can provide two important security functions: true random number generation and digital fingerprinting. Taking advantage of random telegraph noise (a type of quantum noise source in highly scaled Flash memory cells) enables high-quality true random number generation at a rate of up to 10 Kbits/second. A scheme based on partial programming exploits process variation in threshold voltages to allow quick generation of many unique fingerprints that can be used for identification and authentication. Both schemes require no change to Flash chips or interfaces, and do not require additional hardware.

Yinglei Wang, Wing-kei Yu, Shuo Wu, Greg Malysa, G. Edward Suh, and Edwin C. Kan
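
As a rough illustration of how noisy, biased raw bits can become usable random output, the sketch below applies a von Neumann extractor to simulated biased reads. The bias value and sampling model are invented for illustration; the paper's actual extraction pipeline may differ.

```python
# Minimal sketch: debias raw, noisy bit reads (such as those induced by
# random telegraph noise) with a von Neumann extractor. Assumed bias: 0.58.
import random

def raw_noisy_bits(n, p_one=0.58):
    # Stand-in for repeated reads of a Flash cell biased toward one level.
    return [1 if random.random() < p_one else 0 for _ in range(n)]

def von_neumann(bits):
    out = []
    for b0, b1 in zip(bits[::2], bits[1::2]):
        if b0 != b1:          # keep 01 -> 0 and 10 -> 1; discard 00 and 11
            out.append(b0)
    return out

raw = raw_noisy_bits(100_000)
rnd = von_neumann(raw)
print(len(rnd), sum(rnd) / len(rnd))   # output size and near-0.5 bias
```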

ReDeBug: Finding Unpatched Code Clones in Entire OS Distributions

Programmers should never fix the same bug twice. Unfortunately, this often happens when patches to buggy code are not propagated to all code clones. Unpatched code clones represent latent bugs, and for security-critical problems, latent vulnerabilities, and are thus important to detect quickly. In this paper we present ReDeBug, a system for quickly finding unpatched code clones in OS-distribution-scale code bases. While there has been previous work on code clone detection, ReDeBug represents a unique design point that uses a quick, syntax-based approach scaling to OS-distribution-sized code bases that include code written in many different languages. Compared to previous approaches, ReDeBug may find fewer code clones, but it gains scale and speed, reduces the false detection rate, and is language-agnostic. We evaluated ReDeBug by checking all code from all packages in Debian Lenny/Squeeze, Ubuntu Maverick/Oneiric, all SourceForge C and C++ projects, and the Linux kernel for unpatched code clones. ReDeBug processed over 2.1 billion lines of code at 700,000 LoC/min to build a source code database, then found 15,546 unpatched copies of known vulnerable code in currently deployed code by checking 376 Debian/Ubuntu security-related patches in 8 minutes on a commodity desktop machine. We show the real-world impact of ReDeBug by confirming 145 real bugs in the latest version of Debian Squeeze packages.

Jiyong Jang, Abeer Agrawal, and David Brumley
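
The flavor of the syntax-based matching can be conveyed in a few lines. The sketch below is illustrative only (the window size is invented and ReDeBug's Bloom-filter database is omitted): normalize source lines, hash n-line windows of the pre-patch vulnerable code, and flag a target file as an unpatched clone if every window is still present.

```python
# Hedged sketch of syntax-based clone matching; N and the comment-stripping
# regex are simplifications, not ReDeBug's exact normalization.
import hashlib, re

N = 4  # window size in normalized lines (illustrative)

def normalize(src: str):
    lines = []
    for line in src.splitlines():
        line = re.sub(r"/\*.*?\*/|//.*", "", line)   # strip simple comments
        line = re.sub(r"\s+", "", line).lower()      # strip whitespace, case
        if line:
            lines.append(line)
    return lines

def windows(lines):
    return {hashlib.sha1("".join(lines[i:i + N]).encode()).digest()
            for i in range(len(lines) - N + 1)}

def contains_unpatched_clone(vulnerable_snippet: str, target_source: str) -> bool:
    want = windows(normalize(vulnerable_snippet))
    have = windows(normalize(target_source))
    return bool(want) and want <= have   # all vulnerable windows still present
```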

Prudent Practices for Designing Malware Experiments: Status Quo and Outlook

Malware researchers rely on the observation of malicious code in execution to collect datasets for a wide array of experiments, including generation of detection models, study of longitudinal behavior, and validation of prior research. For such research to reflect prudent science, the work needs to address a number of concerns relating to the correct and representative use of the datasets, presentation of methodology in a fashion sufficiently transparent to enable reproducibility, and due consideration of the need not to harm others. In this paper we study the methodological rigor and prudence in 36 academic publications from 2006–2011 that rely on malware execution. 40% of these papers appeared in the 6 highest-ranked academic security conferences. We find frequent shortcomings, including problematic assumptions regarding the use of execution-driven datasets (25% of the papers), absence of description of security precautions taken during experiments (71% of the articles), and oftentimes insufficient description of the experimental setup. Deficiencies occur in top-tier venues and elsewhere alike, highlighting a need for the community to improve its handling of malware datasets. In the hope of aiding authors, reviewers, and readers, we frame guidelines regarding transparency, realism, correctness, and safety for collecting and using malware datasets.

Christian Rossow, Christian J. Dietrich, Chris Grier, Christian Kreibich, Vern Paxson, Norbert Pohlmann, Herbert Bos, and Maarten van Steen

Abusing File Processing in Malware Detectors for Fun and Profit

We systematically describe two classes of evasion exploits against automated malware detectors. Chameleon attacks confuse the detectors’ file-type inference heuristics, while werewolf attacks exploit discrepancies in format-specific file parsing between the detectors and actual operating systems and applications. These attacks do not rely on obfuscation, metamorphism, binary packing, or any other changes to malicious code. Because they enable even the simplest, easily detectable viruses to evade detection, we argue that file processing has become the weakest link of malware defense. Using a combination of manual analysis and black-box differential fuzzing, we discovered 45 new evasion exploits and tested them against 36 popular antivirus scanners, all of which proved vulnerable to various chameleon and werewolf attacks.

Suman Jana and Vitaly Shmatikov

Dissecting Android Malware: Characterization and Evolution

The popularity and adoption of smartphones has greatly stimulated the spread of mobile malware, especially on popular platforms such as Android. In light of its rapid growth, there is a pressing need to develop effective solutions. However, our defense capability is largely constrained by the limited understanding of this emerging mobile malware and the lack of timely access to related samples. In this paper, we focus on the Android platform and aim to systematize or characterize existing Android malware. In particular, with more than a year of effort, we have managed to collect more than 1,200 malware samples that cover the majority of existing Android malware families, ranging from their debut in August 2010 to recent ones in October 2011. In addition, we systematically characterize them from various aspects, including their installation methods, activation mechanisms, and the nature of the carried malicious payloads. The characterization and a subsequent evolution-based study of representative families reveal that they are evolving rapidly to circumvent detection by existing mobile anti-virus software. Based on an evaluation with four representative mobile security products, our experiments show that the best case detects 79.6% of the malware in our dataset while the worst case detects only 20.2%. These results clearly call for the need to better develop next-generation anti-mobile-malware solutions.

Yajin Zhou and Xuxian Jiang

Distance Hijacking Attacks on Distance Bounding Protocols

After several years of theoretical research on distance bounding protocols, the first implementations of such protocols have recently started to appear. These protocols are typically analyzed with respect to three types of attacks, which are historically known as Distance Fraud, Mafia Fraud, and Terrorist Fraud. We define and analyze a fourth main type of attack on distance bounding protocols, called Distance Hijacking. This type of attack poses a serious threat in many practical scenarios. We show that many proposed distance bounding protocols are vulnerable to Distance Hijacking, and we propose solutions to make these protocols resilient to this type of attack. We show that verifying distance bounding protocols using existing informal and formal frameworks does not guarantee the absence of Distance Hijacking attacks. We extend a formal framework for reasoning about distance bounding protocols to include overshadowing attacks. We use the resulting framework to prove the absence of all of the found attacks for protocols to which our countermeasures have been applied.

Cas Cremers, Kasper B. Rasmussen, Benedikt Schmidt, and Srdjan Capkun
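
For readers unfamiliar with distance bounding, the core of the rapid-exchange phase is a timing bound: the round-trip time of each challenge bit, minus the prover's processing time, upper-bounds the prover's distance via the speed of light. A minimal illustration, with made-up numbers, follows.

```python
# Illustrative timing check at the heart of distance bounding; the 10 m
# bound and processing-time model are invented for this example.
C = 299_792_458.0          # speed of light, m/s

def distance_upper_bound(rtt_seconds: float, processing_seconds: float) -> float:
    # Signal travels to the prover and back, hence the division by 2.
    return C * (rtt_seconds - processing_seconds) / 2.0

# A verifier enforcing a 10 m bound rejects any response whose pure
# flight round-trip exceeds roughly 66.7 nanoseconds:
max_rtt = 2 * 10.0 / C
print(f"allowed flight RTT for 10 m: {max_rtt * 1e9:.1f} ns")
```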

Don’t Trust Satellite Phones: A Security Analysis of Two Satphone Standards

There is a rich body of work related to the security aspects of cellular mobile phones, in particular with respect to the GSM and UMTS systems. To the best of our knowledge, however, there has been no investigation of the security of satellite phones (abbr. satphones). Even though they are a niche market compared to the 2G and 3G mobile systems, there are several hundred thousand satphone subscribers worldwide. Given the sensitive nature of some of their application domains (e.g., natural disaster areas or military campaigns), security plays a particularly important role for satphones. In this paper, we analyze the encryption systems used in the two existing (and competing) satphone standards, GMR-1 and GMR-2. The first main contribution is that we were able to completely reverse engineer the encryption algorithms employed. Neither cipher had been publicly known previously. We describe the details of the recovery of the two algorithms from freely available DSP firmware updates for satphones, which included the development of a custom disassembler and tools to analyze the code, and extending prior work on binary analysis to efficiently identify cryptographic code. We note that these steps had to be repeated for both systems, because the available binaries were from two entirely different DSP processors. Perhaps somewhat surprisingly, we found that the GMR-1 cipher can be considered a proprietary variant of the GSM A5/2 algorithm, whereas the GMR-2 cipher is an entirely new design. The second main contribution lies in the cryptanalysis of the two proprietary stream ciphers. We were able to adapt known A5/2 ciphertext-only attacks to the GMR-1 algorithm with an average case complexity of 2^{32} steps. With respect to the GMR-2 cipher, we developed a new attack that is powerful in a known-plaintext setting. In this situation, the encryption key for one session, i.e., one phone call, can be recovered with approximately 50-65 bytes of keystream and a moderate computational complexity. A major finding of our work is that the stream ciphers of the two existing satellite phone systems are considerably weaker than the state of the art in symmetric cryptography.

Benedikt Driessen, Ralf Hund, Carsten Willems, Christof Paar, and Thorsten Holz
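
The leverage a known-plaintext setting gives against a stream cipher is easy to see: the keystream is simply the XOR of ciphertext and plaintext, so 50-65 known bytes immediately yield 50-65 keystream bytes to feed into a key-recovery attack. A one-function illustration, with an invented toy keystream:

```python
# Core of the known-plaintext setting: XOR recovers the keystream.
def recover_keystream(ciphertext: bytes, known_plaintext: bytes) -> bytes:
    return bytes(c ^ p for c, p in zip(ciphertext, known_plaintext))

# Toy demonstration with a made-up 4-byte keystream:
ks = bytes([0x3a, 0x91, 0x07, 0xc4])
ct = bytes(k ^ p for k, p in zip(ks, b"CALL"))
assert recover_keystream(ct, b"CALL") == ks
```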

Memento: Learning Secrets from Process Footprints

We describe a new side-channel attack. By tracking changes in the application’s memory footprint, a concurrent process belonging to a different user can learn its secrets. Using Web browsers as the target, we show how an unprivileged, local attack process – for example, a malicious Android app – can infer which page the user is browsing, as well as finer-grained information: whether she is a paid customer, her interests, etc. This attack is an instance of a broader problem. Many isolation mechanisms in modern systems reveal accounting information about program execution, such as memory usage and CPU scheduling statistics. If temporal changes in this public information are correlated with the program’s secrets, they can lead to a privacy breach. To illustrate the pervasiveness of this problem, we show how to exploit scheduling statistics for keystroke sniffing in Linux and Android, and how to combine scheduling statistics with the dynamics of memory usage for more accurate adversarial inference of browsing behavior.

Suman Jana and Vitaly Shmatikov
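
On Linux, the measurement side of such an attack needs no privileges at all. The sketch below is an illustration, not the authors' tool: it polls another process's /proc/<pid>/statm entry to record the data-segment footprint as a time series; matching that trace against per-page reference signatures is the (omitted) inference step.

```python
# Minimal footprint sampler; interval and sample count are arbitrary.
import time

def sample_footprint(pid: int, samples: int, interval: float = 0.01):
    trace = []
    path = f"/proc/{pid}/statm"   # world-readable accounting information
    for _ in range(samples):
        with open(path) as f:
            fields = f.read().split()
        trace.append(int(fields[5]))  # field 6 of statm: data segment, in pages
        time.sleep(interval)
    return trace
```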

Foundations of Logic-Based Trust Management

Over the last 15 years, many policy languages have been developed for specifying policies and credentials under the trust management paradigm. What has been missing is a formal semantics, in particular one that would capture the inherently dynamic nature of trust management, where access decisions are based on the local policy in conjunction with varying sets of dynamically submitted credentials. The goal of this paper is to rest trust management on a solid formal foundation. To this end, we present a model theory that is based on Kripke structures for counterfactual logic. The semantics enjoys compositionality and full abstraction with respect to a natural notion of observational equivalence between trust management policies. Furthermore, we present a corresponding Hilbert-style axiomatization that is expressive enough for reasoning about a system’s observables on the object level. We describe an implementation of a mechanization of the proof theory, which can be used to prove non-trivial meta-theorems about trust management systems, as well as analyze probing attacks on such systems. Our benchmark results show that this logic-based approach performs significantly better than the only previously available, ad-hoc analysis method for probing attacks.

Moritz Y. Becker, Alessandra Russo, and Nik Sultana

Formalizing and Enforcing Purpose Restrictions in Privacy Policies

Privacy policies often place restrictions on the purposes for which a governed entity may use personal information. For example, regulations such as the Health Insurance Portability and Accountability Act (HIPAA) require that hospital employees use medical information only for certain purposes, such as treatment, but not for others, such as gossip. Thus, using formal or automated methods for enforcing privacy policies requires a semantics of purpose restrictions to determine whether an action is for a purpose or not. We provide such a semantics using a formalism based on planning. We model planning using a modified version of Markov Decision Processes (MDPs) that excludes redundant actions, under a formal definition of redundancy. We argue that an action is for a purpose if and only if the action is part of a plan for optimizing the satisfaction of that purpose under the MDP model. We use this formalization to define when a sequence of actions is only for, or not for, a purpose. This semantics enables us to create and implement an algorithm for automating auditing, and to describe formally and compare rigorously previous enforcement methods. To validate our semantics, we conduct a survey comparing it to how people commonly understand the word “purpose”.

Michael Carl Tschantz, Anupam Datta, and Jeannette M. Wing
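
A toy example makes the criterion concrete. In the hypothetical two-action MDP below (states, rewards, and discount all invented), `treat` lies on a plan that optimizes the purpose's satisfaction while `gossip` does not, so only the former counts as being for the purpose.

```python
# Toy rendering of the paper's criterion under invented numbers: an action
# is "for" a purpose iff it lies on a plan optimizing that purpose's
# satisfaction under the MDP model.
GAMMA = 0.9
# T[state][action] = (next_state, reward measuring purpose satisfaction)
T = {"start": {"treat": ("done", 1.0), "gossip": ("done", -1.0)},
     "done": {}}

def value_iteration(iters=100):
    V = {s: 0.0 for s in T}
    for _ in range(iters):
        for s, acts in T.items():
            if acts:
                V[s] = max(r + GAMMA * V[ns] for ns, r in acts.values())
    return V

def action_is_for_purpose(state, action):
    V = value_iteration()
    ns, r = T[state][action]
    q = r + GAMMA * V[ns]
    best = max(r2 + GAMMA * V[ns2] for ns2, r2 in T[state].values())
    return q == best  # action appears on an optimal plan for the purpose

print(action_is_for_purpose("start", "treat"))    # True
print(action_is_for_purpose("start", "gossip"))   # False
```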

Sharing Mobile Code Securely with Information Flow Control

Mobile code is now a nearly inescapable component of modern computing, thanks to client-side code that runs within web browsers. The usual tension between security and functionality is particularly acute in a mobile-code setting, and current platforms disappoint on both dimensions. We introduce a new architecture for secure mobile code, with which developers can use, publish, and share mobile code securely across trust domains. This architecture enables new kinds of distributed applications, and makes it easier to reuse and evolve code from untrusted providers. The architecture gives mobile code considerable expressive power: it can securely access distributed, persistent, shared information from multiple trust domains, unlike web applications bound by the same-origin policy. The core of our approach is analyzing how flows of information within mobile code affect confidentiality and integrity. Because mobile code is untrusted, this analysis requires novel constraints on information flow and authority. We show that these constraints offer principled enforcement of strong security while avoiding the limitations of current mobile-code security mechanisms. We evaluate our approach by demonstrating a variety of mobile-code applications, showing that new functionality can be offered along with strong security.

Owen Arden, Michael D. George, Jed Liu, K. Vikram, Aslan Askarov, and Andrew C. Myers

The Psychology of Security for the Home Computer User

The home computer user is often said to be the weakest link in computer security. Home users do not always follow security advice, and they take actions, such as falling for phishing, that compromise their own security. In general, we do not understand why users do not always behave safely, which would seem to be in their best interest. This paper reviews the literature of surveys and studies of factors that influence security decisions for home computer users. We organize the review in four sections: understanding of threats, perceptions of risky behavior, efforts to avoid security breaches, and attitudes to security interventions. We find that these studies reveal many reasons why current security measures may not match the needs or abilities of home computer users, and we suggest future work needed to inform how security is delivered to this user group.

Adele E. Howe, Indrajit Ray, Mark Roberts, Malgorzata Urbanska, and Zinta Byrne

User-Driven Access Control: Rethinking Permission Granting in Modern Operating Systems

Modern client platforms, such as iOS, Android, Windows Phone, Windows 8, and web browsers, run each application in an isolated environment with limited privileges. A pressing open problem in such systems is how to allow users to grant applications access to user-owned resources, e.g., to privacy- and cost-sensitive devices like the camera or to user data residing in other applications. A key challenge is to enable such access in a way that is non-disruptive to users while still maintaining least-privilege restrictions on applications. In this paper, we take the approach of user-driven access control, whereby permission granting is built into existing user actions in the context of an application, rather than added as an afterthought via manifests or system prompts. To allow the system to precisely capture permission-granting intent in an application’s context, we introduce access control gadgets (ACGs). Each user-owned resource exposes ACGs for applications to embed. The user’s authentic UI interactions with an ACG grant the application permission to access the corresponding resource. Our prototyping and evaluation experience indicates that user-driven access control is a promising direction for enabling in-context, non-disruptive, and least-privilege permission granting on modern client platforms.

Franziska Roesner, Tadayoshi Kohno, Alexander Moshchuk, Bryan Parno, Helen J. Wang, and Crispin Cowan

New Results for Timing-Based Attestation

In this paper we present a comprehensive timing-based attestation system suitable for typical enterprise use, and evidence of that system’s performance. This system, similar to Pioneer [20] but built with relaxed assumptions, successfully detects attacks on code integrity over 10 links of an enterprise network, despite an average of just 1.7% time overhead for the attacker. We also present the first implementation and evaluation of a Trusted Platform Module (TPM) hardware timing-based attestation protocol. We describe the design and results of a set of experiments showing the effectiveness of our timing-based system, thereby providing further evidence of the practicality of timing-based attestation in real-world settings. While system measurement itself is a worthwhile goal, and timing-based attestation systems can provide measurements that are as trustworthy as those of hardware-based attestation systems, we feel that Time Of Check, Time Of Use (TOCTOU) attacks have not received appropriate attention in the literature. To address this topic, we present the three conditions required to execute such an attack, and show how past attacks and defenses relate to these conditions.

Xeno Kovah, Corey Kallenberg, Chris Weathers, Amy Herzog, Matthew Albin, and John Butterworth
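
The shape of a timing-based check can be sketched briefly. The code below is only schematic: it uses CRC32 where a real system uses a carefully constructed self-checksumming function, and the latency threshold is an invented parameter, but it shows the verifier's two rejection conditions, a wrong checksum or a late answer.

```python
# Schematic verifier for nonce-keyed, timing-bounded attestation; the
# checksum and threshold are stand-ins, not the paper's construction.
import time, zlib

def attest(code_image: bytes, nonce: int) -> int:
    # Device-side computation: nonce-keyed checksum over its own code.
    return zlib.crc32(nonce.to_bytes(8, "big") + code_image)

def verify(device_fn, expected_image: bytes, nonce: int, max_seconds: float) -> bool:
    t0 = time.perf_counter()
    reply = device_fn(nonce)              # challenge the remote device
    elapsed = time.perf_counter() - t0
    # Reject on a wrong value OR a slow answer: forwarding/emulation by
    # an attacker adds measurable delay.
    return reply == attest(expected_image, nonce) and elapsed <= max_seconds
```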

ObliviAd: Provably Secure and Practical Online Behavioral Advertising

Online behavioral advertising (OBA) involves the tracking of web users’ online activities in order to deliver tailored advertisements. OBA has become a rapidly growing source of revenue for a number of web services, and it is typically conducted by third-party data analytics firms such as brokers, which track user behaviors across web sessions using mechanisms such as persistent cookies. This practice raises significant privacy concerns among users and privacy advocates alike. Therefore, the task of designing OBA systems that do not reveal user profiles to third parties has been receiving growing interest from the research community. Nevertheless, existing solutions are not ideal for privacy-preserving OBA: some of them do not provide adequate privacy to users or adequate targeting information to brokers, while others require trusted third parties that are difficult to realize. In this paper, we propose ObliviAd, a provably secure architecture for privacy-preserving OBA. The distinguishing features of our approach are the use of secure hardware-based private information retrieval for distributing advertisements and high-latency mixing of electronic tokens for billing advertisers without disclosing any information about client profiles to brokers. ObliviAd does not assume any trusted party and provides brokers an economical alternative that preserves the privacy of users without hampering the precision of ad selection. We present the first formal security definitions for OBA systems (namely, profile privacy, profile unlinkability, and billing correctness) and conduct a formal security analysis of ObliviAd using ProVerif, an automated cryptographic protocol verifier, establishing the aforementioned security properties against a strong adversarial model. Finally, we demonstrate the practicality of our approach with an experimental evaluation.

Michael Backes, Aniket Kate, Matteo Maffei, and Kim Pecina

Quid-Pro-Quo-tocols: Strengthening Semi-honest Protocols with Dual Execution

Known protocols for secure two-party computation that are designed to provide full security against malicious behavior are significantly less efficient than protocols intended only to thwart semi-honest adversaries. We present a concrete design and implementation of protocols achieving security guarantees that are much stronger than are possible with semi-honest protocols, at minimal extra cost. Specifically, we consider protocols in which a malicious adversary may learn a single (arbitrary) bit of additional information about the honest party’s input. Correctness of the honest party’s output is still guaranteed. Adapting prior work of Mohassel and Franklin, the basic idea in our protocols is to conduct two separate runs of a (specific) semi-honest, garbled-circuit protocol, with the parties swapping roles, followed by an inexpensive secure equality test. We provide a rigorous definition and prove that this protocol leaks no more than one additional bit against a malicious adversary. In addition, we propose some heuristic enhancements to reduce the overall information a cheating adversary learns. Our experiments show that protocols meeting this security level can be implemented at cost very close to that of protocols that only achieve semi-honest security. Our results indicate that this model enables the large-scale, practical applications possible within the semi-honest security model, while providing dramatically stronger security guarantees.

Yan Huang, Jonathan Katz, and David Evans
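
The control flow of the dual-execution pattern is simple enough to show directly, with all cryptography abstracted behind placeholder callables: two semi-honest runs with garbler and evaluator roles swapped, then an equality test whose failure aborts, revealing at most the one bit of whether the two outputs matched.

```python
# Schematic only: the real construction uses garbled circuits and a
# cryptographic equality test; these callables are invented placeholders.
def dual_execution(run_alice_garbles, run_bob_garbles, secure_equal):
    z1 = run_alice_garbles()          # semi-honest run 1: Alice garbles
    z2 = run_bob_garbles()            # semi-honest run 2: roles swapped
    if not secure_equal(z1, z2):      # inexpensive secure equality test
        raise RuntimeError("abort: outputs differ (possible cheating)")
    return z1

# Toy usage with honest runs computing f(x, y) = (x < y) for x=3, y=7:
honest = lambda: 3 < 7
print(dual_execution(honest, honest, lambda a, b: a == b))   # True
```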

Hummingbird: Privacy at the Time of Twitter

In the last several years, micro-blogging Online Social Networks (OSNs), such as Twitter, have taken the world by storm, now boasting over 100 million subscribers. As an unparalleled stage for an enormous audience, they offer fast and reliable centralized diffusion of pithy tweets to great multitudes of information-hungry and always-connected followers. At the same time, this information gathering and dissemination paradigm prompts some important privacy concerns about relationships between tweeters, followers and interests of the latter. In this paper, we assess privacy in today’s Twitter-like OSNs and describe an architecture and a trial implementation of a privacy-preserving service called Hummingbird. It is essentially a variant of Twitter that protects tweet contents, hashtags and follower interests from the (potentially) prying eyes of the centralized server. We argue that, although inherently limited by Twitter’s mission of scalable information-sharing, this degree of privacy is valuable. We demonstrate, via a working prototype, that Hummingbird’s additional costs are tolerably low. We also sketch out some viable enhancements that might offer better privacy in the long term.

Emiliano De Cristofaro, Claudio Soriente, Gene Tsudik, and Andrew Williams

On the Feasibility of Internet-Scale Author Identification

We study techniques for identifying an anonymous author via linguistic stylometry, i.e., comparing the writing style against a corpus of texts of known authorship. We experimentally demonstrate the effectiveness of our techniques with as many as 100,000 candidate authors. Given the increasing availability of writing samples online, our result has serious implications for anonymity and free speech: an anonymous blogger or whistleblower may be unmasked unless they take steps to obfuscate their writing style. While there is a huge body of literature on authorship recognition based on writing style, almost none of it has studied corpora of more than a few hundred authors. The problem becomes qualitatively different at a large scale, as we show, and techniques from prior work fail to scale, both in terms of accuracy and performance. We study a variety of classifiers, both “lazy” and “eager,” and show how to handle the huge number of classes. We also develop novel techniques for confidence estimation of classifier outputs. Finally, we demonstrate stylometric authorship recognition on texts written in different contexts. In over 20% of cases, our classifiers can correctly identify an anonymous author given a corpus of texts from 100,000 authors; in about 35% of cases the correct author is one of the top 20 guesses. If we allow the classifier the option of not making a guess, via confidence estimation we are able to increase the precision of the top guess from 20% to over 80% with only a halving of recall.

Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, John Bethencourt, Emil Stefanov, Eui Chul Richard Shin, and Dawn Song
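
A stripped-down nearest-neighbor attributor conveys the setup, though the paper's feature set and classifiers are far richer and the real challenge is scale. The sketch uses character 3-gram counts and cosine similarity; the corpus contents below are hypothetical.

```python
# Minimal stylometric attribution sketch: char 3-grams + cosine similarity.
from collections import Counter
import math

def ngrams(text: str, n: int = 3) -> Counter:
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(anonymous_text: str, corpus: dict) -> str:
    # corpus maps author -> concatenated known writings
    probe = ngrams(anonymous_text)
    return max(corpus, key=lambda author: cosine(probe, ngrams(corpus[author])))

corpus = {"alice": "the quick brown fox jumps over the lazy dog " * 3,
          "bob": "colorless green ideas sleep furiously tonight " * 3}
print(attribute("quick brown foxes jump over lazy dogs", corpus))  # alice
```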

Secure and Scalable Fault Localization under Dynamic Traffic Patterns

Compromised and misconfigured routers are a well-known problem in ISP and enterprise networks. Data-plane fault localization (FL) aims to identify faulty links of compromised and misconfigured routers during packet forwarding, and is recognized as an effective means of achieving high network availability. Existing secure FL protocols are path-based, assuming that the source node knows the entire outgoing path that delivers its packets and that the path is static and long-lived. However, these assumptions are incompatible with the dynamic traffic patterns and agile load balancing commonly seen in modern networks. To cope with real-world routing dynamics, we propose the first secure neighborhood-based FL protocol, DynaFL, with no requirements on path durability or on the source node knowing the outgoing paths. Through a core technique we name delayed key disclosure, DynaFL incurs little communication overhead and a small, constant router state independent of the network size or the number of flows traversing a router. In addition, each DynaFL router maintains only a single secret key, which, based on our measurement results, represents a 2-4 orders-of-magnitude reduction over previous path-based FL protocols.

Xin Zhang, Chang Lan, and Adrian Perrig

Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail

We consider the setting of HTTP traffic over encrypted tunnels, as used to conceal the identity of websites visited by a user. It is well known that traffic analysis (TA) attacks can accurately identify the website a user visits despite the use of encryption, and previous work has looked at specific attack/countermeasure pairings. We provide the first comprehensive analysis of general-purpose TA countermeasures. We show that nine known countermeasures are vulnerable to simple attacks that exploit coarse features of traffic (e.g., total time and bandwidth). The considered countermeasures include ones like those standardized by TLS, SSH, and IPsec, and even more complex ones like the traffic morphing scheme of Wright et al. As just one of our results, we show that despite the use of traffic morphing, one can use only total upstream and downstream bandwidth to identify, with 98% accuracy, which of two websites was visited. One implication of what we find is that, in the context of website identification, it is unlikely that bandwidth-efficient, general-purpose TA countermeasures can ever provide the type of security targeted in prior work.

Kevin P. Dyer, Scott E. Coull, Thomas Ristenpart, and Thomas Shrimpton
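
How little the coarse features demand of the attacker is easy to demonstrate: the sketch below classifies a trace by nearest centroid over just total upstream and downstream bytes. The training numbers are invented, and the paper's classifiers are more careful, but they are similarly simple in spirit.

```python
# Nearest-centroid website identification over coarse traffic features;
# all byte counts below are made up for illustration.
import math

def centroid(traces):
    n = len(traces)
    return (sum(t[0] for t in traces) / n, sum(t[1] for t in traces) / n)

def classify(trace, profiles):
    # profiles: site -> list of (upstream_bytes, downstream_bytes) traces
    return min(profiles, key=lambda site: math.dist(trace, centroid(profiles[site])))

profiles = {"siteA": [(12_000, 310_000), (11_500, 305_000)],
            "siteB": [(4_000, 90_000), (4_300, 95_000)]}
print(classify((11_800, 300_000), profiles))   # -> siteA
```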

Off-path TCP Sequence Number Inference Attack - How Firewall Middleboxes Reduce Security

In this paper, we report a newly discovered “off-path TCP sequence number inference” attack enabled by firewall middleboxes. It allows an off-path (i.e., not man-in-the-middle) attacker to hijack a TCP connection and inject malicious content, effectively granting the attacker write-only permission on the connection. For instance, with the help of unprivileged malware, we demonstrate that a successful attack can hijack an HTTP session and return a phishing Facebook login page issued by a browser. With the same mechanisms, it is also possible to inject malicious JavaScript to post tweets or follow other people on behalf of the victim. The TCP sequence number inference attack is mainly enabled by sequence-number-checking firewall middleboxes. Through carefully designed and well-timed probing, the TCP sequence number state kept on the firewall middlebox can be leaked to an off-path attacker. We found such firewall middleboxes to be very popular in cellular networks: at least 31.5% of the 149 measured networks deploy such firewalls. Finally, since the sequence-number-checking feature is enabled by design, it is unclear how to mitigate the problem easily.

Zhiyun Qian and Z. Morley Mao
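
Abstracting the firewall side channel as an oracle that reveals whether a probe's sequence number was in-window, the inference reduces to a search problem. The sketch below assumes a 2^20 window, ignores sequence-number wraparound, and finds the expected sequence number with one probe per window-sized block plus a binary search; the oracle itself stands in for the paper's observation channel.

```python
# Schematic inference loop; `in_window` abstracts the firewall side
# channel (observed, e.g., via malware-assisted packet counters).
def infer_rcv_next(in_window, window=2**20, space=2**32):
    # Phase 1: coarse scan, one probe per window-sized block; exactly one
    # multiple of `window` lands inside the receive window.
    hit = next(s for s in range(0, space, window) if in_window(s))
    # Phase 2: binary-search the window's left edge within (hit-window, hit].
    lo, hi = hit - window, hit
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if in_window(mid):
            hi = mid
        else:
            lo = mid
    return hi  # the receiver's expected sequence number

rcv_nxt = 123_456_789
print(infer_rcv_next(lambda s: rcv_nxt <= s < rcv_nxt + 2**20))  # 123456789
```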

Signing Me onto Your Accounts through Facebook and Google: A Traffic-Guided Security Study of Commercially Deployed Single-Sign-On Web Services

With the boom of software-as-a-service and social networking, web-based single sign-on (SSO) schemes are being deployed by more and more commercial websites to safeguard many web resources. Despite prior research in formal verification, little has been done to analyze the security quality of SSO schemes that are commercially deployed in the real world. Such an analysis faces unique technical challenges, including lack of access to well-documented protocols and code, and the complexity brought in by the rich browser elements (script, Flash, etc.). In this paper, we report the first “field study” on popular web SSO systems. In every studied case, we focused on the actual web traffic going through the browser, and used an algorithm to recover important semantic information and identify potential exploit opportunities. Such opportunities guided us to the discovery of real flaws. In this study, we discovered 8 serious logic flaws in high-profile ID providers and relying party websites, such as OpenID (including Google ID and PayPal Access), Facebook, JanRain, Freelancer, FarmVille, Sears.com, etc. Every flaw allows an attacker to sign in as the victim user. We reported our findings to the affected companies, and received their acknowledgements in various ways. All the reported flaws, except those discovered very recently, have been fixed. This study shows that the overall security quality of SSO deployments seems worrisome. We hope that the SSO community conducts a study similar to ours, but at a larger scale, to better understand to what extent SSO is insecurely deployed and how to respond to the situation.

Rui Wang, Shuo Chen, and XiaoFeng Wang

Unleashing Mayhem on Binary Code

In this paper we present Mayhem, a new system for automatically finding exploitable bugs in binary (i.e., executable) programs. Every bug reported by Mayhem is accompanied by a working shell-spawning exploit. The working exploits ensure soundness and that each bug report is security-critical and actionable. Mayhem works on raw binary code without debugging information. To make exploit generation possible at the binary-level, Mayhem addresses two major technical challenges: actively managing execution paths without exhausting memory, and reasoning about symbolic memory indices, where a load or a store address depends on user input. To this end, we propose two novel techniques: 1) hybrid symbolic execution for combining online and offline (concolic) execution to maximize the benefits of both techniques, and 2) index-based memory modeling, a technique that allows Mayhem to efficiently reason about symbolic memory at the binary level. We used Mayhem to find and demonstrate 29 exploitable vulnerabilities in both Linux and Windows programs, 2 of which were previously undocumented.

Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, and David Brumley
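
Index-based memory modeling can be caricatured in a few lines: when a load index is symbolic but bounded, summarize the set of values the load can produce rather than forking one execution path per concrete index. The sketch below is a conceptual rendering, not Mayhem's binary-level model.

```python
# Toy rendering of index-based memory modeling; the table and bounds are
# invented. A real engine would derive the bounds from path constraints.
def summarize_symbolic_load(memory: bytes, lo: int, hi: int):
    # The index is known only to satisfy lo <= index <= hi.
    values = sorted({memory[i] for i in range(lo, hi + 1)})
    return values  # one summarized fact instead of (hi - lo + 1) forked paths

table = bytes(range(65, 91))                 # a 26-entry lookup table
print(summarize_symbolic_load(table, 0, 25)) # all values the load may yield
```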

Clash Attacks on the Verifiability of E-Voting Systems

Verifiability is a central property of modern e-voting systems. Intuitively, verifiability means that voters can check that their votes were actually counted and that the published result of the election is correct, even if the voting machines/authorities are (partially) untrusted. In this paper, we raise awareness of a simple attack, which we call a clash attack, on the verifiability of e-voting systems. The main idea behind this attack is that voting machines manage to provide different voters with the same receipt. As a result, the voting authorities can safely replace ballots by new ballots, and by this, manipulate the election without being detected. This attack does not seem to have attracted much attention in the literature. Even though the attack is quite simple, we show that, under reasonable trust assumptions, it applies to several e-voting systems that have been designed to provide verifiability. In particular, we show that it applies to the prominent ThreeBallot and VAV voting systems as well as to two e-voting systems that have been deployed in real elections: the Wombat Voting system and a variant of the Helios voting system. We discuss countermeasures for each of these systems, and for (various variants of) Helios we provide a formal analysis based on a rigorous definition of verifiability. More precisely, our analysis of Helios is with respect to the more general notion of accountability, which is often overlooked in the area of e-voting.

Ralf Küsters, Tomasz Truderung, and Andreas Vogt

Third-Party Web Tracking: Policy and Technology

In the early days of the web, content was designed and hosted by a single person, group, or organization. No longer. Webpages are increasingly composed of content from myriad unrelated “third-party” websites in the business of advertising, analytics, social networking, and more. Third-party services have tremendous value: they support free content and facilitate web innovation. But third-party services come at a privacy cost: researchers, civil society organizations, and policymakers have increasingly called attention to how third parties can track a user’s browsing activities across websites. This paper surveys the current policy debate surrounding third-party web tracking and explains the relevant technology. It also presents the FourthParty web measurement platform and studies we have conducted with it. Our aim is to inform researchers with essential background and tools for contributing to public understanding and policy debates about web tracking.

Jonathan R. Mayer and John C. Mitchell

EvilSeed: A Guided Approach to Finding Malicious Web Pages

Malicious web pages that use drive-by download attacks or social engineering techniques to install unwanted software on a user’s computer have become the main avenue for the propagation of malicious code. To search for malicious web pages, the first step is typically to use a crawler to collect URLs that are live on the Internet. Then, fast prefiltering techniques are employed to reduce the number of pages that need to be examined by more precise, but slower, analysis tools (such as honeyclients). While effective, these techniques require a substantial amount of resources. A key reason is that the crawler encounters many pages on the web that are benign, that is, the “toxicity” of the stream of URLs being analyzed is low. In this paper, we present EVILSEED, an approach to search the web more efficiently for pages that are likely malicious. EVILSEED starts from an initial seed of known malicious web pages. Using this seed, our system automatically generates search engine queries to identify other malicious pages that are similar or related to the ones in the initial seed. By doing so, EVILSEED leverages the crawling infrastructure of search engines to retrieve URLs that are much more likely to be malicious than a random page on the web. In other words, EVILSEED increases the “toxicity” of the input URL stream. Moreover, we envision that the features EVILSEED uses could be directly applied by search engines in their prefilters. We have implemented our approach, and we evaluated it on a large-scale dataset. The results show that EVILSEED is able to identify malicious web pages more efficiently than crawler-based approaches.

Luca Invernizzi and Paolo Milani Comparetti

Rozzle: De-cloaking Internet Malware

JavaScript-based malware attacks have increased in recent years and currently represent a significant threat to the use of desktop computers, smartphones, and tablets. While static and runtime methods for malware detection have been proposed in the literature, both on the client side, for just-in-time in-browser detection, and offline, for crawler-based malware discovery, these approaches encounter the same fundamental limitation. Web-based malware tends to be environment-specific, targeting a particular browser and often attacking specific versions of installed plugins. This targeting occurs because the malware exploits vulnerabilities in specific plugins and fails otherwise. As a result, a fundamental limitation for detecting a piece of malware is that it is triggered infrequently, only showing itself when the right environment is present. We observe that, using fingerprinting techniques that capture and exploit unique properties of browser configurations, almost all existing malware can be made virtually impossible for malware scanners to detect. This paper proposes Rozzle, a JavaScript multi-execution virtual machine, as a way to explore multiple execution paths within a single execution so that environment-specific malware will reveal itself. Using large-scale experiments, we show that Rozzle increases the detection rate for offline runtime detection by almost seven times. In addition, Rozzle triples the effectiveness of online runtime detection. We show that Rozzle incurs virtually no runtime overhead and allows us to replace multiple VMs running different browser configurations with a single Rozzle-enabled browser, reducing the hardware requirements, network bandwidth, and power consumption.

Clemens Kolbitsch, Benjamin Livshits, Benjamin Zorn, and Christian Seifert
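
The multi-execution idea, divorced from the JavaScript VM machinery, is that branches whose conditions depend on environment probes are explored on both sides within a single run. The toy Python model below uses a sentinel value where Rozzle uses symbolic environment values; all names are invented.

```python
# Toy model of multi-execution; ENV_DEPENDENT marks values Rozzle would
# treat as symbolic environment probes.
ENV_DEPENDENT = object()

def probe_plugin_version():
    return ENV_DEPENDENT  # e.g., a navigator.plugins lookup in real pages

def explore(condition, then_fn, else_fn):
    if condition is ENV_DEPENDENT:
        return [then_fn(), else_fn()]  # follow both paths in one run
    return [then_fn() if condition else else_fn()]

# Cloaked logic: the payload fires only under one plugin version, yet
# multi-execution observes both behaviors.
print(explore(probe_plugin_version(),
              lambda: "payload revealed",
              lambda: "benign behavior"))
```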

Detecting Hoaxes, Frauds, and Deception in Writing Style Online

In digital forensics, questions often arise about the authors of documents: their identity, demographic background, and whether they can be linked to other documents. The field of stylometry uses linguistic features and machine learning techniques to answer these questions. While stylometry techniques can identify authors with high accuracy in non-adversarial scenarios, their accuracy is reduced to random guessing when faced with authors who intentionally obfuscate their writing style or attempt to imitate that of another author. While these results are good for privacy, they raise concerns about fraud. We argue that some linguistic features change when people hide their writing style and by identifying those features, stylistic deception can be recognized. The major contribution of this work is a method for detecting stylistic deception in written documents. We show that using a large feature set, it is possible to distinguish regular documents from deceptive documents with 96.6% accuracy (F-measure). We also present an analysis of linguistic features that can be modified to hide writing style.

Sadia Afroz, Michael Brennan, and Rachel Greenstadt

LASTor: A Low-Latency AS-Aware Tor Client

The widely used Tor anonymity network is designed to enable low-latency anonymous communication. However, in practice, interactive communication on Tor – which accounts for over 90% of connections in the Tor network [1] – incurs latencies over 5x greater than on the direct Internet path. In addition, since path selection to establish a circuit in Tor is oblivious to Internet routing, anonymity guarantees can break down in cases where an autonomous system (AS) can correlate traffic across the entry and exit segments of a circuit. In this paper, we show that both of these shortcomings in Tor can be addressed with only client-side modifications, i.e., without requiring a revamp of the entire Tor architecture. To this end, we design and implement a new Tor client, LASTor. First, we show that LASTor can deliver significant latency gains over the default Tor client by simply accounting for the inferred locations of Tor relays while choosing paths. Second, since the preference for low-latency paths reduces the entropy of path selection, we design LASTor’s path selection algorithm to be tunable. A user can choose an appropriate tradeoff between latency and anonymity by specifying a value between 0 (lowest latency) and 1 (highest anonymity) for a single parameter. Lastly, we develop an efficient and accurate algorithm to identify paths on which an AS can correlate traffic between the entry and exit segments. This algorithm enables LASTor to avoid such paths and improve a user’s anonymity, while the low runtime of the algorithm ensures that the impact on the end-to-end latency of communication is low. By applying our techniques to measurements of real Internet paths and by using LASTor to visit the top 200 websites from several geographically-distributed end-hosts, we show that, in comparison to the default Tor client, LASTor reduces median latencies by 25% while also reducing the false negative rate of not detecting a potential snooping AS from 57% to 11%.

Masoud Akhoondi, Curtis Yu, and Harsha V. Madhyastha
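
One way to picture the tunable tradeoff (the weighting below is illustrative, not LASTor's actual algorithm): rank candidate paths by estimated latency and draw from a distribution whose flatness is controlled by the user's parameter, so 0 degenerates to lowest-latency selection and 1 to a uniform draw.

```python
# Illustrative tunable path selection: weight the i-th fastest candidate
# by a**i, so a=0 always picks the fastest (0**0 == 1 in Python) and a=1
# picks uniformly. Latencies are invented.
import random

def choose_path(candidates, latency_ms, a):
    ranked = sorted(candidates, key=latency_ms.get)
    weights = [a ** i for i in range(len(ranked))]
    return random.choices(ranked, weights=weights, k=1)[0]

latency_ms = {"pathA": 120, "pathB": 300, "pathC": 800}
print(choose_path(list(latency_ms), latency_ms, a=0.0))  # always pathA
print(choose_path(list(latency_ms), latency_ms, a=1.0))  # uniform draw
```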

LAP: Lightweight Anonymity and Privacy

Popular anonymous communication systems often require sending packets through a sequence of relays on dilated paths for strong anonymity protection. As a result, increased end-to-end latency renders such systems inadequate for the majority of Internet users who seek an intermediate level of anonymity protection while using latency-sensitive applications, such as Web applications. This paper serves to bridge the gap between communication systems that provide strong anonymity protection but with intolerable latency and non-anonymous communication systems by considering a new design space for the setting. More specifically, we explore how to achieve near-optimal latency while achieving an intermediate level of anonymity with a weaker yet practical adversary model (i.e., protecting an end-host’s identity and location from servers) such that users can choose between the level of anonymity and usability. We propose Lightweight Anonymity and Privacy (LAP), an efficient network-based solution featuring lightweight path establishment and stateless communication, by concealing an end-host’s topological location to enhance anonymity against remote tracking. To show practicality, we demonstrate that LAP can work on top of the current Internet and proposed future Internet architectures.

Hsu-Chun Hsiao, Tiffany Hyun-Jin Kim, Adrian Perrig, Akira Yamada, Samuel C. Nelson, Marco Gruteser, and Wei Meng

Guess Again (and Again and Again): Measuring Password Strength by Simulating Password-Cracking Algorithms

Text-based passwords remain the dominant authentication method in computer systems, despite significant advancement in attackers’ capabilities to perform password cracking. In response to this threat, password composition policies have grown increasingly complex. However, there is insufficient research defining metrics to characterize password strength and using them to evaluate password-composition policies. In this paper, we analyze 12,000 passwords collected under seven composition policies via an online study. We develop an efficient distributed method for calculating how effectively several heuristic password-guessing algorithms guess passwords. Leveraging this method, we investigate (a) the resistance of passwords created under different conditions to guessing, (b) the performance of guessing algorithms under different training sets, (c) the relationship between passwords explicitly created under a given composition policy and other passwords that happen to meet the same requirements, and (d) the relationship between guessability, as measured with password-cracking algorithms, and entropy estimates. Our findings advance understanding of both password-composition policies and metrics for quantifying password security.

Patrick Gage Kelley, Saranga Komanduri, Michelle L. Mazurek, Richard Shay, Timothy Vidas, Lujo Bauer, Nicolas Christin, Lorrie Faith Cranor, and Julio López
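
The guess-number metric at the heart of such evaluations is easy to state in code: a password's strength against a given guesser is the position at which that guesser produces it. The tiny wordlist and mangling rules below are invented stand-ins for real cracking configurations.

```python
# Minimal guess-number calculator; WORDLIST and RULES are toy stand-ins.
WORDLIST = ["password", "123456", "letmein", "monkey"]
RULES = [lambda w: w, lambda w: w.capitalize(),
         lambda w: w + "1", lambda w: w + "!"]

def guesses():
    for rule in RULES:           # heuristic guessers try cheap rules first
        for word in WORDLIST:
            yield rule(word)

def guess_number(password: str):
    for i, g in enumerate(guesses(), start=1):
        if g == password:
            return i             # fewer guesses = weaker password
    return None                  # not reached by this guesser at all

print(guess_number("password1"))  # 9: found by the third rule
```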

The Science of Guessing: Analyzing an Anonymized Corpus of 70 Million Passwords

We report on the largest corpus of user-chosen passwords ever studied, consisting of anonymized password histograms representing almost 70 million Yahoo! users, mitigating privacy concerns while enabling analysis of dozens of subpopulations based on demographic factors and site usage characteristics. This large data set motivates a thorough statistical treatment of estimating guessing difficulty by sampling from a secret distribution. In place of previously used metrics such as Shannon entropy and guessing entropy, which cannot be estimated with any realistically sized sample, we develop partial guessing metrics, including a new variant of guesswork parameterized by an attacker’s desired success rate. Our new metric is comparatively easy to approximate and directly relevant for security engineering. By comparing password distributions with a uniform distribution that would provide equivalent security against different forms of guessing attack, we estimate that passwords provide fewer than 10 bits of security against an online, trawling attack, and only about 20 bits of security against an optimal offline dictionary attack. We find surprisingly little variation in guessing difficulty; every identifiable group of users generated a comparably weak password distribution. Security motivations such as the registration of a payment card have no greater impact than demographic factors such as age and nationality. Even proactive efforts to nudge users towards better password choices with graphical feedback make little difference. More surprisingly, even seemingly distant language communities choose the same weak passwords, and an attacker never gains more than a factor of 2 in efficiency by switching from the globally optimal dictionary to a population-specific list.

Joseph Bonneau
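
A sketch of one raw form of such a success-rate-parameterized guesswork metric: an attacker targeting a fraction alpha of accounts guesses in decreasing-probability order, stops after mu guesses, and accounts not broken by then are charged the full mu. The histogram below is invented; the paper additionally normalizes this quantity into effective bits of security.

```python
# Raw alpha-guesswork over a password-probability histogram (invented data).
def alpha_guesswork_raw(probs, alpha):
    probs = sorted(probs, reverse=True)
    coverage, work = 0.0, 0.0
    for mu, p in enumerate(probs, start=1):
        coverage += p
        work += mu * p
        if coverage >= alpha:
            # accounts not broken within mu guesses still cost mu each
            return work + (1 - coverage) * mu
    return work

histogram = [0.04, 0.02, 0.01] + [0.93 / 9300] * 9300
print(alpha_guesswork_raw(histogram, alpha=0.05))  # ~1.96 guesses/account
```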

The Quest to Replace Passwords: A Framework for Comparative Evaluation of Web Authentication Schemes

We evaluate two decades of proposals to replace text passwords for general-purpose user authentication on the web using a broad set of twenty-five usability, deployability and security benefits that an ideal scheme might provide. The scope of proposals we survey is also extensive, including password management software, federated login protocols, graphical password schemes, cognitive authentication schemes, one-time passwords, hardware tokens, phone-aided schemes and biometrics. Our comprehensive approach leads to key insights about the difficulty of replacing passwords. Not only does no known scheme come close to providing all desired benefits: none even retains the full set of benefits that legacy passwords already provide. In particular, there is a wide range from schemes offering minor security benefits beyond legacy passwords, to those offering significant security benefits in return for being more costly to deploy or more difficult to use. We conclude that many academic proposals have failed to gain traction because researchers rarely consider a sufficiently wide range of real-world constraints. Beyond our analysis of current schemes, our framework provides an evaluation methodology and benchmark for future web authentication proposals.

Joseph Bonneau, Cormac Herley, Paul C. van Oorschot, and Frank Stajano

ILR: Where’d My Gadgets Go?

Through randomization of the memory space and the confinement of code to non-data pages, computer security researchers have made a wide range of attacks against program binaries more difficult. However, attacks have evolved to exploit weaknesses in these defenses. To thwart these attacks, we introduce a novel technique called Instruction Location Randomization (ILR). Conceptually, ILR randomizes the location of every instruction in a program, thwarting an attacker’s ability to re-use program functionality (e.g., arc-injection attacks and return-oriented programming attacks). ILR operates on arbitrary executable programs, requires no compiler support, and requires no user interaction. Thus, it can be automatically applied post-deployment, allowing easy and frequent re-randomization. Our preliminary prototype, working on 32-bit x86 Linux ELF binaries, provides a high degree of entropy. Individual instructions are randomly placed within a 31-bit address space. Thus, attacks that rely on a priori knowledge of the location of code or derandomization are not feasible. We demonstrated ILR’s defensive capabilities by defeating attacks against programs with vulnerabilities, including Adobe’s PDF viewer, acroread, which had an in-the-wild vulnerability. Additionally, using an industry-standard CPU performance benchmark suite, we compared the run time of prototype ILR-protected executables to that of native executables. The average run-time overhead of ILR was 13% with more than half the programs having effectively no overhead (15 out of 29), indicating that ILR is a realistic and cost-effective mitigation technique.

Jason Hiser, Anh Nguyen-Tuong, Michele Co, Matthew Hall, and Jack W. Davidson
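
Conceptually, execution under ILR can be modeled as a scatter of instructions plus a fallthrough map consulted by a process-level virtual machine, so adjacency in the address space carries no information. The toy model below uses invented instruction strings and handles only straight-line fallthrough; real ILR also rewrites control-flow targets.

```python
# Conceptual model of ILR: scattered instruction addresses plus an
# explicit fallthrough map, instead of pc + instruction_length.
import random

program = ["push ebp", "mov ebp, esp", "xor eax, eax", "leave", "ret"]
addresses = random.sample(range(2**31), k=len(program))      # 31-bit scatter
instr_at = dict(zip(addresses, program))
fallthrough = {addresses[i]: addresses[i + 1] for i in range(len(program) - 1)}

pc = addresses[0]
while pc is not None:
    print(f"{pc:#010x}  {instr_at[pc]}")
    pc = fallthrough.get(pc)  # execution order preserved, layout randomized
```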

Space Traveling across VM: Automatically Bridging the Semantic Gap in Virtual Machine Introspection via Online Kernel Data Redirection

It is generally believed to be a tedious, time-consuming, and error-prone process to develop a virtual machine introspection (VMI) tool manually because of the semantic gap. Recent advances such as Virtuoso show that the semantic gap can be largely narrowed, but they still cannot completely automate VMI tool generation. In this paper, we present VMST, an entirely new technique that can automatically bridge the semantic gap and generate VMI tools. The key idea is that, through system-wide instruction monitoring, we can automatically identify the introspection-related data and redirect these data accesses to the in-guest kernel memory. VMST offers a number of new features and capabilities. In particular, it automatically enables an in-guest inspection program to become an introspection program. We have tested VMST over 15 commonly used utilities on top of 20 different Linux kernels. The experimental results show that our technique is general (largely OS-agnostic), and it introduces 9.3X overhead on average for the introspected program compared to the native non-redirected one.

Yangchun Fu and Zhiqiang Lin

Smashing the Gadgets: Hindering Return-Oriented Programming Using In-place Code Randomization

The wide adoption of non-executable page protections in recent versions of popular operating systems has given rise to attacks that employ return-oriented programming (ROP) to achieve arbitrary code execution without the injection of any code. Existing defenses against ROP exploits either require source code or symbolic debugging information, or impose a significant runtime overhead, which limits their applicability for the protection of third-party applications. In this paper we present in-place code randomization, a practical mitigation technique against ROP attacks that can be applied directly on third-party software. Our method uses various narrow-scope code transformations that can be applied statically, without changing the location of basic blocks, allowing the safe randomization of stripped binaries even with partial disassembly coverage. These transformations effectively eliminate about 10%, and probabilistically break about 80% of the useful instruction sequences found in a large set of PE files. Since no additional code is inserted, in-place code randomization does not incur any measurable runtime overhead, enabling it to be easily used in tandem with existing exploit mitigations such as address space layout randomization. Our evaluation using publicly available ROP exploits and two ROP code generation toolkits demonstrates that our technique prevents the exploitation of the tested vulnerable Windows 7 applications, including Adobe Reader, as well as the automated construction of alternative ROP payloads that aim to circumvent in-place code randomization using solely any remaining unaffected instruction sequences.

Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis

Building Verifiable Trusted Path on Commodity x86 Computers

A trusted path is a protected channel that assures the secrecy and authenticity of data transfers between a user’s input/output (I/O) device and a program trusted by that user. We argue that, despite its incontestable necessity, current commodity systems do not support trusted path with any significant assurance. This paper presents a hypervisor-based design that enables a trusted path to bypass an untrusted operating system, applications, and I/O devices, with a minimal Trusted Computing Base (TCB). We also suggest concrete I/O architectural changes that will simplify future trusted-path system design. Our system enables users to verify the states and configurations of one or more trusted paths using a simple, secretless, hand-held device. We implement a simple user-oriented trusted path as a case study.

Zongwei Zhou, Virgil D. Gligor, James Newsome, and Jonathan M. McCune