Research Projects

Mobile App Security and Analysis


The increasing popularity of smartphones has made them a target for malware, and existing protection mechanisms employed by app markets like Google Play and the App Store have shown limited success, largely because they lack an understanding of app behaviors. To address this issue, we developed effective and scalable techniques that extract app behaviors from different perspectives by analyzing multiple types of artifacts, including app code, app descriptions, API documents, app meta-data, and graphical user interfaces (GUIs). Our techniques help both users and automatic tools better comprehend the behaviors of mobile apps, and thus improve the effectiveness of controlling the security and privacy of mobile apps.

How and why are permissions used? Existing app markets such as Google Play present a permission list to show what private data an app may access, rather than how and why the app uses the private data, causing users to make uninformed decisions on how to control their privacy. To address this issue, I first proposed a program analysis approach that analyzes app code to extract permission-related behaviors. My approach computes static information flows and classifies them as safe or unsafe based on a tamper analysis that tracks whether private data is obscured before escaping through output channels. I built my approach into TouchDevelop and evaluated it by studying 546 applications published by 194 users. The results show that among the 546 applications, our approach reduces the need to make access-granting choices to a mere 10.1% (54) of all applications. To understand why permissions are used, we proposed WHYPER, a framework that customizes Natural Language Processing (NLP) techniques with semantic models extracted from API documents to determine which sentence (if any) in an application's description indicates the use of a permission. Our results on 581 popular applications demonstrated that WHYPER effectively identified the sentences that described the need for permissions with 82.8% precision and 81.5% recall.
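The core safe/unsafe distinction can be illustrated with a toy sketch. This is not the actual TouchDevelop analysis; the set of obscuring operations and the flow representation are hypothetical, and a real implementation would compute flows statically over the program's data-flow graph rather than over explicit lists:

```python
# A "flow" here is the ordered list of operations private data passes
# through on its way from a source (e.g., GPS) to an output channel.
OBSCURING_OPS = {"round", "hash", "anonymize"}  # assumed obscuring functions

def classify_flow(flow):
    """Classify a source-to-sink flow as 'safe' if some obscuring
    operation tampers with the private data before it escapes."""
    return "safe" if any(op in OBSCURING_OPS for op in flow) else "unsafe"

print(classify_flow(["read_gps", "round", "post_to_web"]))  # safe
print(classify_flow(["read_gps", "post_to_web"]))           # unsafe
```

The point of the classification is to shrink the set of flows a user must review: obscured flows need no access-granting decision at all.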

In which contexts are permissions used? To reap maximal benefit by executing their malicious payloads frequently, and to prolong their lifetime by evading detection, malicious apps often exploit imperceptible system events and external-environment states. Therefore, the context in which a security-sensitive behavior occurs is a strong indicator of whether the behavior is malicious or benign. Based on this insight, we proposed an approach, APPCONTEXT, that analyzes both program code and app meta-data such as the manifest file to extract the contexts of security-sensitive behaviors. These contexts describe the external events and external-environment states that trigger the behaviors. To evaluate APPCONTEXT, we trained a support vector machine (SVM) classifier using the contexts extracted from 202 malicious apps from multiple malware datasets and 633 benign apps from the Google Play Store. The classifier correctly identified 192 malicious apps with 87.7% precision and 95% recall.
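The idea of classifying behaviors by their activation context can be sketched as follows. This is a greatly simplified stand-in for APPCONTEXT's SVM pipeline: the event and condition vocabularies, the training examples, and the use of a tiny perceptron (instead of an SVM) are all illustrative assumptions:

```python
# Contexts are (triggering event, environment condition) pairs, one-hot
# encoded as feature vectors for a linear classifier.
EVENTS = ["user_click", "boot_completed", "sms_received"]
CONDITIONS = ["time_window_checked", "none"]

def featurize(event, condition):
    return [int(event == e) for e in EVENTS] + \
           [int(condition == c) for c in CONDITIONS]

def predict(w, b, x):
    # Linear decision rule: +1 = likely malicious, -1 = likely benign.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

def train_perceptron(samples, epochs=20):
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            if predict(w, b, x) != y:  # on mistakes, nudge toward the label
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

# Hypothetical training data: behaviors triggered without user awareness
# (boot events, incoming SMS) are labeled malicious; user-driven ones benign.
data = [
    (featurize("boot_completed", "time_window_checked"), +1),
    (featurize("sms_received", "none"), +1),
    (featurize("user_click", "none"), -1),
]
w, b = train_perceptron(data)
print(predict(w, b, featurize("boot_completed", "none")))  # 1 (flagged)
```

The real system extracts these contexts statically from code and the manifest, and uses a much richer feature space.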

What sensitive information is expected? Besides permission-protected sensitive data, such as locations and contacts, another major type of sensitive data that has been largely neglected is sensitive user inputs, i.e., sensitive information entered by users via the GUI. To protect such sensitive data, we proposed a scalable static analysis approach, SUPOR. SUPOR analyzes app GUIs (i.e., UI layout XMLs in Android) and their program code to detect sensitive user inputs and identify their associated variables in the app code as sensitive information sources. We built a system that detects privacy disclosures of sensitive user inputs by combining SUPOR with off-the-shelf static taint analysis. We applied the system to 16,000 popular Android apps and conducted a measurement study on the privacy disclosures. SUPOR achieved 97.3% precision and 97.3% recall for sensitive user input identification. SUPOR found 355 apps with privacy disclosures with a false positive rate of 8.7%, and also identified 30 vulnerable apps that disclose sensitive user inputs without protection. Our follow-up work further identified UI widgets that use sensitive icons to indicate intentions to use users' sensitive information.
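The first step of this pipeline, finding sensitive input widgets in a layout, can be sketched with a toy analysis over an Android layout XML. The keyword list, the example layout, and the hint-matching heuristic are assumptions for illustration; SUPOR's actual UI sensitiveness analysis is considerably more sophisticated:

```python
import xml.etree.ElementTree as ET

SENSITIVE_KEYWORDS = {"password", "ssn", "credit card", "income"}  # assumed
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

LAYOUT = """
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android">
  <EditText android:id="@+id/user" android:hint="Username"/>
  <EditText android:id="@+id/ssn" android:hint="Social Security Number (SSN)"/>
</LinearLayout>
"""

def sensitive_inputs(layout_xml):
    """Return the ids of input widgets whose hint text suggests
    sensitive information; these ids map to variables in app code
    that become taint sources."""
    root = ET.fromstring(layout_xml)
    found = []
    for widget in root.iter("EditText"):
        hint = widget.get(ANDROID_NS + "hint", "").lower()
        if any(kw in hint for kw in SENSITIVE_KEYWORDS):
            found.append(widget.get(ANDROID_NS + "id"))
    return found

print(sensitive_inputs(LAYOUT))  # ['@+id/ssn']
```

Each detected widget id is then resolved to the code variable holding its text, which seeds the downstream taint analysis.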

My work in mobile security was selected as one of the top ten finalists for the CSAW Best Applied Security Paper Award 2015, and produced a static analysis tool that was deployed in Microsoft Research's TouchDevelop. See my papers at ASE'12, USENIX Security'13, ICSE'15, ASEJ'15, USENIX Security'15, ICSE'19.

Cyber Attack Investigation via System Monitoring

Advanced Persistent Threat (APT) attacks have caused significant financial losses to many well-protected companies. These APT attacks often infiltrate target systems in multiple stages and span a long duration of time with a low profile, posing challenges for attack detection and investigation. To counter such advanced attacks, approaches based on ubiquitous system monitoring have recently emerged as an important solution for monitoring system activities and performing attack investigation. System monitoring observes system calls at the kernel level to collect system-level events about system activities. However, the massive amount of collected system monitoring data and the lack of support tools still pose challenges for efficiently incorporating expert knowledge in security analysis and attack investigation.

To address these challenges, we developed a novel query system built on top of existing monitoring tools and databases, designed with novel types of optimizations to support timely attack investigation. Our system provides (1) a domain-specific data model and storage design for scalability, (2) a domain-specific query language, the Attack Investigation Query Language (AIQL), that integrates critical primitives for attack investigation, and (3) an optimized query engine that leverages the characteristics of the data and the semantics of the queries to efficiently schedule query execution. We also developed a novel stream-based query system that takes as input a real-time event feed aggregated from multiple hosts in an enterprise, and provides an anomaly query engine that queries the event feed to identify abnormal behaviors based on specified anomalies. To facilitate the task of expressing anomalies based on expert knowledge, our system provides a domain-specific query language, SAQL, which allows analysts to express models for (1) rule-based anomalies, (2) time-series anomalies, (3) invariant-based anomalies, and (4) outlier-based anomalies.
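A rule-based anomaly of the kind such a query language expresses can be illustrated in plain Python. This sketch is not SAQL syntax; the sensitive-file set, the time window, and the event schema are assumptions chosen to show the idea of correlating events within a sliding window:

```python
# Rule: flag a process that reads a sensitive file and then opens a
# network connection within WINDOW seconds (a simple exfiltration pattern).
SENSITIVE_FILES = {"/etc/passwd"}
WINDOW = 10  # seconds; assumed threshold

def find_exfil_candidates(events):
    """events: (timestamp, process, operation, target), time-ordered."""
    last_read = {}   # process -> time of last sensitive read
    alerts = []
    for ts, proc, op, target in events:
        if op == "read" and target in SENSITIVE_FILES:
            last_read[proc] = ts
        elif op == "connect" and proc in last_read \
                and ts - last_read[proc] <= WINDOW:
            alerts.append((proc, target))
    return alerts

events = [
    (1, "cat", "read", "/etc/passwd"),
    (3, "cat", "connect", "10.0.0.5:443"),
    (20, "vim", "read", "/etc/passwd"),   # no follow-up connection
]
print(find_exfil_candidates(events))  # [('cat', '10.0.0.5:443')]
```

A DSL such as SAQL lets analysts state this pattern declaratively, with the engine handling windowing and multi-host aggregation.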

We also developed techniques to perform causality analysis on system monitoring data, which produces a dependency graph illustrating the system events that led to a specific attack. Our research focuses on (1) reducing the amount of system monitoring data that must be collected at runtime while preserving the precision of causality analysis, and (2) determining and filtering irrelevant dependencies in the dependency graph.
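The backbone of causality analysis, backward tracing from a point-of-interest (POI) event, can be sketched as follows. The event schema and the example scenario are hypothetical, and real systems must also handle process/file versioning and dependency explosion, which this sketch ignores:

```python
def backward_trace(events, poi_entity):
    """Walk backward in time from the POI entity, collecting edges
    (source, target, timestamp) for every event whose target is
    already known to feed into the attack.

    events: (timestamp, source_entity, target_entity), time-ordered.
    """
    relevant = {poi_entity}
    edges = []
    for ts, src, dst in reversed(events):
        if dst in relevant:
            edges.append((src, dst, ts))
            relevant.add(src)  # the source now also needs explaining
    return edges

events = [
    (1, "firefox.exe", "malware.exe"),   # download
    (2, "malware.exe", "proc:malware"),  # execution
    (3, "proc:malware", "/tmp/stolen"),  # POI: suspicious file write
    (4, "proc:update", "/var/log/sys"),  # unrelated activity, filtered out
]
for edge in backward_trace(events, "/tmp/stolen"):
    print(edge)
```

The resulting edges form the dependency graph an analyst inspects; items (1) and (2) above aim to keep this graph both small and faithful.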

My work in attack investigation was selected as one of the top ten finalists for the CSAW Best Applied Security Paper Award 2018. The security intelligence solution built by our team won first place in the Town Life and Society Innovation Category at the CEATEC Award. See my papers at VLDB'16, CCS'16, USENIX ATC'18, USENIX Security'18, CCS'18.

Software Testing and Analysis

Software testing is one of the most widely used techniques for improving software reliability. Although test generation is employed to automatically produce high-covering tests and reduce manual testing effort, existing test-generation tools face problems in handling complex software in practice. To help tool users better understand what problems test generation faces, I developed techniques that analyze both the program code under test and the structural coverage achieved by the test-generation tools in use. My preliminary study identified two major types of problems faced by test generation: (1) external-method-call problems (EMCPs), where tools fail to handle method calls to external libraries, and (2) object-creation problems (OCPs), where tools fail to generate sequences of method calls to construct desired object states.

Based on the study, we proposed Covana, which precisely identifies these two types of problems by pruning irrelevant problem candidates (i.e., external method calls for EMCPs and objects for OCPs) using the data dependencies from not-covered branches to the candidates. By looking into the reported problems, developers can provide guidance to help the tools address them, such as mock objects for EMCPs and factory methods for OCPs.
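A hypothetical example of acting on a reported EMCP: a branch guarded by an external HTTP call is unreachable for a test-generation tool, so the developer supplies a mock that controls the call's return value. The function and URL below are invented for illustration; the mechanism shown is Python's standard `unittest.mock`:

```python
from unittest.mock import patch
import urllib.request

def is_service_up(url):
    """Code under test: its 'True' branch depends on an external call."""
    try:
        return urllib.request.urlopen(url).status == 200
    except OSError:
        return False

# The mock stands in for the external method call, making the
# previously unreachable branch explorable by tools and tests alike.
with patch("urllib.request.urlopen") as fake_urlopen:
    fake_urlopen.return_value.status = 200
    result = is_service_up("http://example.invalid")
print(result)  # True
```

Factory methods play the analogous role for OCPs, handing the tool a ready-made recipe for constructing objects in the states a branch requires.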

Since complex software can exhibit many problems that account for the residual (not-achieved) coverage, users with limited resources may select only the most important problems to provide guidance for. Thus, we further developed techniques that provide an economic analysis to effectively estimate and quantify the benefit and the cost of addressing a given problem faced by the tools. I also conducted characteristic studies on how input-dependent loops compromise the effectiveness of test-generation tools based on symbolic execution.

My work in software testing received the ICSE SRC Best Project Representing an Innovative Use of Microsoft Technology award at the ACM SRC Grand Finals 2012. See my papers at ICSE'11, ASE'13, ICSE'16, EduDSN'18.


Automatic Bug Detection

Bug detection techniques automatically identify defects by matching code against known bug patterns. My techniques focus on performance bugs and reliability bugs.

Performance Bugs. As a widespread type of performance problem, workload-dependent performance bottlenecks (WDPBs) in responsive actions cause software hangs (i.e., unresponsiveness of software applications).

I proposed an approach, ∆INFER, which infers complexity models of workload-dependent loops to identify WDPBs by analyzing execution profiles under different workloads. Our evaluations on two open-source projects (7-Zip and Notepad++) showed that ∆INFER inferred complexity models with high prediction accuracy and effectively identified 10 performance bugs, 8 of which were previously unknown.
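The core idea, fitting candidate complexity models to workload-vs-iteration profiles, can be sketched in a few lines. The candidate model set, the profile data, and the least-squares scoring are simplifying assumptions; ∆INFER's actual inference works over per-loop execution profiles:

```python
import math

# Candidate complexity models for a loop's iteration count as a
# function of workload size n.
MODELS = {
    "O(n)":     lambda n: n,
    "O(n^2)":   lambda n: n * n,
    "O(nlogn)": lambda n: n * math.log2(n),
}

def infer_model(profile):
    """profile: list of (workload size n, observed iteration count y).
    Fit y = c * f(n) for each candidate f and keep the best fit."""
    best_name, best_err = None, float("inf")
    for name, f in MODELS.items():
        # Closed-form least-squares estimate of the scale constant c.
        c = sum(y * f(n) for n, y in profile) / sum(f(n) ** 2 for n, _ in profile)
        err = sum((y - c * f(n)) ** 2 for n, y in profile)
        if err < best_err:
            best_name, best_err = name, err
    return best_name

profile = [(10, 205), (20, 810), (40, 3190)]  # roughly 2*n^2 iterations
print(infer_model(profile))  # O(n^2)
```

A loop whose inferred model grows superlinearly in user-controlled workload size is a WDPB candidate worth reporting.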

Reliability Bugs. As the inherent complexity of the C++ language complicates static analysis, I proposed an abstract representation called ARC++ for modeling C++ objects as well as the containers and smart pointers introduced by standard libraries.

The ARC++ representation provides the basis for a novel static analysis, called lifetime dependency analysis, that supports typestate checking for C++ objects. We developed an automatic framework to generate ARC++ representations from C++ programs and evaluated our static analysis on 20 open-source projects comprising more than 2 million lines of code. The results show that our analysis detected 91 real bugs, and the whole analysis for all the projects completed in a reasonable time (2,104.40 seconds).
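Typestate checking itself can be illustrated with a toy automaton. This sketch is purely illustrative (the real analysis operates on ARC++ representations of C++ code, with lifetime dependencies between objects); here a file handle's allowed operations are encoded as state transitions:

```python
# Typestate automaton for a file-like object: reads are only legal
# while the handle is open; any other transition is a bug.
TRANSITIONS = {
    ("opened", "read"):  "opened",
    ("opened", "close"): "closed",
}

def check_typestate(ops, initial="opened"):
    """Replay a sequence of operations against the automaton and
    report the first forbidden transition, if any."""
    state = initial
    for op in ops:
        nxt = TRANSITIONS.get((state, op))
        if nxt is None:
            return f"error: '{op}' not allowed in state '{state}'"
        state = nxt
    return "ok"

print(check_typestate(["read", "close"]))          # ok
print(check_typestate(["read", "close", "read"]))  # use after close
```

In C++, smart pointers and containers make it hard to know which operations affect which object's state, which is exactly the gap the lifetime dependency analysis addresses.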

See my papers on bug detection (ISSTA'13, ISSTA'14).


Textual Software Artifact Analysis 

Software development practices produce various Natural Language (NL) artifacts, which describe the expected behavior of the software. 

In practice, an Access Control Policy (ACP), a set of rules specifying which principals (such as users or processes) have access to which resources, is often buried in requirements documents, an important type of NL software artifact. These documents can be large, making manual extraction of ACPs tedious and error-prone.

To address this important problem, we proposed an approach, TEXT2POLICY, that analyzes NL software documents, with a focus on use cases, to automatically extract ACPs. TEXT2POLICY leverages NLP techniques to annotate the sentences in the documents with both syntactic and semantic meanings, and composes a set of semantic patterns to identify and model sentences that describe ACPs. Evaluations on 927 use-case sentences showed that TEXT2POLICY effectively identified 142 ACP sentences with 88.7% precision and 89.4% recall, and extracted ACP rules with 86.3% accuracy. Our follow-up work extended this approach with machine learning techniques to improve accuracy and evaluated the approach on documents from more domains.
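The pattern-matching step can be caricatured with a single regex-based rule. This is a deliberately crude stand-in: TEXT2POLICY uses full NLP annotation and a catalog of semantic patterns, whereas the pattern and the sentence below are invented for illustration:

```python
import re

# Hypothetical pattern: "<subject> can/cannot <action> <resource>"
PATTERN = re.compile(
    r"^(?:An?\s+|The\s+)?(?P<subject>\w+)\s+(?P<neg>cannot|can)\s+"
    r"(?P<action>\w+)\s+(?:an?\s+|the\s+)?(?P<resource>[\w\s]+?)\.?$",
    re.IGNORECASE,
)

def extract_acp(sentence):
    """Map a matching sentence to (subject, action, resource, effect)."""
    m = PATTERN.match(sentence.strip())
    if not m:
        return None
    effect = "deny" if m.group("neg").lower() == "cannot" else "allow"
    return (m.group("subject").lower(), m.group("action").lower(),
            m.group("resource").lower(), effect)

print(extract_acp("A nurse can view the patient record."))
# ('nurse', 'view', 'patient record', 'allow')
```

Each extracted tuple becomes a machine-checkable ACP rule, which is what makes downstream enforcement and consistency checking possible.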

Application Programming Interface (API) documents are a typical way of describing the legal usage of reusable software libraries, thus facilitating software reuse. However, even with such documents, developers often overlook parts of them and build software systems that are inconsistent with the legal usage of those libraries. Existing software verification tools require formal specifications (such as code contracts), and therefore cannot directly verify the legal usage described in the natural language text of API documents against code that uses the library. In practice, most libraries do not come with formal specifications, hindering tool-based verification.

To address this issue, we proposed a novel approach to infer formal specifications from the natural language text of API documents. Our evaluation results show that our approach achieves an average of 92% precision and 93% recall in identifying sentences that describe code contracts, across more than 2,500 sentences of API documents. Furthermore, our results show that our approach achieves an average of 83% accuracy in inferring specifications from over 1,600 sentences describing code contracts.
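The mapping from contract sentences to formal specifications can be sketched with template rules. This is a hedged simplification (the actual approach uses richer NLP than regexes), and the templates and sentences below are assumptions made up for the example:

```python
import re

# Templates: (sentence pattern) -> (code-contract builder).
TEMPLATES = [
    (re.compile(r"(\w+)\s+must\s+not\s+be\s+null", re.I),
     lambda m: f"requires {m.group(1)} != null"),
    (re.compile(r"(\w+)\s+must\s+be\s+(?:positive|greater\s+than\s+zero)", re.I),
     lambda m: f"requires {m.group(1)} > 0"),
]

def infer_contract(sentence):
    """Return a precondition string for the first matching template."""
    for pattern, build in TEMPLATES:
        m = pattern.search(sentence)
        if m:
            return build(m)
    return None

print(infer_contract("The parameter buffer must not be null."))
# requires buffer != null
print(infer_contract("The argument size must be positive."))
# requires size > 0
```

Once sentences are lifted into contracts like these, off-the-shelf verification tools can check client code against them.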

See my papers on access control policy extraction (FSE'12, ACSAC'14) and precondition extraction from API documents (ICSE'12).