Research Overview

My research focuses on AI/LLM-driven software engineering and security, in particular on using large language models to improve program analysis, vulnerability detection, and mobile ecosystem security.

Theme 1: LLM-driven Automated Software Engineering with Reliability Guarantees

This research theme aims to establish the foundations of reliable LLM-driven automated software engineering. It explores how large language models can reason about program semantics, support critical software engineering tasks with correctness guarantees, and be systematically evaluated in terms of effectiveness, robustness, and efficiency. By addressing challenges in reasoning, reliability, and evaluation, this work seeks to enable trustworthy integration of LLMs into real-world software development pipelines.

Program Reasoning Capability

Developing LLM-based techniques for precise semantic understanding of programs, enabling accurate reasoning over complex software systems.

    Focus areas:
  • Program Analysis Reasoning
  • Cross-representation/layer Semantic Reasoning
    Representative papers:
  • Artemis: LLM-Assisted inter-procedural path-sensitive taint analysis (OOPSLA 2025, CCS 2023) Top
  • LLM-CompDroid: Repairing configuration compatibility bugs (TOSEM 2025)
  • KEENHash: Large-scale binary code similarity analysis (ISSTA 2025) Top

Reliability in SE Tasks

Designing LLM-driven approaches to automate core software engineering tasks with a focus on correctness, robustness, and practical effectiveness.

    Focus areas:
  • Quality Assessment of LLM-generated Code
  • Reliable Test Generation and Validation
  • Automated Debugging and Program Repair
    Representative papers:
  • Test augmentation (OOPSLA 2026) Top
  • Coverage goal selection (TSE 2024)
  • Low-code programming with traditional vs. LLM support (JSS 2025)
  • ChatGPT vs SBST (TSE 2024)
  • Unearthing Gas-Wasting Code Smells in Smart Contracts (TSE 2024)
  • Assessing the Quality of Code Generation by ChatGPT (TSE 2024)

Evaluation and Efficiency

Developing principled frameworks to systematically evaluate LLM-based software engineering techniques in terms of effectiveness, reliability, and computational efficiency.

    Focus areas:
  • Benchmark Design
  • Evaluation Metrics
  • Cost-Aware Inference
  • Carbon Footprint

Theme 2: Mobile Security and Android Ecosystem Analysis

Understanding security and privacy risks and malicious behaviors in large-scale mobile ecosystems through systematic analysis of applications, system mechanisms, and software supply chains.

Apps & Android OS Security

    Representative papers:
  • Unauthorized encrypted private data transmission (ICSE 2026) Top
  • Mobile Sharing Service Abuse (WWW 2022) Top
  • App Link Attack (FSE 2020) Top
  • Resource Race Attack (SANER 2020)
  • Diehard Android Apps (ASE 2020) Top

Malware Detection and Adversarial Analysis

    Representative papers:
  • Fine-grained malicious component detection (ASE 2023) Top
  • Adversarial attacks on deep learning apps (JSEP 2023)

App Ecosystem and Supply Chain Security

    Representative papers:
  • Android app bundle analysis (TSE 2025)
  • Third-party library and dependency analysis (ASE 2019, TSE 2021)
  • Repackaged Apps Detection (SANER 2019)
  • App Debloat (TSE 2022)

Theme 3: Empirical Software Engineering and User-Centric Analysis

Conducting large-scale empirical studies to understand software quality, developer behavior, and user feedback.

Bug Analysis and Software Quality

    Representative papers:
  • Bug characterization in Jupyter systems (EASE 2025)
  • Defect prediction and software quality studies (SCP 2025, IJSEKE 2023, ICPADS 2021, WCMC 2021, TReli 2021, SAC 2021, QRS 2020, IST 2020, QRS 2019, ISSRE 2019, JCST 2019, JSS 2019a, JSS 2019b, IST 2018, ICPC 2018)

User Review and Feedback Mining

    Representative papers:
  • User-review-based bug localization (TSE 2022)
  • Feedback Analysis in SPL Forked Developments (SPLC 2025)