Home

Short Bio

I am an associate professor at the State Key Laboratory of Processors (SKLP), Institute of Computing Technology (ICT), Chinese Academy of Sciences (CAS). I received my B.Eng. and M.Eng. degrees from Harbin Institute of Technology, and my Ph.D. degree from The University of Hong Kong in 2016, advised by Prof. Hayden So. I worked as a research fellow at the National University of Singapore from 2016 to 2018, and then joined ICT as an associate professor. My current research interests focus on domain-specific architectures and systems, LLMs for chip design, and LLM-based code generation. I am a senior member of IEEE and CCF, and a member of ACM. Please check my CV for more information.

Vacancies

I am looking for self-motivated master's/intern students for LLM-based intelligent chip design. Topics span a variety of design tasks, including RISC-V SoC design and verification, RISC-V ASIP design automation, domain-specific accelerator generation, and low-power design and optimization. Students with RISC-V processor design and LLM experience are highly preferred. Fully remote work is possible. Check the topics in the following list and contact me if you are interested.

  • LLM-based SoC/DSA/ASIP design
  • LLM-based high-level synthesis
  • LLM-based ASIC design, verification, and debugging
  • LLM-based EDA parallelization

News

  • [Feb 2026] Our work on mixed-precision neural network processing on MCUs is accepted by ACM TECS’26. It substantially outperforms SOTA neural network processing frameworks and is now open-sourced on GitHub. Welcome to try it.
  • [Feb 2026] Our work on input adaptive soft error protection is accepted by TCAD’26.
  • [Jan 2026] Our work that analyzes errors in LLM-based code generation is accepted by TCAD’26. We identify many interesting causes of the errors that may help improve future LLMs for Verilog code generation.
  • [Jan 2026] Our work that investigates small language models for efficient CUDA code generation through an LLM-generated reasoning graph has been accepted by ICLR’26.
  • [Jun 2025] Our recent work on a DRAM-PIM-based ANNS engine is accepted by SC’25.
  • [Mar 2025] Our DSL for FPGA-based graph processing accelerator generation is accepted by LCTES’25.
  • [Mar 2025] Our FrontOrder approach is accepted by ICDE’25, and our ApproxPilot work is accepted by ISEDA’25. Both are open-sourced; feel free to try them.
  • [Mar 2025] We got the third place in the PPoPP’25 FastCode Programming Challenge for the BFS track and achieved the highest throughput on social network graphs.
  • [Dec 2024] Our TaijiGraph, a computational storage drive (CSD) based out-of-core graph processing system, is accepted by IPDPS’25. It leverages CSDs to address the I/O bottleneck of out-of-core graph processing systems through both high-level data layout and task scheduling and low-level I/O coalescing.
  • [Nov 2024] Two papers about DSA are accepted by HPCA’25.
  • [Oct 2024] Start to serve as an associate editor of IEEE Transactions on Emerging Topics in Computing.
  • [Oct 2024] Our work on CIM-based LLM acceleration, in collaboration with Dr. Luo, is accepted by IEEE TPAMI.
  • [Jun 2024] HLSPilot: LLM-based High-Level Synthesis is accepted by ICCAD’24. This work presents an LLM-based high-level synthesis design framework for a typical CPU-FPGA architecture. In particular, it leverages LLM and RAG techniques to generate HLS-based accelerator designs for arbitrary sequential C/C++ code, which is also a key component of our LLM-driven SoC generation framework.
  • [May 2024] Our work on computational storage won a Huawei OlympusMons Award in 2024.
  • [Mar 2024] A multi-resolution fault injection framework for deep learning is accepted by TVLSI’24. It is well documented, thoroughly validated, and provides many convenient evaluation features that are typically required in fault-tolerant deep learning studies. Check it out on GitHub.
  • [Nov 2023] A HW/SW co-design framework for mixed-precision neural network acceleration on FPGAs (DeepBurning-MixQ) is accepted by ICCAD’23. We are adding LLM support to enable more intelligent code generation. The code is open-sourced on GitHub; welcome to try it.
  • [Jun 2023] Congratulations! Haitong Huang, Erjing Luo, and Guoyu Li got the Second Place in DAC’23 SDC.
  • [Jan 2023] Our work “EnGN: A High-Throughput and Energy-Efficient Accelerator for Large Graph Neural Networks” won the Best Paper Award of TC’21.
  • [Jul 2022] Haitong Huang, Erjing Luo, and Cangyuan Li in my group got the Third Place in DAC’22 SDC.
  • [Jun 2020] Shengwen Liang and Rick Lee won the Third Prize (FPGA Track) of the 2020 IEEE Low-Power Computer Vision Challenge (LPCVC).

Funding

  • HW/SW platform for intelligent cross-layer design and optimization of processors, ¥11,000,000, xdb (2024-2028)
  • Natural weather disturbed image generation for remote sensing, ¥500,000, XXX (2023-2024)
  • AI-assisted DSA design automation, ¥300,000, SKLP (2023-2024)
  • Automatic Cross-Layer DSA Design and Optimization, ¥1,600,000, National Key Research and Development Program of China (2023-2025)
  • Intelligent In-Storage Big Data Processing System, ¥1,300,000, 1XX (2023-2024)
  • Fault-tolerant Deep Learning Toolchain for COTS Devices, ¥300,000, XXX (2023)
  • Elastic Fault-tolerant Deep Learning Processor Design, ¥570,000, NSFC (2022-2025)
  • Customized Energy-efficient Graph Processing Acceleration on FPGAs, ¥300,000, NSFC (2020-2022)
  • Fault-tolerant Deep Learning Processor Design Automation, ¥300,000, SKLCA (2021-2022)

Talks

  • Investigating Self-Test, Self-Diagnosis, and Self-Recovery (3S) Techniques for Wafer-Scale AI Chips, SEMICON China, 2025
  • Fully Automated CPU-FPGA Heterogeneous Hardware Acceleration Based on Large Language Models, Forum on High-Performance Heterogeneous Computing and AI Optimization, CCF-HPC, 2024
  • Design, Optimization, and Programming of Heterogeneous Hardware Accelerators, Forum on High-Performance Heterogeneous Computing and AI Optimization, CCF-HPC, 2023
  • Graph Processing System Design Based on Computational Storage, Forum on Novel Energy-Efficient Architectures for Complex Graph Computing Applications, CNCC, 2022
  • Microarchitecture Design of Fault-Tolerant Deep Learning Processors, Conference on Integrated Circuit Design and Automation (CCF-DAC), 2021
  • DeepBurning2.0: An Automatic End-to-end Neural Network Acceleration System on FPGAs, School of Software Engineering, South China University of Technology, 2019

Services

  • Session co-chair of DAC’25
  • Session chair of ISEDA’25
  • Member of the Review Board, IEEE Transactions on Parallel and Distributed Systems (TPDS)
  • Associate Editor, IEEE Transactions on Emerging Topics in Computing (TETC)
  • TPC for DFTS’22, FPT’22, ITC’22
  • TPC for ATS’23, FPT’23, DFTS’23, ITC’23, NeurIPS’23
  • TPC for DFTS’24, FPT’24, ICLR’24, FCCM’24, ICML’24
  • TPC for FPT’25, AAAI’25, ICLR’25, DFTS’25, DAC’25
  • Reviewer for TC, TPDS, TCAD, TVLSI, TETC, JETC, JSA, TNNLS

Graduated Students

  • Chengwei Xiong (Master student, ByteDance)
  • Qing Zhang (Master student, Tencent)
  • Wenyu Zhang (Intern from Harbin Institute of Technology, GigaDevice)
  • Zesong Jiang (Intern from University of Science and Technology, PhD candidate in Arizona State University)
  • Xinghua Xue (Ph.D, Hangzhou Institute of Advanced Study, UCAS)
  • Zheng Feng (Master student, Huawei, AI Compilation)
  • Guoyu Li (Master student, Baidu, Kunlun Chip)
  • Haitong Huang (Master student, Tencent, AI Lab)
  • Xuejian Sun (Intern from Harbin Institute of Technology, Master student in Fudan University)
  • Erjing Luo (Intern from Beijing Institute of Technology, MPhil at University of Alberta)
  • Miaoxin Wang (Intern from Harbin Institute of Technology, Master student in Nanjing University)
  • Jinming Zhao (Intern from Wuhan University, PhD candidate in the University of Hong Kong)
  • Zhiyu Zhu (Intern from Harbin Institute of Technology, CETC 47)
  • Cheng Chu (Intern from Hefei University of Technology, PhD candidate in Indiana University Bloomington)
  • Meng He (Intern from Hefei University of Technology, ByteDance)
  • Ziyang Zhu (Intern from Hefei University of Technology, UNISOC)
  • Qiang Zhang (Master student, Kuaishou)
  • Kouzi Xing (Intern from Hefei University of Technology, SenseTime)
  • Li Li (Intern from Hefei University of Technology, JD.com)
  • Kexin Chu (Intern from Hefei University of Technology, Baidu)
  • Kaijie Tu (Intern from Hefei University of Technology, ICT)
  • Chang Shi (Master student, Alibaba)
  • Peibin Wu (Master student, MSRA)