I am a direct-entry Ph.D. student in Electronic Information at Harbin Institute of Technology (HIT), advised by Prof. Tonghua Su. My research focuses on multimodal generation, digital human video generation, and talking-head and portrait animation.

Research Interests

  • Digital human video generation
  • Talking-head generation and audio-driven portrait animation
  • DiT / diffusion models for video generation
  • Controllable face video generation
  • Large-scale face video dataset construction
  • Agent-based AIGC systems

Education

  • Sep. 2024 – Jun. 2029, Harbin Institute of Technology, Ph.D. Student in Electronic Information, Advisor: Prof. Tonghua Su
  • Sep. 2020 – Jun. 2024, Central South University, B.Eng. in Software Engineering

Selected Publications


CVIU 2026
  • DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation. He Feng, et al. CVIU, 2026. CCF-B · First Author [Project Page] [Paper] [Code]
ICCV 2025
  • DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation. ICCV, 2025. CCF-A · Second Author [Project Page] [Paper] [Dataset]
TMM
  • DH-OmniFace: A Large-Scale and Multi-Attribute Dataset Suite for Controllable Face Video Generation. He Feng, et al. IEEE TMM. CCF-A / JCR Q1 · First Author · Major Revision [Project Page] [Paper] [Dataset]
ACM MM
  • CogPortrait: Fine-Grained Eye-Region Control in Portrait Animation via Hierarchical Agent Planning. He Feng, et al. ACM MM. CCF-A · First Author · Under Review [Project Page] [Paper] [Code]
  • Talking Face Video Generation Survey, TPAMI. Contributor · Under Review

Patents

  • Two accepted domestic invention patents on talking-face generation and digital human foundation models.
  • Method and System for Automatic Generation of Multilingual Educational Videos Based on Multi-Agent Collaboration — Accepted.
  • Multi-language MOOC Teaching Video Automatic Generation System Based on Multi-Agent Collaboration — Software Copyright.

Industry Experience

Algorithm Intern · Digital Human Group, Li Auto

Jul. 2024 – Jan. 2025

  • Reproduced and evaluated 20+ talking-face generation models, including SadTalker, AniPortrait, EMO, MuseV, VASA-1, and MODA.
  • Developed and evaluated Open-Sora-Plan-based talking-face foundation models at 1B, 3B, and 5B parameter scales.
  • Built a 1,000+ hour talking-face video database with MLLM-generated captions and multimodal annotations.
  • Contributed to the construction of the DH-FaceVid-1K and DH-OmniFace datasets.

Projects

Digital Human Generation Foundation Model R&D

Li Auto-University Joint Program · Jun. 2024 – Jan. 2025

Technical route design, dataset investigation, model reproduction, and experimental validation for a 2D digital human video generation foundation model.

Agent-Based Multilingual MOOC Teaching Video Generation System

Jan. 2026 – Mar. 2026

Multi-agent course video generation pipeline that accepts either an uploaded PPT or a course topic as input, with automatic course structuring, translation verification, voice cloning, digital human generation, video composition, and subtitle export.

Academic Service

  • Organizer, ACM Multimedia Asia 2025 Grand Challenge: Multimodal Multiethnic Talking-Head Video Generation.
  • Reviewer for TMM, Neurocomputing, Pattern Recognition, Knowledge-Based Systems, Neural Networks, ACM MM, AAAI, NeurIPS, CVPR, ICLR, ICML.

Skills

  • Programming: Python, PyTorch, Linux, Git, Docker, LaTeX
  • Deep Learning: training, inference, hyperparameter tuning, experiment management
  • Languages: Chinese, English
