I am a direct-entry Ph.D. student in Electronic Information at Harbin Institute of Technology (HIT), advised by Prof. Tonghua Su. My research focuses on multimodal generation, digital human video generation, and talking-head/portrait animation.
Research Interests
- Digital human video generation
- Talking-head generation and audio-driven portrait animation
- DiT / diffusion models for video generation
- Controllable face video generation
- Large-scale face video dataset construction
- Agent-based AIGC systems
Education
- Sep. 2024 – Jun. 2029, Harbin Institute of Technology, Ph.D. Student in Electronic Information, Advisor: Prof. Tonghua Su
- Sep. 2020 – Jun. 2024, Central South University, B.Eng. in Software Engineering
Selected Publications

- DiTalker: A Unified DiT-based Framework for High-Quality and Speaking Styles Controllable Portrait Animation. He Feng, et al. CVIU, 2026. CCF-B · First Author [Project Page] [Paper] [Code]

- DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation. ICCV, 2025. CCF-A · Second Author [Project Page] [Paper] [Dataset]

- DH-OmniFace: A Large-Scale and Multi-Attribute Dataset Suite for Controllable Face Video Generation. He Feng, et al. IEEE TMM. CCF-A / JCR Q1 · First Author · Major Revision [Project Page] [Paper] [Dataset]

- CogPortrait: Fine-Grained Eye-Region Control in Portrait Animation via Hierarchical Agent Planning. He Feng, et al. ACM MM. CCF-A · First Author · Under Review [Project Page] [Paper] [Code]
- Talking Face Video Generation Survey, TPAMI. Contributor · Under Review
Patents
- Two accepted domestic invention patents on talking-face generation and digital human foundation models.
- Method and System for Automatic Generation of Multilingual Educational Videos Based on Multi-Agent Collaboration — Accepted.
- Multi-language MOOC Teaching Video Automatic Generation System Based on Multi-Agent Collaboration — Software Copyright.
Industry Experience
Algorithm Intern · Digital Human Group, Li Auto
Jul. 2024 – Jan. 2025
- Reproduced and evaluated 20+ talking-face generation models (e.g., SadTalker, AniPortrait, EMO, MuseV, VASA-1, MODA).
- Developed and evaluated Open-Sora-Plan-based talking-face foundation models at 1B, 3B, and 5B parameter scales.
- Built a 1,000+ hour talking-face video database with MLLM-generated captions and multimodal annotations.
- Contributed to the construction of the DH-FaceVid-1K and DH-OmniFace datasets.
Projects
Digital Human Generation Foundation Model R&D
Li Auto-University Joint Program · Jun. 2024 – Jan. 2025
Technical route design, dataset investigation, model reproduction, and experimental validation for a 2D digital human video generation foundation model.
Agent-Based Multilingual MOOC Teaching Video Generation System
Jan. 2026 – Mar. 2026
Multi-agent course video generation pipeline supporting PPT upload or course-topic input, with automatic course structuring, translation verification, voice cloning, digital human generation, video composition, and subtitle export.
Academic Service
- Organizer, ACM Multimedia Asia 2025 Grand Challenge: Multimodal Multiethnic Talking-Head Video Generation.
- Reviewer for TMM, Neurocomputing, Pattern Recognition, Knowledge-Based Systems, Neural Networks, ACM MM, AAAI, NeurIPS, CVPR, ICLR, ICML.
Skills
- Programming: Python, PyTorch, Linux, Git, Docker, LaTeX
- Deep Learning: training, inference, hyperparameter tuning, experiment management
- Languages: Chinese, English