Xuyang Cao

曹旭阳

Ph.D., Algorithm Engineer, Member of the Communist Party of China

JD.com, Beijing, China
No. 18 Kechuang 11 Street, Tongzhou District, Beijing
Building C, JD Headquarters

Email:

newxuyangcao [at] gmail [dot] com

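Biography

I am currently an algorithm engineer at JD Health, part of JD.com, working mainly on AIGC and multimodal learning. Over the past year I have focused on multimodal deep learning algorithms, chiefly the exploration and application of low-cost, audio-driven digital human and pet generation.

I have worked on computer vision, image processing, and pattern recognition since 2016. I received my bachelor's, master's, and Ph.D. degrees from Beijing Jiaotong University. My doctoral research focused on semantic segmentation and semi-supervised learning, supervised by Prof. Houjin Chen (陈后金) and Prof. Yanfeng Li (李艳凤); I also collaborated closely with Prof. Yahui Peng (彭亚辉) during my Ph.D.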

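Education

Ph.D., School of Electronic and Information Engineering, Beijing Jiaotong University \| National government-sponsored study-abroad scholarship; doctoral dissertation in the top 3% of the discipline for the year	2017-2022
Master's, School of Electronic and Information Engineering, Beijing Jiaotong University \| Master's stage of the combined master's-Ph.D. program	2016-2017
Bachelor's, School of Electronic and Information Engineering, Beijing Jiaotong University \| National-level Undergraduate Innovation Project; Vice President of the Student Union; Outstanding League Member	2012-2016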

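Work Experience

Algorithm Engineer, JD Health, JD.com \| Ph.D. management trainee; 2022 Yicheng Outstanding Talent of the Beijing Economic-Technological Development Area; 2023 Best Employee, "Ranran Star" (燃燃之星); 2024 Best Employee, "Technology Star" (技术之星)	2022.07-present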

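Projects

Digital Human + LLM Technology and Application Innovation		2024.01-2025.01
As a core developer, drove JD Health's digital human technology from zero to one, exploring technical and application innovation for digital humans in the broader health domain. Responsible for the full pipeline, from technical survey and dataset collection and construction to model design and training and business application. Accumulated 2.5T+ of data and developed three types of digital human generation models: a GAN-based dual-tower model, a diffusion-based end-to-end generation model, and a real-time generation model based on diffusion and decoupled facial representations. Designed two digital human generation frameworks, one for offline and one for online scenarios. Successfully applied in 5+ scenarios, including medical knowledge popularization, dialogue, and pet dress-up; annual commercial revenue of 2.5 million+, serving 230,000+ users in total. Completed two papers on new techniques with open-source code [JoyVASA, JoyHallo], with 1,000+ GitHub stars in total.
Brain Science Project - Brain-Computer Interface (BCI) - Brain Disease Diagnosis Based on Brain Imaging		2023.02-2024.01
As a core researcher, drove JD Health's brain science research from zero to one, covering BCI-based rehabilitation of post-stroke limb motor impairment and brain-imaging-based identification of Alzheimer's disease (AD) and bipolar disorder (BD). Led in-house EEG and brain imaging data acquisition, construction of the EEG acquisition platform, neural decoding algorithm design, and external collaborations. Built private and public brain imaging datasets totaling 4,500+ cases and collected more than 40 hours of motor imagery EEG data; wrote and submitted five papers and two patents; several studies in EEG recognition and brain imaging reached industry SOTA level. As a participating organization, successfully applied for one National Key R&D Program project of the Ministry of Science and Technology (Virtual Reality Cognitive Rehabilitation Training System for the Elderly); helped JD Health become a member of the Brain-Computer Interface Industry Alliance.
Deep-Learning-Based Semantic Segmentation of Breast Masses (Ultrasound) and Lung Masses (MRI)		2018.09-2022.04
Studied deep-learning-based semantic segmentation algorithms, applied to breast ultrasound and lung MRI images, covering fully supervised learning, semi-supervised learning, and neural architecture search. Proposed a lightweight dilated densely connected network for 3D breast tumor segmentation; compared with classical segmentation networks, performance improved by 5% while the number of parameters fell by more than 20 times. Designed an uncertainty-aware temporal-ensembling model for semi-supervised segmentation; with only 1.1% of the data labeled, the semi-supervised method reached 94.4% of the performance of supervised segmentation. Proposed a NAS-based 3D medical image segmentation framework that outperformed state-of-the-art manually designed segmentation networks by 4.2%. Published four papers as first author, including two in top journals, with a total impact factor of 30+.
Computer-Vision-Based Gear Defect Detection for High-Speed Trains		2018.09-2022.04
As a core developer, built a computer-vision-based solution for automatic detection and quantitative analysis of surface pitting on coupling gears of EMU trains. The system acquires images of the coupling gear tooth surfaces, uses image recognition algorithms to compile pitting statistics and reports, and raises alarms for gears exceeding limits. Responsible for developing the detection and segmentation algorithms for worn regions in tooth-surface images and for designing and implementing the Windows software system. Defects of 1 mm or larger on the 104 tooth surfaces of the inner and outer gears can be detected within 3 minutes; published related papers and patents.
Computer-Vision-Based Geometric Parameter Measurement of Overhead Contact Line Systems		2018.09-2022.04
Provided a scheme for measuring the geometric parameters of overhead contact line systems using scale factors and the frame-difference method. Responsible for early-stage algorithm simulation and participated in hardware structure design. Completed two related patents.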

Publications

G. Wang, X. Cao, S. An, F. Fan, C. Zhang, J. Wang, F. Yu, Z. Wang. Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Classification, ICIGP, 2025. [paper]
X. Cao, G. Wang, S. Shi, J. Zhao, Y. Yao, J. Fei, M. Gao. JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation. arXiv, 2024. [paper] [code] [project]
S. Shi, X. Cao, J. Zhao, G. Wang. JoyHallo: Digital human model for Mandarin. arXiv, 2024. [technical report] [code] [project]
Z. Gao, Y. Guo, G. Wang, X. Chen, X. Cao, C. Zhang, S. An, F. Xu. Robust deep learning from incomplete annotation for accurate lung nodule detection. Computers in Biology and Medicine, 2024, 173:108361. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, Y. Zhou, L. Cheng, T. Liu, D. Shen. Auto-DenseUnet: Searchable Neural Network Architecture for Tumor Segmentation in 3D Automated Breast Ultrasound. Medical Image Analysis, 2022, 82: 102589. [paper]
Y. Zhou, H. Chen, Y. Li, X. Cao, S. Wang, D. Shen. Cross-Model Attention-Guided Tumor Segmentation for 3D Automated Breast Ultrasound (ABUS) Images. IEEE Journal of Biomedical and Health Informatics, 2022, 26(1): 301-311. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, S. Wang, L. Cheng. Uncertainty Aware Temporal-Ensembling Model for Semi-supervised ABUS Mass Segmentation. IEEE Transactions on Medical Imaging, 2021, 40(1):431-443. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, S. Wang, L. Cheng. Dilated Densely Connected U-Net with Uncertainty Focus Loss for 3D ABUS Mass Segmentation. Computer Methods and Programs in Biomedicine, 2021, 209: 106313. [paper]
J. Li, H. Chen, Y. Li, Y. Peng, N. Cai, X. Cao. AMRSegNet: Adaptive Modality Recalibration Network for Lung Tumor Segmentation on Multi-Modal MR Images. Multimedia Tools and Applications, 2021, 80: 33779–33797. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, Y. Zhou, L. Cheng. Boundary Loss with Non-Euclidean Distance Constraint for ABUS Mass Segmentation. 2020 CISP-BMEI, Chengdu, China, 2020, pp: 645-650. [paper]
Y. Peng, X. Cao, H. Chen, Y. Li, J. Li, X. Wang. Preliminary Study on Noise and Artifact Reduction in Phase-Contrast CT Image of Tristructural-Isotropic Coated Fuel Particle (in Chinese). Acta Electronica Sinica, 2019, 47(2): 448-453. [paper]
C. Wang, F. Li, Y. Li, H. Chen and X. Cao. A Defect Status Detecting Method for External Gear in Railway. 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), Chongqing, 2018, pp: 123-127. [paper]
Y. Li, X. Cao, H. Chen, L. Zhang, N. Yang. Defect Status Detection Method Based on Machine Vision for External Gear in Train (in Chinese). Journal of The China Railway Society, 2018, 40(12):33-41. [paper]
J. Wei, X. Cao, H. Chen, Y. Li. Research on benign and malignant masses classification in mammogram (in Chinese). Journal of Beijing Jiaotong University, 2017, 41(5): 73-. [paper]

Patents & Books

Y. Peng, W. Jiang, Z. Zhu, H. Yang, X. Cao, H. Chen. A Method of Measuring the Geometric Parameters of an Overhead Line System by Using Geometric Magnification and Monocular Vision. China, CN201810182553.1, 2018-11-13. [Link]

Y. Peng, C. Zhang, B. Zheng, J. Yin, X. Cao, H. Chen. A Method and a Device for Measuring the Geometric Parameters of an Overhead Line System by Using Scale Factors and Frame Differences. China, CN201710464403.5, 2017-06-19. [Link]

Y. Zhou, X. Cao. Neural Networks with TensorFlow 2, Apress, 2020. [translated] [Link]

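Personal Qualifications

Research Interests:		AIGC, multimodal learning, image processing, pattern recognition, computer vision, artificial intelligence
Language Skills:		Chinese: native \| English: proficient (listening, speaking, reading, writing), IELTS 7
Computer Skills:		Programming languages: Python, C++ \| Computer vision: PyTorch, OpenCV \| Other: Linux, Vim, LaTeX
Hobbies:		Personal blog with 330,000+ total views, CSDN Blog Expert \| Enjoys reading, running, and traveling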

Xuyang Cao

Algorithm Engineer, Ph.D.

JD.com, Beijing, China
No. 18 Kechuang 11 Street, Tongzhou District, Beijing
Building C, JD Headquarters

Email:

newxuyangcao [at] gmail [dot] com

Biography

Dr. Xuyang Cao is an algorithm engineer at JD Health, part of JD.com, specializing in AIGC and multimodal research. Over the past year, his primary focus has been multimodal deep learning algorithms, with an emphasis on exploring and applying low-cost, audio-driven talking face generation.

He has worked on computer vision, image processing, and pattern recognition since 2016. He completed his undergraduate, master’s, and Ph.D. degrees at Beijing Jiaotong University. His doctoral research focused on semantic segmentation and semi-supervised learning, under the supervision of Professors Houjin Chen and Yanfeng Li. He also collaborated closely with Professor Yahui Peng during his doctoral studies.

Education

Ph.D., School of Electronic and Information Engineering, Beijing Jiaotong University	2017-2022
Master's Candidate, School of Electronic and Information Engineering, Beijing Jiaotong University	2016-2017
Bachelor of Engineering, School of Electronic and Information Engineering, Beijing Jiaotong University	2012-2016

Projects

Audio Driven Talking Face + LLM Technology and Application Innovation		2024.01-2025.01
Led the development of JD Health's digital human technology, driving it from concept to deployment. Oversaw the entire project lifecycle, including technical research, dataset creation, model design and training, and business application. Developed three types of digital human generation models: a GAN-based dual-tower model, a diffusion-based end-to-end model, and a real-time model based on diffusion and decoupled facial representations. Applied these models in more than five scenarios, including medical knowledge dissemination, interactive dialogue, and virtual pet customization, generating over 2.5 million in annual commercial revenue and serving more than 230,000 users. Published two papers with open-source code, which have earned over 1,000 stars on GitHub in total. [JoyVASA, JoyHallo]
The Brain Science Project - Brain-Computer Interface (BCI) - Brain Disease Diagnosis Based on Medical Imaging		2023.02-2024.01
Served as a key researcher in driving JD Health’s brain science research from concept to execution, focusing on stroke rehabilitation using Brain-Computer Interface (BCI) technology and brain-imaging-based diagnosis of Alzheimer’s disease (AD) and bipolar disorder (BD). Led proprietary EEG and brain imaging data collection, built the EEG data collection platform, designed neural decoding algorithms, and managed external collaborations. Assembled private and public brain imaging datasets totaling more than 4,500 cases, plus more than 40 hours of motor imagery EEG data. Wrote and submitted five research papers and two patents, with multiple studies in EEG recognition and brain imaging achieving state-of-the-art results. Contributed, as a participating organization, to the successful application for a national key R&D project (Virtual Reality Cognitive Rehabilitation Training System for the Elderly) under the Ministry of Science and Technology, and helped JD Health become a member of the Brain-Computer Interface Industry Alliance.
Research on Deep Learning based Breast Mass (Ultrasound) and Lung Mass (MRI) Segmentation		2018.09-2022.04
Researched deep-learning-based semantic segmentation algorithms for breast ultrasound and lung MRI images, covering fully supervised learning, semi-supervised learning, and neural architecture search (NAS). Proposed a lightweight dilated densely connected network for 3D breast tumor segmentation; performance improved by over 5% compared with classical segmentation networks, while the parameter count was more than 20 times smaller. Designed an uncertainty-aware temporal-ensembling model for semi-supervised segmentation; with only 1.1% labeled data, the semi-supervised method reached 94.4% of the performance of supervised segmentation. Proposed a NAS-based 3D medical image segmentation framework that improved on the state-of-the-art human-designed segmentation network by 4.2%. Published related journal and conference papers with a total impact factor of 30+.
High Speed Train Gear Defect Detection Based on Computer Vision		2018.09-2022.04
As a core developer, designed an automatic detection and quantitative analysis solution for surface pitting of coupling gears in train sets, based on computer vision technology. Collected coupling gear surface images and applied image recognition algorithms to generate statistics, reports, and alerts for gears exceeding the defect threshold. Led the development of algorithms for detecting and segmenting gear surface wear areas, as well as the design and implementation of the software system on the Windows platform. The system detects defects larger than 1 mm on all 104 tooth surfaces of the internal and external gears within 3 minutes; published related papers and patents.
Geometric Parameters Measurement of an Overhead Line System		2018.09-2022.04
Provided a solution for measuring the geometric parameters of an overhead line system using scale factors and frame differences. Responsible for the early-stage algorithm simulation and participated in the hardware structure design. Two related patents have been granted.

Publications

G. Wang, X. Cao, S. An, F. Fan, C. Zhang, J. Wang, F. Yu, Z. Wang. Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Classification, ICIGP, 2025. [paper]
X. Cao, G. Wang, S. Shi, J. Zhao, Y. Yao, J. Fei, M. Gao. JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation. arXiv, 2024. [paper] [code] [project]
S. Shi, X. Cao, J. Zhao, G. Wang. JoyHallo: Digital human model for Mandarin. arXiv, 2024. [technical report] [code] [project]
Z. Gao, Y. Guo, G. Wang, X. Chen, X. Cao, C. Zhang, S. An, F. Xu. Robust deep learning from incomplete annotation for accurate lung nodule detection. Computers in Biology and Medicine, 2024, 173:108361. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, Y. Zhou, L. Cheng, T. Liu, D. Shen. Auto-DenseUnet: Searchable Neural Network Architecture for Tumor Segmentation in 3D Automated Breast Ultrasound. Medical Image Analysis, 2022, 82: 102589. [paper]
Y. Zhou, H. Chen, Y. Li, X. Cao, S. Wang, D. Shen. Cross-Model Attention-Guided Tumor Segmentation for 3D Automated Breast Ultrasound (ABUS) Images. IEEE Journal of Biomedical and Health Informatics, 2022, 26(1): 301-311. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, S. Wang, L. Cheng. Uncertainty Aware Temporal-Ensembling Model for Semi-supervised ABUS Mass Segmentation. IEEE Transactions on Medical Imaging, 2021, 40(1):431-443. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, S. Wang, L. Cheng. Dilated Densely Connected U-Net with Uncertainty Focus Loss for 3D ABUS Mass Segmentation. Computer Methods and Programs in Biomedicine, 2021, 209: 106313. [paper]
J. Li, H. Chen, Y. Li, Y. Peng, N. Cai, X. Cao. AMRSegNet: Adaptive Modality Recalibration Network for Lung Tumor Segmentation on Multi-Modal MR Images. Multimedia Tools and Applications, 2021, 80: 33779–33797. [paper]
X. Cao, H. Chen, Y. Li, Y. Peng, Y. Zhou, L. Cheng. Boundary Loss with Non-Euclidean Distance Constraint for ABUS Mass Segmentation. 2020 CISP-BMEI, Chengdu, China, 2020, pp: 645-650. [paper]
Y. Peng, X. Cao, H. Chen, Y. Li, J. Li, X. Wang. Preliminary Study on Noise and Artifact Reduction in Phase-Contrast CT Image of Tristructural-Isotropic Coated Fuel Particle (in Chinese). Acta Electronica Sinica, 2019, 47(2): 448-453. [paper]
C. Wang, F. Li, Y. Li, H. Chen and X. Cao. A Defect Status Detecting Method for External Gear in Railway. 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), Chongqing, 2018, pp: 123-127. [paper]
Y. Li, X. Cao, H. Chen, L. Zhang, N. Yang. Defect Status Detection Method Based on Machine Vision for External Gear in Train (in Chinese). Journal of The China Railway Society, 2018, 40(12):33-41. [paper]
J. Wei, X. Cao, H. Chen, Y. Li. Research on benign and malignant masses classification in mammogram (in Chinese). Journal of Beijing Jiaotong University, 2017, 41(5): 73-. [paper]

Patents & Translated Books

Y. Zhou, X. Cao. Neural Networks with TensorFlow 2, Apress, 2020. [translated] [Link]

Personal Qualifications

Research Interests:		AIGC, Multimodal, Image Processing, Pattern Recognition, Computer Vision, Artificial Intelligence
Language Skills:		Chinese (Mother Tongue) \| English (Proficient in Listening, Speaking, Reading, Writing), IELTS Score: 7
Computer Skills:		Programming Languages: Python, C++ \| Computer Vision: PyTorch, OpenCV \| Others: Linux, Vim, LaTeX
Hobbies:		Personal blog with over 330,000 views, CSDN Blog Expert \| Enjoys reading, running, and traveling
