Hao Tang

Carnegie Mellon University
Office: 5000 Forbes Ave, Pittsburgh, PA 15213, USA🇺🇸
Email: bjdxtanghao@gmail.com

Hey, thanks for stopping by! 👋

I am currently a postdoctoral fellow at CMU, USA🇺🇸. Before this, I held a postdoctoral position at ETH Zürich, Switzerland🇨🇭. I earned my master's degree from Peking University, China🇨🇳, and my Ph.D. from UoT, Italy🇮🇹. I have also been a visiting scholar at the University of Oxford, UK🇬🇧, and a visiting intern at IIAI, UAE🇦🇪.

My research interests are AIGC, AI4Science, machine learning, and computer vision. Specifically, I focus on:

Generative AI (GANs, diffusion models) and its applications (e.g., image generation, image translation, text-to-image synthesis/editing, person image synthesis, semantic image synthesis, style transfer, video generation, graph-based generation, layout generation)
Efficient AI (e.g., pruning, distillation, quantization, NAS)
Interpretable AI
Robust AI
Large language models (LLMs)
AI for Science
Multi-modalities (e.g., audio-to-video synthesis, language-vision, point cloud)
Low-level vision (image/video restoration, super-resolution, denoising, deblurring, HDR deghosting)
High-level vision (depth estimation, segmentation, detection, recognition)
3D vision (e.g., NeRF, 3D-aware image/video generation, object reconstruction/generation, 3D pose transfer)
Medical image enhancement and analysis
Human pose estimation and motion prediction

Position Openings

If these topics resonate with you, we are actively hiring.

For prospective collaborators, we have multiple positions for Postdoc/Ph.D./Master/Intern researchers. If you are interested in joining or visiting our lab, or in working with us remotely, please email bjdxtanghao@gmail.com with your self-introduction, your project of interest (what problem are you trying to solve, and how are you trying to solve it? Be as specific as possible), your transcript, and your CV.

News

2024-04	🎉🎉🎉 I received offers from MIT and Harvard University.
2024-02	We have 7 papers (Explanation for ViT + Faithfulness of ViT + Diffusion Policy for Versatile Navigation + Subject-Driven Generation [Final rating: 455] + Diffusion Model for 3D Hand Pose Estimation + Adversarial Learning for 3D Pose Transfer + Efficient Diffusion Distillation [224->235]) accepted to CVPR 2024.
2024-01	We have 1 paper (Architectural Layout Generation) accepted to TPAMI 2024.
2023-12	We have 1 paper (Sign Pose Sequence Generation) accepted to AAAI 2024.
2023-10	We have 4 papers (BEV Perception + Efficient ViT + 3D Motion Transfer + Graph Distillation) accepted to NeurIPS 2023.
2023-09	We have 1 paper (Practical Blind Image Denoising) accepted to MIR 2023.
2023-07	We have 1 paper (Semantic Image Synthesis) accepted to TPAMI 2023.
2023-06	We have 1 paper (Visible-Infrared Person Re-ID) accepted to ICCV 2023.
2023-05	We have 2 papers (Image Restoration Dataset + 3D-Aware Video Generation) accepted to CVPRW 2023 and 1 paper (3D Face Generation) accepted to JSTSP 2023.
2023-04	We have 1 paper (Speed-Aware Object Detection) accepted to ICML 2023, 2 papers (Lottery Ticket Hypothesis for ViT + Zero-shot Character Recognition) accepted to IJCAI 2023, 1 paper (3D Human Pose Estimation) accepted to PR 2023, and 1 paper (SAR Target Recognition) accepted to TGRS 2023.
2023-03	We have 6 papers (HDR Deghosting + Point Cloud Registration + Graph-Constrained House Generation + Mathematical Architecture Design + Text-to-Image Synthesis + Efficient Semantic Segmentation) accepted to CVPR 2023.
2023-02	We have 3 papers (Camouflaged Object Detection + Brain Vessel Image Segmentation + Cross-View Image Translation) accepted to ICASSP 2023 and 1 paper (Camouflaged Object Detection) accepted to TCSVT 2023.
2023-01	We have 1 paper (Semantic Image Synthesis) accepted to ICLR 2023 and 1 paper (Human Reaction Generation) accepted to TMM 2023.
2022-11	We have 4 papers (Real-Time Segmentation + Wearable Design + Efficient ViT Training + Text-Guided Image Editing) accepted to AAAI 2023, 1 paper (Person Pose and Facial Image Synthesis) accepted to IJCV 2022, 1 paper (Salient Object Detection) accepted to TIP 2022, and 1 paper (Object Detection Transformer) accepted to TCSVT 2022.
2022-10	We have 1 paper (Sinusoidal Neural Radiance Fields) accepted to BMVC 2022 and 1 paper (Guided Image-to-Image Translation) accepted to TPAMI 2022.
2022-09	We have 1 paper (Facial Expression Translation) accepted to TAFFC 2022 and 1 paper (Ship Detection) accepted to TGRS 2022.
2022-07	We have 5 papers (Real-Time SR + Video SR + Soft Token Pruning for ViT + 3D-Aware Human Synthesis + Video Semantic Segmentation) accepted to ECCV 2022, 1 paper (Gaze Correction and Animation) accepted to TIP 2022, and 1 paper (Cross-View Panorama Image Synthesis) accepted to PR 2022.
2022-06	We have 2 papers (Character Image Restoration + Character Image Denoising) accepted to ACM MM 2022.
2022-04	We have 1 paper (Real-Time Portrait Stylization) accepted to IJCAI 2022, 1 paper (Wide-Context Transformer for Semantic Segmentation) accepted to TGRS 2022, and 1 paper (Incremental Learning for Semantic Segmentation) accepted to TMM 2022.
2022-03	We have 5 papers (Text-to-Image Synthesis + 3D Human Pose Estimation + Text-Driven Image Manipulation + 3D Face Modeling + 3D Face Restoration) accepted to CVPR 2022, 1 paper (Image Generation) accepted to TPAMI 2022, and 1 paper (Cross-View Panorama Image Synthesis) accepted to TMM 2022.
2021-12	We have 2 papers (Generalized 3D Pose Transfer + Audio-Visual Speaker Tracking) accepted to AAAI 2022.
2021-11	We have 1 paper (Building Extraction in VHR Remote Sensing Images) accepted to TIP 2021.
2021-10	We have 3 papers (Cross-View Image Translation + Data-driven 3D Animation + Natural Image Matting) accepted to BMVC 2021.
2021-08	We have 1 paper (Layout-to-Image Translation) accepted to TIP 2021 and 1 paper (Unpaired Image-to-Image Translation) accepted to TNNLS 2021.
2021-07	We have 2 papers (Continuous Pixel-Wise Prediction + Unsupervised 3D Pose Transfer) accepted to ICCV 2021.
2021-06	We have 1 paper (Cross-View Exocentric to Egocentric Video Synthesis) accepted to ACM MM 2021 and 1 paper (Total Generate) accepted to TMM 2021.
2020-08	We have 1 paper (Person Image Generation) accepted to BMVC 2020, 2 papers (Semantic Image Synthesis + Unsupervised Gaze Correction and Animation) accepted to ACM MM 2020, and 1 paper (Controllable Image-to-Image Translation) accepted to TIP 2020.
2020-07	We have 1 paper (Person Image Generation) accepted to ECCV 2020.
2020-05	We have 1 paper (Deep Dictionary Learning and Coding) accepted to TNNLS 2020 and 1 paper (Semantic Segmentation of Remote Sensing Images) accepted to TGRS 2020.
2020-02	We have 1 paper (Semantic-Guided Scene Generation) accepted to CVPR 2020.
2019-07	We have 1 paper (Keypoint-Guided Image Generation) accepted to ACM MM 2019.
2019-02	We have 1 paper (Cross-View Image Translation) accepted to CVPR 2019.
2018-06	We have 1 paper (Hand Gesture-to-Gesture Translation) accepted to ACM MM 2018.
2018-02	We have 1 paper (Monocular Depth Estimation) accepted to CVPR 2018.
2016-07	We have 1 paper (Large Scale Image Retrieval) accepted to IJCAI 2016.
2015-08	We have 1 paper (Gender Classification) accepted to ACM MM 2015.

Featured Publications

(Including NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, AAAI, IJCAI, ACM MM)

†Equal Contribution, *Corresponding Author(s)

arXiv

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang*

In arXiv, 2024

PDF Code
arXiv

A Survey on Multimodal Wearable Sensor-based Human Action Recognition

Jianyuan Ni, Hao Tang, Syed Tousiful Haque, Yan Yan, Anne HH Ngu

In arXiv, 2024

PDF Code
arXiv

Physical Adversarial Attack Meets Computer Vision: A Decade Survey

Hui Wei, Hao Tang, Xuemei Jia, Zhixiang Wang, Hanxun Yu, Zhubo Li, Shin'ichi Satoh, Luc Van Gool, Zheng Wang

In arXiv, 2024

PDF Code
arXiv

StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion

Ming Tao, Bingkun Bao, Hao Tang, Yaowei Wang, Changsheng Xu

In arXiv, 2024

PDF Code
arXiv

InstructGIE: Towards Generalizable Image Editing

Zichong Meng, Changdi Yang, Jun Liu, Hao Tang*, Pu Zhao*, Yanzhi Wang*

In arXiv, 2024

PDF Code
arXiv

Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement

Xiaofeng Zhang, Zishan Xu, Hao Tang, Chaochen Gu, Wei Chen, Shanying Zhu, Xinping Guan

In arXiv, 2024

PDF Code
arXiv

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

In arXiv, 2024

PDF Code
arXiv

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

Jun Liu, Chao Wu, Changdi Yang, Hao Tang*, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang*, Yanzhi Wang*

In arXiv, 2024

PDF Code
arXiv

StableGarment: Garment-Centric Generation via Stable Diffusion

Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li

In arXiv, 2024

PDF Code
arXiv

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior

Huan-ang Gao, Mingju Gao, Jiaju Li, Wenyi Li, Rong Zhi, Hao Tang, Hao Zhao

In arXiv, 2024

PDF Code
CVPR
Highlight

HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud

Wencan Cheng, Hao Tang, Luc Van Gool, Jong Hwan Ko

In CVPR 2024, Seattle, USA

PDF Code
CVPR

Versatile Navigation under Partial Observability via Value-guided Diffusion Policy

Gengyu Zhang, Hao Tang, Yan Yan

In CVPR 2024, Seattle, USA

PDF Code
CVPR

Towards Robust 3D Pose Transfer with Adversarial Learning

Haoyu Chen, Hao Tang, Ehsan Adeli, Guoying Zhao

In CVPR 2024, Seattle, USA

PDF Code
CVPR

On the Faithfulness of Vision Transformer Explanations

Junyi Wu, Weitai Kang, Hao Tang, Yuan Hong, Yan Yan

In CVPR 2024, Seattle, USA

PDF Code
CVPR

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan

In CVPR 2024, Seattle, USA

PDF Code
CVPR

SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation

Yuxuan Zhang, Jiaming Liu, Yiren Song, Rui Wang, Hao Tang, Jinpeng Yu, Huaxia Li, Xu Tang, Yao Hu, Han Pan, Zhongliang Jing

In CVPR 2024, Seattle, USA

PDF Code
CVPR
Workshop

Towards Online Real-Time Memory-based Video Inpainting Transformers

Guillaume Thiry, Hao Tang*, Radu Timofte, Luc Van Gool

In CVPR 2024, Seattle, USA

PDF Code
AAAI

G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model

Pan Xie, Qipeng Zhang, Peng Taiying, Hao Tang*, Yao Du, Zexian Li

In AAAI 2024, Vancouver, Canada

PDF Code
NeurIPS

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception

Peiyan Dong, Zhenglun Kong, Xin Meng, Pinrui Yu, Yifan Gong, Geng Yuan, Hao Tang*, Yanzhi Wang

In NeurIPS 2023, New Orleans, USA

PDF Code
NeurIPS

PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile

Peiyan Dong, Lei Lu, Chao Wu, Cheng Lyu, Geng Yuan, Hao Tang*, Yanzhi Wang

In NeurIPS 2023, New Orleans, USA

PDF Code
NeurIPS

LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer

Haoyu Chen, Hao Tang, Radu Timofte, Luc Van Gool, Guoying Zhao

In NeurIPS 2023, New Orleans, USA

PDF Code
NeurIPS

Does Graph Distillation See Like Vision Dataset Counterpart?

Beining Yang, Kai Wang, Qingyun Sun, Cheng Ji, Xingcheng Fu, Hao Tang, Yang You, Jianxin Li

In NeurIPS 2023, New Orleans, USA

PDF Code
MIR

Practical Blind Image Denoising via Swin-Conv-UNet and Data Synthesis

Kai Zhang, Yawei Li, Jingyun Liang, Jiezhang Cao, Yulun Zhang, Hao Tang, Dengping Fan, Radu Timofte, Luc Van Gool

Springer Machine Intelligence Research (MIR), 2023

PDF Code
ICCV

Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification

Jianbing Wu, Hong Liu, Yuxin Su, Wei Shi, Hao Tang

In ICCV 2023, Paris, France

PDF Code
ICML

SpeedDETR: Speed-aware Transformers for End-to-end Object Detection

Peiyan Dong, Zhenglun Kong, Xin Meng, Peng Zhang, Hao Tang*, Yanzhi Wang, Chih-Hsien Chou

In ICML 2023, Hawaii, USA

PDF Code
IJCAI

Data Level Lottery Ticket Hypothesis for Vision Transformers

Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

In IJCAI 2023, Macao, China

PDF Code
CVPR

Graph Transformer GANs for Graph-Constrained House Generation

Hao Tang, Zhenyu Zhang, Humphrey Shi, Bo Li, Ling Shao, Nicu Sebe, Radu Timofte, Luc Van Gool

In CVPR 2023, Vancouver, Canada

PDF Code
CVPR

Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

Guofeng Mei, Hao Tang, Xiaoshui Huang, Weijie Wang, Juan Liu, Jian Zhang, Luc Van Gool, Qiang Wu

In CVPR 2023, Vancouver, Canada

PDF Code
CVPR

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

Xuan Shen, Yaohua Wang, Ming Lin, Yilun Huang, Hao Tang, Xiuyu Sun, Yanzhi Wang

In CVPR 2023, Vancouver, Canada

PDF Code
CVPR

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis

Ming Tao, Bingkun Bao, Hao Tang, Changsheng Xu

In CVPR 2023, Vancouver, Canada

PDF Code
ICLR

Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis

Hao Tang, Xiaojuan Qi, Guolei Sun, Dan Xu, Nicu Sebe, Radu Timofte, Luc Van Gool

In ICLR 2023, Kigali, Rwanda

PDF Code
ECCV

3D-Aware Semantic-Guided Generative Model for Human Synthesis

Jichao Zhang, Enver Sangineto, Hao Tang, Aliaksandr Siarohin, Zhun Zhong, Nicu Sebe, Wei Wang

In ECCV 2022, Tel Aviv, Israel

PDF Code
ECCV

Towards Interpretable Video Super-Resolution via Alternative Optimization

Jiezhang Cao, Jingyun Liang, Kai Zhang, Wenguan Wang, Qin Wang, Yulun Zhang, Hao Tang, Luc Van Gool

In ECCV 2022, Tel Aviv, Israel

PDF Code
CVPR

MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation

Wenhao Li, Hong Liu, Hao Tang, Pichao Wang, Luc Van Gool

In CVPR 2022, New Orleans, USA

PDF Code
CVPR Oral

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

Ming Tao, Hao Tang, Fei Wu, Xiaoyuan Jing, Bingkun Bao, Changsheng Xu

In CVPR 2022, New Orleans, USA

PDF Code
ACM MM
Best Paper Candidate

GestureGAN for Hand Gesture-to-Gesture Translation in the Wild

Hao Tang, Wei Wang, Dan Xu, Yan Yan, Nicu Sebe

In ACM MM 2018, Seoul, South Korea

PDF Code