Hao Tang

Robotics Institute, Carnegie Mellon University
Office: Smith Hall, 5000 Forbes Ave, Pittsburgh, PA 15213, USA🇺🇸
Email: bjdxtanghao@gmail.com

Hey, thanks for stopping by! 👋

I am currently a postdoctoral fellow at the Robotics Institute, Carnegie Mellon University, USA🇺🇸, working under the guidance of Prof. Fernando De la Torre. Before this, I held a postdoctoral position at the Computer Vision Lab, ETH Zürich, Switzerland🇨🇭, under the supervision of Prof. Radu Timofte and Prof. Luc Van Gool. My academic journey includes earning a master’s degree from the School of Electronics and Computer Engineering, Peking University, China🇨🇳, under the mentorship of Prof. Hong Liu, and completing my Ph.D. studies with the Multimedia and Human Understanding Group, University of Trento, Italy🇮🇹, under the mentorship of Prof. Nicu Sebe. Additionally, I had the privilege of being a visiting scholar in the Department of Engineering Science at the University of Oxford, UK🇬🇧, under the guidance of Prof. Philip Torr. Furthermore, I undertook a visiting internship at the IIAI, UAE🇦🇪, under the supervision of Prof. Ling Shao.

My research interests are AIGC, AI4Science, machine learning, and computer vision. Specifically, I focus on:

  • Generative AI (GANs, diffusion models) and its applications (e.g., image generation, image translation, text-to-image synthesis/editing, person image synthesis, semantic image synthesis, style transfer, video generation, graph-based generation)
  • Efficient AI (e.g., pruning, distillation, quantization, NAS)
  • Large language models (LLMs)
  • AI for Science
  • Multimodal learning (e.g., audio-to-video synthesis, vision-language models)
  • Low-level vision (image/video restoration, super-resolution, denoising, deblurring, HDR deghosting)
  • High-level vision (depth estimation, segmentation, detection, recognition)
  • 3D vision (e.g., NeRF, 3D-aware image/video generation, object reconstruction/generation, 3D pose transfer)
  • Medical image enhancement and analysis
  • Point cloud registration and segmentation
  • Human pose estimation and motion prediction
  • Network robustness and interpretability

Position Openings

If this resonates with you, we are actively hiring.

For prospective collaborators, we have multiple positions for Postdoc/Ph.D./Master/Intern researchers. If you are interested in joining or visiting our lab, or in working with us remotely, please email your self-introduction, the project of interest (what problem are you trying to solve, and how do you plan to solve it? Be as specific as possible), transcript, and CV to bjdxtanghao@gmail.com.

News

2024-02 We have 7 papers (Explanation for ViT + Faithfulness of ViT + Diffusion Policy for Versatile Navigation + Subject-Driven Generation [Final rating: 455] + Diffusion Model for 3D Hand Pose Estimation + Adversarial Learning for 3D Pose Transfer + Efficient Diffusion Distillation [224->235]) accepted to CVPR 2024.
2024-01 We have 1 paper (Architectural Layout Generation) accepted to TPAMI 2024.
2023-12 We have 1 paper (Sign Pose Sequence Generation) accepted to AAAI 2024.
2023-10 We have 4 papers (BEV Perception + Efficient ViT + 3D Motion Transfer + Graph Distillation) accepted to NeurIPS 2023.
2023-07 We have 1 paper (Semantic Image Synthesis) accepted to TPAMI 2023.
2023-06 We have 1 paper (Visible-Infrared Person Re-ID) accepted to ICCV 2023.
2023-05 We have 2 papers (Image Restoration Dataset + 3D-Aware Video Generation) accepted to CVPRW 2023 and 1 paper (3D Face Generation) accepted to JSTSP.
2023-04 We have 1 paper (Speed-Aware Object Detection) accepted to ICML 2023, 2 papers (Lottery Ticket Hypothesis for ViT + Zero-shot Character Recognition) accepted to IJCAI 2023, 1 paper (3D Human Pose Estimation) accepted to PR 2023, and 1 paper (SAR Target Recognition) accepted to TGRS 2023.
2023-03 We have 6 papers (HDR Deghosting + Point Cloud Registration + Graph-Constrained House Generation + Mathematical Architecture Design + Text-to-Image Synthesis + Efficient Semantic Segmentation) accepted to CVPR 2023.
2023-02 We have 3 papers (Camouflaged Object Detection + Brain Vessel Image Segmentation + Cross-View Image Translation) accepted to ICASSP 2023 and 1 paper (Camouflaged Object Detection) accepted to TCSVT.
2023-01 We have 1 paper (Semantic Image Synthesis) accepted to ICLR 2023 and 1 paper (Human Reaction Generation) accepted to TMM.
2022-11 We have 4 papers (Real-Time Segmentation + Wearable Design + Efficient ViT Training + Text-Guided Image Editing) accepted to AAAI 2023, 1 paper (Person Pose and Facial Image Synthesis) accepted to IJCV, 1 paper (Salient Object Detection) accepted to TIP, and 1 paper (Object Detection Transformer) accepted to TCSVT.
2022-10 We have 1 paper (Sinusoidal Neural Radiance Fields) accepted to BMVC 2022 and 1 paper (Guided Image-to-Image Translation) accepted to TPAMI.
2022-09 We have 1 paper (Facial Expression Translation) accepted to TAFFC and 1 paper (Ship Detection) accepted to TGRS.
2022-07 We have 5 papers (Real-Time SR + Video SR + Soft Token Pruning for ViT + 3D-Aware Human Synthesis + Video Semantic Segmentation) accepted to ECCV 2022, 1 paper (Gaze Correction and Animation) accepted to TIP, and 1 paper (Cross-view Panorama Image Synthesis) accepted to PR.
2022-06 We have 2 papers accepted to ACM MM 2022.
2022-04 We have 1 paper accepted to IJCAI 2022, 1 paper accepted to TGRS, and 1 paper accepted to TMM.
2022-03 We have 5 papers accepted to CVPR 2022, 1 paper accepted to TPAMI, and 1 paper accepted to TMM.
2021-12 We have 2 papers accepted to AAAI 2022.
2021-11 We have 1 paper accepted to TIP.
2021-10 We have 3 papers accepted to BMVC 2021.
2021-08 We have 1 paper accepted to TIP and 1 paper accepted to TNNLS.
2021-07 We have 2 papers accepted to ICCV 2021.
2021-06 We have 1 paper accepted to ACM MM 2021 and 1 paper accepted to TMM.
2020-08 We have 1 paper accepted to BMVC 2020, 2 papers accepted to ACM MM 2020, and 1 paper accepted to TIP.
2020-07 We have 1 paper accepted to ECCV 2020.
2020-05 We have 1 paper accepted to TNNLS and 1 paper accepted to TGRS.
2020-02 We have 1 paper accepted to CVPR 2020.
2019-07 We have 1 paper accepted to ACM MM 2019.
2019-02 We have 1 paper accepted to CVPR 2019.
2018-06 We have 1 paper accepted to ACM MM 2018.
2018-02 We have 1 paper accepted to CVPR 2018.
2016-07 We have 1 paper accepted to IJCAI 2016.
2015-08 We have 1 paper accepted to ACM MM 2015.

Featured Publications

(Including NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, AAAI, IJCAI, ACM MM)

Equal Contribution, *Corresponding Author(s)

  1. Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
     Zeyu Zhang,  Akide Liu,  Ian Reid,  Richard Hartley,  Bohan Zhuang,  Hao Tang*
    In arXiv, 2024
  2. StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
     Ming Tao,  Bingkun Bao,  Hao Tang,  Yaowei Wang,  Changsheng Xu
    In arXiv, 2024
  3. InstructGIE: Towards Generalizable Image Editing
     Zichong Meng,  Changdi Yang,  Jun Liu,  Hao Tang*,  Pu Zhao*,  Yanzhi Wang*
    In arXiv, 2024
  4. MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation
     Bin Xie,  Hao Tang,  Bin Duan,  Dawen Cai,  Yan Yan
    In arXiv, 2024
  5. Efficient Pruning of Large Language Model with Adaptive Estimation Fusion
     Jun Liu,  Chao Wu,  Changdi Yang,  Hao Tang*,  Haoye Dong,  Zhenglun Kong,  Geng Yuan,  Wei Niu,  Dong Huang*,  Yanzhi Wang*
    In arXiv, 2024
  6. StableGarment: Garment-Centric Generation via Stable Diffusion
     Rui Wang,  Hailong Guo,  Jiaming Liu,  Huaxia Li,  Haibo Zhao,  Xu Tang,  Yao Hu,  Hao Tang,  Peipei Li
    In arXiv, 2024
  7. SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior
     Huan-ang Gao,  Mingju Gao,  Jiaju Li,  Wenyi Li,  Rong Zhi,  Hao Tang,  Hao Zhao
    In arXiv, 2024
  8. HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud
     Wencan Cheng,  Hao Tang,  Luc Van Gool,  Jong Hwan Ko
    In CVPR 2024, Seattle, USA
  9. Versatile Navigation under Partial Observability via Value-guided Diffusion Policy
     Gengyu Zhang,  Hao Tang,  Yan Yan
    In CVPR 2024, Seattle, USA
  10. Towards Robust 3D Pose Transfer with Adversarial Learning
     Haoyu Chen,  Hao Tang,  Ehsan Adeli,  Guoying Zhao
    In CVPR 2024, Seattle, USA
  11. On the Faithfulness of Vision Transformer Explanations
     Junyi Wu,  Weitai Kang,  Hao Tang,  Yuan Hong,  Yan Yan
    In CVPR 2024, Seattle, USA
  12. Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
     Junyi Wu,  Bin Duan,  Weitai Kang,  Hao Tang,  Yan Yan
    In CVPR 2024, Seattle, USA
  13. SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
     Yuxuan Zhang,  Jiaming Liu,  Yiren Song,  Rui Wang,  Hao Tang,  Jinpeng Yu,  Huaxia Li,  Xu Tang,  Yao Hu,  Han Pan,  Zhongliang Jing
    In CVPR 2024, Seattle, USA
  14. Towards Online Real-Time Memory-based Video Inpainting Transformers
     Guillaume Thiry,  Hao Tang*,  Radu Timofte,  Luc Van Gool
    In CVPR 2024, Seattle, USA
  15. G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model
     Pan Xie,  Qipeng Zhang,  Peng Taiying,  Hao Tang*,  Yao Du,  Zexian Li
    In AAAI 2024, Vancouver, Canada
  16. HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception
     Peiyan Dong,  Zhenglun Kong,  Xin Meng,  Pinrui Yu,  Yifan Gong,  Geng Yuan,  Hao Tang*,  Yanzhi Wang
    In NeurIPS 2023, New Orleans, USA
  17. PackQViT: Faster Sub-8-bit Vision Transformers via Full and Packed Quantization on the Mobile
     Peiyan Dong,  Lei Lu,  Chao Wu,  Cheng Lyu,  Geng Yuan,  Hao Tang*,  Yanzhi Wang
    In NeurIPS 2023, New Orleans, USA
  18. LART: Neural Correspondence Learning with Latent Regularization Transformer for 3D Motion Transfer
     Haoyu Chen,  Hao Tang,  Radu Timofte,  Luc Van Gool,  Guoying Zhao
    In NeurIPS 2023, New Orleans, USA
  19. Does Graph Distillation See Like Vision Dataset Counterpart?
     Beining Yang,  Kai Wang,  Qingyun Sun,  Cheng Ji,  Xingcheng Fu,  Hao Tang,  Yang You,  Jianxin Li
    In NeurIPS 2023, New Orleans, USA
  20. Learning Concordant Attention via Target-aware Alignment for Visible-Infrared Person Re-identification
     Jianbing Wu,  Hong Liu,  Yuxin Su,  Wei Shi,  Hao Tang
    In ICCV 2023, Paris, France
  21. SpeedDETR: Speed-aware Transformers for End-to-end Object Detection
     Peiyan Dong,  Zhenglun Kong,  Xin Meng,  Peng Zhang,  Hao Tang*,  Yanzhi Wang,  Chih-Hsien Chou
    In ICML 2023, Hawaii, USA
  22. Data Level Lottery Ticket Hypothesis for Vision Transformers
     Xuan Shen,  Zhenglun Kong,  Minghai Qin,  Peiyan Dong,  Geng Yuan,  Xin Meng,  Hao Tang,  Xiaolong Ma,  Yanzhi Wang
    In IJCAI 2023, Macao, China
  23. Graph Transformer GANs for Graph-Constrained House Generation
     Hao Tang,  Zhenyu Zhang,  Humphrey Shi,  Bo Li,  Ling Shao,  Nicu Sebe,  Radu Timofte,  Luc Van Gool
    In CVPR 2023, Vancouver, Canada
  24. Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration
     Guofeng Mei,  Hao Tang,  Xiaoshui Huang,  Weijie Wang,  Juan Liu,  Jian Zhang,  Luc Van Gool,  Qiang Wu
    In CVPR 2023, Vancouver, Canada
  25. DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
     Xuan Shen,  Yaohua Wang,  Ming Lin,  Yilun Huang,  Hao Tang,  Xiuyu Sun,  Yanzhi Wang
    In CVPR 2023, Vancouver, Canada
  26. GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
     Ming Tao,  Bingkun Bao,  Hao Tang,  Changsheng Xu
    In CVPR 2023, Vancouver, Canada
  27. Edge Guided GANs with Contrastive Learning for Semantic Image Synthesis
     Hao Tang,  Xiaojuan Qi,  Guolei Sun,  Dan Xu,  Nicu Sebe,  Radu Timofte,  Luc Van Gool
    In ICLR 2023, Kigali, Rwanda
  28. 3D-Aware Semantic-Guided Generative Model for Human Synthesis
     Jichao Zhang,  Enver Sangineto,  Hao Tang,  Aliaksandr Siarohin,  Zhun Zhong,  Nicu Sebe,  Wei Wang
    In ECCV 2022, Tel Aviv, Israel
  29. Towards Interpretable Video Super-Resolution via Alternative Optimization
     Jiezhang Cao,  Jingyun Liang,  Kai Zhang,  Wenguan Wang,  Qin Wang,  Yulun Zhang,  Hao Tang,  Luc Van Gool
    In ECCV 2022, Tel Aviv, Israel
  30. MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
     Wenhao Li,  Hong Liu,  Hao Tang,  Pichao Wang,  Luc Van Gool
    In CVPR 2022, New Orleans, USA
  31. DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis
     Ming Tao,  Hao Tang,  Fei Wu,  Xiaoyuan Jing,  Bingkun Bao,  Changsheng Xu
    In CVPR 2022, New Orleans, USA
  32. GestureGAN for Hand Gesture-to-Gesture Translation in the Wild
     Hao Tang,  Wei Wang,  Dan Xu,  Yan Yan,  Nicu Sebe
    In ACM MM 2018, Seoul, South Korea