Generated by ChatGPT.

Fall 2025 - Wed 15:10-18:00, Peking University

This course provides a comprehensive introduction to deep learning techniques and large-scale models, emphasizing both foundational concepts and cutting-edge research achievements. It begins with an overview of deep learning research milestones and recent advances, followed by an in-depth exploration of deep generative models, including Autoregressive (AR) Models, Variational Autoencoders (VAEs), Normalizing Flow Models, Generative Adversarial Networks (GANs), Energy-based Models, Score-Based Models, Diffusion Models, Mamba Models, Hybrid Generative Models, efficient generative modeling techniques, and evaluation methods.

The course then covers core computer vision tasks such as image classification with Convolutional Neural Networks (CNNs), object detection, and image segmentation. Students will also study Large Language Models (LLMs), starting from the sequence models that preceded them (RNNs and LSTMs) and moving on to attention mechanisms and Transformers, together with their applications in agent AI. Multimodal LLMs are introduced to enable understanding across text, vision, and other modalities. Further topics include robot learning and world models, AI applications in scientific domains (materials science, proteins and biology), video understanding, and 3D/geometry modeling.

Designed for graduate students and researchers, this course equips participants with the knowledge to conduct advanced research and apply deep learning and large models to a wide range of AI problems, from generative modeling and multimodal understanding to embodied intelligence and scientific AI applications.

Syllabus

10 Weeks

  • 1: Introduction + Research Achievements + Advances
  • 2: Deep Generative Models (AR, VAE, Normalizing Flow)
  • 3: Deep Generative Models (GAN, Energy-based Models, Score-Based Models, Diffusion Models)
  • 4: October 1st, holiday
  • 5: October 8th, holiday
  • 6: Deep Generative Models (Mamba Models, Efficient Generative Models)
  • 7: Image Classification (CNN Architectures), Object Detection, Image Segmentation
  • 8: LLM (RNN, LSTM, Attention and Transformer), Multimodal LLM, Agent AI
  • 9: Robot Learning + World Model
  • 10: AI4Science (Materials Science, Protein and Biology), 3D and Geometry

7 Weeks

  • 11, 12, 13 (3 weeks): Paper Presentation
  • 14, 15, 16 (3 weeks): Project Presentation
  • 17: Exam

Course Staff

Feedback

For questions, please post in the WeChat group. You can also email Dr. Hao Tang at hao.tang@pku.edu.cn.