📝 Publications

SIGGRAPH 2025

High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion

Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers

[arXiv]

  • We introduce SplatDiff, a pixel-splatting-guided video diffusion model for synthesizing novel views with consistent geometry and high-fidelity texture from a single image.
  • SplatDiff excels in single-view novel view synthesis, sparse-view novel view synthesis, and stereo video conversion, demonstrating remarkable cross-domain and cross-task performance.
NeurIPS 2024

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, Christopher Schroers

[arXiv] [Video] [Poster]

  • We propose BetterDepth to boost zero-shot monocular depth estimation (MDE) methods with plug-and-play diffusion refiners, achieving robust affine-invariant MDE performance with fine-grained details.
  • We design global pre-alignment and local patch masking strategies to enable learning detail refinement from small-scale synthetic datasets while preserving rich prior knowledge from pre-trained MDE models for zero-shot transfer.
ECCV 2024 - Oral

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang, Yulun Zhang, Fisher Yu

[Code] [Supp] [Video]

  • We propose a simple yet effective strategy (HiT-SR) to convert popular transformer-based SR methods to our hierarchical transformers, boosting SR performance by exploiting multi-scale features and long-range dependencies.
  • We design a spatial-channel correlation method that efficiently leverages spatial and channel features with computational complexity linear in the window size, enabling the use of large hierarchical windows, e.g., $64\times64$ windows.
ICCV 2023

Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Xiang Zhang, Lei Yu, Wen Yang, Jianzhuang Liu, Gui-Song Xia

[Code] [Dataset] [YouTube]

  • A scale-aware network is designed to allow flexible setups of input spatial resolutions and enable learning from different temporal scales of motion blur.
  • A self-supervised learning framework is proposed for model training with real-world data and performance generalization in spatial and temporal domains.
  • A multi-scale real-world blurry dataset (MS-RBD) is constructed to facilitate the evaluation of deblurring performance in real-world scenarios.
TPAMI 2022

Learning to See Through with Events

Lei Yu, Xiang Zhang, Wei Liao, Wen Yang, Gui-Song Xia

[Code] [Dataset] [Bilibili]

  • An event-based synthetic aperture imaging (E-SAI) algorithm is proposed to see through dense occlusions even under extreme lighting conditions.
  • A hybrid network composed of a spiking encoder and a convolutional decoder is designed to mitigate the disturbances from occlusions and guarantee the overall reconstruction performance.
CVPR 2022

Unifying Motion Deblurring and Frame Interpolation with Events

Xiang Zhang, Lei Yu

[Code] [YouTube]

  • We present a unified framework for event-based video deblurring and interpolation (EVDI).
  • By utilizing the constraints between cross-modal frames and events, a fully self-supervised learning method is proposed to enable network training with real-world data without requiring ground-truth images.

* means equal contribution and indicates my supervisor.