📝 Publications

CVPR 2026

Guardians of the Hair: Rescuing Soft Boundaries in Depth, Stereo, and Novel Views

Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers

[arXiv]

  • We present HairGuard to capture, model, and reconstruct fine-grained soft boundary details in 3D vision tasks, achieving state-of-the-art performance on monocular depth estimation, stereo conversion, and novel view synthesis.
  • We leverage image matting datasets for training, enabling HairGuard to automatically identify and fix soft boundaries without relying on manually crafted cues like trimaps. A plug-and-play depth fixer is proposed for precise refinement, alongside a color fuser for high-quality view synthesis.
SIGGRAPH 2025

High-Fidelity Novel View Synthesis via Splatting-Guided Diffusion

Xiang Zhang, Yang Zhang, Lukas Mehl, Markus Gross, Christopher Schroers

[Website] [Paper] [arXiv] [Supp] [Video]

  • We introduce SplatDiff, a pixel-splatting-guided video diffusion model for synthesizing novel views with consistent geometry and high-fidelity texture from a single image.
  • SplatDiff excels in single-view novel view synthesis, sparse-view novel view synthesis, and stereo video conversion, demonstrating remarkable cross-domain and cross-task performance.
NeurIPS 2024

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Xiang Zhang, Bingxin Ke, Hayko Riemenschneider, Nando Metzger, Anton Obukhov, Markus Gross, Konrad Schindler, Christopher Schroers

[Website] [arXiv] [Poster]

  • We propose BetterDepth to boost zero-shot MDE methods with plug-and-play diffusion refiners, achieving robust affine-invariant MDE performance with fine-grained details.
  • We design global pre-alignment and local patch masking strategies to enable learning detail refinement from small-scale synthetic datasets while preserving rich prior knowledge from pre-trained MDE models for zero-shot transfer.
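The global pre-alignment step builds on the standard trick for affine-invariant depth: a least-squares scale-and-shift fit between prediction and reference. A minimal sketch of that alignment (my own illustration, not BetterDepth's exact procedure):

```python
import numpy as np

def align_scale_shift(pred, target):
    """Least-squares scale s and shift t so that s * pred + t ~= target.

    This is the usual alignment step for affine-invariant monocular depth,
    solved in closed form via the normal equations.
    """
    pred = pred.ravel().astype(np.float64)
    target = target.ravel().astype(np.float64)
    A = np.stack([pred, np.ones_like(pred)], axis=1)  # (N, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return s * pred + t, s, t

# Example: a prediction that differs from the reference by an affine map
gt = np.linspace(1.0, 10.0, 100)
pred = 0.5 * gt - 2.0                 # arbitrary scale/shift ambiguity
aligned, s, t = align_scale_shift(pred, gt)
print(np.allclose(aligned, gt))       # True: affine ambiguity removed
```

After this coarse alignment, only the residual local detail remains for the refiner to learn, which is what makes training on small synthetic datasets feasible.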
ECCV 2024

HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

Xiang Zhang, Yulun Zhang, Fisher Yu

ECCV 2024 Oral Presentation

[Code] [Supp] [Video]

  • We propose a simple yet effective strategy (HiT-SR) to convert popular transformer-based SR methods to our hierarchical transformers, boosting SR performance by exploiting multi-scale features and long-range dependencies.
  • We design a spatial-channel correlation method to efficiently leverage spatial and channel features with computational complexity linear in window size, enabling the use of large hierarchical windows, e.g., 64×64 windows.
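The complexity argument can be illustrated with a rough multiply-add count (my own sketch, not HiT-SR's actual layers): for N = w² tokens with C channels per window, spatial self-attention builds an N×N attention map (quadratic in window area), whereas a channel-wise correlation builds a C×C map whose cost grows only linearly with N.

```python
def spatial_attention_flops(n_tokens, channels):
    # N x N attention map: O(N^2 * C) multiply-adds (QK^T plus weighting V)
    return 2 * n_tokens * n_tokens * channels

def channel_correlation_flops(n_tokens, channels):
    # C x C correlation map: O(N * C^2), linear in the number of tokens
    return 2 * channels * channels * n_tokens

C = 64
for w in (8, 16, 64):                # window side length
    N = w * w                        # tokens per window
    print(w, spatial_attention_flops(N, C) / channel_correlation_flops(N, C))
# The cost ratio N / C grows with window size, so large windows
# (e.g. 64x64) only stay affordable with the channel-wise formulation.
```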
ICCV 2023

Generalizing Event-Based Motion Deblurring in Real-World Scenarios

Xiang Zhang, Lei Yu, Wen Yang, Jianzhuang Liu, Gui-Song Xia

[Code] [Dataset] [YouTube]

  • A scale-aware network is designed to allow flexible setups of input spatial resolutions and enable learning from different temporal scales of motion blur.
  • A self-supervised learning framework is proposed for model training with real-world data and performance generalization in spatial and temporal domains.
  • A multi-scale real-world blurry dataset (MS-RBD) is constructed to facilitate the evaluation of deblurring performance in real-world scenarios.
TPAMI 2022

Learning to See Through with Events

Lei Yu, Xiang Zhang, Wei Liao, Wen Yang, Gui-Song Xia

[Code] [Dataset] [Bilibili]

  • We provide further analysis of the E-SAI framework, including additional details on the components of triggered events and the corresponding epipolar geometry.
  • We design a spatial transformer network to automatically refocus the events collected by a moving event camera with fronto-parallel uniform motion, relaxing the dependence on prior information such as camera velocity and target depth.
CVPR 2022

Unifying Motion Deblurring and Frame Interpolation with Events

Xiang Zhang, Lei Yu

[Code] [YouTube]

  • We present a unified framework for event-based video deblurring and interpolation (EVDI) that generates arbitrarily high frame-rate sharp videos from blurry inputs.
  • By utilizing the constraints between cross-modal frames and events, a fully self-supervised learning method is proposed to enable network training with real-world data without requiring ground-truth images.
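One classic cross-modal constraint of this kind is the event double-integral relation: a blurry frame is the temporal average of the latent sharp frames, and events encode the log-intensity changes that connect them. A toy single-pixel sketch of this consistency (my illustration of the general relation, not the EVDI loss itself):

```python
import numpy as np

# Latent log-intensity trajectory of one pixel over the exposure window
t = np.linspace(0.0, 1.0, 1000)
log_L = 0.5 * np.sin(2 * np.pi * t)       # ground-truth latent signal
L = np.exp(log_L)

# Idealized "events": log-intensity increments between samples
# (a real sensor quantizes these by a contrast threshold)
events = np.diff(log_L)

# Any latent frame = the first frame warped by the accumulated events
L_rec = L[0] * np.exp(np.concatenate([[0.0], np.cumsum(events)]))

# Blur consistency: the blurry frame is the temporal mean of the latents,
# so events plus one latent frame pin down the whole sharp sequence
blur = L.mean()
print(np.allclose(L_rec, L))              # True
print(np.isclose(L_rec.mean(), blur))     # True
```

Because both directions of this relation can be checked against real blurry frames and events, no ground-truth sharp images are needed for supervision.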
TSP 2022

Spiking Sparse Recovery with Non-convex Penalties

Xiang Zhang, Lei Yu, Gang Zheng, Yonina C. Eldar

[Paper]

  • We present an adaptive sparse spiking recovery (A-SSR) algorithm to solve a class of non-convex regularized SR problems with spiking neural networks.
  • When implemented on the neuromorphic Loihi chip, our A-SSR can solve sparse recovery problems with approximately 1% of the power consumption of the fast iterative shrinkage-thresholding algorithm (FISTA).
CVPR 2021

Event-based Synthetic Aperture Imaging with a Hybrid Network

Xiang Zhang*, Wei Liao*, Lei Yu, Wen Yang, Gui-Song Xia

CVPR 2021 Best Paper Candidate and Oral Presentation

[Code] [Dataset] [YouTube]

  • An event-based synthetic aperture imaging (E-SAI) algorithm is proposed to see through dense occlusions even under extreme lighting conditions.
  • A hybrid network composed of a spiking encoder and a convolutional decoder is designed to mitigate the disturbances from occlusions and guarantee the overall reconstruction quality.

* denotes equal contribution.