InstantSfM: Fully Sparse and Parallel Structure-from-Motion

¹University of Southern California, ²University at Buffalo, ³Tsinghua University
* Equal Contribution

TLDR: InstantSfM is a fully sparse and parallel Structure-from-Motion pipeline that leverages GPU acceleration to achieve up to 40× speedup over traditional methods like COLMAP while maintaining or improving reconstruction accuracy across diverse datasets.

Abstract 📄

Structure-from-Motion (SfM), a method that recovers camera poses and scene geometry from uncalibrated images, is a central component in robotic reconstruction and simulation. Despite the state-of-the-art performance of traditional SfM methods such as COLMAP and its follow-up work, GLOMAP, naïve CPU-specialized implementations of bundle adjustment (BA) or global positioning (GP) introduce significant computational overhead when handling large-scale scenarios.

In this paper, we unleash the full potential of GPU parallel computation to accelerate each critical stage of the standard SfM pipeline. Building upon recent advances in sparse-aware bundle adjustment optimization, our design extends these techniques to accelerate both BA and GP within a unified global SfM framework.

Through extensive experiments on datasets of varying scales (e.g., 5000 images, where VGGSfM and VGGT run out of memory), our method demonstrates up to ∼40× speedup over COLMAP while achieving consistently comparable or even improved reconstruction accuracy.

Method Overview 🛠️

InstantSfM introduces a comprehensive PyTorch-based SfM pipeline that leverages GPU acceleration for both Bundle Adjustment and Global Positioning optimization stages.
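
Bundle adjustment is a nonlinear least-squares problem, and solvers for it typically iterate damped Gauss-Newton (Levenberg-Marquardt) steps. The sketch below runs such a loop on a toy curve-fitting problem in NumPy. It is a generic illustration of the kind of optimization step involved, not InstantSfM's PyTorch implementation; the function names and the toy model are hypothetical.

```python
import numpy as np

def levenberg_marquardt(residual_fn, jacobian_fn, x0, damping=1e-3, iters=50):
    """Minimize 0.5 * ||r(x)||^2 with a basic Levenberg-Marquardt loop."""
    x = np.asarray(x0, dtype=float)
    cost = 0.5 * np.sum(residual_fn(x) ** 2)
    for _ in range(iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        # Damped normal equations: (J^T J + lambda * I) delta = -J^T r
        H = J.T @ J + damping * np.eye(x.size)
        delta = np.linalg.solve(H, -J.T @ r)
        new_cost = 0.5 * np.sum(residual_fn(x + delta) ** 2)
        if new_cost < cost:
            # Step reduced the cost: accept it and trust the model more.
            x, cost = x + delta, new_cost
            damping *= 0.5
        else:
            # Step failed: keep x and damp more strongly.
            damping *= 10.0
    return x

# Toy problem (assumed for illustration): fit y = a * exp(b * t).
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)

def residual(x):
    a, b = x
    return a * np.exp(b * t) - y

def jacobian(x):
    a, b = x
    e = np.exp(b * t)
    # Columns: d r / d a, d r / d b
    return np.stack([e, a * t * e], axis=1)

x_opt = levenberg_marquardt(residual, jacobian, x0=[1.0, 0.0])
```

In a real SfM problem the linear solve dominates the cost, which is why exploiting the structure of J matters at scale.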

Sparse Jacobian Structure

Sparse-Aware Optimization 🧠

The key insight is that the Jacobian matrix in SfM optimization is highly sparse. We implement efficient sparse matrix operations using cuSPARSE to dramatically reduce both memory usage and computational cost.
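
To see where the sparsity comes from in bundle adjustment: each reprojection residual depends only on the one camera and the one 3D point involved in that observation, so every row of the Jacobian has nonzeros in just two parameter blocks. The sketch below builds that nonzero pattern for a toy scene. The 6-parameter camera block, 3-parameter point block, and function name are assumptions for illustration, and the dense NumPy mask only visualizes the pattern; it does not use cuSPARSE.

```python
import numpy as np

def ba_jacobian_pattern(observations, n_cams, n_pts,
                        cam_dim=6, pt_dim=3, res_dim=2):
    """Boolean nonzero pattern of a bundle-adjustment Jacobian.

    observations: list of (camera_index, point_index) pairs, one per
    2D measurement. Columns are ordered cameras first, then points.
    """
    n_rows = res_dim * len(observations)
    n_cols = cam_dim * n_cams + pt_dim * n_pts
    pattern = np.zeros((n_rows, n_cols), dtype=bool)
    for k, (c, p) in enumerate(observations):
        rows = slice(res_dim * k, res_dim * (k + 1))
        # Block for the observing camera.
        pattern[rows, cam_dim * c: cam_dim * (c + 1)] = True
        # Block for the observed point.
        start = cam_dim * n_cams + pt_dim * p
        pattern[rows, start: start + pt_dim] = True
    return pattern

# Toy scene: 4 cameras, 10 points, each point seen by 2 cameras.
obs = [(p % 4, p) for p in range(10)] + [((p + 1) % 4, p) for p in range(10)]
pattern = ba_jacobian_pattern(obs, n_cams=4, n_pts=10)
density = pattern.mean()  # fraction of entries that can be nonzero
```

Every row has exactly cam_dim + pt_dim candidate nonzeros regardless of scene size, so the density falls as (cam_dim + pt_dim) / n_cols when cameras and points are added. Storing and multiplying only those blocks is what makes sparse formats pay off.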

Performance Comparison ⚡

Runtime Comparison

Scalability 📊

InstantSfM demonstrates superior scalability compared to COLMAP and GLOMAP, with speedups becoming more pronounced as the number of images increases. Our method can handle thousands of images efficiently on a single GPU.

Key Features 🌟

⚡ GPU Acceleration

Complete SfM pipeline implemented in PyTorch with CUDA acceleration, enabling efficient parallel processing of large-scale datasets.

🧠 Depth Prior Integration

Support for incorporating ground truth depth priors to achieve metric-scale reconstruction, crucial for robotics applications.
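
A reconstruction from images alone is defined only up to a global scale, which is why metric depth helps. The source does not describe how InstantSfM uses depth priors inside its optimization; the sketch below only illustrates the underlying idea with a closed-form least-squares scale that aligns up-to-scale SfM depths to metric prior depths. The function name and the sample values are hypothetical.

```python
import numpy as np

def metric_scale(sfm_depths, prior_depths):
    """Least-squares scale s minimizing sum((s * sfm - prior)^2).

    Entries with non-positive or non-finite depth are ignored.
    """
    sfm = np.asarray(sfm_depths, dtype=float)
    prior = np.asarray(prior_depths, dtype=float)
    valid = np.isfinite(sfm) & np.isfinite(prior) & (sfm > 0) & (prior > 0)
    sfm, prior = sfm[valid], prior[valid]
    # Closed form: s = <sfm, prior> / <sfm, sfm>
    return float(np.dot(sfm, prior) / np.dot(sfm, sfm))

# Assumed sample data: SfM depths in arbitrary units, priors in metres.
sfm_depths = np.array([1.0, 2.0, 4.0, 0.5])
prior_depths = 2.5 * sfm_depths
scale = metric_scale(sfm_depths, prior_depths)
```

Multiplying camera translations and 3D points by this scale would bring the reconstruction into metric units under the stated assumptions.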

🔗 Easy Integration

User-friendly PyTorch implementation that enables seamless integration with other machine learning frameworks and custom optimization routines.

BibTeX

@article{zhong2025instantsfm,
  title={InstantSfM: Fully Sparse and Parallel Structure-from-Motion},
  author={Zhong, Jiankun and Zhan, Zitong and Gao, Quankai and Chen, Ziyu and Lou, Haozhe and Mao, Jiageng and Neumann, Ulrich and Wang, Yue},
  journal={arXiv preprint},
  year={2025}
}