Visual Localization and Mapping
Background
Precise localization technology plays a crucial role in many applications, including augmented reality and autonomous systems. Common approaches rely on dedicated hardware such as expensive LiDAR sensors, or on GPS, WiFi, or Bluetooth signals. However, these sensors become less effective in GPS-denied settings such as indoor environments, and their performance degrades under adverse weather conditions. Inexpensive visual sensors offer an attractive alternative; nevertheless, robust visual localization from a single image remains challenging.
To address this challenge, we propose methods for robust visual localization that replace the entire traditional pipeline, or individual components of it, with trainable deep learning models. This approach aims to overcome the limitations of current visual localization techniques by leveraging deep learning to improve accuracy and robustness in challenging environments.
Proposed Methods
FeatLoc
FeatLoc is a method for indoor 3D self-localization, published in the ISPRS Journal of Photogrammetry and Remote Sensing in 2022. It performs 3D self-localization in indoor environments from 2D camera data, regressing the camera pose directly from sparse image features, and it has been shown to achieve higher accuracy and performance than existing methods.
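To make the idea concrete, here is a minimal PyTorch sketch of this kind of pose regressor: a per-feature encoder embeds each keypoint together with its descriptor, an order-invariant pooling aggregates them, and two heads output a translation and a unit quaternion. The layer sizes, the pooling choice, and all names are illustrative assumptions, not the published FeatLoc architecture.

```python
import torch
import torch.nn as nn

class FeatLocSketch(nn.Module):
    """Illustrative absolute pose regressor over sparse 2D features."""
    def __init__(self, desc_dim=256, hidden=512):
        super().__init__()
        # Per-feature encoder: keypoint (x, y) + descriptor -> embedding
        self.encoder = nn.Sequential(
            nn.Linear(desc_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Pose heads: 3-DoF translation and rotation as a quaternion
        self.fc_t = nn.Linear(hidden, 3)
        self.fc_q = nn.Linear(hidden, 4)

    def forward(self, kpts, descs):
        # kpts: (B, N, 2) normalized keypoints; descs: (B, N, D) descriptors
        x = self.encoder(torch.cat([kpts, descs], dim=-1))  # (B, N, H)
        x = x.max(dim=1).values       # order-invariant global pooling
        q = self.fc_q(x)
        return self.fc_t(x), q / q.norm(dim=-1, keepdim=True)

model = FeatLocSketch()
t, q = model(torch.rand(1, 500, 2), torch.rand(1, 500, 256))
print(t.shape, q.shape)  # torch.Size([1, 3]) torch.Size([1, 4])
```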
D2S
Previous state-of-the-art self-localization methods can incur significant inference and storage costs due to their complex procedures. In response, D2S generates 3D scene coordinates from sparse descriptors extracted from a single RGB image using a simple network, and it demonstrates superior performance compared to state-of-the-art CNN-based methods.
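The pipeline this describes can be sketched as follows: a small MLP regresses a 3D scene coordinate from each sparse descriptor, and the resulting 2D-3D correspondences feed a standard PnP + RANSAC solver (here OpenCV's) to recover the camera pose. The network shape, intrinsics, and random inputs are assumptions for illustration, not the actual D2S model.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

class D2SSketch(nn.Module):
    """Illustrative mapping from sparse descriptors to 3D scene coordinates."""
    def __init__(self, desc_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(desc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),      # one (X, Y, Z) per descriptor
        )

    def forward(self, descs):          # descs: (N, D)
        return self.mlp(descs)         # (N, 3) scene coordinates

net = D2SSketch()
kpts_2d = (np.random.rand(100, 2) * [640, 480]).astype(np.float32)
coords_3d = net(torch.rand(100, 256)).detach().numpy()
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])

# With trained predictions this recovers the camera pose; on random
# inputs, as here, RANSAC will typically find no consistent pose.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(coords_3d, kpts_2d, K, None)
print(ok)
```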
PL2Map
In recent years, integrating point and line features in images has been expected to improve the accuracy of visual localization and mapping. However, extending localization frameworks in this way can increase the memory and computational resources required for matching. PL2Map is a technique in which lightweight neural networks learn to represent both 3D point and line features, achieving excellent pose accuracy by leveraging the multiple learned mappings.
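A hedged sketch of such a joint representation is shown below: a shared encoder processes point and line descriptors, and two small heads regress a 3D point per point feature and a pair of 3D endpoints per line feature. The two-head layout and all dimensions are assumptions for illustration, not the published PL2Map design.

```python
import torch
import torch.nn as nn

class PL2MapSketch(nn.Module):
    """Illustrative joint regression of 3D points and 3D line segments."""
    def __init__(self, desc_dim=256, hidden=512):
        super().__init__()
        # Shared encoder for point descriptors and line descriptors
        self.shared = nn.Sequential(
            nn.Linear(desc_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.point_head = nn.Linear(hidden, 3)  # 3D point (X, Y, Z)
        self.line_head = nn.Linear(hidden, 6)   # two 3D segment endpoints

    def forward(self, point_descs, line_descs):
        pts3d = self.point_head(self.shared(point_descs))  # (Np, 3)
        lines3d = self.line_head(self.shared(line_descs))  # (Nl, 6)
        return pts3d, lines3d

net = PL2MapSketch()
pts3d, lines3d = net(torch.rand(300, 256), torch.rand(50, 256))
print(pts3d.shape, lines3d.shape)  # (300, 3) (50, 6)
```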
D2S+NeRF
Neural Radiance Fields (NeRF), published in 2020, is a technique that uses neural networks to reconstruct 3D scenes from multiple photographs. Because deep learning models depend on large amounts of data, localization performance degrades when training samples are limited. To address this challenge, D2S+NeRF introduces a pipeline that synthesizes keypoint descriptors using NeRF, effectively mitigating the data-scarcity problem.
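The augmentation loop behind this idea can be sketched as follows, assuming a trained NeRF that renders RGB and depth from novel poses and a feature extractor for the rendered images; both are represented here by hypothetical stubs (returning random arrays so the sketch runs), not real NeRF or D2S code. Each rendered view yields descriptors, and the rendered depth lifts their keypoints to 3D scene coordinates, producing extra (descriptor, coordinate) training pairs.

```python
import numpy as np

# Hypothetical stubs standing in for a trained NeRF renderer and a learned
# feature extractor (e.g. SuperPoint); real implementations replace these.
def render_nerf_view(pose):
    """Render an RGB image and a depth map from a novel camera pose (stub)."""
    return np.random.rand(480, 640, 3), np.random.rand(480, 640) + 1.0

def extract_features(image):
    """Detect 2D keypoints and descriptors in the rendered image (stub)."""
    n = 200
    return np.random.rand(n, 2) * [640, 480], np.random.rand(n, 256)

def backproject(kpts, depth, pose, K):
    """Lift 2D keypoints to 3D scene coordinates via the rendered depth."""
    u, v = kpts[:, 0].astype(int), kpts[:, 1].astype(int)
    z = depth[v, u]                                    # per-keypoint depth
    rays = np.linalg.inv(K) @ np.vstack([kpts.T, np.ones(len(kpts))])
    cam_pts = rays * z                                 # camera-frame points
    R, t = pose[:3, :3], pose[:3, 3]
    return (R @ cam_pts).T + t                         # world-frame points

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
poses = [np.eye(4)] * 5   # novel poses; sampled near training views in practice
augmented = []
for pose in poses:
    rgb, depth = render_nerf_view(pose)
    kpts, descs = extract_features(rgb)
    coords = backproject(kpts, depth, pose, K)
    augmented.append((descs, coords))  # extra (descriptor, 3D) training pairs
print(len(augmented), augmented[0][0].shape, augmented[0][1].shape)
```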