Applications in Monocular Computer Vision using Geometry and Learning: Map Merging, 3D Reconstruction and Detection of Geometric Primitives

Research output: Doctoral Thesis (compilation)

As the dream of autonomous vehicles moving around in our world comes closer, solving the problem of robust localization and mapping becomes essential. In this inherently structured and geometric problem, we also want the agents to learn from experience in a data-driven fashion. How modern Neural Network models can be combined with Structure from Motion (SfM) is an interesting research question, and this thesis studies related problems in 3D reconstruction, feature detection, SfM and map merging.

In Paper I we study how a Bayesian Neural Network (BNN) performs in Semantic Scene Completion, where the task is to predict a semantic 3D voxel grid for the Field of View of a single RGBD image. We propose an extended task and evaluate the benefits of the BNN when encountering new classes at inference time. It is shown that the BNN outperforms the deterministic baseline.
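The abstract does not say how the BNN's uncertainty is used. As a generic illustration only, the sketch below assumes that T sampled class-probability vectors for one voxel are already available (for example from T weight samples of the BNN). It computes the predictive distribution, its entropy (total uncertainty) and the mutual information between prediction and weights, a common measure of epistemic uncertainty that is often used as a signal for inputs unlike the training data, such as unseen classes. The function name `predictive_stats` is hypothetical.

```python
import math


def predictive_stats(samples):
    """Given T sampled class-probability vectors for one voxel, return the
    mean prediction, its entropy (total uncertainty) and the mutual
    information (epistemic part: total entropy minus expected per-sample entropy)."""
    t = len(samples)
    k = len(samples[0])
    mean = [sum(s[c] for s in samples) / t for c in range(k)]

    def entropy(p):
        # Skip zero entries, since p * log(p) -> 0 as p -> 0.
        return -sum(x * math.log(x) for x in p if x > 0.0)

    total = entropy(mean)
    expected = sum(entropy(s) for s in samples) / t
    return mean, total, total - expected


# Samples that agree on an uncertain prediction: high entropy, no disagreement.
_, h_agree, mi_agree = predictive_stats([[0.5, 0.5], [0.5, 0.5]])
# Confident samples that disagree: same entropy, but high mutual information.
_, h_disagree, mi_disagree = predictive_stats([[1.0, 0.0], [0.0, 1.0]])
print(round(h_agree, 4), round(mi_agree, 4))
print(round(h_disagree, 4), round(mi_disagree, 4))
```

Both cases have the same predictive entropy; only the mutual information separates disagreement between weight samples from inherent ambiguity.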

Papers II-III are about detecting the points, lines and planes that define a Room Layout in an RGB image. Because indoor surfaces often have repeated textures and homogeneous colours, point features alone are not ideal for Structure from Motion. The idea is to complement the point features by detecting a Wireframe (a connected set of line segments) that marks the intersections of planes in the Room Layout. Paper II addresses a task for detecting a Semantic Room Wireframe and implements a Neural Network model that uses a Graph Convolutional Network module. The experiments show that the method is more flexible than previous Room Layout Estimation methods and performs better than previous Wireframe Parsing methods. Paper III brings the task closer to Room Layout Estimation by detecting a connected set of semantic polygons in an RGB image. The end-to-end trainable model combines a Wireframe Parsing model with a Heterogeneous Graph Neural Network. We show promising results, outperforming state-of-the-art Room Layout Estimation models when using synthetic Wireframe detections. However, the joint Wireframe and Polygon detector requires further research to compete with the state-of-the-art models.
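To make the graph view of a Wireframe concrete, the sketch below stores junctions as nodes and line segments as edges, then applies one mean-aggregation graph convolution. This is a simplified variant of a GCN layer, not the architecture of Papers II-III. The junction coordinates, toy features and weight matrix are hypothetical; in a trained model the node features would come from an image backbone and the weights would be learned.

```python
# Hypothetical wireframe: four junctions (image coordinates) joined by four
# line segments, each segment given as a pair of junction indices.
junctions = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
segments = [(0, 1), (1, 2), (2, 3), (3, 0)]


def adjacency(num_nodes, edges):
    """Neighbour sets with self-loops, so each node also keeps its own features."""
    adj = [{i} for i in range(num_nodes)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def graph_conv(features, edges, weight):
    """One mean-aggregation graph convolution: average each node's features
    over itself and its neighbours, apply the linear map `weight`
    (one row per output dimension), then a ReLU."""
    adj = adjacency(len(features), edges)
    dim = len(features[0])
    out = []
    for nbrs in adj:
        agg = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        out.append([max(0.0, sum(w * a for w, a in zip(row, agg))) for row in weight])
    return out


# Use the coordinates as toy node features and an identity weight matrix,
# so each output is simply the junction's neighbourhood average.
features = [list(p) for p in junctions]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = graph_conv(features, segments, identity)
print(hidden)
```

Stacking such layers lets information flow along connected line segments, which is the intuition behind using graph networks on a connected Wireframe.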

In Paper IV we propose minimal solvers for SfM with parallel cylinders. The problem can be reduced to estimating circles in 2D, and the paper contributes theory for the two-view relative motion and two-circle relative structure problems. Fast solvers are derived, and experiments show good performance both in simulation and on real data.
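The reduction to 2D circles rests on a simple geometric fact. Points on a circular cylinder, projected onto a plane orthogonal to its axis, land on a circle, and parallel cylinders share that projection plane. The sketch below illustrates only this observation, assuming that the axis direction and three 3D surface points are known. It is not the solvers from Paper IV, and all function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def project_to_axis_plane(points, axis):
    """Express 3D points in an orthonormal 2D basis (u, v) of the plane
    orthogonal to the common cylinder axis."""
    d = normalize(axis)
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = normalize(cross(d, helper))
    v = cross(d, u)  # unit length because d and u are orthonormal
    return [(dot(p, u), dot(p, v)) for p in points]


def circle_from_three_points(a, b, c):
    """Minimal circle fit: the circumcircle of three non-collinear 2D points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    den = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(den) < 1e-12:
        raise ValueError("points are collinear")
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / den
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / den
    return (ux, uy), math.hypot(ax - ux, ay - uy)


# Three points on a cylinder of radius 2 whose axis is parallel to z and
# passes through (1, 1); their heights along the axis are arbitrary.
pts3d = [(3.0, 1.0, 0.0), (1.0, 3.0, 5.0), (-1.0, 1.0, -2.0)]
pts2d = project_to_axis_plane(pts3d, (0.0, 0.0, 1.0))
center, radius = circle_from_three_points(*pts2d)
print(center, radius)
```

Three points is the minimal number of 2D points that determines a circle, which is why this small closed-form fit is a natural building block for minimal-solver style reasoning.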

Papers V-VII cover the task of map merging: given a set of individually optimized point clouds with camera poses from an SfM pipeline, how can the solutions be merged effectively without re-solving the entire Structure from Motion problem? Papers V-VI introduce an effective merging method and demonstrate it through experiments on real and simulated data. Paper VII considers the matching problem for point clouds and proposes minimal solvers that allow for deformation of each point cloud. Experiments show that the method robustly matches point clouds with drift in the SfM solution.
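The abstract does not detail the merging method of Papers V-VI. For orientation only, the sketch below shows the classical baseline step of bringing two maps into a common frame: a closed-form least-squares similarity transform (scale, rotation, translation) estimated from corresponding points, the 2D analogue of Umeyama/Horn alignment. It is simplified to 2D (points as complex numbers) to stay dependency-free, assumes the correspondences are already given, and is rigid up to scale, unlike the deformation-aware solvers of Paper VII. The function name `align_similarity_2d` and the landmark data are hypothetical.

```python
import cmath


def align_similarity_2d(src, dst):
    """Least-squares similarity transform with dst ~= a * src + t, where the
    complex factor a = s * exp(i * theta) encodes scale s and rotation theta.
    Points are complex numbers x + iy; src[k] corresponds to dst[k]."""
    n = len(src)
    mean_src = sum(src) / n
    mean_dst = sum(dst) / n
    src_c = [p - mean_src for p in src]
    dst_c = [q - mean_dst for q in dst]
    denom = sum(abs(p) ** 2 for p in src_c)
    if denom == 0:
        raise ValueError("source points must not all coincide")
    # Closed-form minimiser of sum |a * p - q|^2 over the centred points.
    a = sum(p.conjugate() * q for p, q in zip(src_c, dst_c)) / denom
    t = mean_dst - a * mean_src
    return a, t


# Landmarks in map A, and the same landmarks expressed in map B's frame:
# scaled by 2, rotated 90 degrees and shifted by (3, 1).
map_a = [0 + 0j, 1 + 0j, 0 + 1j, 2 + 3j]
true_a, true_t = 2 * cmath.exp(1j * cmath.pi / 2), 3 + 1j
map_b = [true_a * p + true_t for p in map_a]

a, t = align_similarity_2d(map_a, map_b)
print(abs(a), cmath.phase(a), t)
```

With noise-free correspondences the true transform is recovered up to rounding. A single global transform cannot absorb drift inside each map, which is the limitation that deformation-aware matching is meant to address.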
Original language: English
Awarding Institution
  • Mathematics (Faculty of Engineering)
  • Åström, Kalle, Supervisor
  • Flood, Gabrielle, Assistant supervisor
  • Heyden, Anders, Assistant supervisor
Award date: 2023 Jun 2
Place of Publication: Lund
ISBN (Print): 978-91-8039-643-1
ISBN (electronic): 978-91-8039-644-8
Publication status: Published - 2023

Bibliographical note

Defence details
Date: 2023-06-02
Time: 13:15
Place: Lecture Hall Hörmander, Centre of Mathematical Sciences, Sölvegatan 18, Faculty of Engineering LTH, Lund University, Lund.
External reviewer(s)
Name: Maki, Atsuto
Title: Prof.
Affiliation: KTH Royal Institute of Technology, Sweden.

Subject classification (UKÄ)

  • Computer Vision and Robotics (Autonomous Systems)
