Obtaining a good baseline between video frames is one of the key elements of vision-based monocular SLAM systems. However, if the video frames contain only a few 2D feature correspondences with a good baseline, or if the camera only rotates without sufficient translation at the beginning, tracking and mapping become unstable. We introduce a real-time visual SLAM system that incrementally tracks individual 2D features and estimates camera pose from matched 2D features, regardless of the length of the baseline. Triangulation of 2D features into 3D points is deferred until keyframes with a sufficient baseline for those features are available. Our method can also handle purely rotational motion, and it fuses both types of measurements (2D features and triangulated 3D points) in a bundle adjustment step. We also introduce adaptive keyframe-selection criteria for efficient optimization and for handling multiple maps. We demonstrate that our SLAM system improves camera pose estimates and robustness, even under purely rotational motion.
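The deferred-triangulation idea can be illustrated with a minimal sketch. It assumes that each observation of a tracked feature is stored as a camera centre plus a unit bearing ray, both in world coordinates (so camera rotation is already compensated). It also assumes that "sufficient baseline" is tested as the widest parallax angle between any two rays against a hypothetical 5° threshold, and that triangulation uses the simple midpoint method. The names `FeatureTrack`, `MIN_PARALLAX_DEG`, and `midpoint_triangulate` are hypothetical and are not taken from the paper, whose actual criterion and triangulation method may differ.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]

# Hypothetical threshold (an assumption): minimum parallax in degrees
# before a 2D-only track is promoted to a 3D point.
MIN_PARALLAX_DEG = 5.0


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def unit(a: Vec) -> Vec:
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def parallax_deg(d1: Vec, d2: Vec) -> float:
    """Angle between two unit bearing rays, in degrees (dot product clamped for acos)."""
    return math.degrees(math.acos(max(-1.0, min(1.0, dot(d1, d2)))))


def midpoint_triangulate(c1: Vec, d1: Vec, c2: Vec, d2: Vec) -> Vec:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2."""
    w0 = sub(c1, c2)
    b, d, e = dot(d1, d2), dot(d1, w0), dot(d2, w0)
    denom = 1.0 - b * b  # d1, d2 are unit vectors; > 0 unless the rays are parallel
    s = (b * e - d) / denom
    t = (e - b * d) / denom
    return scale(add(add(c1, scale(d1, s)), add(c2, scale(d2, t))), 0.5)


@dataclass
class FeatureTrack:
    """A tracked 2D feature that stays 2D-only until its baseline is sufficient."""
    # Each observation: (camera centre, unit bearing ray), both in world coordinates.
    observations: List[Tuple[Vec, Vec]] = field(default_factory=list)
    point3d: Optional[Vec] = None

    def add_observation(self, centre: Vec, ray: Vec) -> None:
        self.observations.append((centre, unit(ray)))
        if self.point3d is None:
            self.try_triangulate()

    def try_triangulate(self) -> bool:
        # Find the pair of observations with the widest parallax angle.
        best_angle, best_pair = 0.0, None
        obs = self.observations
        for i in range(len(obs)):
            for j in range(i + 1, len(obs)):
                angle = parallax_deg(obs[i][1], obs[j][1])
                if angle > best_angle:
                    best_angle, best_pair = angle, (i, j)
        if best_pair is None or best_angle < MIN_PARALLAX_DEG:
            return False  # e.g. pure rotation: defer, keep as a 2D feature
        (c1, d1), (c2, d2) = obs[best_pair[0]], obs[best_pair[1]]
        self.point3d = midpoint_triangulate(c1, d1, c2, d2)
        return True
```

When the camera only rotates, every world-frame ray to the feature coincides, so the parallax stays near zero and the track remains a 2D measurement rather than being triangulated from an ill-conditioned pair. The track is triangulated only once a later observation provides enough translation.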
This material is posted here with permission of the IEEE. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to firstname.lastname@example.org.