
Robotics Internship at Fudan University MAGIC Lab

Learning and experiencing Simultaneous Localization And Mapping (SLAM), perception, planning, and control

Prior to Start: Familiarizing Ourselves with ROS

Before officially beginning the internship, we used online resources and textbooks to self-study the fundamentals of the Robot Operating System (ROS), a commonly used robotics software framework with powerful open-source development tools and algorithms. We also installed the relevant configurations and packages.

For our first stage of learning, we used Ubuntu 20.04 and the corresponding ROS Noetic version to run our simulations and algorithms.
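As a quick sanity check that such a setup works, a minimal node can be written and run. Below is a minimal sketch of a ROS Noetic publisher in Python; the node and topic names are illustrative, not from our internship code.

#!/usr/bin/env python3
# Minimal ROS Noetic publisher: a sanity check that the installation works.
# Node name, topic name, and rate are illustrative placeholders.
import rospy
from std_msgs.msg import String

def main():
    rospy.init_node("hello_ros")
    pub = rospy.Publisher("/chatter", String, queue_size=10)
    rate = rospy.Rate(1)  # publish at 1 Hz
    while not rospy.is_shutdown():
        pub.publish(String(data="hello from ROS Noetic"))
        rate.sleep()

if __name__ == "__main__":
    main()

Running this node and watching the messages arrive with rostopic echo /chatter confirms that the ROS master, Python bindings, and message passing are all wired up correctly.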

Week 1: Studying SLAM and HKUST ELEC5660 Lectures

In the first week, we systematically studied visual Simultaneous Localization and Mapping (SLAM), covering chapters 1-7 and 9.1 of the textbook 视觉SLAM十四讲 (14 Lectures on Visual SLAM), which spans the whole of visual SLAM, from mathematics and algorithms to hardware and software. In parallel, Lectures 5-10 of the Hong Kong University of Science and Technology's ELEC5660: Introduction to Aerial Robots gave lively examples and concrete illustrations of SLAM topics.

SLAM is important because it allows a robot to understand where it is and what surrounds it, letting it map out a space, which is crucial for planning and executing tasks on its own. Localization means the robot can use its onboard sensors, such as cameras for images or odometers for wheel revolutions, to estimate how far it has moved. Mapping means it can use the information it acquires about its surroundings, either visually (i.e., from cameras) or from LiDAR sensors, to build a map.
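As a concrete illustration of the localization half, wheel-odometer readings can be integrated into a pose estimate by dead reckoning. The sketch below assumes a differential-drive robot; the function name, wheel base, and encoder readings are all illustrative.

import numpy as np

def integrate_odometry(pose, d_left, d_right, wheel_base):
    """Differential-drive dead reckoning: update the pose (x, y, theta)
    from the distances travelled by the left and right wheels."""
    x, y, theta = pose
    d_center = (d_left + d_right) / 2.0        # distance of the midpoint
    d_theta = (d_right - d_left) / wheel_base  # change of heading
    # Advance along the average heading over this step.
    x += d_center * np.cos(theta + d_theta / 2.0)
    y += d_center * np.sin(theta + d_theta / 2.0)
    return (x, y, theta + d_theta)

# Example: one encoder step moves the robot forward and turns it slightly.
pose = (0.0, 0.0, 0.0)
pose = integrate_odometry(pose, d_left=0.10, d_right=0.12, wheel_base=0.5)

Dead reckoning like this drifts over time, which is exactly why SLAM fuses odometry with exteroceptive sensing (cameras, LiDAR) to correct the estimate.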

The study outcomes were enriching. A few important topics covered include:

  • Various types of cameras commonly seen in SLAM (monocular, binocular/stereo, and depth cameras) and how they work

  • How feature points can be extracted

  • Odometers and why they are important

  • Perspective-n-Point (PnP) problems

  • What OpenCV is and its capabilities

  • Kalman Filters, as well as Extended Kalman Filters (EKF)

Week 2: Problem Solving PnP Localization

A more project-based study of the PnP algorithm was done during the second week. The project originated from HKUST ELEC5660's Project 2 Phase 1, which requires a hand-written PnP implementation that estimates the pose of the subject given a pre-recorded dataset. The code was implemented and annotated (Figure 1), and a successful output in RViz was obtained (Figure 2).

Figure 1. Required Code and its Annotation of the Algorithm
Figure 2. Output of the Pose Estimation in RViz; Blue is the Given PnP, Red is the Written PnP
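For reference, the same kind of pose estimate can be reproduced with OpenCV's built-in solver (the project itself required writing the solver by hand). A minimal sketch; the tag corners, pixel detections, and camera intrinsics below are made-up placeholder values.

import numpy as np
import cv2

# Hypothetical inputs: 3D corners of a planar tag in the world frame,
# their detected 2D pixel locations, and the camera intrinsic matrix K.
object_points = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0], [0.0, 0.1, 0.0]], dtype=np.float64)
image_points = np.array([[320, 240], [380, 242],
                         [378, 300], [318, 298]], dtype=np.float64)
K = np.array([[600, 0, 320],
              [0, 600, 240],
              [0,   0,   1]], dtype=np.float64)

# solvePnP returns the rotation (as a Rodrigues vector) and translation
# that map world points into the camera frame.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)

# The camera pose in the world frame is the inverse transform.
R_wc = R.T
t_wc = -R.T @ tvec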

Week 3: Problem Solving EKF Fusion

In the third week, we studied the Extended Kalman Filter project from HKUST ELEC5660 Project 2 Phase 2. The Kalman Filter is a powerful algorithm for reducing noise, and the EKF overcomes the Kalman Filter's inability to handle nonlinear problems, which is the case in this project. As shown in Figure 3, the EKF-filtered pose estimate (green arrows) appears much smoother than the unfiltered pose estimates (blue and red).

Figure 3. Output of Pose Estimation; Green Represents EKF Outputs, Red and Blue the Unfiltered Outputs
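The structure of one EKF iteration is compact enough to sketch. The following is a generic predict/update step in the standard textbook formulation, not the project's exact state or measurement model.

import numpy as np

def ekf_step(x, P, u, z, f, h, F, H, Q, R):
    """One Extended Kalman Filter iteration.
    x, P : state estimate and its covariance
    u, z : control input and measurement
    f, h : nonlinear process and measurement models
    F, H : their Jacobians, evaluated at the current estimate
    Q, R : process and measurement noise covariances"""
    # Predict: propagate the state through the nonlinear model,
    # and the covariance through its linearization.
    x_pred = f(x, u)
    F_k = F(x, u)
    P_pred = F_k @ P @ F_k.T + Q
    # Update: correct the prediction with the measurement.
    H_k = H(x_pred)
    y = z - h(x_pred)                      # innovation
    S = H_k @ P_pred @ H_k.T + R           # innovation covariance
    K = P_pred @ H_k.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H_k) @ P_pred
    return x_new, P_new

The only difference from the linear Kalman Filter is that the nonlinear models f and h are used to propagate the state, while their Jacobians F and H stand in for the linear system matrices when propagating the covariance.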

Week 4: Visual SLAM Hands On

During the fourth week, another internship member and I configured the environment for a cart robot and applied ORB-SLAM3 mapping (using an RGB-D camera) to the office space.

We first studied relevant papers and documentation about ORB-SLAM3 to learn its principles and its pros and cons, which we later found borne out when conducting real-life mapping with it. ORB-SLAM3 is a robust SLAM algorithm with strong tracking ability that can fuse multiple sensors to enhance its mapping and localization capabilities: more on that at the end of this part.

ORB-SLAM3

Paper from ORB-SLAM3's authors


A few notable features of ORB-SLAM3 include its ability to quickly build a new map after it loses track of the previous one, which happens often and is a prime cause of mapping failure. With this new-map mechanism, ORB-SLAM3 can keep track of the entire mapping session and detect loops, connecting newly created maps to previous ones. Meanwhile, its ability to fuse multiple sensors allows it to record at high speeds with good accuracy.

Platform

Base: AgileX Hunter SE

Development Platform: Nvidia Jetson Orin

Camera: Intel Realsense D435

LiDAR: Ouster OS1-32-U

Software: Ubuntu 20.04 & ROS Noetic

Figure 4. The Platform Used
Mapping Trials

Trial 1 (07/06/2023, 14:30, ample sunlight)

Mapping Conducted in a Rectangular Office-Building Hallway

Lap 1


It can be seen that ORB-SLAM3 lost track of the map at one of the corners, which triggered its new-map mechanism, but it was still able to identify the loop with the newly created map. This lap was run at low speed, using the AgileX Hunter SE's remote control.

Lap 2


On the second lap, tracking was lost again at the same corner, resulting in a second new map being created, but the overall map had a better outline. This lap was run at low speed.

Lap 3


Tracking was lost again, but the map appears complete overall. This lap was run at low speed. Below is a video of the cart mapping the third lap.

Trial 2 (07/06/2023, 16:30, adequate sunlight)

Mapping Conducted in a Rectangular Office-Building Hallway

Lap 1


Perhaps due to more ideal sunlight conditions, the first lap of the second trial had great results even at higher speeds, with no loss of tracking and a full loop detected.

Lap 2


At lower speeds, the second lap completed the map substantially better.

ORB-SLAM3 Experience Summary

ORB-SLAM3 ran well and produced maps of acceptable quality with the Intel Realsense D435's depth camera, showing its pros and cons in a real-life scenario. It was able to keep up with high-velocity mapping, and its new-map mechanism helped detect loops and complete the whole map. However, we did see some noise at the center of the hallway, and it could lose track of the map under bright sunlight and on smooth surfaces. Figure 5 shows the corner where we saw ORB-SLAM3 lose track many times: a smooth metal surface with perhaps too few feature points.

Figure 5. The Corner Where ORB-SLAM3 Often Lost Track; the Red Circle Shows the Smooth Metal Surface

Week 5: Path Planning Hands On

In the fifth week of learning, we first covered the foundational concepts of planning, including algorithms such as Dijkstra, A*, and RRT, and the overall planning framework. We decided to use ROS's built-in move_base for planning. However, the first challenge to address was map generation: move_base primarily uses 2D grid maps, while the default map format generated by ORB-SLAM3 is a point cloud. We therefore needed a method to convert the point cloud map into a 2D grid map, whether by real-time conversion during recording or by post-processing of recorded point cloud .bag files.
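Conceptually, the conversion amounts to keeping the points in an obstacle-height band and projecting them down onto a plane. Below is a minimal numpy sketch of that idea (not the tool we ultimately used); the resolution and height thresholds are illustrative.

import numpy as np

def cloud_to_grid(points, resolution=0.05, z_min=0.1, z_max=1.0):
    """Project a point cloud (N x 3 array) onto a 2D occupancy grid.
    Points whose height falls inside [z_min, z_max] are treated as
    obstacles; resolution is the cell size in meters. All thresholds
    here are illustrative placeholders."""
    # Keep only points at obstacle height (drop floor and ceiling).
    mask = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    pts = points[mask]
    # Shift so indices start at zero, then discretize to grid cells.
    origin = pts[:, :2].min(axis=0)
    cells = np.floor((pts[:, :2] - origin) / resolution).astype(int)
    grid = np.zeros(cells.max(axis=0) + 1, dtype=np.int8)
    grid[cells[:, 0], cells[:, 1]] = 100  # 100 = occupied, as in nav_msgs/OccupancyGrid
    return grid, origin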

ORB-SLAM3 Grid Mapping with Monocular Camera

We first attempted a grid mapping modification of ORB-SLAM3 with the monocular camera on the D435. We referred to a monocular version available on GitHub, but the monocular SLAM results were unsatisfactory.

Gmapping + LaserScan Grid Mapping / map_server Map Transformation

We then attempted to use gmapping together with LaserScan data from the Ouster OS1 LiDAR for mapping. However, the high-performance Ouster OS1 exceeded gmapping's limit of 1440 points per frame, so we had to reduce the LiDAR's detection range. Even after reducing the range and point density enough to severely affect mapping quality, gmapping still failed to generate usable maps. During this process, we also encountered issues where the os_sensor frame provided by Ouster could not be connected to the tf transform tree (Figure 6): both /pointcloud_to_laserscan and /base_link_to_laser were connected to /tf and /scan, potentially causing the error.

image (33).png
Figure 6. The tf Node Graph Showing Error
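One common way to patch such a gap in the tf tree is to publish a static transform between the frames. Below is a sketch using tf2_ros; the frame names match our setup, but the mounting offset is an assumed placeholder rather than our measured value.

#!/usr/bin/env python3
# Publish a static transform between base_link and the Ouster's os_sensor
# frame, the kind of link the graph in Figure 6 was missing.
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

rospy.init_node("base_link_to_os_sensor")
broadcaster = tf2_ros.StaticTransformBroadcaster()

t = TransformStamped()
t.header.stamp = rospy.Time.now()
t.header.frame_id = "base_link"
t.child_frame_id = "os_sensor"
t.transform.translation.x = 0.0   # LiDAR mounting offset (assumed)
t.transform.translation.z = 0.3   # e.g. mounted 0.3 m above the base (assumed)
t.transform.rotation.w = 1.0      # identity rotation

broadcaster.sendTransform(t)
rospy.spin()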

We also attempted to use map_server for post-processing, trying to convert the previously recorded ORB-SLAM3 .bag files into grid maps, but these efforts did not achieve the desired conversion quality.

Week 6: Path Planning & Control Hands On

Continuing the work from Week 5, we explored effective grid mapping methods and used move_base to let the cart use our map, locate itself on the map, and control itself to move.

Octomap & FAST-LIO Mapping

At the suggestion of MAGIC Lab student Wu Ke, we used Octomap as our mapping tool. Our first attempt failed due to the lack of odometry, as shown in Figure 7. This rounded map occurs because, without odometry, Octomap cannot know where the cart is or which direction it is facing, so every scan is registered at the same origin.

Figure 7. Failed Mapping without Odometry

Later, we managed to address the issue by using FAST-LIO for odometry. However, we then encountered another problem: excessive noise in the generated map (in Figure 8, the black areas represent obstacles, even though there were none in the seventh-floor corridor, shown in green). We improved the situation by fine-tuning the Ouster OS1 LiDAR's detection range, setting the maximum to 0.35 and the minimum to -0.2. During this process, a few minor noise points remained, which we eventually traced back to the operator's movements: FAST-LIO's handling of dynamic objects was being confused, and the issue was resolved once the operator stayed out of the LiDAR's field of view (as shown in Figure 9).

Figure 8. Map with Noise
Figure 9. Fixed, Successful Map
Figure 10. Launch File with Tuned LiDAR Range (Rows 15 and 16)
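The effect of the tuned range (rows 15 and 16 of the launch file in Figure 10) can be mimicked in a few lines. We read the 0.35 / -0.2 values as vertical bounds on the points kept for mapping; that interpretation, and the sketch below, are ours rather than part of the FAST-LIO or Octomap APIs.

import numpy as np

def crop_cloud(points, z_min=-0.2, z_max=0.35):
    """Drop points outside the tuned vertical band before they reach
    the mapping node. The bounds mirror the values in our launch file;
    treating them as z limits is our interpretation."""
    mask = (points[:, 2] >= z_min) & (points[:, 2] <= z_max)
    return points[mask]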

As a result of these efforts, we successfully recorded effective two-dimensional grid maps in PGM and YAML formats. The maps were significantly improved in quality, addressing the noise problems that were initially encountered.

move_base Planning

After successfully obtaining the map, move_base can be used as a global and local path planner to control the robot's movements. It dynamically updates the map based on LiDAR information, adjusts the motion, and performs obstacle avoidance. move_base requires five configuration files: costmap_common_params, global_costmap_params, global_planner_params, local_costmap_params, and teb_local_planner. These correspond to the global and local costmaps, which encode obstacle levels, and the planners that determine the path.

In the initial experiment, too large an inflation factor led to the phenomenon shown in Figure 12, where the robot became hesitant to move forward, so the inflation parameter had to be adjusted. The inflation parameter controls how much move_base inflates obstacles so that the robot keeps a safe margin; the smaller the inflation parameter, the smaller the obstacles appear to move_base. The video below demonstrates the robot's performance in RViz when move_base plans a path to a destination. Later in the video, however, issues with the inflation factor cause the robot to collide with a wall, necessitating human intervention.

Figure 11. tf Tree under move_base
Figure 12. Untuned Inflation Parameter Resulting in Overinflated Obstacles

On the last day we planned to adjust parameters, including the inflation parameter, but unfortunately the on-cart Nvidia Orin burned out and was no longer operational. Still, the final outcome is that the cart is able to locate itself on the pre-recorded map, receive a location marked on the map by the operator, and drive itself to that location, as shown in the video (the remote control appears in the video to prove the cart is indeed driving itself and not being controlled by a human).
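Programmatically, that final behavior corresponds to sending move_base a goal pose, the same thing RViz's "2D Nav Goal" tool does when the operator marks a point. A minimal sketch; the goal coordinates are placeholders.

#!/usr/bin/env python3
# Send a navigation goal to move_base, the programmatic equivalent of
# marking a destination in RViz. Goal coordinates are illustrative.
import rospy
import actionlib
from move_base_msgs.msg import MoveBaseAction, MoveBaseGoal

rospy.init_node("send_goal")
client = actionlib.SimpleActionClient("move_base", MoveBaseAction)
client.wait_for_server()

goal = MoveBaseGoal()
goal.target_pose.header.frame_id = "map"   # goal expressed in the map frame
goal.target_pose.header.stamp = rospy.Time.now()
goal.target_pose.pose.position.x = 2.0
goal.target_pose.pose.position.y = 1.0
goal.target_pose.pose.orientation.w = 1.0  # face along +x

client.send_goal(goal)
client.wait_for_result()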
