A Learning Vision System

Amir Bousani
Apr 15, 2022
Updated: May 31, 2022

RGo Robotics has recently unveiled Perception Engine, our innovative learning vision system technology. It delivers a revolutionary way of learning an environment and performing localization along with other, higher levels of perception.

To fully understand the benefits of this disruptive new technology, let’s look at where mobile robotic systems are today. Currently, most rely on 2D LiDAR laser scanners for localization. Perception Engine exceeds the capabilities of 2D LiDAR, creating true, 3D robotic perception.

When You Just See a Slice of the World, It’s Very Easy to Get Confused

2D LiDAR provides a very accurate two-dimensional scan of a single slice of the world. If the room boundaries (at the scanner's height) are known, the laser scanner information can be used to calculate where the robot is within the room. If you know your distance from all boundaries, there are very few options for where you can be. And if you know something about how you got there, that is usually enough to determine your exact position.
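The "very few options" idea can be illustrated with a toy sketch. It assumes an empty rectangular room of known size, a robot aligned with the walls, and only four beams (front, left, back, right). The function names and the room dimensions are hypothetical, chosen for illustration; a real scan matcher works with hundreds of beams and arbitrary maps.

```python
import math

def candidate_poses(ranges, width, height, tol=1e-6):
    """List the (x, y, heading_deg) poses consistent with four range readings.

    ranges: (front, left, back, right) distances in the robot frame.
    Assumes an empty width x height rectangular room and a wall-aligned robot.
    """
    candidates = []
    for k, heading in enumerate((0, 90, 180, 270)):
        # Re-express the robot-frame beams as world-frame beams
        # (east, north, west, south) for this heading hypothesis.
        world = [0.0] * 4
        for i, r in enumerate(ranges):
            world[(i + k) % 4] = r
        east, north, west, south = world
        # Opposite beams must add up to the room dimensions.
        if abs(east + west - width) < tol and abs(north + south - height) < tol:
            candidates.append((west, south, heading))
    return candidates

def pick_with_prior(candidates, previous_xy):
    """Use knowledge of where we were to choose among the few candidates."""
    return min(candidates, key=lambda c: math.dist(c[:2], previous_xy))

poses = candidate_poses((8, 5, 2, 1), width=10, height=6)
best = pick_with_prior(poses, previous_xy=(2.2, 1.1))
```

In this 10 x 6 room the readings leave two mirror-image candidates, and the previous position is enough to pick between them.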

An example of a 2D laser scanner view

The challenges typically occur during initial setup and when the robot can’t see the expected boundaries well enough.

The first challenge is installation time: you need a very accurate map before you can start. The map is derived from a dense, noisy point cloud that requires post-processing and, frequently, manual corrections to remove non-stationary items that the laser scanner saw during installation. For many systems, the map may take multiple days to generate and clean up, which ties up expensive field installation resources and customer staff and causes facility downtime.

A second challenge is working in unstructured and dynamic environments. If objects are moving around the scanner, they can confuse localization by creating false boundaries. Even objects that are static now (for example, a pallet) might be moved tomorrow and confuse the system. Open and unstructured environments can be even more challenging, as they may not provide enough natural features at the laser's height to allow for accurate position calculations.

Ramps and uneven surfaces are another classic challenge. If at any point the scanner is not parallel to the ground, it will see items at different heights. A tiny upward tilt can create a huge difference at long distances.
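To put a number on the tilt effect, here is a small sketch based on plain trigonometry. The function name and the example values (a 1 degree tilt, a 20 m range) are illustrative assumptions, not figures from a specific scanner.

```python
import math

def beam_height_offset(distance_m, tilt_deg):
    """Vertical offset of a tilted beam when it reaches a given horizontal distance."""
    return distance_m * math.tan(math.radians(tilt_deg))

# A 1 degree upward tilt, measured 20 m away from the scanner.
offset = beam_height_offset(20.0, 1.0)
print(f"{offset:.2f} m above the intended scan plane")
```

At 20 m, a 1 degree tilt already lifts the beam by roughly a third of a meter, enough to pass over a low obstacle or to hit a shelf that is not in the map.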

Identifying Key Points to Create More Robust Perception

Perception Engine allows us to see more widely and navigate in 3D space just like humans and most animals do. It is harder to implement, but so much more robust.

With our technology, we take pictures at different points along our path. We look at each frame and pick interesting points that we can track (which points, and what makes them so interesting, will be covered in a separate post). Then we look for the same point (a.k.a. “feature”) in the subsequent frame and measure how much it moved, expressed as angles along the X and Y axes.
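One common way to turn a pixel shift into angles is the pinhole camera model. The sketch below assumes that model; the intrinsics (fx, fy, cx, cy) and function names are hypothetical values for illustration, and the post does not say which camera model Perception Engine uses.

```python
import math

def pixel_to_angles(u, v, fx, fy, cx, cy):
    """Bearing of pixel (u, v) relative to the optical axis, in degrees."""
    angle_x = math.degrees(math.atan2(u - cx, fx))
    angle_y = math.degrees(math.atan2(v - cy, fy))
    return angle_x, angle_y

def feature_shift_angles(p_first, p_second, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Angular movement of one tracked feature between two frames.

    The default intrinsics are made-up values for a 640 x 480 image.
    """
    ax1, ay1 = pixel_to_angles(*p_first, fx, fy, cx, cy)
    ax2, ay2 = pixel_to_angles(*p_second, fx, fy, cx, cy)
    return ax2 - ax1, ay2 - ay1

# A feature that moves from (300, 240) to (340, 250) between frames.
shift_x, shift_y = feature_shift_angles((300, 240), (340, 250))
```

Working in angles rather than pixels makes the measurement independent of image resolution, which is convenient when comparing different cameras.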

By doing this with a few points, we can estimate how much we moved between these frames, and the relative distance to each point. Doing that over the full path will give us the estimated position at each point, and some 3D points we estimated along the way.
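The link between angular shift and distance can be sketched for a deliberately simplified case: the camera slides sideways without rotating, and the size of that step (the baseline) is known. Both are assumptions made for illustration; in practice the motion and the point distances are estimated together from many points, and with a single camera the result is only relative unless the scale is known from another source.

```python
import math

def relative_depth(angle_first_deg, angle_second_deg, baseline=1.0):
    """Depth of a point seen at two bearings from a camera that moved sideways.

    Angles are measured from the optical axis, in the direction of motion.
    With baseline=1.0 the result is in units of "camera steps" (relative depth).
    """
    t1 = math.tan(math.radians(angle_first_deg))
    t2 = math.tan(math.radians(angle_second_deg))
    parallax = t1 - t2
    if abs(parallax) < 1e-9:
        return math.inf  # no measurable shift: the point is effectively at infinity
    return baseline / parallax

near = relative_depth(10.0, 5.0)   # large shift between frames
far = relative_depth(10.0, 9.0)    # small shift between frames
```

Points that shift a lot between frames are close and points that barely move are far away, which is why a handful of tracked points is enough to recover rough 3D structure.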

It isn’t easy. The detection and tracking algorithms are complex, and making them both robust and highly accurate in varied environments requires a lot of experience. We need to overcome the estimation errors for each point, as well as incorrect matches in the detection and tracking algorithms. There are also points that do not move consistently with the other points, such as points on moving objects. These need to be marked as outliers and ignored. There are cases where almost everything the camera sees is an outlier.
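A minimal sketch of outlier marking is shown below. It uses a generic median-based test as a stand-in, since the post does not describe the method Perception Engine actually uses, and it assumes that most tracked points share roughly the same image motion. The function name and thresholds are hypothetical.

```python
import math
from statistics import median

def flag_outliers(flows, k=3.0, min_threshold_px=1.0):
    """Mark flow vectors that disagree with the dominant image motion.

    flows: list of (dx, dy) pixel displacements, one per tracked point.
    Returns a list of booleans, True where the point is treated as an outlier.
    """
    med_dx = median(dx for dx, _ in flows)
    med_dy = median(dy for _, dy in flows)
    # Distance of each vector from the median motion.
    residuals = [math.hypot(dx - med_dx, dy - med_dy) for dx, dy in flows]
    # Threshold scales with the typical residual, with a floor in pixels.
    threshold = max(k * median(residuals), min_threshold_px)
    return [r > threshold for r in residuals]

flows = [(5.0, 0.0), (5.1, 0.1), (4.9, -0.1), (5.0, 0.05), (-3.0, 7.0)]
mask = flag_outliers(flows)  # the last vector belongs to a moving object
```

A median-based test like this breaks down once more than half of the points are outliers, which is exactly the hard case mentioned above and one reason production systems need more elaborate approaches.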

Here is an example of how it works. These are two pictures taken one after the other by a moving camera:

The photo below provides an example of how points moved from one picture to the other over the same plane. The white dot is the original pixel in the first picture. The red dot is the matching pixel in the second frame. The green line between them shows the change vector.

This approach provides good short-term accuracy, with low frame-to-frame error. Still, over a long travel path these small errors accumulate, especially if the machine has not yet returned to its origin (recognizing such a return is known as “loop closure”).
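The way small per-frame errors add up can be seen in a toy simulation. It assumes a robot driving in a straight line whose heading estimate is off by a tiny constant amount at every frame; the step size and bias are made-up values, and real drift is noisier and less regular.

```python
import math

def dead_reckoning_error(num_frames, step_m=0.1, heading_bias_deg=0.01):
    """Position error after chaining frame-to-frame estimates with a small heading bias."""
    x = y = 0.0
    heading = 0.0
    for _ in range(num_frames):
        heading += math.radians(heading_bias_deg)  # tiny error added at each frame
        x += step_m * math.cos(heading)
        y += step_m * math.sin(heading)
    true_x = num_frames * step_m  # ground truth: straight line along the x axis
    return math.hypot(x - true_x, y)

for frames in (10, 100, 1000, 10000):
    print(frames, "frames:", round(dead_reckoning_error(frames), 4), "m")
```

Each individual step is almost perfect, yet the error keeps growing with distance traveled because nothing ever pulls the estimate back toward the truth.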

At RGo Robotics, we have developed a unique additional layer of algorithms that solves this problem. We use visual learning to remember key points and structures in the environment around us. We generate a unique descriptor for each of them, and constantly compare everything we see to the structures seen previously. Whenever there is a match, we can correct our pose estimate instead of letting the error accumulate. The number of matches also tells us how confident we can be in the estimate, giving high accuracy and confidence in every environment we have already seen.
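The general idea of remembering descriptors and using matches to correct the pose can be sketched as follows. This is a generic illustration and not RGo's algorithm: the descriptors are short made-up vectors, matching is a nearest-neighbor search with a distance cutoff, and all names (`match_descriptor`, `correct_pose`, the landmark ids) are hypothetical.

```python
import math

def match_descriptor(desc, memory, max_dist=0.5):
    """Return the id of the closest remembered descriptor, or None if nothing is close."""
    best_id, best_d = None, math.inf
    for landmark_id, (stored_desc, _position) in memory.items():
        d = math.dist(desc, stored_desc)
        if d < best_d:
            best_id, best_d = landmark_id, d
    return best_id if best_d <= max_dist else None

def correct_pose(estimated_xy, observations, memory, min_matches=2):
    """Replace a drifting position estimate when enough remembered landmarks are seen.

    observations: list of (descriptor, (dx, dy)), the landmark offset from the robot.
    Returns (position, number_of_matches); the match count acts as a confidence cue.
    """
    fixes = []
    for desc, (dx, dy) in observations:
        landmark_id = match_descriptor(desc, memory)
        if landmark_id is None:
            continue
        lx, ly = memory[landmark_id][1]
        fixes.append((lx - dx, ly - dy))  # robot position implied by this landmark
    if len(fixes) < min_matches:
        return estimated_xy, len(fixes)  # too few matches: keep the drifting estimate
    n = len(fixes)
    return (sum(f[0] for f in fixes) / n, sum(f[1] for f in fixes) / n), n

memory = {
    "door": ((1.0, 0.0, 0.0), (5.0, 2.0)),
    "shelf": ((0.0, 1.0, 0.0), (8.0, -1.0)),
}
observations = [
    ((0.9, 0.1, 0.0), (2.0, 1.0)),
    ((0.1, 0.9, 0.0), (5.0, -2.0)),
    ((0.0, 0.0, 1.0), (1.0, 1.0)),  # never seen before, so it is ignored
]
position, matches = correct_pose((3.4, 1.3), observations, memory)
```

Because the correction comes from remembered structures rather than from the previous frame, it does not inherit the accumulated drift.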

For any environment using mobile robots, Perception Engine represents a real breakthrough in visual pose estimation, utilizing the true value of seeing the world.
