Tiny Quadrotor Learns to Fly in 18 Seconds

Drones, Reinforcement learning, Robotics, Simulation

It’s kind of astonishing how quadrotors have scaled over the past decade. Like, we’re now at the point where they’re verging on disposable, at least from a commercial or research perspective—for a bit over US $200, you can buy a little 27-gram, completely open-source drone, and all you have to do is teach it to fly. That’s where things do get a bit more challenging, though, because teaching drones to fly is not a straightforward process. Thanks to good simulation and techniques like reinforcement learning, it’s much easier to imbue drones with autonomy than it used to be. But it’s not typically a fast process, and it can be finicky to make a smooth transition from simulation to reality.

New York University’s Agile Robotics and Perception Lab has managed to streamline the process of getting basic autonomy to work on drones, and streamline it by a lot: The lab’s system is able to train a drone in simulation from nothing up to stable and controllable flying in 18 seconds flat on a MacBook Pro. And it actually takes longer to compile and flash the firmware onto the drone itself than it does for the entire training process.


So not only is the drone able to keep a stable hover while rejecting pokes and nudges and wind, but it’s also able to fly specific trajectories. Not bad for 18 seconds, right?

One of the things that typically slows down training is the need to keep refining exactly what you're training for, without refining it so much that your system learns to fly only in your specific simulation rather than in the real world. The strategy used here is what the researchers call a curriculum (you can also think of it as a sort of lesson plan) that adjusts the reward function used to train the system through reinforcement learning. The curriculum starts off forgiving and gradually increases the penalties to emphasize robustness and reliability. This is all about efficiency: Doing the training you need, in the way it needs to be done, to get the results you want, and no more.
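The curriculum idea can be sketched as a reward whose penalty weights ramp up as training goes on. This is a minimal sketch under our own assumptions: the three penalty terms (position, velocity, motor command), the linear ramp, the coefficient values, and names such as `PenaltyWeights` and `weights_at` are hypothetical illustrations, not the lab's actual reward function or schedule.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PenaltyWeights:
    # Hypothetical penalty terms chosen for illustration; the lab's actual
    # reward terms and coefficients may differ.
    position: float  # penalty on squared distance from the hover target
    velocity: float  # penalty on squared linear velocity
    action: float    # penalty on squared motor commands


# Illustrative endpoints: lenient at the start of training, strict at the end.
EARLY = PenaltyWeights(position=0.1, velocity=0.0, action=0.0)
LATE = PenaltyWeights(position=1.0, velocity=0.5, action=0.1)


def weights_at(step: int, ramp_steps: int) -> PenaltyWeights:
    """Linearly ramp from EARLY to LATE over the first `ramp_steps` training steps."""
    progress = min(max(step / ramp_steps, 0.0), 1.0)

    def lerp(a: float, b: float) -> float:
        return a + (b - a) * progress

    return PenaltyWeights(
        position=lerp(EARLY.position, LATE.position),
        velocity=lerp(EARLY.velocity, LATE.velocity),
        action=lerp(EARLY.action, LATE.action),
    )


def squared_norm(v: Sequence[float]) -> float:
    return sum(x * x for x in v)


def reward(
    position_error: Sequence[float],
    velocity: Sequence[float],
    action: Sequence[float],
    w: PenaltyWeights,
    alive_bonus: float = 1.0,
) -> float:
    """Constant bonus for staying airborne, minus the currently weighted penalties."""
    return (
        alive_bonus
        - w.position * squared_norm(position_error)
        - w.velocity * squared_norm(velocity)
        - w.action * squared_norm(action)
    )
```

Scoring the same wobbly state (0.5 m off target, drifting at 1 m/s, all four motors at half command) with `weights_at(0, 1000)` and then with `weights_at(1000, 1000)` gives about 0.975 and then about 0.15: behavior that earns nearly full reward early in training is penalized heavily once the curriculum tightens.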

There are other, more straightforward tricks that optimize this technique for speed as well. The deep-reinforcement-learning algorithms are particularly efficient and take advantage of the hardware acceleration that comes with Apple's M-series processors. The efficiency of the simulator multiplies the benefits of the reinforcement-learning pipeline's curriculum-driven sample efficiency, leading to that wicked-fast training time.
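The "multiplies" claim is back-of-the-envelope arithmetic: wall-clock training time is roughly the number of simulated steps the learner needs times the average cost of each step (simulation plus its share of network updates). The symbols below are our own shorthand, not notation from the paper, and the relation assumes the two savings are independent.

```latex
T_{\text{train}} \approx N_{\text{steps}} \cdot t_{\text{step}}
\quad\Longrightarrow\quad
\frac{T_{\text{old}}}{T_{\text{new}}}
  = \frac{N_{\text{old}}}{N_{\text{new}}} \cdot \frac{t_{\text{old}}}{t_{\text{new}}}
```

For example, a curriculum that halves the number of steps needed, combined with a simulator that runs each step twice as fast, would cut training time by a factor of four rather than two.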

This approach isn’t limited to simple tiny drones—it’ll work on pretty much any drone, including bigger and more expensive ones, or even a drone that you yourself build from scratch.
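One way to picture "pretty much any drone" is a simulator that treats the vehicle's physical constants as parameters, so retargeting training to a different airframe mostly means swapping a parameter set. The sketch below is a deliberately tiny, hypothetical example (only mass and level vertical flight; names such as `QuadrotorParams` and `vertical_step` are invented for illustration), not the lab's simulator. The 27-gram mass comes from the article; the 1.5 kg airframe is made up.

```python
from dataclasses import dataclass
from typing import Tuple

GRAVITY = 9.81  # m/s^2


@dataclass(frozen=True)
class QuadrotorParams:
    # Hypothetical, deliberately minimal parameter set. A usable simulator
    # would also need inertia, arm geometry, motor dynamics, thrust curves, etc.
    name: str
    mass_kg: float


def hover_thrust_per_motor(p: QuadrotorParams) -> float:
    """Thrust (N) each of the four motors must supply to hold a level hover."""
    return p.mass_kg * GRAVITY / 4


def vertical_step(
    p: QuadrotorParams, z: float, vz: float, thrust_per_motor: float, dt: float
) -> Tuple[float, float]:
    """One explicit-Euler step of level vertical flight (no drag, no tilt)."""
    az = 4 * thrust_per_motor / p.mass_kg - GRAVITY
    return z + vz * dt, vz + az * dt


if __name__ == "__main__":
    # Same code, two different vehicles: only the parameter set changes.
    for params in (
        QuadrotorParams("27-gram quadrotor", 0.027),
        QuadrotorParams("hypothetical 1.5 kg airframe", 1.5),
    ):
        thrust = hover_thrust_per_motor(params)
        z, vz = vertical_step(params, 1.0, 0.0, thrust, 0.01)
        print(f"{params.name}: {thrust:.4f} N per motor to hover, vz after one step = {vz:.2e} m/s")
```

Swapping in the larger airframe raises the required hover thrust from about 0.066 N to about 3.68 N per motor, while the code that consumes these functions stays the same; that separation is what would let one training pipeline target a different vehicle.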

Jonas Eschmann

We're told that it took minutes rather than seconds to train a policy for the drone in the video above, although the researchers expect that 18 seconds is achievable even for a more complex drone like this in the near future. And it's all open source, so you can, in fact, build a drone and teach it to fly with this system. But if you wait a little bit, it's only going to get better: The researchers tell us that they're working on integrating with the PX4 open-source drone autopilot. Longer term, the idea is to have a single policy that can adapt to different environmental conditions, as well as different vehicle configurations, meaning that this could work on all kinds of flying robots rather than just quadrotors.

Everything you need to run this yourself is available on GitHub, and the paper is on arXiv.