Reading Time
3 Minutes
Daniil Osokin
Developer at OpenCV.ai
Revolutionizing Object Tracking: "Tracking Everything Everywhere All at Once" Paper Review

Tech track #1. "Tracking Everything Everywhere All at Once" review

The "Tracking Everything Everywhere All at Once" paper, a collaborative work by Cornell University, Google Research, and UC Berkeley, offers a breakthrough solution to the problem of tracking any point in video footage. The method maps each pixel from every frame into a common 3D space, tracing its trajectory across time. This method is not designed for real-time tracking but for in-depth recorded video analysis. In this article, we highlight the most exciting points of this paper.
June 15, 2023

Introduction

Last week Cornell University, Google Research, and UC Berkeley published a paper with the intriguing title "Tracking Everything Everywhere All at Once". It proposes a solution to the track-any-point problem. As the name hints, it tracks everything everywhere - meaning pixels across all frames, even if they are occluded. See the beautiful visualizations from the paper’s OmniMotion website:

Object trajectory tracking

So, if you select a specific pixel, you can find its coordinates in all previous frames, as well as in all subsequent ones. That is fantastic! However, we believe (while no code is available yet) this pixel should be a good feature to track, e.g. a corner, rather than a point from a single-color or low-textured area.

Two things to note:

First, this algorithm is intended to work on the whole video all at once. It runs an optimization process given all video frames, so it is not designed for real-time tracking. However, it might be useful for analyzing surveillance footage or for sports analytics.

Second, it relies on an external algorithm for supervision of the track optimization process: the authors compute RAFT optical flow for all frame pairs before the optimization starts.
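To make the pairwise-supervision step concrete, here is a minimal sketch of computing flow for every ordered frame pair. The `flow_fn` callback stands in for RAFT (which the paper actually uses); the `dummy_flow` function below is purely an illustrative placeholder, not part of the paper.

```python
import numpy as np

def pairwise_flow(frames, flow_fn):
    """Compute dense optical flow for every ordered frame pair (i, j), i != j.

    `flow_fn(a, b)` returns an (H, W, 2) flow field mapping pixels of frame
    `a` into frame `b`. In the paper this would be RAFT; here it is pluggable.
    """
    flows = {}
    for i, a in enumerate(frames):
        for j, b in enumerate(frames):
            if i != j:
                flows[(i, j)] = flow_fn(a, b)
    return flows

# Toy stand-in: a constant one-pixel shift instead of a learned flow network.
def dummy_flow(a, b):
    h, w = a.shape[:2]
    return np.ones((h, w, 2), dtype=np.float32)

frames = [np.zeros((4, 4), dtype=np.float32) for _ in range(3)]
flows = pairwise_flow(frames, dummy_flow)  # 3 frames -> 6 ordered pairs
```

Note that the number of pairs grows quadratically with video length, which is another reason this approach targets offline analysis rather than real-time use.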

Idea

The authors propose to map each object pixel from every frame to a single common 3D space (the canonical 3D volume G in the paper). Thus one point in this space corresponds to a 3D trajectory across time. They use two algorithms to create this 3D space:

A NeRF to model volume density and color prediction.

Invertible neural networks to capture the camera parameters and scene motion across frames.
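The core trick can be sketched with a toy stand-in: each frame gets an invertible map into the canonical volume G, so a point lifted in frame i can be carried to frame j by composing one map with the inverse of another. The affine bijection below replaces the paper's invertible neural networks, and all numbers are illustrative.

```python
import numpy as np

class AffineBijection:
    """Toy invertible map standing in for the paper's per-frame invertible
    network T_i: local frame-i 3D coordinates <-> canonical volume G."""

    def __init__(self, A, b):
        self.A = np.asarray(A, dtype=float)
        self.b = np.asarray(b, dtype=float)
        self.A_inv = np.linalg.inv(self.A)

    def to_canonical(self, x):    # u = A x + b
        return self.A @ x + self.b

    def from_canonical(self, u):  # x = A^{-1} (u - b)
        return self.A_inv @ (u - self.b)

def transfer_point(x_i, T_i, T_j):
    """Map a lifted 3D point from frame i to frame j through canonical space."""
    return T_j.from_canonical(T_i.to_canonical(x_i))

# Two frames with made-up per-frame maps.
T_0 = AffineBijection(np.eye(3), [0.0, 0.0, 0.0])
T_1 = AffineBijection(2.0 * np.eye(3), [1.0, 0.0, 0.0])

x0 = np.array([1.0, 2.0, 3.0])
x1 = transfer_point(x0, T_0, T_1)
```

Because the maps are bijections, the same point in G indexes the full trajectory: evaluating `transfer_point` against every frame's map recovers the point's location at every time step.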

The algorithm overview is shown in the picture below:

Two algorithms to create the 3D space

The loss function combines a flow loss, a photometric loss, and a penalty on large displacements between consecutive 3D point locations, which ensures temporal smoothness of the 3D motion. OmniMotion compares tracking results on the target TAP-Vid benchmark and shows an impressive improvement.

Object tracking remains one of the hard computer vision problems. This paper significantly advances the state of the art with complex but elegant algorithms. It is a great research paper, and we are looking forward to trying it live!

Paper: https://arxiv.org/pdf/2306.05422.pdf
