We introduce a representation learning framework for spatial trajectories. We represent partial observations of trajectories as probability distributions in a learned latent space, which characterize the uncertainty about unobserved parts of the trajectory. Our framework allows us to obtain samples from a trajectory for any continuous point in time—both interpolating and extrapolating. Our flexible approach supports directly modifying specific attributes of a trajectory, such as its pace, as well as combining different partial observations into single representations. Experiments show our method’s advantage over baselines in prediction tasks.
Given the past segment of a trajectory (shown as an RGB video with a human skeleton overlaid), our model predicts the future of the trajectory (the skeleton-only part of the video). Only a few of the frames shown are actually given to the model as input; we display more to provide context.
Given the past and future segments of a trajectory (shown as an RGB video with a human skeleton overlaid), our model predicts the segment in between, shown as a skeleton without RGB. Each example plays in three parts: an RGB clip, then the model's prediction, then a second RGB clip. The two RGB clips correspond to the model's input, although the model receives only the skeleton keypoints drawn over the RGB frames. Only a few of the frames shown are actually given to the model as input; we display more to provide context.
Our model represents trajectories as distributions. It is therefore capable of predicting multiple different (and plausible) futures given a past. Next, we show some examples where we sample three different trajectories given a past clip.
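As a rough illustration of how several futures could be drawn from one past clip, the sketch below assumes the clip has already been encoded as a diagonal Normal distribution in the latent space. The names `sample_latents`, `mean`, and `log_var` are hypothetical, and the encoder and decoder are not shown; each sampled latent point would then be passed to a decoder to obtain one candidate trajectory.

```python
import numpy as np

def sample_latents(mean, log_var, n_samples=3, seed=0):
    """Draw latent points from a diagonal Normal N(mean, diag(exp(log_var))).

    Assumption: the segment representation is a diagonal Gaussian; this is
    an illustrative sketch, not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    std = np.exp(0.5 * log_var)                       # standard deviation per dimension
    eps = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + eps * std                           # shape: (n_samples, latent_dim)

# Three latent samples for one (made-up) past-clip representation.
latents = sample_latents(np.zeros(8), np.zeros(8), n_samples=3)
```

Because every sample comes from the same distribution, the decoded futures would share the observed past while differing in the parts the observation leaves uncertain.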
Moving along specific directions in the latent space results in progressively faster trajectories, or in trajectories with an increasing temporal offset with respect to the original one.
We show the decoded trajectories that result from sampling a point in the latent space and then moving it along the "speed direction". For each example we show five clips that go from slow (left) to fast (right).
We show the decoded trajectories that result from sampling a point in the latent space and then moving it along the "temporal offset direction". For each example we show five clips that go from starting early (left) to starting later (right).
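The latent traversal described above can be sketched as a simple linear shift of a latent point along a fixed direction. This is a minimal sketch under assumptions: `traverse_latent`, `max_shift`, and the random `speed_direction` are hypothetical placeholders, since the page does not say how the directions are obtained or how far the points are moved.

```python
import numpy as np

def traverse_latent(z, direction, steps=5, max_shift=2.0):
    """Return `steps` latent points spaced evenly along `direction`, centered on z."""
    direction = direction / np.linalg.norm(direction)   # use a unit-length direction
    shifts = np.linspace(-max_shift, max_shift, steps)  # e.g. slow ... fast
    return np.stack([z + s * direction for s in shifts])

rng = np.random.default_rng(0)
z = rng.normal(size=8)                  # a sampled latent point
speed_direction = rng.normal(size=8)    # placeholder for a learned direction
points = traverse_latent(z, speed_direction)   # shape: (5, 8)
```

Decoding each of the five points would give one clip per column in the examples, with the middle point corresponding to the unmodified sample.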
When observing a segment of a trajectory, one may have some uncertainty about the rest of the trajectory. For instance, a segment showing a person jumping may correspond to a trajectory that continues with the person falling, or to a trajectory that proceeds with them doing a backflip and landing on their feet, but it will not belong to a trajectory of a person swimming. Therefore, we represent the segment as a distribution over trajectories, which allows us to represent the likelihood of a trajectory given the segment. During training, the goal is to learn this mapping from the input space (segments of trajectories) to the latent space (distributions over trajectories).
Concretely, given two segments that have been obtained from the same underlying trajectory, we want some trajectory to exist such that its likelihood under the distributions representing each of the segments is high. To encourage this, we train the model to maximize the overlap between these distributions. Similarly, we minimize the overlap between (the distribution representations of) segments sampled from different trajectories, under the assumption that no trajectory exists that contains both segments.
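One way to make "overlap between distributions" concrete is the Bhattacharyya coefficient, which has a closed form for diagonal Gaussians. The sketch below uses it purely as an illustrative choice; the page does not state which overlap measure the model uses, and the function name and arguments are hypothetical.

```python
import numpy as np

def gaussian_overlap(mean_a, var_a, mean_b, var_b):
    """Bhattacharyya coefficient between two diagonal Gaussians, in (0, 1].

    Assumption: this is one possible overlap measure, not necessarily the
    one used in the paper.
    """
    var_avg = 0.5 * (var_a + var_b)
    # Distance between the means, scaled by the averaged variances.
    mean_term = 0.125 * np.sum((mean_a - mean_b) ** 2 / var_avg)
    # Penalty for mismatched variances (zero when var_a == var_b).
    var_term = 0.5 * np.sum(np.log(var_avg) - 0.5 * (np.log(var_a) + np.log(var_b)))
    return float(np.exp(-(mean_term + var_term)))

ones = np.ones(4)
same = gaussian_overlap(np.zeros(4), ones, np.zeros(4), ones)       # identical distributions
far = gaussian_overlap(np.zeros(4), ones, np.full(4, 5.0), ones)    # distant means
```

Under this measure, two segments from the same trajectory would be pushed toward a value near 1, and segments from different trajectories toward a value near 0.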
We train the model with a triplet loss, which requires positives and negatives among the different segments. These are defined by the relationship between the segments: segments that could belong to the same trajectory are positives, and segments that cannot belong to the same trajectory are negatives.
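A generic margin-based triplet loss over similarity scores can be sketched as follows. This is an assumption-laden illustration: the margin value, the function name, and the use of an overlap score as the similarity are hypothetical, and the exact formulation used for training is not given on this page.

```python
import numpy as np

def triplet_loss(sim_pos, sim_neg, margin=0.2):
    """Hinge-style triplet loss on similarities.

    sim_pos: similarity between each anchor segment and its positive.
    sim_neg: similarity between each anchor segment and its negative.
    The loss is zero once the positive beats the negative by `margin`.
    """
    return float(np.mean(np.maximum(0.0, margin - sim_pos + sim_neg)))

# Positive already much more similar than the negative: no penalty.
easy = triplet_loss(np.array([0.9]), np.array([0.1]))
# Negative more similar than the positive: large penalty.
hard = triplet_loss(np.array([0.1]), np.array([0.9]))
```

Here the similarity would be whatever overlap measure is defined between the distributions representing two segments.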
In practice, we represent the distributions Q (which represent segments, and are distributions over the trajectories these segments could have been sampled from) as either Normal distributions or box embeddings.
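For the box-embedding case, a natural notion of overlap is the volume of the intersection of two axis-aligned boxes. The sketch below computes a hard intersection volume, assuming each box is parameterized by its minimum and maximum corners; this parameterization and the function name are assumptions for illustration only.

```python
import numpy as np

def box_intersection_volume(min_a, max_a, min_b, max_b):
    """Volume of the intersection of two axis-aligned boxes (0 if disjoint)."""
    lower = np.maximum(min_a, min_b)                   # intersection's min corner
    upper = np.minimum(max_a, max_b)                   # intersection's max corner
    side_lengths = np.clip(upper - lower, 0.0, None)   # negative side means no overlap
    return float(np.prod(side_lengths))

overlapping = box_intersection_volume(
    np.array([0.0, 0.0]), np.array([2.0, 2.0]),
    np.array([1.0, 1.0]), np.array([3.0, 3.0]),
)
disjoint = box_intersection_volume(
    np.array([0.0, 0.0]), np.array([1.0, 1.0]),
    np.array([2.0, 2.0]), np.array([3.0, 3.0]),
)
```

A hard intersection like this gives no gradient signal for disjoint boxes, so box-embedding methods often use a smoothed volume during training; whether and how that is done here is not stated on this page.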
@inproceedings{suris2022trajectories,
title={Representing Spatial Trajectories as Distributions},
author={Sur\'is, D\'idac and Vondrick, Carl},
year={2022},
booktitle={Advances in Neural Information Processing Systems 35 (NeurIPS)},
}
We thank Arjun Mani and Mia Chiquier for helpful feedback. This research is based on work partially supported by the NSF NRI Award #2132519 and the DARPA MCS program. DS is supported by the Microsoft PhD Fellowship.