Representing Spatial Trajectories as Distributions

Columbia University
NeurIPS 2022

Overview

Abstract

We introduce a representation learning framework for spatial trajectories. We represent partial observations of trajectories as probability distributions in a learned latent space, which characterize the uncertainty about unobserved parts of the trajectory. Our framework allows us to obtain samples from a trajectory for any continuous point in time—both interpolating and extrapolating. Our flexible approach supports directly modifying specific attributes of a trajectory, such as its pace, as well as combining different partial observations into single representations. Experiments show our method’s advantage over baselines in prediction tasks.

Model predictions


Future Video Prediction

Given the past segment of a trajectory (shown as an RGB video with a human skeleton overlaid), our model predicts the future of the trajectory (the skeleton-only part of the video). Only a few of the frames shown are actually input to the model; we display more to provide context.

Interpolation Between Clips

Given the past and future segments of a trajectory (shown as an RGB video with a human skeleton overlaid), our model predicts the segment in between, shown as a skeleton without RGB. In each example, you will first see an RGB video, followed by the model's prediction, followed by the second part of the RGB video. The two RGB parts correspond to the input to the model (the actual input is just the skeleton keypoints overlaid on the RGB). Only a few of the frames shown are actually input to the model; we display more to provide context.

Multi-mode prediction

Our model represents trajectories as distributions, so it can predict multiple different (and plausible) futures given a past. Below, we show examples where we sample three different trajectories given a past clip.
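In code, multi-mode prediction amounts to drawing several samples from the same segment distribution and decoding each one. The sketch below is illustrative: encoder and decoder are hypothetical callables (an encoder returning a torch.distributions object and a decoder mapping a latent trajectory and a time to a pose), not the paper's implementation.

import torch

def predict_futures(encoder, decoder, past_segment, future_times, n_samples=3):
    # `encoder` and `decoder` are hypothetical stand-ins for the learned
    # mappings described in the Method section below.
    Q = encoder(past_segment)                  # distribution over latent trajectories
    futures = []
    for _ in range(n_samples):
        z = Q.sample()                         # one plausible latent trajectory
        futures.append(torch.stack([decoder(z, t) for t in future_times]))
    return futures                             # n_samples distinct predicted futures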

Temporal editing

Moving along specific directions in the latent space results in progressively faster trajectories, or in trajectories with an increasing temporal offset with respect to the original one.

Speed

We show the decoded trajectories that result from sampling a point in the latent space and moving along the "speed direction". For each example we show five clips, from slow (left) to fast (right).

Temporal offset

We show the decoded trajectories that result from sampling a point in the latent space and moving along the "temporal offset direction". For each example we show five clips, from starting early (left) to starting late (right).
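In the latent space, both of these edits reduce to translating a sample along a fixed direction. A minimal sketch, assuming a hypothetical decoder and a given direction vector (e.g., the "speed" or "temporal offset" direction); the step sizes are illustrative:

import torch

def traverse(decoder, z, direction, times, steps=(-2, -1, 0, 1, 2), step_size=0.5):
    # Decode the trajectories obtained by translating z along a latent direction.
    # Each step yields one of the five clips shown above, from slow/early (left)
    # to fast/late (right).
    direction = direction / direction.norm()
    clips = []
    for k in steps:
        z_edit = z + k * step_size * direction
        clips.append(torch.stack([decoder(z_edit, t) for t in times]))
    return clips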

Method

Schematic of our framework. We show the input space ℝ^K, the latent space ℝ^N, and the mappings between the two (encoder Θ and decoder Φ). A segment s belonging to a trajectory u is encoded into a distribution Q, from which a trajectory z is sampled and decoded at a time t, to get x̂_t.
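To make the notation concrete, here is a minimal sketch of the encode-sample-decode pipeline. The module names (SegmentEncoder, TrajectoryDecoder), the single-linear-layer architectures, and the dimensions are placeholder assumptions, not the paper's actual networks:

import torch
import torch.nn as nn

K, N = 34, 128  # illustrative: input dimension (e.g., 17 keypoints x 2) and latent dimension

class SegmentEncoder(nn.Module):
    """Theta: maps a segment s in R^K to a distribution Q over latent trajectories."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(K, 2 * N)  # predicts a mean and a log-variance

    def forward(self, s):
        mu, log_var = self.net(s).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_var.mul(0.5).exp())

class TrajectoryDecoder(nn.Module):
    """Phi: decodes a latent trajectory z at a continuous time t into x_hat_t in R^K."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(N + 1, K)

    def forward(self, z, t):
        return self.net(torch.cat([z, t], dim=-1))

encoder, decoder = SegmentEncoder(), TrajectoryDecoder()
s = torch.randn(1, K)          # a (flattened) observed segment
Q = encoder(s)                 # distribution over trajectories containing s
z = Q.rsample()                # sample one latent trajectory
t = torch.tensor([[0.75]])     # any continuous time, inside or outside the segment
x_hat_t = decoder(z, t)        # decoded pose at time t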

When observing a segment of a trajectory, one may have some uncertainty about the rest of the trajectory. For instance, a segment showing a person jumping may correspond to a trajectory that continues with the person falling, or to one that proceeds with them doing a backflip and landing on their feet, but it will not belong to a trajectory of a person swimming. Therefore, we represent the segment as a distribution over trajectories, which allows us to represent the likelihood of a trajectory given the segment. During training, the goal is to learn this mapping from the input space (segments of trajectories) to the latent space (distributions over trajectories).

Concretely, given two segments that have been obtained from the same underlying trajectory, we want some trajectory to exist such that its likelihood under the distributions representing each of the segments is high. To encourage this, we train the model to maximize the overlap between these distributions. Similarly, we minimize the overlap between (the distribution representations of) segments sampled from different trajectories, under the assumption that no trajectory exists that contains both segments.
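The exact overlap measure depends on the chosen distribution family (see below). As one concrete instance, assuming diagonal Gaussian representations, the Bhattacharyya coefficient gives a closed-form overlap that can be maximized for same-trajectory segments and minimized otherwise; this particular measure is an illustration, not necessarily the paper's choice:

import torch

def gaussian_overlap(mu1, var1, mu2, var2):
    # Bhattacharyya coefficient between two diagonal Gaussians:
    # 1 for identical distributions, approaching 0 for disjoint ones.
    var = 0.5 * (var1 + var2)
    dist = (0.125 * (mu1 - mu2).pow(2) / var
            + 0.5 * torch.log(var)
            - 0.25 * (torch.log(var1) + torch.log(var2))).sum(-1)
    return torch.exp(-dist)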

Examples of segments. We illustrate how spatial trajectories (left) are ideally encoded into the latent space (right). The intersection between two segment representations (boxes in the figure) represents the trajectories that contain both segments. "Future given past" represents a segment decoded at a future time, from a trajectory sampled from the past representation. It is effectively a sample of a possible future given the past. Other segments are defined similarly. For clarity, we do not show other options like "past given past", which would be the same box as the past P. Best viewed in color.

We train with a triplet loss that requires positives and negatives among the different segments. These are defined by the relationship between segments: segments that could belong to the same trajectory are treated as positives, and segments that cannot belong to the same trajectory are treated as negatives.
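A minimal sketch of such a triplet objective; the distance (e.g., one minus overlap), the margin, and the mean reduction are illustrative choices rather than the paper's exact recipe:

import torch.nn.functional as F

def triplet_loss(d_pos, d_neg, margin=1.0):
    # d_pos: distance between the distributions of two segments from the
    # same trajectory; d_neg: same, for segments from different trajectories.
    # The loss pulls positives together and pushes negatives at least
    # `margin` further apart.
    return F.relu(d_pos - d_neg + margin).mean()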

In practice, we represent the distributions Q (which represent segments, as distributions over the trajectories those segments could have been sampled from) as either Normal distributions or box embeddings.
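For the box-embedding variant, a segment maps to an axis-aligned box in the latent space, and overlap is measured by intersection volume. A hedged sketch of the hard intersection volume (practical box embeddings typically replace the hard min/max/clamp with smooth relaxations so the objective stays differentiable):

import torch

def box_intersection_volume(lo1, hi1, lo2, hi2):
    # Volume of the intersection of two axis-aligned boxes [lo, hi] in R^N;
    # zero when the boxes are disjoint along any dimension.
    lo = torch.maximum(lo1, lo2)
    hi = torch.minimum(hi1, hi2)
    return (hi - lo).clamp(min=0).prod(dim=-1)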

Citation

@inproceedings{suris2022trajectories,
  title={Representing Spatial Trajectories as Distributions},
  author={Sur\'is, D\'idac and Vondrick, Carl},
  booktitle={Advances in Neural Information Processing Systems 35 (NeurIPS)},
  year={2022},
}

Acknowledgements

We thank Arjun Mani and Mia Chiquier for helpful feedback. This research is based on work partially supported by the NSF NRI Award #2132519 and the DARPA MCS program. DS is supported by the Microsoft PhD Fellowship. The webpage template was inspired by these project pages.