Flow Matching: A visual introduction

Flow Matching (FM) has become a prevalent technique to train a certain class of generative models. In this post we'll try to explore the intuition behind flow matching and how it works.

We'll use this notebook to build a simple flow matching model illustrating linear flow matching based on a minimal toy example. Our goal is to try to keep things simple, intuitive, and visual. We won't be doing any deep dive into the mathematical details of the model, if you're interested in the mathematical details I recommend checking out the references at the end of this post.

In [1]:

Flow matching

Flow matching is a technique to learn how to transport samples from one distribution to another. For example we could learn how to transport samples from a simple distribution we can easily sample from (e.g. Gaussian noise) to a complex distribution (e.g. images , videos , robot actions , etc.).

Toy Example: Mapping Gaussian noise to a bimodal distribution

In this post we'll build a simple toy example of a generative model using flow matching. For illustrative purposes we'll start with a simple 1D bimodal target distribution $π_1$ and learn how to transport samples from a 1D Gaussian noise distribution $π_0$ to this target distribution.

In practice the target points $x_1 \sim π_1$ are approximated by sampling from a limited dataset of training points $X_1$ and the noise points $x_0 \sim π_0$ are sampled from a chosen noise distribution $π_0$ that is easy to sample from (e.g. Gaussian noise).

In [2]:
No description has been provided for this image

The flow matching model predicts a velocity field

A flow matching model does not predict flow paths directly, but instead predicts a velocity field that can be used to sample the flow paths. The velocity field describes how to move a sample from the noise distribution to the target distribution.

We can describe the flow matching model with learnable parameters $\theta$ as a function: $${FM}_{\theta}(x_t, t) = v(x_t, t)$$ This function takes a sample $x_t$ at flow step $t$ and predicts the velocity vector $v(x_t, t) = dx_t / dt$ that describes how to move the sample $x_t$ closer to the target distribution at step $t$.

The step $t$ is a value between 0 and 1 that describes the progress of the sample $x_t$ along the flow path from the noise distribution to the target distribution. When $t=0$ the sample $x_t = x_0$ is a sample from the noise distribution $π_0$ and when $t=1$ the sample $x_t = x_1$ is a sample from the target distribution $π_1$.

At inference time we can sample a starting point $x_0$ from the noise distribution $π_0$ and then use the predicted velocity field ${FM}_{\theta}(x_t, t)$ to iteratively move the sample towards the target distribution $π_1$ in small steps $dt$

This is illustrated in the following animation ( generated further down in the notebook ) which shows the integration of a sample from the noise distribution $π_0$ on the left towards the target distribution $π_1$ on the right using the predicted velocity field ${FM}_{\theta}(x_t, t)$. The velocity field is visualized as a heatmap where the vertical axis represents the position of the sample $x_t$ and the horizontal axis represents the flow step $t$ going from 0 on the left to 1 on the right. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).

In [3]:

Training the flow matching model is learning the velocity field

Since the flow matching model ${FM}_{\theta}(x_t, t)$ should predict the velocity field $v(x_t, t) = dx_t / dt$ we can train the model on samples of velocity vectors $\mathbf{v}(x_t, t)$.

The flow matching training objective is to minimize the expected reconstruction error of the velocity field: $$ \underset{\theta}{\text{argmin}} \; \mathbb{E}_{t, x_t} \Big\| {FM}_{\theta}(x_t, t) - v(x_t, t) \Big\|^2 $$

with $t \sim \mathcal{U}[0, 1]$ and $x_t$ taken from a sampled reference path evaluated at flow step $t$.

We'll be using straight line reference paths in this post since they are simple and common.

Training: Straight line reference paths

We're going to focus on a common variant of flow matching where we learn a flow matching model based on straight line reference paths. Training flow matching with straight-line conditional paths and independent couplings is also equivalent to the rectified flow training objective.

Linear (straight line) flow matching is trained on a set of reference paths between the noise and target distributions. More specifically, linear flow matching prefers learning from straight line trajectories between the noise and target distributions because they tend to give straighter paths that require fewer steps to reconstruct the target distribution.

To sample a reference path we can independently sample a target point $x_1$ from our target distribution $π_1$ and independently sample a noise point $x_0$ from the noise distribution $π_0$. This gives us a single coupling $(x_0, x_1)$ that allows us to define a straight line reference path between the noise and target samples. During training we'll sample a large set of coupling-inducing paths $(X_0, X_1)$ and use these to train the flow matching model.

The following code illustrates how we define the straight line reference path between a noise and target sample.

In [4]:
def interpolate_linear(x_0, x_1, t):
    """Evaluates the linear interpolation path between x_0 and x_1 at step t."""
    x_t = ((1 - t) * x_0) + (t * x_1)
    return x_t

The following figure shows a few sampled straight-line reference paths, as well as the reference path distribution approximated by sampling a large number of straight-line reference paths.

In [5]:
No description has been provided for this image

Training: Sampling velocity vectors

Since we are using straight-line reference paths, the sampled velocity vectors $\mathbf{v}(x_t, t)$ have a very simple form. Given a sample from the noise distribution $x_0$ and a sample from the target distribution $x_1$ we can describe the conditional velocity vector along the straight-line connecting $x_0$ and $x_1$ as: $\mathbf{v}(x_t, t) = x_1 - x_0$ as illustrated in the following code and figure.

In [6]:
def get_target_velocity(x_0, x_1):
    """
    Get the velocity for a given pair of noise and target points.
    This is the per-pair (conditional) velocity along the straight path.
    """
    return x_1 - x_0
In [7]:
No description has been provided for this image

Training: Flow matching objective

We can now write out our objective as a function of the samples from the noise distribution $x_0$ and the target distribution $x_1$: $$ \underset{\theta}{\text{argmin}} \; \mathbb{E}_{t, X_0, X_1} \Big\| {FM}_{\theta}(x_t, t) - (X_1 - X_0) \Big\|^2 \quad\quad $$ with $t \sim \mathcal{U}[0, 1]$, $X_0 \sim \pi_0$, $X_1 \sim \pi_1$, and $x_t = (1 - t) X_0 + t X_1$.

Note that the flow matching model ${FM}_{\theta}(x_t, t)$ is trained conditionally on specific straight-line couplings $(X_0, X_1)$, but since these are averaged out in the training objective, the flow matching model will learn an approximation of the velocity field independent of any specific coupling.

For this simple toy example we could even approximate the flow field directly by sampling a large number of reference paths and computing the average velocity for fixed bins over the flow field. This approximated expectation is illustrated in the following figure, which shows the average flow field. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).

In [8]:
No description has been provided for this image

Training the Flow Matching model

Now that we have defined our optimization objective, and how we can sample the data to train the model, we can define the flow matching model and train it. We'll create a simple neural network with a single hidden layer that we can train to predict the velocity field.

In [9]:
class FlowMatchingModel(nn.Module):
    """
    Flow Matching model to predict the velocity field at time t and position x_t.
    """

    def __init__(self, data_dim: int, hidden_dim: int) -> None:
        super().__init__()
        # Simple MLP
        self.net: nn.Sequential = nn.Sequential(
            nn.Linear(data_dim + 1, hidden_dim),  # +1 for time embedding
            nn.GELU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, data_dim),
        )

    def forward(
        self,
        t: torch.Tensor,  # Denoising step [batch_size, 1]
        x_t: torch.Tensor,  # Interpolated samples [batch_size, data_dim]
    ) -> torch.Tensor:  # [batch_size, data_dim]
        """
        Predicts the velocity field at time t and position x_t.
        """
        tx: torch.Tensor = torch.cat([t, x_t], dim=-1)
        return self.net(tx)

We can now define the loss function as a function of the flow matching model, the noise samples $X_0$, the target samples $X_1$, and the flow steps $T$:

In [10]:
def compute_loss(
    flow_matching_model: FlowMatchingModel,
    x_0: torch.Tensor,
    x_1: torch.Tensor,
    t: torch.Tensor,
) -> torch.Tensor:
    """
    Compute the loss for a single batch of (X_0, X_1) couplings and flow steps T.
    """
    # Interpolate the data at the sampled time step
    x_t = interpolate_linear(x_0=x_0, x_1=x_1, t=t)
    # Get the target velocity
    v_target = get_target_velocity(x_0=x_0, x_1=x_1)
    # Predict the velocity
    v_pred = flow_matching_model(t=t, x_t=x_t)
    # Compute the loss
    loss = ((v_pred - v_target) ** 2).mean()
    return loss

Using this loss function we can now train the flow matching model in a straightforward gradient-based optimization loop. We'll use a standard Adam optimizer to optimize the model parameters.

In [11]:
# Train the flow matching model

# Hyperparameters
data_dim: int = 1  # 1D data
hidden_dim: int = 64
nb_train_iterations: int = 10_000
lr: float = 1e-3
batch_size: int = 256

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(626)

# Initialize the vector field network and optimizer
flow_matching_model = FlowMatchingModel(data_dim=data_dim, hidden_dim=hidden_dim).to(DEVICE).train()
optimizer = optim.Adam(flow_matching_model.parameters(), lr=lr)

# Training loop
losses: list[float] = []
with tqdm(range(nb_train_iterations), desc="Training", unit="iteration") as progress_bar:
    for i in progress_bar:
        # Sample a batch of target and noise samples
        x_1 = torch.from_numpy(mixture_sample(size=batch_size)).to(dtype=torch.float32, device=DEVICE).unsqueeze(-1)
        x_0 = torch.randn_like(x_1)
        # Sample a random time step for each sample in the batch
        t = torch.rand(x_1.shape[0], device=DEVICE).unsqueeze(-1)

        # Compute the loss
        loss = compute_loss(flow_matching_model=flow_matching_model, x_0=x_0, x_1=x_1, t=t)

        # Backpropagate the loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        progress_bar.set_postfix({"Loss": f"{loss.item():.2f}"})
In [12]:
No description has been provided for this image

Visualizing the trained flow matching model

Now that we have trained this simple flow matching model we can visualize the learned velocity field by getting the predicted velocity field ${FM}_{\theta}(x_t, t)$ at a grid of points $(t, x_t)$ and plotting this grid of velocities as a color image. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).

In [13]:
No description has been provided for this image

Sampling from the trained model

At inference time we can sample a starting point $x_0$ from the noise distribution $π_0$ and then use the predicted velocity field ${FM}_{\theta}(x_t, t)$ to iteratively move (integrate) the sample towards a sample $\hat{x}_1$ from the target distribution $π_1$.

The code below starts with noise $ x_0 \sim \mathcal{N}(0, 1)$ and integrates the learned ODE using the simple Euler method . The Euler method is a simple integration method that at each step $t$ takes the velocity field prediction ${FM}_{\theta}(x_t, t)$ at the current position $x_t$ and moves the sample a small step $dt$ in the direction of the velocity field.

In [14]:
# Illustration on how to sample x_1 from x_0 using the learned velocity field
nb_steps = 15
path_x = np.zeros(nb_steps + 1)  # Array to store the full sampled path
t_steps = np.linspace(0, 1, nb_steps + 1)  # Steps $t$ in the range [0, 1]

# x_0 starting point (Pre-selected here for the example, but ideally x_0 ~ N(0, I
x_0 = torch.Tensor([[0.85]]).to(DEVICE)

with torch.inference_mode():
    flow_matching_model.eval()
    x_t = x_0  # Initialize the sample at the starting point
    path_x[0] = x_t.squeeze().cpu().numpy()
    # Integrate the velocity field using Euler integration from t=0 to t=1
    for i in range(nb_steps):
        t = t_steps[i]  # Current step $t$
        dt = t_steps[i + 1] - t_steps[i]  # Step size
        t_batch = torch.Tensor([[t]]).to(DEVICE)  # Expand the step to a batch dimension
        # Get the velocity field prediction at the current position and time step and move the sample a small step dt in the direction of the velocity field
        x_t = x_t + flow_matching_model(t=t_batch, x_t=x_t) * dt
        path_x[i + 1] = x_t.squeeze().cpu().numpy()


display(HTML(pd.DataFrame({"t": t_steps, "x": path_x}).transpose().to_html()))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
t 0.000 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 1.000
x 0.850 0.805 0.767 0.738 0.719 0.716 0.731 0.769 0.830 0.909 0.999 1.095 1.192 1.288 1.384 1.481

We can illustrate this sampled path in the following animation which shows the integration from the noise sample $x_0$ towards the target distribution $\hat{x}_1$ using the predicted velocity field ${FM}_{\theta}(x_t, t)$ above. The velocity field is visualized as a heatmap where the vertical axis represents the position of the sample $x_t$ and the horizontal axis represents the flow step $t$ going from 0 on the left to 1 on the right. Red means a positive velocity (sample pushed up towards higher $x$) and blue means a negative velocity (sample pulled down towards lower $x$).

Notice that while we trained on straight-line paths, the sampled path it not necessarily a straight line. This is because we don't learn the paths directly but learn the unconditioned velocity field by training on a large set of straight-line reference paths.

In [15]:

We can also take a large sample from the model $\hat{X}_1$ and reconstruct the target distribution $\pi_1$. We'll define a sample function that will generate samples by integrating the learned vector field using Euler integration . We'll then plot the target distribution and the reconstructed samples.

In [16]:
@torch.inference_mode()
def sample(
    n_samples: int,  # Number of samples to generate
    model: FlowMatchingModel,  # The flow matching model
    nb_steps: int,  # Number of Euler integration steps
) -> torch.Tensor:
    """Generates samples by integrating the learned vector field using Euler integration."""
    ts = torch.linspace(0, 1, nb_steps + 1, device=DEVICE)
    x_t = torch.randn(n_samples, data_dim).to(DEVICE)  # Sample x_0 ~ N(0, I)
    for i in range(nb_steps):  # Euler integration from t=0 to t=1 (last step happens just before t=1)
        t = ts[i]  # Current step $t$
        dt = ts[i + 1] - ts[i]  # Step size
        t_batch = t.expand(n_samples).unsqueeze(-1)
        # Move the sample a small step dt in the direction of the velocity field
        x_t = x_t + model(t=t_batch, x_t=x_t) * dt
    return x_t  # Final sample x_1
In [17]:
No description has been provided for this image

As a final illustration, let's illustrate the the path density between the starting noise samples $\hat{X}_0$ and the final reconstructed samples $\hat{X}_1$ by sampling a large number of paths from the noise distribution $\pi_0$ to the target distribution $\pi_1$.

In [18]:
No description has been provided for this image

Summary

To conclude we've implemented a simple flow matching model and trained it on 1D toy data. The 1D toy data allowed us to easily visualize the flow matching model, the velocity field, and the sampled paths.

In real-world applications the target distribution is not known and more complex, resulting in a more complex vector field, which typically requires using more complex models to learn the vector field and more sophisticated sampling strategies to sample from the model.

References and further reading

In [19]:
Python implementation: CPython
Python version       : 3.12.10
IPython version      : 9.6.0

torch     : 2.8.0+cu128
IPython   : 9.6.0
numpy     : 2.3.3
scipy     : 1.16.2
tqdm      : 4.67.1
matplotlib: 3.10.6
pandas    : 2.3.3
seaborn   : 0.13.2

This post at peterroelants.github.io is generated from an IPython notebook file. Link to the full IPython notebook file