Gaussian processes (3/3) - exploring kernels
This post will go more in-depth in the kernels fitted in our example fitting a Gaussian process to model atmospheric CO₂ concentrations . We will describe and visually explore each part of the kernel used in our fitted model, which is a combination of the exponentiated quadratic kernel, exponentiated sine squared kernel, and rational quadratic kernel. This post is the last part of a series on Gaussian processes:
# Imports
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import warnings
warnings.simplefilter("ignore")
from collections.abc import Callable, Mapping
from typing import TypeAlias
import jax
jax.config.update("jax_enable_x64", True)
import jax.numpy as jnp
import numpy as np
from numpy.typing import NDArray
import matplotlib.pyplot as plt
from matplotlib.figure import Figure
from matplotlib.gridspec import SubplotSpec
from mpl_toolkits.axes_grid1 import make_axes_locatable
import matplotlib.gridspec as gridspec
import seaborn as sns
Array: TypeAlias = jax.Array
ArrayLike: TypeAlias = Array | np.ndarray
KernelMatrix: TypeAlias = NDArray[np.float64]
KernelFn: TypeAlias = Callable[..., KernelMatrix]
FittedParams: TypeAlias = Mapping[str, float]
sns.set_style("darkgrid")
np.random.seed(42)
#
Kernel function
A kernel (or covariance function) describes the covariance of the Gaussian process random variables. Together with the mean function the kernel completely defines a Gaussian process.
In the first post we introduced the concept of the kernel which defines a prior on the Gaussian process distribution. To summarize the kernel function $k(x, x')$ models the covariance between each pair in $x$. The kernel function together with the mean function $m(x)$ define the Gaussian process distribution:
$$y \sim \mathcal{GP}(m(x),k(x,x'))$$
Valid kernels
In order to be a valid kernel function the resulting kernel matrix $\Sigma = k(X, X)$ should be positive definite . Which implies that the matrix should be symmetric . Being positive definite also means that the kernel matrix is invertible .
The process of defining a new valid kernel from scratch is not always trivial. Typically pre-defined kernels are used to model a variety of processes. In what follows we will visually explore some of these pre-defined kernels that we used in our fitting example .
# Kernel and plotting functions to be used below
def as_column(x: ArrayLike) -> Array:
"""Represent one-dimensional inputs as column vectors."""
return jnp.asarray(x, dtype=jnp.float64).reshape(-1, 1)
def squared_distance(xa: ArrayLike, xb: ArrayLike) -> Array:
"""Compute pairwise squared distances for stationary kernels."""
xa = as_column(xa)
xb = as_column(xb)
return jnp.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
# The plots below use NumPy and Matplotlib, so the kernel helpers
# convert JAX arrays to NumPy arrays before returning.
def exponentiated_quadratic_kernel(
xa: ArrayLike, xb: ArrayLike, amplitude: float = 1.0, length_scale: float = 1.0
) -> KernelMatrix:
"""Show smooth functions through an RBF covariance."""
covariance = amplitude**2 * jnp.exp(
-0.5 * squared_distance(xa=xa, xb=xb) / length_scale**2
)
return np.asarray(covariance)
def rational_quadratic_kernel(
xa: ArrayLike,
xb: ArrayLike,
amplitude: float = 1.0,
length_scale: float = 1.0,
scale_mixture: float = 1.0,
) -> KernelMatrix:
"""Show multi-scale smoothness through a rational quadratic covariance."""
covariance = amplitude**2 * (
1.0 + squared_distance(xa=xa, xb=xb) / (2.0 * scale_mixture * length_scale**2)
) ** (-scale_mixture)
return np.asarray(covariance)
def periodic_kernel(
xa: ArrayLike,
xb: ArrayLike,
amplitude: float = 1.0,
length_scale: float = 1.0,
period: float = 1.0,
) -> KernelMatrix:
"""Show repeating functions through a periodic covariance."""
xa = as_column(xa)
xb = as_column(xb)
distance = jnp.abs(xa[:, None, :] - xb[None, :, :])
sine_squared = jnp.sum(jnp.sin(jnp.pi * distance / period) ** 2, axis=-1)
covariance = amplitude**2 * jnp.exp(-2.0 * sine_squared / length_scale**2)
return np.asarray(covariance)
def local_periodic_kernel(
xa: ArrayLike,
xb: ArrayLike,
amplitude: float = 1.0,
periodic_length_scale: float = 1.0,
period: float = 1.0,
local_length_scale: float = 1.0,
) -> KernelMatrix:
"""Show periodic functions whose similarity fades with distance."""
return periodic_kernel(
xa=xa,
xb=xb,
amplitude=amplitude,
length_scale=periodic_length_scale,
period=period,
) * exponentiated_quadratic_kernel(
xa=xa, xb=xb, amplitude=1.0, length_scale=local_length_scale
)
def sample_from_kernel(
covariance: KernelMatrix, nb_of_samples: int, nb_of_realizations: int
) -> KernelMatrix:
"""Draw sample functions to make a covariance matrix concrete."""
return np.random.multivariate_normal(
mean=np.zeros(nb_of_samples), cov=covariance, size=nb_of_realizations
).T
def plot_kernel(
X: KernelMatrix,
y: KernelMatrix,
covariance: KernelMatrix,
description: str,
fig: Figure,
subplot_spec: SubplotSpec,
xlim: tuple[int, int],
scatter: bool = False,
rotate_x_labels: bool = False,
) -> None:
"""Visualize a kernel through samples and its covariance matrix."""
grid_spec = gridspec.GridSpecFromSubplotSpec(
1,
2,
width_ratios=[2, 1],
height_ratios=[1],
wspace=0.18,
hspace=0.0,
subplot_spec=subplot_spec,
)
ax1 = fig.add_subplot(grid_spec[0])
ax2 = fig.add_subplot(grid_spec[1])
# Plot samples
if scatter:
for i in range(y.shape[1]):
ax1.scatter(X, y[:, i], alpha=0.8, s=3)
else:
for i in range(y.shape[1]):
ax1.plot(X, y[:, i], alpha=0.8)
ax1.set_ylabel("$y$", fontsize=13, labelpad=0)
ax1.set_xlabel("$x$", fontsize=13, labelpad=0)
ax1.set_xlim(xlim)
if rotate_x_labels:
for label in ax1.get_xticklabels():
label.set_rotation(30)
ax1.set_title(f"Samples from {description}")
# Plot covariance matrix
im = ax2.imshow(covariance, cmap="YlGnBu")
divider = make_axes_locatable(ax2)
cax = divider.append_axes("right", size="5%", pad=0.02)
cbar = plt.colorbar(im, ax=ax2, cax=cax)
cbar.ax.set_ylabel("$K(X,X)$", fontsize=8)
ax2.set_title(f"Covariance matrix\n{description}")
ax2.set_xlabel("X", fontsize=10, labelpad=0)
ax2.set_ylabel("X", fontsize=10, labelpad=0)
# Show 5 custom ticks on x an y axis of covariance plot
nb_ticks = 5
ticks = list(range(xlim[0], xlim[1] + 1))
ticks_idx = np.rint(
np.linspace(1, len(ticks), num=min(nb_ticks, len(ticks))) - 1
).astype(int)
ticks = list(np.array(ticks)[ticks_idx])
ax2.set_xticks(np.linspace(0, len(X) - 1, len(ticks)))
ax2.set_yticks(np.linspace(0, len(X) - 1, len(ticks)))
ax2.set_xticklabels(ticks)
ax2.set_yticklabels(ticks)
if rotate_x_labels:
for label in ax2.get_xticklabels():
label.set_rotation(30)
ax2.grid(False)
#
White noise kernel
The white noise kernel represents independent and identically distributed noise added to the Gaussian process distribution.
$$k(x, x) = \sigma^2 I_n$$ With:
- $\sigma^2$ the variance of the noise.
- $I_n$ the identity matrix.
This formula results in a covariance matrix with zeros everywhere except on the diagonal of the covariance matrix. This diagonal contains the variances of the individual random variables. All covariances between samples are zero because the noise is uncorrelated.
Samples from the white noise kernel together with a visual representation of the covariance matrix are plotted in the next figure.
# Plot kernel matrix and samples of white noise kernel
nb_of_samples = 100 # Number of test points.
nb_of_realizations = 3 # Number of function realizations
# Generate independent samples that can be transformed
xlim = (-2, 2)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
# Start plotting
fig = plt.figure(figsize=(7, 2.7))
gs = gridspec.GridSpec(1, 1, figure=fig, wspace=0.0, hspace=0.0)
# Sample from the prior
Σ = np.eye(nb_of_samples) # Identity matrix
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
# Plot
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="white noise kernel",
fig=fig,
subplot_spec=gs[0],
xlim=xlim,
scatter=True,
)
fig.tight_layout()
plt.show()
#
Exponentiated quadratic kernel
The exponentiated quadratic kernel (also known as squared exponential kernel, Gaussian kernel or radial basis function kernel) is one of the most popular kernels used in Gaussian process modelling. It can be computed as:
$$k(x_a, x_b) = \sigma^2 \exp \left(-\frac{ \left\Vert x_a - x_b \right\Vert^2}{2\ell^2}\right)$$ With:
- $\sigma^2$ the overall variance ($\sigma$ is also known as amplitude).
- $\ell$ the lengthscale.
Using the exponentiated quadratic kernel will result in a smooth prior on functions sampled from the Gaussian process.
The exponentiated quadratic is visualized in the next figures. The first figure shows the distance plot with respect to $0$: $k(0, x)$. Note that the similarity outputted by the kernel decreases exponentially towards $0$ the farther we move move away from the center, and that the similarity is maximum at the center $x_a = x_b$.
# Plot exponentiated quadratic distance
xlim = (-4, 4)
X = np.expand_dims(np.linspace(*xlim, num=75), 1)
zero = np.array([[0.0]])
# Make the plots
fig, ax = plt.subplots(figsize=(5.4, 3))
Σ = exponentiated_quadratic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=1.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell = 1$, $\\sigma = 1$")
Σ = exponentiated_quadratic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=0.5)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell = 0.5$, $\\sigma = 1$")
Σ = exponentiated_quadratic_kernel(xa=zero, xb=X, amplitude=0.5, length_scale=1.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell = 1$, $\\sigma = 0.5$")
ax.set_xlabel("$x_a - x_b$", fontsize=11)
ax.set_ylabel("$K(x_a,x_b)$", fontsize=11)
ax.set_title("Exponentiated quadratic distance plot")
ax.set_ylim((0, 1.1))
ax.set_xlim(*xlim)
ax.legend(loc=1)
plt.tight_layout()
plt.show()
#
The following figure shows samples from the exponentiated quadratic kernel together with a visual representation of its covariance matrix.
Observe in the previous and following figure that increasing the lengthscale parameter $\ell$ increases the spread of the covariance. Increasing the amplitude parameter $\sigma$ increases the maximum value of the covariance.
# Plot kernel matrix and samples of exponentiated quadratic
nb_of_samples = 150 # Number of test points.
nb_of_realizations = 3 # Number of function realizations
# Generate input points
xlim = (-4, 4)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
# Start plotting
fig = plt.figure(figsize=(7, 10))
gs = gridspec.GridSpec(4, 1, figure=fig, wspace=0.2, hspace=0.4)
# Plot first
Σ = exponentiated_quadratic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=1.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell = 1$, $\\sigma = 1$",
fig=fig,
subplot_spec=gs[0],
xlim=xlim,
)
# Plot second
Σ = exponentiated_quadratic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=0.3)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell = 0.3$, $\\sigma = 1$",
fig=fig,
subplot_spec=gs[1],
xlim=xlim,
)
# Plot third
Σ = exponentiated_quadratic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=2.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell = 2$, $\\sigma = 1$",
fig=fig,
subplot_spec=gs[2],
xlim=xlim,
)
# Plot fourth
Σ = exponentiated_quadratic_kernel(xa=X, xb=X, amplitude=10.0, length_scale=1.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell = 1$, $\\sigma = 10$",
fig=fig,
subplot_spec=gs[3],
xlim=xlim,
)
plt.suptitle("Exponentiated quadratic", y=0.99)
fig.subplots_adjust(left=0.07, bottom=0.04, right=0.93, top=0.94)
plt.show()
#
Rational quadratic kernel
$$k(x_a, x_b) = \sigma^2 \left( 1 + \frac{ \left\Vert x_a - x_b \right\Vert^2}{2 \alpha \ell^2} \right)^{-\alpha}$$ With:
- $\sigma^2$ the overall variance ($\sigma$ is also known as amplitude).
- $\ell$ the lengthscale.
- $\alpha$ the scale-mixture ($\alpha$ > 0).
Similar to the exponentiated quadratic the rational quadratic kernel will result in a somewhat smooth prior on functions sampled from the Gaussian process. The rational quadratic can be interpreted as an infinite sum of different exponentiated quadratic kernels with different lengthscales with $\alpha$ determining the weighting between different lengthscales. When $\alpha \rightarrow \infty$ the rational quadratic kernel converges into the exponentiated quadratic kernel.
The rational quadratic is visualized in the next figures. The first figure shows the distance plot with respect to $0$: $k(0, x)$ with the amplitude $\sigma$ fixed to $1$.
Note that just like the exponentiated quadratic the similarity outputted by the kernel decreases towards $0$ the farther we move move away from the center, and that the similarity is maximum at the center $x_a = x_b$.
# Plot rational quadratic distance
xlim = (-7, 7)
X = np.expand_dims(np.linspace(*xlim, num=151), 1)
zero = np.array([[0.0]])
# Make the plots
fig, ax = plt.subplots(figsize=(5.4, 3))
Σ = rational_quadratic_kernel(
xa=zero, xb=X, amplitude=1.0, length_scale=1.0, scale_mixture=1.0
)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $\\alpha=1$")
Σ = rational_quadratic_kernel(
xa=zero, xb=X, amplitude=1.0, length_scale=5.0, scale_mixture=1.0
)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=5$, $\\alpha=1$")
Σ = rational_quadratic_kernel(
xa=zero, xb=X, amplitude=1.0, length_scale=0.2, scale_mixture=1.0
)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=0.2$, $\\alpha=1$")
Σ = rational_quadratic_kernel(
xa=zero, xb=X, amplitude=1.0, length_scale=1.0, scale_mixture=0.1
)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $\\alpha=0.1$")
Σ = rational_quadratic_kernel(
xa=zero, xb=X, amplitude=1.0, length_scale=1.0, scale_mixture=1000.0
)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $\\alpha=1000$")
ax.set_xlabel("$x_a - x_b$", fontsize=11)
ax.set_ylabel("$k(x_a,x_b)$", fontsize=11)
ax.set_title("Rational quadratic distance plot ($\\sigma=1$)")
ax.set_ylim((0, 1.1))
ax.set_xlim(*xlim)
ax.legend(loc=1)
plt.tight_layout()
plt.show()
#
The following figure shows samples from the rational quadratic kernel together with a visual representation of its covariance matrix. The amplitude $\sigma$ is fixed to $1$ in all figures, changing the amplitude will have the same effect as changing the amplitude in the exponentiated quadratic.
Observe that in the previous and following figures, increasing the lengthscale parameter $\ell$ increases the overall spread of the covariance. Decreasing the scale-mixture $\alpha$ will allow for more minor local variations while still keeping the longer scale trends defined by the lengthscale parameter. Increasing the scale-mixture to a large value reduces the minor local variations.
# Plot kernel matrix and samples of rational quadratic
nb_of_samples = 150 # Number of test points.
nb_of_realizations = 3 # Number of function realizations
# Generate input points
xlim = (-5, 5)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
# Start plotting
fig = plt.figure(figsize=(7, 12))
gs = gridspec.GridSpec(5, 1, figure=fig, wspace=0.2, hspace=0.5)
# Plot first
Σ = rational_quadratic_kernel(
xa=X, xb=X, amplitude=1.0, length_scale=1.0, scale_mixture=1.0
)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=1$, $\\alpha=1$",
fig=fig,
subplot_spec=gs[0],
xlim=xlim,
)
# Plot second
Σ = rational_quadratic_kernel(
xa=X, xb=X, amplitude=1.0, length_scale=5.0, scale_mixture=1.0
)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=5$, $\\alpha=1$",
fig=fig,
subplot_spec=gs[1],
xlim=xlim,
)
# Plot third
Σ = rational_quadratic_kernel(
xa=X, xb=X, amplitude=1.0, length_scale=0.2, scale_mixture=1.0
)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=0.2$, $\\alpha=1$",
fig=fig,
subplot_spec=gs[2],
xlim=xlim,
)
# Plot fourth
Σ = rational_quadratic_kernel(
xa=X, xb=X, amplitude=1.0, length_scale=1.0, scale_mixture=0.1
)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=1$, $\\alpha=0.1$",
fig=fig,
subplot_spec=gs[3],
xlim=xlim,
)
# Plot fifth
Σ = rational_quadratic_kernel(
xa=X, xb=X, amplitude=1.0, length_scale=1.0, scale_mixture=1000.0
)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=1$, $\\alpha=1000$",
fig=fig,
subplot_spec=gs[4],
xlim=xlim,
)
fig.suptitle("Rational quadratic ($\\sigma=1$)", y=0.99)
fig.subplots_adjust(left=0.06, bottom=0.04, right=0.94, top=0.95)
plt.show()
#
Periodic kernel
$$k(x_a, x_b) = \sigma^2 \exp \left(-\frac{2}{\ell^2}\sin^2 \left( \pi \frac{\lvert x_a - x_b \rvert}{p}\right) \right)$$ With:
- $\sigma^2$ the overall variance ($\sigma$ is also known as amplitude).
- $\ell$ the lengthscale.
- $p$ the period, which is the distance between repetitions.
The periodic kernel allows us to model periodic functions .
The periodic kernel is visualized in the next figures. The first two figures show the distance plot with respect to $0$: $k(0, x)$ with the amplitude $\sigma$ fixed to $1$ and different variations of the other parameters.
# Plot periodic distance
xlim = (-2, 2)
X = np.expand_dims(np.linspace(*xlim, num=150), 1)
zero = np.array([[0.0]])
# Make the plots
fig, ax = plt.subplots(figsize=(5.4, 3))
Σ = periodic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=1.0, period=1.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $p=1$")
Σ = periodic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=2.0, period=1.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=2$, $p=1$")
Σ = periodic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=0.5, period=1.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=0.5$, $p=1$")
ax.set_xlabel("$x_a - x_b$", fontsize=11)
ax.set_ylabel("$K(x_a,x_b)$", fontsize=11)
ax.set_title("Periodic distance plot ($\\sigma=1$)")
ax.set_ylim((0, 1.1))
ax.set_xlim(*xlim)
ax.legend(loc=1)
fig.tight_layout()
fig.show()
# Second plot
fig, ax = plt.subplots(figsize=(5.4, 3))
Σ = periodic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=1.0, period=1.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $p=1$")
Σ = periodic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=1.0, period=0.5)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $p=0.5$")
Σ = periodic_kernel(xa=zero, xb=X, amplitude=1.0, length_scale=1.0, period=2.0)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell=1$, $p=2$")
ax.set_xlabel("$x_a - x_b$", fontsize=11)
ax.set_ylabel("$K(x_a,x_b)$", fontsize=11)
ax.set_title("Periodic distance plot ($\\sigma=1$)")
ax.set_ylim((0, 1.1))
ax.set_xlim(*xlim)
ax.legend(loc=1)
fig.tight_layout()
plt.show()
#
The following figure shows samples from the periodic kernel together with a visual representation of its covariance matrix. The amplitude $\sigma$ is fixed to $1$ in all figures.
Observe that in the previous and following figures, increasing the period $p$ increases the distance between the repetitions (increasing the wavelength). Increasing the lengthscale parameter $\ell$ decreases the local variations within a repetition in the same way that increasing the lengthscale in the exponentiated quadratic kernel decreases the variations over a longer range.
# Plot kernel matrix and samples of periodic
nb_of_samples = 150 # Number of test points.
nb_of_realizations = 3 # Number of function realizations
# Generate input points
xlim = (-2, 2)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
# Start plotting
fig = plt.figure(figsize=(7, 12))
gs = gridspec.GridSpec(5, 1, figure=fig, wspace=0.2, hspace=0.5)
# Plot first
Σ = periodic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=1.0, period=1.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=1$, $p=1$",
fig=fig,
subplot_spec=gs[0],
xlim=xlim,
)
# Plot second
Σ = periodic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=2.0, period=1.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=2$, $p=1$",
fig=fig,
subplot_spec=gs[1],
xlim=xlim,
)
# Plot third
Σ = periodic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=0.5, period=1.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=0.5$, $p=1$",
fig=fig,
subplot_spec=gs[2],
xlim=xlim,
)
# Plot fourth
Σ = periodic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=1.0, period=0.5)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=1$, $p=0.5$",
fig=fig,
subplot_spec=gs[3],
xlim=xlim,
)
# Plot fifth
Σ = periodic_kernel(xa=X, xb=X, amplitude=1.0, length_scale=1.0, period=2.0)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell=1$, $p=2$",
fig=fig,
subplot_spec=gs[4],
xlim=xlim,
)
fig.suptitle("Periodic ($\\sigma=1$)", y=0.99)
fig.subplots_adjust(left=0.06, bottom=0.04, right=0.94, top=0.95)
plt.show()
#
Combining kernels by multiplication
Kernels can be combined by multiplying them together. Multiplying kernels is an elementwise multiplication of their corresponding covariance matrices. This means that the covariances of the two multiplied kernels will only have a high value if both covariances have a high value. The multiply operation can thus be interpreted as an AND operation.
Local periodic kernel
The local periodic kernel is a multiplication of the periodic kernel with the exponentiated quadratic kernel to allow the periods to vary over longer distances. Note that the variance parameters $\sigma^2$ are combined into one.
$$k(x_a, x_b) = \sigma^2 \exp \left(-\frac{2}{\ell_p^2}\sin^2 \left( \pi \frac{\lvert x_a - x_b \rvert}{p}\right) \right) \exp \left(-\frac{ \left\Vert x_a - x_b \right\Vert^2}{2\ell_{eq}^2}\right)$$ With:
- $\sigma^2$ the overall variance ($\sigma$ is also known as amplitude).
- $\ell_p$ lengthscale of the periodic function.
- $p$ the period.
- $\ell_{eq}$ the lengthscale of the exponentiated quadratic.
The local periodic kernel is visualized in the next figures. The first figure shows the distance plot with respect to $0$: $k(0, x)$ with only variations of the lengthscale of the exponentiated quadratic.
# Plot locally periodic distance
# Define the local periodic kernel
def get_local_periodic_kernel(
periodic_length_scale: float,
period: float,
amplitude: float,
local_length_scale: float,
) -> KernelFn:
"""Freeze local-periodic hyperparameters for plotting examples."""
def kernel(*, xa: ArrayLike, xb: ArrayLike) -> KernelMatrix:
"""Evaluate the configured local-periodic covariance."""
return local_periodic_kernel(
xa=xa,
xb=xb,
amplitude=amplitude,
periodic_length_scale=periodic_length_scale,
period=period,
local_length_scale=local_length_scale,
)
return kernel
# Make the plots
xlim = (-2, 2)
X = np.expand_dims(np.linspace(*xlim, num=150), 1)
zero = np.array([[0.0]])
fig, ax = plt.subplots(figsize=(5.4, 3))
Σ = get_local_periodic_kernel(
periodic_length_scale=1.0, period=1.0, amplitude=1.0, local_length_scale=1.0
)(xa=zero, xb=X)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell_{eq}=1$")
Σ = get_local_periodic_kernel(
periodic_length_scale=1, period=1, amplitude=1, local_length_scale=0.5
)(xa=zero, xb=X)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell_{eq}=0.5$")
Σ = get_local_periodic_kernel(
periodic_length_scale=1.0, period=1.0, amplitude=1.0, local_length_scale=2.0
)(xa=zero, xb=X)
ax.plot(X[:, 0], Σ[0, :], label="$\\ell_{eq}=2$")
ax.set_xlabel("$x_a - x_b$", fontsize=11)
ax.set_ylabel("$K(x_a,x_b)$", fontsize=11)
ax.set_title("Local periodic distance plot ($\\sigma=1$, $\\ell_p=1$, $p=1$)")
ax.set_ylim((0, 1.1))
ax.set_xlim(*xlim)
ax.legend(loc=1)
fig.tight_layout()
plt.show()
#
The following figure shows samples from the local periodic kernel together with a visual representation of its covariance matrix. Only variations of the lengthscale of the exponentiated quadratic $\ell_{eq}$ are shown, the rest of the parameters are fixed to $1$.
Observe that increasing the lengthscale parameter $\ell_{eq}$ increases the periodic covariance over a longer lengthscale and keeps the repetitions close to each other more consistent.
# Plot kernel matrix and samples of local periodic
nb_of_samples = 150 # Number of test points.
nb_of_realizations = 3 # Number of function realizations
# Generate input points
xlim = (-3, 3)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
# Start plotting
fig = plt.figure(figsize=(7, 8))
gs = gridspec.GridSpec(3, 1, figure=fig, wspace=0.2, hspace=0.4)
# Plot first
Σ = get_local_periodic_kernel(
periodic_length_scale=1.0, period=1.0, amplitude=1.0, local_length_scale=1.0
)(xa=X, xb=X)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell_{eq}=1$",
fig=fig,
subplot_spec=gs[0],
xlim=xlim,
)
# Plot second
Σ = get_local_periodic_kernel(
periodic_length_scale=1.0, period=1.0, amplitude=1.0, local_length_scale=0.5
)(xa=X, xb=X)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell_{eq}=0.5$",
fig=fig,
subplot_spec=gs[1],
xlim=xlim,
)
# Plot third
Σ = get_local_periodic_kernel(
periodic_length_scale=1.0, period=1.0, amplitude=1.0, local_length_scale=3.0
)(xa=X, xb=X)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="$\\ell_{eq}=3$",
fig=fig,
subplot_spec=gs[2],
xlim=xlim,
)
fig.suptitle("Local periodic ($\\sigma=1$, $\\ell_p=1$, $p=1$)", y=0.99)
fig.subplots_adjust(left=0.07, bottom=0.05, right=0.93, top=0.91)
plt.show()
#
Combining kernels by addition
Kernels can be combined by adding them together. Adding kernels is an elementwise addition of their corresponding covariance matrices. This means that the covariances of the two added kernels will only have a low value if both of the covariances have a low value. The addition operation can thus be interpreted as an OR operation.
Atmospheric CO₂ kernel
An example of a kernel combined by addition is the kernel we fitted in our previous example of fitting a Gaussian process on atmospheric CO₂ concentrations. This kernel needed to combine different characteristics of the data such as: long term smooth change in CO₂ levels, seasonality, and short to medium term irregularities.
The final fitted model used a sum of the exponentiated quadratic kernel, local periodic kernel, and rational quadratic kernel, plus an observational-noise term on the diagonal. The hyperparameters were fitted on the data using a maximum likelihood method. We keep the fitted noise variance in
fitted_params
so the values match the previous post, but for the visualizations below we show the latent covariance kernel without adding the diagonal observational-noise term.
The prior defined by this kernel with fitted hyperparameters is illustrated in the following figures. These figures will show some samples of the kernel and a visual representation of its covariance over different ranges of input.
# Explore the kernel from the previous notebook
fitted_params: FittedParams = {
"smooth_amplitude": 124.3517,
"smooth_length_scale": 99.9925,
"periodic_amplitude": 2.2775,
"periodic_length_scale": 1.3713,
"periodic_period": 0.9998,
"periodic_local_length_scale": 99.0098,
"irregular_amplitude": 2.3398,
"irregular_length_scale": 1.6404,
"irregular_scale_mixture": 0.0073,
"observation_noise_variance": 0.0404,
}
def get_combined_kernel(params: FittedParams = fitted_params) -> KernelFn:
"""Freeze fitted hyperparameters into the combined CO2 covariance."""
def kernel(xa: ArrayLike, xb: ArrayLike) -> KernelMatrix:
"""Evaluate the configured combined covariance."""
smooth = exponentiated_quadratic_kernel(
xa=xa,
xb=xb,
amplitude=params["smooth_amplitude"],
length_scale=params["smooth_length_scale"],
)
local_periodic = get_local_periodic_kernel(
periodic_length_scale=params["periodic_length_scale"],
period=params["periodic_period"],
amplitude=params["periodic_amplitude"],
local_length_scale=params["periodic_local_length_scale"],
)(xa=xa, xb=xb)
irregular = rational_quadratic_kernel(
xa=xa,
xb=xb,
amplitude=params["irregular_amplitude"],
length_scale=params["irregular_length_scale"],
scale_mixture=params["irregular_scale_mixture"],
)
return smooth + local_periodic + irregular
return kernel
#
# Plot kernel matrix and samples of combined kernel
nb_of_samples = 150 # Number of test points.
# Start plotting
fig = plt.figure(figsize=(7, 8))
gs = gridspec.GridSpec(3, 1, figure=fig, wspace=0.2, hspace=0.5)
# Plot first
nb_of_realizations = 1
xlim = (0, 5)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
Σ = get_combined_kernel()(xa=X, xb=X)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="fitted kernel",
fig=fig,
subplot_spec=gs[0],
xlim=xlim,
)
# Plot second
nb_of_realizations = 1
xlim = (0, 10)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
Σ = get_combined_kernel()(xa=X, xb=X)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="fitted kernel",
fig=fig,
subplot_spec=gs[1],
xlim=xlim,
)
# Plot third
nb_of_realizations = 5
xlim = (0, 50)
X = np.expand_dims(np.linspace(*xlim, nb_of_samples), 1)
Σ = get_combined_kernel()(xa=X, xb=X)
y = sample_from_kernel(
covariance=Σ, nb_of_samples=nb_of_samples, nb_of_realizations=nb_of_realizations
)
plot_kernel(
X=X,
y=y,
covariance=Σ,
description="fitted kernel",
fig=fig,
subplot_spec=gs[2],
xlim=xlim,
)
fig.suptitle("Combined kernel", y=0.97)
plt.show()
#
This post was the last part of a series on Gaussian processes:
References
- Gaussian Processes for Machine Learning. Chapter 4: Covariance Functions by Carl Edward Rasmussen and Christopher K. I. Williams.
- The Kernel Cookbook by David Duvenaud.
- Kernel Design GP Summer School, Sheffield, September 2015. By Nicolas Durrande.
# Python package versions used
%load_ext watermark
%watermark --python
%watermark --iversions
#
This post is generated from an IPython notebook file. Link to the full IPython notebook file