Gaussian Process Library
This section describes a library for Gaussian process time series models. A technical overview of key concepts can be found in the following references.
Roberts S, Osborne M, Ebden M, Reece S, Gibson N, Aigrain S. 2013. Gaussian processes for time-series modelling. Phil Trans R Soc A 371: 20110550. http://dx.doi.org/10.1098/rsta.2011.0550
Rasmussen C, Williams C. 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA. http://gaussianprocess.org/gpml/chapters/
Covariance Kernels
AutoGP.GP.Node — Type
abstract type Node end
Abstract class for covariance kernel nodes.
AutoGP.GP.LeafNode — Type
abstract type LeafNode <: Node end
Abstract class for primitive covariance kernels (see Primitive Kernels).
AutoGP.GP.BinaryOpNode — Type
abstract type BinaryOpNode <: Node end
Abstract class for composite covariance kernels (see Composite Kernels).
AutoGP.GP.pretty — Function
AutoGP.GP.eval_cov — Function
eval_cov(node::Node, t1::Real, t2::Real)
eval_cov(node::Node, ts::Vector{Float64})
Evaluate the covariance function node at the given time indexes. The first form returns a Real number and the second form returns a covariance Matrix.
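A minimal usage sketch of the two forms, assuming the kernel constructors described under Primitive Kernels below are imported from AutoGP.GP:

using AutoGP.GP: Periodic, eval_cov

k = Periodic(1, 1)                  # lengthscale = 1, period = 1
c = eval_cov(k, 0.0, 0.5)           # scalar covariance between two time points
ts = collect(0.0:0.25:1.0)
K = eval_cov(k, ts)                 # 5×5 covariance Matrix over the grid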
AutoGP.GP.compute_cov_matrix — Function
compute_cov_matrix(node::Node, noise, ts)
Non-vectorized implementation of compute_cov_matrix_vectorized.
AutoGP.GP.compute_cov_matrix_vectorized — Function
compute_cov_matrix_vectorized(node::Node, noise, ts)
Compute the covariance matrix by evaluating node on all pairs of ts. The noise is added to the diagonal of the covariance matrix, which means that if ts[i] == ts[j] for i != j, then X[ts[i]] and X[ts[j]] are i.i.d. samples of the true function value at ts[i] plus mean-zero Gaussian noise.
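A short sketch illustrating the treatment of repeated time indexes (values are illustrative):

using AutoGP.GP: Periodic, compute_cov_matrix_vectorized

k = Periodic(1, 1)
ts = [0.0, 0.5, 0.5]    # note the repeated time index
# noise = 0.1 is added only to the diagonal, so the two entries at t = 0.5
# model i.i.d. noisy measurements of the same underlying function value.
K = compute_cov_matrix_vectorized(k, 0.1, ts)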
AutoGP.GP.extract_kernel — Function
extract_kernel(node::Node, ::Type{T}; retain::Bool=true) where T<:LeafNode
Retain only those primitive kernels in node of type T <: LeafNode, by replacing all other primitive kernels with an appropriate dummy kernel.
- If no primitive kernels in node are of type T, the return value is Constant(0).
- If retain=false, the behavior is flipped: the primitive kernels of type T are removed, while the others are retained.
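A hedged usage sketch; the docstring above specifies the behavior, but the exact dummy kernels appearing in the returned expression are an implementation detail not shown here:

using AutoGP.GP: Linear, Periodic, extract_kernel

l = Linear(1); p = Periodic(1, 1)
k1 = extract_kernel(l * p + p, Linear)                 # retains the Linear factor; Periodic parts become dummies
k2 = extract_kernel(l * p + p, Linear; retain=false)   # flipped: Linear parts removed, Periodic retained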
AutoGP.GP.split_kernel_sop — Function
split_kernel_sop(node::Node, ::Type{T}) where T<:LeafNode
Splits the kernel $k$ denoted by node according to a sum-of-products interpretation. In particular, write
\[k = k_{11}k_{12}\cdots k_{1n_1} + k_{21}k_{22}\cdots k_{2n_2} + \dots + k_{m1}k_{m2}\cdots k_{m n_m}.\]
For a given primitive base kernel type T we can rewrite the above expression as
\[k = k^{\rm T} + k^{\rm nT},\]
where $k^{\rm T}$ contains all addends with a factor of type T, and $k^{\rm nT}$ are the addends without a factor of type T.
The function returns a pair (node_a, node_b) corresponding to $k^{\rm T}$ and $k^{\rm nT}$ above, with Constant(0) serving as the sentinel value.
Examples
julia> l = Linear(1); p = Periodic(1,1); c = Constant(1)
julia> split_kernel_sop(l, Linear)
(l, Constant(0))
julia> split_kernel_sop(l, Periodic)
(Constant(0), l)
julia> split_kernel_sop(l*p + l*c, Periodic)
(l*p, l*c)
julia> split_kernel_sop(p*p, Periodic)
(p*p, Constant(0))
AutoGP.GP.reparameterize — Function
reparameterize(node::Node, t::LinearTransform)
Reparameterize the covariance kernel according to the given LinearTransform applied to the input (known as an "input warping"). For a kernel $k(\cdot,\cdot; \theta)$ and a linear transform $f(t) = at+b$ over the time domain, this function returns a kernel with new parameters $\theta'$ such that $k(at+b, au+b; \theta) = k(t, u; \theta')$.
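For example, applying the input warping $f(t) = at + b$ (with $a > 0$) to the SquaredExponential kernel defined under Primitive Kernels below gives

\[k(at+b, au+b; (\theta_1, \theta_2)) = \theta_2 \exp\left(-\frac{1}{2}\left(\frac{a|t-u|}{\theta_1}\right)^2\right) = k(t, u; (\theta_1/a,\, \theta_2)),\]

so the reparameterized kernel has lengthscale $\theta_1' = \theta_1/a$ and an unchanged amplitude.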
AutoGP.GP.rescale — Function
rescale(node::Node, t::LinearTransform)
Rescale the covariance kernel according to the given LinearTransform applied to the output. In particular, for a GP $X \sim \mathrm{GP}(0, k(\cdot,\cdot; \theta))$ and a transformation $Y = aX + b$, this function returns a kernel with new parameters $\theta'$ such that $Y \sim \mathrm{GP}(b, k(\cdot,\cdot; \theta'))$.
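For example, if $k$ has an amplitude parameter $\theta$ that scales the kernel multiplicatively (as in SquaredExponential below), then $\mathrm{Cov}(aX(t) + b, aX(u) + b) = a^2 k(t, u)$, so the rescaled kernel has amplitude $\theta' = a^2 \theta$ with all other parameters unchanged.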
Primitive Kernels
Notation. In this section, generic parameters (e.g., $\theta$, $\theta_1$, $\theta_2$) are used to denote the fieldnames of the corresponding Julia structs, in the same order as they appear in the constructors.
AutoGP.GP.WhiteNoise — Type
WhiteNoise(value)
White noise covariance kernel.
\[k(t, t') = \mathbf{I}[t = t'] \theta\]
The random variables $X[t]$ and $X[t']$ are perfectly correlated whenever $t = t'$ and independent otherwise. This kernel cannot be used to represent the joint distribution of multiple i.i.d. measurements of $X[t]$; see compute_cov_matrix_vectorized instead.
AutoGP.GP.Constant — Type
Constant(value)
Constant covariance kernel.
\[k(t,t') = \theta\]
Draws from this kernel are horizontal lines, where $\theta$ determines the variance of the constant value around the mean (typically zero).
AutoGP.GP.Linear — Type
Linear(intercept[, bias=1, amplitude=1])
Linear covariance kernel.
\[k(t, t') = \theta_2 + \theta_3 (t - \theta_1)(t'-\theta_1)\]
Draws from this kernel are sloped lines in the 2D plane. The time intercept is $\theta_1$. The variance around the time intercept is $\theta_2$. The scale factor, which dictates the slope, is $\theta_3$.
AutoGP.GP.SquaredExponential — Type
SquaredExponential(lengthscale[, amplitude=1])
Squared Exponential covariance kernel.
\[k(t,t') = \theta_2 \exp\left(-\frac{1}{2}\left(\frac{|t-t'|}{\theta_1}\right)^2\right)\]
Draws from this kernel are smooth functions.
AutoGP.GP.GammaExponential — Type
GammaExponential(lengthscale, gamma[, amplitude=1])
Gamma Exponential covariance kernel.
\[k(t,t') = \theta_3 \exp(-(|t-t'|/\theta_1)^{\theta_2})\]
Requires 0 < gamma <= 2. Recovers the SquaredExponential kernel when gamma = 2.
AutoGP.GP.Periodic — Type
Periodic(lengthscale, period[, amplitude=1])
Periodic covariance kernel.
\[k(t,t') = \theta_3 \exp\left( (-2/\theta_1^2) \sin^2((\pi/\theta_2) |t-t'|) \right)\]
The lengthscale determines how smooth the periodic function is within each period. Heuristically, the periodic kernel can be understood as:
- Sampling $[X(t), t \in [0,p]] \sim \mathrm{GP}(0, \mathrm{SE}(\theta_1))$.
- Repeating this fragment for all intervals $[jp, (j+1)p], j \in \mathbb{Z}$.
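This heuristic can be visualized by drawing prior samples; a minimal sketch combining compute_cov_matrix_vectorized with a standard multivariate normal (the small noise value is an arbitrary jitter for numerical stability):

import Distributions
import LinearAlgebra
using AutoGP.GP: Periodic, compute_cov_matrix_vectorized

k = Periodic(0.5, 2.0)                     # lengthscale = 0.5, period = 2
ts = collect(0.0:0.05:6.0)
K = compute_cov_matrix_vectorized(k, 1e-4, ts)
dist = Distributions.MvNormal(zeros(length(ts)), LinearAlgebra.Symmetric(K))
x = rand(dist)                             # one draw; repeats with period ≈ 2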
Composite Kernels
AutoGP.GP.Times — Type
Times(left::Node, right::Node)
Base.:*(left::Node, right::Node)
Covariance kernel obtained by multiplying two covariance kernels pointwise.
\[k(t,t') = k_{\rm left}(t,t') \times k_{\rm right}(t,t')\]
AutoGP.GP.Plus — Type
Plus(left::Node, right::Node)
Base.:+(left::Node, right::Node)
Covariance kernel obtained by summing two covariance kernels pointwise.
\[k(t,t') = k_{\rm left}(t,t') + k_{\rm right}(t,t')\]
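A short sketch of building composite kernels with the overloaded operators (constructors as in the split_kernel_sop examples above) and checking the pointwise identities:

using AutoGP.GP: Linear, Periodic, Constant, eval_cov

l = Linear(1); p = Periodic(1, 1); c = Constant(1)
k1 = l * p    # Times(l, p): periodic variation with linearly varying amplitude
k2 = l + c    # Plus(l, c): linear trend plus constant offset
@assert eval_cov(k1, 0.3, 0.8) ≈ eval_cov(l, 0.3, 0.8) * eval_cov(p, 0.3, 0.8)
@assert eval_cov(k2, 0.3, 0.8) ≈ eval_cov(l, 0.3, 0.8) + eval_cov(c, 0.3, 0.8)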
AutoGP.GP.ChangePoint — Type
ChangePoint(left::Node, right::Node, location::Real, scale::Real)
Covariance kernel obtained by switching between two kernels at location.
\[\begin{aligned} k(t,t') &= [\sigma_1 \cdot k_{\rm left}(t, t') \cdot \sigma_2] + [(1 - \sigma_1) \cdot k_{\rm right}(t, t') \cdot (1-\sigma_2)] \\ \mathrm{where}\, \sigma_1 &= (1 + \tanh((\theta_1 - t) / \theta_2))/2, \\ \sigma_2 &= (1 + \tanh((\theta_1 - t') / \theta_2))/2. \end{aligned}\]
The location parameter $\theta_1$ denotes the time point at which the change occurs. The scale parameter $\theta_2$ is a nonnegative number that controls the rate of change; its behavior can be understood by analyzing the two extreme values:
- If scale=0, then $k_{\rm left}$ is active and $k_{\rm right}$ is inactive for all times less than location; $k_{\rm right}$ is active and $k_{\rm left}$ is inactive for all times greater than location; and $X[t] \perp X[t']$ for all $t$ and $t'$ on opposite sides of location.
- If scale=Inf, then $k_{\rm left}$ and $k_{\rm right}$ have equal effect at all time points, and $k(t,t') = \frac{1}{2}(k_{\rm left}(t,t') + k_{\rm right}(t,t'))$, which is equivalent to a Plus kernel scaled by a factor of $1/2$.
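A sketch of the sharp-transition regime (small scale), where points on opposite sides of the location are approximately uncorrelated:

using AutoGP.GP: Linear, Periodic, ChangePoint, eval_cov

# Switch from linear to periodic behavior near t = 5, with a sharp transition.
k = ChangePoint(Linear(1), Periodic(1, 1), 5.0, 0.01)
eval_cov(k, 4.0, 6.0)    # ≈ 0: opposite sides of the changepoint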
Prediction Utilities
Distributions.MvNormal — Type
dist = Distributions.MvNormal(
node::Node,
noise::Float64,
ts::Vector{Float64},
xs::Vector{Float64},
ts_pred::Vector{Float64};
noise_pred::Union{Nothing,Float64}=nothing)
Return the MvNormal posterior predictive distribution over xs_pred at time indexes ts_pred, given noisy observations [ts, xs] and covariance function node with the given level of observation noise. The model is
\[\begin{aligned} \begin{bmatrix} X(\mathbf{t})\\ X(\mathbf{t}^*) \end{bmatrix} \sim \mathrm{MultivariateNormal} \left( \mathbf{0}, \begin{bmatrix} k(\mathbf{t}, \mathbf{t}) + \eta I & k(\mathbf{t}, \mathbf{t}^*) \\ k(\mathbf{t}^*, \mathbf{t}) & k(\mathbf{t}^*, \mathbf{t}^*) + \eta^* I \end{bmatrix} \right). \end{aligned}\]
The function returns the conditional multivariate normal distribution
\[X(\mathbf{t}^*) \mid X(\mathbf{t}) = x(\mathbf{t}).\]
By default, the observation noise (noise_pred) of the new data is equal to the noise of the observed data; use noise_pred = 0. to obtain the predictive distribution over noiseless future values.
See also
- To compute log probabilities, see Distributions.logpdf.
- To generate samples, see Base.rand.
- To compute quantiles, see Distributions.quantile.
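A usage sketch of the posterior predictive workflow (data values are illustrative only):

import Distributions
using AutoGP.GP: Linear

ts = [0.0, 1.0, 2.0, 3.0]                # observed time indexes
xs = [0.1, 0.9, 2.2, 2.9]                # noisy observations
ts_pred = [4.0, 5.0]                     # test time indexes

# Posterior predictive over noiseless future values (noise_pred = 0.).
dist = Distributions.MvNormal(Linear(1), 0.1, ts, xs, ts_pred; noise_pred=0.)

Distributions.mean(dist)                 # posterior means at ts_pred
Distributions.logpdf(dist, [3.8, 5.1])   # log probability of candidate values
rand(dist)                               # a sample from the posterior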
Statistics.quantile — Function
Distributions.quantile(dist::Distributions.MvNormal, p)
Compute quantiles of marginal distributions of dist.
Examples
Distributions.quantile(Distributions.MvNormal([0,1,2,3], LinearAlgebra.I(4)), .5)
Distributions.quantile(Distributions.MvNormal([0,1,2,3], LinearAlgebra.I(4)), [[.1, .5, .9]])
AutoGP.GP.infer_gp_sum — Function
(mvn, indexes) = infer_gp_sum(
nodes::Vector{Node},
noise::Float64,
ts::Vector{Float64},
xs::Vector{Float64},
ts_pred::Vector{Float64};
noise_pred::Union{Nothing,Float64}=nothing)
Consider a family of $m$ independent Gaussian processes
\[\begin{aligned} F_i \sim \mathrm{GP}(\mathbf{0}, k_i) && (1 \le i \le m). \end{aligned}\]
and let $\varepsilon(t)$ be an i.i.d. noise process with variance $\eta$.
Suppose we can observe the noisy sum of these latent GP functions:
\[\begin{aligned} X(t) = \sum_{i=1}^{m} F_i(t) + \varepsilon(t) && X \sim \mathrm{GP}(0, k_1 + \dots + k_m + \eta). \end{aligned}\]
Given observed data $(\mathbf{t}, x(\mathbf{t}))$ and test points $\mathbf{t}^*$, this function computes the joint multivariate normal posterior over all unknown components
\[\left[F_1(\mathbf{t}^*), \dots, F_m(\mathbf{t}^*), X(\mathbf{t}^*) \right] \mid X(\mathbf{t})=x(\mathbf{t}).\]
Inference is performed on the joint multivariate Gaussian system, which has the block structure
\[\begin{aligned} Z \coloneqq \begin{bmatrix} F_1(\mathbf{t}^*) \\ \vdots \\ F_m(\mathbf{t}^*) \\ X(\mathbf{t}^*) \\ X(\mathbf{t}) \end{bmatrix}, && Z \sim \mathrm{MultivariateNormal} \left( \mathbf{0}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right), \end{aligned}\]
where
\[\begin{aligned} \Sigma_{aa} &\coloneqq \begin{bmatrix} \mathrm{blkdiag}\left( k_1(\mathbf{t}^*,\mathbf{t}^*), \dots, k_m(\mathbf{t}^*,\mathbf{t}^*) \right) & \begin{matrix} k_1(\mathbf{t}^*,\mathbf{t}^*) \\ \vdots \\ k_m(\mathbf{t}^*,\mathbf{t}^*) \end{matrix} \\ \begin{matrix} k_1(\mathbf{t}^*,\mathbf{t}^*) & \dots & k_m(\mathbf{t}^*,\mathbf{t}^*) \end{matrix} & s(\mathbf{t}^*, \mathbf{t}^*) + \eta^*I \end{bmatrix}, \\[15pt] \Sigma_{ab} &\coloneqq \begin{bmatrix} k_1(\mathbf{t}^*, \mathbf{t}) \\ \vdots \\ k_m(\mathbf{t}^*, \mathbf{t}) \\ s(\mathbf{t}^*, \mathbf{t}) \end{bmatrix}, \quad \Sigma_{ba} \coloneqq \Sigma_{ab}^{\top}, \\[15pt] \Sigma_{bb} &\coloneqq s(\mathbf{t}, \mathbf{t}) + \eta I, \end{aligned}\]
with $s(t,u) \coloneqq k_1(t,u) + \dots + k_m(t,u)$.
The posterior is then
\[\begin{aligned} \mu_{a \mid b} &= \Sigma_{ab} \Sigma_{bb}^{-1} x(\mathbf{t}) \\ \Sigma_{a \mid b} &= \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} \end{aligned}\]
Here, nodes::Vector{Node} is the list of covariance kernels for the latent GPs; ts and xs are the observed data; noise is the observation noise; and ts_pred are the test indexes. By default, the observation noise (noise_pred) of the test data is equal to the noise of the observed data; use noise_pred = 0. to obtain the predictive distribution over noiseless future values.
The return value v is a named tuple where:
- v.mvn is an instance of MvNormal for the posterior predictive.
- v.indexes is a named tuple where v.indexes.F are the indexes in the covariance matrix for the latent functions at the test points, and v.indexes.X are the indexes in the covariance matrix for the observable function at the test points.
For predictions on in-sample time points ts, include all the requested time points in ts_pred.
See also
- To compute log probabilities, see Distributions.logpdf.
- To generate samples, see Base.rand.
- To compute quantiles, see Distributions.quantile.
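A usage sketch of decomposing observations into latent components; the per-component layout of v.indexes.F assumed in the comments is a guess and should be checked against the implementation:

import Distributions
using AutoGP.GP: Linear, Periodic, infer_gp_sum

ts = [0.0, 1.0, 2.0, 3.0]
xs = [0.1, 1.2, 1.9, 3.2]
ts_pred = [4.0, 5.0]

v = infer_gp_sum([Linear(1), Periodic(1, 1)], 0.1, ts, xs, ts_pred)
mu = Distributions.mean(v.mvn)
mu[v.indexes.F[1]]    # posterior mean of the Linear component at ts_pred (assumed layout)
mu[v.indexes.X]       # posterior mean of the noisy observable at ts_pred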
Prior Configuration
AutoGP.GP.GPConfig — Type
config = GPConfig(kwargs...)
Configuration of the prior distribution over Gaussian process kernels (instances of Node). The main kwargs (all optional) are:
- node_dist_leaf::Vector{Real}: Prior distribution over LeafNode kernels; default is uniform.
- node_dist_nocp::Vector{Real}: Prior distribution over BinaryOpNode kernels; only used if changepoints=false.
- node_dist_cp::Vector{Real}: Prior distribution over BinaryOpNode kernels; only used if changepoints=true.
- max_depth::Integer: Maximum depth of the covariance node; default is -1 for unbounded.
- changepoints::Bool: Whether to permit ChangePoint compositions; default is true.
- noise::Union{Nothing,Float64}: Whether to use a fixed observation noise; default is nothing to infer the noise automatically.
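A configuration sketch using the keywords above:

using AutoGP.GP: GPConfig

# A shallow, changepoint-free prior with fixed observation noise.
config = GPConfig(changepoints=false, max_depth=3, noise=0.05)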