Gaussian Process Library
This section describes a library for Gaussian process time series models. A technical overview of key concepts can be found in the following references.
Roberts S, Osborne M, Ebden M, Reece S, Gibson N, Aigrain S. 2013. Gaussian processes for time-series modelling. Phil Trans R Soc A 371: 20110550. http://dx.doi.org/10.1098/rsta.2011.0550
Rasmussen C, Williams C. 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA. http://gaussianprocess.org/gpml/chapters/
AutoGP.GP
— Module
Module for the Gaussian process modeling library.
Covariance Kernels
AutoGP.GP.Node
— Type
abstract type Node end
Abstract class for covariance kernels.
AutoGP.GP.LeafNode
— Type
abstract type LeafNode <: Node end
Abstract class for primitive covariance kernels.
AutoGP.GP.BinaryOpNode
— Type
abstract type BinaryOpNode <: Node end
Abstract class for composite covariance kernels.
AutoGP.GP.pretty
— Function
pretty(node::Node)
Return a pretty String representation of node.
Base.size
— Function
Base.size(node::Node)
Base.size(node::LeafNode) = 1
Base.size(a::LeafNode, b::Node) = size(a) + size(b)
Return the total number of subexpressions in a Node, as defined above.
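The counting can be illustrated with a minimal sketch in plain Julia. The `ToyNode` types below are hypothetical stand-ins, not the actual AutoGP definitions, and the sketch assumes each composite node also counts itself as a subexpression:

```julia
# Illustrative stand-in types (not the actual AutoGP definitions).
abstract type ToyNode end
struct ToyLeaf <: ToyNode end               # stands in for any primitive kernel
struct ToyBinOp <: ToyNode                  # stands in for any composite kernel
    left::ToyNode
    right::ToyNode
end

# A leaf counts as one subexpression; a composite node counts itself
# plus all subexpressions of its children.
treesize(::ToyLeaf) = 1
treesize(n::ToyBinOp) = 1 + treesize(n.left) + treesize(n.right)

k = ToyBinOp(ToyBinOp(ToyLeaf(), ToyLeaf()), ToyLeaf())  # e.g. (SE * PER) + LIN
treesize(k)   # 5
```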
AutoGP.GP.eval_cov
— Function
eval_cov(node::Node, t1::Real, t2::Real)
eval_cov(node::Node, ts::Vector{Float64})
Evaluate the covariance function node at the given time indexes. The first form returns a Real number and the second form returns a covariance Matrix.
AutoGP.GP.compute_cov_matrix
— Function
compute_cov_matrix(node::Node, noise, ts)
Non-vectorized implementation of compute_cov_matrix_vectorized.
AutoGP.GP.compute_cov_matrix_vectorized
— Function
compute_cov_matrix_vectorized(node::Node, noise, ts)
Compute the covariance matrix by evaluating node on all pairs of ts. The noise is added to the diagonal of the covariance matrix, which means that if ts[i] == ts[j], then X[ts[i]] and X[ts[j]] are i.i.d. samples of the true function value at ts[i] plus mean zero Gaussian noise.
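The diagonal-noise behavior can be sketched in plain Julia (this is an illustration of the semantics, not the library's implementation; the `se` kernel helper is hypothetical):

```julia
using LinearAlgebra

# Illustrative squared-exponential kernel function.
se(t1, t2; lengthscale=1.0, amplitude=1.0) =
    amplitude * exp(-0.5 * (abs(t1 - t2) / lengthscale)^2)

# Build a covariance matrix from kernel k, adding noise on the diagonal only.
function cov_matrix(k, noise, ts)
    K = [k(t1, t2) for t1 in ts, t2 in ts]
    return K + noise * I            # noise enters only on the diagonal
end

ts = [0.0, 0.5, 0.5, 1.0]           # note the repeated time index 0.5
K = cov_matrix(se, 0.1, ts)
# Because noise is on the diagonal only, K[2,3] < K[2,2]: the two
# measurements at t = 0.5 are i.i.d. noisy copies of the same function value.
```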
Primitive Kernels
Notation. In this section, generic parameters (e.g., $\theta$, $\theta_1$, $\theta_2$) are used to denote field names of the corresponding Julia structs, in the same order as they appear in the constructors.
AutoGP.GP.WhiteNoise
— Type
WhiteNoise(value)
White noise covariance kernel.
\[k(t, t') = \mathbf{I}[t = t'] \theta\]
The random variables $X[t]$ and $X[t']$ are perfectly correlated whenever $t = t'$ and independent otherwise. This kernel cannot be used to represent the joint distribution of multiple i.i.d. measurements of $X[t]$; instead, see compute_cov_matrix_vectorized.
AutoGP.GP.Constant
— Type
Constant(value)
Constant covariance kernel.
\[k(t,t') = \theta\]
Draws from this kernel are horizontal lines, where $\theta$ determines the variance of the constant value around the mean (typically zero).
AutoGP.GP.Linear
— Type
Linear(intercept[, bias=1, amplitude=1])
Linear covariance kernel.
\[k(t, t') = \theta_2 + \theta_3 (t - \theta_1)(t'-\theta_1)\]
Draws from this kernel are sloped lines in the 2D plane. The time intercept is $\theta_1$. The variance around the time intercept is $\theta_2$. The scale factor, which dictates the slope, is $\theta_3$.
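As a sketch of this formula (a hypothetical helper function, not the library API), evaluating at the time intercept reduces the covariance to the bias term:

```julia
# Linear kernel formula: theta1 = intercept, theta2 = bias, theta3 = amplitude.
lin(t1, t2; intercept=0.0, bias=1.0, amplitude=1.0) =
    bias + amplitude * (t1 - intercept) * (t2 - intercept)

# At t = intercept, the covariance with any other point is just the bias:
lin(2.0, 5.0; intercept=2.0, bias=1.5)   # 1.5
```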
AutoGP.GP.SquaredExponential
— Type
SquaredExponential(lengthscale[, amplitude=1])
Squared Exponential covariance kernel.
\[k(t,t') = \theta_2 \exp\left(-\tfrac{1}{2}\left(|t-t'|/\theta_1\right)^2 \right)\]
Draws from this kernel are smooth functions.
AutoGP.GP.GammaExponential
— Type
GammaExponential(lengthscale, gamma[, amplitude=1])
Gamma Exponential covariance kernel.
\[k(t,t') = \theta_3 \exp(-(|t-t'|/\theta_1)^{\theta_2})\]
Requires 0 < gamma <= 2. Recovers the SquaredExponential kernel when gamma = 2.
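Under the two formulas as written, the recovery at gamma = 2 holds up to a rescaling of the lengthscale by a factor of sqrt(2). A sketch with hypothetical helper functions (not the library API):

```julia
# Squared exponential and gamma exponential as functions of the lag r.
se(r; lengthscale=1.0, amplitude=1.0) =
    amplitude * exp(-0.5 * (r / lengthscale)^2)
gammaexp(r; lengthscale=1.0, gamma=1.0, amplitude=1.0) =
    amplitude * exp(-(r / lengthscale)^gamma)

# gamma = 2 with lengthscale l matches SE with lengthscale l / sqrt(2):
gammaexp(0.7; lengthscale=sqrt(2), gamma=2.0) ≈ se(0.7; lengthscale=1.0)
```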
AutoGP.GP.Periodic
— Type
Periodic(lengthscale, period[, amplitude=1])
Periodic covariance kernel.
\[k(t,t') = \theta_3 \exp\left( (-2/\theta_1^2) \sin^2((\pi/\theta_2) |t-t'|) \right)\]
The lengthscale determines how smooth the periodic function is within each period. Heuristically, the periodic kernel can be understood as:
- Sampling $[X(t), t \in [0,p]] \sim \mathrm{GP}(0, \mathrm{SE}(\theta_1))$.
- Repeating this fragment for all intervals $[jp, (j+1)p], j \in \mathbb{Z}$.
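The periodicity can be checked directly on the formula (a hypothetical helper, not the library API): shifting either argument by a whole period leaves the covariance unchanged.

```julia
# Periodic kernel formula: theta1 = lengthscale, theta2 = period (amplitude = 1).
per(t1, t2; lengthscale=1.0, period=1.0) =
    exp(-(2 / lengthscale^2) * sin(pi * abs(t1 - t2) / period)^2)

per(0.3, 0.3)                   # 1.0 at zero lag
per(0.3, 0.9) ≈ per(0.3, 1.9)   # shifting by one full period changes nothing
```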
Composite Kernels
AutoGP.GP.Times
— Type
Times(left::Node, right::Node)
Base.:*(left::Node, right::Node)
Covariance kernel obtained by multiplying two covariance kernels pointwise.
\[k(t,t') = k_{\rm left}(t,t') \times k_{\rm right}(t,t')\]
AutoGP.GP.Plus
— Type
Plus(left::Node, right::Node)
Base.:+(left::Node, right::Node)
Covariance kernel obtained by summing two covariance kernels pointwise.
\[k(t,t') = k_{\rm left}(t,t') + k_{\rm right}(t,t')\]
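These compositions can be sketched on plain kernel functions (illustrative helpers, not the library's Node types):

```julia
# Base kernels as plain functions of (t, t').
se(t1, t2)  = exp(-0.5 * (t1 - t2)^2)
lin(t1, t2) = 1.0 + t1 * t2

# Pointwise sum and product, mirroring Plus and Times.
plus(k1, k2)  = (t1, t2) -> k1(t1, t2) + k2(t1, t2)
times(k1, k2) = (t1, t2) -> k1(t1, t2) * k2(t1, t2)

k = plus(times(se, lin), se)   # (SE * LIN) + SE
k(0.0, 1.0)                    # == 2 * exp(-0.5)
```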
AutoGP.GP.ChangePoint
— Type
ChangePoint(left::Node, right::Node, location::Real, scale::Real)
Covariance kernel obtained by switching between two kernels at location.
\[\begin{aligned} k(t,t') &= [\sigma_1 \cdot k_{\rm left}(t, t') \cdot \sigma_2] + [(1 - \sigma_1) \cdot k_{\rm right}(t, t') \cdot (1-\sigma_2)] \\ \mathrm{where}\, \sigma_1 &= (1 + \tanh((t - \theta_1) / \theta_2))/2, \\ \sigma_2 &= (1 + \tanh((t' - \theta_1) / \theta_2))/2. \end{aligned}\]
The location parameter $\theta_1$ denotes the time point at which the change occurs. The scale parameter $\theta_2$ is a nonnegative number that controls the rate of change; its behavior can be understood by analyzing the two extreme values:
- If scale=0 then $k_{\rm left}$ is active and $k_{\rm right}$ is inactive for all times less than location; $k_{\rm right}$ is active and $k_{\rm left}$ is inactive for all times greater than location; and $X[t] \perp X[t']$ for all $t$ and $t'$ on opposite sides of location.
- If scale=Inf then $k_{\rm left}$ and $k_{\rm right}$ have equal effect at all time points, and $k(t,t') = \tfrac{1}{4} (k_{\rm left}(t,t') + k_{\rm right}(t,t'))$, which is equivalent to a Plus kernel scaled by a factor of $1/4$.
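A minimal sketch of the mixing equation in plain Julia (not the library's implementation), showing that a near-zero scale produces a hard switch with independence across the changepoint:

```julia
# Sigmoid weights from the ChangePoint equation (illustrative helper).
sigma(t; location=0.0, scale=1.0) = (1 + tanh((t - location) / scale)) / 2

# Mix two kernel functions according to the equation above.
function changepoint(kl, kr, location, scale)
    return (t1, t2) -> begin
        s1 = sigma(t1; location, scale)
        s2 = sigma(t2; location, scale)
        s1 * kl(t1, t2) * s2 + (1 - s1) * kr(t1, t2) * (1 - s2)
    end
end

kl(t1, t2) = 1.0   # constant kernel with variance 1
kr(t1, t2) = 2.0   # constant kernel with variance 2
k_sharp = changepoint(kl, kr, 0.0, 1e-9)   # scale ~ 0: hard switch

k_sharp(-1.0, 1.0)    # ~ 0: points on opposite sides are independent
k_sharp(-1.0, -1.0)   # nonzero: points on the same side stay correlated
```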
Prediction Utilities
Distributions.MvNormal
— Type
dist = Distributions.MvNormal(
    node::Node,
    noise::Float64,
    ts::Vector{Float64},
    xs::Vector{Float64},
    ts_pred::Vector{Float64};
    noise_pred::Union{Nothing,Float64}=nothing)
Return the MvNormal posterior predictive distribution over xs_pred at time indexes ts_pred, given noisy observations [ts, xs] and covariance function node with the given level of observation noise.
By default, the observation noise (noise_pred) of the new data is equal to the noise of the observed data; use noise_pred = 0.0 to obtain the predictive distribution over noiseless future values.
See also:
- To compute log probabilities, Distributions.logpdf
- To generate samples, Base.rand
- To compute quantiles, Distributions.quantile
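The underlying computation can be sketched with the standard Gaussian conditioning formulas in plain Julia (a self-contained illustration with a hypothetical se kernel function, not the library's internals):

```julia
using LinearAlgebra

# Illustrative squared-exponential kernel function.
se(t1, t2) = exp(-0.5 * (t1 - t2)^2)

# Posterior predictive mean and covariance of a zero-mean GP, conditioned
# on noisy observations (xs at ts), evaluated at ts_pred.
function posterior(k, noise, ts, xs, ts_pred; noise_pred=noise)
    K   = [k(a, b) for a in ts, b in ts] + noise * I
    Ks  = [k(a, b) for a in ts_pred, b in ts]
    Kss = [k(a, b) for a in ts_pred, b in ts_pred] + noise_pred * I
    mean = Ks * (K \ xs)
    cov  = Kss - Ks * (K \ Ks')
    return mean, Symmetric(cov)
end

ts, xs = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
mu, Sigma = posterior(se, 0.1, ts, xs, [0.5, 1.5])
# mu interpolates between the observations; passing noise_pred = 0.0 would
# give the predictive distribution over noiseless function values instead.
```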
Statistics.quantile
— Function
Distributions.quantile(dist::Distributions.MvNormal, p)
Compute quantiles of the marginal distributions of dist.
Examples
Distributions.quantile(Distributions.MvNormal([0,1,2,3], LinearAlgebra.I(4)), .5)
Distributions.quantile(Distributions.MvNormal([0,1,2,3], LinearAlgebra.I(4)), [[.1, .5, .9]])
Prior Configuration
AutoGP.GP.GPConfig
— Type
config = GPConfig(kwargs...)
Configuration of the prior distribution over Gaussian process kernels (instances of Node). The main kwargs (all optional) are:
- node_dist_leaf::Vector{Real}: Prior distribution over LeafNode kernels; default is uniform.
- node_dist_nocp::Vector{Real}: Prior distribution over BinaryOpNode kernels; only used if changepoints=false.
- node_dist_cp::Vector{Real}: Prior distribution over BinaryOpNode kernels; only used if changepoints=true.
- max_depth::Integer: Maximum depth of the covariance node; default is -1 for unbounded.
- changepoints::Bool: Whether to permit ChangePoint compositions; default is true.
- noise::Union{Nothing,Float64}: Fixed observation noise; default is nothing to infer it automatically.
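A usage sketch based on the kwargs documented above (assuming AutoGP is installed; the specific values are illustrative, not recommendations):

```julia
using AutoGP

# Restrict the kernel prior: bounded expression depth, no changepoint
# compositions, and observation noise inferred automatically (the default).
config = AutoGP.GP.GPConfig(
    max_depth=4,
    changepoints=false,
    noise=nothing,
)
```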