Gaussian Process Library
This section describes a library for Gaussian process time series models. A technical overview of key concepts can be found in the following references.
Roberts S, Osborne M, Ebden M, Reece S, Gibson N, Aigrain S. 2013. Gaussian processes for time-series modelling. Phil Trans R Soc A 371: 20110550. http://dx.doi.org/10.1098/rsta.2011.0550
Rasmussen C, Williams C. 2006. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA. http://gaussianprocess.org/gpml/chapters/
Covariance Kernels
AutoGP.GP.Node — Type
abstract type Node end
Abstract class for covariance kernel nodes.
AutoGP.GP.LeafNode — Type
abstract type LeafNode <: Node end
Abstract class for primitive covariance kernels (see Primitive Kernels).
AutoGP.GP.BinaryOpNode — Type
abstract type BinaryOpNode <: Node end
Abstract class for composite covariance kernels (see Composite Kernels).
AutoGP.GP.pretty — Function
AutoGP.GP.eval_cov — Function
eval_cov(node::Node, t1::Real, t2::Real)
eval_cov(node::Node, ts::Vector{Float64})
Evaluate the covariance function node at the given time indexes. The first form returns a Real number and the second form returns a covariance Matrix.
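A minimal usage sketch of the two forms, assuming the kernel constructors described under Primitive Kernels below are imported from AutoGP.GP:

using AutoGP.GP: Periodic, eval_cov

k = Periodic(1, 1)                  # lengthscale = 1, period = 1
c = eval_cov(k, 0.0, 0.5)           # scalar covariance between two time points
ts = collect(0.0:0.25:1.0)
K = eval_cov(k, ts)                 # 5×5 covariance Matrix over the grid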
AutoGP.GP.compute_cov_matrix — Function
compute_cov_matrix(node::Node, noise, ts)
Non-vectorized implementation of compute_cov_matrix_vectorized.
AutoGP.GP.compute_cov_matrix_vectorized — Function
compute_cov_matrix_vectorized(node::Node, noise, ts)
Compute the covariance matrix by evaluating node on all pairs of ts. The noise is added to the diagonal of the covariance matrix, which means that if ts[i] == ts[j] for i != j, then X[ts[i]] and X[ts[j]] are i.i.d. samples of the true function value at ts[i] plus mean-zero Gaussian noise.
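A short sketch illustrating the treatment of repeated time indexes (values are illustrative):

using AutoGP.GP: Periodic, compute_cov_matrix_vectorized

k = Periodic(1, 1)
ts = [0.0, 0.5, 0.5]    # note the repeated time index
# noise = 0.1 is added only to the diagonal, so the two entries at t = 0.5
# model i.i.d. noisy measurements of the same underlying function value.
K = compute_cov_matrix_vectorized(k, 0.1, ts)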
AutoGP.GP.extract_kernel — Function
extract_kernel(node::Node, ::Type{T}; retain::Bool=true) where T<:LeafNode
Retain only those primitive kernels in node of type T <: LeafNode, by replacing all other primitive kernels with an appropriate dummy kernel.
- If no primitive kernels in node are of type T, the return value is Constant(0).
- If retain=false, the behavior is flipped: the primitive kernels of type T are removed, while the others are retained.
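A hedged usage sketch; the docstring above specifies the behavior, but the exact dummy kernels appearing in the returned expression are an implementation detail not shown here:

using AutoGP.GP: Linear, Periodic, extract_kernel

l = Linear(1); p = Periodic(1, 1)
k1 = extract_kernel(l * p + p, Linear)                 # retains the Linear factor; Periodic parts become dummies
k2 = extract_kernel(l * p + p, Linear; retain=false)   # flipped: Linear parts removed, Periodic retained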
AutoGP.GP.split_kernel_sop — Function
split_kernel_sop(node::Node, ::Type{T}) where T<:LeafNode
Splits the kernel $k$ denoted by node according to a sum-of-products interpretation. In particular, write
\[k = k_{11}k_{12}\cdots k_{1n_1} + k_{21}k_{22}\cdots k_{2n_2} + \dots + k_{m1}k_{m2}\cdots k_{m n_m}.\]
For a given primitive base kernel type T we can rewrite the above expression as
\[k = k^{\rm T} + k^{\rm nT},\]
where $k^{\rm T}$ contains all addends with a factor of type T, and $k^{\rm nT}$ are the addends without a factor of type T.
The function returns a pair (node_a, node_b) corresponding to $k^{\rm T}$ and $k^{\rm nT}$ above, with Constant(0) serving as the sentinel value.
Examples
julia> l = Linear(1); p = Periodic(1,1); c = Constant(1)
julia> split_kernel_sop(l, Linear)
(l, Constant(0))
julia> split_kernel_sop(l, Periodic)
(Constant(0), l)
julia> split_kernel_sop(l*p + l*c, Periodic)
(l*p, l*c)
julia> split_kernel_sop(p*p, Periodic)
(p*p, Constant(0))
AutoGP.GP.reparameterize — Function
reparameterize(node::Node, t::LinearTransform)
Reparameterize the covariance kernel according to the given LinearTransform applied to the input (known as an "input warping"). For a kernel $k(\cdot,\cdot; \theta)$ and a linear transform $f(t) = at+b$ over the time domain, this function returns a kernel with new parameters $\theta'$ such that $k(at+b, au+b; \theta) = k(t, u; \theta')$.
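For example, applying the input warping $f(t) = at + b$ (with $a > 0$) to the SquaredExponential kernel defined under Primitive Kernels below gives

\[k(at+b, au+b; (\theta_1, \theta_2)) = \theta_2 \exp\left(-\frac{1}{2}\left(\frac{a|t-u|}{\theta_1}\right)^2\right) = k(t, u; (\theta_1/a,\, \theta_2)),\]

so the reparameterized kernel has lengthscale $\theta_1' = \theta_1/a$ and an unchanged amplitude.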
AutoGP.GP.rescale — Function
rescale(node::Node, t::LinearTransform)
Rescale the covariance kernel according to the given LinearTransform applied to the output. In particular, for a GP $X \sim \mathrm{GP}(0, k(\cdot,\cdot; \theta))$ and a transformation $Y = aX + b$, this function returns a kernel with new parameters $\theta'$ such that $Y \sim \mathrm{GP}(b, k(\cdot,\cdot; \theta'))$.
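For example, if $k$ has an amplitude parameter $\theta$ that scales the kernel multiplicatively (as in SquaredExponential below), then $\mathrm{Cov}(aX(t) + b, aX(u) + b) = a^2 k(t, u)$, so the rescaled kernel has amplitude $\theta' = a^2 \theta$ with all other parameters unchanged.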
Primitive Kernels
Notation. In this section, generic parameters (e.g., $\theta$, $\theta_1$, $\theta_2$) are used to denote the fieldnames of the corresponding Julia structs, in the same order as they appear in the constructors.
AutoGP.GP.WhiteNoise — Type
WhiteNoise(value)
White noise covariance kernel.
\[k(t, t') = \mathbf{I}[t = t'] \theta\]
The random variables $X[t]$ and $X[t']$ are perfectly correlated whenever $t = t'$ and independent otherwise. This kernel cannot be used to represent the joint distribution of multiple i.i.d. measurements of $X[t]$; see compute_cov_matrix_vectorized instead.
AutoGP.GP.Constant — Type
Constant(value)
Constant covariance kernel.
\[k(t,t') = \theta\]
Draws from this kernel are horizontal lines, where $\theta$ determines the variance of the constant value around the mean (typically zero).
AutoGP.GP.Linear — Type
Linear(intercept[, bias=1, amplitude=1])
Linear covariance kernel.
\[k(t, t') = \theta_2 + \theta_3 (t - \theta_1)(t'-\theta_1)\]
Draws from this kernel are sloped lines in the 2D plane. The time intercept is $\theta_1$. The variance around the time intercept is $\theta_2$. The scale factor, which dictates the slope, is $\theta_3$.
AutoGP.GP.SquaredExponential — Type
SquaredExponential(lengthscale[, amplitude=1])
Squared Exponential covariance kernel.
\[k(t,t') = \theta_2 \exp\left(-\frac{1}{2}\left(\frac{|t-t'|}{\theta_1}\right)^2\right)\]
Draws from this kernel are smooth functions.
AutoGP.GP.GammaExponential — Type
GammaExponential(lengthscale, gamma[, amplitude=1])
Gamma Exponential covariance kernel.
\[k(t,t') = \theta_3 \exp(-(|t-t'|/\theta_1)^{\theta_2})\]
Requires 0 < gamma <= 2. Recovers the SquaredExponential kernel when gamma = 2.
AutoGP.GP.Periodic — Type
Periodic(lengthscale, period[, amplitude=1])
Periodic covariance kernel.
\[k(t,t') = \theta_3 \exp\left( (-2/\theta_1^2) \sin^2((\pi/\theta_2) |t-t'|) \right)\]
The lengthscale determines how smooth the periodic function is within each period. Heuristically, the periodic kernel can be understood as:
- Sampling $[X(t), t \in [0,p]] \sim \mathrm{GP}(0, \mathrm{SE}(\theta_1))$.
- Repeating this fragment for all intervals $[jp, (j+1)p], j \in \mathbb{Z}$.
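This heuristic can be visualized by drawing prior samples; a minimal sketch combining compute_cov_matrix_vectorized with a standard multivariate normal (the small noise value is an arbitrary jitter for numerical stability):

import Distributions
import LinearAlgebra
using AutoGP.GP: Periodic, compute_cov_matrix_vectorized

k = Periodic(0.5, 2.0)                     # lengthscale = 0.5, period = 2
ts = collect(0.0:0.05:6.0)
K = compute_cov_matrix_vectorized(k, 1e-4, ts)
dist = Distributions.MvNormal(zeros(length(ts)), LinearAlgebra.Symmetric(K))
x = rand(dist)                             # one draw; repeats with period ≈ 2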
Composite Kernels
AutoGP.GP.Times — Type
Times(left::Node, right::Node)
Base.:*(left::Node, right::Node)
Covariance kernel obtained by multiplying two covariance kernels pointwise.
\[k(t,t') = k_{\rm left}(t,t') \times k_{\rm right}(t,t')\]
AutoGP.GP.Plus — Type
Plus(left::Node, right::Node)
Base.:+(left::Node, right::Node)
Covariance kernel obtained by summing two covariance kernels pointwise.
\[k(t,t') = k_{\rm left}(t,t') + k_{\rm right}(t,t')\]
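A short sketch of building composite kernels with the overloaded operators (constructors as in the split_kernel_sop examples above) and checking the pointwise identities:

using AutoGP.GP: Linear, Periodic, Constant, eval_cov

l = Linear(1); p = Periodic(1, 1); c = Constant(1)
k1 = l * p    # Times(l, p): periodic variation with linearly varying amplitude
k2 = l + c    # Plus(l, c): linear trend plus constant offset
@assert eval_cov(k1, 0.3, 0.8) ≈ eval_cov(l, 0.3, 0.8) * eval_cov(p, 0.3, 0.8)
@assert eval_cov(k2, 0.3, 0.8) ≈ eval_cov(l, 0.3, 0.8) + eval_cov(c, 0.3, 0.8)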
AutoGP.GP.ChangePoint — Type
ChangePoint(left::Node, right::Node, location::Real, scale::Real)
Covariance kernel obtained by switching between two kernels at location.
\[\begin{aligned} k(t,t') &= [\sigma_1 \cdot k_{\rm left}(t, t') \cdot \sigma_2] + [(1 - \sigma_1) \cdot k_{\rm right}(t, t') \cdot (1-\sigma_2)] \\ \mathrm{where}\, \sigma_1 &= (1 + \tanh((\theta_1 - t) / \theta_2))/2, \\ \sigma_2 &= (1 + \tanh((\theta_1 - t') / \theta_2))/2. \end{aligned}\]
The location parameter $\theta_1$ denotes the time point at which the change occurs. The scale parameter $\theta_2$ is a nonnegative number that controls the rate of change; its behavior can be understood by analyzing the two extreme values:
- If scale=0, then $k_{\rm left}$ is active and $k_{\rm right}$ is inactive for all times less than location; $k_{\rm right}$ is active and $k_{\rm left}$ is inactive for all times greater than location; and $X[t] \perp X[t']$ for all $t$ and $t'$ on opposite sides of location.
- If scale=Inf, then $k_{\rm left}$ and $k_{\rm right}$ have equal effect at all time points, and $k(t,t') = \frac{1}{2}(k_{\rm left}(t,t') + k_{\rm right}(t,t'))$, which is equivalent to a Plus kernel scaled by a factor of $1/2$.
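A sketch of the sharp-transition regime (small scale), where points on opposite sides of the location are approximately uncorrelated:

using AutoGP.GP: Linear, Periodic, ChangePoint, eval_cov

# Switch from linear to periodic behavior near t = 5, with a sharp transition.
k = ChangePoint(Linear(1), Periodic(1, 1), 5.0, 0.01)
eval_cov(k, 4.0, 6.0)    # ≈ 0: opposite sides of the changepoint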
Prediction Utilities
Distributions.MvNormal — Type
dist = Distributions.MvNormal(
node::Node,
noise::Float64,
ts::Vector{Float64},
xs::Vector{Float64},
ts_pred::Vector{Float64};
noise_pred::Union{Nothing,Float64}=nothing)
Return the MvNormal posterior predictive distribution over xs_pred at time indexes ts_pred, given noisy observations [ts, xs] and covariance function node with the given level of observation noise. The model is
\[\begin{aligned} \begin{bmatrix} X(\mathbf{t})\\ X(\mathbf{t}^*) \end{bmatrix} \sim \mathrm{MultivariateNormal} \left( \mathbf{0}, \begin{bmatrix} k(\mathbf{t}, \mathbf{t}) + \eta I & k(\mathbf{t}, \mathbf{t}^*) \\ k(\mathbf{t}^*, \mathbf{t}) & k(\mathbf{t}^*, \mathbf{t}^*) + \eta^* I \end{bmatrix} \right). \end{aligned}\]
The function returns the conditional multivariate normal distribution
\[X(\mathbf{t}^*) \mid X(\mathbf{t}) = x(\mathbf{t}).\]
By default, the observation noise (noise_pred) of the new data is equal to the noise of the observed data; use noise_pred = 0. to obtain the predictive distribution over noiseless future values.
See also
- To compute log probabilities, see Distributions.logpdf.
- To generate samples, see Base.rand.
- To compute quantiles, see Distributions.quantile.
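A usage sketch of the posterior predictive workflow (data values are illustrative only):

import Distributions
using AutoGP.GP: Linear

ts = [0.0, 1.0, 2.0, 3.0]                # observed time indexes
xs = [0.1, 0.9, 2.2, 2.9]                # noisy observations
ts_pred = [4.0, 5.0]                     # test time indexes

# Posterior predictive over noiseless future values (noise_pred = 0.).
dist = Distributions.MvNormal(Linear(1), 0.1, ts, xs, ts_pred; noise_pred=0.)

Distributions.mean(dist)                 # posterior means at ts_pred
Distributions.logpdf(dist, [3.8, 5.1])   # log probability of candidate values
rand(dist)                               # a sample from the posterior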
Statistics.quantile — Function
Distributions.quantile(dist::Distributions.MvNormal, p)
Compute quantiles of marginal distributions of dist.
Examples
Distributions.quantile(Distributions.MvNormal([0,1,2,3], LinearAlgebra.I(4)), .5)
Distributions.quantile(Distributions.MvNormal([0,1,2,3], LinearAlgebra.I(4)), [[.1, .5, .9]])
AutoGP.GP.infer_gp_sum — Function
(mvn, indexes) = infer_gp_sum(
nodes::Vector{Node},
noise::Float64,
ts::Vector{Float64},
xs::Vector{Float64},
ts_pred::Vector{Float64};
noise_pred::Union{Nothing,Float64}=nothing)
Consider a family of $m$ independent Gaussian processes
\[\begin{aligned} F_i \sim \mathrm{GP}(\mathbf{0}, k_i) && (1 \le i \le m). \end{aligned}\]
and let $\varepsilon(t)$ be an i.i.d. noise process with variance $\eta$.
Suppose we can observe the noisy sum of these latent GP functions:
\[\begin{aligned} X(t) = \sum_{i=1}^{m} F_i(t) + \varepsilon(t) && X \sim \mathrm{GP}(0, k_1 + \dots + k_m + \eta). \end{aligned}\]
Given observed data $(\mathbf{t}, x(\mathbf{t}))$ and test points $\mathbf{t}^*$, this function computes the joint multivariate normal posterior over all unknown components
\[\left[F_1(\mathbf{t}^*), \dots, F_m(\mathbf{t}^*), X(\mathbf{t}^*) \right] \mid X(\mathbf{t})=x(\mathbf{t}).\]
Inference is performed on the joint multivariate Gaussian system, which has the block structure
\[\begin{aligned} Z \coloneqq \begin{bmatrix} F_1(\mathbf{t}^*) \\ \vdots \\ F_m(\mathbf{t}^*) \\ X(\mathbf{t}^*) \\ X(\mathbf{t}) \end{bmatrix}, && Z \sim \mathrm{MultivariateNormal} \left( \mathbf{0}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}\right), \end{aligned}\]
where
\[\begin{aligned} \Sigma_{aa} &\coloneqq \begin{bmatrix} \mathrm{blkdiag}\left( k_1(\mathbf{t}^*,\mathbf{t}^*), \dots, k_m(\mathbf{t}^*,\mathbf{t}^*) \right) & \begin{matrix} k_1(\mathbf{t}^*,\mathbf{t}^*) \\ \vdots \\ k_m(\mathbf{t}^*,\mathbf{t}^*) \end{matrix} \\ \begin{matrix} k_1(\mathbf{t}^*,\mathbf{t}^*) & \dots & k_m(\mathbf{t}^*,\mathbf{t}^*) \end{matrix} & s(\mathbf{t}^*, \mathbf{t}^*) + \eta^*I \end{bmatrix}, \\[15pt] \Sigma_{ab} &\coloneqq \begin{bmatrix} k_1(\mathbf{t}^*, \mathbf{t}) \\ \vdots \\ k_m(\mathbf{t}^*, \mathbf{t}) \\ s(\mathbf{t}^*, \mathbf{t}) \end{bmatrix}, \quad \Sigma_{ba} \coloneqq \Sigma_{ab}^{\top}, \\[15pt] \Sigma_{bb} &\coloneqq s(\mathbf{t}, \mathbf{t}) + \eta I, \end{aligned}\]
with $s(t,u) \coloneqq k_1(t,u) + \dots + k_m(t,u)$.
The posterior is then
\[\begin{aligned} \mu_{a \mid b} &= \Sigma_{ab} \Sigma_{bb}^{-1} x(\mathbf{t}) \\ \Sigma_{a \mid b} &= \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} \end{aligned}\]
Here, nodes::Vector{Node} is the list of covariance kernels for the latent GPs; ts and xs are the observed data; noise is the observation noise; and ts_pred are the test indexes. By default, the observation noise (noise_pred) of the test data is equal to the noise of the observed data; use noise_pred = 0. to obtain the predictive distribution over noiseless future values.
The return value v is a named tuple where:
- v.mvn is an instance of MvNormal for the posterior predictive.
- v.indexes is a named tuple where v.indexes.F are the indexes in the covariance matrix for the latent functions at the test points, and v.indexes.X are the indexes in the covariance matrix for the observable function at the test points.
For predictions on in-sample time points ts, include all the requested time points in ts_pred.
See also
- To compute log probabilities, see Distributions.logpdf.
- To generate samples, see Base.rand.
- To compute quantiles, see Distributions.quantile.
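A usage sketch of decomposing observations into latent components; the per-component layout of v.indexes.F assumed in the comments is a guess and should be checked against the implementation:

import Distributions
using AutoGP.GP: Linear, Periodic, infer_gp_sum

ts = [0.0, 1.0, 2.0, 3.0]
xs = [0.1, 1.2, 1.9, 3.2]
ts_pred = [4.0, 5.0]

v = infer_gp_sum([Linear(1), Periodic(1, 1)], 0.1, ts, xs, ts_pred)
mu = Distributions.mean(v.mvn)
mu[v.indexes.F[1]]    # posterior mean of the Linear component at ts_pred (assumed layout)
mu[v.indexes.X]       # posterior mean of the noisy observable at ts_pred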
Prior Configuration
AutoGP.GP.GPConfig — Type
config = GPConfig(kwargs...)
Configuration of the prior distribution over Gaussian process kernels (instances of Node). The main kwargs (all optional) are:
- node_dist_leaf::Vector{Real}: Prior distribution over LeafNode kernels; default is uniform.
- node_dist_nocp::Vector{Real}: Prior distribution over BinaryOpNode kernels; only used if changepoints=false.
- node_dist_cp::Vector{Real}: Prior distribution over BinaryOpNode kernels; only used if changepoints=true.
- max_depth::Integer: Maximum depth of the covariance node; default is -1 for unbounded.
- changepoints::Bool: Whether to permit ChangePoint compositions; default is true.
- noise::Union{Nothing,Float64}: Whether to use a fixed observation noise; default is nothing to infer the noise automatically.
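A configuration sketch using the keywords above:

using AutoGP.GP: GPConfig

# A shallow, changepoint-free prior with fixed observation noise.
config = GPConfig(changepoints=false, max_depth=3, noise=0.05)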