I can't understand how computing the kernel $K(x, z)$ takes only linear time even though it corresponds to an $O(n^d)$-dimensional feature space. Could someone please explain? I'm a newbie in ML.
In order to compute $K(x, z)$, you need:

- $O(n)$ multiplications $x_1 z_1, x_2 z_2, \ldots, x_n z_n$,
- $O(n)$ additions $x_1 z_1 + x_2 z_2 + \cdots + x_n z_n$,
- two $O(1)$ operations: adding $c$ and raising to the power $d$.

Thus, computing $K(x, z) = (\langle x, z \rangle + c)^d$ takes $O(n)$ time.
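As a minimal sketch of that computation in Python (the function name `poly_kernel` and the NumPy dependency are my own choices, not from the answer):

```python
import numpy as np

def poly_kernel(x, z, c=1.0, d=3):
    """Polynomial kernel K(x, z) = (<x, z> + c)**d.

    One O(n) dot product, then O(1) work: add c, raise to the power d.
    """
    return (np.dot(x, z) + c) ** d
```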
It's completely normal for the feature space to have a much higher dimension than the kernel's evaluation cost suggests: otherwise we wouldn't need kernels in the first place, because we could just compute the feature vectors directly.
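To make that contrast concrete, here is an illustrative sketch (again my own, with a hypothetical helper `phi`) for the degree-2 homogeneous kernel $K(x, z) = \langle x, z \rangle^2$, whose explicit feature map consists of all $n^2$ pairwise products. Both routes give the same number, but one costs $O(n^2)$ and the other $O(n)$:

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = (x . z)**2:
    # all n**2 pairwise products x_i * x_j, flattened into one vector.
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)

explicit = np.dot(phi(x), phi(z))  # dot product in the n**2-dimensional space
shortcut = np.dot(x, z) ** 2       # same number, computed in O(n) time
assert np.isclose(explicit, shortcut)
```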
If you want a more extreme example, take a look at $K(x, y) = \min(x, y)$ on non-negative real numbers $x, y$. It takes constant time to evaluate $\min(x, y)$. However, the feature space is $L^2(\mathbb{R})$ (square-integrable functions on the real line, with the standard Hilbert-space scalar product), and the feature mapping is $\varphi(x) = \chi_{[0, x]}$, where $\chi_{[0, x]}$ denotes the characteristic function of the interval $[0, x]$. Thus, the feature space is infinite-dimensional, but the time it takes to evaluate the kernel is constant.
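For completeness, here is a one-line check (my own derivation, using only the definitions above) that this feature map really reproduces the kernel: the two characteristic functions overlap exactly on $[0, \min(x, y)]$, so

```latex
\langle \varphi(x), \varphi(y) \rangle
  = \int_{\mathbb{R}} \chi_{[0,x]}(t)\,\chi_{[0,y]}(t)\,dt
  = \int_{0}^{\min(x,y)} 1 \, dt
  = \min(x, y).
```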