machine-learning kernel-density probability-density

Calculate Bias of Parzen Windows analytically

I'm still having some trouble understanding what Bias and Variance for a specific estimator actually are.

I'm working with the definition of Bias as it is found on Wikipedia:

Bias(theta_hat) = E[theta_hat] - theta

The kernel density estimate, as defined on Wikipedia, is

f_hat(x) = 1/(n h) * sum_{i=1..n} K((x - x_i) / h)

But how can I apply the bias definition to kernel density estimation, or, more precisely, to Parzen windows? Can someone at least give me an idea of how the estimated density f_hat(x) relates to bias (and variance)?

Qualitatively, I can already tell that a box window containing the whole data space will have maximum bias and no variance, as the estimated density will simply be an average over the whole training data set.

Solution

I think I just figured it out myself. The parameter theta in the case of density estimation is ... drumroll ... the density function f(x) itself, evaluated at a fixed point x. So the bias at x is defined as

Bias = E[f_hat(x)] - f(x)

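The E[f_hat(x)] term is the expected value of the estimate at x, taken over all training samples that could be drawn from f. For a Parzen window with window function K and width h, it is the window function averaged against the true density, E[f_hat(x)] = integral of (1/h) K((x - y) / h) f(y) dy, so calculating it involves a single integral.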
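
To spell that integral out, here is a sketch of the derivation. It assumes the samples x_1, ..., x_n are i.i.d. draws from f, and the last line additionally assumes that K is symmetric, integrates to 1, has a finite second moment, and that f is twice continuously differentiable at x. None of these assumptions are stated in the question, so treat the final approximation as the usual textbook result under those conditions rather than something that holds for every window.

```latex
\begin{align*}
\mathbb{E}\big[\hat f_h(x)\big]
  &= \frac{1}{nh}\sum_{i=1}^{n} \mathbb{E}\!\left[K\!\left(\frac{x - x_i}{h}\right)\right]
   = \frac{1}{h}\int K\!\left(\frac{x - y}{h}\right) f(y)\,dy \\
  &= \int K(u)\, f(x - hu)\,du
   \qquad \text{(substituting } u = (x - y)/h\text{)} \\[4pt]
\operatorname{Bias}\big[\hat f_h(x)\big]
  &= \int K(u)\,\big[f(x - hu) - f(x)\big]\,du \\
  &\approx \frac{h^{2}}{2}\, f''(x) \int u^{2} K(u)\,du
   \qquad \text{(Taylor expansion for small } h\text{)}
\end{align*}
```

Note that n drops out: the bias depends on the window width h and the curvature of f, not on the sample size.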

f(x) is the true density function of the data, which in reality is likely to be unknown.
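
Because f(x) is unknown in practice, the bias can only be checked on a toy problem where f is chosen by hand. The following is a minimal sketch under that assumption: the data are standard normal, the window is a box of width h, and all function names (`box_kde`, `analytic_bias`, `monte_carlo_bias`) are made up for this illustration. For a box window, E[f_hat(x)] is P(|X - x| <= h/2) / h, which for a normal density can be written with the normal CDF.

```python
import math
import random


def box_kde(x, sample, h):
    """Parzen-window estimate at x using a box window of width h."""
    inside = sum(1 for xi in sample if abs(x - xi) <= h / 2)
    return inside / (len(sample) * h)


def normal_pdf(x):
    """True density f(x): standard normal (an assumption of this toy setup)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def normal_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def analytic_bias(x, h):
    """Bias = E[f_hat(x)] - f(x), with E[f_hat(x)] = P(|X - x| <= h/2) / h."""
    expected = (normal_cdf(x + h / 2) - normal_cdf(x - h / 2)) / h
    return expected - normal_pdf(x)


def monte_carlo_bias(x, h, n=200, repeats=2000, seed=0):
    """Approximate the bias by averaging f_hat(x) over many fresh samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repeats):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += box_kde(x, sample, h)
    return total / repeats - normal_pdf(x)


if __name__ == "__main__":
    # Wider windows smooth more, so the bias at the peak grows in magnitude.
    for h in (0.2, 1.0, 3.0):
        print(h, analytic_bias(0.0, h), monte_carlo_bias(0.0, h))
```

The simulated and analytic values should agree up to Monte Carlo noise, and the bias at x = 0 is negative because a wide box averages the peak of the density with the lower values around it.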