Nonparametric Regression
Published:
This post summarizes nonparametric regression.
Non-parametric Regression
Assume that we have data $\{(X_1,Y_1),\ldots,(X_n,Y_n)\}$ and our goal is to estimate the regression function $$r(x)=\mathbb{E}[Y\mid X=x].$$ Recall that we can estimate the cdf without distributional assumption by Glivenko-Cantelli Theorem; however, we need smoothness assumption on $r(x)$ for non-parametric regression.
For any estimator $\hat{r}$, we define the integrated squared loss by $$L(\hat{r},r)=\int_{\mathcal{X}}[\hat{r}(x)-r(x)]^2\mathrm{d}x,$$ which depends on both $(X_i,y_i)$'s. The risk is thus given by $$R(\hat{r},r)=\mathbb{E}[L(\hat{r},r)]=\int_{\mathcal{X}}b^2(x)\mathrm{d}x+\int_{\mathcal{X}}v(x)\mathrm{d}x,$$ by switching the integral and expectation, where $$\begin{align*} b(x) &= \mathbb{E}[\hat{r}(x)] - r(x)\\ v(x) &= \mathbb{E}[(\hat{r}(x) - r(x))^2]. \end{align*}$$
Suppose we knew the joint distribution over $(X,Y)$. Then an alternative measurement of risk can be the prediction error $$R(\hat{r})=\mathbb{E}[(Y-\hat{r}(X))^2],$$ which is minimized by $r(x)=\mathbb{E}[Y\mid X=x].$
Kernel Smoothing
We use the first risk measurement for the following analysis. We assume that there is only one covariate, while it can be readily generalized to the multivariate case. The estimator takes the form $$\hat{r}(x)=\sum\limits_{i=1}^nw_i(x)Y_i,$$ where the weights are defined by$$w_i(x)=\frac{K\left(\frac{x-X_i}{h}\right)}{\sum\limits_{j=1}^nK\left(\frac{x-X_j}{h}\right)},$$ $h$ is the bandwidth controlling the amount of smoothing, and $K$ is the kernel.