So, continuing with the beautiful plots in ISLR, here is a discussion I had on SO today about how to plot decision boundaries or arbitrary non-linear curves. The discussion on SE that was linked to in the answer was even more useful. Plus, I picked up two new functions today: curve and contour.
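As a rough illustration of the idea (the toy data, the quadratic logistic model, and all object names below are assumptions of mine, not the snippet from the SO answer), contour can draw a decision boundary by evaluating a fitted model over a grid, while curve handles any boundary that can be written as $y = f(x)$:

```r
# Sketch: draw a non-linear decision boundary by evaluating a fitted
# classifier on a grid and handing the result to contour()
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- as.integer(x1^2 + x2^2 + rnorm(n, sd = 0.5) > 1.5)
fit <- glm(y ~ poly(x1, 2) + poly(x2, 2), family = binomial)

# evaluate the fitted probability over a grid and draw the p = 0.5 contour
g1 <- seq(min(x1), max(x1), length.out = 100)
g2 <- seq(min(x2), max(x2), length.out = 100)
grid <- expand.grid(x1 = g1, x2 = g2)
p <- matrix(predict(fit, newdata = grid, type = "response"), nrow = length(g1))

plot(x1, x2, col = ifelse(y == 1, "red", "blue"), pch = 20)
contour(g1, g2, p, levels = 0.5, add = TRUE, lwd = 2)

# curve() is handy when the boundary can be written as y = f(x); here the true
# boundary is (roughly) the circle x1^2 + x2^2 = 1.5, so its upper half is:
curve(sqrt(1.5 - x^2), from = -1.2, to = 1.2, add = TRUE, lty = 2)
```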
ISLR: Notes - Chapter 2
Non-parametric methods seek an estimate of $f$ that gets as close to the data points as possible without being too rough or wiggly. Non-parametric approaches completely avoid the danger of the chosen functional form being too far from the true $f$. The disadvantage of non-parametric methods is that they need a large set of observations to obtain an accurate estimate of $f$. Therefore, the informational requirements of non-parametric methods are larger.
Contrast this to parametric methods. Parametric methods essentially extrapolate information from one region of the domain to another. This is because global regularities are assumed in the functional form. A non-parametric method however has to trace the surface $f$ in all regions of the domain to be valid.
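To make the contrast concrete, here is a small sketch (the simulated data and the smoothing parameter are arbitrary choices of mine) comparing a parametric fit, which assumes a global functional form, with a non-parametric one, which has to trace $f$ locally:

```r
# Parametric vs non-parametric fit on the same simulated data
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

plot(x, y, pch = 20, col = "grey")
# parametric: a global linear form is assumed, so relatively few points suffice
abline(lm(y ~ x), col = "blue", lwd = 2)
# non-parametric: loess traces the surface locally and needs data everywhere
lines(x, predict(loess(y ~ x, span = 0.3)), col = "red", lwd = 2)
legend("topright", legend = c("linear (parametric)", "loess (non-parametric)"),
       col = c("blue", "red"), lwd = 2)
```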
EDA: Plotting least squares fit line in R
I have recently started reading ISLR and am finding the plots in the book very useful.
A visualization aid one often uses for exploratory data analysis is a scatter plot of the response variable against a potential predictor. Overlaying the ordinary least squares fit line on this scatter provides a readily accessible visual representation of the effect of the predictor on the response (if any).
Following is a simple snippet that I wrote in R to plot such graphs for an arbitrary dataset with a numeric response variable. Note that the function only attempts the plots for predictors that are numeric (or integer). It also attempts a crude adjustment of the plot layout according to the number of predictors.
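A minimal sketch of what such a function could look like (the name plot_numeric_predictors and the grid-layout heuristic are illustrative, not necessarily the original snippet):

```r
# Sketch: scatter plots of a numeric response against each numeric predictor,
# with the OLS fit line overlaid
plot_numeric_predictors <- function(data, response) {
  y <- data[[response]]
  stopifnot(is.numeric(y))
  # keep only numeric/integer predictors, excluding the response itself
  predictors <- setdiff(names(data)[sapply(data, is.numeric)], response)
  n <- length(predictors)
  if (n == 0) return(invisible(NULL))
  # crude layout: a roughly square grid of panels
  nrow_panels <- ceiling(sqrt(n))
  ncol_panels <- ceiling(n / nrow_panels)
  op <- par(mfrow = c(nrow_panels, ncol_panels))
  on.exit(par(op))
  for (p in predictors) {
    plot(data[[p]], y, xlab = p, ylab = response, pch = 20)
    abline(lm(y ~ data[[p]]), col = "red", lwd = 2)
  }
}

# Example usage with a built-in dataset
plot_numeric_predictors(mtcars, "mpg")
```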
R: Finding identifier variables
I had never expected such a problem, much less a solution, to exist until I was asked to solve it yesterday. The problem statement: given a dataset and a list of candidate variables, find which minimal combination of them, if any, is a valid identifier for the observations in the dataset, i.e. uniquely identifies each row.
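One way to attack this, sketched below with an illustrative function name (find_identifier), is to test candidate combinations in increasing order of size and return the first one under which no two rows share the same values:

```r
# Sketch: return the smallest combination of candidate variables that uniquely
# identifies every observation, or NULL if none does
find_identifier <- function(data, candidates) {
  for (k in seq_along(candidates)) {
    for (vars in combn(candidates, k, simplify = FALSE)) {
      # a valid identifier has no duplicated combination of values
      if (!any(duplicated(data[vars]))) return(vars)
    }
  }
  NULL
}

# Example usage (whether any subset works depends entirely on the data)
find_identifier(mtcars, c("cyl", "gear", "wt", "qsec"))
```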
The corrected AIC
Only today did I discover that the Akaike Information Criterion is valid only asymptotically and that there exists a correction (in fact, a strongly recommended correction) for finite samples. Here is a quick copy-paste from Wikipedia.
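In the usual notation, for a model with $k$ estimated parameters fit to $n$ observations, the corrected criterion is

$$\mathrm{AIC}_c = \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1},$$

so the extra penalty vanishes as $n \to \infty$ and $\mathrm{AIC}_c$ converges to the ordinary AIC. The usual recommendation is to prefer $\mathrm{AIC}_c$ whenever $n/k$ is small (a threshold of less than about 40 is often quoted).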