My home, musings, and wanderings on the world wide web.
EDA: Plotting least squares fit line in R
I have recently started reading ISLR (An Introduction to Statistical Learning,
with Applications in R) and am finding the plots in the book very useful.
A visualization aid one often uses for exploratory data analysis is a scatter
plot of the response variable against a potential predictor. Overlaying the
ordinary least squares fit line on this scatter provides a readily accessible
visual representation of the effect of the predictor on the response (if any).
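For a single predictor, the idea can be sketched in a few lines. This is only a minimal illustration, assuming the `mtcars` data frame bundled with base R; the choice of `mpg` as the response and `wt` as the predictor is arbitrary.

```r
# Minimal sketch: scatter plot of a response against one predictor,
# with the ordinary least squares line drawn on top.
# Assumption: mtcars (bundled with base R), mpg as response, wt as predictor.
fit <- lm(mpg ~ wt, data = mtcars)

plot(mpg ~ wt, data = mtcars, col = "red",
     main = "LS fit of mpg against wt")

# abline() accepts a fitted simple regression and draws its line.
abline(fit, col = "blue", lwd = 2)
```

The direction and steepness of the blue line give a quick visual read on how the response moves with the predictor.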
Following is a simple snippet that I wrote in R to plot such graphs for an
arbitrary data frame with a numeric response variable. Note that the function
only attempts the plots for predictors that are numeric (is.numeric also
accepts integer columns). It also makes a crude adjustment to the layout of
the plot according to the number of predictors: up to three predictors share a
single row, and any more are arranged in two columns.
Plotting OLS fit of features against the response
    plotLeastSqFit = function(df, responseVar) {
      stopifnot(is.data.frame(df), responseVar %in% colnames(df),
                is.numeric(df[[responseVar]]))
      areNumeric = setdiff(colnames(df)[sapply(df, is.numeric)], responseVar)
      if (length(areNumeric) <= 3) {
        mfRow = c(1, length(areNumeric))
      } else {
        mfRow = c(ceiling(length(areNumeric) / 2), 2)
      }
      par(mfrow = mfRow)
      lapply(X = areNumeric, FUN = function(x) {
        plot(y = df[[responseVar]], x = df[[x]], col = "red", lwd = 1.5,
             ylab = responseVar, xlab = x,
             main = sprintf("LS fit of %s against %s", responseVar, x))
        abline(lm(as.formula(paste(responseVar, "~", x)), data = df),
               col = "blue", lwd = 2)
      })
    }
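The layout rule used inside the function can be looked at in isolation. The helper below is hypothetical (`layoutFor` is not part of the snippet above) and assumes there is at least one numeric predictor; it simply reproduces the same arithmetic so the resulting grid sizes are easy to inspect.

```r
# Hypothetical helper mirroring the layout rule of plotLeastSqFit:
# up to three predictors go in one row, otherwise use two columns.
# Assumption: n is a positive count of numeric predictors.
layoutFor <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  if (n <= 3) {
    c(1, n)
  } else {
    c(ceiling(n / 2), 2)
  }
}

# Rows and columns that par(mfrow = ...) would receive
# for 1 to 6 predictors, one column of the result per count.
print(sapply(1:6, layoutFor))
```

With an odd number of predictors above three, the last cell of the grid is simply left empty, which is part of what makes the adjustment crude.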
Here are sample plots from this function for a couple of the ISLR datasets.
For the mtcars dataset (which ships with base R's datasets package rather than
with ISLR):
    library(ISLR)
    data(mtcars)

    ## Choose only a few columns.
    plotLeastSqFit(df = mtcars[c("mpg", "cyl", "hp", "wt")], responseVar = "mpg")
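As a numeric companion to the plots, the slope of each fitted line can be pulled out directly. This is a rough sketch rather than part of the original function; it refits the same three simple regressions on `mtcars`, and the `predictors` and `slopes` names are introduced here only for illustration.

```r
# Sketch: slopes of the per-predictor least squares fits plotted for mtcars.
# Assumption: same response (mpg) and predictors as in the call above.
predictors <- c("cyl", "hp", "wt")

slopes <- sapply(predictors, function(x) {
  fit <- lm(as.formula(paste("mpg ~", x)), data = mtcars)
  coef(fit)[[2]]  # second coefficient is the slope; the first is the intercept
})

print(slopes)
```

A negative slope corresponds to a downward-sloping line in the matching panel.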