k-Nearest-Neighbors

k-nearest neighbors (kNN) is arguably one of the simplest predictive models for classification. Given a training set and a new data point, it finds the k training points closest to the new point and predicts the class that the majority of them belong to. The knn() function from the class package, used below, measures closeness with Euclidean distance. Typical values for k are 1, 3 and 5.
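To make the majority-vote rule concrete, here is a minimal from-scratch sketch for a single new point. It is only an illustration, not how the class package implements knn(): the helper name knn_predict_one and the toy data are made up for this example, and ties are resolved by simply taking the first class with the highest count.

```r
# Illustrative sketch of the kNN rule; knn_predict_one is a hypothetical helper.
knn_predict_one <- function(X, y, x_new, k = 3) {
  # Euclidean distance from x_new to every training row
  diffs <- sweep(X, 2, x_new)        # subtract x_new from each row of X
  dists <- sqrt(rowSums(diffs^2))
  # indices of the k closest training points
  nearest <- order(dists)[seq_len(k)]
  # count the class labels among those neighbors
  votes <- table(y[nearest])
  # majority class (first maximum wins if there is a tie)
  names(votes)[which.max(votes)]
}

# toy data (assumed for illustration): two well-separated groups in 2D
X <- rbind(c(1, 1), c(1, 2), c(2, 1),
           c(8, 8), c(8, 9), c(9, 8))
y <- c("a", "a", "a", "b", "b", "b")

knn_predict_one(X, y, c(1.5, 1.5), k = 3)   # all three neighbors are "a"
knn_predict_one(X, y, c(8.5, 8.5), k = 3)   # all three neighbors are "b"
```

Note that there is no training step: the "model" is just the stored training data, and all the work happens at prediction time.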

Code

library("class")

# set a seed value to make your analysis reproducible

set.seed(123)

data("iris")

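# keep iris as a data frame: as.matrix() would turn every column into

# character strings, because the Species column is a factor

m <- iris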

# training/test split

size <- floor(nrow(m) * 0.75)

index <- sample(seq_len(nrow(m)), size = size)

train <- m[index, ]

test <- m[-index, ]

Xtrain <- train[, 1:4]

ytrain <- train[, 5]

Xtest <- test[, 1:4]

ytest <- test[, 5]

# classify the test points with kNN (k = 3); knn() returns the predicted classes as a factor

predicted <- knn(train=Xtrain, test=Xtest, cl=as.factor(ytrain), k=3)

# evaluate prediction: accuracy is the fraction of test points classified correctly

correct <- ytest == predicted

sum(correct) / length(correct)
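A single accuracy number hides which classes get confused with each other. As a possible extension, the sketch below repeats the same split in a self-contained form and tabulates predictions against the true labels with table(). The variable name confusion is chosen here for illustration.

```r
library(class)

set.seed(123)

# same 75/25 split as above, working on the data frame directly
index <- sample(seq_len(nrow(iris)), size = floor(nrow(iris) * 0.75))

Xtrain <- iris[index, 1:4]
Xtest  <- iris[-index, 1:4]
ytrain <- iris$Species[index]
ytest  <- iris$Species[-index]

predicted <- knn(train = Xtrain, test = Xtest, cl = ytrain, k = 3)

# rows: predicted class, columns: true class
confusion <- table(predicted = predicted, actual = ytest)
confusion

# accuracy is the share of counts on the diagonal
sum(diag(confusion)) / sum(confusion)
```

Off-diagonal cells show the misclassifications; for iris these typically involve versicolor and virginica, which overlap more than setosa does with either.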

kNN is useful as a baseline against which to compare other models.
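When kNN serves as a baseline, it is worth checking that the choice of k is not holding it back. One way to do this, sketched below, is leave-one-out cross-validation with knn.cv() from the same class package, tried here on the typical values 1, 3 and 5. The variable names are illustrative, and the seed is set because knn.cv() breaks ties at random.

```r
library(class)

set.seed(123)

X <- iris[, 1:4]
y <- iris$Species

# leave-one-out accuracy for each candidate k:
# every point is classified using all the other points as training data
ks <- c(1, 3, 5)
loo_accuracy <- sapply(ks, function(k) {
  pred <- knn.cv(train = X, cl = y, k = k)
  mean(pred == y)
})
names(loo_accuracy) <- paste0("k=", ks)

loo_accuracy
```

The k with the highest leave-one-out accuracy is a reasonable default for the baseline, although on a data set as small as iris the differences between values of k may be within noise.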
