Here we will see how to combine PCA and k-means so that the resulting clusters are (hopefully) easier to interpret.

First, of course, retrieve the data.

url <- "http://mribatet.perso.math.cnrs.fr/CentraleNantes/Data/Ligue1Fifa22.csv"
data <- read.table(url)
dim(data)
[1] 129  52
## Ouch, plenty of features

Let’s start with the PCA. Here I declare the qualitative variables as supplementary, as we should, and add the overall score as a supplementary quantitative variable (since it is probably a (weighted?) average of the other scores).

library(FactoMineR)
pca <- PCA(data, quali.sup = c(1, 9, 10, 12), quanti.sup = 2, ncp = 10)


## Now I should interpret the PCA as we learnt in the lecture
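
As a rough sketch of what that interpretation step involves, the block below looks at the share of variance carried by each axis and at the variables most correlated with the first one. It is an illustration under assumptions: it uses base R's `prcomp` and the built-in `USArrests` data instead of `FactoMineR` and the Ligue 1 data, so that it runs without a download. The names `toy`, `toy_pca`, `var_share`, `cum_share`, `var_cor` and `axis1_rank` are made up for the example.

```r
# Illustrative only: base R prcomp on a built-in dataset,
# not the FactoMineR call or the Ligue 1 data used above.
toy <- USArrests
toy_pca <- prcomp(toy, center = TRUE, scale. = TRUE)  # standardised PCA

# Eigenvalues are the squared standard deviations of the components
eigenvalues <- toy_pca$sdev^2
var_share <- eigenvalues / sum(eigenvalues)  # proportion of variance per axis
cum_share <- cumsum(var_share)               # cumulative proportion

# Correlation between each original variable and each axis
# (this is what a correlation circle displays)
var_cor <- cor(toy, toy_pca$x)

# Variables ranked by absolute correlation with the first axis
axis1_rank <- sort(abs(var_cor[, 1]), decreasing = TRUE)

print(round(cbind(var_share, cum_share), 3))
print(round(var_cor[, 1:2], 2))
print(names(axis1_rank))
```

Reading the first table tells you how many axes are worth keeping; reading the correlations tells you what each axis means, which is what makes the clusters interpretable later on.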

We can move to unsupervised clustering with k-means using the PCA coordinates. Here I will keep only the first 10 factorial axes (hence ncp = 10 in the call to PCA above). I will use 4 clusters but you know how to select the right number of clusters…

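pca_coord <- pca$ind$coord  ## coordinates of the individuals on the 10 retained axes
my_clust <- kmeans(pca_coord, centers = 4, nstart = 100)  ## 100 random starts, the best one is kept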
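
One common way to pick the number of clusters is the elbow heuristic on the total within-cluster sum of squares. The sketch below is only an illustration: simulated coordinates (`toy_coord`, a made-up name) stand in for `pca_coord`, and the range 1 to 8 and `nstart = 25` are arbitrary choices.

```r
# Illustrative only: simulated coordinates stand in for pca$ind$coord.
set.seed(42)  # k-means uses random starts; fix the seed for reproducibility
toy_coord <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 5), ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)

k_values <- 1:8
# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(k_values, function(k) {
  kmeans(toy_coord, centers = k, nstart = 25)$tot.withinss
})

# Look for the "elbow": the k after which the decrease flattens out
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

With the real data you would replace `toy_coord` by `pca_coord` and read the elbow off the resulting curve.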

Based on this clustering we can draw a nice plot that shows the positions of the individuals on the first factorial plane, coloured by cluster.

player_names <- rownames(data)
plot(pca_coord[, 1:2], type = "n")  ## set up the plot without actually drawing points
text(pca_coord[,1:2], labels = player_names, col = my_clust$cluster)

Now, thanks to the interpretation of the PCA done first, you can easily interpret the meaning of each cluster.
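
To make that interpretation concrete, it can help to summarise each cluster numerically. The sketch below is an illustration under assumptions: it uses the built-in `USArrests` data and base R's `prcomp` in place of the Ligue 1 data and the `FactoMineR` output, and the names `toy`, `toy_coord`, `toy_clust`, `cluster_sizes`, `cluster_profile` and `cluster_centres` are made up for the example.

```r
# Illustrative only: built-in data and base R prcomp stand in for
# the Ligue 1 data and the FactoMineR output.
set.seed(1)
toy <- USArrests
toy_coord <- prcomp(toy, scale. = TRUE)$x[, 1:2]  # coordinates on two axes
toy_clust <- kmeans(toy_coord, centers = 4, nstart = 100)

# Number of individuals in each cluster
cluster_sizes <- table(toy_clust$cluster)

# Mean of each original variable within each cluster
cluster_profile <- aggregate(toy, by = list(cluster = toy_clust$cluster),
                             FUN = mean)

# Cluster centres expressed on the factorial axes
cluster_centres <- toy_clust$centers

print(cluster_sizes)
print(round(cluster_profile, 1))
print(round(cluster_centres, 2))
```

The centres say where each cluster sits on the axes you have already interpreted, and the per-cluster means confirm that reading on the original variables.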
