nroPrune {Numero}    R Documentation
Detect and merge collinear columns into first principal components.
nroPrune(data, modules)
data: A matrix or a data frame.
modules: Pruning parameter; see the details below.
The pruning parameter modules is an integer that sets the desired number of columns in the pruned dataset. If the original value cannot be applied to the dataset, the function revises it automatically.
The input argument modules can also be the list object that is attached to the output of a previous call to the function; see the description of the return value.
To determine modules of collinear variables, the function uses K-means clustering with Spearman correlation as the distance metric.
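One way to picture this clustering step is the base R sketch below. It is not the Numero implementation: the helper cluster_columns_spearman() is hypothetical, and it assumes complete numeric data with no constant columns. Because base kmeans() works with Euclidean distance, each column is first rank-transformed and standardized; for such columns the squared Euclidean distance between two columns equals 2(n - 1)(1 - rho), where rho is their Spearman correlation, so columns with high rank correlation end up close together.

```r
# Hypothetical helper (not part of Numero): group the columns of a numeric
# matrix into k modules of mutually rank-correlated columns.
cluster_columns_spearman <- function(x, k, seed = 1) {
  x <- as.matrix(x)
  # Rank each column (ties get average ranks), then center and scale, so that
  # squared Euclidean distance between two columns is 2 * (n - 1) * (1 - rho).
  z <- scale(apply(x, 2, rank))
  # kmeans() clusters rows, so transpose to get one row per original column.
  # The seed only makes the random starting centers reproducible.
  set.seed(seed)
  fit <- kmeans(t(z), centers = k, nstart = 10)
  # Return the column names that belong to each module.
  split(colnames(x), fit$cluster)
}
```

Under this signed distance, strongly negatively correlated columns are placed far apart; the sketch makes no claim about how Numero handles them.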
A data frame or a matrix in which each module of collinear columns has been replaced by a single column. The aggregated values are linear combinations of the module columns, and the coefficients define the first principal component of the module data.
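The merge step can be sketched with base R prcomp(). The helper merge_module_pc1() is hypothetical, and standardizing the module columns before the PCA is an assumption, since the text above does not say how Numero scales the data. The returned coefficients are the first principal component loadings, and the score is the matching linear combination of the standardized module columns.

```r
# Hypothetical helper (not part of Numero): replace one module of columns
# by its first principal component score.
merge_module_pc1 <- function(x, cols) {
  x <- as.matrix(x)
  # Assumption: standardize the module columns before the PCA.
  sub <- scale(x[, cols, drop = FALSE])
  pca <- prcomp(sub, center = FALSE, scale. = FALSE)
  # Loadings of the first principal component (unit length, one per column).
  w <- pca$rotation[, 1]
  # The merged column is a linear combination of the module columns.
  score <- as.vector(sub %*% w)
  list(score = score, coefficients = w)
}
```

Principal component loadings are defined only up to sign, so the merged column may be flipped relative to another implementation.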
The output also contains the attribute "modules", which can be passed to the function to replicate the same pruning procedure for another dataset.
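Replicating a pruning on another dataset means reusing the module memberships and coefficients estimated on the first dataset instead of re-estimating them. The sketch below illustrates that idea with two hypothetical helpers, fit_module_pcs() and apply_module_pcs(). The list they pass between them is an assumed structure, not the format of the "modules" attribute returned by nroPrune(); only the output column names imitate the MODULE.1 column accessed in the package example.

```r
# Hypothetical helpers (not part of Numero). fit_module_pcs() learns the
# standardization and PC1 loadings of each module on a reference dataset.
fit_module_pcs <- function(x, modules) {
  x <- as.matrix(x)
  lapply(modules, function(cols) {
    z <- scale(x[, cols, drop = FALSE])
    pca <- prcomp(z, center = FALSE, scale. = FALSE)
    list(cols = cols,
         center = attr(z, "scaled:center"),
         scale = attr(z, "scaled:scale"),
         weights = pca$rotation[, 1])
  })
}

# apply_module_pcs() applies the stored parameters, unchanged, to new data.
apply_module_pcs <- function(x, fits) {
  x <- as.matrix(x)
  scores <- sapply(fits, function(f) {
    z <- scale(x[, f$cols, drop = FALSE], center = f$center, scale = f$scale)
    as.vector(z %*% f$weights)
  })
  colnames(scores) <- paste0("MODULE.", seq_along(fits))
  as.data.frame(scores)
}
```

In the package example, the same role is played by passing results.men as the modules argument when pruning ds.women.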
Ville-Petteri Makinen
    library(Numero)

    # Import data.
    fname <- system.file("extdata", "finndiane.txt", package = "Numero")
    dataset <- read.delim(file = fname)

    # Split into men and women.
    ds.men <- dataset[which(dataset$MALE == 1), ]
    ds.women <- dataset[which(dataset$MALE == 0), ]

    # Exclude unusable columns.
    ds.men$INDEX <- NULL
    ds.women$INDEX <- NULL
    ds.men$MALE <- NULL
    ds.women$MALE <- NULL

    # Merge collinear variables in one dataset according to the other.
    results.men <- nroPrune(data = ds.men, modules = 3)
    results.women <- nroPrune(data = ds.women, modules = results.men)
    print(attr(results.men, "modules"))
    print(summary(results.men$MODULE.1))
    print(summary(results.women$MODULE.1))