numero.prepare {Numero} | R Documentation |
Prepare training data by mitigating confounding factors and standardizing values.
numero.prepare(data, variables = NULL, confounders = NULL, batch = NULL, method = "standard", pipeline = NULL)
data | A matrix or a data frame. |
variables | A character vector of column names. |
confounders | Names of columns that contain confounder data. |
batch | The name of the column that contains batch labels. |
method | Method to standardize values, see |
pipeline | Processing parameters from a previous use of the function. |
We recommend first applying numero.clean() to the full dataset, then selecting a subset for training using the input argument variables. This preserves any attributes that may be used in Numero functions.
If a previous pipeline is available, it overrides all processing parameters irrespective of other input arguments.
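The override behavior described above can be sketched as follows. This is a minimal illustration, assuming the Numero package is installed and that the object stored in the "pipeline" attribute of an earlier result is what the pipeline argument expects (an inference from the Value description, not a confirmed contract). The variable names pl and again are hypothetical.

```r
# Sketch only: skipped entirely if Numero is not installed.
if (requireNamespace("Numero", quietly = TRUE)) {
  library(Numero)

  # Import and clean the example data shipped with the package.
  fname <- system.file("extdata", "finndiane.txt", package = "Numero")
  dataset <- numero.clean(read.delim(file = fname), identity = "INDEX")

  # First pass: processing parameters are derived from the data.
  trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
  trdata <- numero.prepare(data = dataset, variables = trvars)

  # Assumption: the stored parameters can be passed back as 'pipeline'.
  pl <- attr(trdata, "pipeline")

  # Second pass: per the documentation, the pipeline takes precedence
  # over the other processing arguments, so none are supplied here.
  again <- numero.prepare(data = dataset, pipeline = pl)
  print(dim(again))
}
```

Reusing a stored pipeline in this way is mainly useful when a second dataset should be processed with exactly the same parameters as the training data.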
Due to safeguards against numerical instability, the standardized values may deviate slightly from the expected range (<0.1
A data frame with two attributes: "pipeline", which contains the processing parameters, and "subsets", which contains row names divided into batches if batch correction was applied.
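The returned attributes can be inspected with base R's attr(). The sketch below assumes the Numero package is installed. The internal layout of the two attributes is not documented here, so str() is used to display whatever structure they have rather than assuming one.

```r
# Sketch only: skipped entirely if Numero is not installed.
if (requireNamespace("Numero", quietly = TRUE)) {
  library(Numero)

  # Import and clean the example data shipped with the package.
  fname <- system.file("extdata", "finndiane.txt", package = "Numero")
  dataset <- numero.clean(read.delim(file = fname), identity = "INDEX")

  # Prepare with batch correction so that "subsets" is populated.
  trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
  trdata <- numero.prepare(data = dataset, variables = trvars,
                           batch = "MALE", confounders = "AGE")

  # Processing parameters recorded by the function.
  str(attr(trdata, "pipeline"), max.level = 1)

  # Row names divided into batches.
  str(attr(trdata, "subsets"), max.level = 1)
}
```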
Ville-Petteri Makinen
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Set identities and manage missing data.
dataset <- numero.clean(dataset, identity = "INDEX")

# Prepare training variables using default standardization.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)
print(summary(trdata))

# Prepare training values adjusted for age and sex and
# standardized by rank-based method.
trdata <- numero.prepare(data = dataset, variables = trvars, batch = "MALE",
                         confounders = "AGE", method = "tapered")
print(summary(trdata))