step_lowerimpute {recipes} | R Documentation
step_lowerimpute creates a specification of a recipe step
designed for cases where non-negative numeric data cannot be
measured below a known value. In these cases, one method for
imputing the data is to replace each truncated value with a
random uniform number between zero and the truncation point.
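The imputation rule described above can be sketched in base R for a single vector. This is only an illustration of the idea, not the package's implementation; the helper name impute_below and the argument limit are hypothetical and do not exist in recipes.

```r
# Hypothetical helper (not part of recipes): replace values at or below
# a known measurement limit with draws from Uniform(0, limit).
impute_below <- function(x, limit) {
  censored <- !is.na(x) & x <= limit          # values recorded at the limit
  x[censored] <- runif(sum(censored), min = 0, max = limit)
  x                                           # other values are unchanged
}

set.seed(1)
x <- c(40, 52.3, 40, 47.1)                    # 40 is the assumed lower limit
impute_below(x, limit = 40)
```

Values above the limit pass through untouched; only the censored entries receive random draws, so results vary with the seed.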
step_lowerimpute(recipe, ..., role = NA, trained = FALSE,
  threshold = NULL)

## S3 method for class 'step_lowerimpute'
tidy(x, ...)
recipe |
A recipe object. The step will be added to the sequence of operations for this recipe. |
... |
One or more selector functions to choose which
variables are affected by the step. See |
role |
Not used by this step since no new variables are created. |
trained |
A logical to indicate if the quantities for preprocessing have been estimated. |
threshold |
A named numeric vector of lower bounds. This is
|
x |
A |
step_lowerimpute estimates the variable minimums from the data
used in the training argument of prep.recipe. bake.recipe then
replaces any value at the minimum with a random uniform value
between zero and the minimum.
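The split between estimating on the training data and applying to new data can be sketched in base R as follows. This is a rough approximation under stated assumptions, not the code recipes runs internally; estimate_thresholds and apply_thresholds are hypothetical names, and the small data frames are made up for illustration.

```r
# Hypothetical helpers (not part of recipes) mimicking the two phases.

# Phase 1 (analogous to prep): per-variable minimum from the training data,
# returned as a named numeric vector.
estimate_thresholds <- function(training, vars) {
  vapply(training[vars], min, numeric(1), na.rm = TRUE)
}

# Phase 2 (analogous to bake): values at or below the stored minimum are
# replaced with draws from Uniform(0, minimum).
apply_thresholds <- function(new_data, thresholds) {
  for (v in names(thresholds)) {
    at_min <- which(new_data[[v]] <= thresholds[[v]])
    new_data[[v]][at_min] <-
      runif(length(at_min), min = 0, max = thresholds[[v]])
  }
  new_data
}

# Made-up data for illustration only
train <- data.frame(carbon = c(40, 45, 50), hydrogen = c(5, 5, 6.2))
test  <- data.frame(carbon = c(40, 48),     hydrogen = c(5.5, 5))

thresholds <- estimate_thresholds(train, c("carbon", "hydrogen"))
set.seed(1)
imputed <- apply_thresholds(test, thresholds)
```

Keeping the thresholds as a stored named vector means new data are imputed against the training minimums rather than their own.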
An updated version of recipe with the new step added to the
sequence of existing steps (if any). For the tidy method, a
tibble with columns terms (the selectors or variables selected)
and value for the estimated threshold.
library(recipes)
data(biomass)

## Truncate some values to emulate what a lower limit of
## the measurement system might look like
biomass$carbon <- ifelse(biomass$carbon > 40, biomass$carbon, 40)
biomass$hydrogen <- ifelse(biomass$hydrogen > 5, biomass$hydrogen, 5)

biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]

rec <- recipe(HHV ~ carbon + hydrogen + oxygen + nitrogen + sulfur,
              data = biomass_tr)

impute_rec <- rec %>%
  step_lowerimpute(carbon, hydrogen)

## Before prep(): thresholds are not yet estimated
tidy(impute_rec, number = 1)

impute_rec <- prep(impute_rec, training = biomass_tr)

## After prep(): one estimated threshold per variable
tidy(impute_rec, number = 1)

transformed_te <- bake(impute_rec, biomass_te)

## x axis: imputed values; y axis: values before imputation
plot(transformed_te$carbon, biomass_te$carbon,
     xlab = "imputed", ylab = "pre-imputation")