pddply {Smisc} | R Documentation |
Parallel implementation of plyr::ddply that suppresses a spurious warning when plyr::ddply is called in parallel.
All of the arguments except njobs are passed directly to arguments of the same name in plyr::ddply.
pddply(.data, .variables, .fun = NULL, ..., njobs = parallel::detectCores() - 1, .progress = "none", .inform = FALSE, .drop = TRUE, .paropts = NULL)
.data |
data frame to be processed |
.variables |
character vector of variables in .data |
.fun |
function to apply to each piece |
... |
other arguments passed on to '.fun' |
njobs |
the number of parallel jobs to launch, defaulting to one less than the number of available cores on the machine |
.progress |
name of the progress bar to use, see plyr::create_progress_bar |
.inform |
produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop |
should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default) |
.paropts |
a list of additional options passed into the foreach function when parallel computation is enabled |
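The default for njobs is parallel::detectCores() - 1. As a hedged illustration only, the sketch below shows one way a caller might compute a job count that never drops below 1, since detectCores() is documented to return NA when the core count is unknown, and the default expression evaluates to 0 on a single-core machine. The helper name safe_njobs and its reserve argument are hypothetical and are not part of Smisc.

```r
# Hypothetical helper (not part of Smisc): pick a job count that is at least 1.
safe_njobs <- function(reserve = 1L) {
  cores <- parallel::detectCores()

  # detectCores() returns NA when the number of cores cannot be determined
  if (is.na(cores)) {
    return(1L)
  }

  # Leave `reserve` cores free, but never go below one job
  max(1L, as.integer(cores) - as.integer(reserve))
}

njobs <- safe_njobs()
njobs
```

The result could then be supplied explicitly, for example pddply(.data, .variables, .fun, njobs = safe_njobs()).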
An innocuous warning is thrown when plyr::ddply is called in parallel: https://github.com/hadley/plyr/issues/203. This function catches and hides that warning, which looks like this:
Warning messages:
1: <anonymous>: ... may be used in an incorrect context: '.fun(piece, ...)'
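Catching and hiding one specific warning while letting others through can be sketched in base R with withCallingHandlers() and the "muffleWarning" restart. This is a minimal illustration of the general technique and is not necessarily how pddply is implemented; the helper muffle_matching_warning and the function noisy are hypothetical names introduced for the example.

```r
# Hypothetical helper (not part of Smisc or plyr): evaluate `expr` and silence
# only those warnings whose message contains the literal text `pattern`.
muffle_matching_warning <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        # Resume execution as if the warning had never been signaled
        invokeRestart("muffleWarning")
      }
      # Non-matching warnings fall through to the usual warning machinery
    }
  )
}

# Stand-in for a call that emits the spurious warning and then returns a value
noisy <- function() {
  warning("... may be used in an incorrect context: '.fun(piece, ...)'")
  "result"
}

res <- muffle_matching_warning(noisy(), "may be used in an incorrect context")
res
```

Because the handler is a calling handler rather than an exiting one, the wrapped expression runs to completion and its value is returned unchanged.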
If njobs = 1, a call to plyr::ddply is made without parallelization, and anything supplied to .paropts is ignored. See the documentation for plyr::ddply for additional details.
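The njobs = 1 behavior described above can be pictured with the following rough sketch. It assumes the plyr and doParallel packages are installed, and it is not the Smisc source: sketch_pddply is a hypothetical name, it handles only a subset of the arguments, and it does not suppress the warning discussed earlier.

```r
# Hypothetical sketch (not the Smisc implementation): dispatch on njobs.
sketch_pddply <- function(.data, .variables, .fun = NULL, ...,
                          njobs = 1, .paropts = NULL) {
  if (njobs == 1) {
    # Serial path: no cluster is created and .paropts is ignored
    return(plyr::ddply(.data, .variables, .fun, ...))
  }

  # Parallel path: start a cluster, register it as the foreach backend,
  # and make sure it is shut down when the function exits
  cl <- parallel::makeCluster(njobs)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  doParallel::registerDoParallel(cl)

  plyr::ddply(.data, .variables, .fun, ...,
              .parallel = TRUE, .paropts = .paropts)
}

# Usage on a toy data frame, run only if plyr is available
if (requireNamespace("plyr", quietly = TRUE)) {
  toy <- data.frame(g = c("a", "a", "b"), v = 1:3)
  print(sketch_pddply(toy, "g", nrow, njobs = 1))
}
```

With njobs = 1 the sketch reduces to a plain plyr::ddply call, which is why anything passed in .paropts has no effect in that case.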
The data frame returned by plyr::ddply.
data(baseball, package = "plyr")

# Summarize the number of entries for each year in the baseball dataset with 2 jobs
o1 <- pddply(baseball, ~ year, nrow, njobs = 2)
head(o1)

# Verify it's the same as the non-parallel version of plyr::ddply()
o2 <- plyr::ddply(baseball, ~ year, nrow)
identical(o1, o2)

# Another possibility
o3 <- pddply(baseball, "lg", c("nrow", "ncol"), njobs = 2)
o3

o4 <- plyr::ddply(baseball, "lg", c("nrow", "ncol"))
identical(o3, o4)

# A nonsense example where we need to pass objects and packages into the cluster
number1 <- 7

f <- function(x, number2 = 10) {
  paste(x$id[1], padZero(number1, num = 2), number2, sep = "-")
}

# In parallel
o5 <- pddply(baseball[1:100,], "year", f, number2 = 13, njobs = 2,
             .paropts = list(.packages = "Smisc", .export = "number1"))
o5

# Non parallel
o6 <- plyr::ddply(baseball[1:100,], "year", f, number2 = 13)
identical(o5, o6)