spark_apply {sparklyr} | R Documentation
Description

Applies an R function to a Spark object (typically, a Spark DataFrame).
Usage

spark_apply(x, f, columns = colnames(x), memory = TRUE, group_by = NULL,
  packages = TRUE, context = NULL, ...)
Arguments

x: An object (usually a spark_tbl) coercible to a Spark DataFrame.
f: A function that transforms a data frame partition into a data frame. The function f has signature f(df, context, group1, group2, ...), where df is a data frame with the data to be processed, context is the optional object passed as the context argument, and group1 to groupN contain the values of the group_by columns. When group_by is not specified, f takes only one argument.
columns: A vector of column names or a named vector of column types for the transformed object. Defaults to the names from the original object and adds indexed column names when not enough columns are specified.
memory: Boolean; should the table be cached into memory?
group_by: Column name used to group by data frame partitions (see the sketch after this argument list).
packages: Boolean to distribute .libPaths() packages to each node, a list of packages to distribute, or a package bundle created with spark_apply_bundle(). Defaults to TRUE, or to the sparklyr.apply.packages value set in spark_config(). For clusters using Yarn cluster mode, a bundle created with spark_apply_bundle() can be used, provided it is placed where every worker node can read it. For offline clusters where available.packages() is not accessible, manually download the packages database from https://cran.r-project.org/web/packages/packages.rds and point to it with Sys.setenv(sparklyr.apply.packagesdb = "<path-to-rds>").
context: Optional object to be serialized and passed back to f().
...: Optional arguments; currently unused.
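As a sketch of how f, columns, group_by, and context fit together (assuming a local connection; the output column name n is illustrative):

library(sparklyr)

sc <- spark_connect(master = "local")

# context: an arbitrary R object serialized once and passed to f as its
# second argument on every worker
sdf_len(sc, 4) %>%
  spark_apply(function(df, context) df * context, context = 100)

# group_by: f runs once per group of rows; the grouping column is returned
# alongside the column produced by f, which is named through `columns`
iris_tbl <- sdf_copy_to(sc, iris, overwrite = TRUE)
iris_tbl %>%
  spark_apply(function(df) nrow(df), group_by = "Species", columns = "n")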
Details

spark_config() settings can be specified to change the workers' environment. For instance, to set additional environment variables on each worker node use the sparklyr.apply.env.* config, to launch workers without --vanilla set sparklyr.apply.options.vanilla to FALSE, and to run a custom script before launching Rscript use sparklyr.apply.options.rscript.before.
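For example, a minimal sketch of setting these options through the connection config before connecting (the variable name MY_VAR, its value, and the script path are illustrative):

library(sparklyr)

config <- spark_config()

# expose an environment variable inside each worker's R session
config[["sparklyr.apply.env.MY_VAR"]] <- "some-value"

# launch worker R sessions without the --vanilla flag
config[["sparklyr.apply.options.vanilla"]] <- FALSE

# run a custom script before each Rscript worker process is started
config[["sparklyr.apply.options.rscript.before"]] <- "/path/to/setup.sh"

sc <- spark_connect(master = "local", config = config)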
Examples

## Not run: 
library(sparklyr)

sc <- spark_connect(master = "local")

# create a Spark data frame with 10 elements, then multiply each element by 10 in R
sdf_len(sc, 10) %>%
  spark_apply(function(df) df * 10)

## End(Not run)
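As an additional sketch, when f uses only base R the local package libraries do not need to be distributed, so package copying can be turned off (column names here are illustrative):

## Not run: 
library(sparklyr)

sc <- spark_connect(master = "local")

# skip distributing .libPaths() packages; the function below needs only base R
sdf_len(sc, 10) %>%
  spark_apply(function(df) data.frame(id = df$id, squared = df$id^2),
              columns = c("id", "squared"),
              packages = FALSE)

## End(Not run)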