scatter {rmr2} | R Documentation |
scatter
scatter takes a file as input and pushes it through a mapreduce job that writes it out over a number of parts. The number of parts is system dependent; specifically, it depends on the number of reducers. This helps parallelize the next map phase. gather does the opposite.
scatter(input, output = NULL, ...)
gather(input, output = NULL, ...)
input |
The input file |
output |
Output, defaults to the same as |
... |
Other options passed directly to mapreduce |
Same as for mapreduce.
scatter discards keys. This is a limitation that should be addressed in the future.
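
A minimal usage sketch follows. It assumes that rmr2 is installed and that the local backend is used, so no Hadoop cluster is required. The sample data and the variable names (small, scattered, gathered, result) are illustrative only and do not come from the package documentation.

```r
library(rmr2)

# Assumption: use the local backend so the example runs without Hadoop.
rmr.options(backend = "local")

# Write a small keyed data set to the DFS (illustrative data).
small <- to.dfs(keyval(rep(c("a", "b"), each = 5), 1:10))

# Spread the data over several parts ahead of a further map phase.
scattered <- scatter(small)

# Merge the parts back together.
gathered <- gather(scattered)

# Read the result back into memory.
result <- from.dfs(gathered)

# The values should survive the round trip, though their order is not guaranteed.
sort(values(result))

# The original keys "a" and "b" are not expected to survive, because scatter discards keys.
keys(result)
```

On the local backend the number of parts is unlikely to matter; the calls are meant to show the shape of the API rather than any performance benefit.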