stringdist-parallelization {stringdist} | R Documentation |
This page describes how stringdist uses parallel processing.
The core functions of stringdist are implemented in C. On systems where OpenMP is available, stringdist will automatically take advantage of multiple cores. The section on OpenMP of the Writing R Extensions manual discusses on which systems OpenMP is available (at the time of writing, more or less anywhere except on OSX).
By default, the number of threads to use is taken from options('sd_num_thread').
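As a minimal sketch of how this option can be inspected and changed for a session, assuming stringdist is installed and that the functions read the option when they are called, as the text above suggests:

```r
library(stringdist)

# Value that was chosen when the package was loaded.
getOption("sd_num_thread")

# Lower the default for this session; options() returns the old value invisibly.
old <- options(sd_num_thread = 2)
stringdist("kitten", "sitting")  # called without nthread, so the option applies

# Restore the previous setting.
options(old)
```

Setting the option changes only the default; an explicit nthread argument in a call still takes precedence.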
When the package is loaded, the value for this option is determined as follows:
1. The number of available cores is determined with parallel::detectCores().
2. If set, the environment variable OMP_THREAD_LIMIT is read.
3. The number of threads is set to the lesser of OMP_THREAD_LIMIT and the number of detected cores (or to the number of detected cores if OMP_THREAD_LIMIT is not set).
4. If the number of threads is greater than or equal to 4 and OMP_THREAD_LIMIT is not set, the number of threads is reduced by one.
The last step makes sure that on machines with n>3 cores, n-1 cores are used, unless OMP_THREAD_LIMIT is set. Some benchmarking showed that using all cores is often slower in such cases. This is probably because at least one of the threads will be shared with the operating system.
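The rule described above can be sketched in plain R. The function default_threads() below is a hypothetical helper written for illustration only; it is not part of stringdist and is not claimed to match the package's internal code.

```r
# Hypothetical helper that follows the documented steps; not stringdist's own code.
default_threads <- function(ncores = parallel::detectCores(),
                            limit = Sys.getenv("OMP_THREAD_LIMIT")) {
  # An unset variable comes back as "", which as.integer() converts to NA.
  limit <- suppressWarnings(as.integer(limit))
  limit_set <- !is.na(limit)
  # detectCores() may return NA on some platforms; fall back to one thread.
  if (is.na(ncores)) ncores <- 1L
  # Lesser of the thread limit and the detected cores.
  nthread <- if (limit_set) min(limit, ncores) else ncores
  # Leave one core free on machines with 4 or more cores, unless a limit is set.
  if (nthread >= 4L && !limit_set) nthread <- nthread - 1L
  as.integer(nthread)
}

default_threads(ncores = 8, limit = "")   # 7
default_threads(ncores = 8, limit = "4")  # 4
default_threads(ncores = 2, limit = "")   # 2
```

Passing ncores and limit as arguments makes the rule easy to try for machines other than the one at hand.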
Functions that use multithreading have an option named nthread that controls the maximum number of threads to use. If you need to do large calculations, it is probably a good idea to benchmark the performance on your machine(s) as a function of nthread, for example using the microbenchmark package by Mersmann.
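A benchmark of this kind might look like the following sketch. It assumes that both stringdist and microbenchmark are installed; the random test strings, the thread counts, and the number of repetitions are arbitrary choices for illustration.

```r
library(stringdist)
library(microbenchmark)

# Hypothetical test data: 1000 random 20-letter strings.
set.seed(1)
x <- replicate(1000, paste(sample(letters, 20, replace = TRUE), collapse = ""))

# Time the same distance matrix with different thread settings.
microbenchmark(
  one_thread  = stringdistmatrix(x, x, nthread = 1),
  two_threads = stringdistmatrix(x, x, nthread = 2),
  default     = stringdistmatrix(x, x),
  times = 5
)
```

Timings depend strongly on the machine and on the size of the input, so it is worth repeating the comparison with data that resembles the real workload.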
Functions running multithreaded: stringdist, stringdistmatrix, amatch, ain.