I am trying to get parallel processing to work in my local RStudio installation or in RStudio Cloud, using the doParallel package
and following the guide here.
Unfortunately, enabling parallel processing slows the computation down instead of speeding it up.
Test operation:
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Results without parallel processing:
Unit: milliseconds
                                    expr      min       lq    mean   median       uq      max neval
 foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161   100
   user  system elapsed
   0.33    0.04    0.19
Results after enabling parallel processing: it takes about twice as long!
Unit: milliseconds
                                       expr      min       lq     mean   median       uq      max neval
 foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407   100
   user  system elapsed
   0.28    0.10    0.37
How strange! Any advice? Below is the full script I ran, along with the logs from my local RStudio session and from the RStudio Cloud session.
Full script:
install.packages('doParallel')
library(doParallel)
install.packages('microbenchmark')
library(microbenchmark)
# Without parallel processing
microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
# Without parallel processing, get a warning
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
# Turn on parallel with several cores
registerDoParallel(detectCores() - 2)
# See number of cores
getDoParWorkers()
# Test for speed improvement With parallel processing
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
# Return to one worker
registerDoParallel(1)
registerDoSEQ()
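One possible explanation, offered as an assumption rather than a diagnosis: each iteration of the test does very little work, so the cost of sending 1000 tiny tasks to the workers and collecting the results may outweigh the computation itself. A minimal sketch of one way to check this is to hand each worker one large chunk of indices instead of one index at a time. It uses only the base `parallel` package; the helper names `chunk_worker` and `chunked_sums` and the choice of 2 workers are hypothetical and not part of the original script.

```r
# Sketch only: chunk_worker() and chunked_sums() are hypothetical helpers,
# not part of the original script or of doParallel.
library(parallel)  # ships with R

# Evaluate the test expression for every index in one chunk.
chunk_worker <- function(ix) {
  vapply(ix, function(i) sum(tanh(1:i)), numeric(1))
}

# Split 1..n into one chunk per worker, so each worker needs a single round trip.
chunked_sums <- function(cl, n) {
  chunks <- splitIndices(n, length(cl))
  parts <- parLapply(cl, chunks, chunk_worker)
  unlist(parts, use.names = FALSE)
}

cl <- makeCluster(2)               # 2 workers is an arbitrary choice
res_par <- chunked_sums(cl, 1000)
stopCluster(cl)

# Sequential reference to confirm both approaches give the same values.
res_seq <- vapply(1:1000, function(i) sum(tanh(1:i)), numeric(1))
stopifnot(isTRUE(all.equal(res_par, res_seq)))
```

Wrapping `chunked_sums(cl, 1000)` in `microbenchmark()` while the cluster is still open would show whether chunking helps on a given machine. For a task this light, the sequential loop may still win.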
Local run log:
Restarting R session...
Warning message:
<REDACTED LINE>
Error 6 (The handle is invalid)
Features disabled: R source file indexing, Diagnostics
Error in summary.connection(connection) : invalid connection
Error in summary.connection(connection) : invalid connection
<REDACTED LINE>
> install.packages('doParallel')
Installing doParallel [1.0.16] ...
OK [linked cache]
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: package ‘doParallel’ was built under R version 4.0.3
2: package ‘foreach’ was built under R version 4.0.3
3: package ‘iterators’ was built under R version 4.0.3
> install.packages('microbenchmark')
Installing microbenchmark [1.4-7] ...
OK [linked cache]
> library(microbenchmark)
Warning message:
package ‘microbenchmark’ was built under R version 4.0.3
>
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161 100
>
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
user system elapsed
0.33 0.04 0.19
>
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 178.1788 188.879 213.9808 197.2124 227.6921 698.484 100
Warning message:
executing %dopar% sequentially: no parallel backend registered
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.22 0.03 0.25
>
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
>
> # See number of cores
> getDoParWorkers()
[1] 6
>
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407 100
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.28 0.10 0.37
>
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
RStudio Cloud log:
Restarting R session...
> install.packages('doParallel')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/doParallel_1.0.16.tar.gz'
Content type 'application/x-tar' length 59776 bytes (58 KB)
==================================================
downloaded 58 KB
* installing *binary* package ‘doParallel’ ...
* DONE (doParallel)
The downloaded source packages are in
‘/tmp/RtmplDZYAT/downloaded_packages’
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> install.packages('microbenchmark')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/microbenchmark_1.4-7.tar.gz'
Content type 'application/x-tar' length 61382 bytes (59 KB)
==================================================
downloaded 59 KB
* installing *binary* package ‘microbenchmark’ ...
* DONE (microbenchmark)
The downloaded source packages are in
‘/tmp/RtmplDZYAT/downloaded_packages’
> library(microbenchmark)
>
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %do% sum(tanh(1:i)) 121.6417 126.5681 130.8152 129.7511 133.3043 171.6484 100
>
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
user system elapsed
0.126 0.000 0.126
>
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 117.6518 124.2508 127.9016 127.1467 129.9798 171.9952 100
Warning message:
executing %dopar% sequentially: no parallel backend registered
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.169 0.000 0.169
>
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
>
> # See number of cores
> getDoParWorkers()
[1] 14
>
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 262.9285 302.7655 340.1377 325.8734 359.3806 707.4004 100
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.136 0.176 0.313
>
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
>