Looks like Intel MPI has a bug when `MPI_Put` is called after `MPI_Get`, so the `work_stealing` load balancer uses `MPI_Accumulate` instead.