HDF Data Loader is setup to support simultaneous loads by multiple python processes. And this functionality works as evidenced by debugging. However, the multi process support isn't leading to a speedup during training.
e.g. In the consensus sample, GPU utilization sits at around 80% using both 1 and 32 threads for data loader.