Conversation
|
I am in favor of the proposed changes! |
|
I am also in favor. I would just like to know downstream complications you mught expect. Would this require changes for atomate2 itself? |
|
This looks very nice @gpetretto, thanks for tackling it. I need to spend some time to sit down and go through the changes and what they might mean. But I hope to get time early next week. |
Great, thanks! Let me know if you have any doubt or you want to have a discussion on some aspects of the PR.
Sorry for the delay. I have spent some time checking the atomate2 tests and potential downsides.
However, there are two tests that fail due to a change in the behaviour in the current PR. One example is the Gruneises workflow, due to The problem here is that the Another issue that emerged, unrelated from the atomate2 tests, is how to deal with the Flow output when some of its jobs fail. Since there may be jobs with Concerning the breaking change of the Overall, we think that some minor downsides may be worth for the addition of these functionalities. |
Partially related to #842, but aiming at more general modification of the code.
On top of the issues described in #842, in general there are a few issues that users have pointed out
generate_frequencies_eigenvectors). For users that are not very familiar with the whole internal structure of the Flow this results to be unintuitive and requires knowing the name of the last job for the different kind of Flows.After discussing with @davidwaroquiers we though that one possible way to address the previous issues would be to introduce an output document representing a Flow output. So, in this draft we attempt to introduce this new entity. Here are the main changes:
indexattribute is added to the Flow.JobStoreDocumentwhen all the contained Jobs are processed. Stored in the standard JobStore.store_inputsoutput document).make()are added as an attribute to the Flow object and are then stored in the Flow output document.replacecontaining a Flow, there is nostore_inputsJob anymore, but the replacing Flow gets the uuid of the original Job and index=job-index+1. The Flow output takes the role of thestore_inputs.run_locallyfor testing purposes. Tested it with several combinations of Flows/Jobs/Makers.So, to make an example, consider the following code:
After everything is completed, the output store will contain an output from the
addjob, with the numerical output, and another document corresponding to the uuid of the Flowf. The latter contains a reference to the output of the jobjand the details of the Maker object. The advantage here is that a user can fetch the results as an output of a Flow, without having to know the actual internal structure of the Flow and also have the inputs used to generate it.Currently, this approach does not really solve problem 1 in #842, but I think that there are options to use the Flow output to solve it in a similar way to how it is done in #843.
There are of course several critical points in this proposal:
index, thehostsattribute of Job and Flow is not a list of uuids anymore, but a list of uuid+index pairs.We are aware that such a change would not be a trivial one, but we thought that the potential improvements in the user experience would be worth opening this PR so that it can be better discussed.
@utf @JaGeo @Andrew-S-Rosen What do you think?