-
Notifications
You must be signed in to change notification settings - Fork 0
add support for stopping jobs #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This PR is part of a stack of 8 bookmarks:
Created with jj-stack |
olevski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments but otherwise things look good.
coman/src/main.rs
Outdated
| image, | ||
| command, | ||
| } => cli_cscs_job_start(script_file, image, command).await?, | ||
| cli::CscsJobCommands::Stop { job_id } => cli_cscs_job_stop(job_id).await?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment(non-blocking): If this results in the irrecoverable loss / removal of data probably stop is not a good name. Stop makes me think that whatever is running will stop but data is not lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this cancels the job. i.e. makes it go from Running to Non-Running. Data is not lost.
Actually the Firecrest API does not have a method to delete a job, as far as I can tell, which might be an issue where things get pretty cluttered over time. It is pretty weird that the HTTP verb DELETE on the job endpoint is for cancellation, I want to raise an issue with the firecrest people on this. Calling this on a job that is Finished or Failed gives an exception that the underlying Slurm job does not exist, it only works for running jobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh and the openapi spec summary says Delete Job Cancel for what it does, which is just gibberish 🙈
coman/src/cscs/api_client.rs
Outdated
| Ok(Some((job, job_metadata).into())) | ||
| } | ||
|
|
||
| pub async fn delete_stop(&self, system_name: &str, job_id: i64) -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: confusing name... maybe my other comment is more relevant... it may be important to distinguish stoping vs deletion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a typo and I was going for delete_job and renamed the wrong part to stop after I realized that the DELETE endpoint actually cancels... I'll rename it to cancel everywhere, I think that's the most clear. I originally only went for stop because it's less characters to type on the CLI 😆
ea1ef80 to
e4169f4
Compare
No description provided.