-
Notifications
You must be signed in to change notification settings - Fork 64
feat(myopencre): improve CSV import validation (export-compatible) #682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat(myopencre): improve CSV import validation (export-compatible) #682
Conversation
|
Hi! @northdpole Sir I spent a few hours validating this against the CSV produced by the CRE catalogue exporter and tried to keep the importer behavior as close to that format as possible. If any of the validation rules here feel too permissive, too strict, or misaligned with intended usage, I’d really appreciate feedback and guidance. |
|
|
||
| if file is None: | ||
| abort(400, "No file provided") | ||
| return ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! thanks for adding the validation.
One thing: this very helpful validation is a bit confusing and out of place in the "controllers" file.
Ideally this file only handles requests/responses. Why don't we move this validation to the spreadsheet parsers file in the appropriate function?
This way not only we get a bit clearer code but also we get a start on a spreadsheet validation utility we can eventually make into its own module.
For now just moving this validation logic into the appropriate function in the spreadsheet handlers module is enough 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent work! thanks for adding the validation. One thing: this very helpful validation is a bit confusing and out of place in the "controllers" file. Ideally this file only handles requests/responses. Why don't we move this validation to the spreadsheet parsers file in the appropriate function?
This way not only we get a bit clearer code but also we get a start on a spreadsheet validation utility we can eventually make into its own module. For now just moving this validation logic into the appropriate function in the spreadsheet handlers module is enough 👍
Thank you @northdpole , this makes complete sense — agreed 👍
You’re right that this validation logic doesn’t really belong in the controller layer. I’ll move the CSV validation into the appropriate spreadsheet parser / handler function so the controller is limited to request/response handling.
This should also give us a cleaner starting point for a reusable spreadsheet validation utility, as you suggested. I’ll refactor accordingly and update the PR shortly. Thanks again for the guidance!
- Validate file type, encoding, and required headers - Accept CSVs generated from CRE catalogue export - Skip empty and padding rows present in exported templates - Validate CRE format only when CRE references exist - Guard against misaligned rows with extra columns - Return structured validation errors before import This keeps the importer aligned with the exporter while preventing malformed inputs from causing server errors.
53b85b0 to
197653b
Compare
|
Thanks a lot @northdpole Sir for the thoughtful feedback — much appreciated 👍 I’ve addressed the concern about validation placement across this PR stack . What changed based on your feedback This keeps the controller focused strictly on request/response handling, while centralizing CSV validation in a place that’s easier to reason about, test, and evolve (with a clear path toward a standalone validation utility if needed). PR overview (for context) Each PR builds incrementally on the previous one, but validation itself now consistently happens at the parser level, not in the controllers. I’m definitely looking forward to improving the validation further — especially around clearer error semantics and extensibility as more CSV use-cases come in. Thanks again for the guidance, it helped clean up the design significantly. |
Please review this PR after the above PR is merged.
Note for reviewers
The intended change in this PR is limited to
application/web/web_main.py(CSV import validation logic).Any additional file diffs shown by GitHub are inherited from branch history / stacking and are not part of the validation change itself.
Please let me know if you’d prefer this PR to be rebased or split into a narrower diff.
Summary
This PR improves server-side validation for the MyOpenCRE CSV import endpoint, aligning it closely with the CSV format produced by the CRE catalogue export flow.
The intent is not to introduce strict or surprising validation rules, but to ensure that:
What this PR does
File-level validation
Schema / header validation
CRE*columnstandard|nameandstandard|idRow-level validation (export-compatible)
<CRE-ID>|<Name>)No-op import guard
What is intentionally ignored (by design)
This mirrors how production importers typically behave and allows exported files to be round-tripped without manual cleanup.
What is out of scope for this PR
These will be handled in follow-up PRs to keep this change focused and reviewable.
Why this approach
The importer now accepts everything the exporter produces, ignores exporter padding rows, and enforces validation only where semantic meaning exists.
This keeps the import process resilient while still preventing invalid data from entering the system.
Dependency note
This PR is stacked on top of:
feat(myopencre): add CSV upload UI and wire to existing import endpoint (#664)It should be reviewed and merged after that PR.
Feedback welcome
If any validation behavior is too permissive or too strict, or if there are exporter edge cases I may have missed, I’d really appreciate guidance and am happy to adjust.