Add new valuesets and expand existing enums #41
New schema modules added:
- bio/assays.yaml - Biological assay types (OBI-based)
- bio/expression_units.yaml - Gene expression measurement units
- bio/gene_perturbation.yaml - Gene perturbation methods
- bio/specimen_processing.yaml - Specimen processing methods
- bio/transplantation.yaml - Transplantation types and related terms
- data_catalog/ - Data catalog terms (access, contributor_roles, relations, resource_types)
- medical/imaging_platforms.yaml - Medical imaging platforms
- preservation/ - Digital preservation (events, fixity, digital_objects)
- publishing/osti_record.yaml - OSTI record types
Updates to existing enums:
- developmental_stages, structural_biology, uniprot_species
- pediatric oncology enums
- environmental health exposures
- energy/renewable enums
- fao_soil
Also includes ontology cache updates, generated code, and documentation.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Claude finished @cmungall's task
Pull Request Review
I've conducted a comprehensive review of this large PR that adds significant new value sets and expands existing enums. Overall, this is a well-structured addition that follows project conventions. Below are my findings:
✅ Strengths
1. Schema Organization & Structure
2. Ontology Integration
3. Documentation & Metadata
4. Code Generation
Pull request overview
This PR significantly expands the valuesets schema by adding multiple new domain-specific modules across biological research, data preservation, data catalogs, and publishing metadata, alongside updates to existing enumerations with corrected ontology mappings.
Key Changes:
- Addition of 12 new schema modules covering assays, expression units, gene perturbation, specimen processing, transplantation, imaging platforms, data catalog (access, contributor roles, relations, resource types), preservation (events, fixity, digital objects), and OSTI record metadata
- Correction of ontology term mappings across multiple existing enums including NCIT, ENVO, CHMO, and EDAM identifiers
- Updates to existing medical, energy, and environmental health domain schemas with improved standardization
Reviewed changes
Copilot reviewed 78 out of 88 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/valuesets/validators/oak_config.yaml | Extended ontology adapter configuration to support preservation vocabularies (PREMIS, DataCite, DCAT) and data catalog standards |
| src/valuesets/schema/valuesets.yaml | Added imports for 12 new schema modules spanning biological assays, preservation, and data catalog domains |
| src/valuesets/schema/publishing/osti_record.yaml | New comprehensive OSTI E-Link 2 metadata schema with workflow statuses, access limitations, and identifier types |
| src/valuesets/schema/preservation/*.yaml | Three new PREMIS-based schemas for preservation events, digital object metadata, and cryptographic hash functions |
| src/valuesets/schema/data_catalog/*.yaml | Four new DataCite/DCAT-based schemas for resource types, relations, contributor roles, and access rights |
| src/valuesets/schema/bio/*.yaml | Six new biological research schemas covering assays, expression units, gene perturbation, specimen processing, transplantation, and imaging platforms |
| src/valuesets/schema/medical/*.yaml | Corrected NCIT term mappings for neuroblastoma staging and pediatric oncology diagnosis categories |
| src/valuesets/schema/energy/*.yaml | Fixed ontology mappings for hydrogen, geothermal, and bioenergy enums |
| src/valuesets/schema/environmental_health/exposures.yaml | Added HHEAR exposure assessment enum with 24 environmental exposure categories |
| Generated Python enums | Auto-generated Python code for all new schemas with proper metadata and rich enum support |
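For readers unfamiliar with the pattern, an enum that carries ontology metadata can be sketched in plain Python as follows. This is an illustration only and not the code generated in this PR: the class `StructuralBiologyTerm`, the `PVMeta` dataclass, the titles, and the `from_meaning` helper are all hypothetical, and the three values are grouped together purely for the example. The value names and CURIEs are the ones quoted in the review comments on structural_biology.yaml.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class PVMeta:
    """Hypothetical metadata attached to one permissible value."""
    title: str
    meaning: Optional[str] = None  # ontology CURIE, or None if unmapped


class StructuralBiologyTerm(Enum):
    """Hypothetical enum; not the project's generated class."""
    NEUTRON_CRYSTALLOGRAPHY = PVMeta("Neutron crystallography", "CHMO:0000698")
    WAXS = PVMeta("Wide-angle X-ray scattering", "CHMO:0000207")
    ROTATING_ANODE = PVMeta("Rotating anode")  # no ontology mapping

    @property
    def meaning(self) -> Optional[str]:
        """Ontology CURIE for this value, if any."""
        return self.value.meaning

    @classmethod
    def from_meaning(cls, curie: str) -> "StructuralBiologyTerm":
        """Look up a member by its ontology CURIE."""
        for member in cls:
            if member.meaning == curie:
                return member
        raise KeyError(curie)
```

With a shape like this, `StructuralBiologyTerm.from_meaning("CHMO:0000698")` would return the `NEUTRON_CRYSTALLOGRAPHY` member, which is why a changed `meaning` is visible to any downstream code that looks values up by CURIE.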
Comments suppressed due to low confidence (10)
src/valuesets/schema/medical/pediatric_oncology/staging/neuroblastoma.yaml:1 - The titles for INRG stages have been changed from descriptive format (e.g., 'L1 - Localized, no IDRFs') to standard format ('INRG Stage L1'). While more standardized, the descriptive titles were more informative. Consider whether the additional context in the original titles was valuable for users.
src/valuesets/schema/medical/pediatric_oncology/staging/neuroblastoma.yaml:1 - The 'meaning' field has been removed from VERY_LOW, LOW, and INTERMEDIATE risk categories while retained for HIGH. This inconsistency suggests either incomplete ontology mapping or intentional distinction. Ensure all risk categories have ontology mappings if they exist in NCIT, or document why HIGH is the only mapped category.
src/valuesets/schema/bio/structural_biology.yaml:1 - The meaning for NUCLEIC_ACID changed from NCIT:C813 to NCIT:C706. The title also changed to 'Nucleic Acids' (plural). Verify that NCIT:C706 is the correct term and that the plural form is intentional, as this affects semantic consistency.
src/valuesets/schema/bio/structural_biology.yaml:1 - Multiple CHMO ontology term corrections throughout this file (NEUTRON_CRYSTALLOGRAPHY from CHMO:0000175 to CHMO:0000698, WAXS from CHMO:0000213 to CHMO:0000207, NEGATIVE_STAIN_EM from FBbi:00000568 to FBbi:00000399). While these corrections improve accuracy, they represent breaking changes for any downstream systems using these mappings. Consider documenting these mapping changes.
src/valuesets/schema/bio/structural_biology.yaml:1 - The 'meaning' field (previously CHMO:0001107) has been removed from ROTATING_ANODE, CCD, CMOS, PHOTOSTIMULABLE_PHOSPHOR, and several other enum values. This reduces semantic interoperability. Verify these terms lack appropriate ontology mappings or restore them if removed in error.
src/valuesets/schema/data_catalog/access.yaml:1 - The CONTINUOUS frequency value uses the generic 'dcterms:Frequency' as its meaning rather than a specific term. Consider using a more specific Dublin Core term if available, or document why the generic term is appropriate for this value.
src/valuesets/schema/energy/renewable/bioenergy.yaml:1 - SYNGAS changed from having 'meaning: CHMO:0001501' to 'close_mappings: CHMO:0001501', and added 'gasification' as an alias. This downgrade from exact to close mapping changes semantic precision. Document why this term is now considered a close rather than exact match.
src/valuesets/schema/energy/renewable/bioenergy.yaml:1 - The title for BUTANOL changed from 'Biobutanol' to 'butan-1-ol' (chemical name), with 'Biobutanol' moved to aliases. While the chemical name is more precise, 'Biobutanol' in the context of renewable energy may be more meaningful. Verify this naming change aligns with domain conventions.
src/valuesets/schema/earth_science/fao_soil.yaml:1 - Multiple ENVO ontology term corrections in this file (ANDOSOLS from ENVO:00002030 to ENVO:00002232, CRYOSOLS from ENVO:00000134 to ENVO:00002236, etc.). These represent significant mapping changes that could affect data integration. Ensure these corrections are documented and communicated to users.
src/valuesets/schema/bio/plant_sex.yaml:1 - The title for HERMAPHRODITIC moved from after the meaning to before it. This is a formatting change only and doesn't affect functionality, but maintaining consistent field ordering improves schema readability.
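Several of the comments above ask that changed or removed `meaning` mappings be documented. One way to produce such a record is to diff the value-to-CURIE mappings of the old and new schema versions. The sketch below assumes the mappings have already been extracted into plain dictionaries; `diff_meanings` is a hypothetical helper, not part of this repository, and the sample data is hand-copied from the comments above rather than read from the schema files.

```python
from typing import Dict, List, Optional, Tuple

Mapping = Dict[str, Optional[str]]
Change = Tuple[str, str, Optional[str], Optional[str]]


def diff_meanings(old: Mapping, new: Mapping) -> List[Change]:
    """Return (value, kind, old_curie, new_curie) for each differing mapping."""
    changes: List[Change] = []
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue
        if before is None:
            kind = "added"
        elif after is None:
            kind = "removed"
        else:
            kind = "changed"
        changes.append((name, kind, before, after))
    return changes


# Hand-written sample data using CURIEs quoted in the review comments.
old = {
    "NEUTRON_CRYSTALLOGRAPHY": "CHMO:0000175",
    "WAXS": "CHMO:0000213",
    "ROTATING_ANODE": "CHMO:0001107",
    "NUCLEIC_ACID": "NCIT:C813",
}
new = {
    "NEUTRON_CRYSTALLOGRAPHY": "CHMO:0000698",
    "WAXS": "CHMO:0000207",
    "ROTATING_ANODE": None,  # meaning removed in this PR
    "NUCLEIC_ACID": "NCIT:C706",
}

for name, kind, before, after in diff_meanings(old, new):
    print(f"{name}: {kind} ({before} -> {after})")
```

The resulting list could be pasted into release notes so that downstream users can see which identifiers changed and which were dropped.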
Summary
Test plan
- `just validate` to ensure schema validity
- `just test` to verify tests pass
🤖 Generated with Claude Code