
feat: ✨ Align Citation generation based off latest citation schema (1.3.0) #172

Merged: slugb0t merged 2 commits into main from staging on Dec 10, 2025

@slugb0t (Member) commented Dec 10, 2025

Summary by Sourcery

Align citation generation and metadata handling with the updated citation schema by normalizing DOI values and avoiding redundant DOI URL storage.

New Features:

  • Normalize and extract bare DOI values from various identifier formats when generating citation metadata.

Bug Fixes:

  • Ensure DOI values in generated citation files are consistently stored as canonical DOI strings rather than resolver URLs or arbitrary identifiers.

Enhancements:

  • Avoid redundant DOI information in citation metadata by using the dedicated doi field instead of identifiers arrays where appropriate.

@fairdataihub-bot

Thank you for submitting this pull request! We appreciate your contribution to the project. Before we can merge it, we need to review the changes you've made to ensure they align with our code standards and meet the requirements of the project. We'll get back to you as soon as we can with feedback. Thanks again!

@sourcery-ai

sourcery-ai bot commented Dec 10, 2025

Reviewer's Guide

Aligns DOI handling and citation.cff generation with citation schema 1.3.0 by extracting clean DOI values from various identifier formats and using them consistently in both the API code-metadata handler and the metadata compliance bot.

File-Level Changes

Change: Normalize and extract a canonical DOI value from uniqueIdentifier for citation.cff generation in the code-metadata API handler.
Details:
  • Introduce a DOI extraction regex and helper logic to parse different DOI/URL formats from uniqueIdentifier.
  • Prefer extracting the bare DOI from resolver URLs (doi.org, dx.doi.org) and fall back to the raw uniqueIdentifier when no pattern matches.
  • Stop constructing full https://doi.org URLs for the doi field and instead set it only when a parsed DOI value is available.
  • Disable the identifiers block to avoid redundancy with the dedicated doi field.
Files: ui/server/api/[owner]/[repo]/code-metadata/index.post.ts

Change: Update metadata compliance-check DOI normalization to store a cleaned DOI value instead of a full resolver URL.
Details:
  • Add the same style of DOI regex and parsing logic to normalize identifier into a bare DOI where possible.
  • Handle resolver URLs (https://doi.org/...) by extracting and validating the DOI portion; otherwise attempt direct DOI matching, and fall back to the original identifier string if that fails.
  • Assign the normalized DOI value to citationFile.doi instead of always generating a https://doi.org URL, while still keeping the original identifier in codeMetaFile.identifier.
Files: bot/compliance-checks/metadata/index.js
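
For readers without the diff open, the normalization described above could look roughly like the sketch below. The actual DOI_REGEX and helper names are not shown in this conversation, so `extractDoi`, `RESOLVER_REGEX`, and both patterns here are assumptions for illustration only, not the merged code.

```js
// Hypothetical pattern (assumption): "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
const DOI_REGEX = /^10\.\d{4,9}\/\S+$/;

// Hypothetical pattern (assumption): doi.org or dx.doi.org resolver URLs; captures the path after the host.
const RESOLVER_REGEX = /^https?:\/\/(?:dx\.)?doi\.org\/(.+)$/i;

// Hypothetical helper: returns a bare DOI when one can be parsed, otherwise the raw identifier.
function extractDoi(identifier) {
  if (typeof identifier !== "string") return "";
  const trimmed = identifier.trim();
  if (trimmed === "") return "";

  // 1. Resolver URL: extract the DOI portion and validate it.
  const resolverMatch = trimmed.match(RESOLVER_REGEX);
  if (resolverMatch && DOI_REGEX.test(resolverMatch[1])) {
    return resolverMatch[1];
  }

  // 2. Bare DOI: use it as-is.
  if (DOI_REGEX.test(trimmed)) {
    return trimmed;
  }

  // 3. No pattern matched: fall back to the raw identifier.
  return trimmed;
}
```

Under these assumptions, `extractDoi("https://doi.org/10.5281/zenodo.1234567")` yields the bare `10.5281/zenodo.1234567`, while an identifier that is not a DOI passes through unchanged.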


@fairdataihub-bot

Thanks for making updates to your pull request. Our team will take a look and provide feedback as soon as possible. Please wait for any GitHub Actions to complete before editing your pull request. If you have any additional questions or concerns, feel free to let us know. Thank you for your contributions!

@what-the-diff

what-the-diff bot commented Dec 10, 2025

PR Summary

  • DOI handling in index.js:

    • Uses a regular expression to validate DOI formats.
    • Handles both DOI resolver URLs and bare DOI identifiers.
    • Renames the variable doiUrl to doiValue and assigns it to citationFile.doi.
  • DOI extraction in index.post.ts:

    • Adds logic to extract the DOI from the codeMetadataRecord.
    • Validates DOIs with the same regular expression.
    • Keeps the raw identifier value when no pattern matches, preserving the previous behavior for unmatched identifiers.
    • Changes how the DOI is added to the citationCFF object so that it is not repeated and comes from the new extraction logic.
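
As a rough illustration of the last point, the sketch below shows one way a citation object could carry the DOI only in the dedicated doi field. The function name `applyDoi` and the sample object shape are hypothetical; the real citationCFF construction in index.post.ts is not shown in this conversation.

```js
// Hypothetical helper (assumption): returns a copy of the citation object with the
// DOI stored in the dedicated `doi` field, and no `identifiers` entry for it.
function applyDoi(citationCff, doiValue) {
  const result = { ...citationCff };

  // Set `doi` only when a parsed value is available, so the key is absent otherwise.
  if (doiValue) {
    result.doi = doiValue;
  }

  // The `identifiers` array is deliberately not populated here:
  // repeating the same DOI there would be redundant with `doi`.
  return result;
}

const withDoi = applyDoi({ title: "Example software" }, "10.5281/zenodo.1234567");
const withoutDoi = applyDoi({ title: "Example software" }, "");
```

Here `withDoi.doi` holds the bare DOI string rather than a resolver URL, and `withoutDoi` has no `doi` key at all.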

@slugb0t slugb0t merged commit 06de7a0 into main Dec 10, 2025
4 of 5 checks passed
@fairdataihub-bot

Thanks for closing this pull request! If you have any further questions, please feel free to open a new issue. We are always happy to help!

@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes - here's some feedback:

  • The DOI extraction logic and DOI_REGEX are duplicated in both the API handler and the compliance-checks script; consider extracting a shared utility to keep the behavior and pattern consistent and easier to maintain.
  • In code-metadata/index.post.ts, the comment above the (disabled) identifiers block now mixes a JSDoc opening /** with a line comment // Note: ...; converting this to a single coherent block comment or a plain line comment would improve readability.
  • In updateMetadataIdentifier, when identifierString is empty citationFile.doi is set to an empty string; if the intention is to omit the DOI in that case, consider leaving citationFile.doi undefined instead of assigning an empty value.

## Individual Comments

### Comment 1
<location> `bot/compliance-checks/metadata/index.js:678-680` </location>
<code_context>
+      }
     }
-    citationFile.doi = doiUrl;
+    citationFile.doi = doiValue;
     citationFile["date-released"] = updated_date;
     citationFile.version = zenodoMetadata?.zenodo_metadata?.version;
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Avoid assigning an empty string to `citationFile.doi` when no identifier is provided.

With the current logic, a falsy `identifier` leaves `doiValue` as `""`, so `citationFile.doi` is explicitly set to an empty string. Instead, consider only setting the field when `doiValue` is truthy, e.g.:

```js
if (doiValue) {
  citationFile.doi = doiValue;
}
```

This prevents persisting an empty DOI and keeps the field’s presence meaningful.

```suggestion
    }
    if (doiValue) {
      citationFile.doi = doiValue;
    }
    citationFile["date-released"] = updated_date;
```
</issue_to_address>


@slugb0t slugb0t deleted the staging branch December 10, 2025 23:42