From 3dca49173093e462b706a4a7143e436dce773d9f Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 16 Sep 2025 13:18:28 -0600 Subject: [PATCH 01/28] enh(blog): Add blog post on generative AI peer review policy This blog post outlines pyOpenSci's new peer review policy regarding the use of generative AI tools in scientific software, emphasizing transparency, ethical considerations, and the importance of human oversight in the review process. --- .../2025-09-16-generative-ai-peer-review.md | 139 ++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 _posts/2025-09-16-generative-ai-peer-review.md diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md new file mode 100644 index 00000000..381d0001 --- /dev/null +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -0,0 +1,139 @@ +--- +layout: single +title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy" +excerpt: "Generative AI tools are making is easier to generate large amounts of code which in some cases is causing a strain on volunteer peer review programs like ours. Learn about pyOpenSci's policy on generative AI in peer review in this blog post." +author: "pyopensci" +permalink: /blog/generative-ai-peer-review-policy.html +header: + overlay_image: images/headers/pyopensci-floral.png +categories: + - blog-post + - community +classes: wide +toc: true +comments: true +last_modified: 2025-09-16 +--- + +authors: Leah Wasser, Mandy Moore, + +## Generative AI meets scientific open source + +It has been suggested that for some developers, using AI tools for tasks can increase efficiency by as much as 55%. But in open source scientific software, speed isn't everything—transparency, quality, and community trust matter just as much. So do the ethical questions these tools raise. + +**Edit this.** Whatever breakout content we want here.... needs to be all on a single line. +{: .notice--success} + + +## Why we need guidelines + +At [pyOpenSci](https://www.pyopensci.org/), we’ve drafted a new policy for our peer review process to set clear expectations around disclosing use of LLMs in scientific software packages. + +This is not about banning AI tools. We recognize their value to some. Instead, our goal is transparency. We want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. + +## Our Approach: Transparency and Disclosure + +We know that people will continue to use LLMs. We also know they can meaningfully increase productivity and lower barriers to contribution for some. We also know that there are significant ethical, societal and other challenges that come with the development and use of LLM’s. + +Our community’s expectation is simple: **be open about it**. + +* Disclose LLM use in your README and at the top of relevant modules. +* Describe how the tools were used +* Be clear about what human review you performed. + +Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most. + +### Human oversight + +LLM-assisted code must be **reviewed, edited, and tested by humans** before submission. + +* Run tests and confirm correctness. +* Check for security and quality issues. +* Ensure style, readability, and clear docstrings. +* Explain your review process in your software submission to pyOpenSci. + +Please don’t offload vetting to volunteer reviewers. 
Arrive with human-reviewed code that you understand, have tested, and can maintain. + +### Licensing awareness + +LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3). + +* Acknowledge potential license ambiguity in your disclosure. +* Avoid pasting verbatim outputs that resemble known copyrighted code. +* Prefer human-edited, transformative outputs you fully understand. + +We can’t control upstream model training data, but we can be cautious, explicit and critical about our usage. + +### Ethics and inclusion + +LLM outputs can reflect and amplify bias in training data. In documentation and tutorials, that bias can harm the very communities we want to support. + +* Review AI-generated text for stereotypes or exclusionary language. +* Prefer plain, inclusive language. +* Invite feedback and review from diverse contributors. + +Inclusion is part of quality. Treat AI-generated text with the same care as code. + +## Supporting volunteer peer review + +Peer review runs on **volunteer time**. Rapid, AI-assisted submissions can overwhelm reviewers—especially when code hasn’t been vetted. + +* Submit smaller PRs with clear scopes. +* Summarize changes and provide test evidence. +* Flag AI-assisted sections so reviewers know where to look closely. +* Be responsive to feedback, especially on AI-generated code. + +These safeguards protect human capacity so high-quality packages can move through review efficiently. + +## Benefits and opportunities + +LLMs are already helping developers: + +* Explaining complex codebases +* Generating unit tests and docstrings +* In some cases, simplifying language barriers for participants in open source around the world +* Speeding up everyday workflows + +For some contributors, these tools make open source more accessible. + +## Challenges we must address + +### Overloaded peer review + +Peer review relies on volunteers. LLMs can produce large volumes of code quickly, increasing submissions with content that may not have been carefully reviewed by a human before reaching our review system. + +### Ethical and legal complexities + +LLMs are often trained on copyrighted or licensed material. Outputs may create conflicts when used in projects under different licenses. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities. + +### Bias and equity concerns + +AI-generated text can perpetuate bias. When it appears in documentation or tutorials, it can alienate the very groups open source most needs to welcome. + +### Environmental impacts + +Training and running LLMs [requires massive energy consumption](https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/), raising sustainability concerns that sit uncomfortably alongside much of the scientific research our community supports. + +### Impact on learning + +Heavy reliance on LLMs risks producing developers who can prompt, but not debug or maintain, code—undermining long-term project sustainability and growth. + +## What you can do now + +* **Be transparent.** Disclose LLM use in your README and modules. +* **Be accountable.** Thoroughly review, test, and edit AI-assisted code. +* **Be license-aware.** Note uncertainties and avoid verbatim look-alikes. +* **Be inclusive.** Check AI-generated docs for bias and clarity. 
+* **Be considerate.** Respect volunteer reviewers’ time. + + +
+## Join the conversation + +This policy is just the beginning. As AI continues to evolve, so will our practices. We invite you to: + +👉 Read the full draft policy +👉 Share your feedback and help us shape how the scientific Python community approaches AI in open source. + +The conversation is only starting, and your voice matters. +
From bfb4547b3f6c2104dc2f5f5801550bc26759529d Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 23 Sep 2025 15:08:45 -0600 Subject: [PATCH 02/28] Apply suggestion from @jedbrown Co-authored-by: Jed Brown --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 381d0001..8f2a48c8 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -1,7 +1,7 @@ --- layout: single title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy" -excerpt: "Generative AI tools are making is easier to generate large amounts of code which in some cases is causing a strain on volunteer peer review programs like ours. Learn about pyOpenSci's policy on generative AI in peer review in this blog post." +excerpt: "Generative AI products are reducing the effort and skill necessary to generate large amounts of code, which in some cases is causing a strain on volunteer peer review programs like ours. Learn about pyOpenSci's policy on generative AI in peer review in this blog post." author: "pyopensci" permalink: /blog/generative-ai-peer-review-policy.html header: From d72239f6ccd5253ca0c266339a5c340a0c047b60 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 23 Sep 2025 16:46:47 -0600 Subject: [PATCH 03/28] Apply suggestion from @lwasser --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 8f2a48c8..e6c873fa 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -35,7 +35,7 @@ This is not about banning AI tools. We recognize their value to some. Instead, o We know that people will continue to use LLMs. We also know they can meaningfully increase productivity and lower barriers to contribution for some. We also know that there are significant ethical, societal and other challenges that come with the development and use of LLM’s. -Our community’s expectation is simple: **be open about it**. +Our community’s expectation is simple: **be open about and disclose any generative AI use in your package**. * Disclose LLM use in your README and at the top of relevant modules. 
* Describe how the tools were used From 9e74f7c6006f5873a2d0f349b30a6133f3f59127 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 23 Sep 2025 16:55:24 -0600 Subject: [PATCH 04/28] Apply suggestion from @jedbrown Co-authored-by: Jed Brown --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index e6c873fa..3fd18300 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -87,7 +87,7 @@ These safeguards protect human capacity so high-quality packages can move throug ## Benefits and opportunities -LLMs are already helping developers: +LLMs are already perceived as helping developers: * Explaining complex codebases * Generating unit tests and docstrings From 44319c8a03acc8ba22457c099e951ea77d30e564 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 23 Sep 2025 16:55:49 -0600 Subject: [PATCH 05/28] Apply suggestion from @jedbrown Co-authored-by: Jed Brown --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 3fd18300..3758881e 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -94,7 +94,7 @@ LLMs are already perceived as helping developers: * In some cases, simplifying language barriers for participants in open source around the world * Speeding up everyday workflows -For some contributors, these tools make open source more accessible. +Some contributors perceive these products as making open source more accessible. ## Challenges we must address From 616e9068119cbd49f659a41f0e1f841f3fad3906 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:40:55 -0700 Subject: [PATCH 06/28] Update _posts/2025-09-16-generative-ai-peer-review.md Co-authored-by: Jed Brown --- _posts/2025-09-16-generative-ai-peer-review.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 3758881e..04c90a8c 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -56,7 +56,9 @@ Please don’t offload vetting to volunteer reviewers. Arrive with human-reviewe ### Licensing awareness -LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3). +LLMs are trained on source code and documents with many licenses, most of which require attribution/preservation of a copyright notice (possibly in addition to other terms). LLM outputs sometimes produce verbatim or near-verbatim copies of [code](https://githubcopilotlitigation.com/case-updates.html) or [prose](https://arxiv.org/abs/2505.12546) from the training data, but with attribution stripped. Without attribution, such instances constitute a derivative work that violates the license, thus are likely to be copyright infringement and are certainly plagiarism. 
Copyright infringement and plagiarism are issues of process, not merely of the final artifact, so it is difficult to prescribe a reliable procedure for due diligence when working with LLM output, short of assuming that such output is always tainted and thus the generated code or derivative works can never come into the code base. We recognize that many users of LLM products for software development would consider such diligence impractical. + +If similarities with existing software is detected **and** the licenses are compatible, one can come into compliance with the license by complying with its terms, such as by adding attribution. When the source package has an [incompatible license](https://dwheeler.com/essays/floss-license-slide.html), there is no simple fix. For example, if LGPL-2.1 code is emitted by an LLM into an Apache-2.0 project, no amount of attribution or license changes can bring the project into compliance. The Apache-2.0 project cannot even relicense to LGPL-2.1 without consent from every contributor (or their copyright holder). In such cases, the project would be responsible for deleting all implicated code and derivative works, and rewriting it all using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design). * Acknowledge potential license ambiguity in your disclosure. * Avoid pasting verbatim outputs that resemble known copyrighted code. From a0a86717a50c6a66709b3ecdda60501f70438066 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:41:57 -0700 Subject: [PATCH 07/28] Apply suggestions from code review --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 04c90a8c..9047fbfa 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -19,7 +19,7 @@ authors: Leah Wasser, Mandy Moore, ## Generative AI meets scientific open source -It has been suggested that for some developers, using AI tools for tasks can increase efficiency by as much as 55%. But in open source scientific software, speed isn't everything—transparency, quality, and community trust matter just as much. So do the ethical questions these tools raise. +Some developers believe that using AI products increases efficiency. However, in scientific open-source, speed isn't everything—transparency, quality, and community trust are just as important. Similarly, the ethical questions that these tools raise are also a concern. **Edit this.** Whatever breakout content we want here.... needs to be all on a single line. {: .notice--success} From 7763095c856c64a32115287d5d3459af0a0d46b8 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:43:38 -0700 Subject: [PATCH 08/28] Apply suggestion from @jedbrown Co-authored-by: Jed Brown --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 9047fbfa..58f5e73c 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -33,7 +33,7 @@ This is not about banning AI tools. We recognize their value to some. Instead, o ## Our Approach: Transparency and Disclosure -We know that people will continue to use LLMs. 
We also know they can meaningfully increase productivity and lower barriers to contribution for some. We also know that there are significant ethical, societal and other challenges that come with the development and use of LLM’s. +We acknowledge that social and ethical norms and concern for environmental and societal externalities varies greatly across the community, and yet few members of the community will look to pyOpenSci for guidance on whether to use LLMs in their own work. Our focus thus centers on assisting with informed decision-making and consent with respect to LLM use in the submission, reviewing, and editorial process. Our community’s expectation is simple: **be open about and disclose any generative AI use in your package**. From ebf9ebbd4f64753ea5f097d2f115582bd034a466 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:44:14 -0700 Subject: [PATCH 09/28] Apply suggestion from @lwasser --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 58f5e73c..f98757b9 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -49,7 +49,7 @@ LLM-assisted code must be **reviewed, edited, and tested by humans** before subm * Run tests and confirm correctness. * Check for security and quality issues. -* Ensure style, readability, and clear docstrings. +* Ensure style, readability, and concise docstrings. * Explain your review process in your software submission to pyOpenSci. Please don’t offload vetting to volunteer reviewers. Arrive with human-reviewed code that you understand, have tested, and can maintain. From 81e7b2ce36a5e02496a6aa505237a5682aa7bf23 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:48:54 -0700 Subject: [PATCH 10/28] Apply suggestion from @lwasser --- _posts/2025-09-16-generative-ai-peer-review.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index f98757b9..08954f2c 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -60,8 +60,8 @@ LLMs are trained on source code and documents with many licenses, most of which If similarities with existing software is detected **and** the licenses are compatible, one can come into compliance with the license by complying with its terms, such as by adding attribution. When the source package has an [incompatible license](https://dwheeler.com/essays/floss-license-slide.html), there is no simple fix. For example, if LGPL-2.1 code is emitted by an LLM into an Apache-2.0 project, no amount of attribution or license changes can bring the project into compliance. The Apache-2.0 project cannot even relicense to LGPL-2.1 without consent from every contributor (or their copyright holder). In such cases, the project would be responsible for deleting all implicated code and derivative works, and rewriting it all using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design). -* Acknowledge potential license ambiguity in your disclosure. -* Avoid pasting verbatim outputs that resemble known copyrighted code. +* Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts. 
+* Be aware that LLM products can potentially return copyrighted code verbatim in some cases. Avoid pasting verbatim outputs from an LLM into your package. Rather, if you use LLMs in your work, carefully review, edit, and modify the content, and * Prefer human-edited, transformative outputs you fully understand. We can’t control upstream model training data, but we can be cautious, explicit and critical about our usage. From cd1abf4f10268ea9816d37cd8d9432da261ed2ce Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:49:38 -0700 Subject: [PATCH 11/28] Apply suggestion from @lwasser --- _posts/2025-09-16-generative-ai-peer-review.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 08954f2c..803ffcec 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -98,6 +98,10 @@ LLMs are already perceived as helping developers: Some contributors perceive these products as making open source more accessible. +### Incorrectness of LLMs and misleading time benefits + +Although it is commonly stated that LLMs help improve the productivity of high-level developers, recent scientific explorations of this hypothesis indicate the contrary (see https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ for an excellent discussion on this). What's more is that the responses of LLM's for complex coding tasks tend to be incorrect (e.g., https://arxiv.org/html/2407.06153v1). Therefore, it is crucial that, if an LLM is used to help produce code, that the correctness of the code is evaluated separately from the LLM. + ## Challenges we must address ### Overloaded peer review From 8c81af97581e2b1baeb081173a8ba57605c68c2e Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:49:59 -0700 Subject: [PATCH 12/28] Apply suggestion from @jedbrown Co-authored-by: Jed Brown --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index 803ffcec..fee72ba8 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -110,7 +110,7 @@ Peer review relies on volunteers. LLMs can produce large volumes of code quickly ### Ethical and legal complexities -LLMs are often trained on copyrighted or licensed material. Outputs may create conflicts when used in projects under different licenses. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities. +LLMs are often trained on copyrighted material with varying (or no) licenses. Outputs may constitute copyright infringement and/or ethical violations such as plagiarism. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities. 
### Bias and equity concerns From 43d941644ce8afd85ee5c86d23d1dc73887413d7 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 10:50:50 -0700 Subject: [PATCH 13/28] Apply suggestion from @lwasser --- _posts/2025-09-16-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md index fee72ba8..75bb22b9 100644 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ b/_posts/2025-09-16-generative-ai-peer-review.md @@ -15,7 +15,7 @@ comments: true last_modified: 2025-09-16 --- -authors: Leah Wasser, Mandy Moore, +authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams ## Generative AI meets scientific open source From 468af7c75e7160560d43e65c5204f442d1bdd065 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 18 Nov 2025 12:09:37 -0700 Subject: [PATCH 14/28] enh: more edits and updates --- .../2025-09-16-generative-ai-peer-review.md | 145 ----------------- .../2025-11-18-generative-ai-peer-review.md | 146 ++++++++++++++++++ 2 files changed, 146 insertions(+), 145 deletions(-) delete mode 100644 _posts/2025-09-16-generative-ai-peer-review.md create mode 100644 _posts/2025-11-18-generative-ai-peer-review.md diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md deleted file mode 100644 index 75bb22b9..00000000 --- a/_posts/2025-09-16-generative-ai-peer-review.md +++ /dev/null @@ -1,145 +0,0 @@ ---- -layout: single -title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy" -excerpt: "Generative AI products are reducing the effort and skill necessary to generate large amounts of code, which in some cases is causing a strain on volunteer peer review programs like ours. Learn about pyOpenSci's policy on generative AI in peer review in this blog post." -author: "pyopensci" -permalink: /blog/generative-ai-peer-review-policy.html -header: - overlay_image: images/headers/pyopensci-floral.png -categories: - - blog-post - - community -classes: wide -toc: true -comments: true -last_modified: 2025-09-16 ---- - -authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams - -## Generative AI meets scientific open source - -Some developers believe that using AI products increases efficiency. However, in scientific open-source, speed isn't everything—transparency, quality, and community trust are just as important. Similarly, the ethical questions that these tools raise are also a concern. - -**Edit this.** Whatever breakout content we want here.... needs to be all on a single line. -{: .notice--success} - - -## Why we need guidelines - -At [pyOpenSci](https://www.pyopensci.org/), we’ve drafted a new policy for our peer review process to set clear expectations around disclosing use of LLMs in scientific software packages. - -This is not about banning AI tools. We recognize their value to some. Instead, our goal is transparency. We want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. - -## Our Approach: Transparency and Disclosure - -We acknowledge that social and ethical norms and concern for environmental and societal externalities varies greatly across the community, and yet few members of the community will look to pyOpenSci for guidance on whether to use LLMs in their own work. 
Our focus thus centers on assisting with informed decision-making and consent with respect to LLM use in the submission, reviewing, and editorial process. - -Our community’s expectation is simple: **be open about and disclose any generative AI use in your package**. - -* Disclose LLM use in your README and at the top of relevant modules. -* Describe how the tools were used -* Be clear about what human review you performed. - -Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most. - -### Human oversight - -LLM-assisted code must be **reviewed, edited, and tested by humans** before submission. - -* Run tests and confirm correctness. -* Check for security and quality issues. -* Ensure style, readability, and concise docstrings. -* Explain your review process in your software submission to pyOpenSci. - -Please don’t offload vetting to volunteer reviewers. Arrive with human-reviewed code that you understand, have tested, and can maintain. - -### Licensing awareness - -LLMs are trained on source code and documents with many licenses, most of which require attribution/preservation of a copyright notice (possibly in addition to other terms). LLM outputs sometimes produce verbatim or near-verbatim copies of [code](https://githubcopilotlitigation.com/case-updates.html) or [prose](https://arxiv.org/abs/2505.12546) from the training data, but with attribution stripped. Without attribution, such instances constitute a derivative work that violates the license, thus are likely to be copyright infringement and are certainly plagiarism. Copyright infringement and plagiarism are issues of process, not merely of the final artifact, so it is difficult to prescribe a reliable procedure for due diligence when working with LLM output, short of assuming that such output is always tainted and thus the generated code or derivative works can never come into the code base. We recognize that many users of LLM products for software development would consider such diligence impractical. - -If similarities with existing software is detected **and** the licenses are compatible, one can come into compliance with the license by complying with its terms, such as by adding attribution. When the source package has an [incompatible license](https://dwheeler.com/essays/floss-license-slide.html), there is no simple fix. For example, if LGPL-2.1 code is emitted by an LLM into an Apache-2.0 project, no amount of attribution or license changes can bring the project into compliance. The Apache-2.0 project cannot even relicense to LGPL-2.1 without consent from every contributor (or their copyright holder). In such cases, the project would be responsible for deleting all implicated code and derivative works, and rewriting it all using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design). - -* Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts. -* Be aware that LLM products can potentially return copyrighted code verbatim in some cases. Avoid pasting verbatim outputs from an LLM into your package. Rather, if you use LLMs in your work, carefully review, edit, and modify the content, and -* Prefer human-edited, transformative outputs you fully understand. - -We can’t control upstream model training data, but we can be cautious, explicit and critical about our usage. - -### Ethics and inclusion - -LLM outputs can reflect and amplify bias in training data. 
In documentation and tutorials, that bias can harm the very communities we want to support. - -* Review AI-generated text for stereotypes or exclusionary language. -* Prefer plain, inclusive language. -* Invite feedback and review from diverse contributors. - -Inclusion is part of quality. Treat AI-generated text with the same care as code. - -## Supporting volunteer peer review - -Peer review runs on **volunteer time**. Rapid, AI-assisted submissions can overwhelm reviewers—especially when code hasn’t been vetted. - -* Submit smaller PRs with clear scopes. -* Summarize changes and provide test evidence. -* Flag AI-assisted sections so reviewers know where to look closely. -* Be responsive to feedback, especially on AI-generated code. - -These safeguards protect human capacity so high-quality packages can move through review efficiently. - -## Benefits and opportunities - -LLMs are already perceived as helping developers: - -* Explaining complex codebases -* Generating unit tests and docstrings -* In some cases, simplifying language barriers for participants in open source around the world -* Speeding up everyday workflows - -Some contributors perceive these products as making open source more accessible. - -### Incorrectness of LLMs and misleading time benefits - -Although it is commonly stated that LLMs help improve the productivity of high-level developers, recent scientific explorations of this hypothesis indicate the contrary (see https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ for an excellent discussion on this). What's more is that the responses of LLM's for complex coding tasks tend to be incorrect (e.g., https://arxiv.org/html/2407.06153v1). Therefore, it is crucial that, if an LLM is used to help produce code, that the correctness of the code is evaluated separately from the LLM. - -## Challenges we must address - -### Overloaded peer review - -Peer review relies on volunteers. LLMs can produce large volumes of code quickly, increasing submissions with content that may not have been carefully reviewed by a human before reaching our review system. - -### Ethical and legal complexities - -LLMs are often trained on copyrighted material with varying (or no) licenses. Outputs may constitute copyright infringement and/or ethical violations such as plagiarism. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities. - -### Bias and equity concerns - -AI-generated text can perpetuate bias. When it appears in documentation or tutorials, it can alienate the very groups open source most needs to welcome. - -### Environmental impacts - -Training and running LLMs [requires massive energy consumption](https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/), raising sustainability concerns that sit uncomfortably alongside much of the scientific research our community supports. - -### Impact on learning - -Heavy reliance on LLMs risks producing developers who can prompt, but not debug or maintain, code—undermining long-term project sustainability and growth. - -## What you can do now - -* **Be transparent.** Disclose LLM use in your README and modules. -* **Be accountable.** Thoroughly review, test, and edit AI-assisted code. -* **Be license-aware.** Note uncertainties and avoid verbatim look-alikes. -* **Be inclusive.** Check AI-generated docs for bias and clarity. -* **Be considerate.** Respect volunteer reviewers’ time. - - -
-## Join the conversation - -This policy is just the beginning. As AI continues to evolve, so will our practices. We invite you to: - -👉 Read the full draft policy -👉 Share your feedback and help us shape how the scientific Python community approaches AI in open source. - -The conversation is only starting, and your voice matters. -
diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md new file mode 100644 index 00000000..97e2a091 --- /dev/null +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -0,0 +1,146 @@ +--- +layout: single +title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy" +excerpt: "Generative AI products are reducing the effort and skill necessary to generate large amounts of code. In some cases, this strains volunteer peer review programs like ours. Learn about pyOpenSci's approach to developing a Generative AI policy for our software peer review program." +author: "pyopensci" +permalink: /blog/generative-ai-peer-review-policy.html +header: + overlay_image: images/headers/pyopensci-floral.png +categories: + - blog-post + - community +classes: wide +toc: true +comments: true +last_modified: 2025-09-16 +--- + +authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams + +## Generative AI meets scientific open source + +Some developers believe that using AI products increases efficiency. However, in scientific open source, speed isn't everything—transparency, quality, and community trust are just as important as understanding the environmental impacts of using large language models in our everyday work. Similarly, the ethical questions that these tools raise are also a concern as some communities may benefit from the same tools that hurt others. + +## Why we need guidelines + +At pyOpenSci, we’ve drafted a new policy for our peer review process to set clear expectations for disclosing the use of LLMs in scientific open-source software. + +This is not about banning AI tools. We recognize their value to some people. Instead, our goal is transparency. We want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. Further, we want to avoid burdening our volunteer editorial and reviewer team with being the first to review generated code. + +## A complex topic: Benefits and concerns + +LLMs are perceived as helping developers: + +* Explain complex codebases +* Generate unit tests and docstrings +* In some cases, simplifying language barriers for participants in open source around the world +* Speeding up everyday workflows + +Some contributors also perceive these products as making open source more accessible. However, LLM's also present +unprecedented social and environmental challenges. + +### Incorrectness of LLMs and misleading time benefits + +Although it is commonly stated that LLMs help improve the productivity of high-level developers, recent scientific explorations of this hypothesis [indicate the contrary](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). What's more is that the responses of LLMs for complex coding tasks [tend to be incorrect](https://arxiv.org/html/2407.06153v1) and/or overly verbose/inefficient. It is crucial that, if you use an LLM to help produce code, you should independently evaluate code correctness and efficiency. + +### Environmental impacts + +Training and running LLMs [requires massive energy consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), raising sustainability concerns that sit uncomfortably alongside much of the global scale scientific research that our community supports. 
+ +### Impact on learning + +Heavy reliance on LLMs risks producing developers who can prompt, but not debug or maintain, code—undermining long-term project sustainability and growth. This also in the long run will make it [harder for young developers to learn how to code, and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/). + +> We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future. *Hamsa Bastani* + +### Ethics and inclusion + +LLM outputs can reflect and amplify bias in training data. In documentation and tutorials, that bias can harm the very communities we want to support. + +## Our Approach: Transparency and Disclosure + +We acknowledge that social and ethical norms, as well as concerns about environmental and societal impacts, vary widely across the community. We are not here to judge anyone who uses or doesn't use LLMs. Our focus centers on supporting informed decision-making and consent regarding LLM use in the pyOpenSci software submission, review, and editorial process. + +Our community’s expectation is simple: **be open about and disclose any Generative AI use in your package** when you submit it to our open software review process. + +* Disclose LLM use in your README and at the top of relevant modules. +* Describe how the Generative AI tools were used in your package's development. +* Be clear about what human review you performed on Generative AI outputs before submitting the package to our open peer review process. + +Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most. + +### Human oversight + +LLM-assisted code must be **reviewed, edited, and tested by humans** before submission. + +* Run your tests and confirm the correctness of the code that you submitted. +* Check for security and quality issues. +* Ensure style, readability, and concise docstrings. +* Explain your review process in your software submission to pyOpenSci. + +Please **don’t offload vetting of generative AI content to volunteer reviewers**. Arrive with human-reviewed code that you understand, have tested, and can maintain. + +### Watch out for licensing issues. + +LLMs are trained on large amounts of open source code; most of that code has licenses that require attribution. +The problem? LLMs sometimes spit out near-exact copies of that training data, but without any attribution or copyright notices. + +Why this matters: + +* Using LLM output verbatim could violate the original code's license +* You might accidentally commit plagiarism or copyright infringement by using that output verbatim in your code +* Due diligence is nearly impossible since you can't trace what the LLM "learned from" (most LLM's are black boxes) + +When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements. + +While this is all tricky, here's what you can do, now: + +*Prefer human-edited, transformative outputs you fully understand* + +* Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts. 
+* Be aware that LLM products can potentially return copyrighted code verbatim. **Don't paste LLM outputs directly into your code**. Instead, review, edit, and transform anything an LLM gives you. Consider using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design) to achieve this. +* **Make sure you fully understand the code before using it:** This is actually in your best interest because you can learn a lot about programming by asking an LLM questions and reviewing the output critically. + +You can't control what's in training data, but you can be thoughtful about how you use these tools. + +
Examples of how these licensing issues are impacting and stressing our legal systems:

* [GitHub Copilot litigation](https://githubcopilotlitigation.com/case-updates.html)
* [Litigation around text from LLMs](https://arxiv.org/abs/2505.12546)
* [Incompatible licenses](https://dwheeler.com/essays/floss-license-slide.html)
+ +### Review for bias + +Inclusion is part of quality. Treat AI-generated text with the same care as code. +Given the known biases that can manifest in Generative AI-derived text: + +* Review AI-generated text for stereotypes or exclusionary language. +* Prefer plain, inclusive language. +* Invite feedback and review from diverse contributors. + +## Things to consider in your development workflows + +If you are a maintainer or a contributor, some of the above can apply to your development and contribution process, too. +Similar to how peer review systems are being taxed, rapid, AI-assisted pull requests and issues can also overwhelm maintainers too. To combat this: + +* Open an issue first before submitting a pull request to ensure it's welcome and needed +* Keep your pull requests small with clear scopes. +* If you use LLMs, test and edit all of the output before you submit a pull request or issue. +* Flag AI-assisted sections of any contribution so maintainers know where to look closely. +* Be responsive to feedback from maintainers, especially when submitting code that is AI-generated. + +## Where we go from here + +A lot of thought and consideration has gone into the development of pyOpenSci's Generative AI policies. +We will continue to suggest best practices for embracing modern technologies while critically evaluating their realities and the impacts they have on our ecosystem. These guidelines help us maintain the quality and integrity of packages in our peer review process while protecting the volunteer community that makes open peer review possible. As AI tools evolve, so will our approach—but transparency, human oversight, and community trust will always remain at the center of our work. + +## Join the conversation + +This policy is just the beginning. As AI continues to evolve, so will our practices. We invite you to: + +👉 [Read the full draft policy and discussion](https://github.com/pyOpenSci/software-peer-review/pull/344) +👉 Share your feedback and help us shape how the scientific Python community approaches Generative AI in open source. + +The conversation is only starting, and your voice matters. From a93d2beea7f10eb14c092cfc2added772dd4bdaf Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:25:23 -0700 Subject: [PATCH 15/28] Apply suggestion from @willingc Co-authored-by: Carol Willing --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 97e2a091..93bbf9ec 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -19,7 +19,7 @@ authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams ## Generative AI meets scientific open source -Some developers believe that using AI products increases efficiency. However, in scientific open source, speed isn't everything—transparency, quality, and community trust are just as important as understanding the environmental impacts of using large language models in our everyday work. Similarly, the ethical questions that these tools raise are also a concern as some communities may benefit from the same tools that hurt others. +Some developers believe that using AI products increases efficiency. 
However, in scientific open source, speed isn't everything—transparency, quality, and community trust are just as important as understanding the environmental impacts of using large language models in our everyday work. Similarly, ethical questions arise when tools may benefit some communities while harming others. ## Why we need guidelines From eb97f6b0e3ab3f6a9de88adc4ec880ea920d9dc0 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:25:46 -0700 Subject: [PATCH 16/28] Apply suggestion from @willingc Co-authored-by: Carol Willing --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 93bbf9ec..fea9d908 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -25,7 +25,7 @@ Some developers believe that using AI products increases efficiency. However, in At pyOpenSci, we’ve drafted a new policy for our peer review process to set clear expectations for disclosing the use of LLMs in scientific open-source software. -This is not about banning AI tools. We recognize their value to some people. Instead, our goal is transparency. We want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. Further, we want to avoid burdening our volunteer editorial and reviewer team with being the first to review generated code. +Our goal is transparency and fostering reproducible research. For scientific rigor, we want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. Further, we want to avoid burdening our volunteer editorial and reviewer team with being the initial reviewers of generated code. ## A complex topic: Benefits and concerns From 6cd7e980a6c12b4e77b39b1386986b799358d51a Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:26:45 -0700 Subject: [PATCH 17/28] Apply suggestion from @willingc Co-authored-by: Carol Willing --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index fea9d908..16175c55 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -36,7 +36,7 @@ LLMs are perceived as helping developers: * In some cases, simplifying language barriers for participants in open source around the world * Speeding up everyday workflows -Some contributors also perceive these products as making open source more accessible. However, LLM's also present +Some contributors also perceive these products as making open source more accessible. However, LLMs also present unprecedented social and environmental challenges. 
### Incorrectness of LLMs and misleading time benefits From 18305ed4693a759fb6b6bb4237060a5bb15273a3 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:26:59 -0700 Subject: [PATCH 18/28] Apply suggestion from @willingc Co-authored-by: Carol Willing --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 16175c55..ce2d5b93 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -45,7 +45,7 @@ Although it is commonly stated that LLMs help improve the productivity of high-l ### Environmental impacts -Training and running LLMs [requires massive energy consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), raising sustainability concerns that sit uncomfortably alongside much of the global scale scientific research that our community supports. +Training and running LLMs [requires massive energy consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), raising sustainability concerns that sit uncomfortably alongside much of the global-scale scientific research that our community supports. ### Impact on learning From b6e8cb77634a8def657748f5430baae9ba34276e Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:27:15 -0700 Subject: [PATCH 19/28] Apply suggestion from @willingc Co-authored-by: Carol Willing --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index ce2d5b93..9565201a 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -49,7 +49,7 @@ Training and running LLMs [requires massive energy consumption](https://www.unep ### Impact on learning -Heavy reliance on LLMs risks producing developers who can prompt, but not debug or maintain, code—undermining long-term project sustainability and growth. This also in the long run will make it [harder for young developers to learn how to code, and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/). +Heavy reliance on LLMs risks producing developers who can prompt, but not debug, maintain, or secure production code. This risk undermines long-term project sustainability and growth. In the long run, it will make it [harder for young developers to learn how to code and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/). > We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future. 
*Hamsa Bastani* From c9e450e5d953ad312346163540456817f550410b Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:27:29 -0700 Subject: [PATCH 20/28] Apply suggestion from @willingc Co-authored-by: Carol Willing --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 9565201a..4da40381 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -61,7 +61,7 @@ LLM outputs can reflect and amplify bias in training data. In documentation and We acknowledge that social and ethical norms, as well as concerns about environmental and societal impacts, vary widely across the community. We are not here to judge anyone who uses or doesn't use LLMs. Our focus centers on supporting informed decision-making and consent regarding LLM use in the pyOpenSci software submission, review, and editorial process. -Our community’s expectation is simple: **be open about and disclose any Generative AI use in your package** when you submit it to our open software review process. +Our community’s expectation is simple: **be open and disclose any Generative AI use in your package** when you submit it to our open software review process. * Disclose LLM use in your README and at the top of relevant modules. * Describe how the Generative AI tools were used in your package's development. From 19f587cf1c7ae94e72f5d8ee3a13413441a6626e Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:51:21 -0700 Subject: [PATCH 21/28] Apply suggestion from @lwasser --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 4da40381..1e19607a 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -88,7 +88,7 @@ The problem? LLMs sometimes spit out near-exact copies of that training data, bu Why this matters: * Using LLM output verbatim could violate the original code's license -* You might accidentally commit plagiarism or copyright infringement by using that output verbatim in your code +* License conflicts can occur if your package's license (e.g., MIT) is incompatible with code patterns that the LLM learned on such as code licensed as GPL or Apache-2.0. * Due diligence is nearly impossible since you can't trace what the LLM "learned from" (most LLM's are black boxes) When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements. 
From 47bbda37e79bc91ce44d0a5cb3029b85580fc6a5 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:51:28 -0700 Subject: [PATCH 22/28] Apply suggestion from @lwasser --- _posts/2025-11-18-generative-ai-peer-review.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 1e19607a..9a0dca68 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -93,7 +93,9 @@ Why this matters: When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements. -While this is all tricky, here's what you can do, now: +The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology. + +**What you can do now:** *Prefer human-edited, transformative outputs you fully understand* From 7a6217a2df84c82e8812f95b70edbacfa7fe52a1 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:51:38 -0700 Subject: [PATCH 23/28] Apply suggestion from @lwasser --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 9a0dca68..866777a0 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -89,7 +89,7 @@ Why this matters: * Using LLM output verbatim could violate the original code's license * License conflicts can occur if your package's license (e.g., MIT) is incompatible with code patterns that the LLM learned on such as code licensed as GPL or Apache-2.0. -* Due diligence is nearly impossible since you can't trace what the LLM "learned from" (most LLM's are black boxes) +* * You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements. From e0c0fb24525914f101811ea6f6a945e79dbaa4fd Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 13:51:47 -0700 Subject: [PATCH 24/28] Apply suggestion from @lwasser --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 866777a0..90629c45 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -87,7 +87,7 @@ The problem? 
LLMs sometimes spit out near-exact copies of that training data, bu Why this matters: -* Using LLM output verbatim could violate the original code's license +* LLM-generated code may be *substantially similar* to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is compared to the original. * License conflicts can occur if your package's license (e.g., MIT) is incompatible with code patterns that the LLM learned on such as code licensed as GPL or Apache-2.0. * * You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. From ffbe238e7b626b86ed5308b394d9db8623b32b03 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 14:07:08 -0700 Subject: [PATCH 25/28] enh: edits from review --- _posts/2025-11-18-generative-ai-peer-review.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 90629c45..5262140b 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -36,7 +36,7 @@ LLMs are perceived as helping developers: * In some cases, simplifying language barriers for participants in open source around the world * Speeding up everyday workflows -Some contributors also perceive these products as making open source more accessible. However, LLMs also present +Some contributors also believe these products open source more accessible. And for some, maybe they do. However, LLMs also present unprecedented social and environmental challenges. ### Incorrectness of LLMs and misleading time benefits @@ -88,20 +88,20 @@ The problem? LLMs sometimes spit out near-exact copies of that training data, bu Why this matters: * LLM-generated code may be *substantially similar* to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is compared to the original. -* License conflicts can occur if your package's license (e.g., MIT) is incompatible with code patterns that the LLM learned on such as code licensed as GPL or Apache-2.0. -* * You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. +* You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. +* License conflicts can occur because of both items above. Read on... When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements. -The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology. 
+The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology. **What you can do now:** -*Prefer human-edited, transformative outputs you fully understand* - * Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts. -* Be aware that LLM products can potentially return copyrighted code verbatim. **Don't paste LLM outputs directly into your code**. Instead, review, edit, and transform anything an LLM gives you. Consider using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design) to achieve this. -* **Make sure you fully understand the code before using it:** This is actually in your best interest because you can learn a lot about programming by asking an LLM questions and reviewing the output critically. +* Understand and transform code that is returned from a LLM: Don't paste LLM outputs directly. Review, edit, and ensure you fully understand what you're using. You can ask the LLM questions to better understand it's outputs. This approach also helps you learn which addresses the education concerns that we raised earlier. +* **Use LLMs as learning tools**: Ask questions, review outputs critically, then write your own implementation based on understanding. Often the outputs of LLMs are messy or inefficient. Use them to learn, not to copy. +* Consider [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have one person review LLM suggestions for approach; have another person implement from that high-level description +* **Document your process**: If you plan to submit a Python package for pyOpenSci review, we will ask you about your use of LLM's in your work. Document the use of LLMs in your project's README file and in any modules with LLM outputs have been applied. Confirm that it has been reviewed by a human prior to submitting it to us, or any other volunteer lead peer review process. You can't control what's in training data, but you can be thoughtful about how you use these tools. 
From 9af29eda0a6b05f2c937714bc78bed3782f83d84 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Wed, 19 Nov 2025 14:09:33 -0700 Subject: [PATCH 26/28] Apply suggestion from @lwasser --- _posts/2025-11-18-generative-ai-peer-review.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 5262140b..17556a4c 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -15,7 +15,7 @@ comments: true last_modified: 2025-09-16 --- -authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams +authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams, Carol Willing ## Generative AI meets scientific open source From fbf49934b736bdf3f4424de4dc2fff8d88d153ee Mon Sep 17 00:00:00 2001 From: Eliot Robson Date: Mon, 8 Dec 2025 17:54:07 -0500 Subject: [PATCH 27/28] Update 2025-11-18-generative-ai-peer-review.md --- .../2025-11-18-generative-ai-peer-review.md | 72 +++++++++---------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index 17556a4c..cd5f68ea 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -31,12 +31,12 @@ Our goal is transparency and fostering reproducible research. For scientific rig LLMs are perceived as helping developers: -* Explain complex codebases -* Generate unit tests and docstrings -* In some cases, simplifying language barriers for participants in open source around the world -* Speeding up everyday workflows +- Explain complex codebases +- Generate unit tests and docstrings +- In some cases, simplifying language barriers for participants in open source around the world +- Speeding up everyday workflows -Some contributors also believe these products open source more accessible. And for some, maybe they do. However, LLMs also present +Some contributors perceive these products as making open source more accessible. And for some, maybe they do. However, LLMs also present unprecedented social and environmental challenges. ### Incorrectness of LLMs and misleading time benefits @@ -51,7 +51,7 @@ Training and running LLMs [requires massive energy consumption](https://www.unep Heavy reliance on LLMs risks producing developers who can prompt, but not debug, maintain, or secure production code. This risk undermines long-term project sustainability and growth. In the long run, it will make it [harder for young developers to learn how to code and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/). -> We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future. *Hamsa Bastani* +> We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future. _Hamsa Bastani_ ### Ethics and inclusion @@ -63,9 +63,9 @@ We acknowledge that social and ethical norms, as well as concerns about environm Our community’s expectation is simple: **be open and disclose any Generative AI use in your package** when you submit it to our open software review process. 
-* Disclose LLM use in your README and at the top of relevant modules. -* Describe how the Generative AI tools were used in your package's development. -* Be clear about what human review you performed on Generative AI outputs before submitting the package to our open peer review process. +- Disclose LLM use in your README and at the top of relevant modules. +- Describe how the Generative AI tools were used in your package's development. +- Be clear about what human review you performed on Generative AI outputs before submitting the package to our open peer review process. Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most. @@ -73,44 +73,44 @@ Transparency helps reviewers understand context, trace decisions, and focus thei LLM-assisted code must be **reviewed, edited, and tested by humans** before submission. -* Run your tests and confirm the correctness of the code that you submitted. -* Check for security and quality issues. -* Ensure style, readability, and concise docstrings. -* Explain your review process in your software submission to pyOpenSci. +- Run your tests and confirm the correctness of the code that you submitted. +- Check for security and quality issues. +- Ensure style, readability, and concise docstrings. Depending on the AI tool, generated docstrings can sometimes be overly verbose without adding meaningful understanding. +- Explain your review process in your software submission to pyOpenSci. -Please **don’t offload vetting of generative AI content to volunteer reviewers**. Arrive with human-reviewed code that you understand, have tested, and can maintain. +Please **don't offload vetting of generative AI content to volunteer reviewers**. Arrive with human-reviewed code that you understand, have tested, and can maintain. As the submitter, you are accountable for your submission: you take responsibility for the quality, correctness, and provenance of all code in your package, regardless of how it was generated. ### Watch out for licensing issues. -LLMs are trained on large amounts of open source code; most of that code has licenses that require attribution. -The problem? LLMs sometimes spit out near-exact copies of that training data, but without any attribution or copyright notices. +LLMs are trained on large amounts of open source code, and most of that code has licenses that require attribution (including permissive licenses like MIT and BSD-3). +The problem? LLMs sometimes produce near-exact copies of that training data, but without any attribution or copyright notices. **LLM output does not comply with the license requirements of the input code, even when the input is permissively licensed**, because it fails to provide the required attribution. Why this matters: -* LLM-generated code may be *substantially similar* to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is compared to the original. -* You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. -* License conflicts can occur because of both items above. Read on... +- LLM-generated code may be _substantially similar_ to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is compared to the original. 
+- You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. +- License conflicts occur because of both items above. Read on... -When licenses clash, it gets messy. Say your package uses an MIT license (common in scientific Python), but an LLM outputs Apache-2.0 or GPL code—those licenses aren't compatible. You can't just add attribution to fix it. Technically, you'd have to delete everything and rewrite it from scratch to comply with the licensing requirements. +When licenses clash, it gets particularly messy. Even when licenses are compatible (e.g., MIT-licensed training data and MIT-licensed output), you still have a violation because attribution is missing. With incompatible licenses (say, an LLM outputs GPL code and your package uses MIT), you can't just add attribution to fix it—you'd technically have to delete everything and rewrite it from scratch using clean-room methods to comply with licensing requirements. The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology. **What you can do now:** -* Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts. -* Understand and transform code that is returned from a LLM: Don't paste LLM outputs directly. Review, edit, and ensure you fully understand what you're using. You can ask the LLM questions to better understand it's outputs. This approach also helps you learn which addresses the education concerns that we raised earlier. -* **Use LLMs as learning tools**: Ask questions, review outputs critically, then write your own implementation based on understanding. Often the outputs of LLMs are messy or inefficient. Use them to learn, not to copy. -* Consider [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have one person review LLM suggestions for approach; have another person implement from that high-level description -* **Document your process**: If you plan to submit a Python package for pyOpenSci review, we will ask you about your use of LLM's in your work. Document the use of LLMs in your project's README file and in any modules with LLM outputs have been applied. Confirm that it has been reviewed by a human prior to submitting it to us, or any other volunteer lead peer review process. +- Be aware that when you directly use content from an LLM, there will be inherent license conflicts and attribution issues. +- Understand and transform code returned from an LLM: Don't paste LLM outputs directly. Review, edit, and ensure you fully understand what you're using. You can ask the LLM questions to better understand its outputs. This approach also helps you learn, which addresses the education concerns that we raised earlier. +- **Use LLMs as learning tools**: Ask questions, review outputs critically, then write your own implementation based on understanding. Often the outputs of LLMs are messy or inefficient. Use them to learn, not to copy. 
+- Consider [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have one person review LLM suggestions for approach; have another person implement from that high-level description +- **Document your process**: If you plan to submit a Python package for pyOpenSci review, we will ask you about your use of LLM's in your work. Document the use of LLMs in your project's README file and in any modules with LLM outputs have been applied. Confirm that it has been reviewed by a human prior to submitting it to us, or any other volunteer lead peer review process. You can't control what's in training data, but you can be thoughtful about how you use these tools.
Examples of how these licensing issues are impacting and stressing our legal systems: -* [GitHub Copilot litication](https://githubcopilotlitigation.com/case-updates.html) -* [Litigation around text from LLMs](https://arxiv.org/abs/2505.12546) -* [incompatible licenses](https://dwheeler.com/essays/floss-license-slide.html) +- [GitHub Copilot litication](https://githubcopilotlitigation.com/case-updates.html) +- [Litigation around text from LLMs](https://arxiv.org/abs/2505.12546) +- [incompatible licenses](https://dwheeler.com/essays/floss-license-slide.html)
### Review for bias @@ -118,20 +118,20 @@ Examples of how these licensing issues are impacting and stressing our legal sys Inclusion is part of quality. Treat AI-generated text with the same care as code. Given the known biases that can manifest in Generative AI-derived text: -* Review AI-generated text for stereotypes or exclusionary language. -* Prefer plain, inclusive language. -* Invite feedback and review from diverse contributors. +- Review AI-generated text for stereotypes or exclusionary language. +- Prefer plain, inclusive language. +- Invite feedback and review from diverse contributors. ## Things to consider in your development workflows If you are a maintainer or a contributor, some of the above can apply to your development and contribution process, too. Similar to how peer review systems are being taxed, rapid, AI-assisted pull requests and issues can also overwhelm maintainers too. To combat this: -* Open an issue first before submitting a pull request to ensure it's welcome and needed -* Keep your pull requests small with clear scopes. -* If you use LLMs, test and edit all of the output before you submit a pull request or issue. -* Flag AI-assisted sections of any contribution so maintainers know where to look closely. -* Be responsive to feedback from maintainers, especially when submitting code that is AI-generated. +- Open an issue first before submitting a pull request to ensure it's welcome and needed +- Keep your pull requests small with clear scopes. +- If you use LLMs, test and edit all of the output before you submit a pull request or issue. +- Flag AI-assisted sections of any contribution so maintainers know where to look closely. +- Be responsive to feedback from maintainers, especially when submitting code that is AI-generated. ## Where we go from here From 8b3abc441b89b58661ff2412f19d5d6c802f2f45 Mon Sep 17 00:00:00 2001 From: Leah Wasser Date: Tue, 16 Dec 2025 14:31:54 -0700 Subject: [PATCH 28/28] enh: typos, cleanup, last comment to respond to --- .../2025-11-18-generative-ai-peer-review.md | 273 +++++++++++++----- 1 file changed, 206 insertions(+), 67 deletions(-) diff --git a/_posts/2025-11-18-generative-ai-peer-review.md b/_posts/2025-11-18-generative-ai-peer-review.md index cd5f68ea..5365c64f 100644 --- a/_posts/2025-11-18-generative-ai-peer-review.md +++ b/_posts/2025-11-18-generative-ai-peer-review.md @@ -12,137 +12,276 @@ categories: classes: wide toc: true comments: true -last_modified: 2025-09-16 +last_modified: 2025-12-16 --- -authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams, Carol Willing +authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams, Carol Willing, Stefan van der Walt, Eliot Robson ## Generative AI meets scientific open source -Some developers believe that using AI products increases efficiency. However, in scientific open source, speed isn't everything—transparency, quality, and community trust are just as important as understanding the environmental impacts of using large language models in our everyday work. Similarly, ethical questions arise when tools may benefit some communities while harming others. +Some developers believe that using Generative AI products increases +efficiency. However, in scientific open source, speed isn't +everything—transparency, quality, and community trust are just as +important as understanding the environmental impacts of using large +language models in our everyday work. Similarly, ethical questions +arise when tools may benefit some communities while harming others. 
## Why we need guidelines -At pyOpenSci, we’ve drafted a new policy for our peer review process to set clear expectations for disclosing the use of LLMs in scientific open-source software. +At pyOpenSci, [we’ve drafted a new policy](https://github.com/pyOpenSci/software-peer-review/pull/344) for our peer review process to set clear expectations for disclosing the use of LLMs in scientific open-source software. Our goal is transparency and fostering reproducible research. For scientific rigor, we want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. Further, we want to avoid burdening our volunteer editorial and reviewer team with being the initial reviewers of generated code. -## A complex topic: Benefits and concerns +This is the beginning of our work to ensure that Gen AI tools are not +creating undue burden on our volunteer software review team. Humans +cannot perform in depth reviews at the rate at which these tools can +create large volumes of code. + +## A complex topic: benefits and concerns LLMs are perceived as helping developers: -- Explain complex codebases -- Generate unit tests and docstrings -- In some cases, simplifying language barriers for participants in open source around the world -- Speeding up everyday workflows +* Explain complex codebases +* Generate unit tests and docstrings +* Simplify language barriers for participants in open source around + the world +* Speed up everyday workflows -Some contributors perceive these products as making open source more accessible. And for some, maybe they do. However, LLMs also present -unprecedented social and environmental challenges. +Some contributors perceive these products as making open source more +accessible. And for some, maybe they do. However, LLMs also present +unprecedented social and environmental challenges that we have to +critically evaluate. ### Incorrectness of LLMs and misleading time benefits -Although it is commonly stated that LLMs help improve the productivity of high-level developers, recent scientific explorations of this hypothesis [indicate the contrary](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). What's more is that the responses of LLMs for complex coding tasks [tend to be incorrect](https://arxiv.org/html/2407.06153v1) and/or overly verbose/inefficient. It is crucial that, if you use an LLM to help produce code, you should independently evaluate code correctness and efficiency. +Although it is commonly stated that LLMs help improve the productivity +of high-level developers, recent scientific explorations of this +hypothesis [indicate the +contrary](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). +What's more is that the responses of LLMs for complex coding tasks +[tend to be +incorrect](https://arxiv.org/html/2407.06153v1) and/or overly +verbose/inefficient. It is crucial that, if you use an LLM to help +produce code, you should independently evaluate code correctness and +efficiency. ### Environmental impacts -Training and running LLMs [requires massive energy consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), raising sustainability concerns that sit uncomfortably alongside much of the global-scale scientific research that our community supports. 
+Training and running LLMs [requires massive energy +consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), +raising sustainability concerns that sit uncomfortably alongside much +of the global-scale scientific research that our community supports. ### Impact on learning -Heavy reliance on LLMs risks producing developers who can prompt, but not debug, maintain, or secure production code. This risk undermines long-term project sustainability and growth. In the long run, it will make it [harder for young developers to learn how to code and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/). +Heavy reliance on LLMs risks producing developers who can prompt, but +not debug, maintain, or secure production code. This risk undermines +long-term project sustainability and growth. In the long run, it will +make it [harder for young developers to learn how to code and +troubleshoot +independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/). > We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future. _Hamsa Bastani_ ### Ethics and inclusion -LLM outputs can reflect and amplify bias in training data. In documentation and tutorials, that bias can harm the very communities we want to support. +LLM outputs can reflect and amplify bias in training data. In +documentation and tutorials, that bias can harm the very communities +we want to support. -## Our Approach: Transparency and Disclosure +## Our approach: transparency and disclosure -We acknowledge that social and ethical norms, as well as concerns about environmental and societal impacts, vary widely across the community. We are not here to judge anyone who uses or doesn't use LLMs. Our focus centers on supporting informed decision-making and consent regarding LLM use in the pyOpenSci software submission, review, and editorial process. +We acknowledge that social and ethical norms, as well as concerns +about environmental and societal impacts, vary widely across the +community. We are not here to judge anyone who uses or doesn't use +LLMs. Our focus centers on supporting informed decision-making and +consent regarding LLM use in the pyOpenSci software submission, +review, and editorial process. -Our community’s expectation is simple: **be open and disclose any Generative AI use in your package** when you submit it to our open software review process. +Our community’s expectation for maintainers submitting a package is simple: **be open and disclose any Generative AI use in your package** when you submit it to our open software review process. -- Disclose LLM use in your README and at the top of relevant modules. -- Describe how the Generative AI tools were used in your package's development. -- Be clear about what human review you performed on Generative AI outputs before submitting the package to our open peer review process. +* Disclose LLM use in your README and at the top of relevant modules. +* Describe how the Generative AI tools were used in your package's development. +* Be clear about what human review you performed on Generative AI + outputs before submitting the package to our open peer review + process. 
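+
+As a purely illustrative sketch (the package, module, and function
+names below are made up, and pyOpenSci does not require this exact
+format), a module-level disclosure might look something like this:
+
+```python
+"""Unit-conversion helpers for a hypothetical ``spectra-tools`` package.
+
+Generative AI disclosure: an LLM coding assistant drafted the first
+version of ``to_nanometers`` below. A maintainer reviewed, edited, and
+tested that draft before committing it; see the project README for the
+full disclosure.
+"""
+
+
+def to_nanometers(value: float, unit: str) -> float:
+    """Convert a wavelength from micrometers ("um") or nanometers ("nm")."""
+    if unit == "nm":
+        return value
+    if unit == "um":
+        return value * 1000.0
+    raise ValueError(f"Unsupported unit: {unit!r}")
+```
+
+The exact wording is up to you; what matters is that editors and
+reviewers can see where Generative AI was used and what human review
+followed.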
-Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most. +Transparency helps reviewers understand context, trace decisions, and +focus their time where it matters most. We do not want the burden of +reviewing code generated from a model, placed on a volunteer. That +effort belongs to the maintainer who ran the model that generated that +code. ### Human oversight -LLM-assisted code must be **reviewed, edited, and tested by humans** before submission. +LLM-assisted code must be **reviewed, edited, and tested by humans** +before submission. -- Run your tests and confirm the correctness of the code that you submitted. -- Check for security and quality issues. -- Ensure style, readability, and concise docstrings. Depending on the AI tool, generated docstrings can sometimes be overly verbose without adding meaningful understanding. -- Explain your review process in your software submission to pyOpenSci. +* Run your tests and confirm the correctness of the code that you submitted. +* Check for security and quality issues. +* Ensure style, readability, and concise docstrings. Depending on the + AI tool, generated docstrings can sometimes be overly verbose without + adding meaningful understanding. +* Explain your review process in your software submission to pyOpenSci. -Please **don't offload vetting of generative AI content to volunteer reviewers**. Arrive with human-reviewed code that you understand, have tested, and can maintain. As the submitter, you are accountable for your submission: you take responsibility for the quality, correctness, and provenance of all code in your package, regardless of how it was generated. +Please **don't offload vetting of generative AI content to volunteer +reviewers**. Arrive with human-reviewed code that you understand, +have tested, and can maintain. As the submitter, you are accountable +for your submission: you take responsibility for the quality, +correctness, and provenance of all code in your package, regardless of +how it was generated. ### Watch out for licensing issues. -LLMs are trained on large amounts of open source code, and most of that code has licenses that require attribution (including permissive licenses like MIT and BSD-3). -The problem? LLMs sometimes produce near-exact copies of that training data, but without any attribution or copyright notices. **LLM output does not comply with the license requirements of the input code, even when the input is permissively licensed**, because it fails to provide the required attribution. +LLMs are trained on large amounts of open source code, and most of +that code has licenses that require attribution (including permissive +licenses like MIT and BSD-3). The problem? LLMs sometimes produce +near-exact copies of that training data, but without any attribution +or copyright notices. **LLM output does not comply with the license +requirements of the input code, even when the input is permissively +licensed**, because it fails to provide the required attribution. + +Not all code carries the same licensing risk. The risk varies +depending on what you're generating. + +Risk of license infringement is **lower for routine tasks** like +refactoring existing code, test suite improvements, creating +boilerplate code, simple utility functions, and docstring generation. +These tasks are more common, often use widely documented patterns, +and are not as likely to be substantially similar to copyrighted +training data. 
+ +Tasks that are **higher risk** include: + +* Algorithm implementations +* Developing workflows for complex data structures +* Domain-specific logic that is potentially already published or + copyrighted + +For high-risk content (e.g., algorithm implementations), you need to +understand the algorithm to vet its correctness, ensure the approach +is not already published and copyrighted, vet its performance, and +evaluate edge cases. If you understand it well enough to review it +thoroughly, you can often implement it yourself. In these cases, use +LLMs as learning aids—ask questions, study approaches, then write +your own implementation. Why this matters: -- LLM-generated code may be _substantially similar_ to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is compared to the original. -- You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code even if you modify it. -- License conflicts occur because of both items above. Read on... - -When licenses clash, it gets particularly messy. Even when licenses are compatible (e.g., MIT-licensed training data and MIT-licensed output), you still have a violation because attribution is missing. With incompatible licenses (say, an LLM outputs GPL code and your package uses MIT), you can't just add attribution to fix it—you'd technically have to delete everything and rewrite it from scratch using clean-room methods to comply with licensing requirements. - -The reality of all of this is that you can't eliminate this risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology. - -**What you can do now:** - -- Be aware that when you directly use content from an LLM, there will be inherent license conflicts and attribution issues. -- Understand and transform code returned from an LLM: Don't paste LLM outputs directly. Review, edit, and ensure you fully understand what you're using. You can ask the LLM questions to better understand its outputs. This approach also helps you learn, which addresses the education concerns that we raised earlier. -- **Use LLMs as learning tools**: Ask questions, review outputs critically, then write your own implementation based on understanding. Often the outputs of LLMs are messy or inefficient. Use them to learn, not to copy. -- Consider [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have one person review LLM suggestions for approach; have another person implement from that high-level description -- **Document your process**: If you plan to submit a Python package for pyOpenSci review, we will ask you about your use of LLM's in your work. Document the use of LLMs in your project's README file and in any modules with LLM outputs have been applied. Confirm that it has been reviewed by a human prior to submitting it to us, or any other volunteer lead peer review process. - -You can't control what's in training data, but you can be thoughtful about how you use these tools. +* LLM-generated code may be _substantially similar_ to copyrighted + training data; sometimes it is identical. Copyright law focuses on + how similar your content is compared to the original. +* You can't trace what content the LLM learned from (the black box + problem); this makes due diligence impossible on your part. 
You + might accidentally commit plagiarism or copyright infringement by + using LLM output in your code even if you modify it. +* License conflicts occur because of both items above. Read on... + +When licenses clash, it gets particularly messy. Even when licenses +are compatible (e.g., MIT-licensed training data and MIT-licensed +output), you still have a violation because attribution is missing. +With incompatible licenses (say, an LLM outputs GPL code and your +package uses MIT), you can't just add attribution to fix it—you'd +technically have to delete everything and rewrite it from scratch +using clean-room methods to comply with licensing requirements. + +The reality of all of this is that you can't eliminate this risk of +license infringement or plagiarism with current LLM technology. But +you can be more thoughtful about how you use the technology. + +### What you can do now + +Consider the following: + +* Assess the licensing risk based on what you're generating: routine + refactoring carries lower risk than implementing novel algorithms or + domain-specific logic. +* Be aware that when you directly use content from an LLM, there will + be inherent license conflicts and attribution issues. +* **Use LLMs as learning tools**: Ask questions, review outputs + critically, then write your own implementation based on + understanding. Often the outputs of LLMs are messy or inefficient. + Use them to learn, not to copy. This is especially important for + high-risk content like algorithms. +* Understand and transform code returned from an LLM: Don't paste LLM + outputs directly. Review, edit, and ensure you fully understand what + you're using. You can ask the LLM questions to better understand its + outputs. This approach also helps you learn, which addresses the + education concerns that we raised earlier. +* Consider [clean-room + techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have + one person review LLM suggestions for approach; have another person + implement from that high-level description. +* **Document your process**: If you plan to submit a Python package + for pyOpenSci review, we will ask you about your use of LLMs in your + work. Document the use of LLMs in your project's README file and in + any modules where LLM outputs have been applied. Confirm that it has + been reviewed by a human prior to submitting it to us, or any other + volunteer-led peer review process. + +You can't control what's in training data, but you can be thoughtful +about how you use these tools.
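+
+If it helps to see what the "document your process" step can look
+like, here is one possible README section. This is a hypothetical
+sketch only; the file name is invented, and pyOpenSci does not
+prescribe an exact wording or structure:
+
+```markdown
+## Generative AI disclosure
+
+Parts of this package were drafted with an LLM coding assistant:
+
+- `spectra_tools/io.py`: initial docstrings and several unit tests
+  were LLM-generated, then reviewed, edited, and re-run by a
+  maintainer before merging.
+- All other modules were written without LLM assistance.
+
+Every LLM-assisted change was human-reviewed and is covered by the
+test suite.
+```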
-Examples of how these licensing issues are impacting and stressing our legal systems: +Examples of how these licensing issues are impacting and stressing our +legal systems: -- [GitHub Copilot litication](https://githubcopilotlitigation.com/case-updates.html) -- [Litigation around text from LLMs](https://arxiv.org/abs/2505.12546) -- [incompatible licenses](https://dwheeler.com/essays/floss-license-slide.html) +* [GitHub Copilot litigation](https://githubcopilotlitigation.com/case-updates.html) +* [Litigation around text from LLMs](https://arxiv.org/abs/2505.12546) +* [incompatible licenses](https://dwheeler.com/essays/floss-license-slide.html)
### Review for bias -Inclusion is part of quality. Treat AI-generated text with the same care as code. +Inclusion is part of quality. Treat AI-generated text with the same +care as code. Given the known biases that can manifest in Generative AI-derived text: -- Review AI-generated text for stereotypes or exclusionary language. -- Prefer plain, inclusive language. -- Invite feedback and review from diverse contributors. +* Review AI-generated text for stereotypes or exclusionary language. +* Prefer plain, inclusive language. +* Invite feedback and review from diverse contributors. ## Things to consider in your development workflows -If you are a maintainer or a contributor, some of the above can apply to your development and contribution process, too. -Similar to how peer review systems are being taxed, rapid, AI-assisted pull requests and issues can also overwhelm maintainers too. To combat this: - -- Open an issue first before submitting a pull request to ensure it's welcome and needed -- Keep your pull requests small with clear scopes. -- If you use LLMs, test and edit all of the output before you submit a pull request or issue. -- Flag AI-assisted sections of any contribution so maintainers know where to look closely. -- Be responsive to feedback from maintainers, especially when submitting code that is AI-generated. +If you are a maintainer or a contributor, some of the above can apply +to your development and contribution process, too. Similar to how +peer review systems are being taxed, rapid, AI-assisted pull requests +and issues can also overwhelm maintainers too. To combat this: + +* If you are using generative AI tools in your daily workflows, keep each task small, focused, and well-defined. This is particularly important if you are using agent mode. This will produce smaller changes to your codebase that +will be easier to thoughtfully review and evaluate. +* Open an issue first before submitting a pull request to a repository that you don't own to ensure it's + welcome and needed +* Keep your pull requests small with clear scopes. +* If you use LLMs, test and edit all of the output before you submit a + pull request or issue. +* Flag AI-assisted sections of any contribution so maintainers know + where to look closely. +* Be responsive to feedback from maintainers, especially when + submitting code that is AI-generated. ## Where we go from here -A lot of thought and consideration has gone into the development of pyOpenSci's Generative AI policies. -We will continue to suggest best practices for embracing modern technologies while critically evaluating their realities and the impacts they have on our ecosystem. These guidelines help us maintain the quality and integrity of packages in our peer review process while protecting the volunteer community that makes open peer review possible. As AI tools evolve, so will our approach—but transparency, human oversight, and community trust will always remain at the center of our work. +A lot of thought and consideration has gone into the development of +[pyOpenSci's Generative AI +policies](https://www.pyopensci.org/software-peer-review/our-process/policies.html#policy-for-use-of-generative-ai-llms). + +We will continue to suggest best practices for embracing modern +technologies while critically evaluating their realities and the +impacts they have on our ecosystem. 
These guidelines help us maintain +the quality and integrity of packages in our peer review process while +protecting the volunteer community that makes open peer review +possible. As AI tools evolve, so will our approach—but transparency, +human oversight, and community trust will always remain at the center +of our work. ## Join the conversation -This policy is just the beginning. As AI continues to evolve, so will our practices. We invite you to: +This policy is just the beginning. As AI continues to evolve, so will +our practices. We invite you to: 👉 [Read the full draft policy and discussion](https://github.com/pyOpenSci/software-peer-review/pull/344) -👉 Share your feedback and help us shape how the scientific Python community approaches Generative AI in open source. +👉 Share your feedback and help us shape how the scientific Python + community approaches Generative AI in open source. The conversation is only starting, and your voice matters.