---
layout: single
title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy"
excerpt: "Generative AI products are reducing the effort and skill necessary to generate large amounts of code. In some cases, this strains volunteer peer review programs like ours. Learn about pyOpenSci's approach to developing a Generative AI policy for our software peer review program."
author: "pyopensci"
permalink: /blog/generative-ai-peer-review-policy.html
header:
  overlay_image: images/headers/pyopensci-floral.png
categories:
  - blog-post
  - community
classes: wide
toc: true
comments: true
last_modified: 2025-12-16
---

authors: Leah Wasser, Jed Brown, Carter Rhea, Ellie Abrahams, Carol Willing, Stefan van der Walt, Eliot Robson

## Generative AI meets scientific open source

Some developers believe that using Generative AI products increases efficiency. However, in scientific open source, speed isn't everything: transparency, quality, and community trust matter just as much, as does understanding the environmental impact of using large language models in our everyday work. Similarly, ethical questions arise when tools may benefit some communities while harming others.

## Why we need guidelines

At pyOpenSci, [we’ve drafted a new policy](https://github.com/pyOpenSci/software-peer-review/pull/344) for our peer review process to set clear expectations for disclosing the use of LLMs in scientific open-source software.

Our goal is transparency and fostering reproducible research. For scientific rigor, we want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions. Further, we want to avoid burdening our volunteer editorial and reviewer team with being the initial reviewers of generated code.

This is the beginning of our work to ensure that Gen AI tools do not create an undue burden on our volunteer software review team. Humans cannot perform in-depth reviews at the rate at which these tools can create large volumes of code.

## A complex topic: benefits and concerns

LLMs are perceived as helping developers:

* Explain complex codebases
* Generate unit tests and docstrings
* Reduce language barriers for participants in open source around the world
* Speed up everyday workflows

Some contributors perceive these products as making open source more accessible. And for some, maybe they do. However, LLMs also present unprecedented social and environmental challenges that we have to critically evaluate.

### Incorrectness of LLMs and misleading time benefits

Although it is commonly stated that LLMs improve the productivity of experienced developers, recent scientific explorations of this hypothesis [indicate the contrary](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/). What's more, LLM responses to complex coding tasks [tend to be incorrect](https://arxiv.org/html/2407.06153v1) and/or overly verbose and inefficient. If you use an LLM to help produce code, it is crucial that you independently evaluate the code's correctness and efficiency.

### Environmental impacts

Training and running LLMs [requires massive energy consumption](https://www.unep.org/news-and-stories/story/ai-has-environmental-problem-heres-what-world-can-do-about), raising sustainability concerns that sit uncomfortably alongside much of the global-scale scientific research that our community supports.

### Impact on learning

Heavy reliance on LLMs risks producing developers who can prompt, but not debug, maintain, or secure production code. This risk undermines long-term project sustainability and growth. In the long run, it will make it [harder for young developers to learn how to code and troubleshoot independently](https://knowledge.wharton.upenn.edu/article/without-guardrails-generative-ai-can-harm-education/).

> We’re really worried that if humans don’t learn, if they start using these tools as a crutch and rely on it, then they won’t actually build those fundamental skills to be able to use these tools effectively in the future. _Hamsa Bastani_

### Ethics and inclusion

LLM outputs can reflect and amplify bias in training data. In documentation and tutorials, that bias can harm the very communities we want to support.

## Our approach: transparency and disclosure

We acknowledge that social and ethical norms, as well as concerns about environmental and societal impacts, vary widely across the community. We are not here to judge anyone who uses or doesn't use LLMs. Our focus centers on supporting informed decision-making and consent regarding LLM use in the pyOpenSci software submission, review, and editorial process.

Our community’s expectation for maintainers submitting a package is simple: **be open and disclose any Generative AI use in your package** when you submit it to our open software review process.

* Disclose LLM use in your README and at the top of relevant modules (a minimal example follows below).
* Describe how the Generative AI tools were used in your package's development.
* Be clear about what human review you performed on Generative AI outputs before submitting the package to our open peer review process.

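What might such a disclosure look like in practice? Below is a minimal, hypothetical sketch of a module-level note. The module purpose, wording, and test path are illustrative, not required by the policy; adapt them to your project and pair them with a matching section in your README.

```python
"""Utilities for reading sensor data files.

Generative AI disclosure (hypothetical example):
    Portions of the parsing helpers in this module were drafted with an
    LLM coding assistant. All generated code was reviewed, edited, and
    tested by a maintainer before inclusion; see the "Generative AI use"
    section of the project README for details.
"""


def read_header(path: str) -> dict[str, str]:
    """Return the key/value pairs from the comment header of a sensor file.

    Human-reviewed: the implementation below was checked line by line and
    is covered by tests (e.g., tests/test_read_header.py in this sketch).
    """
    header: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.startswith("#"):
                break  # the header is only the leading comment lines
            key, _, value = line.lstrip("#").partition("=")
            header[key.strip()] = value.strip()
    return header
```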
Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most. We do not want the burden of reviewing code generated from a model placed on a volunteer. That effort belongs to the maintainer who ran the model that generated the code.

### Human oversight

LLM-assisted code must be **reviewed, edited, and tested by humans** before submission.

* Run your tests and confirm the correctness of the code that you submit (see the sketch below).
* Check for security and quality issues.
* Ensure style, readability, and concise docstrings. Depending on the AI tool, generated docstrings can sometimes be overly verbose without adding meaningful understanding.
* Explain your review process in your software submission to pyOpenSci.

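As one illustration of what human verification can look like before submission, here is a minimal, hypothetical pytest sketch. The `slugify` helper and the `mypackage.text` module are stand-ins for code an LLM drafted; the tests encode the behavior the maintainer actually checked by hand.

```python
# test_slugify.py -- hypothetical human-written tests for an LLM-drafted helper.
# The cases below capture the behavior the maintainer verified by hand,
# including an edge case the generated draft did not originally handle.
import pytest

from mypackage.text import slugify  # hypothetical module and function


@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Hello, World!", "hello-world"),
        ("  spaces   everywhere  ", "spaces-everywhere"),
        ("already-a-slug", "already-a-slug"),
    ],
)
def test_slugify_known_inputs(raw, expected):
    assert slugify(raw) == expected


def test_slugify_rejects_empty_input():
    # Edge case added during human review; the generated draft silently
    # returned an empty string here.
    with pytest.raises(ValueError):
        slugify("")
```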
Please **don't offload vetting of generative AI content to volunteer reviewers**. Arrive with human-reviewed code that you understand, have tested, and can maintain. As the submitter, you are accountable for your submission: you take responsibility for the quality, correctness, and provenance of all code in your package, regardless of how it was generated.

### Watch out for licensing issues

LLMs are trained on large amounts of open source code, and most of that code has licenses that require attribution (including permissive licenses like MIT and BSD-3). The problem? LLMs sometimes produce near-exact copies of that training data, but without any attribution or copyright notices. **LLM output does not comply with the license requirements of the input code, even when the input is permissively licensed**, because it fails to provide the required attribution.

Not all code carries the same licensing risk. The risk varies depending on what you're generating.

Risk of license infringement is **lower for routine tasks** like refactoring existing code, test suite improvements, creating boilerplate code, simple utility functions, and docstring generation. These tasks are more common, often use widely documented patterns, and are not as likely to be substantially similar to copyrighted training data.

Tasks that are **higher risk** include:

* Algorithm implementations
* Developing workflows for complex data structures
* Domain-specific logic that is potentially already published or copyrighted

For high-risk content (e.g., algorithm implementations), you need to understand the algorithm to vet its correctness, ensure the approach is not already published and copyrighted, vet its performance, and evaluate edge cases. If you understand it well enough to review it thoroughly, you can often implement it yourself. In these cases, use LLMs as learning aids: ask questions, study approaches, then write your own implementation.

Why this matters:

* LLM-generated code may be _substantially similar_ to copyrighted training data; sometimes it is identical. Copyright law focuses on how similar your content is to the original.
* You can't trace what content the LLM learned from (the black box problem); this makes due diligence impossible on your part. You might accidentally commit plagiarism or copyright infringement by using LLM output in your code, even if you modify it.
* License conflicts occur because of both items above. Read on...

When licenses clash, it gets particularly messy. Even when licenses are compatible (e.g., MIT-licensed training data and MIT-licensed output), you still have a violation because attribution is missing. With incompatible licenses (say, an LLM outputs GPL code and your package uses MIT), you can't just add attribution to fix it; you'd technically have to delete everything and rewrite it from scratch using clean-room methods to comply with licensing requirements.

The reality of all of this is that you can't eliminate the risk of license infringement or plagiarism with current LLM technology. But you can be more thoughtful about how you use the technology.

### What you can do now

Consider the following:

* Assess the licensing risk based on what you're generating: routine refactoring carries lower risk than implementing novel algorithms or domain-specific logic.
* Be aware that when you directly use content from an LLM, there will be inherent license conflicts and attribution issues.
* **Use LLMs as learning tools**: Ask questions, review outputs critically, then write your own implementation based on understanding. Often the outputs of LLMs are messy or inefficient. Use them to learn, not to copy (see the sketch after this list). This is especially important for high-risk content like algorithms.
* Understand and transform code returned from an LLM: Don't paste LLM outputs directly. Review, edit, and ensure you fully understand what you're using. You can ask the LLM questions to better understand its outputs. This approach also helps you learn, which addresses the education concerns that we raised earlier.
* Consider [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design): Have one person review LLM suggestions for approach; have another person implement from that high-level description.
* **Document your process**: If you plan to submit a Python package for pyOpenSci review, we will ask you about your use of LLMs in your work. Document the use of LLMs in your project's README file and in any modules where LLM outputs have been applied. Confirm that it has been reviewed by a human prior to submitting it to us, or any other volunteer-led peer review process.

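To make the "learn, then rewrite" advice concrete, here is a small, hypothetical sketch. The first function mimics the kind of verbose draft an assistant might return for a word-counting task; the second is the version a maintainer might write after understanding the approach, and is the one that would actually ship.

```python
from collections import Counter
from pathlib import Path


def count_words_llm_draft(path):
    # Hypothetical assistant-style draft: it works, but is verbose and
    # re-implements what the standard library already provides.
    counts = {}
    file_handle = open(path, "r", encoding="utf-8")
    text = file_handle.read()
    file_handle.close()
    words = text.split()
    for word in words:
        normalized = word.lower()
        if normalized in counts:
            counts[normalized] = counts[normalized] + 1
        else:
            counts[normalized] = 1
    return counts


def count_words(path: Path) -> Counter[str]:
    # Human rewrite after studying the approach: same behavior,
    # shorter, and easier to review, test, and maintain.
    text = Path(path).read_text(encoding="utf-8")
    return Counter(word.lower() for word in text.split())
```

Neither version comes from the policy itself; the point is that the rewritten version is the one you can confidently explain, test, and claim as your own work.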
You can't control what's in training data, but you can be thoughtful about how you use these tools.

<div class="notice" markdown="1">
Examples of how these licensing issues are impacting and stressing our legal systems:

* [GitHub Copilot litigation](https://githubcopilotlitigation.com/case-updates.html)
* [Litigation around text from LLMs](https://arxiv.org/abs/2505.12546)
* [Incompatible licenses](https://dwheeler.com/essays/floss-license-slide.html)
</div>

### Review for bias

Inclusion is part of quality. Treat AI-generated text with the same care as code. Given the known biases that can manifest in Generative AI-derived text:

* Review AI-generated text for stereotypes or exclusionary language.
* Prefer plain, inclusive language.
* Invite feedback and review from diverse contributors.

## Things to consider in your development workflows

If you are a maintainer or a contributor, some of the above can apply to your development and contribution process, too. Just as peer review systems are being taxed, rapid, AI-assisted pull requests and issues can overwhelm maintainers. To combat this:

* If you are using generative AI tools in your daily workflows, keep each task small, focused, and well-defined. This is particularly important if you are using agent mode. Smaller changes to your codebase are easier to thoughtfully review and evaluate.
* Open an issue before submitting a pull request to a repository that you don't own, to ensure your contribution is welcome and needed.
* Keep your pull requests small, with clear scopes.
* If you use LLMs, test and edit all of the output before you submit a pull request or issue.
* Flag AI-assisted sections of any contribution so maintainers know where to look closely.
* Be responsive to feedback from maintainers, especially when submitting code that is AI-generated.

## Where we go from here

A lot of thought and consideration has gone into the development of [pyOpenSci's Generative AI policies](https://www.pyopensci.org/software-peer-review/our-process/policies.html#policy-for-use-of-generative-ai-llms).

We will continue to suggest best practices for embracing modern technologies while critically evaluating their realities and the impacts they have on our ecosystem. These guidelines help us maintain the quality and integrity of packages in our peer review process while protecting the volunteer community that makes open peer review possible. As AI tools evolve, so will our approach, but transparency, human oversight, and community trust will always remain at the center of our work.

## Join the conversation

This policy is just the beginning. As AI continues to evolve, so will our practices. We invite you to:

👉 [Read the full draft policy and discussion](https://github.com/pyOpenSci/software-peer-review/pull/344)

👉 Share your feedback and help us shape how the scientific Python community approaches Generative AI in open source.

The conversation is only starting, and your voice matters.
