Skip to content

feat: Add section validation warnings for recommended manuscript sections#275

Open
paxcalpt wants to merge 1 commit intomainfrom
feat/section-validation-warnings
Open

feat: Add section validation warnings for recommended manuscript sections#275
paxcalpt wants to merge 1 commit intomainfrom
feat/section-validation-warnings

Conversation

@paxcalpt
Copy link
Contributor

Summary

This PR adds a new that checks manuscripts for standard sections commonly required by journals and issues warnings (not errors) when sections are missing.

Changes

New Validator

  • File:
  • Checks for 6 recommended sections:
    • Data Availability
    • Code Availability
    • Author Contributions
    • Acknowledgements
    • Funding
    • Competing Interests

Integration

  • Added to validation workflow in Version: ImageMagick 7.1.2-11 Q16-HDRI aarch64 23470 https://imagemagick.org
    Copyright: (C) 1999 ImageMagick Studio LLC
    License: https://imagemagick.org/script/license.php
    Features: Cipher DPC HDRI Modules OpenMP
    Delegates (built-in): bzlib cairo fontconfig freetype heic jng jp2 jpeg jxl lcms lqr ltdl lzma openexr png raw rsvg tiff uhdr webp xml zip zlib zstd
    Compiler: clang (17.0.0)
    Usage: import [options ...] [ file ]

Image Settings:
-adjoin join images into a single multi-image file
-border include window border in the output image
-channel type apply option to select image channels
-colorspace type alternate image colorspace
-comment string annotate image with comment
-compress type type of pixel compression when writing the image
-define format:option
define one or more image format options
-density geometry horizontal and vertical density of the image
-depth value image depth
-descend obtain image by descending window hierarchy
-display server X server to contact
-dispose method layer disposal method
-dither method apply error diffusion to image
-delay value display the next image after pausing
-encipher filename convert plain pixels to cipher pixels
-endian type endianness (MSB or LSB) of the image
-encoding type text encoding type
-filter type use this filter when resizing an image
-format "string" output formatted image characteristics
-frame include window manager frame
-gravity direction which direction to gravitate towards
-identify identify the format and characteristics of the image
-interlace type None, Line, Plane, or Partition
-interpolate method pixel color interpolation method
-label string assign a label to an image
-limit type value Area, Disk, Map, or Memory resource limit
-monitor monitor progress
-page geometry size and location of an image canvas
-pause seconds seconds delay between snapshots
-pointsize value font point size
-quality value JPEG/MIFF/PNG compression level
-quiet suppress all warning messages
-regard-warnings pay attention to warning messages
-repage geometry size and location of an image canvas
-respect-parentheses settings remain in effect until parenthesis boundary
-sampling-factor geometry
horizontal and vertical sampling factor
-scene value image scene number
-screen select image from root window
-seed value seed a new sequence of pseudo-random numbers
-set property value set an image property
-silent operate silently, i.e. don't ring any bells
-snaps value number of screen snapshots
-support factor resize support: > 1.0 is blurry, < 1.0 is sharp
-synchronize synchronize image to storage device
-taint declare the image as modified
-transparent-color color
transparent color
-treedepth value color tree depth
-verbose print detailed information about the image
-virtual-pixel method
Constant, Edge, Mirror, or Tile
-window id select window with this id or name
root selects whole screen

Image Operators:
-annotate geometry text
annotate the image with text
-colors value preferred number of colors in the image
-crop geometry preferred size and location of the cropped image
-encipher filename convert plain pixels to cipher pixels
-extent geometry set the image size
-geometry geometry preferred size or location of the image
-help print program options
-monochrome transform image to black and white
-negate replace every pixel with its complementary color
-quantize colorspace reduce colors in this colorspace
-resize geometry resize the image
-rotate degrees apply Paeth rotation to the image
-strip strip image of all profiles and comments
-thumbnail geometry create a thumbnail of the image
-transparent color make this color transparent within the image
-trim trim image edges
-type type image type

Miscellaneous Options:
-debug events display copious debugging information
-help print program options
-list type print a list of supported option arguments
-log format format of debugging information
-version print version information

By default, 'file' is written in the MIFF image format. To
specify a particular image format, precede the filename with an image
format name and a colon (i.e. ps:image) or specify the image type as
the filename suffix (i.e. image.ps). Specify 'file' as '-' for
standard input or output.

  • Runs as first validator ("Structure" validation)
  • Flexible pattern matching supports multiple section name variations

Behavior

  • Warning Level: Issues warnings, not errors
    • Doesn't block PDF generation
    • Helps authors ensure manuscript completeness
  • Smart Matching: Recognizes variations like:
    • "Acknowledgements" or "Acknowledgments"
    • "Competing Interests" or "Conflicts of Interest"
    • "Data and Code Availability" (combined section)

Testing

Tested on both manuscripts:

  • rxiv-maker manuscript: All sections present, no warnings
  • maicrobe manuscript: Missing Acknowledgements, warning issued correctly

Example output:

⚠️ WARNING (1):

  1. Recommended section 'Acknowledgements' not found
     📄 /path/to/MANUSCRIPT/01_MAIN.md
     💡 Suggestion: Add an '## Acknowledgements' section to thank contributors

Benefits

  • Helps authors catch missing sections before journal submission
  • Aligns with common journal requirements
  • Non-blocking (warnings only) allows flexibility
  • Clear suggestions guide authors to add missing sections

Related

…ions

- New SectionValidator checks for standard sections required by most journals
- Warns (doesn't error) if sections are missing:
  • Data Availability
  • Code Availability
  • Author Contributions
  • Acknowledgements
  • Funding
  • Competing Interests
- Helps authors ensure manuscripts are complete before submission
- Integrated into 'rxiv validate' command as first validator in chain
- Flexible pattern matching supports multiple section naming variations
Copilot AI review requested due to automatic review settings December 24, 2025 12:03
@github-actions
Copy link

Code Review: Section Validation Warnings

Thank you for this contribution! This is a valuable feature that aligns well with journal requirements.

Strengths

  1. Well-designed validator pattern following BaseValidator architecture
  2. Appropriate use of ValidationLevel.WARNING instead of ERROR
  3. Flexible pattern matching for section name variations
  4. Clear, actionable suggestion messages
  5. Good metadata tracking
  6. Clean code structure

CRITICAL: Missing Test Coverage (Severity: High)

The PR adds a new validator but includes no unit tests. According to CLAUDE.md, the project aims for 80% test coverage.

Required: tests/unit/test_section_validator.py should include:

  • Test with all sections present
  • Test with missing sections
  • Test pattern matching variations
  • Test combined sections
  • Test error handling for missing/unreadable files
  • Test metadata tracking
  • Case-insensitive matching verification

See tests/unit/test_doi_validator.py for validator test patterns.

Enhancement: Consider Using ErrorCode Enum (Severity: Medium)

Other validators use ErrorCode from core.error_codes. Your code uses the deprecated _create_error() method. Consider defining section-specific error codes for consistency (e.g., MISSING_SECTION).

See figure_validator.py:77 and citation_validator.py:74 for examples.

Pattern Matching: Edge Cases (Severity: Low)

Question: Is it intentional to only match h2 headers and not h3? This might be worth documenting.

Your regex handles multiple spaces and case variations correctly.

Security Review: PASS

  • Uses safe file reading via _read_file_safely()
  • No user input in regex
  • No shell commands or file mutations
  • Appropriate error handling

Before Merging (Required)

  1. Add comprehensive unit tests
  2. Run: nox -s test
  3. Run: nox -s lint
  4. Run: mypy src/rxiv_maker/validators/section_validator.py

Overall Assessment: Conditional Approval

The implementation is solid and well-designed, but missing test coverage is a blocker. Add tests, then re-request review.

Great work on the feature design! The validator pattern is well-executed and will be genuinely useful for authors.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new SectionValidator that checks manuscripts for six standard sections commonly required by journals (Data Availability, Code Availability, Author Contributions, Acknowledgements, Funding, and Competing Interests). The validator issues non-blocking warnings when recommended sections are missing, helping authors ensure manuscript completeness before journal submission.

Key Changes:

  • New section validation system with flexible pattern matching supporting multiple section name variations
  • Integration as the first validator in the validation workflow ("Structure" validation)
  • Smart handling of combined sections (e.g., "Data and Code Availability")

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
src/rxiv_maker/validators/section_validator.py New validator class that checks for six recommended manuscript sections using regex patterns and generates warnings for missing sections
src/rxiv_maker/engines/operations/validate.py Integrates SectionValidator into the validation workflow as the first validator, with proper imports in both relative and absolute import blocks

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"author contributions": {
"patterns": [
r"##\s+Author\s+Contributions?",
r"##\s+Contributions?",
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern r"##\s+Contributions?" is too broad and could match any section titled "## Contributions" or "## Contribution" without the word "Author". Consider making this pattern more specific to avoid false positives, or remove it if only "Author Contributions" should be matched.

Suggested change
r"##\s+Contributions?",

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +139
class SectionValidator(BaseValidator):
"""Validator for checking recommended manuscript sections."""

# Required sections that should generate warnings if missing
RECOMMENDED_SECTIONS = {
"data availability": {
"patterns": [
r"##\s+Data\s+Availability",
r"##\s+Data\s+and\s+Code\s+Availability",
],
"suggestion": "Add a '## Data Availability' section describing where your data can be accessed",
},
"code availability": {
"patterns": [
r"##\s+Code\s+Availability",
r"##\s+Data\s+and\s+Code\s+Availability",
r"##\s+Software\s+Availability",
],
"suggestion": "Add a '## Code Availability' section with links to code repositories",
},
"author contributions": {
"patterns": [
r"##\s+Author\s+Contributions?",
r"##\s+Contributions?",
],
"suggestion": "Add an '## Author Contributions' section describing each author's role",
},
"acknowledgements": {
"patterns": [
r"##\s+Acknowledgements?",
r"##\s+Acknowledgments?",
],
"suggestion": "Add an '## Acknowledgements' section to thank contributors",
},
"funding": {
"patterns": [
r"##\s+Funding",
r"##\s+Financial\s+Support",
r"##\s+Grant\s+Information",
],
"suggestion": "Add a '## Funding' section declaring funding sources or stating 'This research received no external funding.'",
},
"competing interests": {
"patterns": [
r"##\s+Competing\s+Interests?",
r"##\s+Conflicts?\s+of\s+Interest",
r"##\s+Disclosure",
],
"suggestion": "Add a '## Competing Interests' section, even if just to state 'The authors declare no competing interests.'",
},
}

def validate(self) -> ValidationResult:
"""Check for recommended manuscript sections.

Returns:
ValidationResult with warnings for missing sections
"""
errors: list[ValidationError] = []
metadata = {
"missing_sections": [],
"found_sections": [],
}

# Find manuscript file
try:
manuscript_file = find_manuscript_md(self.manuscript_path)
except (FileNotFoundError, ValueError) as e:
return ValidationResult(
validator_name=self.name,
errors=[
self._create_error(
level=ValidationLevel.ERROR,
message=f"Could not find manuscript file: {e}",
)
],
metadata=metadata,
)

# Read manuscript content
content = self._read_file_safely(manuscript_file)
if content is None:
return ValidationResult(
validator_name=self.name,
errors=[
self._create_error(
level=ValidationLevel.ERROR,
message=f"Could not read manuscript file: {manuscript_file}",
)
],
metadata=metadata,
)

# Check for each recommended section
for section_name, section_info in self.RECOMMENDED_SECTIONS.items():
found = False
for pattern in section_info["patterns"]:
if re.search(pattern, content, re.IGNORECASE):
found = True
metadata["found_sections"].append(section_name)
break

if not found:
metadata["missing_sections"].append(section_name)
errors.append(
self._create_error(
level=ValidationLevel.WARNING,
message=f"Recommended section '{section_name.title()}' not found",
file_path=manuscript_file,
suggestion=section_info["suggestion"],
)
)

return ValidationResult(
validator_name=self.name,
errors=errors,
metadata=metadata,
)
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new SectionValidator lacks test coverage. The repository has comprehensive test coverage for other validators (see tests/unit/test_validators.py), and this new validator should have corresponding tests to verify pattern matching, warning generation, and edge cases like missing files or combined sections.

Copilot uses AI. Check for mistakes.

# Find manuscript file
try:
manuscript_file = find_manuscript_md(self.manuscript_path)
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Path object returned by find_manuscript_md should be converted to a string before passing to file_path parameter. The ValidationError.file_path is typed as str | None, but find_manuscript_md returns a Path object. For consistency with other validators and proper type handling, convert the Path to string using str().

Copilot uses AI. Check for mistakes.
self._create_error(
level=ValidationLevel.WARNING,
message=f"Recommended section '{section_name.title()}' not found",
file_path=manuscript_file,
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Path object returned by find_manuscript_md should be converted to a string before passing to file_path parameter. The ValidationError.file_path is typed as str | None, and should match the expected type. Convert using str(manuscript_file).

Suggested change
file_path=manuscript_file,
file_path=str(manuscript_file),

Copilot uses AI. Check for mistakes.
class SectionValidator(BaseValidator):
"""Validator for checking recommended manuscript sections."""

# Required sections that should generate warnings if missing
Copy link

Copilot AI Dec 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment states "Required sections" but the constant is named "RECOMMENDED_SECTIONS". The comment should say "Recommended sections" to match the variable name and the actual behavior (warnings, not errors).

Suggested change
# Required sections that should generate warnings if missing
# Recommended sections that should generate warnings if missing

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

github-actions bot commented Feb 5, 2026

Code Review: Section Validation Warnings

Thank you for this contribution! This is a well-motivated feature that adds helpful guidance for authors. Here's my detailed review:


Strengths

1. Good Design Decisions

  • Warning-level only: Correctly uses ValidationLevel.WARNING instead of blocking errors
  • Flexible pattern matching: Recognizes multiple variations ("Acknowledgements" vs "Acknowledgments", combined sections like "Data and Code Availability")
  • Clear suggestions: Each missing section includes helpful guidance on what to add
  • Proper integration: Added as first validator ("Structure") in the validation chain

2. Code Quality

  • Clear docstrings and comments
  • Follows project patterns (extends BaseValidator, uses ValidationResult)
  • Proper error handling with try/except for file operations
  • Good metadata tracking (missing_sections, found_sections)

3. Alignment with Project Standards

  • Follows PEP 8 formatting
  • Uses type hints consistently
  • Proper imports with both relative and absolute fallbacks

⚠️ Issues & Recommendations

1. CRITICAL: Missing Test Coverage

Issue: No unit tests for SectionValidator

According to CLAUDE.md:

Maintain comprehensive test coverage (aim for >80%)

Recommendation: Add comprehensive tests in tests/unit/test_section_validator.py:

import tempfile
import unittest
from pathlib import Path

from rxiv_maker.validators.section_validator import SectionValidator
from rxiv_maker.validators.base_validator import ValidationLevel


class TestSectionValidator(unittest.TestCase):
    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        
    def test_all_sections_present(self):
        """Test manuscript with all recommended sections."""
        # Create manuscript with all sections
        manuscript = """
## Data Availability
Data is available at...

## Code Availability  
Code is available at...

## Author Contributions
Author 1 did X...

## Acknowledgements
We thank...

## Funding
This work was funded by...

## Competing Interests
The authors declare no competing interests.
"""
        # Write test file and validate
        # Assert no warnings
        
    def test_missing_sections(self):
        """Test manuscript missing some sections."""
        # Create minimal manuscript
        # Assert correct warnings for missing sections
        
    def test_combined_data_code_section(self):
        """Test combined 'Data and Code Availability' section."""
        # Should satisfy both data and code requirements
        
    def test_section_name_variations(self):
        """Test recognition of different section name formats."""
        # Test "Acknowledgments" vs "Acknowledgements"
        # Test "Conflicts of Interest" vs "Competing Interests"
        
    def test_invalid_manuscript_path(self):
        """Test error handling for missing manuscript."""
        # Assert proper error-level validation result

Test commands:

pytest tests/unit/test_section_validator.py -v
nox -s test-unit

2. Should Use Centralized Error Codes

Issue: Uses deprecated _create_error() method instead of modern error code system

Current code (src/rxiv_maker/validators/section_validator.py:126-131):

errors.append(
    self._create_error(
        level=ValidationLevel.WARNING,
        message=f"Recommended section '{section_name.title()}' not found",
        file_path=manuscript_file,
        suggestion=section_info["suggestion"],
    )
)

From base_validator.py:133-135:

Note: This method is deprecated in favor of using create_validation_error
with structured ErrorCode enums for better consistency.

Recommendation:

  1. Add new error codes to src/rxiv_maker/core/error_codes.py:
# In ErrorCode enum (around line 44)
MISSING_DATA_AVAILABILITY = "E1016"
MISSING_CODE_AVAILABILITY = "E1017"
MISSING_AUTHOR_CONTRIBUTIONS = "E1018"
MISSING_ACKNOWLEDGEMENTS = "E1019"
MISSING_FUNDING = "E1020"
MISSING_COMPETING_INTERESTS = "E1021"
  1. Update validator to use create_validation_error:
from ..core.error_codes import ErrorCode, create_validation_error

# Map sections to error codes
ERROR_CODE_MAP = {
    "data availability": ErrorCode.MISSING_DATA_AVAILABILITY,
    "code availability": ErrorCode.MISSING_CODE_AVAILABILITY,
    # ... etc
}

# In validate():
errors.append(
    create_validation_error(
        ERROR_CODE_MAP[section_name],
        f"Recommended section '{section_name.title()}' not found",
        file_path=str(manuscript_file),
        suggestion=section_info["suggestion"],
    )
)

This provides:

  • Consistent error tracking
  • Machine-readable error codes
  • Better integration with other validators

3. Import Inconsistency

Issue: Missing from fallback imports (lines 28-50 in validate.py)

The PR adds SectionValidator to the first import block but also needs it in the fallback import block for consistency.

Already fixed in PR: ✅ Both import blocks include SectionValidator


4. Case Sensitivity in Pattern Matching

Current code uses re.IGNORECASE (line 118):

if re.search(pattern, content, re.IGNORECASE):

Analysis: This is good! Markdown headers can be written as:

  • ## Acknowledgements (capital A)
  • ## acknowledgements (lowercase a - uncommon but valid)

The case-insensitive matching is appropriate here.


5. Minor: Documentation Enhancement

Suggestion: Add example output to docstring showing what warnings look like:

"""Validator for checking recommended manuscript sections.

Example warning output:
    ⚠️ WARNING (1):
    
      1. Recommended section 'Acknowledgements' not found
         📄 /path/to/MANUSCRIPT/01_MAIN.md
         💡 Suggestion: Add an '## Acknowledgements' section to thank contributors
"""

🔒 Security Review

No security concerns identified

  • Uses safe re.search() with static patterns (no user-controlled regex)
  • File operations use safe _read_file_safely() helper
  • No shell commands or external process execution
  • No injection vectors

🚀 Performance Considerations

Efficient implementation

  • Single file read operation
  • Regex compilation done once at class level
  • O(n) complexity where n = content length
  • No blocking operations or network calls

Potential optimization (not critical): Pre-compile all patterns:

# At class level
COMPILED_PATTERNS = {
    section: [re.compile(p, re.IGNORECASE) for p in info["patterns"]]
    for section, info in RECOMMENDED_SECTIONS.items()
}

📋 Testing Checklist

Before merging, ensure:

  • Unit tests added for SectionValidator
  • Test coverage >80% for new code
  • Tests pass: nox -s test-unit
  • Linting passes: nox -s lint
  • Type checking passes: nox -s type-check
  • Integration test with full validation: rxiv validate
  • Tested on both example manuscripts (rxiv-maker, maicrobe)

📝 Suggested Changes Summary

Must Fix (Blocking):

  1. ✅ Add comprehensive unit tests

Should Fix (Recommended):

  1. 🔄 Migrate to centralized ErrorCode system
  2. 🔄 Add example output to docstring

Nice to Have:

  1. Pre-compile regex patterns for performance

🎯 Overall Assessment

Quality: Good implementation with solid foundation
Readiness: Needs test coverage before merge
Impact: Positive - helps authors create complete manuscripts

This is a valuable feature that aligns well with the project's goals. The main blocker is the missing test coverage, which is critical per the project's testing standards.

Once tests are added and the error code migration is done, this will be ready to merge! 🚀


Action Items for Author:

  1. Add unit tests in tests/unit/test_section_validator.py
  2. Consider migrating to ErrorCode system for consistency
  3. Run full test suite before requesting re-review

Let me know if you need help with any of these items!

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

Code Review - PR #275: Section Validation Warnings

Thank you for this feature addition! This is a valuable enhancement that helps authors ensure manuscript completeness. Here's my detailed review:

✅ Strengths

  1. Well-structured implementation: The validator follows the existing BaseValidator pattern consistently
  2. Clear documentation: Good module-level docstring explaining purpose and behavior
  3. Flexible pattern matching: Smart use of regex alternatives to catch section naming variations
  4. Appropriate severity: Using WARNING level is correct - this shouldn't block PDF generation
  5. Helpful suggestions: Each missing section includes actionable guidance for authors
  6. Clean integration: Minimal changes to existing code, proper placement as first validator

🔍 Code Quality Observations

Positive:

  • Type hints used appropriately (list[ValidationError], dict[str, Any])
  • Follows existing validation patterns from CitationValidator, ReferenceValidator, etc.
  • Good use of dataclass metadata for tracking found/missing sections
  • Error handling for file reading issues

Minor Suggestions:

  1. Regex efficiency (line 117-120):
    The current approach iterates through all patterns for each section. Consider pre-compiling regex patterns in a class variable for better performance:

    COMPILED_PATTERNS = {
        section: [re.compile(p, re.IGNORECASE) for p in info['patterns']]
        for section, info in RECOMMENDED_SECTIONS.items()
    }
  2. Case sensitivity in patterns (lines 29-69):
    Your patterns use \s+ which requires at least one space. Section headers like ##Data Availability (no space after ##) won't match. Consider:

    r"##\s*Data\s+Availability"  # \s* allows zero or more spaces
  3. Pattern robustness:
    Current patterns might miss variations like:

    • ## Data & Code Availability
    • ## Author Contribution (singular)
    • ## Acknowledgment (singular)

    Consider adding more patterns or using more flexible regex.

🧪 Test Coverage

Critical Issue: No unit tests were added for this new validator.

According to CLAUDE.md, the project aims for >80% test coverage. I recommend adding tests covering:

  1. Happy path: Manuscript with all sections present
  2. Missing sections: Test each section independently
  3. Pattern variations: Test alternative section names are recognized
  4. Edge cases:
    • Missing manuscript file
    • Empty manuscript file
    • Combined sections ("Data and Code Availability")
  5. Case sensitivity: Ensure case-insensitive matching works

Suggested test structure (following tests/unit/test_validators.py pattern):

@pytest.mark.validation
class TestSectionValidator(unittest.TestCase):
    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        self.manuscript_dir = os.path.join(self.temp_dir, "MANUSCRIPT")
        os.makedirs(self.manuscript_dir)
    
    def test_all_sections_present(self):
        # Create manuscript with all sections
        # Assert no warnings
        
    def test_missing_data_availability(self):
        # Create manuscript without data availability
        # Assert warning generated with correct message
        
    def test_alternative_section_names(self):
        # Test "Acknowledgments" vs "Acknowledgements"
        # Test "Conflicts of Interest" vs "Competing Interests"
        
    def test_combined_sections(self):
        # Test "Data and Code Availability" covers both

🔒 Security Considerations

No security concerns identified:

  • No user input directly processed (reads local files only)
  • No command execution
  • No SQL queries
  • File reading uses safe helpers from BaseValidator

⚡ Performance Considerations

  1. File reading: Only reads 01_MAIN.md once - efficient ✅

  2. Regex matching: 6 sections × ~2-3 patterns each = ~15 regex searches per validation

    • For typical manuscript sizes (<100KB), this is negligible
    • Pre-compiling patterns would be a minor optimization
  3. Validation order: Placed as first validator ("Structure") - makes sense logically and doesn't impact performance

📋 Additional Recommendations

  1. Add tests (high priority): Essential for maintainability and preventing regressions
  2. Consider configuration: Future enhancement could allow users to customize which sections are checked via .rxiv-maker/config
  3. Metadata usage: The metadata dict with found/missing sections is great for future features (e.g., JSON output, stats dashboard)
  4. Documentation: Consider updating website documentation (rxiv-maker.henriqueslab.org) to mention this feature

🎯 Verdict

Approve with minor recommendations:

  • The core implementation is solid and follows project conventions
  • Blocking concern: Missing unit tests should be added before merge
  • The regex pattern improvements are optional but recommended
  • This feature aligns well with rxiv-maker's goal of helping authors produce high-quality manuscripts

📝 Testing Commands

Before merging, please run:

# After adding tests:
nox -s test-unit              # Run unit tests
nox -s test-fast              # Quick validation
nox -s lint                   # Check code style

# Manual testing:
rxiv validate MANUSCRIPT/     # Test on actual manuscripts

Great work overall! This will be a helpful feature for rxiv-maker users. 🚀

@github-actions
Copy link

github-actions bot commented Feb 5, 2026

Code Review: Section Validator

Thank you for this contribution! This is a well-designed feature that adds helpful validation for recommended manuscript sections. Here's my detailed review:

Strengths

  1. Clean Architecture: Follows the existing validator pattern perfectly, extending BaseValidator consistently with other validators
  2. Appropriate Warning Level: Using ValidationLevel.WARNING is exactly right - non-blocking but informative
  3. Flexible Pattern Matching: Smart use of multiple regex patterns to catch variations (e.g., "Acknowledgements" vs "Acknowledgments")
  4. Clear User Guidance: Helpful suggestions for each missing section guide authors on what to add
  5. Good Metadata Tracking: Tracks both missing and found sections for potential reporting/analytics

🔍 Issues and Concerns

CRITICAL: Missing Test Coverage

The PR adds 139 lines of new code but includes zero tests. According to CLAUDE.md:

Maintain comprehensive test coverage (aim for >80%)

Required tests (should add tests/unit/test_section_validator.py):

  • ✅ All 6 sections present → no warnings
  • ✅ All 6 sections missing → 6 warnings
  • ✅ Pattern variations ("Acknowledgements" vs "Acknowledgments")
  • ✅ Combined sections ("Data and Code Availability")
  • ✅ Case insensitivity
  • ✅ File not found error handling
  • ✅ File read error handling
  • ✅ Metadata correctness

Example test structure:

import unittest
import tempfile
import os
from pathlib import Path

class TestSectionValidator(unittest.TestCase):
    def setUp(self):
        self.temp_dir = tempfile.mkdtemp()
        
    def test_all_sections_present(self):
        # Create manuscript with all sections
        content = """
# Title
## Data Availability
## Code Availability
## Author Contributions
## Acknowledgements
## Funding
## Competing Interests
        """
        manuscript_path = Path(self.temp_dir)
        (manuscript_path / "01_MAIN.md").write_text(content)
        
        validator = SectionValidator(str(manuscript_path))
        result = validator.validate()
        
        self.assertEqual(len(result.errors), 0)
        self.assertEqual(len(result.metadata["found_sections"]), 6)

Code Quality Issues

  1. Type Hints: Line 5 uses implicit Any - should be explicit:

    # Current (line 5)
    from typing import Any  # Never used!
    
    # Should add proper type hints throughout
    from typing import Any, Dict, List, Set
  2. Regex Performance: Patterns are compiled in RECOMMENDED_SECTIONS dict but could be pre-compiled as class constants for better performance:

    # Current - compiles on each access
    "patterns": [r"##\s+Data\s+Availability"]
    
    # Better - pre-compiled
    import re
    
    class SectionValidator(BaseValidator):
        DATA_AVAILABILITY_PATTERN = re.compile(r"##\s+Data\s+Availability", re.IGNORECASE)
        
        RECOMMENDED_SECTIONS = {
            "data availability": {
                "patterns": [DATA_AVAILABILITY_PATTERN, ...],
                ...
            }
        }
  3. Magic Numbers: The check runs on lines 114-131 but the logic could be more explicit with early returns

Security Considerations

No major security issues, but worth noting:

  • Uses _read_file_safely() which is good for file operations
  • Regex patterns are static (no user input) - safe from ReDoS attacks
  • No file writes or command execution

Integration Concerns

  1. Validator Ordering: Added as first validator ("Structure") in validate.py:97. This is good UX (structure before citations), but should be documented in commit message

  2. Performance: Reading full manuscript content with re.search() for each pattern. For large manuscripts (>10MB), consider:

    • Single regex with alternation
    • Early termination after finding all sections
  3. False Positives: Pattern r"##\s+Data\s+Availability" might miss:

    • ## Data availability (lowercase - though IGNORECASE handles this)
    • ##Data Availability (no space after ##)
    • ### Data Availability (level 3 heading with extra space)

    Consider more flexible pattern: r"###+\s*Data\s+Availability"

Documentation Gaps

  1. Module docstring is excellent, but method docstrings could be more detailed
  2. No explanation of why these 6 sections specifically (journal requirements should be referenced)
  3. Missing example in docstring showing what warnings look like

📋 Recommendations

Before merging:

  1. Add comprehensive unit tests (blocking requirement per CLAUDE.md)
  2. Run test suite: nox -s test-unit
  3. Check linting: nox -s lint
  4. Type checking: mypy src/rxiv_maker/validators/section_validator.py

Nice-to-have improvements:

  1. Make section list configurable via YAML config (recommended_sections: [data_availability, ...])
  2. Add --strict-sections CLI flag to promote warnings to errors for journal submission
  3. Link to journal requirements docs in suggestions
  4. Add this validator to documentation/changelog

🎯 Summary

Overall: Good implementation with critical testing gap

The code quality is solid and follows project conventions well. The feature is valuable for users. However, the complete lack of tests is a blocker for merging according to project standards.

Recommendation: Request changes for test coverage before approval.


Files to check:

  • src/rxiv_maker/validators/section_validator.py:118 - Flexible pattern matching
  • src/rxiv_maker/engines/operations/validate.py:99 - Integration as first validator

Let me know if you'd like help writing the tests!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant