📊 API Documentation Quality Research Project

🎯 Project Overview

This repository contains the research framework and validation tools for studying how API documentation quality affects Large Language Model (LLM) code generation success. The research has identified a "documentation sweet spot" phenomenon: moderate-quality documentation achieves better LLM performance than comprehensive documentation.

🔬 Research Hypothesis

Counterintuitive discovery: LLMs generate more functional code when given moderate-quality API documentation (3.0-4.0/5.0) than when given excellent documentation (5.0/5.0).

📈 Key Findings

  • 58% better performance with average documentation than with excellent documentation in controlled experiments
  • Consistent pattern across OpenAI GPT-4, Claude Sonnet, and Gemini Pro
  • An "over-engineering effect" in which comprehensive documentation leads to complex but failure-prone implementations

📋 Experiment Design

Selected APIs

  1. Stripe API (Excellent Documentation)

    • Complex payment processing API
    • Comprehensive interactive documentation
    • Extensive code examples and error handling
  2. GitHub API (Good Documentation)

    • GraphQL API for repository management
    • Well-structured schema documentation
    • Good examples with some gaps
  3. OpenWeatherMap API (Average Documentation)

    • Weather data API with API key authentication
    • Decent endpoint documentation
    • Limited advanced usage patterns
  4. JSONPlaceholder API (Basic Documentation)

    • Simple REST API for testing
    • Minimal documentation
    • Basic endpoint descriptions only
  5. Cat Facts API (Poor Documentation)

    • Very simple GET requests
    • Minimal documentation
    • No examples or error handling

Documentation Quality Metrics

  • Completeness (25%): Endpoint coverage, parameter documentation, response schemas
  • Clarity (20%): Language clarity, organization, terminology consistency
  • Examples (20%): Code examples, language support, real-world use cases
  • Error Handling (15%): Error codes, troubleshooting guides
  • Authentication (10%): Auth instructions, security practices
  • Code Quality (10%): Best practices, production-ready examples
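
As a rough illustration of how the weights listed above could be combined into a single score (the actual implementation lives in evaluation/documentation_quality_metrics.py; the field names and helper below are illustrative assumptions, not its real API):

```python
# Minimal sketch of the weighted documentation-quality score.
# Sub-scores are assumed to be on a 0-5 scale; names are illustrative.
WEIGHTS = {
    "completeness": 0.25,
    "clarity": 0.20,
    "examples": 0.20,
    "error_handling": 0.15,
    "authentication": 0.10,
    "code_quality": 0.10,
}

def overall_quality(sub_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (0-5) into one weighted score."""
    return sum(WEIGHTS[dim] * sub_scores[dim] for dim in WEIGHTS)

# Example: a hypothetical "average" API
print(overall_quality({
    "completeness": 3.5, "clarity": 3.0, "examples": 2.5,
    "error_handling": 2.0, "authentication": 4.0, "code_quality": 3.0,
}))  # -> 2.975
```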

Standardized Tasks

  • Authentication Setup: Implement API authentication
  • Data Retrieval: Basic and parameterized GET requests
  • CRUD Operations: Create, update, delete resources
  • Error Handling: Rate limiting, validation errors
  • Edge Cases: Large datasets, network failures
  • Integration: Multi-step workflows

πŸ—οΈ Project Structure

api-doc-quality-tests/
├── apis/                           # API selection and analysis
│   └── api_selection_analysis.md
├── controls/                       # Experiment execution
│   └── experiment_execution.py
├── documentation/                  # Documentation variants
│   └── variants/
├── evaluation/                     # Analysis frameworks
│   ├── documentation_quality_metrics.py
│   ├── llm_testing_infrastructure.py
│   ├── code_validation_system.py
│   ├── data_analysis_framework.py
│   └── results_analysis.py
├── tasks/                          # Standardized task definitions
│   └── standardized_tasks.py
└── README.md

🚀 Quick Start

Prerequisites

pip install pandas numpy matplotlib seaborn scipy scikit-learn
pip install anthropic openai google-generativeai  # For LLM APIs

Environment Setup

Create a .env file with your API keys:

ANTHROPIC_API_KEY=your_claude_api_key
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_gemini_api_key
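
A minimal sketch of how the scripts might pick up these keys, assuming the python-dotenv package is used (it is not in the prerequisites above; install it with pip install python-dotenv if needed):

```python
# Load API keys from .env into environment variables (sketch; assumes python-dotenv).
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory

anthropic_key = os.environ["ANTHROPIC_API_KEY"]
openai_key = os.environ["OPENAI_API_KEY"]
google_key = os.environ["GOOGLE_API_KEY"]
```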

Running the Experiment

  1. Execute the complete experiment:
cd controls
python experiment_execution.py
  2. Analyze results:
cd evaluation
python results_analysis.py --results-dir ../experiment_results
  3. View results:
    • Check experiment_results/insights_report.md for detailed findings
    • View visualizations in experiment_results/visualizations/
    • Access raw data in JSON format for further analysis

📊 Expected Outputs

Quantitative Results

  • Correlation coefficients between documentation metrics and code quality
  • Statistical significance testing (p-values)
  • Provider-specific performance comparisons
  • API-specific success rates

Visualizations

  • Correlation heatmaps
  • Scatter plots showing documentation vs code quality relationships
  • Box plots of success rates by documentation quality quartiles
  • Provider performance comparisons

Actionable Insights

  • Specific documentation features that most impact LLM performance
  • Recommendations for documentation improvement priorities
  • Provider-specific strengths and weaknesses
  • Evidence-based best practices for API documentation

🔬 Methodology

Documentation Assessment

  • Automated scoring based on predefined criteria
  • Weighted metrics reflecting real-world importance
  • Consistent evaluation across all APIs

Code Generation Testing

  • Identical prompts across all LLM providers
  • Standardized task requirements
  • Controlled testing environment
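
For illustration, an identical prompt could be rendered once per task and documentation variant and then sent verbatim to every provider. This is a sketch; the real prompt construction lives in llm_testing_infrastructure.py, and the template and field names below are assumptions:

```python
# Sketch of building one identical prompt per (documentation variant, task).
# Template and parameter names are illustrative assumptions.
PROMPT_TEMPLATE = """You are given the following API documentation:

{documentation}

Task: {task_description}

Write complete, runnable Python code that accomplishes the task.
"""

def build_prompt(documentation: str, task_description: str) -> str:
    # The same rendered string is sent to every LLM provider, so differences
    # in output can be attributed to the model rather than the prompt.
    return PROMPT_TEMPLATE.format(
        documentation=documentation,
        task_description=task_description,
    )
```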

Code Validation

  • Syntax Validation: Python syntax correctness
  • Functionality: Meeting task requirements
  • Best Practices: Coding standards compliance
  • Security: Secure coding practices
  • Completeness: Implementation thoroughness
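
The syntax check, for instance, can be done with Python's standard-library ast module. A minimal sketch follows (the actual checks live in code_validation_system.py; this helper name is hypothetical):

```python
# Sketch of the syntax-validation step using the ast module.
import ast

def is_valid_python(source: str) -> bool:
    """Return True if the generated code parses as valid Python syntax."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(is_valid_python("def fetch(): return 42"))  # True
print(is_valid_python("def fetch( return 42"))    # False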

Statistical Analysis

  • Pearson correlation analysis
  • Significance testing at p < 0.05
  • Provider and API-specific breakdowns
  • Regression analysis for predictive insights
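
A minimal sketch of the correlation step with SciPy, using illustrative numbers rather than real results (the actual analysis is in data_analysis_framework.py):

```python
# Sketch: Pearson correlation between documentation quality and code quality.
from scipy.stats import pearsonr

doc_quality = [1.5, 2.0, 3.0, 3.5, 4.0, 5.0]          # illustrative documentation scores
code_quality = [0.40, 0.55, 0.80, 0.85, 0.75, 0.60]   # illustrative mean code-quality scores

r, p_value = pearsonr(doc_quality, code_quality)
print(f"r = {r:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Correlation is significant at the 0.05 level")
```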

📈 Key Research Questions

  1. Primary: How does documentation quality correlate with LLM code generation success?

  2. Secondary:

    • Which documentation aspects most impact LLM performance?
    • Do different LLM providers show varying sensitivity to documentation quality?
    • What documentation quality threshold ensures reliable code generation?
    • How do complex vs simple APIs respond to documentation improvements?

🎯 Expected Impact

For API Providers

  • Evidence-based documentation improvement priorities
  • ROI justification for documentation investments
  • Specific guidelines for LLM-friendly documentation

For Developers

  • Better understanding of documentation quality impact
  • Improved code generation success rates
  • More efficient API integration workflows

For Research Community

  • Empirical data on LLM performance factors
  • Methodology for similar studies
  • Baseline metrics for future research

🔄 Extending the Experiment

Adding New APIs

  1. Update experiment_config.json with new API details
  2. Define expected documentation quality level
  3. Run the experiment with expanded API set
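
For example, a new API could be registered by appending an entry to the config file. The field names below are assumptions about the schema of experiment_config.json, not its actual contents:

```python
# Sketch: append a hypothetical API entry to experiment_config.json.
import json

new_api = {
    "name": "NASA APOD API",
    "base_url": "https://api.nasa.gov/planetary/apod",
    "auth_type": "api_key",
    "expected_doc_quality": "good",  # prior estimate, later validated by the metrics step
}

with open("experiment_config.json") as f:
    config = json.load(f)

config.setdefault("apis", []).append(new_api)

with open("experiment_config.json", "w") as f:
    json.dump(config, f, indent=2)
```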

Testing Additional LLMs

  1. Implement new provider in llm_testing_infrastructure.py
  2. Add provider configuration
  3. Update analysis framework for new provider
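
As a sketch, a new provider could be wired in as an adapter exposing the same interface as the existing ones. The class and method names below are assumptions, not the actual structure of llm_testing_infrastructure.py:

```python
# Sketch of a provider adapter; the interface is an assumption about how
# llm_testing_infrastructure.py abstracts over providers.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    name: str

    @abstractmethod
    def generate_code(self, prompt: str) -> str:
        """Send the standardized prompt and return the raw model output."""

class MyNewProvider(LLMProvider):
    name = "my-new-llm"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def generate_code(self, prompt: str) -> str:
        # Call the new provider's SDK or HTTP API here and return the generated code.
        raise NotImplementedError("wire in the provider's client call")
```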

Custom Task Development

  1. Define new tasks in standardized_tasks.py
  2. Specify requirements and success criteria
  3. Update validation system for new task types
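
For example, a new task might be expressed as a plain data structure. The fields below are illustrative assumptions; the actual schema is whatever standardized_tasks.py defines:

```python
# Sketch of a task definition; field names are illustrative assumptions.
PAGINATION_TASK = {
    "id": "data_retrieval_pagination",
    "category": "Data Retrieval",
    "description": "Fetch all pages of results for a filtered query",
    "requirements": [
        "Follow the API's pagination scheme until no results remain",
        "Respect documented rate limits between requests",
    ],
    "success_criteria": [
        "Returns the complete result set as a single list",
        "Handles an empty result set without raising",
    ],
}
```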

📝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Implement improvements or extensions
  4. Add tests and documentation
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • API providers for public documentation access
  • LLM providers for research-friendly APIs
  • Open source community for analysis tools

📞 Contact

For questions, suggestions, or collaboration opportunities, please open an issue or contact the research team.


This experiment aims to bridge the gap between documentation quality and AI-assisted development, providing actionable insights for the developer community.
