Skip to main content
Background Image
  1. Blog/

From README to Comprehensive Docs: Transforming ripgrep's Documentation with AI Automation

Author
Glen Baker
Building tech startups and open source tooling
Table of Contents

From Two Files and Code to Comprehensive Documentation
#

ripgrep is one of the most popular command-line search tools, with over 57,000 stars on GitHub. Despite its popularity and rich feature set, the documentation consisted of just two files: a feature-focused README and a tutorial-style GUIDE. While these files were well-written, they left gaps:

  • No structured learning path from basics to advanced features
  • Limited visual aids to explain complex concepts
  • Features scattered across different sections
  • No dedicated troubleshooting or reference sections

Using an evolved version of my MkDocs automation workflow (originally built for Prodigy’s documentation), I generated 50+ pages of enhanced documentation, now live at https://iepathos.github.io/ripgrep.

This post documents how the workflow evolved to handle:

  1. Intelligent page splitting - Breaking monolithic chapters into focused subpages
  2. Automated visual enhancement - Adding diagrams, admonitions, and annotations where they help
  3. Mermaid diagram validation - Ensuring all generated diagrams render correctly

The workflow went beyond mere drift detection—it transformed minimal docs into a comprehensive knowledge base.

What Changed: MkDocs vs mdBook
#

The original workflow used mdBook, a simple static site generator popular in the Rust ecosystem. MkDocs Material offers significantly more features:

FeaturemdBookMkDocs Material
Mermaid DiagramsPlugin requiredNative support
AdmonitionsLimited12+ styled types
Code AnnotationsNoYes (numbered callouts)
Tabbed ContentNoYes
Navigation TabsNoYes
SearchBasicAdvanced with highlighting
Theme CustomizationLimitedExtensive

The Enhanced Workflow Architecture
#

The ripgrep workflow introduced critical improvements over previous iterations. Here’s the complete flow:

name: prodigy-mkdocs-drift-detection
mode: mapreduce

setup:
  # 1. Analyze codebase features
  - claude: "/prodigy-analyze-features-for-mkdocs"

  # 2. Detect gaps and create stub pages
  - claude: "/prodigy-detect-mkdocs-gaps"

  # 3. Analyze page sizes and structural complexity (NEW!)
  - claude: "/prodigy-analyze-mkdocs-structure"

  # 4. Automatically split oversized pages (NEW!)
  - claude: "/prodigy-split-oversized-mkdocs-pages"

  # 5. Auto-discover ALL markdown files (including newly created subpages)
  - shell: |
      find $DOCS_DIR -name "*.md" -type f | jq -R -s '
        split("\n") | map(select(length > 0)) | map({
          id: (split("/")[-1] | split(".md")[0]),
          title: ...,
          file: .,
          type: "auto-discovered"
        })
      ' > $ANALYSIS_DIR/flattened-items.json

map:
  agent_template:
    # Step 1: Analyze page for drift (subsection-aware)
    - claude: "/prodigy-analyze-mkdocs-drift"

    # Step 2: Fix detected drift with validation
    - claude: "/prodigy-fix-mkdocs-drift"
      validate:
        claude: "/prodigy-validate-mkdocs-page"
        threshold: 100  # Must meet 100% quality standards
        on_incomplete:
          claude: "/prodigy-complete-mkdocs-fix"
          max_attempts: 3

    # Step 3: Enhance with visual features
    - claude: "/prodigy-enhance-mkdocs-page"

reduce:
  # Validate with strict mode
  - shell: "mkdocs build --strict"
    on_failure:
      claude: "/prodigy-fix-mkdocs-build-errors"

  # Check structure and feature consistency
  - claude: "/prodigy-validate-mkdocs-structure"
  - claude: "/prodigy-validate-feature-consistency"

  # Validate Mermaid diagrams (NEW!)
  - shell: "cd .prodigy/scripts && npm install --silent && node validate-mermaid.js ../../$DOCS_DIR"
    on_failure:
      claude: "/prodigy-fix-mermaid-diagrams --validation-output '${shell.stderr}'"

Four Major Improvements Over Book-Docs Workflow
#

The ripgrep workflow builds on the subsection-aware foundations and adds four critical enhancements:

1. Intelligent Page Splitting
#

The breakthrough feature for transforming ripgrep’s documentation: automatic page splitting based on structural analysis.

The original GUIDE.md was over 1,400 lines—a monolithic wall of text covering everything from basics to advanced features. The workflow now analyzes page structure before processing and automatically splits oversized pages into focused subpages:

# Step 3: Analyze page sizes and structural complexity
- claude: "/prodigy-analyze-mkdocs-structure --project $PROJECT_NAME --docs-dir $DOCS_DIR --pages $CHAPTERS_FILE --output $ANALYSIS_DIR/structure-report.json"

# Step 4: Automatically split all oversized pages into subpages
# This runs BEFORE map phase so agents process optimally-sized pages
- claude: "/prodigy-split-oversized-mkdocs-pages --project $PROJECT_NAME --pages $CHAPTERS_FILE --docs-dir $DOCS_DIR --structure-report $ANALYSIS_DIR/structure-report.json"

For ripgrep, this transformed:

  • GUIDE.md (1,400 lines) → basics/ directory with 8 focused pages (pattern-matching.md, case-sensitivity.md, boundaries.md, etc.)
  • Binary Data section (350 lines) → binary-data/ directory with 5 specialized pages (detection.md, modes.md, flags.md, etc.)
  • Troubleshootingtroubleshooting/ directory with 6 topic-specific pages

This happens in the setup phase, before map processing begins. Benefits:

  • Context-aware agents: Each agent works on focused topics, not overwhelming monoliths
  • Better enhancement: Visual features are applied to coherent topics, not scattered sections
  • Improved navigation: Readers find specific information faster
  • Maintainability: Smaller pages are easier to update and keep accurate

2. Exhaustive Page Discovery
#

After page splitting creates new files, we need to ensure every page gets processed. The workflow adds a direct filesystem scan:

find $DOCS_DIR -name "*.md" -type f | jq -R -s '
  split("\n") |
  map(select(length > 0)) |
  map({
    id: (split("/")[-1] | split(".md")[0]),
    title: (split("/")[-1] | split(".md")[0] | gsub("-|_"; " ")),
    file: .,
    type: "auto-discovered"
  })
' > $ANALYSIS_DIR/flattened-items.json

This is critical because page splitting creates files dynamically. The filesystem becomes the source of truth:

  • Comprehensive coverage: Every .md file is processed, including newly split pages
  • No manual tracking: Don’t maintain a curated list that gets out of sync
  • Orphaned pages caught: Pages that exist but aren’t properly linked are discovered
  • Deterministic: Same files every time, no variation between runs

3. Visual Enhancement Per Page
#

After drift is detected and fixed, each page gets enhanced with MkDocs Material features through the /prodigy-enhance-mkdocs-page command:

# Step 3: Enhance page with visual features (diagrams, admonitions, annotations)
# This runs per-page so Claude has full context about what the page discusses
- claude: "/prodigy-enhance-mkdocs-page --project $PROJECT_NAME --json '${item}' --auto-fix true"
  commit_required: true

This command analyzes each page’s content and adds:

  • Mermaid diagrams for architecture and workflow visualization
  • Admonitions (warnings, tips, notes) for important callouts
  • Code annotations with numbered inline explanations
  • Tabbed content for alternative approaches

The enhancement happens in the map phase (per-page), not the reduce phase (bulk), which means each agent has full context about what the page covers and can make intelligent decisions about which visual features to add.

Example visual enhancements:

Mermaid diagram for workflow visualization:

```mermaid
graph LR
    A[Workflow YAML] --> B[Prodigy CLI]
    B --> C[Setup Phase]
    C --> D[Map Phase]
    D --> E1[Agent 1]
    D --> E2[Agent 2]
    E1 --> F[Reduce Phase]
    E2 --> F

Admonitions for important callouts:
```markdown
!!! warning "Breaking Change"
    The `legacy_risk` field has been deprecated. Use `minimal_count` instead.

!!! tip "Performance Optimization"
    Set `max_parallel: 5` to process items faster on multi-core systems.

Code annotations with inline explanations:

error_policy:
  on_item_failure: dlq  # (1)!
  continue_on_failure: true  # (2)!
  max_failures: 2  # (3)!

1. Failed items go to dead letter queue for manual review
2. Don't stop the entire workflow on one failure
3. Maximum number of failures before aborting workflow

4. Mermaid Diagram Validation
#

Here’s the critical innovation for visual reliability: automated diagram validation using the official Mermaid renderer.

When you generate dozens of Mermaid diagrams automatically, some will have syntax errors that break rendering. Rather than manually checking each one, the workflow includes a validation script that uses @mermaid-js/mermaid-cli (the official Mermaid rendering tool) to validate diagrams in the reduce phase:

# Validate Mermaid diagrams - ensure all diagrams have valid syntax
- shell: "cd .prodigy/scripts && npm install --silent && node validate-mermaid.js ../../$DOCS_DIR"
  on_failure:
    # Fix any invalid Mermaid diagrams found
    # Pass validation output (JSON on stderr) to Claude for context
    claude: "/prodigy-fix-mermaid-diagrams --validation-output '${shell.stderr}'"
    commit_required: true

The validation uses the exact same rendering engine that MkDocs Material uses. This provides:

  • 100% accuracy: If the diagram renders successfully with mmdc, it will render in the docs
  • Real error messages: Get actual Mermaid parser errors for precise fixes
  • Comprehensive coverage: Catches all syntax issues automatically

The validation script:

  • Extracts all Mermaid diagrams from markdown files
  • Attempts to render each diagram to SVG using mmdc
  • Reports exact line numbers and error messages for failures
  • Outputs structured JSON for Claude to automatically fix issues

Example validation output:

Validating Mermaid diagrams in docs/...

✗ Invalid diagram in docs/basics/pattern-matching.md:91
  - Unmatched square brackets (5 [ vs 4 ])
  - Edge labels contain nested quotes - escape or use HTML entities

✓ Valid diagram in docs/introduction.md:9
✓ Valid diagram in docs/basics/case-sensitivity.md:92

========================================
Validation Summary
========================================
Total diagrams: 23
Valid: 21
Invalid: 2

When validation fails, the /prodigy-fix-mermaid-diagrams command automatically fixes syntax errors and re-validates. This caught 8 broken diagrams in ripgrep’s docs that would have failed silently during build.

Multi-Layered Validation
#

Beyond Mermaid diagrams, the workflow includes comprehensive validation:

Structure validation (/prodigy-validate-mkdocs-structure):

- claude: "/prodigy-validate-mkdocs-structure --project $PROJECT_NAME --docs-dir $DOCS_DIR --output $ANALYSIS_DIR/structure-validation.json --auto-fix true"
  commit_required: true

This checks:

  • Navigation hierarchy is logical
  • No orphaned pages (exist but not linked)
  • Appropriate nesting levels
  • Consistent naming conventions

Feature consistency (/prodigy-validate-feature-consistency):

- claude: "/prodigy-validate-feature-consistency --project $PROJECT_NAME --docs-dir $DOCS_DIR --output $ANALYSIS_DIR/feature-consistency.json"

This checks:

  • Similar pages use similar visual enhancements
  • Admonitions are semantically correct (warnings for dangerous operations, tips for optimizations)
  • No overuse or underuse of features
  • Consistent diagram styles

These validations catch issues that per-page analysis misses, like:

  • One architecture page has diagrams, another similar page doesn’t
  • Configuration pages use different annotation styles
  • Some warnings use !!! danger while others use !!! warning for similar content

Combined with the existing holistic validation from the mdBook workflow, this creates a three-tier quality system:

  1. Per-page validation: Each page meets quality standards independently
  2. Build validation: All pages compile together correctly (mkdocs build --strict)
  3. Cross-page validation: Pages are consistent with each other

Real Results: ripgrep Documentation Transformation
#

I ran the enhanced MkDocs workflow on ripgrep’s minimal documentation (README + GUIDE). Here’s what happened:

From Minimal to Comprehensive
#

Before:

  • 2 documentation files (README.md, GUIDE.md)
  • ~1,800 total lines of markdown
  • No visual diagrams
  • Limited structure (linear reading path)
  • Features scattered across sections

After:

  • 55 focused documentation pages
  • Organized into 7 major sections (Basics, Advanced Patterns, Binary Data, Troubleshooting, etc.)
  • 87 Mermaid diagrams explaining workflows, architecture, and decision trees
  • 358 admonitions highlighting warnings, tips, and important callouts
  • 180+ code annotations with inline explanations
  • Tabbed installation instructions for different platforms
  • 100% strict build success - no broken links or invalid syntax

Page Splitting Impact
#

The intelligent page splitting transformed monolithic files into focused learning paths:

GUIDE.md basics section (400 lines) became:

  • basics/index.md - Overview and learning path
  • basics/pattern-matching.md - Pattern types and usage
  • basics/literal-search.md - Fixed-string searching
  • basics/case-sensitivity.md - Case handling options
  • basics/boundaries.md - Word and line boundaries
  • basics/regex-basics.md - Regular expression primer
  • basics/output.md - Output formatting
  • basics/count-list.md - Counting and listing matches
  • basics/practice.md - Practice exercises

Binary Data section (350 lines) became:

  • binary-data/index.md - Overview of binary handling
  • binary-data/detection.md - How ripgrep detects binary files
  • binary-data/modes.md - Binary search modes
  • binary-data/flags.md - Binary-specific flags
  • binary-data/explicit-implicit.md - Explicit vs implicit behavior
  • binary-data/examples.md - Practical examples

Result: Readers can find specific information immediately instead of scanning through monolithic guides.

Example: Pattern Matching Page
#

The pattern matching page demonstrates how visual enhancements transform technical content:

Before (from GUIDE.md): Plain text explaining regex vs literal patterns, scattered across multiple sections with basic code examples.

After (docs/basics/pattern-matching.md):

Added decision tree diagram:

flowchart TD
    Start[Need to Search?] --> HasSpecial{"Pattern has
special chars
like *, ., (, )?"}

    HasSpecial -->|Yes| WantLiteral{"Want to match
those chars
literally?"}
    HasSpecial -->|No| NeedFlex{"Need flexible
matching?"}

    WantLiteral -->|Yes| UseLiteral["Use -F flag
Literal String"]
    WantLiteral -->|No| UseRegex["Use Regex
Default"]

    NeedFlex -->|Yes| UseRegex
    NeedFlex -->|No| UseLiteral

Converted important notes to admonitions:

!!! tip "When to use literal strings"
    Use the `-F` flag when you:

    - Need to search for exact text containing special characters
    - Want faster searches for simple strings
    - Are searching for code snippets with regex metacharacters
    - Need predictable behavior without regex interpretation

Annotated code examples:

# Match lines containing either TODO or FIXME
rg -e TODO -e FIXME  # (1)!

# Match multiple error levels
rg -e "error" -e "warning" -e "critical"  # (2)!
  1. Each -e flag adds a pattern; matches lines with TODO OR FIXME (or both)
  2. Searches for any of the three error levels in the same command

The page went from “walls of text” to “guided learning.”

Example: Introduction Page
#

The introduction page showcases comprehensive visual enhancement:

Before (from README.md): Text-only feature list with no visualization of how ripgrep actually works.

After (docs/index.md and docs/introduction.md):

Added search process flowchart showing ripgrep’s automatic filtering:

flowchart LR
    Start([Start Search]) --> Filter{"File should
be searched?"}
    Filter -->|"No
gitignore, hidden, binary"| Skip[Skip File]
    Filter -->|Yes| Read["Read File
Line by Line"]
    Read --> Match{"Line matches
pattern?"}
    Match -->|Yes| Print[Print Line]
    Match -->|No| Next{More lines?}
    Print --> Next
    Next -->|Yes| Read
    Next -->|No| Done{More files?}

Tabbed installation instructions for different platforms:

=== "macOS"
    ```bash
    brew install ripgrep
    ```

=== "Windows"
    ```bash
    choco install ripgrep
    ```

=== "Debian/Ubuntu"
    ```bash
    sudo apt install ripgrep
    ```

Result: New users understand both how ripgrep works and how to install it in seconds.

Key Technical Innovations
#

1. Per-Page Enhancement Context
#

The enhancement happens in the map phase, not the reduce phase. This is crucial because:

  • Each agent has full context about the page’s topic and content
  • Decisions about which features to add are made with understanding
  • No generic “add a diagram here” rules—the AI understands why a diagram helps

Running enhancement in reduce would mean processing pages in bulk without context.

2. Validation Drives Quality
#

Each enhanced page goes through validation:

validate:
  claude: "/prodigy-validate-mkdocs-page"
  result_file: ".prodigy/validation-result.json"
  threshold: 100
  on_incomplete:
    claude: "/prodigy-complete-mkdocs-fix"
    max_attempts: 3

If a page doesn’t meet quality standards (missing diagrams where needed, inconsistent formatting, etc.), the workflow automatically attempts fixes. This ensures consistent quality across all pages.

3. Strict Build Enforcement
#

The reduce phase uses mkdocs build --strict:

reduce:
  - shell: "mkdocs build --strict"
    on_failure:
      claude: "/prodigy-fix-mkdocs-build-errors"

Strict mode catches:

  • Broken internal links
  • Invalid admonition syntax
  • Malformed Mermaid diagrams
  • Missing referenced files
  • YAML frontmatter errors

If the build fails, an agent automatically diagnoses and fixes the issues. No manual intervention required.

4. Structure and Consistency Checks
#

Two new reduce-phase validations ensure cross-cutting quality:

Structure validation checks:

  • Navigation hierarchy makes sense
  • No orphaned pages
  • Appropriate nesting levels
  • Consistent naming conventions

Feature consistency checks:

  • Pages of similar types use similar enhancements
  • No overuse or underuse of visual features
  • Admonition types are semantically correct (warnings for dangerous operations, tips for optimizations, etc.)

These catch issues that per-page validation misses.

Comparing the Three Generations
#

AspectDebtmap (Original)Prodigy Book-Docsripgrep MkDocs
Source MaterialOwn project docsOwn project docsExisting OSS docs (README + GUIDE)
Drift Commands/prodigy-analyze-book-chapter-drift
/prodigy-fix-chapter-drift
/prodigy-analyze-subsection-drift
/prodigy-fix-subsection-drift
/prodigy-analyze-mkdocs-drift
/prodigy-fix-mkdocs-drift
GranularityChapter-level onlySubsection-aware (H2, H3)Subsection-aware
Page StructureFixed (manual SUMMARY.md)Fixed (manual SUMMARY.md)Dynamic (automatic page splitting)
Page DiscoveryCurated chapters.jsonGap detection → flattened-items.jsonGap detection + automatic splitting + find scan
ValidationNone/prodigy-validate-doc-fix
/prodigy-validate-book-holistically
/prodigy-validate-mkdocs-page
/prodigy-validate-mkdocs-structure
/prodigy-validate-feature-consistency
Mermaid diagram validation
Visual EnhancementNoneNone/prodigy-enhance-mkdocs-page
Build Checkmdbook buildmdbook buildmkdocs build --strict
Pages Generated27 (curated)47 (gap-detected)55 (split + discovered)
Diagrams0087 Mermaid diagrams
Quality FocusBasic accuracySubsection accuracy + holistic validationAccuracy + intelligent structure + visual engagement + diagram validation

The evolution: accuracyprecisiontransformation.

Lessons Learned
#

1. Page Splitting Unlocks Comprehensive Documentation
#

The most impactful innovation wasn’t visual enhancements—it was automatic page splitting.

The workflow starts by generating initial documentation from ripgrep’s README and GUIDE.md, then analyzes the codebase to enhance and expand the docs based on actual features. During this process, some generated pages become monolithic (1,400+ lines). No reader wants to scroll through a wall of text to find information about case sensitivity or binary data handling. But manually splitting oversized files would require:

  • Deciding where to split
  • Creating directory structure
  • Moving content
  • Updating cross-references
  • Maintaining navigation

The workflow handles all of this automatically by analyzing structural complexity:

  • Identifies oversized pages (>300 lines or >5 major sections)
  • Determines logical split points (section boundaries)
  • Creates focused subpages with proper hierarchy
  • Updates cross-references and navigation

Result: Documentation that expands organically from codebase analysis becomes navigable and learnable, not an overwhelming wall of text.

2. Intelligent Structure Beats Manual Organization
#

When working with existing documentation (like ripgrep), you can’t impose an arbitrary structure. The workflow needs to:

  • Understand the existing organization
  • Identify natural topic boundaries
  • Preserve authorial intent while improving discoverability

The structure analysis command (/prodigy-analyze-mkdocs-structure) evaluates pages based on:

  • Line count and section depth
  • Topic cohesion within sections
  • Cross-reference patterns
  • Natural split points at heading boundaries

This preserves the original authors’ logic while making it more accessible.

3. Visual Features Need Intelligence
#

You can’t apply visual enhancements mechanically. A rule-based system would either:

  • Over-apply: Add diagrams everywhere, cluttering simple pages
  • Under-apply: Miss opportunities where visuals would help

AI agents with context make nuanced decisions: “Pattern matching is decision-heavy, add a decision tree diagram. Case sensitivity has three modes, add a flow diagram. Installation varies by platform, use tabbed content.”

4. Diagram Validation is Non-Negotiable
#

When you generate 87 Mermaid diagrams automatically, some will have syntax errors:

  • Unmatched brackets from complex node labels
  • Quote nesting in edge labels
  • Invalid node IDs with special characters

The Mermaid validation script caught 8 broken diagrams in ripgrep’s docs that would have silently failed during rendering. Users would have seen empty diagram blocks with no explanation.

The key insight: Using the actual Mermaid renderer (@mermaid-js/mermaid-cli) means validation is 100% accurate. If it renders with mmdc, it will render in the docs.

Automated validation → automated fixes → reliable visual documentation.

5. Per-Page Enhancement Beats Bulk Processing
#

Running enhancement in the map phase (per-page) instead of reduce phase (bulk) means:

  • Each agent has full context about what the page covers
  • Decisions are informed by content, not position
  • Quality is higher because understanding is higher

The extra parallelization complexity is worth it.

Practical Impact
#

For Open Source Projects
#

The ripgrep workflow demonstrates a new pattern: transforming existing minimal documentation into comprehensive guides automatically.

Many popular open source projects have:

  • Excellent but minimal documentation (README + basic guide)
  • Features scattered across issues, comments, and source code
  • No bandwidth for comprehensive documentation rewrite
  • Users who would benefit from visual learning aids and structured paths

The workflow pattern:

  1. Analyze existing documentation (README, GUIDE, wiki pages)
  2. Identify topics and features
  3. Split monolithic files into focused pages
  4. Enhance with diagrams and visual aids
  5. Validate everything builds and renders correctly

Result: Transform 2 files into 55+ pages of comprehensive, visually-enhanced documentation in a single workflow run.

For ripgrep Users
#

The enhanced documentation provides:

  • Faster onboarding: New users can progress from basics to advanced features systematically
  • Better discoverability: Topic-focused pages mean finding specific information quickly
  • Visual learning: Diagrams explain complex concepts like binary detection and multiline search instantly
  • Platform-specific guidance: Tabbed installation instructions for every platform
  • Troubleshooting support: Dedicated troubleshooting section with common issues and solutions

For Workflow Maintainers
#

  • Reusable pattern: The same workflow works for any similar project (applied it to ripgrep after building it for Prodigy)
  • Minimal configuration: Just point at existing docs and run
  • No manual enhancement: Diagrams, admonitions, and structure created automatically
  • Validation catches issues: Mermaid validation and strict build prevent broken docs from deploying

How to Adapt This Workflow
#

Want to build something similar? Here’s the path:

1. Start with MkDocs Material
#

Install and configure MkDocs Material with the features you want:

# mkdocs.yml
theme:
  name: material
  features:
    - content.code.annotate
    - content.tabs.link
    - navigation.tabs
    - navigation.sections

markdown_extensions:
  - admonition
  - pymdownx.details
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid

2. Build Auto-Discovery
#

Use find and jq to discover all markdown files:

find docs/ -name "*.md" -type f | jq -R -s '
  split("\n") |
  map(select(length > 0)) |
  map({
    id: (split("/")[-1] | split(".md")[0]),
    file: .
  })
' > pages.json

3. Create Enhancement Logic
#

Write a slash command or agent prompt that:

  • Analyzes page content and structure
  • Identifies opportunities for visual features
  • Adds appropriate enhancements
  • Validates the result builds correctly

Example prompt structure:

Analyze this documentation page and enhance it with MkDocs Material features:

1. Add Mermaid diagrams for:
   - Architecture overviews
   - Workflow sequences
   - Data flow

2. Convert important text to admonitions:
   - Warnings for dangerous operations
   - Tips for optimizations
   - Notes for important context

3. Annotate complex code examples with explanations

4. Use tabs for alternative approaches

Preserve all existing content accuracy. Focus on making complex topics clearer.

4. Add Validation
#

Implement three validation layers:

# Per-page validation (in map phase)
validate:
  claude: "/validate-page --json '${item}'"
  threshold: 100

# Build validation (in reduce phase)
- shell: "mkdocs build --strict"

# Consistency validation (in reduce phase)
- claude: "/validate-consistency --all-pages"

5. Iterate and Refine
#

Run the workflow, review the output, and adjust:

  • Are diagrams helpful or cluttered?
  • Are admonitions semantic or decorative?
  • Are annotations explaining or over-explaining?

The AI agents adapt to your documentation style. Because agents enhance existing documentation rather than replacing it, any manual improvements you make become part of the context for the next run. If you:

  • Simplify an overly complex diagram → agents see the simpler style as context
  • Remove excessive admonitions → agents see a less cluttered baseline
  • Rewrite an annotation for clarity → agents see your preferred explanation style
  • Adjust diagram positioning → agents see your preferred placement patterns

Each workflow run reads the current documentation state, including your manual refinements. Claude uses its general pattern-matching capabilities to maintain consistency with what it sees. While this isn’t true machine learning (no training or model updates occur), the practical effect is that agents tend to match the existing documentation style. The documentation becomes an iterative process: agents enhance, you refine, agents use refinements as context.

Future Enhancements
#

The workflow is production-ready, but opportunities for improvement abound. On top of the improvements we will naturally pick up from commercial LLM improvements over time, we can improve the documentation generation system further with:

1. Screenshot Management
#

Detect when UI screenshots are outdated and automatically regenerate them using browser automation. The workflow already knows when features change—extending it to visual updates is logical.

2. Interactive Examples
#

Generate runnable code examples with embedded outputs. Users could see results without leaving the docs.

3. Version-Specific Documentation
#

Integrate with mike to maintain documentation for multiple versions automatically. When a new version is released:

  • Run the drift workflow against the new version’s codebase
  • Generate version-specific docs with accurate examples and configuration
  • Update version selectors and cross-version compatibility notes
  • Archive old versions while keeping them searchable

This would enable docs like v1.2.3/configuration vs v2.0.0/configuration with each version reflecting the actual features available at that release.

Conclusion
#

Documentation automation has evolved from drift detection to documentation transformation. The progression across three generations:

  1. Debtmap (original): Proved that AI agents can detect and fix drift reliably at the chapter level
  2. Prodigy book-docs: Added subsection-level precision and multi-pass validation
  3. ripgrep mkdocs: Proved that agents can transform existing minimal documentation into comprehensive, visually-enhanced knowledge bases

The ripgrep workflow demonstrates the pattern’s full potential:

Input: 2 files (README + GUIDE), ~1,800 lines, minimal structure Output: 55 focused pages with 87 diagrams, organized learning paths, and comprehensive coverage

By combining:

  • Intelligent page splitting to break monoliths into focused topics
  • Auto-discovery to eliminate maintenance toil
  • Subsection-aware drift detection for precision
  • Per-page visual enhancement with context
  • Multi-layer validation including Mermaid diagram checks
  • Strict build enforcement to catch integration issues

We’ve created a workflow that can transform any project’s minimal documentation into comprehensive guides automatically.

The ripgrep documentation demonstrates this transformation in production. Every page split, diagram, admonition, and annotation was generated automatically from the original README, GUIDE, and code. The workflow preserved the authors’ clarity and intent while making it dramatically more accessible.

The pattern is reusable. Point it at any project with minimal but well-written documentation, and watch it transform into comprehensive guides with visual aids and structured learning paths.


Resources
#

Live Documentation Examples:

Blog Posts:

Workflows (showing evolution):

Tools:

Validation Scripts:

  • Mermaid Validator: validate-mermaid.js (Node.js script using @mermaid-js/mermaid-cli for accurate diagram validation)

Get Started
#

Ready to transform your project’s documentation?

For your own projects: Check out the Prodigy project page to learn how to set up AI-powered workflows that maintain accurate, engaging documentation automatically.

To see the transformation in action:

  • Browse the ripgrep docs - transformed from 2 files to 55+ pages with 87 diagrams
  • Compare with ripgrep’s original GUIDE to see the before/after
  • Notice the decision tree diagrams, admonitions, code annotations, and tabbed content—all generated automatically

About the Author
#

I’m Glen Baker, building Prodigy to automate complex development workflows and Debtmap to help teams tackle technical debt systematically. If you’re interested in AI-powered development automation, read more on my blog or get in touch.


This blog post documents the third generation of the documentation automation workflow. The first generation (Debtmap) proved the concept at the chapter level. The second generation (Prodigy book-docs) added subsection precision. This third generation (ripgrep mkdocs) adds intelligent page splitting, visual engagement, and diagram validation. The workflow transformed ripgrep’s 2-file documentation into 55+ comprehensive pages—proving that AI automation can enhance existing open source projects without replacing their authors’ expertise.

Related

Automating Documentation Maintenance with Prodigy: A Real-World Case Study
Prodigy - AI Workflow Orchestration for Claude
Debtmap - Rust Technical Debt Analyzer