
From README to Comprehensive Docs: Transforming ripgrep's Documentation with AI Automation

Glen Baker
Building tech startups and open source tooling

From Two Files and Code to Comprehensive Documentation

ripgrep is one of the most popular command-line search tools, with over 57,000 stars on GitHub. Despite its popularity and rich feature set, the documentation consisted of just two files: a feature-focused README and a tutorial-style GUIDE. While these files were well-written, they left gaps:

  • No structured learning path from basics to advanced features
  • Limited visual aids to explain complex concepts
  • Features scattered across different sections
  • No dedicated troubleshooting or reference sections

Using an evolved version of my MkDocs automation workflow (originally built for Prodigy’s documentation), I generated 50+ pages of enhanced documentation, now live at https://iepathos.github.io/ripgrep.

This post documents how the workflow evolved to handle:

  1. Intelligent page splitting - Breaking monolithic chapters into focused subpages
  2. Automated visual enhancement - Adding diagrams, admonitions, and annotations where they help
  3. Mermaid diagram validation - Ensuring all generated diagrams render correctly

The workflow went beyond mere drift detection—it transformed minimal docs into a comprehensive knowledge base.

What Changed: MkDocs vs mdBook

The original workflow used mdBook, a simple static site generator popular in the Rust ecosystem. MkDocs Material offers significantly more features:

| Feature | mdBook | MkDocs Material |
|---------|--------|-----------------|
| Mermaid Diagrams | Plugin required | Native support |
| Admonitions | Limited | 12+ styled types |
| Code Annotations | No | Yes (numbered callouts) |
| Tabbed Content | No | Yes |
| Navigation Tabs | No | Yes |
| Search | Basic | Advanced with highlighting |
| Theme Customization | Limited | Extensive |

The Enhanced Workflow Architecture

The ripgrep workflow introduced critical improvements over previous iterations. Here’s the complete flow:

```yaml
name: prodigy-mkdocs-drift-detection
mode: mapreduce

setup:
  # 1. Analyze codebase features
  - claude: "/prodigy-analyze-features-for-mkdocs"

  # 2. Detect gaps and create stub pages
  - claude: "/prodigy-detect-mkdocs-gaps"

  # 3. Analyze page sizes and structural complexity (NEW!)
  - claude: "/prodigy-analyze-mkdocs-structure"

  # 4. Automatically split oversized pages (NEW!)
  - claude: "/prodigy-split-oversized-mkdocs-pages"

  # 5. Auto-discover ALL markdown files (including newly created subpages)
  - shell: |
      find $DOCS_DIR -name "*.md" -type f | jq -R -s '
        split("\n") | map(select(length > 0)) | map({
          id: (split("/")[-1] | split(".md")[0]),
          title: ...,
          file: .,
          type: "auto-discovered"
        })
      ' > $ANALYSIS_DIR/flattened-items.json

map:
  agent_template:
    # Step 1: Analyze page for drift (subsection-aware)
    - claude: "/prodigy-analyze-mkdocs-drift"

    # Step 2: Fix detected drift with validation
    - claude: "/prodigy-fix-mkdocs-drift"
      validate:
        claude: "/prodigy-validate-mkdocs-page"
        threshold: 100  # Must meet 100% quality standards
        on_incomplete:
          claude: "/prodigy-complete-mkdocs-fix"
          max_attempts: 3

    # Step 3: Enhance with visual features
    - claude: "/prodigy-enhance-mkdocs-page"

reduce:
  # Validate with strict mode
  - shell: "mkdocs build --strict"
    on_failure:
      claude: "/prodigy-fix-mkdocs-build-errors"

  # Check structure and feature consistency
  - claude: "/prodigy-validate-mkdocs-structure"
  - claude: "/prodigy-validate-feature-consistency"

  # Validate Mermaid diagrams (NEW!)
  - shell: "cd .prodigy/scripts && npm install --silent && node validate-mermaid.js ../../$DOCS_DIR"
    on_failure:
      claude: "/prodigy-fix-mermaid-diagrams --validation-output '${shell.stderr}'"

Four Major Improvements Over Book-Docs Workflow

The ripgrep workflow builds on the subsection-aware foundations and adds four critical enhancements:

1. Intelligent Page Splitting

The breakthrough feature for transforming ripgrep’s documentation: automatic page splitting based on structural analysis.

The original GUIDE.md was over 1,400 lines—a monolithic wall of text covering everything from basics to advanced features. The workflow now analyzes page structure before processing and automatically splits oversized pages into focused subpages:

```yaml
# Step 3: Analyze page sizes and structural complexity
- claude: "/prodigy-analyze-mkdocs-structure --project $PROJECT_NAME --docs-dir $DOCS_DIR --pages $CHAPTERS_FILE --output $ANALYSIS_DIR/structure-report.json"

# Step 4: Automatically split all oversized pages into subpages
# This runs BEFORE map phase so agents process optimally-sized pages
- claude: "/prodigy-split-oversized-mkdocs-pages --project $PROJECT_NAME --pages $CHAPTERS_FILE --docs-dir $DOCS_DIR --structure-report $ANALYSIS_DIR/structure-report.json"

For ripgrep, this transformed:

  • GUIDE.md (1,400 lines) → basics/ directory with 8 focused pages (pattern-matching.md, case-sensitivity.md, boundaries.md, etc.)
  • Binary Data section (350 lines) → binary-data/ directory with 5 specialized pages (detection.md, modes.md, flags.md, etc.)
  • Troubleshooting sections → troubleshooting/ directory with 6 topic-specific pages

This happens in the setup phase, before map processing begins. Benefits:

  • Context-aware agents: Each agent works on focused topics, not overwhelming monoliths
  • Better enhancement: Visual features are applied to coherent topics, not scattered sections
  • Improved navigation: Readers find specific information faster
  • Maintainability: Smaller pages are easier to update and keep accurate

2. Exhaustive Page Discovery

After page splitting creates new files, we need to ensure every page gets processed. The workflow adds a direct filesystem scan:

```bash
find $DOCS_DIR -name "*.md" -type f | jq -R -s '
  split("\n") |
  map(select(length > 0)) |
  map({
    id: (split("/")[-1] | split(".md")[0]),
    title: (split("/")[-1] | split(".md")[0] | gsub("-|_"; " ")),
    file: .,
    type: "auto-discovered"
  })
' > $ANALYSIS_DIR/flattened-items.json
```

This is critical because page splitting creates files dynamically. The filesystem becomes the source of truth:

  • Comprehensive coverage: Every .md file is processed, including newly split pages
  • No manual tracking: Don’t maintain a curated list that gets out of sync
  • Orphaned pages caught: Pages that exist but aren’t properly linked are discovered
  • Deterministic: Same files every time, no variation between runs
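
For example, a file at docs/basics/pattern-matching.md yields this entry in flattened-items.json (the shape follows directly from the jq program above):

```json
{
  "id": "pattern-matching",
  "title": "pattern matching",
  "file": "docs/basics/pattern-matching.md",
  "type": "auto-discovered"
}
```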

3. Visual Enhancement Per Page

After drift is detected and fixed, each page gets enhanced with MkDocs Material features through the /prodigy-enhance-mkdocs-page command:

```yaml
# Step 3: Enhance page with visual features (diagrams, admonitions, annotations)
# This runs per-page so Claude has full context about what the page discusses
- claude: "/prodigy-enhance-mkdocs-page --project $PROJECT_NAME --json '${item}' --auto-fix true"
  commit_required: true
```

This command analyzes each page’s content and adds:

  • Mermaid diagrams for architecture and workflow visualization
  • Admonitions (warnings, tips, notes) for important callouts
  • Code annotations with numbered inline explanations
  • Tabbed content for alternative approaches

The enhancement happens in the map phase (per-page), not the reduce phase (bulk), which means each agent has full context about what the page covers and can make intelligent decisions about which visual features to add.

Example visual enhancements:

Mermaid diagram for workflow visualization:

```mermaid
graph LR
    A[Workflow YAML] --> B[Prodigy CLI]
    B --> C[Setup Phase]
    C --> D[Map Phase]
    D --> E1[Agent 1]
    D --> E2[Agent 2]
    E1 --> F[Reduce Phase]
    E2 --> F
```

Admonitions for important callouts:
```markdown
!!! warning "Breaking Change"
    The `legacy_risk` field has been deprecated. Use `minimal_count` instead.

!!! tip "Performance Optimization"
    Set `max_parallel: 5` to process items faster on multi-core systems.
```

Code annotations with inline explanations:

```yaml
error_policy:
  on_item_failure: dlq  # (1)!
  continue_on_failure: true  # (2)!
  max_failures: 2  # (3)!
```

1. Failed items go to dead letter queue for manual review
2. Don't stop the entire workflow on one failure
3. Maximum number of failures before aborting workflow

4. Mermaid Diagram Validation

Here’s the critical innovation for visual reliability: automated diagram validation using the official Mermaid renderer.

When you generate dozens of Mermaid diagrams automatically, some will have syntax errors that break rendering. Rather than manually checking each one, the workflow includes a validation script that uses @mermaid-js/mermaid-cli (the official Mermaid rendering tool) to validate diagrams in the reduce phase:

```yaml
# Validate Mermaid diagrams - ensure all diagrams have valid syntax
- shell: "cd .prodigy/scripts && npm install --silent && node validate-mermaid.js ../../$DOCS_DIR"
  on_failure:
    # Fix any invalid Mermaid diagrams found
    # Pass validation output (JSON on stderr) to Claude for context
    claude: "/prodigy-fix-mermaid-diagrams --validation-output '${shell.stderr}'"
    commit_required: true
```

The validation uses the exact same rendering engine that MkDocs Material uses. This provides:

  • 100% accuracy: If the diagram renders successfully with mmdc, it will render in the docs
  • Real error messages: Get actual Mermaid parser errors for precise fixes
  • Comprehensive coverage: Catches all syntax issues automatically

The validation script:

  • Extracts all Mermaid diagrams from markdown files
  • Attempts to render each diagram to SVG using mmdc
  • Reports exact line numbers and error messages for failures
  • Outputs structured JSON for Claude to automatically fix issues
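
A minimal sketch of that loop in shell, assuming mmdc is available via @mermaid-js/mermaid-cli (the real validate-mermaid.js also records line numbers and emits the structured JSON mentioned above):

```bash
#!/usr/bin/env bash
# Sketch only: pull each fenced mermaid block out of the docs and try to
# render it. If mmdc can render a block to SVG, MkDocs Material can too.
status=0
for f in $(find "${1:-docs}" -name '*.md' -type f); do
  # Write each fenced mermaid block in this file to its own temp file
  awk '/^```mermaid/ {n++; inblock=1; next}
       /^```/        {inblock=0}
       inblock       {print > ("/tmp/block-" n ".mmd")}' "$f"
  for d in /tmp/block-*.mmd; do
    [ -e "$d" ] || continue
    if npx mmdc -i "$d" -o /tmp/out.svg >/dev/null 2>&1; then
      echo "✓ valid diagram in $f"
    else
      echo "✗ invalid diagram in $f"
      status=1
    fi
    rm -f "$d"
  done
done
exit $status
```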

Example validation output:

```text
Validating Mermaid diagrams in docs/...

✗ Invalid diagram in docs/basics/pattern-matching.md:91
  - Unmatched square brackets (5 [ vs 4 ])
  - Edge labels contain nested quotes - escape or use HTML entities

✓ Valid diagram in docs/introduction.md:9
✓ Valid diagram in docs/basics/case-sensitivity.md:92

========================================
Validation Summary
========================================
Total diagrams: 23
Valid: 21
Invalid: 2
```

When validation fails, the /prodigy-fix-mermaid-diagrams command automatically fixes syntax errors and re-validates. This caught 8 broken diagrams in ripgrep’s docs that would have failed silently during build.

Multi-Layered Validation

Beyond Mermaid diagrams, the workflow includes comprehensive validation:

Structure validation (/prodigy-validate-mkdocs-structure):

- claude: "/prodigy-validate-mkdocs-structure --project $PROJECT_NAME --docs-dir $DOCS_DIR --output $ANALYSIS_DIR/structure-validation.json --auto-fix true"
  commit_required: true
```

This checks:

  • Navigation hierarchy is logical
  • No orphaned pages (exist but not linked)
  • Appropriate nesting levels
  • Consistent naming conventions

Feature consistency (/prodigy-validate-feature-consistency):

- claude: "/prodigy-validate-feature-consistency --project $PROJECT_NAME --docs-dir $DOCS_DIR --output $ANALYSIS_DIR/feature-consistency.json"

This checks:

  • Similar pages use similar visual enhancements
  • Admonitions are semantically correct (warnings for dangerous operations, tips for optimizations)
  • No overuse or underuse of features
  • Consistent diagram styles

These validations catch issues that per-page analysis misses, like:

  • One architecture page has diagrams, another similar page doesn’t
  • Configuration pages use different annotation styles
  • Some warnings use !!! danger while others use !!! warning for similar content

Combined with the existing holistic validation from the mdBook workflow, this creates a three-tier quality system:

  1. Per-page validation: Each page meets quality standards independently
  2. Build validation: All pages compile together correctly (mkdocs build --strict)
  3. Cross-page validation: Pages are consistent with each other

Real Results: ripgrep Documentation Transformation

I ran the enhanced MkDocs workflow on ripgrep’s minimal documentation (README + GUIDE). Here’s what happened:

From Minimal to Comprehensive

Before:

  • 2 documentation files (README.md, GUIDE.md)
  • ~1,800 total lines of markdown
  • No visual diagrams
  • Limited structure (linear reading path)
  • Features scattered across sections

After:

  • 55 focused documentation pages
  • Organized into 7 major sections (Basics, Advanced Patterns, Binary Data, Troubleshooting, etc.)
  • 87 Mermaid diagrams explaining workflows, architecture, and decision trees
  • 358 admonitions highlighting warnings, tips, and important callouts
  • 180+ code annotations with inline explanations
  • Tabbed installation instructions for different platforms
  • 100% strict build success - no broken links or invalid syntax

Page Splitting Impact

The intelligent page splitting transformed monolithic files into focused learning paths:

GUIDE.md basics section (400 lines) became:

  • basics/index.md - Overview and learning path
  • basics/pattern-matching.md - Pattern types and usage
  • basics/literal-search.md - Fixed-string searching
  • basics/case-sensitivity.md - Case handling options
  • basics/boundaries.md - Word and line boundaries
  • basics/regex-basics.md - Regular expression primer
  • basics/output.md - Output formatting
  • basics/count-list.md - Counting and listing matches
  • basics/practice.md - Practice exercises

Binary Data section (350 lines) became:

  • binary-data/index.md - Overview of binary handling
  • binary-data/detection.md - How ripgrep detects binary files
  • binary-data/modes.md - Binary search modes
  • binary-data/flags.md - Binary-specific flags
  • binary-data/explicit-implicit.md - Explicit vs implicit behavior
  • binary-data/examples.md - Practical examples

Result: Readers can find specific information immediately instead of scanning through monolithic guides.

Example: Pattern Matching Page

The pattern matching page demonstrates how visual enhancements transform technical content:

Before (from GUIDE.md): Plain text explaining regex vs literal patterns, scattered across multiple sections with basic code examples.

After (docs/basics/pattern-matching.md):

Added decision tree diagram:

```mermaid
flowchart TD
    Start[Need to Search?] --> HasSpecial{"Pattern has
special chars
like *, ., (, )?"}

    HasSpecial -->|Yes| WantLiteral{"Want to match
those chars
literally?"}
    HasSpecial -->|No| NeedFlex{"Need flexible
matching?"}

    WantLiteral -->|Yes| UseLiteral["Use -F flag
Literal String"]
    WantLiteral -->|No| UseRegex["Use Regex
Default"]

    NeedFlex -->|Yes| UseRegex
    NeedFlex -->|No| UseLiteral
```

Converted important notes to admonitions:

!!! tip "When to use literal strings"
    Use the `-F` flag when you:

    - Need to search for exact text containing special characters
    - Want faster searches for simple strings
    - Are searching for code snippets with regex metacharacters
    - Need predictable behavior without regex interpretation
```

Annotated code examples:

```bash
# Match lines containing either TODO or FIXME
rg -e TODO -e FIXME  # (1)!

# Match multiple error levels
rg -e "error" -e "warning" -e "critical"  # (2)!
```

1. Each -e flag adds a pattern; matches lines with TODO OR FIXME (or both)
2. Searches for any of the three error levels in the same command

The page went from “walls of text” to “guided learning.”

Example: Introduction Page

The introduction page showcases comprehensive visual enhancement:

Before (from README.md): Text-only feature list with no visualization of how ripgrep actually works.

After (docs/index.md and docs/introduction.md):

Added search process flowchart showing ripgrep’s automatic filtering:

```mermaid
flowchart LR
    Start([Start Search]) --> Filter{"File should
be searched?"}
    Filter -->|"No
gitignore, hidden, binary"| Skip[Skip File]
    Filter -->|Yes| Read["Read File
Line by Line"]
    Read --> Match{"Line matches
pattern?"}
    Match -->|Yes| Print[Print Line]
    Match -->|No| Next{More lines?}
    Print --> Next
    Next -->|Yes| Read
    Next -->|No| Done{More files?}
```

Tabbed installation instructions for different platforms:

=== "macOS"
    ```bash
    brew install ripgrep
    ```

=== "Windows"
    ```bash
    choco install ripgrep
    ```

=== "Debian/Ubuntu"
    ```bash
    sudo apt install ripgrep
    ```
````

Result: New users understand both how ripgrep works and how to install it in seconds.

Key Technical Innovations

1. Per-Page Enhancement Context

The enhancement happens in the map phase, not the reduce phase. This is crucial because:

  • Each agent has full context about the page’s topic and content
  • Decisions about which features to add are made with understanding
  • No generic “add a diagram here” rules—the AI understands why a diagram helps

Running enhancement in reduce would mean processing pages in bulk without context.

2. Validation Drives Quality

Each enhanced page goes through validation:

```yaml
validate:
  claude: "/prodigy-validate-mkdocs-page"
  result_file: ".prodigy/validation-result.json"
  threshold: 100
  on_incomplete:
    claude: "/prodigy-complete-mkdocs-fix"
    max_attempts: 3
```

If a page doesn’t meet quality standards (missing diagrams where needed, inconsistent formatting, etc.), the workflow automatically attempts fixes. This ensures consistent quality across all pages.

3. Strict Build Enforcement

The reduce phase uses mkdocs build --strict:

```yaml
reduce:
  - shell: "mkdocs build --strict"
    on_failure:
      claude: "/prodigy-fix-mkdocs-build-errors"

Strict mode catches:

  • Broken internal links
  • Invalid admonition syntax
  • Malformed Mermaid diagrams
  • Missing referenced files
  • YAML frontmatter errors

If the build fails, an agent automatically diagnoses and fixes the issues. No manual intervention required.

4. Structure and Consistency Checks

Two new reduce-phase validations ensure cross-cutting quality:

Structure validation checks:

  • Navigation hierarchy makes sense
  • No orphaned pages
  • Appropriate nesting levels
  • Consistent naming conventions

Feature consistency checks:

  • Pages of similar types use similar enhancements
  • No overuse or underuse of visual features
  • Admonition types are semantically correct (warnings for dangerous operations, tips for optimizations, etc.)

These catch issues that per-page validation misses.

Comparing the Three Generations

| Aspect | Debtmap (Original) | Prodigy Book-Docs | ripgrep MkDocs |
|--------|--------------------|-------------------|----------------|
| Source Material | Own project docs | Own project docs | Existing OSS docs (README + GUIDE) |
| Drift Commands | `/prodigy-analyze-book-chapter-drift`<br>`/prodigy-fix-chapter-drift` | `/prodigy-analyze-subsection-drift`<br>`/prodigy-fix-subsection-drift` | `/prodigy-analyze-mkdocs-drift`<br>`/prodigy-fix-mkdocs-drift` |
| Granularity | Chapter-level only | Subsection-aware (H2, H3) | Subsection-aware |
| Page Structure | Fixed (manual SUMMARY.md) | Fixed (manual SUMMARY.md) | Dynamic (automatic page splitting) |
| Page Discovery | Curated chapters.json | Gap detection → flattened-items.json | Gap detection + automatic splitting + find scan |
| Validation | None | `/prodigy-validate-doc-fix`<br>`/prodigy-validate-book-holistically` | `/prodigy-validate-mkdocs-page`<br>`/prodigy-validate-mkdocs-structure`<br>`/prodigy-validate-feature-consistency`<br>Mermaid diagram validation |
| Visual Enhancement | None | None | `/prodigy-enhance-mkdocs-page` |
| Build Check | `mdbook build` | `mdbook build` | `mkdocs build --strict` |
| Pages Generated | 27 (curated) | 47 (gap-detected) | 55 (split + discovered) |
| Diagrams | 0 | 0 | 87 Mermaid diagrams |
| Quality Focus | Basic accuracy | Subsection accuracy + holistic validation | Accuracy + intelligent structure + visual engagement + diagram validation |

The evolution: accuracy → precision → transformation.

Lessons Learned

1. Page Splitting Unlocks Comprehensive Documentation

The most impactful innovation wasn’t visual enhancements—it was automatic page splitting.

The workflow starts by generating initial documentation from ripgrep’s README and GUIDE.md, then analyzes the codebase to enhance and expand the docs based on actual features. During this process, some generated pages become monolithic (1,400+ lines). No reader wants to scroll through a wall of text to find information about case sensitivity or binary data handling. But manually splitting oversized files would require:

  • Deciding where to split
  • Creating directory structure
  • Moving content
  • Updating cross-references
  • Maintaining navigation

The workflow handles all of this automatically by analyzing structural complexity:

  • Identifies oversized pages (>300 lines or >5 major sections)
  • Determines logical split points (section boundaries)
  • Creates focused subpages with proper hierarchy
  • Updates cross-references and navigation
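
For instance, the regenerated navigation for the split basics section might look roughly like this in mkdocs.yml (an illustrative shape based on the page names listed earlier, not the exact generated file):

```yaml
nav:
  - Basics:
      - Overview: basics/index.md
      - Pattern Matching: basics/pattern-matching.md
      - Case Sensitivity: basics/case-sensitivity.md
      - Boundaries: basics/boundaries.md
      # ...one entry per focused subpage
```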

Result: Documentation that expands organically from codebase analysis becomes navigable and learnable, not an overwhelming wall of text.

2. Intelligent Structure Beats Manual Organization

When working with existing documentation (like ripgrep), you can’t impose an arbitrary structure. The workflow needs to:

  • Understand the existing organization
  • Identify natural topic boundaries
  • Preserve authorial intent while improving discoverability

The structure analysis command (/prodigy-analyze-mkdocs-structure) evaluates pages based on:

  • Line count and section depth
  • Topic cohesion within sections
  • Cross-reference patterns
  • Natural split points at heading boundaries
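
To make that concrete, an entry in the structure report might look something like this (the field names here are illustrative, not the command's actual schema):

```json
{
  "page": "docs/guide.md",
  "line_count": 1400,
  "major_sections": 12,
  "oversized": true,
  "suggested_splits": [
    { "heading": "Basics", "target": "basics/index.md" },
    { "heading": "Binary Data", "target": "binary-data/index.md" }
  ]
}
```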

This preserves the original authors’ logic while making it more accessible.

3. Visual Features Need Intelligence

You can’t apply visual enhancements mechanically. A rule-based system would either:

  • Over-apply: Add diagrams everywhere, cluttering simple pages
  • Under-apply: Miss opportunities where visuals would help

AI agents with context make nuanced decisions: “Pattern matching is decision-heavy, add a decision tree diagram. Case sensitivity has three modes, add a flow diagram. Installation varies by platform, use tabbed content.”

4. Diagram Validation is Non-Negotiable

When you generate 87 Mermaid diagrams automatically, some will have syntax errors:

  • Unmatched brackets from complex node labels
  • Quote nesting in edge labels
  • Invalid node IDs with special characters
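
For example, unquoted brackets inside a node label close the label early and break parsing; quoting the label makes the inner brackets literal (illustrative):

```mermaid
%% Broken: A[match[0] capture] --> B[Print]
%% Fixed below: quotes make the inner brackets part of the label text
flowchart LR
    A["match[0] capture"] --> B[Print]
```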

The Mermaid validation script caught 8 broken diagrams in ripgrep’s docs that would have silently failed during rendering. Users would have seen empty diagram blocks with no explanation.

The key insight: Using the actual Mermaid renderer (@mermaid-js/mermaid-cli) means validation is 100% accurate. If it renders with mmdc, it will render in the docs.

Automated validation → automated fixes → reliable visual documentation.

5. Per-Page Enhancement Beats Bulk Processing

Running enhancement in the map phase (per-page) instead of reduce phase (bulk) means:

  • Each agent has full context about what the page covers
  • Decisions are informed by content, not position
  • Quality is higher because understanding is higher

The extra parallelization complexity is worth it.

Practical Impact

For Open Source Projects

The ripgrep workflow demonstrates a new pattern: transforming existing minimal documentation into comprehensive guides automatically.

Many popular open source projects have:

  • Excellent but minimal documentation (README + basic guide)
  • Features scattered across issues, comments, and source code
  • No bandwidth for comprehensive documentation rewrite
  • Users who would benefit from visual learning aids and structured paths

The workflow pattern:

  1. Analyze existing documentation (README, GUIDE, wiki pages)
  2. Identify topics and features
  3. Split monolithic files into focused pages
  4. Enhance with diagrams and visual aids
  5. Validate everything builds and renders correctly

Result: Transform 2 files into 55+ pages of comprehensive, visually-enhanced documentation in a single workflow run.

For ripgrep Users

The enhanced documentation provides:

  • Faster onboarding: New users can progress from basics to advanced features systematically
  • Better discoverability: Topic-focused pages mean finding specific information quickly
  • Visual learning: Diagrams explain complex concepts like binary detection and multiline search instantly
  • Platform-specific guidance: Tabbed installation instructions for every platform
  • Troubleshooting support: Dedicated troubleshooting section with common issues and solutions

For Workflow Maintainers

  • Reusable pattern: The same workflow works for any similar project (applied it to ripgrep after building it for Prodigy)
  • Minimal configuration: Just point at existing docs and run
  • No manual enhancement: Diagrams, admonitions, and structure created automatically
  • Validation catches issues: Mermaid validation and strict build prevent broken docs from deploying

How to Adapt This Workflow

Want to build something similar? Here’s the path:

1. Start with MkDocs Material

Install and configure MkDocs Material with the features you want:

```yaml
# mkdocs.yml
theme:
  name: material
  features:
    - content.code.annotate
    - content.tabs.link
    - navigation.tabs
    - navigation.sections

markdown_extensions:
  - admonition
  - pymdownx.details
  - pymdownx.superfences:
      custom_fences:
        - name: mermaid
          class: mermaid
          # Needed so Material renders mermaid fences as diagrams
          format: !!python/name:pymdownx.superfences.fence_code_format
```
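
Getting started is a single pip package, and mkdocs serve gives a live-reloading preview while you tune the configuration:

```bash
# Install MkDocs Material (pulls in mkdocs and the theme)
pip install mkdocs-material

# Preview the site locally with live reload
mkdocs serve
```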

2. Build Auto-Discovery

Use find and jq to discover all markdown files:

```bash
find docs/ -name "*.md" -type f | jq -R -s '
  split("\n") |
  map(select(length > 0)) |
  map({
    id: (split("/")[-1] | split(".md")[0]),
    file: .
  })
' > pages.json
```

3. Create Enhancement Logic

Write a slash command or agent prompt that:

  • Analyzes page content and structure
  • Identifies opportunities for visual features
  • Adds appropriate enhancements
  • Validates the result builds correctly

Example prompt structure:

```text
Analyze this documentation page and enhance it with MkDocs Material features:

1. Add Mermaid diagrams for:
   - Architecture overviews
   - Workflow sequences
   - Data flow

2. Convert important text to admonitions:
   - Warnings for dangerous operations
   - Tips for optimizations
   - Notes for important context

3. Annotate complex code examples with explanations

4. Use tabs for alternative approaches

Preserve all existing content accuracy. Focus on making complex topics clearer.
```

4. Add Validation

Implement three validation layers:

```yaml
# Per-page validation (in map phase)
validate:
  claude: "/validate-page --json '${item}'"
  threshold: 100

# Build validation (in reduce phase)
- shell: "mkdocs build --strict"

# Consistency validation (in reduce phase)
- claude: "/validate-consistency --all-pages"

5. Iterate and Refine

Run the workflow, review the output, and adjust:

  • Are diagrams helpful or cluttered?
  • Are admonitions semantic or decorative?
  • Are annotations explaining or over-explaining?

The AI agents adapt to your documentation style. Because agents enhance existing documentation rather than replacing it, any manual improvements you make become part of the context for the next run. If you:

  • Simplify an overly complex diagram → agents see the simpler style as context
  • Remove excessive admonitions → agents see a less cluttered baseline
  • Rewrite an annotation for clarity → agents see your preferred explanation style
  • Adjust diagram positioning → agents see your preferred placement patterns

Each workflow run reads the current documentation state, including your manual refinements. Claude uses its general pattern-matching capabilities to maintain consistency with what it sees. While this isn’t true machine learning (no training or model updates occur), the practical effect is that agents tend to match the existing documentation style. The documentation becomes an iterative process: agents enhance, you refine, agents use refinements as context.

Future Enhancements

The workflow is production-ready, but opportunities for improvement abound. Beyond the gains we will naturally pick up as commercial LLMs advance, the documentation generation system itself can be extended with:

1. Screenshot Management

Detect when UI screenshots are outdated and automatically regenerate them using browser automation. The workflow already knows when features change—extending it to visual updates is logical.

2. Interactive Examples

Generate runnable code examples with embedded outputs. Users could see results without leaving the docs.

3. Version-Specific Documentation

Integrate with mike to maintain documentation for multiple versions automatically. When a new version is released:

  • Run the drift workflow against the new version’s codebase
  • Generate version-specific docs with accurate examples and configuration
  • Update version selectors and cross-version compatibility notes
  • Archive old versions while keeping them searchable

This would enable docs like v1.2.3/configuration vs v2.0.0/configuration with each version reflecting the actual features available at that release.
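
mike manages versioned builds on the gh-pages branch, so publishing a release's docs would come down to a couple of commands (illustrative):

```bash
# Build and publish docs for a release, pointing the "latest" alias at it
mike deploy --push --update-aliases 2.0.0 latest

# Choose the version readers land on by default
mike set-default --push latest
```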

Conclusion

Documentation automation has evolved from drift detection to documentation transformation. The progression across three generations:

  1. Debtmap (original): Proved that AI agents can detect and fix drift reliably at the chapter level
  2. Prodigy book-docs: Added subsection-level precision and multi-pass validation
  3. ripgrep mkdocs: Proved that agents can transform existing minimal documentation into comprehensive, visually-enhanced knowledge bases

The ripgrep workflow demonstrates the pattern’s full potential:

Input: 2 files (README + GUIDE), ~1,800 lines, minimal structure
Output: 55 focused pages with 87 diagrams, organized learning paths, and comprehensive coverage

By combining:

  • Intelligent page splitting to break monoliths into focused topics
  • Auto-discovery to eliminate maintenance toil
  • Subsection-aware drift detection for precision
  • Per-page visual enhancement with context
  • Multi-layer validation including Mermaid diagram checks
  • Strict build enforcement to catch integration issues

We’ve created a workflow that can transform any project’s minimal documentation into comprehensive guides automatically.

The ripgrep documentation demonstrates this transformation in production. Every page split, diagram, admonition, and annotation was generated automatically from the original README, GUIDE, and code. The workflow preserved the authors’ clarity and intent while making it dramatically more accessible.

The pattern is reusable. Point it at any project with minimal but well-written documentation, and watch it transform into comprehensive guides with visual aids and structured learning paths.


Resources

Live Documentation Examples:

  • ripgrep documentation: https://iepathos.github.io/ripgrep

Blog Posts:

Workflows (showing evolution):

Tools:

Validation Scripts:

  • Mermaid Validator: validate-mermaid.js (Node.js script using @mermaid-js/mermaid-cli for accurate diagram validation)

Get Started

Ready to transform your project’s documentation?

For your own projects: Check out the Prodigy project page to learn how to set up AI-powered workflows that maintain accurate, engaging documentation automatically.

To see the transformation in action:

  • Browse the ripgrep docs - transformed from 2 files to 55+ pages with 87 diagrams
  • Compare with ripgrep’s original GUIDE to see the before/after
  • Notice the decision tree diagrams, admonitions, code annotations, and tabbed content—all generated automatically

About the Author

I’m Glen Baker, building Prodigy to automate complex development workflows and Debtmap to help teams tackle technical debt systematically. If you’re interested in AI-powered development automation, read more on my blog or get in touch.


This blog post documents the third generation of the documentation automation workflow. The first generation (Debtmap) proved the concept at the chapter level. The second generation (Prodigy book-docs) added subsection precision. This third generation (ripgrep mkdocs) adds intelligent page splitting, visual engagement, and diagram validation. The workflow transformed ripgrep’s 2-file documentation into 55+ comprehensive pages—proving that AI automation can enhance existing open source projects without replacing their authors’ expertise.

Related

Automating Documentation Maintenance with Prodigy: A Real-World Case Study
Prodigy - AI Workflow Orchestration for Claude
Debtmap - Rust Technical Debt Analyzer