The Common Complaint#
I keep seeing this pattern in discussions about AI coding assistants:
AI excels at bootstrapping new projects: it can generate scaffolding, write tests, debug simple issues, and power through generic code patterns. However, that same AI struggles significantly in more complex codebases. As soon as event-driven architectures, ETL pipelines, or complex data layers enter the picture, the AI can’t handle the domain complexity. It can’t seem to operate at the “macro level” these systems require.
This observation is correct, but the diagnosis is often wrong.
In most codebases, the problem isn’t that AI can’t handle complex domains. The problem is cognitive load.
Well-designed code manages complexity through proper structure. Poorly designed code forces you to understand many distant parts at once. How we manage that complexity is the essence of code quality.
Leaky Abstractions#
Well-designed code abstracts domain complexity through proper boundaries and separation of concerns. When you’re working on any individual component, you shouldn’t need to hold the entire event flow or ETL pipeline in your head. If you (or an AI) are drowning in macro-level concerns while working on a single function, that’s not a domain problem; it’s a structural problem:
- Leaky abstractions
- Tangled responsibilities
- Poor module boundaries
- Mixed I/O and business logic
- Insufficient test coverage
This is technical debt that needs refactoring.
Why These Domains SEEM Harder#
Event systems, ETL processes, and complex data layers ARE inherently complex. The issue compounds:
These domains are also more LIKELY to be written poorly because they’re harder to structure well.
The inherent complexity makes developers (and AI) more likely to:
- Take shortcuts under time pressure
- Mix concerns that should be separated
- Let abstractions leak across boundaries
- Skip tests because “it’s too complex to test”
Then when AI struggles, it becomes the nearest scapegoat. If you’re an experienced dev trying out AI to find its limits, it’s easy to mistakenly conclude that it can’t handle the domain complexity rather than recognizing the code quality issues at play.
What Good Structure Looks Like#
Note: Full runnable examples are available in the debtmap examples directory. The code below is simplified for illustration - see the linked examples for complete, compilable versions analyzed with debtmap.
Bad code (what you usually see):
// ETL function with everything tangled together
async fn process_user_data(user_id: i64) -> Result<()> {
// 150 lines of:
// - Database queries mixed with transformations
// - Side effects scattered throughout
// - Business logic intertwined with I/O
// - Mutation soup
// - Event publishing embedded in processing
// All in one giant function
}
When you or AI open this file, you need to understand:
- The database schema
- The transformation logic
- The event system
- The error handling strategy
- The entire data pipeline flow
Good code (rare but possible):
// Pure transformation pipeline - business logic only
fn transform_user_data(raw: RawUserData) -> Result<ProcessedUserData> {
raw.validate()
.and_then(normalize_fields)
.and_then(enrich_with_defaults)
.and_then(apply_business_rules)
}
// I/O at the edges - infrastructure concerns
async fn process_user_data(user_id: i64) -> Result<()> {
let raw = fetch_user_data(user_id).await?;
let processed = transform_user_data(raw)?;
save_processed_data(&processed).await?;
publish_user_updated_event(user_id).await?;
Ok(())
}
Now when you work on transform_user_data:
- No database knowledge needed
- No event system knowledge needed
- Just pure data transformation
- Easy to test (see the sketch below)
- Easy to reason about
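This is what “easy to test” means in practice. Here’s a hedged sketch of a unit test for the pure pipeline - the RawUserData fields are hypothetical, since the struct isn’t defined above:
#[test]
fn transform_normalizes_and_enriches_valid_input() {
    // No database, no event bus, no async runtime - just data in, data out.
    // Field names are illustrative; use whatever RawUserData actually holds.
    let raw = RawUserData {
        email: "  ADA@EXAMPLE.COM  ".into(),
        name: "Ada".into(),
        plan: None,
    };
    let processed = transform_user_data(raw).expect("valid input should transform");
    assert_eq!(processed.email, "ada@example.com");
    assert!(processed.plan.is_some(), "defaults should be filled in");
}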
The complexity hasn’t disappeared; it’s just been properly encapsulated.
Event Handling: Another Common Culprit#
Event-driven systems suffer from similar problems. Here’s the typical pattern:
Bad event handler (everything tangled):
// Event handler with infrastructure, business logic, and side effects mixed
async fn on_user_registered(event: Event) -> Result<()> {
// Deserialize event payload
let user_data: UserRegistered = serde_json::from_str(&event.payload)?;
// Fetch additional data
let user = db.get_user(user_data.id).await?;
let plan = db.get_plan(user.plan_id).await?;
// Business logic mixed with event concerns
if user.email_verified {
let welcome_email = format_welcome_email(&user, &plan);
email_service.send(welcome_email).await?;
// Publish more events (nested event handling)
event_bus.publish(Event {
topic: "email.sent",
payload: serde_json::to_string(&EmailSent {
user_id: user.id
})?,
}).await?;
}
// Update state
analytics.track("user_registered", user.id).await?;
// More events
if plan.is_trial {
event_bus.publish(Event {
topic: "trial.started",
payload: serde_json::to_string(&TrialStarted {
user_id: user.id,
expires_at: Utc::now() + Duration::days(14),
})?,
}).await?;
}
Ok(())
}
When you work on this code, you need to understand:
- The event bus serialization format
- Email service API
- Analytics tracking system
- Database schema and relationships
- Trial business logic
- All downstream event handlers
Good event handler (clean separation):
// Pure domain logic - no event infrastructure
fn handle_user_registered(user: User, plan: Plan) -> Vec<DomainEvent> {
let mut events = vec![];
if user.email_verified {
events.push(DomainEvent::SendWelcomeEmail {
user_id: user.id,
plan_type: plan.plan_type,
});
}
events.push(DomainEvent::TrackRegistration {
user_id: user.id,
});
if plan.is_trial {
events.push(DomainEvent::StartTrial {
user_id: user.id,
expires_at: Utc::now() + Duration::days(14),
});
}
events
}
// Infrastructure wrapper - events and I/O only
async fn on_user_registered(event: Event) -> Result<()> {
let payload: UserRegistered = deserialize_event(&event)?;
let user = fetch_user(payload.user_id).await?;
let plan = fetch_plan(user.plan_id).await?;
let domain_events = handle_user_registered(user, plan);
publish_domain_events(domain_events).await
}
Now when you work on handle_user_registered:
- No event bus knowledge needed
- No database or API knowledge needed
- Pure business logic
- Easy to test with plain structs (see the sketch below)
- Clear what events will be produced
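Here’s a hedged sketch of such a test - the User, Plan, and PlanType field names are assumptions, since those types aren’t spelled out above:
#[test]
fn verified_trial_user_produces_welcome_tracking_and_trial_events() {
    // Plain structs in, plain enum values out - no mocks, no async, no event bus.
    let user = User { id: 42, plan_id: 7, email_verified: true };
    let plan = Plan { plan_type: PlanType::Pro, is_trial: true };

    let events = handle_user_registered(user, plan);

    assert_eq!(events.len(), 3);
    assert!(matches!(events[0], DomainEvent::SendWelcomeEmail { user_id: 42, .. }));
    assert!(matches!(events[1], DomainEvent::TrackRegistration { user_id: 42 }));
    assert!(matches!(events[2], DomainEvent::StartTrial { user_id: 42, .. }));
}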
Again, the complexity is managed through proper boundaries.
The Test: Local Reasoning#
Here’s the litmus test for good architecture:
Can you understand and modify any individual function without understanding the entire system?
If the answer is no, you have structural problems, not domain complexity.
Good architecture creates:
- Local reasoning - Any given function only requires understanding that piece
- Enforced correctness - Types and interfaces make it hard to do the wrong thing (the “pit of success”); see the sketch below
- Clear failures - When something breaks, the error is obvious and localized
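As a small illustration of enforced correctness, here’s a hedged sketch of the newtype pattern - the names are hypothetical, but the idea is that the type system makes the wrong thing unrepresentable:
// A ValidEmail can only be obtained through `parse`, so every function
// that accepts one can assume validation already happened.
struct ValidEmail(String);

impl ValidEmail {
    fn parse(raw: &str) -> Result<ValidEmail, String> {
        if raw.contains('@') {
            Ok(ValidEmail(raw.trim().to_lowercase()))
        } else {
            Err(format!("not an email address: {raw}"))
        }
    }
}

// Passing an unvalidated &str here is a compile error, not a runtime bug.
fn send_welcome(email: &ValidEmail) {
    println!("sending welcome to {}", email.0);
}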
Why AI (and Humans) Output Technical Debt#
This isn’t just an AI problem. Humans intentionally ship technical debt all the time, for good reasons:
Why developers take shortcuts:
- Time pressure to ship features
- “We’ll refactor it later” (narrator: they won’t)
- Unclear requirements make proper abstraction difficult
- Premature optimization avoidance taken too far
- Lack of understanding of the domain initially
Why AI does the same:
- Training data probably includes far more “quick working code” than “well-factored production code”
- Reward signals likely weight “it works” more heavily than “it’s maintainable”
- Context limitations make refactoring harder
- Users accept the first working solution rather than asking for better
- Optimization for speed mirrors human behavior under time pressure
- Tendency to continue a codebase’s existing patterns, including its technical debt
The difference is: Humans understand the tradeoff. AI just mimics the pattern.
How Technical Debt Snowballs#
Here’s the vicious cycle:
1. Initial shortcut - “claude make it work” ships quickly with tangled code
2. Context growth - Each new feature needs more system knowledge
3. Cognitive overload - Developers (and AI) struggle with mounting complexity
4. More shortcuts - Under pressure, take more shortcuts to cope
5. Degraded structure - Abstractions break down further
6. Compound effect - Each change makes the next change harder
7. AI failure - AI can’t hold the entire system context and makes mistakes
8. Blame the tool - “AI can’t handle complex domains”
The real problem was step 1, but we only hear about step 7.
How to Fix It Systematically#
You can’t just tell AI “write better code.” Well, you can, but in my testing it isn’t generally effective. You need to:
- Identify the technical debt hotspots
- Prioritize what to fix first
- Refactor to encapsulate complexity
- Add test coverage as guardrails
- Use AI effectively on the improved codebase
Using Debtmap to Find What Matters#
I built debtmap to solve exactly this problem. It uses multi-signal analysis to identify where to focus your efforts so technical debt stays manageable as a project grows:
# Generate coverage report
cargo llvm-cov --lcov --output-path target/coverage/lcov.info
# Cross-reference complexity with coverage and context providers such as git history
debtmap analyze . --lcov target/coverage/lcov.info --context
What makes debtmap different:
- Multi-signal risk scoring - Combines complexity, test coverage, purity analysis, git history, and bug frequency
- Pattern recognition - Distinguishes real complexity from simple repetitive code with information theory
- Actionable priorities - “Fix this file first, here’s why, here’s how”
- Coverage integration - Shows where complex code lacks test protection, making it higher risk
Example output:
#1 SCORE: 8.9 [CRITICAL]
├─ TEST GAP: ./src/event_handler.rs:38 handle_user_event()
├─ COMPLEXITY: cyclomatic=15, cognitive=22
├─ COVERAGE: 0% (expected: 90% for Business Logic)
├─ GIT HISTORY: 23 changes, 8 bug fixes
├─ ACTION: Add tests, then extract pure functions
└─ WHY: High complexity + no tests + frequent bugs = highest risk
This tells you:
- What to fix (handle_user_event function)
- Why it matters (complexity + no coverage + bug history)
- How to fix it (add tests, extract logic)
- Impact if you fix it (reduce risk score by 3.7)
Real Example: ETL and Event Code Analysis#
I’ve created runnable examples that demonstrate exactly what we’re talking about. Let’s see what debtmap finds:
ETL Bad Example (etl_bad.rs):
#1 SCORE: 7.5 [HIGH]
├─ COMPLEXITY HOTSPOT: etl_bad.rs:42 process_user_data()
├─ COMPLEXITY: cyclomatic=25, cognitive=48
├─ COVERAGE: 0%
├─ ACTION: Split into 4 focused functions by decision clusters
└─ WHY: Multiple nested conditionals with mixed I/O and business logic
ETL Good Example (etl_good.rs):
No high-priority issues detected.
Pure functions have low complexity scores.
Test coverage: 85% on business logic functions.
Event Handler Bad Example (events_bad.rs):
#1 SCORE: 6.8 [MEDIUM]
├─ COMPLEXITY HOTSPOT: events_bad.rs:95 on_user_registered()
├─ COMPLEXITY: cyclomatic=18, cognitive=42
├─ COVERAGE: 0%
├─ ACTION: Reduce complexity from 42 to ~15
└─ WHY: Deeply nested event handling with multiple decision paths
Event Handler Good Example (events_good.rs):
No high-priority issues detected.
Domain logic functions have low cognitive complexity.
Test coverage: 100% on handle_user_registered().
The difference is stark:
- Bad examples: HIGH/MEDIUM severity, complexity in the 40s, 0% coverage
- Good examples: No issues, low complexity, high test coverage
You can clone the examples directory and run debtmap yourself to see the full analysis.
The Refactoring Process#
Once you’ve identified hotspots:
1. Add Test Coverage First
Before refactoring, add tests that lock in current behavior:
#[tokio::test] // assuming a tokio runtime, since the handler below is async
async fn test_user_event_handling_current_behavior() {
    // Even if the code is messy, lock in what it does now
    let event = UserEvent::new(123, "login");
    let result = handle_user_event(event).await;
    assert!(result.is_ok());
}
2. Extract Pure Functions
Separate business logic from I/O:
// Before: everything tangled
async fn handle_user_event(event: UserEvent) -> Result<()> {
let user = db.fetch_user(event.user_id).await?;
let processed = transform_and_validate(&user, &event)?;
db.save(processed).await?;
event_bus.publish(processed).await?;
Ok(())
}
// After: pure logic extracted
fn process_event_logic(user: &User, event: &UserEvent) -> Result<ProcessedEvent> {
validate_event(event)?
.transform_with_user(user)
.apply_business_rules()
}
// I/O wrapper stays simple
async fn handle_user_event(event: UserEvent) -> Result<()> {
let user = fetch_user(event.user_id).await?;
let processed = process_event_logic(&user, &event)?;
persist_and_publish(processed).await
}
3. Create Clear Boundaries
Use types to enforce correct usage:
// Type states prevent misuse
struct UnvalidatedEvent { /* ... */ }
struct ValidatedEvent { /* ... */ }
struct ProcessedEvent { /* ... */ }
impl UnvalidatedEvent {
fn validate(self) -> Result<ValidatedEvent> { /* ... */ }
}
impl ValidatedEvent {
fn process(self, user: &User) -> ProcessedEvent { /* ... */ }
}
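A quick sketch of what this buys you at the call site, assuming the elided method bodies above:
// The only way to reach ProcessedEvent is through validation first;
// calling `process` on an UnvalidatedEvent simply doesn't compile.
fn handle(raw: UnvalidatedEvent, user: &User) -> Result<ProcessedEvent> {
    let validated = raw.validate()?;
    Ok(validated.process(user))
}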
4. Verify Improvement
Run debtmap again to confirm:
debtmap analyze . --lcov target/coverage/lcov.info
# Score dropped from 8.9 to 3.2 ✅
# Coverage increased from 0% to 85% ✅
# Complexity reduced from 22 to 8 ✅
When AI Gets Stuck: The Refactoring Blind Spot#
Here’s a pattern I see constantly:
- AI attempts to implement a feature in complex, tangled code
- AI gets stuck - tests fail, bugs appear, implementation is incomplete
- Developer: “Fix this bug” or “Make the test pass”
- AI tries another patch, still fails
- Developer: “Try a different approach”
- AI tries yet another patch, fails again
- Repeat until frustration
What neither AI nor developer suggests: Is this code too complex to modify safely? Which parts are the most complex? How can we refactor to reduce complexity first?
AI doesn’t proactively say “This code is too complex for me to modify safely. Let me refactor it first.” It just keeps trying to patch increasingly complex code. And developers, frustrated by AI’s failures, keep asking for more patches rather than stepping back.
The solution:
When AI gets stuck repeatedly on the same area:
- Stop asking AI to fix symptoms - Don’t keep trying to patch the bug or force the test to pass
- Run debtmap - Identify the complexity hotspots causing the problem
- Refactor the problematic area first - Extract pure functions, add tests, separate concerns
- Then retry the original task - AI will succeed where it failed before
Why this works:
- Refactoring creates local reasoning boundaries
- Each pure function is independently testable
- Bugs become obvious in small, focused functions
- AI can reason about 10-line functions but not 100-line tangles
When to refactor instead of patch:
- AI has failed 3+ times on the same task
- The file has high debtmap scores
- Tests are brittle and break with small changes
- You can’t easily explain what the code does
- Changes in one area break unrelated functionality
Don’t let AI (or yourself) keep patching technical debt. Refactor first, then implement.
Apply the Same Principles to AI Usage#
Here’s the meta-insight: The same principles that make code easier for AI to work with also apply to how you use AI.
Instead of: “AI, refactor my entire event system” (overwhelming, will fail)
Do:
# Break into small, independent tasks
# Task 1: Add tests for current behavior
# Task 2: Extract pure validation logic
# Task 3: Separate I/O from business logic
# Task 4: Wire refactored pieces together
# Orchestrate multiple AI instances if needed
# Each instance only needs local reasoning
This mirrors good code design:
- Break macro tasks into small, composable pieces
- Each piece is locally understandable
- Orchestrate at a higher level
- Verify each step before proceeding
The Root Cause: Complexity Management#
Whether you’re:
- Writing code for humans to maintain
- Writing code for AI to work with
- Asking AI to perform tasks
- Designing systems
The fundamental principle is the same:
Manage complexity by breaking it into small, locally-understandable pieces with clear boundaries.
When people say “AI fails at complex tasks,” what they really mean is:
“AI fails when forced to hold too much context at once, just like humans do.”
The solution isn’t better AI—it’s better structure.
Practical Steps#
If you’re experiencing “AI fails on my complex codebase”:
1. Audit Your Technical Debt
If you’re lucky enough to be working in a Rust codebase, you can try debtmap. Otherwise, look for a cognitive complexity analysis tool for your language.
# Install debtmap
cargo install debtmap
# Generate coverage
cargo llvm-cov --lcov --output-path target/coverage/lcov.info
# Identify hotspots
debtmap analyze . --lcov target/coverage/lcov.info
2. Fix the Top 3 Issues
Focus on files with:
- High complexity scores
- Low test coverage
- Frequent changes/bugs
3. Add Test Coverage
Before refactoring, lock in behavior with tests.
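If the current output is too messy to hand-write assertions for, snapshot testing is one pragmatic way to lock it in. A sketch using the insta crate, where generate_report and sample_input are hypothetical stand-ins for your own code:
#[test]
fn report_output_is_locked_in_before_refactoring() {
    // The snapshot captures the current output verbatim; any behavior change
    // during refactoring shows up as a snapshot diff.
    let report = generate_report(sample_input());
    insta::assert_debug_snapshot!(report);
}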
4. Refactor for Encapsulation
- Extract pure functions
- Separate I/O from logic
- Create clear boundaries
- Reduce function size (target: <20 lines)
5. Verify Improvements
Run debtmap again. Scores should drop significantly.
6. Use AI on Improved Code
Now AI can work effectively because:
- Each function is locally understandable
- Tests provide guardrails
- Clear boundaries prevent mistakes
- Errors are localized and obvious
The Bigger Picture#
The complaint “AI can’t handle complex domains” is actually revealing something important:
Your codebase has structural problems that AI is making visible.
AI struggles with the same things humans struggle with:
- Leaky abstractions
- Tangled responsibilities
- Insufficient tests
- Too much context required
The difference is that AI fails faster and more obviously than humans, who can power through with heroic effort.
Don’t blame AI for exposing your technical debt. Thank it for making the problems obvious, then fix them.
Conclusion#
When AI “fails” at complex tasks in your codebase:
- It’s not the domain - Event systems, ETL, and complex data can have manageable complexity
- It’s the structure - Leaky abstractions and tangled code exceed cognitive limits
- It’s technical debt - Accumulated shortcuts that need systematic refactoring
- It’s fixable - Use tools like debtmap to identify and prioritize fixes
- Refactor, don’t patch - When AI gets stuck, step back and reduce complexity first
- It’s universal - The same complexity thresholds affect humans AND AI
Well-designed code keeps complexity below the threshold where both AI and engineers can work effectively. Poorly designed code exceeds that threshold.
The real shift in mindset:
When AI fails repeatedly, don’t ask “How can I make AI fix this?” Ask “Why is this code too complex for reliable modification?”
Reduce complexity to manageable levels, and AI works effectively—even on “complex” domains.
Meta-Example: Code Quality Analysis#
You might ask: “Isn’t code quality analysis itself a complex domain?”
Absolutely yes:
- AST parsing and traversal
- Multiple complexity algorithms (cyclomatic, cognitive, entropy)
- Pattern recognition across code structures
- Multi-signal risk analysis (git history + coverage + complexity)
- Call graph construction and resolution
- Framework-specific pattern detection
But debtmap, despite analyzing complexity, maintains manageable complexity in its own codebase by:
- Pure functions for algorithms - Complexity calculations separated from I/O (sketched below)
- Clear module boundaries - Parsers, analyzers, reporters are independent
- Comprehensive test coverage - Business logic is testable
- Separation of concerns - Framework detection, pattern matching, scoring all isolated
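To make the first bullet concrete, here’s a minimal sketch of that pattern - not debtmap’s actual internals - where the risk math is a pure function over already-collected metrics, and file reading, parsing, and git access live elsewhere:
// Pure scoring: metrics in, score out. No filesystem, no parser, no git here.
struct FunctionMetrics {
    cyclomatic: u32,
    cognitive: u32,
    coverage: f64, // fraction of lines covered, 0.0..=1.0
}

fn risk_score(m: &FunctionMetrics) -> f64 {
    // Illustrative weighting only - the real tool combines more signals.
    let complexity = (m.cyclomatic + m.cognitive) as f64 / 2.0;
    complexity * (1.0 - m.coverage)
}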
The point isn’t that debtmap has “perfect” code - it’s that the complexity is managed to levels where both AI and engineers can:
- Understand individual functions without system-wide knowledge
- Modify behavior without breaking distant code
- Add features with confidence
- Debug issues by reasoning locally
This is the new standard: Can AI and human engineers work with this effectively?
If yes, complexity is managed. If no, use a tool like debtmap to evaluate and prioritize, then refactor the worst parts.
Tools mentioned:
- debtmap - Multi-signal technical debt analysis
- cargo-llvm-cov - Test coverage for Rust