The Common Complaint#
I keep seeing this pattern in discussions about AI coding assistants:
AI excels at bootstrapping new projects: it can generate scaffolding, write tests, debug simple issues, and power through generic code patterns. However, that same AI struggles significantly in more complex codebases. As soon as event-driven architectures, ETL pipelines, or complex data layers enter the picture, the AI can’t handle the domain complexity. It can’t seem to operate at the “macro level” these systems require.
This observation is correct, but the diagnosis is often wrong.
In most codebases, the problem isn’t that AI can’t handle complex domains. The problem is cognitive load.
Well-designed code manages complexity through proper structure. Poorly designed code forces you to understand many distant parts at once. How we manage that complexity is the essence of code quality.
Leaky Abstractions#
Well-designed code abstracts domain complexity through proper boundaries and separation of concerns. When you’re working on any individual component, you shouldn’t need to hold the entire event flow or ETL pipeline in your head. If you (or an AI) are drowning in macro-level concerns while working on a single function, that’s not a domain problem; it’s a structural problem:
- Leaky abstractions
- Tangled responsibilities
- Poor module boundaries
- Mixed I/O and business logic
- Insufficient test coverage
This is technical debt that needs refactoring.
Why These Domains SEEM Harder#
Event systems, ETL processes, and complex data layers ARE inherently complex. The issue compounds:
These domains are also more LIKELY to be written poorly because they’re harder to structure well.
The inherent complexity makes developers (and AI) more likely to:
- Take shortcuts under time pressure
- Mix concerns that should be separated
- Let abstractions leak across boundaries
- Skip tests because “it’s too complex to test”
Then when AI struggles, it becomes the nearest scapegoat. If you’re an experienced dev trying out AI to find its limits, it’s easy to mistakenly conclude that it can’t handle the domain complexity rather than recognizing the code quality issues at play.
What Good Structure Looks Like#
Note: Full runnable examples are available in the debtmap examples directory. The code below is simplified for illustration - see the linked examples for complete, compilable versions analyzed with debtmap.
Bad code (what you usually see):
// ETL function with everything tangled together
async fn process_user_data(user_id: i64) -> Result<()> {
// 150 lines of:
// - Database queries mixed with transformations
// - Side effects scattered throughout
// - Business logic intertwined with I/O
// - Mutation soup
// - Event publishing embedded in processing
// All in one giant function
}
When you or AI open this file, you need to understand:
- The database schema
- The transformation logic
- The event system
- The error handling strategy
- The entire data pipeline flow
Good code (rare but possible):
// Pure transformation pipeline - business logic only
fn transform_user_data(raw: RawUserData) -> Result<ProcessedUserData> {
raw.validate()
.and_then(normalize_fields)
.and_then(enrich_with_defaults)
.and_then(apply_business_rules)
}
// I/O at the edges - infrastructure concerns
async fn process_user_data(user_id: i64) -> Result<()> {
let raw = fetch_user_data(user_id).await?;
let processed = transform_user_data(raw)?;
save_processed_data(&processed).await?;
publish_user_updated_event(user_id).await?;
Ok(())
}
Now when you work on transform_user_data:
- No database knowledge needed
- No event system knowledge needed
- Just pure data transformation
- Easy to test (see the sketch below)
- Easy to reason about
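This is what “easy to test” means in practice. Here’s a hedged sketch of a unit test for the pure pipeline - the RawUserData fields are hypothetical, since the struct isn’t defined above:
#[test]
fn transform_normalizes_and_enriches_valid_input() {
    // No database, no event bus, no async runtime - just data in, data out.
    // Field names are illustrative; use whatever RawUserData actually holds.
    let raw = RawUserData {
        email: "  ADA@EXAMPLE.COM  ".into(),
        name: "Ada".into(),
        plan: None,
    };
    let processed = transform_user_data(raw).expect("valid input should transform");
    assert_eq!(processed.email, "ada@example.com");
    assert!(processed.plan.is_some(), "defaults should be filled in");
}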
The complexity hasn’t disappeared; it’s just been properly encapsulated.
Event Handling: Another Common Culprit#
Event-driven systems suffer from similar problems. Here’s the typical pattern:
Bad event handler (everything tangled):
// Event handler with infrastructure, business logic, and side effects mixed
async fn on_user_registered(event: Event) -> Result<()> {
// Deserialize event payload
let user_data: UserRegistered = serde_json::from_str(&event.payload)?;
// Fetch additional data
let user = db.get_user(user_data.id).await?;
let plan = db.get_plan(user.plan_id).await?;
// Business logic mixed with event concerns
if user.email_verified {
let welcome_email = format_welcome_email(&user, &plan);
email_service.send(welcome_email).await?;
// Publish more events (nested event handling)
event_bus.publish(Event {
topic: "email.sent",
payload: serde_json::to_string(&EmailSent {
user_id: user.id
})?,
}).await?;
}
// Update state
analytics.track("user_registered", user.id).await?;
// More events
if plan.is_trial {
event_bus.publish(Event {
topic: "trial.started",
payload: serde_json::to_string(&TrialStarted {
user_id: user.id,
expires_at: Utc::now() + Duration::days(14),
})?,
}).await?;
}
Ok(())
}
When you work on this code, you need to understand:
- The event bus serialization format
- Email service API
- Analytics tracking system
- Database schema and relationships
- Trial business logic
- All downstream event handlers
Good event handler (clean separation):
// Pure domain logic - no event infrastructure
fn handle_user_registered(user: User, plan: Plan) -> Vec<DomainEvent> {
let mut events = vec![];
if user.email_verified {
events.push(DomainEvent::SendWelcomeEmail {
user_id: user.id,
plan_type: plan.plan_type,
});
}
events.push(DomainEvent::TrackRegistration {
user_id: user.id,
});
if plan.is_trial {
events.push(DomainEvent::StartTrial {
user_id: user.id,
expires_at: Utc::now() + Duration::days(14),
});
}
events
}
// Infrastructure wrapper - events and I/O only
async fn on_user_registered(event: Event) -> Result<()> {
let payload: UserRegistered = deserialize_event(&event)?;
let user = fetch_user(payload.user_id).await?;
let plan = fetch_plan(user.plan_id).await?;
let domain_events = handle_user_registered(user, plan);
publish_domain_events(domain_events).await
}
Now when you work on handle_user_registered:
- No event bus knowledge needed
- No database or API knowledge needed
- Pure business logic
- Easy to test with plain structs (see the sketch below)
- Clear what events will be produced
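Here’s a hedged sketch of such a test - the User, Plan, and PlanType field names are assumptions, since those types aren’t spelled out above:
#[test]
fn verified_trial_user_produces_welcome_tracking_and_trial_events() {
    // Plain structs in, plain enum values out - no mocks, no async, no event bus.
    let user = User { id: 42, plan_id: 7, email_verified: true };
    let plan = Plan { plan_type: PlanType::Pro, is_trial: true };

    let events = handle_user_registered(user, plan);

    assert_eq!(events.len(), 3);
    assert!(matches!(events[0], DomainEvent::SendWelcomeEmail { user_id: 42, .. }));
    assert!(matches!(events[1], DomainEvent::TrackRegistration { user_id: 42 }));
    assert!(matches!(events[2], DomainEvent::StartTrial { user_id: 42, .. }));
}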
Again, the complexity is managed through proper boundaries.
The Test: Local Reasoning#
Here’s the litmus test for good architecture:
Can you understand and modify any individual function without understanding the entire system?
If the answer is no, you have structural problems, not domain complexity.
Good architecture creates:
- Local reasoning - Any given function only requires understanding that piece
- Enforced correctness - Types and interfaces make it hard to do the wrong thing (the “pit of success”); see the sketch below
- Clear failures - When something breaks, the error is obvious and localized
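As a small illustration of enforced correctness, here’s a hedged sketch of the newtype pattern - the names are hypothetical, but the idea is that the type system makes the wrong thing unrepresentable:
// A ValidEmail can only be obtained through `parse`, so every function
// that accepts one can assume validation already happened.
struct ValidEmail(String);

impl ValidEmail {
    fn parse(raw: &str) -> Result<ValidEmail, String> {
        if raw.contains('@') {
            Ok(ValidEmail(raw.trim().to_lowercase()))
        } else {
            Err(format!("not an email address: {raw}"))
        }
    }
}

// Passing an unvalidated &str here is a compile error, not a runtime bug.
fn send_welcome(email: &ValidEmail) {
    println!("sending welcome to {}", email.0);
}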
Why AI (and Humans) Output Technical Debt#
This isn’t just an AI problem. Humans intentionally ship technical debt all the time, for good reasons:
Why developers take shortcuts:
- Time pressure to ship features
- “We’ll refactor it later” (narrator: they won’t)
- Unclear requirements make proper abstraction difficult
- Premature optimization avoidance taken too far
- Lack of understanding of the domain initially
Why AI does the same:
- Training data probably includes far more “quick working code” than “well-factored production code”
- Reward signals likely weight “it works” more heavily than “it’s maintainable”
- Context limitations make refactoring harder
- Users accept the first working solution rather than asking for better
- Optimization for speed mirrors human behavior under time pressure
- Tendency to continue a codebase’s existing patterns, including its technical debt
The difference is: Humans understand the tradeoff. AI just mimics the pattern.
How Technical Debt Snowballs#
Here’s the vicious cycle:
1. Initial shortcut - “claude make it work” ships quickly with tangled code
2. Context growth - Each new feature needs more system knowledge
3. Cognitive overload - Developers (and AI) struggle with mounting complexity
4. More shortcuts - Under pressure, take more shortcuts to cope
5. Degraded structure - Abstractions break down further
6. Compound effect - Each change makes the next change harder
7. AI failure - AI can’t hold the entire system context and makes mistakes
8. Blame the tool - “AI can’t handle complex domains”
The real problem was step 1, but we only hear about step 7.
How to Fix It Systematically#
You can’t just tell AI “write better code.” Well, you can, but in my testing it isn’t generally effective. You need to:
- Identify the technical debt hotspots
- Prioritize what to fix first
- Refactor to encapsulate complexity
- Add test coverage as guardrails
- Use AI effectively on the improved codebase
Using Debtmap to Find What Matters#
I built debtmap to solve exactly this problem. It uses multi-signal analysis to identify where to focus your efforts so technical debt stays manageable as a project grows:
# Generate coverage report
cargo llvm-cov --lcov --output-path target/coverage/lcov.info
# Cross-reference complexity with coverage and context providers such as git history
debtmap analyze . --lcov target/coverage/lcov.info --context
What makes debtmap different:
- Multi-signal risk scoring - Combines complexity, test coverage, purity analysis, git history, and bug frequency
- Pattern recognition - Distinguishes real complexity from simple repetitive code with information theory
- Actionable priorities - “Fix this file first, here’s why, here’s how”
- Coverage integration - Shows where complex code lacks test protection, making it higher risk
Example output:
#1 SCORE: 8.9 [CRITICAL]
├─ TEST GAP: ./src/event_handler.rs:38 handle_user_event()
├─ COMPLEXITY: cyclomatic=15, cognitive=22
├─ COVERAGE: 0% (expected: 90% for Business Logic)
├─ GIT HISTORY: 23 changes, 8 bug fixes
├─ ACTION: Add tests, then extract pure functions
└─ WHY: High complexity + no tests + frequent bugs = highest risk
This tells you:
- What to fix (handle_user_event function)
- Why it matters (complexity + no coverage + bug history)
- How to fix it (add tests, extract logic)
- Impact if you fix it (reduce risk score by 3.7)
Real Example: ETL and Event Code Analysis#
I’ve created runnable examples that demonstrate exactly what we’re talking about. Let’s see what debtmap finds:
ETL Bad Example (etl_bad.rs):
#1 SCORE: 7.5 [HIGH]
├─ COMPLEXITY HOTSPOT: etl_bad.rs:42 process_user_data()
├─ COMPLEXITY: cyclomatic=25, cognitive=48
├─ COVERAGE: 0%
├─ ACTION: Split into 4 focused functions by decision clusters
└─ WHY: Multiple nested conditionals with mixed I/O and business logic
ETL Good Example (etl_good.rs):
No high-priority issues detected.
Pure functions have low complexity scores.
Test coverage: 85% on business logic functions.
Event Handler Bad Example (events_bad.rs):
#1 SCORE: 6.8 [MEDIUM]
├─ COMPLEXITY HOTSPOT: events_bad.rs:95 on_user_registered()
├─ COMPLEXITY: cyclomatic=18, cognitive=42
├─ COVERAGE: 0%
├─ ACTION: Reduce complexity from 42 to ~15
└─ WHY: Deeply nested event handling with multiple decision paths
Event Handler Good Example (events_good.rs):
No high-priority issues detected.
Domain logic functions have low cognitive complexity.
Test coverage: 100% on handle_user_registered().
The difference is stark:
- Bad examples: HIGH/MEDIUM severity, complexity in the 40s, 0% coverage
- Good examples: No issues, low complexity, high test coverage
You can clone the examples directory and run debtmap yourself to see the full analysis.
The Refactoring Process#
Once you’ve identified hotspots:
1. Add Test Coverage First
Before refactoring, add tests that lock in current behavior:
#[tokio::test] // assuming a tokio runtime, since the handler below is async
async fn test_user_event_handling_current_behavior() {
    // Even if the code is messy, lock in what it does now
    let event = UserEvent::new(123, "login");
    let result = handle_user_event(event).await;
    assert!(result.is_ok());
}
2. Extract Pure Functions
Separate business logic from I/O:
// Before: everything tangled
async fn handle_user_event(event: UserEvent) -> Result<()> {
let user = db.fetch_user(event.user_id).await?;
let processed = transform_and_validate(&user, &event)?;
db.save(processed).await?;
event_bus.publish(processed).await?;
Ok(())
}
// After: pure logic extracted
fn process_event_logic(user: &User, event: &UserEvent) -> Result<ProcessedEvent> {
validate_event(event)?
.transform_with_user(user)
.apply_business_rules()
}
// I/O wrapper stays simple
async fn handle_user_event(event: UserEvent) -> Result<()> {
let user = fetch_user(event.user_id).await?;
let processed = process_event_logic(&user, &event)?;
persist_and_publish(processed).await
}
3. Create Clear Boundaries
Use types to enforce correct usage:
// Type states prevent misuse
struct UnvalidatedEvent { /* ... */ }
struct ValidatedEvent { /* ... */ }
struct ProcessedEvent { /* ... */ }
impl UnvalidatedEvent {
fn validate(self) -> Result<ValidatedEvent> { /* ... */ }
}
impl ValidatedEvent {
fn process(self, user: &User) -> ProcessedEvent { /* ... */ }
}
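A quick sketch of what this buys you at the call site, assuming the elided method bodies above:
// The only way to reach ProcessedEvent is through validation first;
// calling `process` on an UnvalidatedEvent simply doesn't compile.
fn handle(raw: UnvalidatedEvent, user: &User) -> Result<ProcessedEvent> {
    let validated = raw.validate()?;
    Ok(validated.process(user))
}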
4. Verify Improvement
Run debtmap again to confirm:
debtmap analyze . --lcov target/coverage/lcov.info
# Score dropped from 8.9 to 3.2 ✅
# Coverage increased from 0% to 85% ✅
# Complexity reduced from 22 to 8 ✅
When AI Gets Stuck: The Refactoring Blind Spot#
Here’s a pattern I see constantly:
- AI attempts to implement a feature in complex, tangled code
- AI gets stuck - tests fail, bugs appear, implementation is incomplete
- Developer: “Fix this bug” or “Make the test pass”
- AI tries another patch, still fails
- Developer: “Try a different approach”
- AI tries yet another patch, fails again
- Repeat until frustration
What neither AI nor developer suggests: Is this code too complex to modify safely? Which parts are the most complex? How can we refactor to reduce complexity first?
AI doesn’t proactively say “This code is too complex for me to modify safely. Let me refactor it first.” It just keeps trying to patch increasingly complex code. And developers, frustrated by AI’s failures, keep asking for more patches rather than stepping back.
The solution:
When AI gets stuck repeatedly on the same area:
- Stop asking AI to fix symptoms - Don’t keep trying to patch the bug or force the test to pass
- Run debtmap - Identify the complexity hotspots causing the problem
- Refactor the problematic area first - Extract pure functions, add tests, separate concerns
- Then retry the original task - AI will succeed where it failed before
Why this works:
- Refactoring creates local reasoning boundaries
- Each pure function is independently testable
- Bugs become obvious in small, focused functions
- AI can reason about 10-line functions but not 100-line tangles
When to refactor instead of patch:
- AI has failed 3+ times on the same task
- The file has high debtmap scores
- Tests are brittle and break with small changes
- You can’t easily explain what the code does
- Changes in one area break unrelated functionality
Don’t let AI (or yourself) keep patching technical debt. Refactor first, then implement.
Apply the Same Principles to AI Usage#
Here’s the meta-insight: The same principles that make code easier for AI to work with also apply to how you use AI.
Instead of: “AI, refactor my entire event system” (overwhelming, will fail)
Do:
# Break into small, independent tasks
# Task 1: Add tests for current behavior
# Task 2: Extract pure validation logic
# Task 3: Separate I/O from business logic
# Task 4: Wire refactored pieces together
# Orchestrate multiple AI instances if needed
# Each instance only needs local reasoning
This mirrors good code design:
- Break macro tasks into small, composable pieces
- Each piece is locally understandable
- Orchestrate at a higher level
- Verify each step before proceeding
The Root Cause: Complexity Management#
Whether you’re:
- Writing code for humans to maintain
- Writing code for AI to work with
- Asking AI to perform tasks
- Designing systems
The fundamental principle is the same:
Manage complexity by breaking it into small, locally-understandable pieces with clear boundaries.
When people say “AI fails at complex tasks,” what they really mean is:
“AI fails when forced to hold too much context at once, just like humans do.”
The solution isn’t better AI—it’s better structure.
Practical Steps#
If you’re experiencing “AI fails on my complex codebase”:
1. Audit Your Technical Debt
If you’re lucky enough to be working in a Rust codebase, you can try debtmap. Otherwise, look for a cognitive complexity analysis tool for your language.
# Install debtmap
cargo install debtmap
# Generate coverage
cargo llvm-cov --lcov --output-path target/coverage/lcov.info
# Identify hotspots
debtmap analyze . --lcov target/coverage/lcov.info
2. Fix the Top 3 Issues
Focus on files with:
- High complexity scores
- Low test coverage
- Frequent changes/bugs
3. Add Test Coverage
Before refactoring, lock in behavior with tests.
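If the current output is too messy to hand-write assertions for, snapshot testing is one pragmatic way to lock it in. A sketch using the insta crate, where generate_report and sample_input are hypothetical stand-ins for your own code:
#[test]
fn report_output_is_locked_in_before_refactoring() {
    // The snapshot captures the current output verbatim; any behavior change
    // during refactoring shows up as a snapshot diff.
    let report = generate_report(sample_input());
    insta::assert_debug_snapshot!(report);
}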
4. Refactor for Encapsulation
- Extract pure functions
- Separate I/O from logic
- Create clear boundaries
- Reduce function size (target: <20 lines)
5. Verify Improvements
Run debtmap again. Scores should drop significantly.
6. Use AI on Improved Code
Now AI can work effectively because:
- Each function is locally understandable
- Tests provide guardrails
- Clear boundaries prevent mistakes
- Errors are localized and obvious
The Bigger Picture#
The complaint “AI can’t handle complex domains” is actually revealing something important:
Your codebase has structural problems that AI is making visible.
AI struggles with the same things humans struggle with:
- Leaky abstractions
- Tangled responsibilities
- Insufficient tests
- Too much context required
The difference is that AI fails faster and more obviously than humans, who can power through with heroic effort.
Don’t blame AI for exposing your technical debt. Thank it for making the problems obvious, then fix them.
Conclusion#
When AI “fails” at complex tasks in your codebase:
- It’s not the domain - Event systems, ETL, and complex data can have manageable complexity
- It’s the structure - Leaky abstractions and tangled code exceed cognitive limits
- It’s technical debt - Accumulated shortcuts that need systematic refactoring
- It’s fixable - Use tools like debtmap to identify and prioritize fixes
- Refactor, don’t patch - When AI gets stuck, step back and reduce complexity first
- It’s universal - The same complexity thresholds affect humans AND AI
Well-designed code keeps complexity below the threshold where both AI and engineers can work effectively. Poorly designed code exceeds that threshold.
The real shift in mindset:
When AI fails repeatedly, don’t ask “How can I make AI fix this?” Ask “Why is this code too complex for reliable modification?”
Reduce complexity to manageable levels, and AI works effectively—even on “complex” domains.
Meta-Example: Code Quality Analysis#
You might ask: “Isn’t code quality analysis itself a complex domain?”
Absolutely yes:
- AST parsing and traversal
- Multiple complexity algorithms (cyclomatic, cognitive, entropy)
- Pattern recognition across code structures
- Multi-signal risk analysis (git history + coverage + complexity)
- Call graph construction and resolution
- Framework-specific pattern detection
But debtmap, despite analyzing complexity, maintains manageable complexity in its own codebase by:
- Pure functions for algorithms - Complexity calculations separated from I/O (sketched below)
- Clear module boundaries - Parsers, analyzers, reporters are independent
- Comprehensive test coverage - Business logic is testable
- Separation of concerns - Framework detection, pattern matching, scoring all isolated
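To make the first bullet concrete, here’s a minimal sketch of that pattern - not debtmap’s actual internals - where the risk math is a pure function over already-collected metrics, and file reading, parsing, and git access live elsewhere:
// Pure scoring: metrics in, score out. No filesystem, no parser, no git here.
struct FunctionMetrics {
    cyclomatic: u32,
    cognitive: u32,
    coverage: f64, // fraction of lines covered, 0.0..=1.0
}

fn risk_score(m: &FunctionMetrics) -> f64 {
    // Illustrative weighting only - the real tool combines more signals.
    let complexity = (m.cyclomatic + m.cognitive) as f64 / 2.0;
    complexity * (1.0 - m.coverage)
}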
The point isn’t that debtmap has “perfect” code - it’s that the complexity is managed to levels where both AI and engineers can:
- Understand individual functions without system-wide knowledge
- Modify behavior without breaking distant code
- Add features with confidence
- Debug issues by reasoning locally
This is the new standard: Can AI and human engineers work with this effectively?
If yes, complexity is managed. If no, use a tool like debtmap to evaluate and prioritize, then refactor the worst parts.
Tools mentioned:
- debtmap - Multi-signal technical debt analysis
- cargo-llvm-cov - Test coverage for Rust