Why Claude Code, GitHub Copilot, and other agentic coding models struggle with large mainframe codebases—and how Cortex solves it with knowledge graphs, dependency mapping, and semantic search.
Agentic coding models like Claude Code and GitHub Copilot are transforming how modern software is written. They can scaffold React components, debug Python microservices, and generate entire API layers from a natural language prompt. But point them at a legacy mainframe codebase—hundreds of Natural programs, ADABAS databases, JCL batch schedules, and decades of accumulated business logic—and they hit a wall.
This post explains why that wall exists, how Cortex—an intelligent knowledge layer for legacy codebases—tears it down, and what it looks like in practice when an AI agent can finally understand a mainframe system from top to bottom.
When an agentic coding model encounters a modern codebase, it benefits from an ecosystem of signals: package manifests, type annotations, well-known frameworks, clear directory conventions, README files, and a vast training corpus of similar code on GitHub. A modern TypeScript repository is practically self-documenting by comparison.
Mainframe codebases offer none of these luxuries. Here's what AI models actually face:
Component names like SUBN0302, SUBP0405, and SUBG0003 carry no semantic meaning. An AI model looking at a program called SUBN0301 has no way to infer that it's a contribution tax calculator implementing 15% concessional tax, Division 293 assessments, and no-TFN penalties. The naming convention (SUB + type prefix + numeric ID) encodes structural information, but zero business context.
In modern code, import statements make dependencies explicit. In Natural/ADABAS systems, dependencies are buried in CALLNAT statements, PERFORM blocks, global data area references, and shared database files. Understanding that changing SUBN0302 (the cap monitor) affects the downstream TBAR-EVENT table, which feeds into SUBP0405 (the annual TBAR reporting batch), which ultimately generates XML for the Australian Tax Office—requires tracing a chain that spans programs, subprograms, DDMs, and batch jobs.
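A first approximation of what a dependency scanner has to do can be sketched in a few lines. This is an illustrative sketch, not Cortex's actual parser: the regexes, the Natural snippet, and the output shape are all made up for the example, and real Natural source has many more forms (FETCH, copycode, global data areas) a production scanner must handle.

```python
import re

# Natural has no import statements: dependencies hide in CALLNAT and
# PERFORM statements scattered through the source. A first-pass scanner
# can pull many of them out with pattern matching.
CALLNAT_RE = re.compile(r"CALLNAT\s+'(\w+)'", re.IGNORECASE)
PERFORM_RE = re.compile(r"PERFORM\s+([A-Z0-9-]+)", re.IGNORECASE)

def extract_dependencies(source: str) -> dict:
    """Return subprograms called and internal routines performed."""
    return {
        "callnat": sorted(set(CALLNAT_RE.findall(source))),
        "perform": sorted(set(PERFORM_RE.findall(source))),
    }

snippet = """
DEFINE DATA LOCAL
END-DEFINE
PERFORM VALIDATE-INPUT
CALLNAT 'SUBN0301' #CONTRIB #TAX
CALLNAT 'SUBN0302' #CONTRIB #CAP-STATUS
END
"""

print(extract_dependencies(snippet)["callnat"])  # ['SUBN0301', 'SUBN0302']
```

Running a pass like this over every source member yields the raw edges; the hard part, which Cortex layers on top, is stitching those edges into a graph that also covers DDMs, global data areas, and batch jobs.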
Australian superannuation legislation is dense: contribution caps, bring-forward rules, Transfer Balance Caps, preservation age calculations, tax component splits, minimum pension drawdowns. These rules aren't abstracted into configuration—they're hardcoded across hundreds of subprograms. An AI model that can't connect SUBG0003 (a global data area storing system constants like #CONCESSIONAL-CAP = 30000) to the six different subprograms that reference it will generate code that silently violates regulatory requirements.
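To make the "hardcoded rules" problem concrete, here is a minimal Python restatement of the kind of cap check that lives inside subprograms like SUBN0302. The figures are the ones cited in this post ($30K concessional / $120K non-concessional with a three-year bring-forward); the function itself is a simplified illustration, and the real legislation carries many more conditions.

```python
# Constants like these live in SUBG0003 and are referenced by half a
# dozen subprograms; changing one without tracing its readers is how
# regressions happen.
CONCESSIONAL_CAP = 30_000
NON_CONCESSIONAL_CAP = 120_000
BRING_FORWARD_YEARS = 3

def check_non_concessional(total_ytd: int, bring_forward_active: bool) -> str:
    """Flag a non-concessional cap breach (simplified illustration)."""
    cap = NON_CONCESSIONAL_CAP * (BRING_FORWARD_YEARS if bring_forward_active else 1)
    return "BREACH" if total_ytd > cap else "OK"

print(check_non_concessional(150_000, bring_forward_active=False))  # BREACH
print(check_non_concessional(150_000, bring_forward_active=True))   # OK
```

The point is not the arithmetic: it is that nothing in the Natural source links the constant, the six programs that read it, and the regulatory obligation behind it.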
Software AG Natural is a niche 4GL. There's virtually no Natural code on GitHub, no Stack Overflow threads explaining ADABAS DDM patterns, no open-source reference implementations. AI models trained on billions of lines of Python, Java, and JavaScript have near-zero exposure to Natural syntax, and no way to learn your system's specific conventions.
Cortex is a purpose-built knowledge layer that sits between a mainframe codebase and the AI models that need to understand it. Just as the cerebral cortex is the brain's centre for higher-order thinking, Cortex provides the intelligence layer that transforms raw legacy code into structured, navigable knowledge. It pre-analyses the entire codebase, builds a graph of relationships, enriches it with business context, and exposes it through a set of MCP (Model Context Protocol) tools that any AI agent can call.
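The shape of that tool layer can be sketched without the protocol details. The sketch below replaces the actual MCP wiring with a plain dispatch table, and the knowledge record is a hardcoded stand-in; only the pattern (named tools backed by a pre-built knowledge store) is the point.

```python
from typing import Callable

# Registry of named tools an agent can call; a real MCP server would
# expose these over the protocol instead of a dict.
TOOLS: dict[str, Callable[..., dict]] = {}

def tool(name: str):
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("explain_component")
def explain_component(component_id: str) -> dict:
    # In Cortex this reads the pre-analysed knowledge graph; the
    # record here is a hardcoded stand-in for illustration.
    knowledge = {
        "SUBN0302": {
            "summary": "Contribution cap calculation",
            "domain": "Accumulation",
            "priority": "CRITICAL",
        }
    }
    return knowledge.get(component_id, {"summary": "unknown"})

print(TOOLS["explain_component"]("SUBN0302")["domain"])  # Accumulation
```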
Cortex provides three core capabilities:
Every component in the system is mapped to its business domain, capability, priority level, and business owner. Instead of seeing SUBN0302, an AI agent sees: "Contribution cap calculation ($30K CC / $120K NCC with bring-forward), Accumulation domain, CRITICAL priority, owned by Operations Manager."
A Neo4j-backed graph database maps every CALLS, USES_GLOBAL, USES_DDM, READ, WRITE, and CALLED_BY relationship across all 358 components. An agent can ask "what's the impact of changing SUBN0302?" and instantly get a risk-scored dependency tree spanning three call-depth levels.
Natural language queries like "death benefit processing and beneficiary payment" return ranked, relevant components—even though no component is named anything close to those words. The AI agent can explore the codebase by intent, not by filename.
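Why intent-based search works despite the cryptic names can be shown with a toy model. A real deployment would embed SME-written component descriptions with an embedding model; in the sketch below, a bag-of-words cosine over invented descriptions stands in. The descriptions and scores are illustrative only.

```python
from collections import Counter
from math import sqrt

# The searchable text is the business description, not the component
# name, which is why "contribution tax" can surface SUBN0301.
DESCRIPTIONS = {
    "SUBN0301": "contribution tax calculator concessional division 293 no-tfn penalty",
    "SUBN0302": "contribution cap monitor bring-forward breach event",
    "SUBP0405": "annual tbar reporting batch ato xml",
}

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def search(query: str) -> list[str]:
    q = Counter(query.lower().split())
    return sorted(
        DESCRIPTIONS,
        key=lambda c: cosine(q, Counter(DESCRIPTIONS[c].split())),
        reverse=True,
    )

print(search("contribution tax")[0])  # SUBN0301
```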
Cortex exposes its intelligence through nine MCP server tools. Each tool answers a specific class of question an AI agent might have while working with the codebase:
| Tool | Purpose | Example Query |
|---|---|---|
| get_platform_overview | System-wide orientation—all 10 domains, 61 capabilities, 358 components at a glance | "Give me a platform overview" |
| get_domain_summary | Drill into a domain's capabilities, entry points, and ownership | "Tell me about the Accumulation domain" |
| get_capability_summary | Component breakdown for a specific business capability | "What does Contribution Processing do?" |
| explain_component | Business context, SME annotations, and metadata for one component | "Explain SUBN0302" |
| get_call_tree | Upstream/downstream call hierarchy with depth control | "Show call tree for SUBP0301" |
| analyze_impact | Risk-scored dependency analysis with ADABAS operation tracking | "What's the impact of changing SUBN0301?" |
| trace_data_flow | ADABAS READ/STORE/UPDATE operations across the dependency chain | "Trace data flow for SUBN0301" |
| search_capability | Semantic search across all capabilities by natural language | "death benefit processing and beneficiary payment" |
| code_search | Pattern search across source: PERFORMs, comments, literals, fields | "*CONTRIB*" or "VALIDATE-*" |
To illustrate the power of Cortex, let's walk through an actual demo session where an AI agent (Claude, connected to the Cortex MCP server) explores an Australian superannuation fund administration system for the first time. The system is a purpose-built demo codebase: 10 domains, 61 capabilities, 358 components, built on Software AG Natural with ADABAS.
The agent's first move is to call get_platform_overview. In a single response, it receives a complete map of the system: all 10 domains, 61 capabilities, and 358 components.
Within seconds, the agent knows the Accumulation domain is the largest (14 capabilities, 99 components), that the system follows a clear member lifecycle, and that support functions like batch utilities and reporting underpin everything. Without Cortex, discovering this structure would require manually reading hundreds of source files.
The agent calls get_domain_summary("Accumulation") and immediately gets a prioritised capability table with entry points and business owners.
The agent now knows exactly where to focus. Contribution Processing is the largest capability with 21 components, it's CRITICAL priority, and it's owned by the Operations Manager. This is the kind of organisational intelligence that simply doesn't exist in the source code.
Combining get_capability_summary, explain_component, and get_call_tree, the agent maps the complete execution flow.
From a single capability query, the agent has reconstructed the complete processing pipeline: how contributions enter the system via SuperStream, get taxed (15% concessional, Division 293, or 47% no-TFN penalty), are monitored against annual caps ($30K concessional / $120K non-concessional with three-year bring-forward), allocated to investment holdings at the current unit buy price, and reported to the ATO via MATS.
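The tax step of that pipeline can be restated in a few lines of Python. This is an illustrative simplification of the rules the post names (15% concessional, 47% no-TFN penalty); the Division 293 surcharge and the many real edge cases are deliberately omitted.

```python
# Rates as described in this post; illustrative only.
CONCESSIONAL_RATE = 0.15
NO_TFN_PENALTY_RATE = 0.47

def contribution_tax(amount: float, concessional: bool, has_tfn: bool) -> float:
    """Simplified contribution tax selection (Division 293 omitted)."""
    if not has_tfn:
        return amount * NO_TFN_PENALTY_RATE
    if concessional:
        return amount * CONCESSIONAL_RATE
    return 0.0  # non-concessional money is contributed after tax

print(contribution_tax(10_000, concessional=True, has_tfn=True))  # 1500.0
```

On the mainframe, logic like this is not one tidy function: it is spread across SUBN0301 and its neighbours, which is exactly why the agent needs Cortex to reassemble it.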
More importantly, the agent now knows the business rules embedded in each component. The demo's deep dive into SUBN0302, for example, surfaced the cap thresholds and bring-forward logic it enforces.
Impact analysis is the most powerful Cortex capability for safe code changes. The agent calls analyze_impact on the contribution tax calculator and receives a risk-scored dependency tree.
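What such an analysis has to compute is, at its core, a depth-limited walk over the CALLED_BY graph with a risk score attached to each dependent. The sketch below is illustrative: the edges, priority weights, and decay formula are invented for the example and are not Cortex's actual risk model.

```python
from collections import deque

# Illustrative downstream edges mirroring the pipeline in the demo:
# tax calculator -> cap monitor -> TBAR batch.
CALLED_BY = {
    "SUBN0301": ["SUBN0302"],
    "SUBN0302": ["SUBP0405"],
    "SUBP0405": [],
}
PRIORITY_WEIGHT = {"SUBN0301": 3, "SUBN0302": 3, "SUBP0405": 2}

def analyze_impact(component: str, max_depth: int = 3) -> dict[str, float]:
    """Breadth-first walk of dependents, scoring closer ones higher."""
    impacted: dict[str, float] = {}
    queue = deque([(component, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for caller in CALLED_BY.get(node, []):
            if caller not in impacted:
                # weight decays with distance from the changed component
                impacted[caller] = PRIORITY_WEIGHT[caller] / (depth + 1)
                queue.append((caller, depth + 1))
    return impacted

print(analyze_impact("SUBN0301"))  # {'SUBN0302': 3.0, 'SUBP0405': 1.0}
```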
In the demo, the agent discovered that SUBN0302 (the cap monitor) generates TBAR-EVENT records when contribution caps are breached. It then followed that trail across domains.
This is the kind of cross-domain traceability that would take a human analyst hours or days to piece together by reading source code. The agent did it in seconds by chaining three Cortex tool calls. It discovered the complete regulatory reporting pipeline: from a contribution cap breach in the Accumulation domain, through event generation, into the annual TBAR batch in Pre-Retirement, and out to the ATO as structured XML.
When the agent needs to find components related to a business concept, semantic search cuts through the cryptic naming.
No human had to tag these components with the words "death benefit" and "beneficiary payment." Cortex's semantic layer understands the business meaning of each component and returns ranked results that let the agent immediately find the right starting points.
With Cortex connected, an AI coding agent goes from "I can see 358 files with cryptic names" to a developer experience that's arguably better than what most human developers have. Here's what changes:
Before modifying any component, the agent runs analyze_impact and gets a risk-rated dependency graph. It knows that changing SUBN0302's cap logic will affect the TBAR reporting pipeline and requires testing across both the Accumulation and Pre-Retirement domains. It knows which ADABAS tables are read-shared by other components and avoids breaking those contracts.
The knowledge base includes regulatory tagging—which capabilities are governed by ATO, APRA, ASIC, or AUSTRAC requirements. When the agent generates code that touches MATS reporting or TBAR events, it knows it's working in a regulated context and can flag compliance implications.
A new developer (human or AI) can go from zero to productive in minutes instead of weeks. The platform overview provides instant orientation. Domain summaries give functional context. Capability summaries reveal component roles. Call trees show execution flow. Data flow tracing shows exactly which tables are read and written. The entire system becomes navigable by business intent.
Mainframe systems are notorious for hidden cross-domain dependencies. Cortex makes them explicit. The demo showed a trail from contribution processing through cap monitoring, TBAR event generation, and all the way to ATO XML submission—spanning two domains and six components. An AI agent can follow these trails automatically, ensuring that changes in one domain don't silently break another.
Cortex isn't about replacing mainframe systems. It's about making them accessible to the next generation of development tools. As AI coding agents become standard practice, the organisations that benefit most will be those that can bridge the gap between their existing mainframe investments and the capabilities of modern AI.
The superannuation system in this demo is a purpose-built codebase—358 components across 10 domains—designed to showcase the framework's capabilities. But consider what happens when Cortex is applied to a production mainframe environment: systems with hundreds of thousands of components, decades of accumulated business logic, teams of developers who each understand only a fraction of the whole, and regulatory obligations where a single overlooked dependency can trigger compliance breaches worth millions. At that scale, the value of structured knowledge, automated impact analysis, and semantic search doesn't just grow linearly—it compounds. The problems that are manageable (if painful) in a 358-component demo become genuinely unsolvable without a knowledge layer in a 200,000-component production system.
Cortex ensures that when an AI agent touches a legacy system, it does so with full context: knowing what it's changing, what depends on it, what regulations apply, and what could break. That's not just a productivity improvement—it's a safety net for legacy modernisation at enterprise scale.