The AI-Native Systems Maturity Model

Most AI strategies fail before they start. Not because the technology doesn't work. Because the organization isn't ready for it.

McKinsey's 2024 AI report found that while 72% of companies have adopted AI in at least one business function, fewer than 10% describe their AI implementations as having reached full scale. The rest are stuck — running pilots that never graduate, funding teams that produce dashboards nobody uses, and writing AI strategies that sound transformative in board decks but produce nothing in the market.

The diagnosis is almost always the same: organizations are trying to run before they can walk. They're attempting Level 4 work with Level 1 infrastructure. The result is expensive failure dressed up as innovation theater.

What's missing is a clear, honest map of where organizations actually sit — and what it genuinely takes to move up.


The Five Levels

Most digital maturity models are too generic to be useful. They measure intent, not capability. They reward organizations for having a strategy document rather than for having built anything that works.

This model is different. It measures what an organization has actually built, deployed, and scaled. Not what it plans to do. Not what it has piloted. What is running in production, generating value, and informing real decisions today.

The five levels — AI-Curious, AI-Augmented, AI-Integrated, AI-Native, and AI-First — describe meaningfully distinct organizational states. Each level has different infrastructure requirements, different talent needs, different regulatory exposures, and different competitive implications. You can't skip levels. Organizations that try to jump from Level 1 to Level 3 don't end up at Level 3. They end up at Level 1 with a larger consulting bill.

The key insight that separates this model from generic frameworks: the defining variable at each level is not technology adoption — it's organizational design. The question is not "do you use AI?" The question is "how much of your organization's core operating logic has been rebuilt around AI capabilities?"


Level 1: AI-Curious

Characteristics

This is where the majority of large enterprises and traditional financial institutions currently sit. The organization has an AI strategy deck. Its leaders have attended the conferences. Someone in senior leadership has read the Andreessen Horowitz AI canon. There is genuine excitement — and genuine uncertainty about what to do next.

AI projects are typically owned by IT or a central innovation team, not by the business units that would actually use the outputs. Data is siloed across legacy systems. The organization has not made a serious investment in the data infrastructure that AI requires to function.

Typical Decisions

Decisions about AI at this level are largely driven by vendor pitches and peer benchmarking. "What is JPMorgan doing?" is a more common input to strategy than "what does our customer data tell us?" POCs are funded generously and evaluated loosely. Nobody has clearly defined what success looks like before starting.

Risks

The primary failure mode at Level 1 is hype-driven investment with no path to production. Organizations fund five AI pilots, get five interesting demos, and produce zero deployed systems. The data infrastructure needed to run AI at scale — clean, governed, real-time accessible data — doesn't exist, and nobody wants to fund the unglamorous work of building it. AI talent hired to "lead AI transformation" spends most of their time fighting with legacy IT systems instead of building models.

What Triggers the Move Up

Usually: one embarrassing failure, or one competitor deploying something that works. The failure has to be public enough — or expensive enough — that it forces a genuine reckoning with organizational readiness. Sometimes it's a regulatory push, like the OCC's increasing scrutiny of AI model risk, that forces the discipline Level 1 organizations resist.


Level 2: AI-Augmented

Characteristics

The Level 2 organization has at least one AI model in production. Not a pilot. Not a proof-of-concept. A model that runs daily, that real employees interact with, and that produces outputs that feed into actual decisions. AI is a tool that helps humans work faster or better. It has not yet changed how work is fundamentally structured.

This is where most mid-size banks, regional insurers, and established financial services firms currently sit. They have fraud detection models. They have credit scoring augmentation tools. They have document processing pipelines that use NLP to extract data from loan applications or insurance claims.

Typical Decisions

At this level, AI projects have clear business sponsors. ROI is measured, even if imprecisely. There is a real machine learning team — not just data scientists embedded in IT. The organization is starting to understand what data quality actually means in practice, because bad data is producing bad model outputs, and those bad outputs are causing real business problems.

Risks

The classic Level 2 failure is what practitioners call "pilot purgatory" — the organization successfully moves a model from POC to pilot, proves value, and then cannot get it to production scale because of data governance gaps, integration complexity with core systems, or organizational resistance from the teams whose work the model is meant to augment. The model works. The organization can't absorb it.

A second risk: Level 2 organizations tend to underinvest in model monitoring. A fraud model trained on 2022 data will degrade as fraud patterns evolve. Without systematic monitoring, model decay is invisible until it produces a visible, expensive failure.
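This kind of decay can be caught with lightweight statistical monitoring. The sketch below computes the Population Stability Index (PSI), a drift metric commonly used in credit and fraud modeling, to compare a model's score distribution at training time against production. The threshold values, variable names, and alerting logic are illustrative assumptions, not a standard API.

```python
# Minimal sketch of model drift monitoring using the Population
# Stability Index (PSI). All names and thresholds are illustrative.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI between a training-time (expected) and a production (actual)
    distribution of one feature or model score."""
    # Bin edges come from the training distribution.
    edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover out-of-range values
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor tiny proportions to avoid log(0).
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 drifting, > 0.25 degraded.
rng = np.random.default_rng(0)
train_scores = rng.normal(0.30, 0.10, 10_000)  # scores at training time
prod_scores = rng.normal(0.45, 0.12, 10_000)   # fraud patterns have shifted
if psi(train_scores, prod_scores) > 0.25:
    print("ALERT: score distribution has drifted; schedule retraining")
```

Run daily against each production model's score stream, a check like this makes decay visible long before it produces the "visible, expensive failure" described above.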

What Triggers the Move Up

The transition to Level 3 happens when a workflow gets transformed, not just augmented. Not "the analyst can do their job 20% faster." But "the job itself has been fundamentally redesigned around what the AI can and can't do." That's a different kind of change — and it requires organizational willingness that most Level 2 organizations have not yet developed.


Level 3: AI-Integrated

Characteristics

At Level 3, AI is embedded in core workflows. Real-time inference at scale. Models running in production that directly shape customer-facing decisions, not just internal efficiency. The organization has moved from "AI helps employees" to "AI is part of the product."

JPMorgan's COiN platform — which analyzes commercial loan agreements and extracts data that used to require 360,000 hours of lawyer time annually — is a Level 3 deployment. Capital One's Eno assistant, which handles millions of customer interactions and proactively surfaces fraud alerts, is Level 3. These are not experiments. They are production systems that the business depends on.

Typical Decisions

At Level 3, AI model governance becomes a serious organizational function. For financial institutions, SR 11-7 — the Federal Reserve and OCC's model risk management guidance — is no longer a compliance checkbox. It is a real operational constraint that shapes how models are developed, validated, and monitored. Model inventories exist and are maintained. Validation functions are independent from development functions. Model performance is tracked systematically.
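A workable model inventory need not be elaborate. The sketch below shows one plausible record structure in the spirit of SR 11-7's inventory and validation expectations; the field names, risk tiers, and the one-year revalidation rule are assumptions for illustration, not a regulatory schema.

```python
# Illustrative model inventory record. Fields and the 365-day
# revalidation rule are assumptions, not prescribed by SR 11-7.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class ModelRecord:
    model_id: str
    owner: str                  # accountable business unit, not IT
    purpose: str
    risk_tier: str              # e.g. "high" for credit decisioning
    last_validated: date        # by the independent validation function
    monitoring_metrics: list[str] = field(default_factory=list)

    def validation_overdue(self, as_of: date, max_age_days: int = 365) -> bool:
        """Flag models whose independent validation is stale."""
        return (as_of - self.last_validated).days > max_age_days

inventory = [
    ModelRecord("fraud-v3", "Payments Risk", "Transaction fraud scoring",
                "high", date(2024, 2, 1), ["drift", "precision@alert"]),
]
overdue = [m.model_id for m in inventory
           if m.validation_overdue(as_of=date(2025, 6, 1))]
```

The design point is that the inventory is queryable: "which high-risk models are overdue for validation?" becomes a one-line check rather than a quarterly email chase.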

Risks

The primary risk at Level 3 is model concentration — the organization has made core business processes dependent on models that can fail in correlated ways. A single adverse shift in the underlying data distribution can degrade multiple models simultaneously. The 2023 banking stress that followed Silicon Valley Bank's collapse exposed how institutions relying on similar model-dependent risk frameworks can produce synchronized failures.

Organizational resistance also intensifies at Level 3. When AI starts reshaping core workflows rather than assisting them, it directly threatens existing roles, incentives, and power structures. The middle management layer whose judgment AI is now partially replacing does not welcome that change.

What Triggers the Move Up

The jump to Level 4 happens when the organization starts designing new products and processes around AI capabilities — not bolting AI onto existing processes, but building things that would not exist without AI. That requires a fundamentally different product mindset. It is a strategic choice, not just a technical one.


Level 4: AI-Native

Characteristics

At Level 4, AI is the product. Not a feature of the product. Not an enhancement to the product. The AI model is the core value delivery mechanism, and the business model is built around it.

Upstart is the canonical example. Upstart's lending model — which uses over 1,600 variables to assess creditworthiness versus the traditional FICO-based approach — is not a feature of their lending business. It is their lending business. The model's ability to approve borrowers that traditional underwriting would reject, at lower default rates, is the entire competitive proposition. Without the model, there is no Upstart.

Kensho, acquired by S&P Global for $550 million in 2018, built analytics products that only exist because of machine learning capabilities. Palantir's AIP platform wraps large language models around enterprise data in ways that create entirely new analytical workflows — workflows that did not exist before the AI capability existed.

Typical Decisions

Level 4 organizations treat model development as core product development. AI talent is not a support function. It is a primary value-creation function, compensated and structured accordingly. Regulatory engagement is proactive — because the regulatory risk of a Level 4 business is existential, not incidental.

Risks

Model concentration risk at Level 4 is severe. If the model is the product, model failure is business failure. Upstart learned this painfully in 2022 when rising interest rates and economic uncertainty caused their credit models to significantly underperform, leading to a 90% stock price decline from its peak.

Regulatory scrutiny at Level 4 is qualitatively different from Levels 1-3. The CFPB's increasing focus on algorithmic lending decisions, the EU AI Act's high-risk classification for credit scoring systems, and the OCC's evolving guidance on model risk all create compliance exposures that require dedicated legal and regulatory strategy.

What Triggers the Move Up

The move to Level 5 — genuine AI-First — happens when the organization starts shaping the AI ecosystem rather than consuming it. They are training foundation models, not fine-tuning them. They are setting industry standards, not following them. They are the organization other Level 4 companies learn from. Almost no financial services company is here.


Level 5: AI-First

Characteristics

At Level 5, AI is the organizational design principle. Not a tool, not a product, not an embedded capability. The fundamental logic of how the organization is structured — how it hires, how it makes decisions, how it designs products, how it allocates resources — is built around AI capabilities.

Very few organizations are genuinely at Level 5. Anthropic, OpenAI, and DeepMind qualify as AI labs — but that is almost tautological. The more interesting question is whether any enterprise company has reached Level 5. The honest answer is: almost certainly not yet.

Google DeepMind is arguably approaching Level 5 within its own research function. But Google the corporation remains a Level 3-4 organization — AI-integrated and increasingly AI-native in search and advertising, but not organized around AI as a design principle across the whole enterprise.

Typical Decisions

At Level 5, the organization's hiring strategy is dictated by AI capability requirements. Its organizational structure is designed to maximize AI-human collaboration. Its product roadmap is driven by what new AI capabilities make newly possible — not by customer requests or competitive copying.

Risks

Level 5 organizations face regulatory exposure that does not yet have established frameworks. They are operating ahead of the law, which creates both competitive advantage and legal risk. The EU AI Act, the Trump administration's Executive Order 14179 on AI (January 2025), and emerging Basel Committee guidance on AI in banking are all aimed partly at organizations operating at this frontier.

The required talent pool is genuinely small. The number of people in the world who can design Level 5 AI systems is small, and retaining them against competition from other frontier labs and well-capitalized startups is an ongoing operational challenge.

Important note: For most enterprises — including most financial institutions reading this — Level 4 is the realistic and appropriate target. The goal is not to be OpenAI. The goal is to build AI-native products that generate durable competitive advantage. Chasing Level 5 before reaching Level 4 is the fastest path to spending a lot of money and building nothing.


The Self-Assessment Checklist

Use these diagnostic questions to identify your organization's current level. Answer honestly. The point is not to score well — it is to know where you actually are.

You're at Level 1 if:

  • [ ] AI projects are owned by IT or a central innovation team, not business units
  • [ ] You have run AI pilots in the past 24 months that never reached production
  • [ ] You do not have a formal data governance function
  • [ ] Your organization's core data lives in systems that require batch exports to be useful
  • [ ] "AI strategy" primarily means "we are watching this space closely"

You're at Level 2 if:

  • [ ] You have at least one AI model deployed in production, used daily by employees or in automated workflows
  • [ ] You have a dedicated ML engineering team (even if small)
  • [ ] You have experienced data quality failures that degraded model performance in production
  • [ ] AI projects have clear business sponsors with defined success metrics
  • [ ] You are tracking model performance over time, even informally

You're at Level 3 if:

  • [ ] AI is embedded in at least one customer-facing product or workflow
  • [ ] You maintain a formal model inventory with documented risk classifications
  • [ ] You have an independent model validation function
  • [ ] Real-time inference is running at meaningful scale (thousands of inferences per day minimum)
  • [ ] Organizational resistance to AI-driven workflow change has become a recognized internal challenge

You're at Level 4 if:

  • [ ] AI is the primary value delivery mechanism in at least one core product
  • [ ] Your business model includes revenue streams that would not exist without AI
  • [ ] Model risk is treated as a board-level concern, not just a compliance issue
  • [ ] You have a dedicated AI regulatory strategy function
  • [ ] Competitors cite your AI capabilities specifically when discussing your competitive position

You're at Level 5 if:

  • [ ] Your organizational structure — not just your products — is designed around AI capabilities
  • [ ] You are training or fine-tuning foundation models, not just deploying them
  • [ ] Other organizations at Level 4 use your infrastructure, standards, or research outputs
  • [ ] Your hiring strategy is primarily determined by AI capability requirements
  • [ ] You are actively shaping regulatory frameworks, not just responding to them

The Uncomfortable Truth About Level 3

Here is what most organizations do not realize until it is too late: the hardest transition in the entire maturity model is not Level 4 to Level 5. It is Level 2 to Level 3.

The jump from Level 2 to Level 3 is where the majority of AI transformation initiatives die. Not in the pilot stage. Not in the proof-of-concept stage. After the pilots have succeeded, after the business case has been proven, after the budget has been approved — and the organization still cannot get the model embedded into the core workflow.

Three forces cause this:

First, data infrastructure. Running a pilot is compatible with imperfect data. You can hand-curate a dataset, clean it manually, and produce impressive results. Scaling that model to production — with real-time data from multiple systems, with full audit trails, with governance that meets SR 11-7 requirements — exposes every shortcut that was taken to make the pilot work. The organization discovers that its data is not ready for production AI. Fixing that takes 12 to 24 months of unglamorous engineering work that nobody wants to fund.

Second, model governance. Level 3 requires a genuine model risk management function. Independent validation. Change management processes for model updates. Documentation that satisfies internal audit and external regulators. Most organizations at Level 2 have none of this in place. Building it is not technically hard. It is organizationally hard — it requires creating a new function, staffing it with qualified people, and giving it genuine authority to slow down or stop model deployments. Business units resist this because it slows them down. Risk functions resist this because it creates accountability.

Third, organizational resistance. When AI moves from "helping employees" to "changing how work is structured," it directly threatens existing roles, reporting lines, and power bases. Middle management — the layer most affected by AI-integrated workflows — is also the layer with the most institutional influence. Their resistance is rational from their perspective. Navigating it requires executive-level commitment that most organizations articulate in strategy documents but do not actually demonstrate when it becomes costly.

The organizations that successfully make this transition share one characteristic: an executive sponsor who is willing to absorb the political cost of forcing organizational change. The technology is the easy part. The org chart is the hard part.


Key Takeaways

  • Most enterprises are at Level 1 or Level 2. The percentage claiming Level 3+ readiness vastly exceeds the percentage that has actually built Level 3+ systems. Know which you actually are.
  • The five-level model measures what you have built and deployed — not what you intend to build. Strategies do not count. Production systems count.
  • The Level 2 to Level 3 transition is the critical bottleneck. Data infrastructure and model governance are the specific blockers. Budget for them explicitly or accept that you will stall here.
  • Level 4 — AI-Native — is the correct target for most financial services organizations. It delivers durable competitive advantage without the regulatory frontier risk of Level 5.
  • Model risk at Level 4 is existential, not incidental. Upstart's 2022 experience is the canonical case study in what model concentration risk looks like when rates move.
  • Organizational design is the variable that determines maturity level — not technology procurement. You cannot buy your way to Level 3. You have to build your way there.

Related Reading