How to Choose an Enterprise AI Platform: 8 Questions Every Compliance and IT Leader Must Ask

Every frontier AI model produces an impressive demo. What determines deployability in a regulated organisation is everything around the model: content policy enforcement at the message layer (not via instructions), per-team policy configuration, EU AI Act registry and classification, BYOK model flexibility, specific security threat detection, CSRD carbon tracking, structured persistent memory, and a full per-message audit log with adequate retention. Only 18% of enterprises have formal AI risk management processes, yet 63% process sensitive data through AI without governance controls. Ask all eight questions before you sign anything.
The conversation about enterprise AI platforms has been dominated by model benchmarks, UI design, and integration lists. For organisations operating in regulated industries, these criteria are secondary. A platform that scores perfectly on response quality and fails on governance architecture is not deployable — it is a compliance liability in a polished interface.
Most enterprise AI buying processes discover this after deployment. The demo impresses, the procurement team approves, the platform launches to 250 users — and three months later the legal team discovers there is no mechanism to block GDPR-violating prompts, no per-team policy differentiation, and no audit trail that meets their retention obligations.
These eight questions are designed to surface those gaps before you sign. They are specifically calibrated to expose the difference between a platform that claims governance capability and one that has built it at the architecture level.
"Enterprise AI platform evaluation has matured significantly. The first wave of buyers evaluated on model quality and feature breadth. The current wave — particularly in regulated industries — evaluates on governance infrastructure, compliance architecture, and auditability. Vendors that built the governance layer as an afterthought are losing enterprise deals to those that built it into the foundation."
— Gartner Research, "Market Guide for AI Trust, Risk and Security Management," 2024. The guide identified governance-first platform architecture as the primary differentiator in regulated enterprise AI procurement from 2025 onward.
Why Standard Evaluation Criteria Fail for Regulated Enterprises
Standard software evaluation criteria — feature checklists, integration lists, UI/UX quality, vendor financial stability, support SLAs — are necessary but insufficient for enterprise AI in regulated industries. They do not reveal the architectural decisions that determine compliance fitness.
Consider two platforms with identical feature checklists: both claim "content policy management." One enforces policies at the message layer, synchronously, before tokens reach any model. The other stores policy rules in a document library that employees are expected to consult before submitting prompts. Both check the same box. One prevents violations. The other records them.
The eight questions below are designed to expose these architectural differences — to force vendors to describe exactly how each capability works, not just confirm that it exists. For the foundational distinction between documentation compliance and real-time enforcement, see our analysis of AI governance vs AI compliance.
Question 1: Where Does Content Policy Enforcement Happen?
The correct answer: at the message layer, before the query reaches any language model, in real time, with the enforcement decision logged. Red-flag answers:
- "Our AI is instructed to follow content policies" — model instructions are not enforcement; they can be circumvented with well-crafted prompts
- "The model was safety-trained to avoid non-compliant outputs" — safety training reduces but does not eliminate policy violations; it fails against determined adversarial prompts
- "Our compliance team reviews conversations" — post-hoc review documents violations after they occur; it does not prevent them
- "Users agree to acceptable use policies at sign-up" — a terms-of-service agreement does not intercept a GDPR-violating prompt at the inference layer
Ask: "Show me what happens when I submit a message containing a credit card number. Where exactly in the pipeline does it get intercepted, and can you show me the enforcement log entry it generates?"
Question 2: Can Content Policies Be Applied Per Team?
The correct answer: yes — policies are configurable at the group or team level, not only platform-wide, with different policies for different regulatory contexts.
Platform-wide uniform policies are governance theatre. FINRA's retail communications standard applies to your trading desk, not your engineering team. HIPAA PHI restrictions apply to your HR team processing health data, not to your marketing team drafting copy. A governance system that cannot distinguish between teams forces a choice between policies too restrictive for productivity and policies too permissive for compliance — both outcomes represent failure.
Ask: "How do I configure different content policies for my legal team versus my sales team? Walk me through the configuration, not the concept."
Question 3: What Is Your EU AI Act Compliance Architecture?
Any platform marketing to EU enterprises in 2026 should have a complete answer. The architecture should include:
- An AI system registry built into the platform — not a document template you fill in elsewhere
- A risk classification workflow — a guided process that produces a defensible tier assignment, not a form
- LLM discovery capability — automated identification of AI systems in use, including shadow AI
- Conformity documentation generation for High-Risk systems — exportable in formats suitable for regulator submission
- GPAI coverage for foundation model usage obligations under Articles 51–56
- A regulator-ready export function — CSV or JSON, not a PDF you have to manually format
A platform that offers "EU AI Act policy templates" without a registry and classification workflow is treating the compliance obligation as a content problem. It is addressing what to block without addressing what to document and classify. For a guide to what the classification workflow must produce, see our EU AI Act risk classification guide.
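For illustration, a registry entry and its regulator-ready export might look like the following sketch. The field names and tier labels are assumptions for this example, not a prescribed EU AI Act schema.

```python
from dataclasses import dataclass, asdict
import json

# Hypothetical registry record; field names are illustrative.
@dataclass
class AISystemRecord:
    system_id: str
    name: str
    provider: str
    risk_tier: str            # e.g. "high-risk", "limited-risk", "minimal-risk"
    classification_rationale: str
    gpai_based: bool          # triggers GPAI usage obligations if True

registry = [
    AISystemRecord("sys-001", "CV screening assistant", "internal",
                   "high-risk", "Annex III employment use case",
                   gpai_based=True),
]

# Regulator-ready export: structured JSON, not a manually formatted PDF.
print(json.dumps([asdict(r) for r in registry], indent=2))
```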
Question 4: Which LLMs Can I Use, and Who Controls That Choice?
The correct answer: any LLM — the platform is model-agnostic, with BYOK, and your organisation controls which models each team can access.
Your AI strategy should not be determined by a vendor's commercial relationships with LLM providers. The platform layer — governance, knowledge retrieval, memory, security, cost controls — should be completely decoupled from the inference layer. New models emerge every quarter. Your governance infrastructure and organisational knowledge should not require rebuilding when you switch or add models.
Ask: "If I want to add a new model that was released last week, what does that process look like? Can I bring my own Anthropic keys and use Claude without routing through your infrastructure?" For a full analysis of multi-model architecture considerations, see our multi-model and BYOK enterprise AI guide.
Question 5: What Security Threats Does the Platform Detect?
Do not accept a qualitative answer. Ask for specifics: which threat categories, how many patterns, what detection mechanism, and at what latency.
A production enterprise AI security layer should detect at minimum: prompt injection (direct and indirect), jailbreak attempts, system prompt extraction, data exfiltration patterns, and role hijacking. Detection should be deterministic rather than ML-based — deterministic rules are auditable, reproducible, and reliable; ML-based detection introduces false negative rates that are difficult to quantify for compliance purposes.
Ask: "How many distinct threat patterns do you detect? Can I see the threat category list? What is your false negative rate on standard OWASP LLM Top 10 attacks?" For background on what these threats are and how they work, see our guide to prompt injection and enterprise AI security threats.
Question 6: How Do You Track and Attribute AI Carbon Emissions?
For any EU company with 250 or more employees, CSRD ESRS E1 requires Scope 3 Category 11 disclosure of AI tool emissions. The answer should include per-token carbon tracking per model, per team, with automated ESRS E1 export. If the vendor cannot provide this, your sustainability team will spend €5,000–€20,000 estimating AI emissions manually each reporting cycle.
Ask: "Show me what the ESRS E1 export looks like. What is the methodology for the CO₂e calculation, and how does the audit trail support third-party assurance?" For the full guide to what this reporting requires, see our article on CSRD AI emissions reporting.
Question 7: How Does the Platform Handle Organisational Memory?
Conversation history is not memory. Ask specifically about persistent structured memory:
- Is memory session-scoped or persistent across sessions and users?
- Is memory structured — classified by type with a maturity lifecycle — or unstructured text?
- Does memory survive when a user account is deactivated?
- Can memory be scoped at the team or organisation level, not just user level?
- Is there an admin interface for reviewing, correcting, and removing memories?
An AI platform without persistent cross-user memory accumulates no institutional knowledge over time — it is permanently stateless. The context tax it imposes costs approximately 50 hours per employee per year in re-establishment overhead. For the full analysis of the context tax and what persistent memory delivers, see our article on the AI context tax.
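As a sketch of the difference between structured memory and raw conversation history, consider a typed, lifecycle-staged record scoped beyond a single user. The field values are illustrative, not a specific product's schema.

```python
from dataclasses import dataclass

# Hypothetical structured memory record mirroring the checklist above:
# typed, lifecycle-staged, scoped beyond one user, and owner-independent.
@dataclass
class MemoryRecord:
    kind: str        # e.g. "process", "preference", "decision"
    scope: str       # "user" | "team" | "organisation"
    maturity: str    # e.g. "candidate" -> "confirmed" -> "canonical"
    content: str
    created_by: str  # record survives if this account is later deactivated

m = MemoryRecord(kind="process", scope="team", maturity="confirmed",
                 content="Quarterly filings are reviewed by legal before release",
                 created_by="u-17")
```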
Question 8: What Does the Audit Log Capture, and How Long Is It Retained?
The audit log is a compliance artefact — not a debugging tool. For regulated industries it needs to capture specific fields and retain them for legally prescribed periods.
A compliant audit log must record:
- Every message submitted, with user ID, session ID, timestamp, and model used
- Every content policy evaluation, with policy name, matched pattern, and action taken
- Every PII detection event, with detection type but never the detected data itself
- Every security threat detected, with category and severity
- Token consumption per message, for cost attribution and CSRD carbon calculation
Retention must match regulatory obligations. Financial services firms face 5–7 year retention requirements under SEC Rule 17a-4 and MiFID II Article 16. Healthcare organisations face longer obligations in some jurisdictions. A platform with a 90-day rolling audit log is not deployable in these contexts regardless of its other capabilities.
Ask: "What is your maximum audit log retention period? Is the log exportable in a format suitable for regulatory submission? Can I search the audit log by policy name, user, or date range?" For a complete guide to what audit logs must contain as compliance evidence, see our article on AI audit logs as compliance evidence.
The quality of AI responses you see in a demo is the easy part. Every frontier model produces impressive responses to well-chosen prompts. What determines whether a platform is deployable in your organisation is everything surrounding the model: governance architecture, security infrastructure, compliance tooling, memory capability, cost controls, and audit functionality. The demo reveals model quality. The eight questions reveal platform quality. Evaluate the platform.
How Govarix Answers These Eight Questions
| Question | Govarix Answer | Evidence Available |
|---|---|---|
| Content policy enforcement location | Message layer, pre-inference, synchronous, logged per message | Live demo, audit log export |
| Per-team policy support | Yes — 40+ pre-built templates, per-group assignment, real-time | Admin console walkthrough |
| EU AI Act compliance architecture | Built-in registry, 15-question wizard, 5 risk tiers, LLM discovery, conformity export | Registry demo, export sample |
| LLM flexibility and BYOK | Any LLM — fully model-agnostic, BYOK, per-team model access controls | Multi-provider configuration demo |
| Security threat detection | 19 patterns, 6 categories, deterministic, sub-5ms latency, logged | Threat detection test, audit log |
| CSRD carbon tracking | Per token, per model, per team — one-click ESRS E1 export with methodology note | Carbon dashboard, export sample |
| Organisational memory | Engram — 9 memory kinds, 4 maturity stages, cross-product scope, permanent | Memory store admin view |
| Audit log | Every message, every policy match, every security event, every token — configurable retention | Audit log export, retention config |
Frequently Asked Questions
What should I evaluate when choosing an enterprise AI platform?
Prioritise governance architecture over model quality. The key questions concern: where content policy enforcement happens (message layer vs. post-hoc), per-team policy configuration, EU AI Act registry and classification tooling, LLM flexibility with BYOK, specific security threat detection, CSRD carbon tracking, structured persistent memory, and audit log specification.
Why is model quality not the most important evaluation criterion?
Every frontier model produces an impressive demo. What determines deployability in a regulated organisation is everything surrounding the model — governance controls, security screening, compliance tooling, persistent memory, cost controls, and audit capability. Evaluate the platform infrastructure, not the demo responses.
What is the red flag answer to "where does content policy enforcement happen"?
Any answer that is not "at the message layer, before the query reaches the model, in real time." Red flags: "our AI is instructed to follow policies," "the model was safety-trained," "our compliance team reviews conversations," or "users agree to usage policies at sign-up." None of these prevent a non-compliant prompt from being processed.
Does the platform need to support the EU AI Act registry?
Yes, for any organisation subject to the EU AI Act from August 2026. A built-in registry with risk classification workflow, LLM discovery, and regulator-ready export is a compliance requirement. Platforms without these require a separate GRC system, adding complexity and creating the documentation-without-enforcement gap.
What is the minimum acceptable audit log for a regulated enterprise?
Every message with user ID, timestamp, and model; every content policy match with policy name and action; every PII detection event; every security threat with category and severity; and token consumption per message. Retention must match regulatory obligations — 5–7 years for financial services under FINRA and MiFID II.
Why does BYOK matter in enterprise AI platform selection?
BYOK means token consumption is billed directly to your organisation at published provider rates — no markup, full cost visibility, and provider-level spending caps available. It also means your data is not routed through the vendor's API keys, simplifying data processing agreements and reducing data exposure risk.
