
PRD-STD-003: Testing Requirements

Standard ID: PRD-STD-003 | Version: 1.0.1 | Status: Active | Compliance Level: Level 2 (Managed) | Effective Date: 2025-01-15 | Last Reviewed: 2026-01-15

1. Purpose

This standard defines testing requirements specific to AI-generated code. AI coding assistants produce code that is often syntactically valid and superficially correct but may contain subtle behavioral defects, edge case omissions, and logic errors that are difficult to detect through code review alone. With industry analyses reporting that AI co-authored code carries roughly 1.7x more issues than traditionally written code, rigorous testing is essential to catch defects before they reach production.

This standard establishes minimum testing thresholds, required testing types, and validation practices that ensure AI-generated code is behaviorally correct, performant, and resilient.

2. Scope

This standard applies to:

  • All production code generated, modified, or substantially influenced by AI coding assistants
  • All test code, whether written manually or generated by AI tools
  • All testing activities across the software development lifecycle (unit, integration, system, regression)

This standard complements PRD-STD-002: Code Review. Testing and review are both required; neither substitutes for the other.

3. Definitions

| Term | Definition |
| --- | --- |
| Unit Test | A test that validates a single function, method, or class in isolation from external dependencies |
| Integration Test | A test that validates the interaction between two or more components or services |
| Behavioral Validation | Testing that verifies code behavior against documented requirements or acceptance criteria, rather than implementation details |
| Regression Test | A test that verifies existing functionality has not been broken by new changes |
| Mutation Testing | A testing technique that introduces small changes (mutations) to code and verifies that tests detect those changes |
| Code Coverage | The percentage of code lines, branches, or paths exercised by the test suite |
| Branch Coverage | The percentage of conditional branches (if/else, switch) exercised by the test suite |

4. Requirements

4.1 Unit Testing

MANDATORY

REQ-003-01: All AI-generated code MUST have unit test coverage of at least 80% as measured by line coverage.

REQ-003-02: All AI-generated code MUST have branch coverage of at least 70%.

REQ-003-03: Unit tests for AI-generated code MUST include explicit test cases for:

  • Normal/expected input values
  • Boundary values (minimum, maximum, empty, null)
  • Error conditions and exception paths
  • Type edge cases (empty strings, zero values, negative numbers, Unicode)
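The four required case families can be covered in a single parameterized test module. The sketch below uses Python's standard `unittest` module against a hypothetical `normalize_username` function (the function and its rules are illustrative, not part of this standard):

```python
import unittest

def normalize_username(raw):
    """Hypothetical function under test: trims and lowercases a username."""
    if raw is None:
        raise ValueError("username must not be None")
    cleaned = raw.strip().lower()
    if not cleaned:
        raise ValueError("username must not be empty")
    return cleaned

class TestNormalizeUsername(unittest.TestCase):
    def test_valid_inputs(self):
        cases = [
            ("Alice", "alice"),      # normal/expected value
            ("  Bob  ", "bob"),      # boundary: surrounding whitespace
            ("Ünïcode", "ünïcode"),  # type edge case: Unicode
        ]
        for raw, expected in cases:
            with self.subTest(raw=raw):
                self.assertEqual(normalize_username(raw), expected)

    def test_error_paths(self):
        # Error conditions: null, empty, and whitespace-only inputs
        for raw in (None, "", "   "):
            with self.subTest(raw=raw):
                with self.assertRaises(ValueError):
                    normalize_username(raw)

if __name__ == "__main__":
    unittest.main()
```

Each `subTest` reports independently, so one failing edge case does not mask the others.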

REQ-003-04: Unit tests MUST NOT be generated by the same AI session that generated the code under test, unless the tests are independently reviewed and validated by a qualified engineer.

REQ-003-05: All unit tests MUST pass before code is eligible for merge. Zero tolerance for test failures.

RECOMMENDED

REQ-003-06: Organizations SHOULD target 90% line coverage for AI-generated code in critical paths (payment processing, authentication, data persistence).

REQ-003-07: Unit tests SHOULD test behavior and outcomes rather than implementation details to prevent brittle tests that break on refactoring.

REQ-003-08: Property-based testing SHOULD be used where applicable to validate invariants across a wide range of inputs.
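Dedicated frameworks (e.g., Hypothesis for Python) generate and shrink inputs automatically; the stdlib-only sketch below illustrates the underlying idea, checking a round-trip invariant (`decode(encode(x)) == x`) for a hypothetical length-prefixed encoding over many randomly generated inputs:

```python
import random
import string

def encode(items):
    """Hypothetical function under test: length-prefix each item."""
    return "".join(f"{len(s)}:{s}" for s in items)

def decode(blob):
    """Inverse of encode: parse length-prefixed items back out."""
    items, i = [], 0
    while i < len(blob):
        j = blob.index(":", i)          # end of the length prefix
        n = int(blob[i:j])
        items.append(blob[j + 1 : j + 1 + n])
        i = j + 1 + n
    return items

# Property: decoding an encoding recovers the original input exactly.
rng = random.Random(42)  # fixed seed keeps the test deterministic
for _ in range(500):
    items = [
        "".join(rng.choices(string.ascii_letters + string.digits,
                            k=rng.randint(0, 20)))
        for _ in range(rng.randint(0, 10))
    ]
    assert decode(encode(items)) == items
```

A single invariant checked across hundreds of generated inputs tends to expose the boundary conditions (empty items, empty lists) that example-based tests omit.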

4.2 Integration Testing

MANDATORY

REQ-003-09: AI-generated code that interacts with external systems (databases, APIs, message queues, file systems) MUST have integration tests that validate the interaction contracts.

REQ-003-10: Integration tests MUST validate both successful interactions and failure scenarios (timeouts, connection errors, malformed responses).

REQ-003-11: Integration tests MUST be executable in CI/CD pipelines without requiring access to production systems.
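Using test doubles for the external system satisfies both the failure-scenario requirement and CI executability. The sketch below uses `unittest.mock` to drive a hypothetical `fetch_profile` wrapper through its success and timeout paths (the function, client, and fallback behavior are illustrative assumptions):

```python
import socket
import unittest.mock as mock

def fetch_profile(client, user_id):
    """Hypothetical AI-generated code: wraps an external API call and
    maps transport failures to a safe fallback instead of propagating."""
    try:
        return client.get(f"/users/{user_id}")
    except socket.timeout:
        return {"user_id": user_id, "status": "unavailable"}

# Success path: the interaction contract returns the expected fields.
client = mock.Mock()
client.get.return_value = {"user_id": 7, "name": "Ada"}
assert fetch_profile(client, 7)["name"] == "Ada"

# Failure path: a timeout must degrade gracefully, not crash the caller.
client = mock.Mock()
client.get.side_effect = socket.timeout()
assert fetch_profile(client, 7)["status"] == "unavailable"
```

Because the client is injected, the same test runs identically in CI with no access to the real service.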

RECOMMENDED

REQ-003-12: Integration tests SHOULD use contract testing (e.g., Pact, Spring Cloud Contract) for service-to-service interactions where AI-generated code defines API contracts.

REQ-003-13: Integration tests SHOULD validate data serialization/deserialization behavior for AI-generated data transfer objects.

4.3 Behavioral Validation

MANDATORY

REQ-003-14: AI-generated code MUST be validated against the original requirements or acceptance criteria that prompted its generation. The prompt or specification MUST be traceable to the tests.

REQ-003-15: Behavioral validation MUST include verification that the code does NOT perform unintended actions (e.g., does not modify data it should only read, does not call external services unnecessarily).
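Mock-based spies make "does NOT do X" assertions cheap to express. A minimal sketch, assuming a hypothetical read-only `get_balance` operation against an injected repository:

```python
import unittest.mock as mock

def get_balance(repo, account_id):
    """Hypothetical read-only operation: must never write to the repository."""
    record = repo.load(account_id)
    return record["balance"]

repo = mock.Mock()
repo.load.return_value = {"id": "acct-1", "balance": 250}

assert get_balance(repo, "acct-1") == 250
repo.load.assert_called_once_with("acct-1")  # exactly one read, correct key
repo.save.assert_not_called()                # verify no unintended write
repo.delete.assert_not_called()              # verify no unintended delete
```

The negative assertions are the point: they fail if an AI-generated implementation quietly adds a write or an extra external call.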

RECOMMENDED

REQ-003-16: Teams SHOULD use behavior-driven development (BDD) specifications as the basis for testing AI-generated business logic.

REQ-003-17: Acceptance tests SHOULD be written before AI code generation and used as constraints in the prompt, per PRD-STD-001.

4.4 Regression Testing

MANDATORY

REQ-003-18: When AI-generated code modifies existing functionality, the full regression test suite for the affected component MUST pass before merge.

REQ-003-19: When AI-generated code introduces a bug that reaches any shared branch, a regression test that specifically covers the bug MUST be added before the fix is merged.
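A regression test for an escaped bug should pin the exact input from the bug report and name the defect it guards against. A sketch with a hypothetical function, ticket number, and defect:

```python
def apply_discount(price_cents, percent):
    """Hypothetical fixed function. The original AI-generated version
    returned the discount amount instead of the discounted price."""
    discount = price_cents * percent // 100
    return price_cents - discount

def test_regression_bug_1234_returned_discount_not_total():
    # Pinned case from the bug report: a 20% discount on $2.50
    # must yield $2.00, not $0.50.
    assert apply_discount(250, 20) == 200

test_regression_bug_1234_returned_discount_not_total()
```

Keeping the ticket reference in the test name ties the test back to the defect, so the suite documents why the case exists.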

RECOMMENDED

REQ-003-20: Teams SHOULD maintain a dedicated regression test suite for areas of the codebase with high AI-generated code concentration.

4.5 Mutation Testing

MANDATORY

REQ-003-21: Mutation testing MUST be performed on AI-generated code in critical paths (as defined by the organization's risk classification) at least once per release cycle.

REQ-003-22: The mutation score for AI-generated code in critical paths MUST be at least 70%. A mutation score below 70% indicates that the test suite is insufficient and MUST be strengthened before release.
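The tools listed in Section 5 automate mutant generation and scoring; the hand-rolled sketch below only illustrates the principle. A hypothetical `clamp` function, one mutant, and two suites: the happy-path-only suite lets the mutant survive, while the boundary-aware suite kills it.

```python
def clamp(value, low, high):
    """Code under test: constrain value to the range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value

def mutant(value, low, high):
    """One mutant: the lower-bound branch returns value instead of low."""
    if value < low:
        return value  # <-- mutation
    if value > high:
        return high
    return value

def weak_suite(fn):
    return fn(5, 0, 10) == 5  # happy path only

def strong_suite(fn):
    return fn(5, 0, 10) == 5 and fn(-1, 0, 10) == 0 and fn(99, 0, 10) == 10

# A suite "kills" a mutant when at least one of its tests fails against it.
assert weak_suite(mutant)        # mutant survives: weak suite is insufficient
assert not strong_suite(mutant)  # mutant killed: boundary test detects it

# Mutation score = killed mutants / total mutants generated,
# so e.g. 7 of 10 mutants killed yields a 70% score.
```

A surviving mutant is a concrete, actionable gap: it names the exact behavior change the suite cannot detect.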

RECOMMENDED

REQ-003-23: Organizations SHOULD target a mutation score of 80% or higher for all AI-generated code, not just critical paths.

REQ-003-24: Mutation testing results SHOULD be tracked over time to identify trends in test quality for AI-generated code versus manually written code.

5. Implementation Guidance

Coverage Thresholds Summary

| Metric | Minimum (All AI Code) | Target (Critical Paths) |
| --- | --- | --- |
| Line Coverage | 80% | 90% |
| Branch Coverage | 70% | 85% |
| Mutation Score | Not required (general) | 70% (critical paths) |
| Integration Test Coverage | All external interactions | All external interactions |
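Thresholds should be enforced by the build, not checked by hand. As one possible sketch for a Python project, coverage.py can enable branch measurement and fail the run below a floor via a `.coveragerc` (note that `fail_under` applies to the combined total, so per-metric line/branch floors still need review or separate tooling):

```ini
[run]
branch = True

[report]
fail_under = 80
show_missing = True
```

Equivalent gate settings exist for JaCoCo, Istanbul/nyc, and other coverage tools.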

AI-Generated Test Validation Checklist

When AI tools are used to generate tests, verify:

  • Tests actually execute the code under test (not just mocking everything)
  • Assertions are meaningful and specific (not just assertNotNull)
  • Edge cases are covered, not just the happy path
  • Test names clearly describe the behavior being tested
  • Tests are independent and do not depend on execution order
  • Test data is representative of real-world inputs
  • Mocks and stubs are used appropriately (not excessively)
  • Tests fail when the code under test is broken (verify by temporarily introducing a bug)
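The "meaningful assertions" item in particular deserves illustration, since AI-generated tests often assert only that something was returned. A sketch against a hypothetical `parse_price` function:

```python
def parse_price(text):
    """Hypothetical AI-generated function: '$3.50' -> 350 (cents)."""
    dollars, cents = text.lstrip("$").split(".")
    return int(dollars) * 100 + int(cents)

# Weak assertion: passes even if parse_price returns the wrong amount.
assert parse_price("$3.50") is not None

# Meaningful assertion: pins the exact expected value.
assert parse_price("$3.50") == 350

# Checklist item in action: a second pinned case, so a temporarily
# introduced bug (e.g. dropping the cents) makes at least one test fail.
assert parse_price("$0.99") == 99
```

If reversing or breaking the implementation leaves every assertion green, the test validates nothing.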

Testing Pyramid for AI-Generated Code

Maintain the standard testing pyramid with adjusted ratios for AI-generated code:

          /   E2E   \          ~5% of tests
        / Integration \       ~25% of tests
      /   Unit Tests    \     ~70% of tests

For AI-generated code, the integration layer SHOULD be proportionally larger than for manually written code because AI-generated code is more likely to have integration-level defects (incorrect API usage, wrong serialization, misunderstood contracts).

Mutation Testing Tools

| Language | Recommended Tool | Notes |
| --- | --- | --- |
| Java/Kotlin | PIT (pitest) | Mature, well-integrated with Maven/Gradle |
| JavaScript/TypeScript | Stryker | Supports all major test frameworks |
| Python | mutmut | Good pytest integration |
| C#/.NET | Stryker.NET | Supports MSTest, NUnit, xUnit |
| Go | go-mutesting | Community-maintained |

6. Exceptions & Waiver Process

Exceptions to coverage requirements MAY be granted for:

  • Generated boilerplate code (e.g., ORM entity classes, protobuf stubs) where the generation tool is separately validated -- coverage requirements MAY be reduced to 60%
  • UI code -- line coverage requirement MAY be reduced to 60%, provided visual regression testing is in place
  • Legacy integration code where achieving 80% coverage requires extensive refactoring -- a time-limited waiver (maximum 6 months) MAY be granted with a remediation plan

Exceptions MUST be documented in the project's testing strategy document and approved by the engineering lead.

No exceptions are available for REQ-003-05 (all tests must pass) or REQ-003-04 (independent test validation).

7. Revision History

| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2025-01-15 | AEEF Standards Committee | Initial release |
| 1.0.1 | 2026-01-15 | AEEF Standards Committee | Added mutation testing tools table; clarified REQ-003-04 |