PRD-STD-003: Testing Requirements
Standard ID: PRD-STD-003 | Version: 1.0.1 | Status: Active | Compliance Level: Level 2 (Managed) | Effective Date: 2025-01-15 | Last Reviewed: 2026-01-15
1. Purpose
This standard defines testing requirements specific to AI-generated code. AI coding assistants produce code that is often syntactically valid and superficially correct but may contain subtle behavioral defects, edge case omissions, and logic errors that are difficult to detect through code review alone. Given that AI co-authored code carries 1.7x more issues than traditionally written code, rigorous testing is essential to catch defects before they reach production.
This standard establishes minimum testing thresholds, required testing types, and validation practices that ensure AI-generated code is behaviorally correct, performant, and resilient.
2. Scope
This standard applies to:
- All production code generated, modified, or substantially influenced by AI coding assistants
- All test code, whether written manually or generated by AI tools
- All testing activities across the software development lifecycle (unit, integration, system, regression)
This standard complements PRD-STD-002: Code Review. Testing and review are both required; neither substitutes for the other.
3. Definitions
| Term | Definition |
|---|---|
| Unit Test | A test that validates a single function, method, or class in isolation from external dependencies |
| Integration Test | A test that validates the interaction between two or more components or services |
| Behavioral Validation | Testing that verifies code behavior against documented requirements or acceptance criteria, rather than implementation details |
| Regression Test | A test that verifies existing functionality has not been broken by new changes |
| Mutation Testing | A testing technique that introduces small changes (mutations) to code and verifies that tests detect those changes |
| Code Coverage | The percentage of code lines, branches, or paths exercised by the test suite |
| Branch Coverage | The percentage of conditional branches (if/else, switch) exercised by the test suite |
4. Requirements
4.1 Unit Testing
REQ-003-01: All AI-generated code MUST have unit test coverage of at least 80% as measured by line coverage.
REQ-003-02: All AI-generated code MUST have branch coverage of at least 70%.
REQ-003-03: Unit tests for AI-generated code MUST include explicit test cases for:
- Normal/expected input values
- Boundary values (minimum, maximum, empty, null)
- Error conditions and exception paths
- Type edge cases (empty strings, zero values, negative numbers, Unicode)
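The four case categories above can be sketched as follows. This is an illustrative example only: `normalize_username` is a hypothetical AI-generated function under test, and plain asserts are used so the sketch is self-contained (in practice these would be pytest test functions).

```python
# Sketch of the REQ-003-03 case categories against a hypothetical
# AI-generated helper, `normalize_username`.

def normalize_username(raw):
    """Hypothetical AI-generated function under test."""
    if raw is None:
        raise ValueError("username must not be None")
    name = raw.strip().lower()
    if not name:
        raise ValueError("username must not be empty")
    return name

# Normal/expected inputs
assert normalize_username("Alice") == "alice"
assert normalize_username("  Bob  ") == "bob"

# Boundary values and error conditions: null, empty, whitespace-only
for bad in (None, "", "   "):
    try:
        normalize_username(bad)
        raise AssertionError(f"expected ValueError for {bad!r}")
    except ValueError:
        pass  # error path exercised as required

# Type edge cases: Unicode input survives normalization
assert normalize_username("Łukasz") == "łukasz"
```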
REQ-003-04: Unit tests MUST NOT be generated by the same AI session that generated the code under test, unless the tests are independently reviewed and validated by a qualified engineer.
REQ-003-05: All unit tests MUST pass before code is eligible for merge. Zero tolerance for test failures.
REQ-003-06: Organizations SHOULD target 90% line coverage for AI-generated code in critical paths (payment processing, authentication, data persistence).
REQ-003-07: Unit tests SHOULD test behavior and outcomes rather than implementation details to prevent brittle tests that break on refactoring.
REQ-003-08: Property-based testing SHOULD be used where applicable to validate invariants across a wide range of inputs.
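A minimal property-based sketch using only the standard library is shown below; dedicated frameworks (e.g. Hypothesis for Python) generate inputs and shrink failing cases far more systematically. The invariants checked here are that sorting is idempotent, order-preserving, and element-preserving for arbitrary input lists.

```python
# Property-based testing sketch: assert invariants over many randomly
# generated inputs rather than a handful of hand-picked examples.
import random

def check_sort_invariants(trials=200):
    rng = random.Random(42)  # fixed seed so the check is reproducible in CI
    for _ in range(trials):
        data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        out = sorted(data)
        assert sorted(out) == out                         # idempotent
        assert sorted(data) == sorted(out)                # same elements
        assert all(a <= b for a, b in zip(out, out[1:]))  # ordered

check_sort_invariants()
```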
4.2 Integration Testing
REQ-003-09: AI-generated code that interacts with external systems (databases, APIs, message queues, file systems) MUST have integration tests that validate the interaction contracts.
REQ-003-10: Integration tests MUST validate both successful interactions and failure scenarios (timeouts, connection errors, malformed responses).
REQ-003-11: Integration tests MUST be executable in CI/CD pipelines without requiring access to production systems.
REQ-003-12: Integration tests SHOULD use contract testing (e.g., Pact, Spring Cloud Contract) for service-to-service interactions where AI-generated code defines API contracts.
REQ-003-13: Integration tests SHOULD validate data serialization/deserialization behavior for AI-generated data transfer objects.
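REQ-003-13 can be exercised with a round-trip test, sketched below using only the standard library. `OrderDto` and its helpers are hypothetical stand-ins for an AI-generated data transfer object; note that the malformed-payload case validates a failure scenario as required by REQ-003-10.

```python
# Serialization round-trip test for a hypothetical AI-generated DTO.
import json
from dataclasses import dataclass, asdict

@dataclass
class OrderDto:
    order_id: str
    quantity: int
    unit_price_cents: int  # integer cents avoid float rounding in money fields

def to_json(dto: OrderDto) -> str:
    return json.dumps(asdict(dto))

def from_json(payload: str) -> OrderDto:
    return OrderDto(**json.loads(payload))

# Round-trip: deserialize(serialize(x)) must reproduce x exactly
original = OrderDto(order_id="A-100", quantity=3, unit_price_cents=1999)
assert from_json(to_json(original)) == original

# Failure scenario: a malformed payload must raise, not yield a partial object
try:
    from_json('{"order_id": "A-100"}')  # missing required fields
    raise AssertionError("expected TypeError for incomplete payload")
except TypeError:
    pass
```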
4.3 Behavioral Validation
REQ-003-14: AI-generated code MUST be validated against the original requirements or acceptance criteria that prompted its generation. The prompt or specification MUST be traceable to the tests.
REQ-003-15: Behavioral validation MUST include verification that the code does NOT perform unintended actions (e.g., does not modify data it should only read, does not call external services unnecessarily).
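One simple pattern for the "no unintended actions" check in REQ-003-15 is to snapshot the input before the call and compare after, sketched below. `summarize_orders` is a hypothetical AI-generated function that should only read the data it receives.

```python
# Verifying a read-only function does not mutate its input.
import copy

def summarize_orders(orders):
    """Hypothetical function under test: must not modify `orders`."""
    return {
        "count": len(orders),
        "total_cents": sum(o["amount_cents"] for o in orders),
    }

orders = [{"id": 1, "amount_cents": 500}, {"id": 2, "amount_cents": 250}]
snapshot = copy.deepcopy(orders)  # capture state before the call

result = summarize_orders(orders)

assert result == {"count": 2, "total_cents": 750}
assert orders == snapshot, "function modified data it should only read"
```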
REQ-003-16: Teams SHOULD use behavior-driven development (BDD) specifications as the basis for testing AI-generated business logic.
REQ-003-17: Acceptance tests SHOULD be written before AI code generation and used as constraints in the prompt, per PRD-STD-001.
4.4 Regression Testing
REQ-003-18: When AI-generated code modifies existing functionality, the full regression test suite for the affected component MUST pass before merge.
REQ-003-19: When AI-generated code introduces a bug that reaches any shared branch, a regression test that specifically covers the bug MUST be added before the fix is merged.
REQ-003-20: Teams SHOULD maintain a dedicated regression test suite for areas of the codebase with high AI-generated code concentration.
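A regression test satisfying REQ-003-19 might look like the sketch below. The bug ID (BUG-1234) and `parse_price` are hypothetical; the point is that the test name and comment trace directly back to the defect, so the fix stays protected against reintroduction.

```python
# Regression test pinned to a specific, previously shipped defect.

def parse_price(text):
    """Fixed version: strips currency symbols and whitespace before parsing."""
    return int(text.strip().lstrip("$"))

def test_regression_bug_1234_price_with_currency_symbol():
    # BUG-1234: the AI-generated parser crashed on "$19" because it did not
    # strip the currency symbol. This test fails if the bug reappears.
    assert parse_price(" $19 ") == 19

test_regression_bug_1234_price_with_currency_symbol()
```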
4.5 Mutation Testing
REQ-003-21: Mutation testing MUST be performed on AI-generated code in critical paths (as defined by the organization's risk classification) at least once per release cycle.
REQ-003-22: The mutation score for AI-generated code in critical paths MUST be at least 70%. A mutation score below 70% indicates that the test suite is insufficient and MUST be strengthened before release.
REQ-003-23: Organizations SHOULD target a mutation score of 80% or higher for all AI-generated code, not just critical paths.
REQ-003-24: Mutation testing results SHOULD be tracked over time to identify trends in test quality for AI-generated code versus manually written code.
5. Implementation Guidance
Coverage Thresholds Summary
| Metric | Minimum (All AI Code) | Target (Critical Paths) |
|---|---|---|
| Line Coverage | 80% | 90% |
| Branch Coverage | 70% | 85% |
| Mutation Score | Not required (80% recommended per REQ-003-23) | 70% minimum |
| Integration Test Coverage | All external interactions | All external interactions |
AI-Generated Test Validation Checklist
When AI tools are used to generate tests, verify:
- Tests actually execute the code under test (not just mocking everything)
- Assertions are meaningful and specific (not just assertNotNull)
- Edge cases are covered, not just the happy path
- Test names clearly describe the behavior being tested
- Tests are independent and do not depend on execution order
- Test data is representative of real-world inputs
- Mocks and stubs are used appropriately (not excessively)
- Tests fail when the code under test is broken (verify by temporarily introducing a bug)
Testing Pyramid for AI-Generated Code
Maintain the standard testing pyramid with adjusted ratios for AI-generated code:
```
        /   E2E    \       ~5% of tests
      / Integration \     ~25% of tests
    /   Unit Tests   \    ~70% of tests
```
For AI-generated code, the integration layer SHOULD be proportionally larger than for manually written code because AI-generated code is more likely to have integration-level defects (incorrect API usage, wrong serialization, misunderstood contracts).
Mutation Testing Tools
| Language | Recommended Tool | Notes |
|---|---|---|
| Java/Kotlin | PIT (pitest) | Mature, well-integrated with Maven/Gradle |
| JavaScript/TypeScript | Stryker | Supports all major test frameworks |
| Python | mutmut | Good pytest integration |
| C#/.NET | Stryker.NET | Supports MSTest, NUnit, xUnit |
| Go | go-mutesting | Community-maintained |
6. Exceptions & Waiver Process
Exceptions to coverage requirements MAY be granted for:
- Generated boilerplate code (e.g., ORM entity classes, protobuf stubs) where the generation tool is separately validated -- coverage requirements MAY be reduced to 60%
- UI code -- line coverage requirement MAY be reduced to 60%, provided visual regression testing is in place
- Legacy integration code where achieving 80% coverage requires extensive refactoring -- a time-limited waiver (maximum 6 months) MAY be granted with a remediation plan
Exceptions MUST be documented in the project's testing strategy document and approved by the engineering lead.
No exceptions are available for REQ-003-05 (all tests must pass) or REQ-003-04 (independent test validation).
7. Related Standards
- PRD-STD-002: Code Review Standards -- Review complements testing; both are required
- PRD-STD-007: Quality Gates -- Test coverage is a mandatory quality gate
- PRD-STD-004: Security Scanning -- Security testing complements functional testing
- Automated Testing with AI -- Tools for AI-assisted test generation
- Pillar 4: Measurement & Metrics -- Test metrics tracking
8. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-01-15 | AEEF Standards Committee | Initial release |
| 1.0.1 | 2026-01-15 | AEEF Standards Committee | Added mutation testing tools table; clarified REQ-003-04 |