PRD-STD-003: Testing Requirements
Standard ID: PRD-STD-003 | Version: 1.0.1 | Status: Active | Compliance Level: Level 2 (Managed) | Effective Date: 2025-01-15 | Last Reviewed: 2026-01-15
1. Purpose
This standard defines testing requirements specific to AI-generated code. AI coding assistants produce code that is often syntactically valid and superficially correct but may contain subtle behavioral defects, edge case omissions, and logic errors that are difficult to detect through code review alone. Given that AI co-authored code carries 1.7x more issues than traditionally written code, rigorous testing is essential to catch defects before they reach production.
This standard establishes minimum testing thresholds, required testing types, and validation practices that ensure AI-generated code is behaviorally correct, performant, and resilient.
2. Scope
This standard applies to:
- All production code generated, modified, or substantially influenced by AI coding assistants
- All test code, whether written manually or generated by AI tools
- All testing activities across the software development lifecycle (unit, integration, system, regression)
This standard complements PRD-STD-002: Code Review. Testing and review are both required; neither substitutes for the other.
3. Definitions
| Term | Definition |
|---|---|
| Unit Test | A test that validates a single function, method, or class in isolation from external dependencies |
| Integration Test | A test that validates the interaction between two or more components or services |
| Behavioral Validation | Testing that verifies code behavior against documented requirements or acceptance criteria, rather than implementation details |
| Regression Test | A test that verifies existing functionality has not been broken by new changes |
| Mutation Testing | A testing technique that introduces small changes (mutations) to code and verifies that tests detect those changes |
| Code Coverage | The percentage of code lines, branches, or paths exercised by the test suite |
| Branch Coverage | The percentage of conditional branches (if/else, switch) exercised by the test suite |
4. Requirements
4.1 Unit Testing
REQ-003-01: All AI-generated code MUST have unit test coverage of at least 80% as measured by line coverage.
REQ-003-02: All AI-generated code MUST have branch coverage of at least 70%.
REQ-003-03: Unit tests for AI-generated code MUST include explicit test cases for:
- Normal/expected input values
- Boundary values (minimum, maximum, empty, null)
- Error conditions and exception paths
- Type edge cases (empty strings, zero values, negative numbers, Unicode)
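The four case categories above can be sketched as follows. This is an illustrative example only: `normalize_username` is a hypothetical AI-generated function under test, and plain asserts are used so the sketch is self-contained (in practice these would be pytest test functions).

```python
# Sketch of the REQ-003-03 case categories against a hypothetical
# AI-generated helper, `normalize_username`.

def normalize_username(raw):
    """Hypothetical AI-generated function under test."""
    if raw is None:
        raise ValueError("username must not be None")
    name = raw.strip().lower()
    if not name:
        raise ValueError("username must not be empty")
    return name

# Normal/expected inputs
assert normalize_username("Alice") == "alice"
assert normalize_username("  Bob  ") == "bob"

# Boundary values and error conditions: null, empty, whitespace-only
for bad in (None, "", "   "):
    try:
        normalize_username(bad)
        raise AssertionError(f"expected ValueError for {bad!r}")
    except ValueError:
        pass  # error path exercised as required

# Type edge cases: Unicode input survives normalization
assert normalize_username("Łukasz") == "łukasz"
```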
REQ-003-04: Unit tests MUST NOT be generated by the same AI session that generated the code under test, unless the tests are independently reviewed and validated by a qualified engineer.
REQ-003-05: All unit tests MUST pass before code is eligible for merge. Zero tolerance for test failures.
REQ-003-06: Organizations SHOULD target 90% line coverage for AI-generated code in critical paths (payment processing, authentication, data persistence).
REQ-003-07: Unit tests SHOULD test behavior and outcomes rather than implementation details to prevent brittle tests that break on refactoring.
REQ-003-08: Property-based testing SHOULD be used where applicable to validate invariants across a wide range of inputs.
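A minimal property-based sketch using only the standard library is shown below; dedicated frameworks (e.g. Hypothesis for Python) generate inputs and shrink failing cases far more systematically. The invariants checked here are that sorting is idempotent, order-preserving, and element-preserving for arbitrary input lists.

```python
# Property-based testing sketch: assert invariants over many randomly
# generated inputs rather than a handful of hand-picked examples.
import random

def check_sort_invariants(trials=200):
    rng = random.Random(42)  # fixed seed so the check is reproducible in CI
    for _ in range(trials):
        data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        out = sorted(data)
        assert sorted(out) == out                         # idempotent
        assert sorted(data) == sorted(out)                # same elements
        assert all(a <= b for a, b in zip(out, out[1:]))  # ordered

check_sort_invariants()
```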
4.2 Integration Testing
REQ-003-09: AI-generated code that interacts with external systems (databases, APIs, message queues, file systems) MUST have integration tests that validate the interaction contracts.
REQ-003-10: Integration tests MUST validate both successful interactions and failure scenarios (timeouts, connection errors, malformed responses).
REQ-003-11: Integration tests MUST be executable in CI/CD pipelines without requiring access to production systems.
REQ-003-12: Integration tests SHOULD use contract testing (e.g., Pact, Spring Cloud Contract) for service-to-service interactions where AI-generated code defines API contracts.
REQ-003-13: Integration tests SHOULD validate data serialization/deserialization behavior for AI-generated data transfer objects.
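REQ-003-13 can be exercised with a round-trip test, sketched below using only the standard library. `OrderDto` and its helpers are hypothetical stand-ins for an AI-generated data transfer object; note that the malformed-payload case validates a failure scenario as required by REQ-003-10.

```python
# Serialization round-trip test for a hypothetical AI-generated DTO.
import json
from dataclasses import dataclass, asdict

@dataclass
class OrderDto:
    order_id: str
    quantity: int
    unit_price_cents: int  # integer cents avoid float rounding in money fields

def to_json(dto: OrderDto) -> str:
    return json.dumps(asdict(dto))

def from_json(payload: str) -> OrderDto:
    return OrderDto(**json.loads(payload))

# Round-trip: deserialize(serialize(x)) must reproduce x exactly
original = OrderDto(order_id="A-100", quantity=3, unit_price_cents=1999)
assert from_json(to_json(original)) == original

# Failure scenario: a malformed payload must raise, not yield a partial object
try:
    from_json('{"order_id": "A-100"}')  # missing required fields
    raise AssertionError("expected TypeError for incomplete payload")
except TypeError:
    pass
```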
4.3 Behavioral Validation
REQ-003-14: AI-generated code MUST be validated against the original requirements or acceptance criteria that prompted its generation. The prompt or specification MUST be traceable to the tests.
REQ-003-15: Behavioral validation MUST include verification that the code does NOT perform unintended actions (e.g., does not modify data it should only read, does not call external services unnecessarily).
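One simple pattern for the "no unintended actions" check in REQ-003-15 is to snapshot the input before the call and compare after, sketched below. `summarize_orders` is a hypothetical AI-generated function that should only read the data it receives.

```python
# Verifying a read-only function does not mutate its input.
import copy

def summarize_orders(orders):
    """Hypothetical function under test: must not modify `orders`."""
    return {
        "count": len(orders),
        "total_cents": sum(o["amount_cents"] for o in orders),
    }

orders = [{"id": 1, "amount_cents": 500}, {"id": 2, "amount_cents": 250}]
snapshot = copy.deepcopy(orders)  # capture state before the call

result = summarize_orders(orders)

assert result == {"count": 2, "total_cents": 750}
assert orders == snapshot, "function modified data it should only read"
```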
REQ-003-16: Teams SHOULD use behavior-driven development (BDD) specifications as the basis for testing AI-generated business logic.
REQ-003-17: Acceptance tests SHOULD be written before AI code generation and used as constraints in the prompt, per PRD-STD-001.
4.4 Regression Testing
REQ-003-18: When AI-generated code modifies existing functionality, the full regression test suite for the affected component MUST pass before merge.
REQ-003-19: When AI-generated code introduces a bug that reaches any shared branch, a regression test that specifically covers the bug MUST be added before the fix is merged.
REQ-003-20: Teams SHOULD maintain a dedicated regression test suite for areas of the codebase with high AI-generated code concentration.
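A regression test satisfying REQ-003-19 might look like the sketch below. The bug ID (BUG-1234) and `parse_price` are hypothetical; the point is that the test name and comment trace directly back to the defect, so the fix stays protected against reintroduction.

```python
# Regression test pinned to a specific, previously shipped defect.

def parse_price(text):
    """Fixed version: strips currency symbols and whitespace before parsing."""
    return int(text.strip().lstrip("$"))

def test_regression_bug_1234_price_with_currency_symbol():
    # BUG-1234: the AI-generated parser crashed on "$19" because it did not
    # strip the currency symbol. This test fails if the bug reappears.
    assert parse_price(" $19 ") == 19

test_regression_bug_1234_price_with_currency_symbol()
```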
4.5 Mutation Testing
REQ-003-21: Mutation testing MUST be performed on AI-generated code in critical paths (as defined by the organization's risk classification) at least once per release cycle.
REQ-003-22: The mutation score for AI-generated code in critical paths MUST be at least 70%. A mutation score below 70% indicates that the test suite is insufficient and MUST be strengthened before release.
REQ-003-23: Organizations SHOULD target a mutation score of 80% or higher for all AI-generated code, not just critical paths.
REQ-003-24: Mutation testing results SHOULD be tracked over time to identify trends in test quality for AI-generated code versus manually written code.
5. Implementation Guidance
Coverage Thresholds Summary
| Metric | Minimum (All AI Code) | Target (Critical Paths) |
|---|---|---|
| Line Coverage | 80% | 90% |
| Branch Coverage | 70% | 85% |
| Mutation Score | Not required (80% recommended per REQ-003-23) | 70% minimum |
| Integration Test Coverage | All external interactions | All external interactions |
AI-Generated Test Validation Checklist
When AI tools are used to generate tests, verify:
- Tests actually execute the code under test (not just mocking everything)
- Assertions are meaningful and specific (not just assertNotNull)
- Edge cases are covered, not just the happy path
- Test names clearly describe the behavior being tested
- Tests are independent and do not depend on execution order
- Test data is representative of real-world inputs
- Mocks and stubs are used appropriately (not excessively)
- Tests fail when the code under test is broken (verify by temporarily introducing a bug)
Testing Pyramid for AI-Generated Code
Maintain the standard testing pyramid with adjusted ratios for AI-generated code:
```
        /   E2E    \       ~5% of tests
      / Integration \     ~25% of tests
    /   Unit Tests   \    ~70% of tests
```
For AI-generated code, the integration layer SHOULD be proportionally larger than for manually written code because AI-generated code is more likely to have integration-level defects (incorrect API usage, wrong serialization, misunderstood contracts).
Mutation Testing Tools
| Language | Recommended Tool | Notes |
|---|---|---|
| Java/Kotlin | PIT (pitest) | Mature, well-integrated with Maven/Gradle |
| JavaScript/TypeScript | Stryker | Supports all major test frameworks |
| Python | mutmut | Good pytest integration |
| C#/.NET | Stryker.NET | Supports MSTest, NUnit, xUnit |
| Go | go-mutesting | Community-maintained |
6. Exceptions & Waiver Process
Exceptions to coverage requirements MAY be granted for:
- Generated boilerplate code (e.g., ORM entity classes, protobuf stubs) where the generation tool is separately validated -- coverage requirements MAY be reduced to 60%
- UI code -- line coverage requirement MAY be reduced to 60%, provided visual regression testing is in place
- Legacy integration code where achieving 80% coverage requires extensive refactoring -- a time-limited waiver (maximum 6 months) MAY be granted with a remediation plan
Exceptions MUST be documented in the project's testing strategy document and approved by the engineering lead.
No exceptions are available for REQ-003-05 (all tests must pass) or REQ-003-04 (independent test validation).
7. Related Standards
- PRD-STD-002: Code Review Standards -- Review complements testing; both are required
- PRD-STD-007: Quality Gates -- Test coverage is a mandatory quality gate
- PRD-STD-004: Security Scanning -- Security testing complements functional testing
- Automated Testing with AI -- Tools for AI-assisted test generation
- Pillar 4: Measurement & Metrics -- Test metrics tracking
8. Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-01-15 | AEEF Standards Committee | Initial release |
| 1.0.1 | 2026-01-15 | AEEF Standards Committee | Added mutation testing tools table; clarified REQ-003-04 |