Security Review for AI-Generated Code: The Checklist
AI-generated code introduces specific security failure modes that standard code review misses. Here's the security-focused review checklist I use on every engagement where AI tools are producing production code.
Standard security review catches standard vulnerabilities. AI-generated code introduces non-standard ones. The code looks clean, follows common patterns, passes the linter, and passes the test suite — and it still has security problems that a human engineer would not have introduced because a human engineer would have known better. The AI did not know better. It pattern-matched its way to code that works and is also vulnerable.
I have been doing security-focused reviews of AI-generated code at client engagements for the last year, and the failure modes are consistent enough to codify. This article is the security review checklist I use, the specific things I look for that are different from a standard code review, and the practices I install on teams to prevent these issues from shipping.
Why AI code has different security risks
Three reasons AI-generated code has a distinct security profile:
The AI optimizes for correctness, not safety. When you ask an AI to build a feature, it produces code that makes the feature work. It does not, by default, think about what an attacker would do with the feature. A human engineer with security training has an adversarial mindset — "how could this be misused?" The AI has a cooperative mindset — "how do I fulfill the request?"
The AI uses patterns from its training data. Some of those patterns are old, deprecated, or insecure. The AI does not distinguish between a pattern it saw in a 2019 tutorial and a pattern recommended by OWASP in 2026. It picks the pattern that fits the context, and the fitting pattern is not always the secure pattern.
The AI does not know your threat model. Your application has specific security requirements based on your users, your data, and your regulatory environment. The AI knows none of this unless you tell it. It writes auth code without knowing your session requirements. It handles PII without knowing your data residency constraints. It builds API endpoints without knowing your rate limiting policy.
The security review checklist
Fifteen items I check on every AI-generated PR that touches anything security-relevant. Not all items apply to every PR — the first step is identifying which ones are relevant.
Authentication and authorization
1. Does the endpoint check auth? AI-generated endpoints sometimes skip authentication entirely, especially if the example they pattern-matched against was a public endpoint. Every new endpoint should have an explicit auth check, and the review should verify it is the right kind of auth (not just "is logged in" but "has permission for this specific action").
2. Are authorization checks granular enough? The AI often implements "is authenticated" when the requirement is "is authenticated AND is the owner of this resource." Object-level authorization is the class of bug the AI misses most often. Check that every data access is scoped to the requesting user's permissions.
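The ownership check the AI most often skips can be sketched in a few lines. The names here (User, Doc, canAccessDoc) are illustrative, not from any framework; the point is that the check compares the resource's owner to the requesting user instead of stopping at "is authenticated":

```typescript
// Hypothetical sketch of object-level authorization.
type User = { id: string; roles: string[] };
type Doc = { id: string; ownerId: string };

function canAccessDoc(user: User, doc: Doc): boolean {
  // Admins may bypass ownership; everyone else must own the resource.
  if (user.roles.includes("admin")) return true;
  // The AI-typical bug is returning true as soon as a user exists.
  return doc.ownerId === user.id;
}
```

The review question is whether every data access path routes through a check like this, not whether the check exists somewhere in the codebase.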
3. Is the session handling correct? AI-generated auth code sometimes uses insecure session defaults — long expiry times, missing secure flags on cookies, tokens stored in localStorage instead of httpOnly cookies. Compare the generated session handling against the project's security requirements.
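As a reference point for the review, here is a sketch of a session cookie built with the defaults the AI tends to omit. The function itself is hypothetical; the attribute set (short Max-Age, HttpOnly, Secure, SameSite) is what to compare the generated code against:

```typescript
// Illustrative sketch: a session cookie with secure defaults.
function sessionCookie(name: string, value: string, maxAgeSeconds = 3600): string {
  return [
    `${name}=${encodeURIComponent(value)}`,
    `Max-Age=${maxAgeSeconds}`, // short expiry, not weeks
    "Path=/",
    "HttpOnly",                 // not readable from JS, unlike localStorage
    "Secure",                   // HTTPS only
    "SameSite=Strict",          // CSRF mitigation
  ].join("; ");
}
```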
Input handling
4. Is all user input validated? AI loves to trust user input. Check that every value from the request — query params, body fields, headers, path params — is validated against an expected type and range before use. The AI often validates the happy path and trusts the rest.
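What "validated against an expected type and range" looks like in practice, using a hypothetical pagination example (the field names are placeholders, the pattern is the point): parse the raw value, check type and bounds, and reject anything outside the envelope.

```typescript
// Minimal sketch of validating untrusted input before use.
function parsePagination(
  raw: Record<string, string | undefined>
): { page: number; perPage: number } {
  const page = Number(raw.page ?? "1");
  const perPage = Number(raw.perPage ?? "20");
  if (!Number.isInteger(page) || page < 1 || page > 10_000) {
    throw new Error("invalid page");
  }
  if (!Number.isInteger(perPage) || perPage < 1 || perPage > 100) {
    // Cap perPage so one request cannot dump the whole table.
    throw new Error("invalid perPage");
  }
  return { page, perPage };
}
```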
5. Is user input sanitized for the output context? Input that ends up in HTML needs XSS sanitization. Input that ends up in SQL needs parameterization. Input that ends up in shell commands needs escaping. The AI sometimes sanitizes for one context and not another, or uses a sanitization approach that is incomplete.
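For the HTML context specifically, a minimal escaper looks like the sketch below. This is an illustration, not a vetted library, and it only covers the HTML sink; input headed for SQL or a shell needs its own encoding, which is exactly the mistake to check for.

```typescript
// Sketch: escaping for the HTML output context only.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;") // ampersand first, or later entities get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```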
6. Are there injection vectors? Check for SQL injection (is the AI using string concatenation instead of parameterized queries?), command injection (is user input passed to exec or spawn?), and template injection (is user input rendered in a template without escaping?). AI-generated code is particularly prone to SQL injection in complex queries where the AI builds the WHERE clause dynamically.
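The dynamic WHERE clause case is worth showing, since it is where AI-generated code most often reaches for concatenation. In this sketch the driver call is omitted and the column list is hypothetical; the invariant is that user values only ever appear in the params array, never in the SQL text, and column names come from an allow-list because placeholders cannot parameterize identifiers.

```typescript
// Sketch: dynamic WHERE clause with placeholders, never concatenated values.
const FILTERABLE = new Set(["status", "owner_id", "created_at"]);

function buildWhere(filters: Record<string, string>): { sql: string; params: string[] } {
  const clauses: string[] = [];
  const params: string[] = [];
  for (const [col, value] of Object.entries(filters)) {
    if (!FILTERABLE.has(col)) throw new Error(`unfilterable column: ${col}`);
    params.push(value);                         // value travels out-of-band
    clauses.push(`${col} = $${params.length}`); // placeholder in the SQL text
  }
  return { sql: clauses.length ? `WHERE ${clauses.join(" AND ")}` : "", params };
}
```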
Data handling
7. Is PII handled according to policy? Check that personal data is not logged, is not included in error messages, is not exposed in API responses beyond what the client needs, and is encrypted at rest if your policy requires it. The AI does not know which fields are PII unless you told it.
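One way to make the policy enforceable rather than aspirational is a redaction pass that runs before anything reaches the logger. The field list here is an example; in a real project it would come from your data policy.

```typescript
// Hypothetical sketch: redact known PII fields before logging.
const PII_FIELDS = new Set(["email", "phone", "name", "address"]);

function redactForLog(record: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(record)) {
    out[key] = PII_FIELDS.has(key) ? "[REDACTED]" : value;
  }
  return out;
}
```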
8. Are secrets kept out of the code? AI sometimes hardcodes API keys, connection strings, or other secrets in the code, especially when generating configuration or setup code. Check every string literal that looks like a credential.
9. Is cross-tenant data isolated? If the feature accesses data from a multi-tenant database, check that every query includes the tenant filter. The AI will happily write a query that returns all users instead of the current tenant's users if the schema allows it.
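A design that makes this failure hard is to force the tenant filter at the data-access layer, so no individual query can forget it. In this sketch the in-memory array stands in for a real database; the point is that tenantId is a required argument and the function fails closed rather than falling back to "all rows".

```typescript
// Sketch: tenant scoping enforced at the data-access layer.
type Row = { tenantId: string; [key: string]: unknown };

function queryForTenant(table: Row[], tenantId: string): Row[] {
  if (!tenantId) throw new Error("tenantId is required"); // fail closed
  return table.filter((row) => row.tenantId === tenantId);
}
```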
Error handling
10. Do error responses leak internal information? AI-generated error handlers often return detailed error messages, stack traces, or internal identifiers to the client. In development this is helpful; in production it is an information leak. Check that error responses are generic and that detailed information is logged server-side only.
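The fix is to split what the client sees from what gets logged. The shapes below are hypothetical; the invariant is that the client body carries only a generic message and an opaque correlation id, while the detail stays server-side.

```typescript
// Sketch: generic client error, detailed server-side log.
function toClientError(
  err: Error,
  requestId: string
): { serverLog: string; clientBody: { error: string; requestId: string } } {
  return {
    serverLog: `[${requestId}] ${err.name}: ${err.message}\n${err.stack ?? ""}`,
    clientBody: { error: "Internal server error", requestId },
  };
}
```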
11. Are error paths handled at all? The AI sometimes handles errors by silently swallowing them, which can leave the system in an inconsistent or insecure state. A failed auth check that is caught and silently ignored is worse than no auth check at all, because it looks like the check is there.
Dependencies
12. Are new dependencies necessary and trustworthy? AI-generated code sometimes introduces new npm packages or library dependencies. Check that the dependency exists, is actively maintained, has no known vulnerabilities (run npm audit or equivalent), and is not a typosquat of a legitimate package. Supply chain attacks through AI-suggested packages are a real and growing risk.
13. Are dependency versions pinned? Check that the AI has not introduced unpinned or wildcard version ranges that could pull in a future compromised version.
Infrastructure
14. Are new API endpoints rate limited? AI-generated endpoints do not come with rate limiting unless the project has middleware that applies it globally. If rate limiting is per-endpoint, check that the new endpoint is covered. Unprotected endpoints are denial-of-service vectors and abuse vectors.
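For reference during review, a minimal fixed-window rate limiter looks like the sketch below. This is a single-process illustration; a real deployment needs shared state (Redis or similar) across instances, but it shows the per-endpoint check the review is looking for.

```typescript
// Sketch: single-process fixed-window rate limiter.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();
  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 }); // new window
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```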
15. Are CORS and CSP headers correct? AI-generated API routes sometimes set overly permissive CORS headers (Access-Control-Allow-Origin: *) because permissive headers make the feature work in development. Check that CORS is scoped to the domains that should have access, and that any Content-Security-Policy the project sets has not been loosened just to make the generated feature render.
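The scoped alternative to the wildcard is an exact-match allow-list: match the request's Origin against a fixed set, then echo the matched origin in the response header, never the wildcard. The origin values below are placeholders.

```typescript
// Sketch: CORS allow-list instead of Access-Control-Allow-Origin: *.
const ALLOWED_ORIGINS = new Set(["https://app.example.com", "https://admin.example.com"]);

function corsHeaderFor(requestOrigin: string | undefined): string | null {
  if (requestOrigin && ALLOWED_ORIGINS.has(requestOrigin)) {
    return requestOrigin; // value for Access-Control-Allow-Origin
  }
  return null; // omit the header entirely for unknown origins
}
```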
The high-risk zones
Not every AI-generated file needs the full checklist. Focus the security review on the high-risk zones:
Authentication and authorization code. Any file that controls who can access what. This is the highest-risk zone and the one where AI mistakes are most expensive.
Data access layers. Database queries, ORM configurations, API calls to internal services. Anywhere data flows between systems.
File upload and download handlers. AI-generated file handling often misses path traversal checks, file type validation, and size limits.
Payment and billing code. Anything that touches money. The AI's confident output in this area can mask subtle bugs that result in overcharges, undercharges, or data exposure.
Third-party integrations. OAuth flows, webhook handlers, API key management. The AI often implements the happy path of an OAuth flow and skips the state parameter, PKCE, or token validation steps that prevent CSRF and token theft.
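Of the zones above, file handling is the easiest to illustrate concretely. The path traversal check that AI-generated download handlers often skip takes only a few lines: resolve the requested name against the upload root, then verify the result is still inside that root. The paths in the test are Unix-style; on Windows the separators differ but the pattern holds.

```typescript
import * as path from "node:path";

// Sketch: reject file names that resolve outside the upload root.
function safeUploadPath(uploadRoot: string, requestedName: string): string {
  const root = path.resolve(uploadRoot);
  const resolved = path.resolve(root, requestedName);
  if (resolved !== root && !resolved.startsWith(root + path.sep)) {
    throw new Error("path escapes upload root"); // "../" segments land here
  }
  return resolved;
}
```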
The practice I install on teams
The security review checklist is useful, but only if it gets used. The practice I install:
Tag security-relevant PRs. Any PR that touches the high-risk zones gets a "security" label. The team establishes clear criteria for what triggers the label.
Dedicated security reviewer. Security-tagged PRs get a second review from someone specifically looking at the checklist. This does not have to be a security specialist — it has to be someone who is looking at the code through an adversarial lens.
Automated checks in CI. Static analysis tools that catch common vulnerabilities — SQL injection, XSS, hardcoded secrets, dependency vulnerabilities — run on every PR. These are not a substitute for human review but they catch the easy stuff before the human spends time on it.
Quarterly security audit. Every three months, pull a sample of AI-generated code from the last quarter and run the full checklist against it. This catches patterns that individual PR reviews miss — the slow drift toward insecure defaults that only becomes visible in aggregate.
Threat model in the project instructions. Include the application's security requirements in the CLAUDE.md or equivalent file so the AI knows the constraints before it writes code. "All database queries must use parameterized queries. All endpoints require authentication. PII fields are: email, phone, name, address. These fields must not appear in logs." This prevents a meaningful percentage of the security issues at generation time instead of catching them at review time.
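Expanding the quoted constraints into a section of the instructions file might look like the fragment below. The wording is illustrative; the structure that matters is short, testable rules the AI can follow mechanically.

```markdown
## Security constraints (read before writing code)

- All database queries use parameterized queries. Never build SQL by string concatenation.
- All endpoints require authentication. Data access is scoped to the requesting user and tenant.
- PII fields are: email, phone, name, address. These must not appear in logs, error
  messages, or API responses beyond what the client needs.
- Error responses to clients are generic; details are logged server-side only.
- Do not add new dependencies without flagging them in the PR description.
```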
The relationship to general code review
This security checklist is an overlay on the general AI-generated PR review heuristics. It does not replace the general review — it adds a security-specific pass for high-risk code. The general review catches functional issues (duplicated logic, hallucinated APIs, tests that do not test the right thing). The security review catches vulnerability issues (injection, auth bypass, data exposure).
Both are necessary. Neither is sufficient on its own.
Counterpoint: do not make security review a bottleneck
A warning. If every PR requires a full security review, you will ship nothing. The checklist is for security-relevant code, not for all code. A PR that changes the color of a button does not need a security review. A PR that adds a new API endpoint that accepts user input and queries the database does. The judgment about which PRs need the security pass is the important organizational decision, and getting it wrong in either direction — reviewing too much or too little — is costly.
Your next step
This week, pull the last five AI-generated PRs that touched authentication, data access, or API endpoints. Run the fifteen-item checklist against them. Write down anything you find. If the list is not empty, you have found the case for installing the security review practice. If it is empty, you are in better shape than most teams, and the practice is still worth having as insurance.
Where I come in
Security reviews of AI-generated code are a standard part of my fractional CTO engagements, especially for teams that have ramped up AI tool usage in the last year. The typical engagement is a 3–5 day assessment that produces a prioritized list of findings and installs the review practice for the team going forward. Book a call if your team is shipping AI-generated code to production and has not done a dedicated security pass.
Related reading: AI Safety for Startups · Reviewing AI-Generated PRs · Coding Agents in Production
Need a security review of your AI-assisted codebase? Book a call.
Get in touch →