The Founder’s Security Checklist: Shipping a Vibe-Coded MVP Without Getting Hacked (Part 8)

Vibe Coding Security Series

  1. What Is Vibe Coding Security? A Field Guide for 2026
  2. The OWASP Top 10 for Vibe-Coded Applications
  3. Anatomy of a Vibe Coding Breach: Lessons from 2026’s Worst Incidents
  4. The Dependency Trap: Supply Chain Risks in AI-Generated Code
  5. Authentication & Secrets: What AI Gets Wrong Every Time
  6. Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short
  7. Prompt Engineering for Secure Code
  8. The Founder’s Security Checklist (you are here)
  9. Securing the AI Coding Pipeline (coming soon)
  10. The Future of Vibe Coding Security (coming soon)

Read Time: 18 minutes

TL;DR

You built your MVP with AI. It works, users are signing up, and you’re thinking about launch. Before you do, run through these fifteen checks. They cover the vulnerabilities I see most often in vibe-coded apps — the ones that lead to data breaches, leaked credentials, and “we need to shut everything down” emails to your users. Each check has a test you can run in under five minutes, most from a browser or a single terminal command. Print the summary at the end and tape it next to your monitor.


Why This Checklist Exists

A founder I worked with shipped his vibe-coded MVP on a Thursday. By Saturday night his database was dumped — every user email, every record, everything. An attacker found the exposed MongoDB port, connected without credentials, and exfiltrated the lot. The founder had failed on three items from the list you’re about to read. It took him ten minutes to run the checks after the breach. It would have taken him ten minutes before.

I built the first version of this checklist at VULNEX after presenting at a security conference in 2025, based on vulnerabilities I kept seeing in AI-generated code. Since then, the pattern has only gotten worse. GitGuardian’s 2026 report found 28.65 million new secrets leaked on GitHub in 2025 — a 34% increase year over year. Commits involving AI coding assistants leak secrets at more than double the baseline rate. Apiiro’s research showed AI code adding over 10,000 new security findings per month across studied repositories by mid-2025. The breaches I covered in Part 3 — Moltbook, Enrichlead, apps breached within days of launch — all failed on items in this list.

This isn’t a comprehensive security program. It’s the fifteen things that, if you get them wrong, guarantee someone finds the hole before you do. If you get them right, you’re ahead of the vast majority of vibe-coded MVPs shipping today.

The checks are grouped into five areas. I’ll use QuickNote — the deliberately vulnerable note-taking app from earlier in this series — and a few other real-world examples to make each one concrete.


Area 1: The Perimeter

These are the things attackers see the moment they point a browser or a port scanner at your app.

Check 1: Force HTTPS on every page

AI-generated deployment configs routinely skip HTTPS. The model gives you a working Node.js app listening on port 3000 over plain HTTP — which is fine for local development and catastrophic in production. Without HTTPS, every login, every API token, every piece of user data travels across the internet in cleartext. Anyone on the same network — a coffee shop, a shared office, a compromised ISP — can read it.

How to test:

curl -I http://yourapp.com

You want a 301 or 308 redirect to https://. If you get a 200 on plain HTTP, your app is serving content without encryption. Also check that your API responds only on HTTPS — curl -I http://yourapp.com/api/notes should redirect, not return data.

How to fix: If you’re on Vercel, Netlify, or Cloudflare Pages, HTTPS is enforced automatically. On a VPS or Docker deployment, configure your reverse proxy (Nginx, Caddy) to redirect all HTTP to HTTPS. Caddy does this by default — one reason I recommend it for founders who don’t want to think about TLS certificates.

Check 2: Set security headers

Open securityheaders.com and scan your domain. If you get anything below a B, you have work to do. Across the web, only 21.9% of sites deploy a Content Security Policy — and vibe-coded apps are well below that average because AI rarely generates security header configuration unless you ask.

How to test:

curl -I https://yourapp.com | grep -iE "strict-transport|content-security|x-frame|x-content-type"

You want to see at least these four headers in the response: Strict-Transport-Security, Content-Security-Policy, X-Frame-Options, and X-Content-Type-Options. If you see none of them, your app has zero hardening against clickjacking, MIME sniffing, and protocol downgrade attacks.

How to fix: Add them in your reverse proxy, your Express middleware, or your hosting platform’s config. A reasonable starting set for an MVP:

Strict-Transport-Security: max-age=31536000; includeSubDomains
Content-Security-Policy: default-src 'self'; script-src 'self'
X-Frame-Options: DENY
X-Content-Type-Options: nosniff
Referrer-Policy: strict-origin-when-cross-origin
Permissions-Policy: camera=(), microphone=(), geolocation=()

Adjust Content-Security-Policy to match what your app actually loads — if you use a CDN for scripts, add its domain to script-src. If your app breaks after adding CSP (common with React apps that use inline scripts), start with script-src 'self' 'unsafe-inline' and tighten later. An imperfect CSP is better than no CSP.

Check 3: Close exposed ports and admin panels

AI deployment guides often leave database ports open to the internet. As of early 2026, Shodan indexes over 213,000 exposed MongoDB instances — many with no authentication required. If you’re using Firebase, don’t assume you’re safe: RedHunt Labs found that 1 in 5 Firebase databases had misconfigured rules allowing public read access, exposing emails, passwords, and private messages. Your database should never be reachable from the public internet — and “managed” doesn’t mean “secured.”

How to test:

nmap -Pn -p 5432,27017,6379,3306,9200 yourapp.com

That scans for PostgreSQL (5432), MongoDB (27017), Redis (6379), MySQL (3306), and Elasticsearch (9200). Every one of those ports should show filtered or closed. If any shows open, your database is directly accessible from the internet — and if it’s using default credentials or no auth (as Redis often does), it’s already compromised.

Also check for admin panels: browse to /admin, /dashboard, /supabase, /_next, /graphql, /phpmyadmin. If any of these load without requiring authentication from the public internet, lock them down or remove them.

How to fix: Configure your hosting provider’s firewall to allow database connections only from your application server’s IP. On AWS, that’s a security group rule. On a VPS, use ufw allow from <app-ip> to any port 5432. For admin panels, put them behind authentication or restrict access by IP.


Area 2: Secrets

The most common category of vibe coding vulnerability. AI generates code with secrets embedded in it because that’s what the training data shows — tutorial code hardcodes credentials for simplicity, and the model reproduces the pattern.

Check 4: Scan your codebase for hardcoded secrets

Of the 28.65 million secrets leaked on GitHub in 2025, a disproportionate share came from AI-generated code. GitGuardian found that commits involving an AI coding assistant leaked secrets at a 3.2% rate — more than double the 1.5% baseline across public GitHub. The model puts your Supabase service role key in a constant, your Stripe secret key in a config object, your database connection string in a Docker Compose file. It does this because that’s what works, and working code is what it optimizes for. Picture this: a founder pushes a Stripe secret key to a public repo at 2pm. By 4pm, bots have found it. By 6pm, fraudulent charges are hitting their account. This happens every day — GitGuardian’s data shows leaked secrets are typically exploited within hours of exposure.

How to test:

# Install and run Gitleaks on your repo
gitleaks detect --source . --report-format json --report-path leaks.json

Or use TruffleHog for deeper scanning including git history:

trufflehog git file://. --json

Any findings are secrets that have been committed to your repository. Even if you delete them from the current code, they’re in your git history — and if the repo was ever public, they’ve been scraped.

How to fix: Rotate every leaked secret immediately — don’t just remove it from code. Move all secrets to environment variables loaded at runtime. If you’re on Vercel, Railway, or Render, use their environment variable UI. Never put secrets in .env files that get committed to git. Which leads to the next check.

Check 5: Verify .env files and Docker images don’t leak secrets

Two hidden channels that AI routinely creates for secret leakage. First: .env files. The model creates a .env with your database credentials but doesn’t always add it to .gitignore. Second: Docker images. As I covered in Part 5, AI-generated Dockerfiles often bake secrets into the build with ARG and ENV instructions, making them visible in the image layer history.

How to test:

# Check if .env is in your gitignore
grep "\.env" .gitignore

# Check if any .env files are tracked by git
git ls-files | grep -i "\.env"

# Check Docker image for leaked secrets
docker history --no-trunc yourapp:latest | grep -iE "key|secret|password|token"

If git ls-files shows any .env file, that file — and every secret in it — is in your repository history. If docker history shows credentials, anyone who pulls your image can extract them.

How to fix: Add .env* to .gitignore before your first commit. For Docker, use multi-stage builds and pass secrets as runtime environment variables, never build arguments. If secrets are already in git history, you need to use git filter-repo to purge them — and rotate every exposed secret.

Check 6: Lock down CORS

Cross-Origin Resource Sharing misconfigurations are everywhere in vibe-coded apps. CORS issues consistently rank among the most common web application vulnerabilities, and vibe-coded apps are especially prone because the typical AI-generated Express.js setup includes cors() with no arguments — which defaults to Access-Control-Allow-Origin: *, allowing any website on the internet to make authenticated requests to your API.

How to test:

curl -H "Origin: https://evil.com" -I https://yourapp.com/api/notes

Look at the Access-Control-Allow-Origin header in the response. If it says * or reflects back https://evil.com, your API will happily serve data to any website that asks — including an attacker’s phishing page.

How to fix: Configure CORS to allow only your own domains:

app.use(cors({
  origin: ['https://yourapp.com', 'https://www.yourapp.com'],
  credentials: true
}));

Never use origin: true (reflects any origin) or leave CORS at the default wildcard in production.


Area 3: Authentication and Access

This is where vibe-coded apps fail hardest. The AI builds authentication that works — you can log in, you see your data — but it skips the controls that prevent everyone else from seeing your data too. I covered the details in Part 5, but here’s how to test for the critical failures.

Check 7: Add rate limiting to login and signup

Without rate limiting, your login endpoint accepts unlimited password attempts. Credential stuffing — automated attacks using leaked username/password pairs from other breaches — generates 26 billion attempts per month globally. Microsoft Entra blocks 7,000 password attacks per second. If your login has no rate limit, an attacker can try thousands of passwords per minute against your users’ accounts.

QuickNote had this exact vulnerability. No rate limiter on /api/login meant an attacker could brute-force any account password at the speed of their internet connection.

How to test:

# Send 20 rapid requests to your login endpoint
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" \
    -X POST https://yourapp.com/api/login \
    -H "Content-Type: application/json" \
    -d '{"email":"test@test.com","password":"wrong"}';
done

If all 20 return 401 (invalid credentials) with no 429 (too many requests), you have no rate limiting. You should start seeing 429 responses after 5-10 attempts.

How to fix: In Express.js, add express-rate-limit:

const loginLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 5,
  message: { error: 'Too many attempts, try again later' }
});
app.post('/api/login', loginLimiter, loginHandler);

Apply rate limiting to signup and password reset endpoints too — those are targeted just as often.

Check 8: Verify every API endpoint checks authentication

AI-generated APIs often have authentication on some endpoints but not others. The model builds a login flow, generates a token, and then forgets to check that token on half the routes. I’ve reviewed vibe-coded apps where /api/login was properly secured but /api/users, /api/notes, and /api/admin accepted unauthenticated requests.

How to test:

# Try hitting your API endpoints with no authentication token
curl -s https://yourapp.com/api/notes
curl -s https://yourapp.com/api/users
curl -s https://yourapp.com/api/settings

Every protected endpoint should return 401 Unauthorized when called without a valid token. If any of them return data, that endpoint is publicly accessible to anyone who knows the URL.

How to fix: Add authentication middleware that runs on every route by default, then explicitly exempt only public routes (login, signup, health check). In Express.js:

// Exempt public routes BEFORE the auth middleware
app.post('/api/login', loginHandler);
app.post('/api/signup', signupHandler);

// Then apply auth middleware to everything else under /api
app.use('/api', authMiddleware);

Check 9: Test that users can only access their own data

This is the IDOR vulnerability — Insecure Direct Object Reference — and it’s the single most dangerous flaw in multi-tenant vibe-coded apps. The app works correctly when you use it normally: you see your notes, your invoices, your profile. But if you change the ID in the URL or API request, you see someone else’s data. QuickNote had this: changing /api/notes/42 to /api/notes/43 returned another user’s private notes. No ownership check, no authorization — just a database lookup by ID.

How to test:

# Log in as user A, get their token, and note the ID of a resource they own
# Then try accessing a resource that belongs to user B
curl -H "Authorization: Bearer <user-a-token>" \
  https://yourapp.com/api/notes/9999

If this returns data (instead of 403 Forbidden), any authenticated user can access any other user’s data by guessing or incrementing IDs. If your app uses auto-incrementing integer IDs, an attacker can enumerate every record in your database.

How to fix: Add a WHERE user_id = authenticated_user_id clause to every database query. If you’re on Supabase, enable Row Level Security and create policies:

CREATE POLICY notes_owner ON notes
  USING (user_id = auth.uid());

Test the policy by logging in as two different users and verifying that neither can see the other’s data.


Area 4: Data Handling

How your app processes what users send it. AI-generated code is optimistic by default — it assumes all input is well-formed and trustworthy. Attackers don’t send well-formed input.

Check 10: Validate all input on the server

If your app has a form, test what happens when you put <script>alert('xss')</script> in every text field. If your app has a search feature, try '; DROP TABLE users; --. AI-generated code almost never validates input server-side unless you specifically ask for it. Client-side validation (HTML required attributes, JavaScript checks) is trivially bypassed — open the browser dev tools and delete the validation, or send requests directly with curl.

Imagine you built a freelancer invoicing app with AI. The “company name” field in the invoice form probably accepts any string. An attacker puts a script tag in the company name, generates an invoice, and when your client opens that invoice PDF or web view — the script executes in their browser, potentially stealing their session.

How to test:

# Test for XSS in a text field
curl -X POST https://yourapp.com/api/notes \
  -H "Authorization: Bearer 
<token>" \
  -H "Content-Type: application/json" \
  -d '{"title":"<script>alert(1)</script>","content":"test"}'

# Test for SQL injection in a search parameter
curl "https://yourapp.com/api/search?q=test%27%20OR%201=1--"

If the script tag is stored and rendered back without escaping, you have stored XSS. If the SQL injection test returns more data than expected, you have SQL injection.

How to fix: Validate and sanitize all input server-side. Use a validation library like Zod or Joi in Node.js. Define what each field should accept — data type, max length, character set — and reject anything that doesn’t match. Sanitize HTML with a library like DOMPurify before rendering user-generated content.

Check 11: Use parameterized queries

This is the server-side defense against SQL injection. String-concatenated queries — where user input is glued directly into the SQL string — are one of the oldest and most dangerous vulnerabilities in web development. AI generates them regularly because the training data is full of them.

How to test:

# Search your codebase for string concatenation in SQL
grep -rn "query.*\`.*\${" ./src/
grep -rn "query.*+.*req\." ./src/
grep -rn "f\".*SELECT" ./src/

Any match is a potential SQL injection vulnerability. The pattern query(\SELECT FROM notes WHERE id = ${noteId}`)is vulnerable. The patternquery(‘SELECT FROM notes WHERE id = $1′, [noteId])` is safe.

How to fix: Replace every string-concatenated query with parameterized queries. In Node.js with pg:

// Vulnerable
db.query(`SELECT * FROM notes WHERE id = ${noteId}`);

// Safe
db.query('SELECT * FROM notes WHERE id = $1', [noteId]);

If you’re using an ORM like Prisma or Drizzle, you’re mostly safe by default — but check for any $queryRawUnsafe or $executeRawUnsafe calls, which bypass ORM protections.

Check 12: Don’t store tokens or sensitive data in localStorage

This is the vulnerability that gives an attacker full account takeover through any XSS hole. localStorage is accessible to every script running on your page. If an attacker finds any way to inject JavaScript — through a stored XSS in a user profile field, through a compromised third-party script, through a browser extension — they can read every token in localStorage and send it to their server.

QuickNote stored JWT access tokens in localStorage. Combined with the missing input validation, this meant any XSS vulnerability gave an attacker every user’s authentication token.

How to test:

Open your app in the browser, log in, then open Developer Tools (F12) → Application → Local Storage. If you see anything labeled token, access_token, jwt, session, or similar — that’s a finding. Also check sessionStorage.

How to fix: Store authentication tokens in httpOnly cookies with Secure and SameSite=Strict flags. These cookies are invisible to JavaScript — XSS can’t read them, and they’re sent automatically with every request to your server. This is what the security-aware prompt in Part 7 produces by default.


Area 5: Dependencies and Deployment

What you shipped alongside your own code. AI tools pull in dependencies you never chose, generate configurations you never reviewed, and create error handling that tells attackers exactly what went wrong.

Check 13: Audit your dependencies for known vulnerabilities

Every dependency your AI tool added is an attack surface you didn’t consciously accept. Sonatype’s 2026 report documented 454,648 new malicious packages in 2025 — a 75% increase year over year. Your AI coding assistant chose packages based on training data popularity, not on whether they’ve been patched recently or whether they’ve been flagged as malicious.

How to test:

# Node.js
npm audit

# Python
pip-audit

# Or use Snyk for a more detailed report
npx snyk test

npm audit is built into Node.js and runs in seconds. Pay attention to high and critical severity findings. pip-audit does the same for Python. For a deeper analysis including transitive dependencies and reachability, Snyk and Endor Labs offer free tiers.

How to fix: Run npm audit fix for automatic patches. For vulnerabilities that can’t be auto-fixed, check if a newer version of the package resolves them, or find an alternative package. I covered the full dependency management workflow in Part 4.

Check 14: Lock down file uploads

If your app accepts file uploads — profile pictures, documents, attachments — test what happens when you upload something that isn’t what the form expects. Unrestricted file uploads are a CVSS 10.0 vulnerability class. In April 2025, CVE-2025-31324 — an unauthenticated file upload in SAP NetWeaver — was exploited in the wild to upload webshells and achieve full remote code execution. The same pattern appears in vibe-coded apps: AI generates an upload endpoint that saves whatever it receives to the filesystem, no type checking, no size limit, no filename sanitization.

How to test: Try uploading a file with a .html or .svg extension through your app’s upload form. If it’s saved and accessible at a public URL, try accessing it in a browser — if the HTML renders or the SVG executes JavaScript, you have a stored XSS via file upload. Also test uploading a very large file (100MB+) — if there’s no size limit, that’s a denial-of-service vector.

How to fix: Validate file type on the server by checking the file’s magic bytes, not just the extension (extensions can be faked). Limit file size. Store uploads in a dedicated storage bucket (S3, Cloudflare R2) with a content-type override that forces downloads rather than rendering. Never serve user-uploaded files from the same domain as your application — use a separate subdomain or CDN domain.

Check 15: Make sure errors don’t leak internal details

AI-generated code leaves detailed error messages in production. Stack traces, database connection strings, file paths, package versions — all information that helps an attacker understand your infrastructure and find their next exploit. The default Express.js error handler, for example, sends the full stack trace to the client in development mode — and AI-generated code often doesn’t switch to production mode on deployment.

How to test:

# Trigger an error by requesting a resource that doesn't exist
curl https://yourapp.com/api/notes/nonexistent-id-999999

# Try sending malformed data
curl -X POST https://yourapp.com/api/notes \
  -H "Content-Type: application/json" \
  -d '{"invalid json'

If the response includes a stack trace, file paths (like /app/src/routes/notes.js:42), database errors (like relation "users" does not exist), or framework version numbers — your error handling is leaking information.

How to fix: Set NODE_ENV=production in your deployment environment. Add a global error handler that catches all errors and returns a generic message to the client while logging the details server-side:

app.use((err, req, res, next) => {
  console.error(err); // Logged server-side, not sent to client
  res.status(500).json({ error: 'Internal server error' });
});

The Printable Checklist

Print this. Tape it next to your monitor. Run through it before every deploy. Download the one-page PDF version if you want a cleaner printout.

The Perimeter

  • 1. HTTPS forced on every page — curl -I http://yourapp.com returns 301/308 redirect
  • 2. Security headers set — securityheaders.com score B or higher
  • 3. No exposed database ports or admin panels — nmap -p 5432,27017,6379 shows filtered/closed

Secrets

  • 4. No hardcoded secrets — gitleaks detect returns zero findings
  • 5. .env excluded from git, no secrets in Docker layers — git ls-files | grep .env returns nothing
  • 6. CORS locked to your domains — curl -H "Origin: https://evil.com" doesn’t reflect origin

Authentication & Access

  • 7. Rate limiting on login/signup — 20 rapid requests trigger 429 responses
  • 8. Every API endpoint requires authentication — unauthenticated curl returns 401
  • 9. Users can only access their own data — cross-user ID test returns 403

Data Handling

  • 10. Server-side input validation — <script> tags rejected or escaped
  • 11. Parameterized queries — grep finds no string-concatenated SQL
  • 12. No tokens in localStorage — browser dev tools show no auth tokens in storage

Dependencies & Deployment

  • 13. Dependencies audited — npm audit shows zero high/critical findings
  • 14. File uploads restricted — type, size, and storage location validated
  • 15. Errors don’t leak details — malformed requests return generic messages, no stack traces

If you can only fix three things today

If you ran the checklist and failed on multiple items, here’s where to start:

First: Check 4 (hardcoded secrets). If Gitleaks found secrets in your repo, they’re already leaked. Every minute you wait is a minute an attacker can use those credentials. Rotate them now — before fixing anything else.

Second: Check 9 (users accessing other users’ data). If your IDOR test passed, any authenticated user can browse your entire database by incrementing IDs. This is the vulnerability that turns a security incident into a data breach notification.

Third: Check 1 (HTTPS). Without HTTPS, every fix you apply afterward can be intercepted in transit. HTTPS is the foundation — nothing else works without it.

Everything else matters, but these three are the ones where the gap between “vulnerable” and “breached” is measured in hours, not weeks.


What This Checklist Doesn’t Cover

Fifteen items can’t cover everything. This checklist is the floor, not the ceiling. A few things you’ll need beyond this list as you grow past MVP:

Penetration testing. Once you have paying users, hire a professional to try to break in. At VULNEX we do this kind of work regularly, and I can tell you that a pentest almost always finds things no checklist catches — business logic flaws, race conditions, trust boundary issues that only surface when a human thinks like an attacker against your specific application.

Logging and monitoring. Check 7 tells you to add rate limiting, but you also need to know when someone is probing your defenses. Log authentication attempts, data access patterns, and error rates. Ship logs to a service that can alert you when patterns change.

Compliance. If you handle health data (HIPAA), payment card data (PCI DSS), or European user data (GDPR), you have regulatory requirements beyond this checklist. Don’t assume AI-generated code is compliant — check.

Automated scanning. This checklist is manual. Once you’ve passed it, set up automated security scanning in your CI/CD pipeline — SAST, DAST, dependency checks on every pull request. I covered why vibe-coded apps need different scanner configurations than traditional code in Part 6.

Threat modeling. Part 7 covered how to build a threat model before writing code. If you skipped that step, go back and do it now. The checklist catches common issues; a threat model catches the ones specific to your application.


The One Thing to Remember

Every check in this list exists because I’ve seen a vibe-coded app fail on it in production. Not in theory — in production, with real user data exposed. The QuickNote vulnerabilities from this series, the breaches from Part 3, the authentication failures from Part 5 — they all map to items on this list.

AI built your app. It didn’t secure it. That’s your job, and this checklist is the minimum. Run it before launch. Run it again after every major feature. Make it a habit, and your vibe-coded MVP will be more secure than most traditionally coded apps I audit.

As always: trust nothing, verify everything.


Further Reading


References

Posted in AI, Business, Security, Technology | Tagged , , , , | Leave a comment

The AI Strategy Vacuum: Why “We Use ChatGPT” Isn’t a Plan

Read Time: 18 minutes

TL;DR

A CEO tells the board the company is “all in on AI.” Three floors down, here’s what that actually means: marketing is running a chatbot nobody in security has heard of, finance just pasted the quarterly numbers into a personal ChatGPT account, a developer wired an autonomous agent into the production database last week, and HR is drawing up a list of roles to cut because “the AI can do it now.” That isn’t a strategy. It’s a subscription dressed up as one.

A real AI strategy has an owner, in-house expertise, a workforce you amplify instead of fire, clean data underneath it, enforceable policies, an infrastructure plan, and visibility into every model and agent on your network. Most companies in 2026 have almost none of this. They have adoption without governance, tools without owners, and agents nobody is watching.

The numbers back it up. Around 88% of organizations now use AI in at least one business function, but only about a quarter have a real governance framework. Roughly three in four plan to adopt agentic AI within two years, while only one in five can govern the agents they already run. 49% of employees use AI tools their company never approved. And only 7% say their data is actually ready for AI.

That gap between what companies use and what they actually control is the AI strategy vacuum. In my experience it has seven recurring holes. Let’s go through them.


One thing up front. I’m not anti-AI, and I’m not here to talk anyone out of it. I run AI agents every day in my own work. The problem isn’t that companies use AI. It’s that “we use AI” has quietly come to mean “we have an AI strategy,” and those are two very different things, about as different as owning a car and knowing how to drive it.

Hole #1: No One Owns AI — The Missing CAIO

Try this test on your own organization. Who is accountable, by name, for AI strategy, AI risk, and AI ROI? If the honest answer is “well, IT and the CISO and that VP in marketing each handle a piece of it,” then nobody owns it. Shared responsibility for something this big usually means no responsibility at all.

This is why the Chief AI Officer (CAIO) has become the fastest-growing seat in the C-suite. IBM polled 2,000 CEOs worldwide for its 2026 study and found that 76% now report having a CAIO, up from just 26% a year earlier. Heineken, WPP, Nike, and CVS Health have all created the role. The payoff shows up in the data too: companies with a CAIO are close to 3x more likely to reach top-tier AI maturity (Futurum) and see meaningfully higher returns on their AI spend (IBM).

But that 76% flatters the picture. Among large enterprises specifically, only about a quarter have a genuinely dedicated CAIO. Plenty of the rest handed someone the title and nothing else: a “Head of AI” with no budget, no say over procurement, and no authority to kill a bad project.

A CAIO who can’t veto a reckless deployment isn’t a strategy owner. They’re a press release.

The point isn’t the org chart. It’s that without one accountable person, AI decisions default to whoever moves first, which is usually a department expensing a tool on a corporate card. Nobody is weighing speed against risk, nobody is tying AI spend to outcomes, and nobody has a real answer when the board asks what the exposure is. It shows: only about 32% of organizations have any formal process to measure whether their AI investments are working at all. Most are scaling something they can’t even score.

So appoint someone real. Give that person authority over strategy, budget, risk, and procurement, not just the fun “innovation” part, and a remit that crosses IT, security, legal, data, and the business units. Hold them to results and to a number, because the goal was never “we use AI”, it was “AI moved these numbers”. And if you’re too small for a dedicated CAIO, that’s fine, but still name the owner. The diffusion of responsibility is the problem, not the headcount.

Hole #2: No AI Experts In-House — Where’s Your AI Red Team?

A CAIO with no team is a general with no army. The second hole is the near-total absence of in-house AI expertise, and it’s worst on the security side.

When CIO.com asked CIOs what was holding back enterprise AI in its 2026 State of the CIO survey, the top answer, at 40%, was lack of in-house talent. A separate 2026 hiring survey found 91% of organizations prioritizing AI-skilled hires, with AI engineers (39%) ranked the hardest role to fill, just ahead of cybersecurity engineers (38%).

Most of the roles companies are missing barely existed three years ago:

  • The AI Red Team, whose job is to break your own models before someone else does: jailbreaks, prompt injection, model extraction, data poisoning, agent manipulation. Job boards listed more than 2,500 active AI/ML security engineer postings as of March 2026.
  • AI security engineers to lock down the pipeline, from the model supply chain and MCP servers to agent permissions and inference endpoints. About 32% of hiring organizations added them in 2026.
  • AI/ML security specialists (34%) and AI governance analysts (30%), the people who turn policy into actual controls and the evidence an auditor will ask for.

Accenture’s 2026 workforce report puts a finer point on it: for the first time, skills gaps overtook headcount as the top security workforce problem. It isn’t only that you don’t have enough people. It’s that the people you have were trained for a pre-AI world. A firewall admin who has never seen a prompt-injection attack is not your AI Red Team.

And this isn’t only about the specialists. Broad AI literacy across the whole workforce is now the baseline, not a nice-to-have, and in the EU it’s literally the law: the AI Act’s AI-literacy obligation has been in force since February 2025. IBM reckons more than half of employees need upskilling just to keep doing their current jobs well in an AI world. A strategy that trains a tiny elite and leaves everyone else to figure it out on their own is how you get Shadow AI in the first place.

I’ve written before about AI Agent Skill Poisoning and how to weaponize agent skills. Those attacks are invisible to a team without someone who understands how agents actually work under the hood. You can’t defend a threat model you’ve never studied.

So build a standing adversarial testing function, even a small or contracted one, instead of a once-a-year audit. Retrain the security people you already have on AI-specific threats; OWASP’s Top 10 for LLMs and its agentic threat work are free places to start. Hire for the real role when you hire, an AI security engineer or governance analyst, not “AI” bolted onto a generic IT job description. And put a real AI-literacy program in front of everyone else. Treat in-house expertise as a control, not a perk. It’s the only thing standing between a vendor’s claim and your reality.

Hole #3: Firing People Instead of Amplifying Them

Here’s the hole that gets celebrated as strategy in press releases and turns into a quiet rehiring spree six months later.

In 2026, companies aren’t just adopting AI, they’re using it as the reason to cut people. According to Challenger, Gray & Christmas, AI was cited in 87,714 US job cuts through May 2026, around 22% of all layoffs this year — already more than the 54,836 blamed on AI in all of 2025, and by May it had become the single most-cited reason for cuts. Salesforce says AI agents now handle around half its customer interactions and has “rebalanced” headcount accordingly; Block is shrinking from roughly 10,000 employees to 6,000.

The trouble is that a lot of this is a bet on what AI might do, not what it has done. A late-2025 Harvard Business Review survey found most executives cutting on AI grounds were doing it on the technology’s expected potential, not its demonstrated performance. And the bill is already arriving: Forrester found 55% of employers regret their AI-driven layoffs, and Gartner expects that by 2027, half of the companies that cut headcount citing AI will rehire for similar roles — often under new titles, sometimes at lower pay.

The textbook case is Klarna. It replaced roughly 700 customer-service staff with an OpenAI-built assistant and bragged that AI handled two-thirds of all support tickets. Then quality and customer trust fell off a cliff, and the CEO admitted the company had “gone too far.” Klarna is now hiring humans back. The lesson every analyst drew from it is the same: AI should augment people, not replace them.

This is the argument I made in AI Must Make Superhumans, Not Unemployed. As Jensen Huang put it, companies with imagination use AI to do more with more; companies out of ideas just use it to do the same with fewer. Firing your way to an “AI strategy” throws away the one thing the model doesn’t have, your people’s context: who the customers are, why the process exists, where the bodies are buried. Pair that human context with AI and you get something neither can do alone. Strip it out and you’re left with a faster way to produce confident, unaccountable mistakes.

To be fair, this doesn’t mean headcount never legitimately changes. Roles do shift, and some genuinely shrink as work gets automated, and that can be the right call. The mistake is making that call on a bet about what AI might do, before you’ve shown it can, and throwing away your people’s hard-won context in the bargain.

A real strategy here is explicit about it. Decide, out loud, that AI is there to multiply your people’s output, not to thin the ranks. Redeploy the time AI frees up toward higher-value work instead of treating it purely as a cost to extract. Keep humans in the loop on anything that touches customers, money, or judgment. And be deeply suspicious of any “we replaced the team with agents” plan that hasn’t priced in the rehiring, the lost trust, and the institutional knowledge walking out the door.

Hole #4: No Data Foundation

Every one of the holes above sits on top of this one, and it’s the one nobody wants to talk about because it isn’t shiny.

AI runs on your data, and most companies’ data is a mess. According to a 2026 Cloudera and Harvard Business Review Analytic Services report, only 7% of enterprises say their data is completely ready for AI, and other research puts it more bluntly: roughly 93% don’t have AI-ready data, and only about 30% have adequate data governance. Nearly 80% of organizations say data-access problems are actively holding their AI back.

This is why so much AI never makes it out of the lab. Somewhere around 80% of AI projects fail to reach production, about twice the failure rate of ordinary IT projects, and Gartner expects 60% of AI projects that lack AI-ready data to be abandoned through 2026. The model is almost never the problem. The data feeding it is: fragmented across systems, undocumented, ungoverned, full of duplicates and gaps, and impossible to trace.

There’s a security dimension too, and it’s the one that bites quietly. If you don’t know where your sensitive data lives, you can’t keep it out of the prompts. Every Shadow AI leak and every over-permissioned agent in the later holes is, underneath, a data-governance failure. You can’t protect what you haven’t classified.

A data foundation isn’t glamorous, but it’s the work that makes everything else pay off. Know what data you have and classify it by sensitivity. Fix ownership, quality, and lineage so you can answer “where did this come from” for anything an AI touches. Put access controls and retention rules on it before you point a model at it. The companies getting real returns from AI mostly aren’t the ones with the cleverest models. They’re the ones that did this boring work first.

Hole #5: No AI Policies — Usage, Privacy, and the Missing Blacklist

This is the cheapest hole to close and the one left open most often.

The numbers aren’t encouraging. Only 38% of US companies have published an AI policy at all. Close to a third have no AI governance policy whatsoever, with another quarter still “implementing” one. 78% of executives are not strongly confident they could pass an independent AI governance audit within 90 days (Grant Thornton, 2026). On the security side it’s worse: per Salesforce’s 2026 data, 67% of employees already use AI at work but only 18% of organizations have a formal AI security policy.

A real policy framework is not a one-page “please be responsible” memo. It’s a handful of documents people can actually be held to:

  • An acceptable use policy that says which tools are approved, for what, and under what conditions. Cursor for prototyping, fine. Pasting source code into a personal ChatGPT account, no.
  • A data and privacy policy that names the data classes that must never touch an AI system: customer PII, PHI, financials, secrets, anything regulated. This is what stops your customer records and source code from leaking into random tools.
  • An approved list and a blacklist. Almost everyone forgets the blacklist. You need an explicit, maintained list of prohibited tools and models, the unvetted consumer apps, the ones with hostile data-retention terms, the browser extensions that phone home, anything self-hosted with no authentication. A blacklist gives your DLP and proxy something concrete to block.
  • Vendor and model governance covering data residency, retention, the right to audit, and whether your data trains their model.
  • Incident and exception handling: how someone requests a new tool, and what happens when the rules get broken.

If you operate in or sell into Europe, a chunk of this is no longer optional. The EU AI Act is now partly in force: bans on certain practices and the AI-literacy duty have applied since February 2025, the rules for general-purpose AI models since August 2025, and a major compliance date lands on 2 August 2026, with fines reaching up to 7% of global turnover for the worst violations. The high-risk obligations were pushed back to late 2027 and 2028 under the Digital Omnibus, but “we’ll deal with it later” is not a plan when the literacy and transparency clocks are already running.

And it isn’t only Brussels. Member states are layering their own national laws on top. Spain, for example, approved its draft Organic Law for the Good Use and Governance of AI in May 2026, now working its way through parliament. It backs the EU rules with a domestic penalty regime (up to €35M or 7% of global turnover), a mandatory requirement to label deepfakes and AI-generated content, and a national supervisor, AESIA, that has held full sanctioning powers since August 2025 and runs a regulatory sandbox companies can apply to. The United States has no single federal statute but a fast-multiplying patchwork of state laws instead. The practical takeaway: “which AI laws apply to us, in every market we operate in?” is now a question your strategy has to answer, not a hypothetical to park for later.

Here’s the catch with policy on its own: 46% of shadow-AI users say they’d keep using their tools even if the company explicitly banned them. A policy that lives in a PDF nobody reads is theater. To matter, it has to be wired into proxies, DLP, SSO, and OAuth consent controls. Write the core policies, keep them short and specific, map every rule to a control that enforces it, maintain the blacklist as a living document, and give people a fast path to “yes”, because when approval takes three weeks, they route around you.

Hole #6: No Hardware Strategy — Local and Sovereign AI

Most “AI strategies” have the shape of an API. Everything runs on someone else’s GPUs, in someone else’s jurisdiction, under someone else’s terms. That’s fine for a demo. For regulated data, intellectual property, and geopolitical risk it’s a liability, and it means there is no infrastructure plan at all.

I learned this one the hard way. When Anthropic blocked Claude subscriptions in third-party agents earlier this year, my whole agent setup was suddenly hostage to a pricing decision I had no part in. The fix was to own more of my own stack. The same logic scales up: if your entire AI capability can be switched off or repriced by a vendor on a Friday afternoon, that’s not a strategy, it’s a dependency.

2026 is the year sovereign and local AI stopped being a niche concern, and the money makes that obvious. McKinsey now sizes sovereign AI as a market worth $500–600 billion by 2030. NVIDIA’s own sovereign-AI revenue more than tripled to over $30 billion in fiscal 2026. European spending on sovereign-cloud infrastructure is forecast around $12.6 billion this year, an 83% jump, on top of €20 billion earmarked for AI gigafactories under the broader €200 billion InvestAI push. Gartner even coined a word for the reverse migration, geopatriation: pulling data and workloads out of global public clouds and back into local or sovereign environments to manage regulatory and geopolitical risk.

The case for owning some of your own compute comes down to four things. Data residency and compliance get easier when the data never leaves your walls or your jurisdiction. Your prompts, fine-tunes, and proprietary models stay yours instead of sitting on a third party’s training set. Costs become predictable capex for steady, high-volume workloads, rather than per-token opex that climbs with usage. And you stop being one outage, price hike, or policy change away from losing your AI capability overnight.

There’s a sharp edge here, though. Doing this without a strategy is exactly how you create the Shadow AI mess in the next section. A research team that expenses a $4K NVIDIA DGX Spark, plugs it into the network, and runs Ollama bound to 0.0.0.0 with no authentication has not built sovereign AI. They’ve built an exposed attack surface. As of February 2026, researchers found more than 10,000 Ollama instances reachable from the open internet, one in four running a vulnerable version, plenty of them hosting private corporate models. Local AI done deliberately is an asset. Local AI done in the shadows is a breach waiting for its disclosure date.

So decide your tiers on purpose: which workloads can sit on public model-as-a-service, which need a sovereign or regional cloud, and which have to run on-prem, tied to how sensitive the data is. Plan for a long runway, because these migrations take three to four years, and the slow part is organizational, not technical. Route all AI hardware through procurement with IT approval, network segmentation, and a security scan before anything touches the network. And protect private models like the IP they are.

Hole #7: No Agentic Visibility — The Shadow AI You Can’t See

You can’t govern what you can’t see, and on agents most companies are working blind.

I went deep on the mechanics of this in The Shadow Twin Threats: When AI and Vibe Coding Go Rogue in Your Network, the convergence of unsanctioned AI infrastructure (Shadow AI) and unreviewed AI-built applications (Shadow Vibe Coding). The short version is invisible models chewing on your most sensitive data, unvetted apps full of flaws, and no audit trail to reconstruct any of it. Organizations with heavy Shadow AI usage face breach costs averaging $4.63 million, about $670K more per incident than those that keep it under control.

Put autonomous agents on top of that and the visibility problem gets much worse. According to Strata’s 2026 research on agent identity, roughly 80% of organizations running autonomous AI can’t tell you in real time what those systems are doing or who’s responsible for them. Only 21% keep a real-time inventory of active agents, and only 28% can trace an agent’s actions back to a human sponsor. Most still authenticate agents with shared API keys; just 22% treat them as distinct identities. And the gap I find most alarming: a large majority of executives feel confident their current policies cover unauthorized agent actions, while in the field more than half of deployed agents run with no security oversight or logging at all.

That last contrast is the whole problem in miniature. Leadership believes there’s a strategy. The network says otherwise. Gartner expects that by the end of 2027, more than 40% of agentic AI projects will be scrapped, often because the governance problems only surface after something has already broken in production.

It’s worth remembering what an agent actually is: software that takes actions on your behalf. It reads data, calls APIs, moves money, writes and ships code, and increasingly talks to other agents, usually with standing credentials and little supervision. An agent you can’t see, can’t inventory, and can’t trace to an owner is effectively an insider with system access and no manager.

The way out starts with discovery, not policy. Pull AI domains from your DNS and proxy logs, review OAuth app consents in Entra ID and Google Workspace, scan for exposed AI ports (11434 for Ollama, 1234 for LM Studio), and run an anonymous survey to find out what people are really using. Then build a live agent inventory where every agent has a distinct identity, an owner, scoped permissions, and logging, and retire the shared keys. Make every agent action traceable to a human sponsor, because audits and incident response depend on it. And apply least privilege and monitoring to these non-human identities exactly as you would to staff, because they are acting in your name.

“But Won’t All This Slow Us Down?”

This is the objection I hear most, usually from whoever is currently expensing AI tools on a credit card. It’s worth taking seriously, because the fear is real: governance can absolutely turn into a committee that says no to everything and ships nothing.

But the data points the other way. In Grant Thornton’s 2026 survey, the organizations with fully integrated, well-governed AI were the most confident they could pass an audit and were getting better returns, not worse. That’s not a coincidence. Governance is what lets you say yes quickly and safely, because there’s an approved-tools list, a data policy, and an owner who can make a call. The companies that feel “slowed down” by governance are usually the ones bolting it on after an incident, as cleanup, instead of building it in as a fast lane.

Speed and control aren’t opposites here. The Klarna reversal, the abandoned AI projects, the breach disclosures, those are what slow you down. A strategy is how you go fast without driving into a wall.

The Pattern: Adoption Without Strategy

Step back and the seven holes are really one failure in seven costumes:

What companies have What strategy requires
ChatGPT and Copilot licenses A named owner accountable for AI risk and ROI (CAIO)
Vendor promises In-house expertise, an AI Red Team that can verify them
Layoff press releases A workforce amplified by AI, not replaced by it
Data scattered across silos An AI-ready, governed data foundation
A “be responsible” memo Enforceable usage, privacy, and blacklist policies
Everything on someone else’s GPUs A deliberate local and sovereign infrastructure plan
Confidence that it’s “handled” Real-time visibility into every model and agent

The through-line is that roughly nine in ten companies have adopted AI while only about a quarter have built the governance to match. They bought the tool and skipped the strategy.

None of this argues against AI. If anything it argues the opposite. AI is too powerful and too deep into regulated work to keep running it the way most companies do now, improvised, unowned, unmonitored, and undocumented. The companies that win this decade won’t be the ones that adopted fastest. They’ll be the ones that governed it well enough to scale it safely.

Where to Start

Seven holes is a lot to stare at, so don’t try to fill them all at once. The order matters more than the speed.

  1. Name the owner. Nothing else gets sequenced until someone is accountable. Week one, not next quarter.
  2. Discover what you already have. Before you write a single policy, find the Shadow AI: query DNS and proxy logs, review OAuth consents, scan for exposed AI ports, and run an anonymous survey. You’re governing reality, not a wish.
  3. Write the policies and wire them to controls. Acceptable use, data and privacy, the blacklist. Short, specific, enforced, EU AI Act-aware if Europe is in scope.
  4. Fix the data foundation in parallel. Classify and govern the data your models will touch. This is slow, so start it early and let it run alongside everything else.
  5. Build the expertise and the literacy. A small red team, AI-aware security staff, and a literacy program for everyone else.
  6. Plan the infrastructure. Decide your public/sovereign/on-prem tiers and bring hardware procurement under control.
  7. Get agent visibility and keep it. A live inventory, distinct identities, traceability to a human. This never “finishes.”

And running underneath all of it: treat AI as a way to make your people superhuman, not redundant. That’s a posture, not a project, and it colors every decision above.

The Bottom Line

“We use ChatGPT” answers the wrong question. The real one is whether you can name who owns your AI, prove it’s making your people better instead of just fewer, and produce a live inventory of every model and agent on your network. If you can’t answer those, you don’t have an AI strategy. You have an AI subscription and a quietly growing pile of risk.

The good news is that none of these seven holes is exotic. They’re the unglamorous, doable work of governance, and the companies that do it are the ones still standing when the first wave of AI-governance incidents hits the headlines.

The app built in twenty minutes, the agent nobody inventoried, the team fired in favor of a bot that gets quietly rehired six months later, those are tomorrow’s cautionary tales. Strategy is what keeps your company out of the next one.

Further Reading:

Posted in AI, Economics, Privacy, Security, Technology | Tagged , , , | Leave a comment

Prompt Engineering for Secure Code (Part 7)

Vibe Coding Security Series

  1. What Is Vibe Coding Security? A Field Guide for 2026
  2. The OWASP Top 10 for Vibe-Coded Applications
  3. Anatomy of a Vibe Coding Breach: Lessons from 2026’s Worst Incidents
  4. The Dependency Trap: Supply Chain Risks in AI-Generated Code
  5. Authentication & Secrets: What AI Gets Wrong Every Time
  6. Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short
  7. Prompt Engineering for Secure Code (you are here)
  8. The Founder’s Security Checklist
  9. Securing the AI Coding Pipeline (coming soon)
  10. The Future of Vibe Coding Security (coming soon)

Read Time: 21 minutes

TL;DR

AI models already know how to write secure code — they identify 78.7% of their own vulnerabilities when asked to review. The problem is they don’t apply that knowledge by default. Five prompting strategies close the gap: role-setting, reverse prompting, threat-model-first prompting, negative constraints, and iterative repair. Targeted security prompts reduce vulnerabilities by up to 56%. This post covers what works, what doesn’t, and how to make security instructions permanent through instruction files.


The Gap Between What AI Knows and What AI Does

Here’s the most important finding in AI code security this year. An April 2026 study formally verified 3,500 code artifacts across seven LLMs using Z3 SMT solver. The results: 55.8% of artifacts contained at least one verified vulnerability. GPT-4o was worst at 62.4% vulnerable. Gemini 2.5 Flash was best at 48.4%. No model scored better than a D.

But the study had a second finding that changes everything. When the researchers asked the same models to review their own output for vulnerabilities, the models correctly identified the problems 78.7% of the time. The model that just wrote a SQL injection could explain why it was dangerous and how to fix it — when asked.

The researchers call this the “generation-review asymmetry.” I call it the gap between what AI knows and what AI does. The model has the security knowledge. It just doesn’t activate it during generation. Default prompts optimize for functionality — “build me a login page” gets you a login page that works. Whether it’s secure is a secondary concern the model doesn’t consider unless you tell it to.

This asymmetry is exactly what prompt engineering exploits. You’re not teaching the model something new. You’re activating knowledge it already has.

The baseline is bad. CodeRabbit’s analysis of 470 real-world pull requests found that AI-generated code has 2.74x higher vulnerability density than human-written code, with 1.4x more critical security issues. Veracode tested over 100 LLMs and found they fail to prevent XSS in 86% of test cases. By mid-2025, Apiiro’s analysis of thousands of repositories showed AI code adding over 10,000 new security findings per month — a 10x increase from six months earlier.

The gap is real. The question is whether prompting can close it.


Why “Write Secure Code” Doesn’t Work

The intuitive approach — adding “make sure the code is secure” to your prompt — doesn’t do much. A 2026 study ran chi-square tests on code generated with and without simple security prefixes and found no statistically significant improvement in several configurations. Worse, a weaknesses-aware Chain-of-Thought approach — where the prompt listed specific vulnerability types to avoid — failed to reduce vulnerabilities in any statistically significant way, and in some configurations the numbers actually went up. The researchers found that overloading the prompt with security concerns primarily shifted which vulnerability types appeared rather than reducing the total count, and can degrade the model’s ability to generate functional code, introducing bugs that create new attack surfaces.

Generic security instructions fail for the same reason generic coding instructions fail. “Write good code” produces the same output as no instruction at all. The model needs specifics: what threats apply to this feature, what patterns to avoid, what security controls to implement, and in what order.

Bruni et al. (February 2025) showed what happens when you get specific. Their benchmarks across GPT-3.5-turbo, GPT-4o, and GPT-4o-mini found that targeted security-focused prompt prefixes — ones that named specific vulnerability classes and described concrete defensive patterns — reduced vulnerabilities by up to 56%. Iterative prompting, where you feed vulnerability findings back to the model and ask it to repair its own output, fixed between 41.9% and 68.7% of issues.

The takeaway: specificity matters more than intent. “Be secure” does nothing. “This endpoint must validate that the authenticated user owns the requested resource before returning data, and must return 403 if ownership verification fails” changes the output.


Five Strategies That Work

These aren’t theoretical. I use variations of all five at VULNEX when working with AI coding tools, and the first two — role-setting and reverse prompting — are the backbone of how I approach every engagement.

Strategy 1: Role-Setting

Before asking an AI to write or review code, I set its role explicitly. Not a vague “you’re helpful” — a specific professional identity that activates domain expertise.

For code generation:

“You are a senior developer with years of experience building secure products. You follow security best practices by default: input validation, parameterized queries, proper authentication and authorization checks, secure secret management, and defense in depth.”

For security review:

“You are a senior pentester and cybersecurity expert. Your job is to find every vulnerability, misconfiguration, and security weakness in this code. Think like an attacker. Report what you find with severity ratings and remediation guidance.”

The key is one role per task. When building, the model thinks like a security-conscious developer. When reviewing, it thinks like an attacker. Mixing the two dilutes both. A developer worrying about attacks while writing code produces defensive but brittle implementations. An attacker reviewing code while thinking about functionality misses vulnerabilities that conflict with feature requirements.

Role-setting works because LLMs adjust their output distribution based on the persona they’re given. A “senior pentester” prompt activates patterns the model learned from security research, vulnerability reports, and penetration testing documentation. A “junior developer” prompt — or no role at all — activates patterns from Stack Overflow answers and tutorial code, which is where most insecure defaults come from.

Strategy 2: Reverse Prompting

Most people use AI coding tools in one direction: “Build me X.” Reverse prompting flips it. Instead of telling the model what to build, you ask it questions — and you do it in both directions.

Before writing code, I interrogate the model about the problem space:

“I need to build a multi-tenant API where users can only access their own data. Before writing any code: what are the top security risks for this kind of system? What authentication and authorization model should I use? What are the common mistakes developers make with multi-tenant data isolation?”

The model’s answers are often excellent — remember, it identifies 78.7% of vulnerabilities in review mode. By asking it to think about threats before generating code, you front-load that security knowledge into the generation context. The code it writes afterward is informed by the threat analysis it just produced.

After generating code, I question the output:

“Review the code you just wrote. What vulnerabilities does it have? How would an attacker bypass the authentication? What edge cases could lead to data leakage? What’s missing from this implementation that a production system would need?”

This exploits the generation-review asymmetry directly. The model generated code with some security blind spots. Now you’re asking it to activate review mode on its own output. It will flag issues it just introduced — not all of them, but a substantial percentage.

The two-direction approach creates a feedback loop. Pre-code questions shape the model’s understanding of what matters. Post-code questions catch what slipped through. Together, they narrow the gap between what the model knows and what it produces.

Strategy 3: Threat-Model-First Prompting

This builds on reverse prompting but makes the threat model explicit in the code request itself. Instead of asking the model to generate a feature and hoping it considers security, you describe the threat landscape as part of the prompt.

Without threat context:

“Build a REST API endpoint that lets users update their profile information.”

With threat context:

“Build a REST API endpoint that lets users update their profile information. This is a multi-tenant SaaS application. Assume attackers will attempt: IDOR (accessing other users’ profiles by changing the user ID), privilege escalation (modifying role or permission fields), mass assignment (sending fields the API shouldn’t accept like isAdmin), and injection through profile fields displayed to other users. The endpoint must validate ownership, whitelist allowed fields, sanitize all input, and log modification attempts.”

The same model, the same task — but the second prompt produces code with authorization checks, field whitelisting, input sanitization, and audit logging that the first prompt almost certainly omits. The model didn’t learn anything new between the two prompts. The threat context activated security patterns it already had.

For the vulnerability classes I covered throughout this series — the missing auth checks from Part 5, the architectural blind spots from Part 6 — threat-model-first prompting is the most direct prevention. You’re telling the model exactly what can go wrong before it writes a single line.

Strategy 4: Negative Constraint Prompting

AI models follow prohibitions more consistently than open-ended guidance. “Be secure” is vague. “Do NOT do these specific things” is concrete and verifiable.

“Build the authentication system for this Express.js application. Constraints:

  • Do NOT store tokens in localStorage (use httpOnly cookies)
  • Do NOT use MD5 or SHA-1 for password hashing (use bcrypt with cost factor 12+)
  • Do NOT skip server-side input validation even if client-side validation exists
  • Do NOT hardcode API keys, database credentials, or secrets anywhere in the code
  • Do NOT set CORS to allow all origins
  • Do NOT disable Supabase RLS or Firebase security rules
  • Do NOT create JWT tokens without an expiration time”

This works because constraints are binary — the model either followed them or it didn’t. You can verify compliance mechanically. And the constraints directly target the patterns I’ve documented across this series: the localStorage tokens from Part 5, the missing RLS from the QuickNote example, the hardcoded secrets that SAST can’t always catch.

Build your constraint list from your own vulnerability history. Every security issue you’ve found in AI-generated code becomes a “Do NOT” for future prompts. Over time, your constraint list becomes a negative-space security policy — the inverse image of every mistake the AI has made.

Strategy 5: Iterative Repair Prompting

This is the only strategy with direct benchmarks. Bruni et al. tested generating code, scanning it, feeding the scan results back to the model, and asking for repairs. The best configurations repaired between 41.9% and 68.7% of vulnerabilities.

The practical workflow:

  1. Generate code with your chosen AI tool
  2. Run Semgrep: semgrep --config=p/security-audit --json ./src > findings.json
  3. Feed the findings back: “Here are the Semgrep security findings for the code you just wrote. Fix each issue. For each fix, explain what the vulnerability was and why your fix resolves it.”
  4. Run Semgrep again on the output
  5. Repeat until clean or diminishing returns

Combining this with role-setting amplifies the effect. Instead of “fix these findings,” try: “You are a senior security engineer. Here are the Semgrep findings from a code review. For each finding, determine if it’s a true positive or false positive. For true positives, provide the fix. For false positives, explain why the alert is incorrect.”

The false positive distinction matters. As I covered in Part 6, SAST tools flag 68–75% of safe code as vulnerable. Having the model filter the noise before acting on it produces better repairs than blindly fixing every alert.


Making It Permanent: Instruction Files

The five strategies above work in conversation. But nobody re-types a threat model and constraint list for every prompt. The practical answer is instruction files — permanent security prompts that apply to every interaction with your AI coding tool.

Claude Code

Claude Code supports a security guidance plugin that reviews code at three levels: per-edit pattern matching (no model call, zero cost), end-of-turn diff review, and a deeper agentic review on each commit. You configure it through a .claude/claude-security-guidance.md file that describes your threat model in plain language. The plugin catches injection, unsafe deserialization, and DOM vulnerabilities before they reach a pull request — the reviewer runs as a separate model call with a fresh context, so it’s not grading its own work.

Beyond the plugin, Claude Code reads project-level instructions from CLAUDE.md files. You can embed your role-setting, constraints, and threat model directly:

# Security Requirements

You are a senior developer building a multi-tenant SaaS application.
Every API endpoint MUST:
- Verify authentication (valid JWT with expiration check)
- Verify authorization (user owns the requested resource)
- Validate and sanitize all input
- Return 403 for unauthorized access, not 404
- Log access attempts for security-sensitive operations

Do NOT:
- Store secrets in environment variables baked into Docker images
- Use localStorage for authentication tokens
- Disable RLS on any Supabase table
- Create endpoints without rate limiting

GitHub Copilot

Copilot reads from copilot-instructions.md in the .github directory, with support for path-scoped *.instructions.md files. The community has built OWASP-aligned rulesets with 55+ anti-patterns and “Do Not Suggest” blocklists covering eval(), inline SQL, insecure deserialization, and more. The github/awesome-copilot repository has a ready-to-use template.

Cross-Tool Security Rules

SecureCodeWarrior publishes open-source security rule files compatible with Copilot, Cursor, Windsurf, and other AI assistants. Robotti.io maintains customizable rulesets for Java, Node.js, C#, and Python that block risky patterns at the IDE level. Trail of Bits published Claude Code skills for security workflows including CodeQL and SARIF integration.

The practical step: pick the instruction file format for your primary AI coding tool, start with one of the open-source security rulesets, and customize it with your own constraints. Every “Do NOT” from Strategy 4 belongs in this file. Every lesson from a security review becomes a permanent instruction.


The Attack Surface You Just Created

Instruction files are powerful, which makes them a target. If someone can modify your instruction file, they control what the AI generates for your entire project.

The Rules File Backdoor attack (CVE-2025-53773), disclosed by Pillar Security in March 2025, demonstrated exactly this. Researchers embedded hidden Unicode characters — bidirectional text markers and zero-width joiners — inside Copilot and Cursor configuration files. These invisible characters contained instructions that manipulated the AI’s code generation: injecting backdoors, disabling security checks, exfiltrating data through generated code. The configuration file looked clean to human reviewers. The AI read the hidden instructions and followed them.

Trail of Bits demonstrated prompt injection attacks achieving remote code execution in three agent platforms. VentureBeat reported in 2026 that three AI coding agents leaked secrets through a single prompt injection. The attack surface isn’t theoretical.

The defense is straightforward: treat instruction files like code. Review them in pull requests. Audit them for hidden characters (cat -v shows control characters, file shows unusual encodings). Pin them under version control. Don’t accept instruction files from untrusted sources — a shared project template with a poisoned .github/copilot-instructions.md is the software supply chain attack adapted for the AI era.


Putting It Together: A Complete Workflow

The five strategies aren’t five separate techniques — they’re stages in a pipeline. Here’s how I approach it at VULNEX when building or reviewing AI-generated code.

Step 1: Set the role. Before anything else, establish the LLM’s identity. For building: senior developer with security expertise. For reviewing: senior pentester.

Step 2: Reverse-prompt the problem. Before writing code, ask the model about the security landscape. “What are the top risks for this feature?” “What authentication model fits this use case?” “What mistakes do developers typically make here?” Use the answers to inform your code request.

Visualizing the threat model. You can take Step 2 further by asking the model to produce a formal threat model you can render as a diagram. At VULNEX we built usecvislib, an open-source security visualization library that generates STRIDE threat models, attack trees, and other security diagrams from TOML configuration files. The prompt becomes:

“Based on the security risks you identified, generate a STRIDE threat model for this application in usecvislib TOML format. Include externals, processes, datastores, dataflows, trust boundaries, and threats with CVSS 3.1 vectors.”

The model produces something like this (trimmed for brevity):

[model]
name = "QuickNote Threat Model"
description = "STRIDE threat model for note-taking SaaS"
type = "Threat Model"

[externals.user]
label = "User"
description = "Authenticated app user"

[externals.attacker]
label = "Attacker"
description = "Unauthenticated malicious actor"

[processes.api_server]
label = "API Server"
description = "Express.js REST API"

[processes.auth_service]
label = "Auth Service"
description = "Supabase Auth"

[datastores.postgres]
label = "PostgreSQL"
description = "Supabase DB with RLS policies"

[dataflows.login]
from = "user"
to = "api_server"
label = "Login Request"

[dataflows.note_query]
from = "api_server"
to = "postgres"
label = "Note Query"

[boundaries.internet]
label = "Internet"
elements = ["user", "attacker"]

[boundaries.backend]
label = "Backend Services"
elements = ["auth_service", "postgres"]

[threats.brute_force]
element = "api_server"
threat = "No rate limiting on /api/login enables brute force"
mitigation = "Rate limit to 5 attempts/minute per IP"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N"

[threats.idor_notes]
element = "note_query"
threat = "User modifies note ID to access other users' data"
mitigation = "Verify resource ownership before returning data"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:N"

[threats.token_theft]
element = "login"
threat = "localStorage token accessible to injected scripts"
mitigation = "Store tokens in httpOnly secure cookies"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:H/I:N/A:N"

[threats.disabled_rls]
element = "postgres"
threat = "RLS policies disabled, no row-level access control"
mitigation = "Enable RLS, test policies with different tenant contexts"
cvss_vector = "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"

Then render it: usecvis -m 1 -i quicknote_threat.toml -o quicknote_threats -f png -r. You get a data flow diagram with trust boundaries, CVSS-scored threats, and color-coded severity — a visual artifact that makes security risks concrete for the whole team:

quicknote_threat_model

The -r flag also generates a written threat report. The threats the model identified in this diagram become the exact constraints you feed into the next step.

Step 3: Write the prompt with threat context and constraints. Combine threat-model-first prompting with negative constraints. Describe what you’re building, what threats apply, and what the code must not do.

Step 4: Reverse-prompt the output. After the model generates code, switch to review mode. “What vulnerabilities does this have?” “How would you bypass this auth check?” “What’s missing?” Feed the model’s own critique back into the next iteration.

Step 5: Run automated scans and iterate. Semgrep, npm audit, the pipeline from Part 6. Feed findings back to the model with a security engineer role. Repair, re-scan, repeat.

Step 6: Encode lessons as permanent instructions. Every vulnerability you find — through reverse prompting, automated scanning, or manual review — becomes a constraint in your instruction file. The instruction file grows with every project, capturing your team’s security knowledge in a form the AI applies automatically.

To make this concrete, here’s a before/after using the login endpoint from QuickNote (Part 5).

Naive prompt:

“Build a login endpoint for my Express.js app with Supabase.”

This is what produced the QuickNote vulnerabilities: no rate limiting, no token expiration, credentials in environment variables baked into the Docker image, RLS disabled. Here’s a representative output:

// Naive prompt output — typical AI-generated login
app.post('/api/login', async (req, res) => {
  const { email, password } = req.body;
  const { data, error } = await supabase.auth.signInWithPassword({
    email, password
  });
  if (error) return res.status(401).json({ error: 'Invalid credentials' });
  res.json({ token: data.session.access_token, user: data.user });
});

No rate limiting — an attacker can try thousands of passwords per minute. The token goes straight to the response body, where the frontend stores it in localStorage (accessible to any XSS). No input validation. No token expiration check. The Supabase credentials are baked into the Docker image at build time. This is what the model generates by default.

Security-aware prompt (using the pipeline):

“You are a senior developer with years of experience building secure authentication systems. Build a login endpoint for an Express.js app using Supabase. This endpoint will be publicly accessible on the internet. Assume attackers will attempt brute-force attacks, credential stuffing, and token theft. The endpoint MUST: rate-limit login attempts (max 5 per minute per IP), use bcrypt for password verification, return JWT tokens with a 1-hour expiration set via expiresIn, store tokens in httpOnly secure cookies (not localStorage), validate all input server-side. Do NOT hardcode any secrets — use environment variables loaded at runtime only. Do NOT disable Supabase RLS on any table. After generating the code, list any remaining security concerns I should address.”

Same model, same task — different output:

// Security-aware prompt output — same model, same task
const loginLimiter = rateLimit({
  windowMs: 60 * 1000, max: 5,
  message: { error: 'Too many login attempts, try again later' }
});

app.post('/api/login', loginLimiter, async (req, res) => {
  const { email, password } = req.body;

  // Server-side input validation
  if (!email || typeof email !== 'string' || !email.includes('@')) {
    return res.status(400).json({ error: 'Valid email required' });
  }
  if (!password || typeof password !== 'string' || password.length < 8) {
    return res.status(400).json({ error: 'Invalid password format' });
  }

  const { data, error } = await supabase.auth.signInWithPassword({
    email: email.trim().toLowerCase(),
    password
  });
  if (error) {
    logger.warn('Failed login attempt', { email, ip: req.ip });
    return res.status(401).json({ error: 'Invalid credentials' });
  }

  // Token in httpOnly cookie, not response body
  res.cookie('session', data.session.access_token, {
    httpOnly: true, secure: true, sameSite: 'strict',
    maxAge: 3600000 // 1 hour
  });
  res.json({ user: { id: data.user.id, email: data.user.email } });
});

Rate limiting. Input validation. Token in an httpOnly cookie, not the response body. Failed attempts logged. Email normalized. The model didn’t learn anything new between the two prompts — the security-aware prompt activated what it already knew.


The Prompt Engineering Checklist

  1. Set a specific professional role before every code generation or review task — “senior developer” for building, “senior pentester” for reviewing
  2. Reverse-prompt before coding: ask the model to identify security risks, recommend auth models, and flag common mistakes for your specific feature
  3. Include threat context in every code request: name the threats (IDOR, XSS, injection, brute force) and specify the attack surface (public API, multi-tenant, handles payments)
  4. Add negative constraints for your stack’s known pitfalls: “Do NOT use localStorage for tokens,” “Do NOT disable RLS,” “Do NOT skip server-side validation”
  5. Reverse-prompt after code generation: ask the model to review its own output as a pentester and list what’s missing or vulnerable
  6. Run Semgrep and feed findings back with a security engineer role — don’t just say “fix these,” ask it to distinguish true positives from false positives
  7. Create an instruction file (.claude/claude-security-guidance.md, .github/copilot-instructions.md, or equivalent) with your permanent security constraints
  8. Start with an open-source security ruleset (SecureCodeWarrior, Robotti.io, Trail of Bits skills) and customize it
  9. Audit instruction files for hidden characters and treat them as security-critical code in version control
  10. Add every vulnerability you discover to your constraint list — your instruction file should grow with every project and every security review

If You Do Nothing Else

Ten checklist items and a six-step pipeline can feel like a lot when you’re a solo founder shipping a feature at midnight. Here’s the minimum: set a role and add three constraints.

“You are a senior developer building a secure web application. Build [your feature]. Do NOT store tokens in localStorage. Do NOT skip server-side input validation. Do NOT hardcode secrets.”

That’s it. One sentence of role-setting plus three “Do NOT” constraints tailored to your stack. It takes ten seconds to type and covers the vulnerabilities I see most often in vibe-coded apps. Add the reverse-prompt step when you have time — ask the model to review its own output as a pentester. Those two moves alone close a surprising amount of the gap.

On prompt length: there’s a point of diminishing returns. The Kharma study showed that overloading a prompt with security concerns can degrade functional code quality — the model tries to satisfy too many constraints at once and introduces logic bugs. In practice, I keep security prompts under a paragraph for individual code requests. If you need more than five or six constraints, that’s a sign to move them into an instruction file where they apply automatically rather than cramming them into every prompt.


What You Should Take From This

Prompt engineering for security isn’t about tricking the model into being careful. It’s about activating knowledge the model already has. The generation-review asymmetry — 55.8% vulnerable output, 78.7% detection in review — tells us the security knowledge is there. The default prompt just doesn’t ask for it.

The five strategies in this post close that gap from different angles. Role-setting activates domain expertise. Reverse prompting forces the model to think about threats before and after generation. Threat-model-first prompting gives the model the context it needs to make secure architectural decisions. Negative constraints prevent the specific mistakes you’ve seen before. Iterative repair catches what slipped through.

None of this replaces the manual review I described in Part 6. A well-prompted model still misses roughly 20% of its own vulnerabilities in review mode, and architectural issues like broken authorization logic require human judgment. But a well-prompted model produces code that’s measurably safer — up to 56% fewer vulnerabilities — and that narrows the gap the manual review needs to cover.

My workflow at VULNEX: role first, questions second, code with constraints third, review fourth, scan fifth, and encode everything I learn into instruction files that make the next project start from a stronger baseline. The instruction file is the compound interest of security knowledge — every engagement makes the next one more secure by default.

As always: trust nothing, verify everything.


Further Reading


References

Posted in AI, Pentest, Security, Technology | Tagged , , , , | Leave a comment