
Vibe Coding Security: Why 62% Of AI-Generated Code Ships With Vulnerabilities


TL;DR

  • AI models prioritize “making it work” by mimicking training data, often defaulting to insecure string concatenation and legacy patterns that bypass modern safety protocols.
  • Developers shipping prompt-generated code without a deep technical understanding miss non-functional requirements like Row Level Security (RLS) and input validation, leading to massive data exposures.
  • Fragmented SAST and DAST tools fail to keep pace with AI. The OX Platform serves as a Unified Control Plane to eliminate the blind spots between AI coding and runtime.
  • Models frequently insert unverified dependencies and hardcoded secrets from their training sets, introducing exploitable CVEs and credentials directly into the application core.
  • Static analysis cannot verify if database access policies or cloud storage permissions are active; you must simulate adversarial attacks to confirm the system actually resists unauthorized access.
  • To maintain velocity without catastrophe, security context must be embedded into the prompting workflow, preventing vulnerable patterns before the AI ever outputs a single line of code.

A production application can pass every functional test, deploy successfully to staging, and serve thousands of users before anyone notices the authentication bypass sitting in the login flow. The code works. Database queries execute within milliseconds. API responses return well within SLA thresholds. But the Row Level Security policies designed to restrict unauthorized data access were never configured on the database.

The Moltbook breach made headlines in January 2026 when its founder, who built an AI social network using only AI coding tools without writing a single line of code himself, exposed 1.5 million API authentication tokens plus 35,000 email addresses within 72 hours of launch. The vulnerability wasn’t sophisticated. Any experienced engineer conducting a standard code review would have caught the configuration gap. But no code review happened because the entire application emerged from conversational prompts fed to AI coding assistants.

Analyst data points to the scale of the problem. Gartner warns that by 2028, prompt-to-app approaches adopted by citizen developers will increase software defects by 2500%, triggering a software quality and reliability crisis. Meanwhile, a Gartner survey of 175 employees conducted between May and November 2025 found that over 57% use personal GenAI accounts for work purposes and 33% admit to inputting sensitive information into unapproved tools.

AI coding tools have collapsed the timeline from concept to deployed code. Security validation hasn’t compressed along with it. The disconnect creates a class of risk where applications function perfectly but remain fundamentally insecure. We’ll examine the specific security failures appearing in AI-generated code, why traditional security tooling misses them, and where the actual exposure lives in production systems.

What Makes Vibe Coding Different From Traditional Development

The development model has shifted from structured engineering to conversational iteration, and security assumptions built over decades no longer hold.

The Conversational Development Loop

Vibe coding replaces architectural planning with natural language prompts. A developer describes desired functionality, the AI generates working code, and iteration continues until observed behavior matches expectations. No security requirements gathering occurs. No threat modeling session happens. Structured code review becomes optional rather than mandatory.

The AI model optimizes exclusively for functional correctness. Security properties like input validation, proper authentication, and access control enforcement count as non-functional requirements. They don’t affect whether a feature demonstrates the requested behavior during basic testing. A file upload endpoint can successfully store files while accepting executable payloads. An API can return correct data while leaking records belonging to other users through parameter manipulation.

Engineers writing code by hand understand what each function does, why it exists, and how components interact across the system. When you write a database query manually, you know parameterization prevents injection attacks. When you implement authentication, you understand tokens need validation and sessions require secure handling.

Why Functional Correctness Doesn’t Equal Security

[Figure: AI-generated code passing functional tests while hiding security risks such as missing RLS policies and hardcoded keys]

A login endpoint can authenticate legitimate users successfully while remaining vulnerable to SQL injection. Testing with valid credentials confirms the happy path works. Nobody tests with malicious input unless security verification is explicitly part of the development process.

Traditional QA teams catch functional defects. They verify features behave as specified. Security defects require adversarial thinking. Someone needs to ask: what happens when input is crafted to exploit rather than use? When development happens through AI-generated suggestions accepted without deep comprehension, adversarial testing rarely occurs.

The Comprehension Gap Between Vibe Coding And Traditional Development

AI-assisted developers often accept generated code without understanding implementation details. The code passes functional tests using normal input patterns, so deployment proceeds. Security properties that experienced developers enforce through habit and training never get implemented because the developer accepting the code lacks the expertise to recognize what’s missing.

Vibe coding creates a knowledge barrier between the person deploying code and the security implications of how it’s written. When you don’t understand how authentication middleware validates tokens, you can’t evaluate whether the AI-generated implementation is secure. When you can’t trace data flow through the application, you can’t assess whether access controls actually restrict unauthorized queries.

The Vulnerability Landscape In AI-Generated Code

Multiple independent research efforts have documented consistent security failure patterns across AI coding platforms.

Academic Research Shows Consistent Failure Patterns

Georgetown’s CSET found that 86% of AI-generated code failed XSS defense mechanisms. The failure rate isn’t marginal. Cross-site scripting vulnerabilities appear in nearly nine out of ten code samples. CodeRabbit’s analysis showed AI-generated code contains 2.74 times more cross-site scripting vulnerabilities than human-written code.

The Escape.tech study scanned 5,600 applications built with vibe coding tools and found over 2,000 vulnerabilities, 400+ exposed secrets (API keys, credentials, tokens), and 175 instances of exposed PII. These numbers represent exploitable vulnerabilities in production applications, not theoretical risks flagged by overly sensitive static analysis.

A separate study by Tenzai in December 2025 tested 15 applications across five major AI coding platforms including Cursor, Claude Code, Replit, Devin, and OpenAI Codex. Results: every single tool introduced Server-Side Request Forgery vulnerabilities. Zero of the 15 applications implemented CSRF protection. Zero set any security headers.

Research from Carnegie Mellon University found that while 61% of AI-generated code functions correctly, only 10.5% passes security review. Put differently, fewer than 11 out of every 100 AI-generated code snippets meet basic security standards. The rest contain exploitable vulnerabilities or fail to follow established security practices.

Why AI Models Generate Insecure Code

[Figure: how AI coding models replicate insecure code patterns and introduce vulnerabilities such as XSS and weak encryption]

AI coding models train on public repositories scraped from GitHub, GitLab, and similar platforms. These repositories contain decades of insecure code patterns, deprecated cryptographic libraries, and example code explicitly marked as unsuitable for production use. When an AI generates an authentication flow, statistical likelihood favors reproducing patterns observed during training, including insecure implementations.

Models lack understanding of context, risk, or business logic. They generate patterns based on training data, which contains insecure code, outdated practices, and ambiguous logic. An AI model doesn’t understand MD5 is cryptographically broken. The model knows MD5 appears frequently in hashing examples throughout its training corpus, so MD5 gets generated when hashing is requested.

Veracode tested whether AI models would choose secure versus insecure implementation methods when both options were available. The models selected insecure options 45% of the time. Nearly half of all implementation decisions favor insecurity when the model faces a choice.

Injection Vulnerabilities Remain The Most Common Class

When a model produces database queries, command executions, or client-side script handling, it often defaults to simple string concatenation. String concatenation appears most frequently in training data. Parameterized queries require more complexity, so models generate the simpler, insecure approach first.
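The difference between the two query styles can be shown with a minimal sketch. It uses Python's standard `sqlite3` module; the `users` table, the sample rows, and the function names are hypothetical and exist only for illustration (the plaintext passwords are a simplification, not a recommendation).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory table with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def login_concatenated(conn, name, password):
    # Insecure: user input becomes part of the SQL text itself.
    query = (
        "SELECT name FROM users WHERE name = '" + name
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchall()


def login_parameterized(conn, name, password):
    # Safer: placeholders keep user input as data, never as SQL syntax.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(login_concatenated(conn, "alice", payload))   # matches every row
    print(login_parameterized(conn, "alice", payload))  # matches nothing
```

Both functions behave identically for a valid login, which is why a happy-path test cannot tell them apart.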

SQL injection, command injection, and XSS share the same root cause. The AI generates code that works with trusted input but breaks catastrophically under malicious input. Testing uses valid input exclusively, so vulnerabilities remain invisible until exploitation occurs in production.

The pattern repeats across injection classes. Database queries use string building instead of prepared statements. Shell commands concatenate user input directly into execution strings. HTML rendering inserts variables without encoding. Each follows the statistical distribution of the training data rather than security best practices.
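The same contrast applies to HTML rendering and shell commands. The sketch below relies only on the Python standard library (`html.escape` and `subprocess.run` with an argument list); the function names and the sample payloads are assumptions made for illustration.

```python
import html
import subprocess
import sys


def render_comment_unsafe(comment: str) -> str:
    # Insecure: the variable is inserted into markup without encoding.
    return "<p>" + comment + "</p>"


def render_comment_escaped(comment: str) -> str:
    # html.escape encodes <, >, & and quotes so input renders as text.
    return "<p>" + html.escape(comment) + "</p>"


def echo_argument(user_input: str) -> str:
    # An argument list with no shell: metacharacters such as ';' are
    # delivered to the child process as literal text, not interpreted.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    print(render_comment_unsafe("<script>alert(1)</script>"))
    print(render_comment_escaped("<script>alert(1)</script>"))
    print(echo_argument("hello; echo injected"))
```

The insecure shell equivalent would build one string and run it with `shell=True`, at which point the text after `;` runs as a second command.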

Cryptographic Mistakes From Outdated Training Data

AI-generated code might suggest weak or outdated algorithms such as MD5 or SHA1, hardcode encryption keys, use predictable random number generators, or misconfigure encryption modes. These aren’t random errors. They’re patterns learned from legacy codebases where such practices were once industry standard.

The model has zero awareness that security standards evolved. It generates what statistically matches training distribution. When encryption is requested, the model produces code similar to examples it observed during training. If those examples used ECB mode or hardcoded initialization vectors, the generated code will too.
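For password storage specifically, the gap between the legacy pattern and a current one can be sketched as follows. This uses only the Python standard library. The iteration count is an assumption based on commonly cited guidance for PBKDF2-HMAC-SHA256 and should be checked against current recommendations; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Assumed work factor; tune against current guidance and your hardware.
PBKDF2_ITERATIONS = 600_000


def legacy_md5(password: str) -> str:
    # Legacy pattern: fast and unsalted, so identical passwords always
    # produce identical hashes and are cheap to brute-force.
    return hashlib.md5(password.encode()).hexdigest()


def hash_password(password: str) -> tuple[bytes, bytes]:
    # A random per-password salt from the OS CSPRNG, not random.random().
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, PBKDF2_ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt, PBKDF2_ITERATIONS
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)
```

A dedicated password-hashing library may be a better fit in production; the point here is only the shape of the difference: salted, deliberately slow, and compared in constant time.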

Research examining cryptographic implementations in AI-generated code found security-focused prompts actually produced the highest percentage of cryptographic errors. Explicitly requesting secure implementations doesn’t prevent the model from generating weak cryptography if weak patterns dominate its training data.

Secrets Exposure Through Pattern Reproduction

AI models are prone to generating code containing secrets because models replicate patterns found in training data, such as example API keys, environment variables, database passwords, or authentication tokens. The model doesn’t comprehend that these values are sensitive. Pattern recognition sees API integration code includes a key format, so similar code gets generated.

GitHub’s secret scanning reports millions of leaked credentials annually. AI-generated code contributes measurably to this problem. The model might directly insert what resembles a real API key because the prompt described an integration. Configuration files get created with passwords. Placeholder values look identical to actual credentials.

Secrets can leak through multiple pathways. Direct insertion into source files. Configuration templates with example values too realistic to distinguish from production credentials. Environment variable definitions with sample secrets. Each happens because the model reproduces patterns without understanding sensitivity.
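A rough sketch of what detecting these pathways looks like, and of the environment-variable alternative, is shown below. The two regular expressions are deliberately simple illustrations and are far less thorough than a dedicated secret scanner; the variable name `PAYMENTS_API_KEY` is hypothetical.

```python
import os
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_assignment": re.compile(
        r"""(?i)(api_key|secret|password|token)\w*\s*=\s*["'][^"']{8,}["']"""
    ),
}


def find_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for each suspicious line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


def load_api_key(env_var: str = "PAYMENTS_API_KEY") -> str:
    # Read the secret at runtime instead of committing it to source.
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(f"{env_var} is not set")
    return value
```

Note that a scanner of this kind cannot tell a realistic placeholder from a live credential, which is exactly the ambiguity described above.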

Supply Chain Risk From Automatic Dependency Insertion

AI coding assistants frequently recommend dependencies or automatically insert import statements, introducing supply chain risks because developers may not notice when the AI adds new libraries to the codebase. These dependencies bring transitive packages potentially containing known CVEs.

AI models trained on repositories using older package versions might recommend libraries with publicly disclosed vulnerabilities. The model doesn’t check CVE databases. Code generation matches statistical patterns from training data. If outdated packages appear frequently in the corpus, they’ll get recommended despite known security flaws.

Dependencies also introduce license risk and unmaintained code. The AI might suggest a package abandoned years ago because it appeared in enough training examples. Supply chain attacks targeting specific packages become amplified when AI tools recommend vulnerable versions across thousands of developers.

Where Traditional Security Tools Fail

[Figure: AI coding speed outpacing traditional security testing and creating unvalidated production risks]

Security tooling built for human-authored code makes assumptions violated by AI-generated development workflows.

SAST And DAST Weren’t Designed For Conversational Development

Static Application Security Testing scans code after authoring completes. Dynamic Application Security Testing tests applications already running. Both assume code moves through structured review gates where security teams intervene before production deployment.

Vibe coding eliminates these gates entirely. The conversational loop between developer and AI happens faster than weekly SAST scans can execute. By the time traditional tooling flags an issue, code often already runs in production. The rapid, conversational nature of vibe coding bypasses pre-production requirements. SAST tools are perceived as too slow for the vibe coding loop, while SCA checks are missed entirely, allowing vulnerable dependencies introduced by AI to reach production.

Traditional scanning assumes deliberate code structure. Human developers organize functions logically, name variables meaningfully, and structure modules coherently. AI-generated code might function perfectly while organized in ways that confuse pattern-matching scanners. The scanner looks for specific vulnerability signatures that don’t match AI-generated patterns.

SCA Can’t Catch What It Can’t See

Software Composition Analysis tracks known vulnerabilities in declared dependencies. The approach works when developers explicitly add packages through package managers like npm, pip, or Maven. When an AI inserts dependencies directly into code, those packages might not appear in dependency manifests until build time.

By build time, the vulnerable library is integrated throughout application logic. Removing it requires rewriting features, which means returning to the AI for different implementation approaches. The dependency scanning happened too late in the pipeline to prevent integration.

SCA tools also struggle with transitive dependencies. The AI recommends one package, which pulls in fifteen others as dependencies. One of those fifteen contains a critical CVE. The developer never sees those transitive packages. The AI never checks them. Vulnerable code reaches production without anyone examining the complete dependency tree.

Behavioral Vulnerabilities Don’t Show Up In Static Analysis

The Moltbook breach happened because Row Level Security policies weren’t configured on the Supabase database. Static analysis scanning application code would see a Supabase client initialization with a public API key. For Supabase applications, public API keys in client code are normal and expected. The scanner has no mechanism to verify whether database access controls are properly configured.

A static scanner examining the code would see a credential in the bundle and possibly flag it, but could not verify whether the database access controls making the credential safe were actually active. This vulnerability class requires behavioral testing. You need to actually attempt unauthorized access and verify the system blocks it. Static tools analyze code in isolation without executing it.

Configuration vulnerabilities, missing authorization checks, and improperly configured cloud resources all fall into this category. The code itself looks fine. The vulnerability exists in how systems are configured or how components interact at runtime.

The Coverage Gap For Execution Context

Traditional tools analyze code without tracing whether vulnerable functions are actually reachable through user input, whether they touch sensitive data, or whether existing controls mitigate the risk. AI-generated code often introduces dead code paths that contain vulnerabilities but never execute in practice.

Traditional scanners flag these dead paths with identical severity to actively exploitable issues. Security teams receive alerts for hundreds of theoretical vulnerabilities. Actual exploitable flaws hide among false positives and low-priority findings. The signal-to-noise ratio makes prioritization impossible.

Execution context matters for AI-generated code specifically because the code might implement features never actually deployed. Experimental prompts generate entire modules. Some get abandoned mid-development. Others get replaced with different implementations. Vulnerable code remains in the repository but never executes in production.

The Real Attack Surface

AI coding tools generate more than application logic. Every layer of the stack becomes a potential vulnerability source.

Configuration Vulnerabilities Ship With Generated Infrastructure

AI coding tools don’t just write application code. They generate database schemas, API gateway configurations, deployment manifests, and infrastructure-as-code templates. Each represents attack surface. When an AI generates a cloud storage bucket configuration, it optimizes for making the bucket accessible to the application. Whether public read access should be restricted never factors into generation logic.

API gateway configurations get generated to create routes enabling features. Whether those routes enforce authentication, validate tokens, or restrict access based on user roles isn’t implicit in “create an API endpoint.” The configuration works from a connectivity standpoint. Security properties require explicit consideration.

Infrastructure-as-code templates create entire environments. Security groups, network ACLs, IAM policies, and encryption settings all get generated to make services operational. Least privilege principles and defense in depth strategies don’t guide generation unless explicitly prompted. The result is infrastructure that is functional but broadly permissive.
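A simple check for "functional but broadly permissive" policies might look like the sketch below. It assumes the policy is a dictionary in the AWS IAM JSON shape (`Statement`, `Effect`, `Action`, `Resource`, `Principal`); the article does not name a cloud provider, so that choice and the bucket name in the usage note are assumptions.

```python
def _as_list(value):
    """Policy fields may hold one string or a list; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def overly_permissive_statements(policy: dict) -> list[str]:
    """Flag Allow statements that use wildcards or a public principal."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        principal = statement.get("Principal")
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append(f"statement {index}: wildcard action")
        if "*" in resources:
            findings.append(f"statement {index}: wildcard resource")
        is_public = principal == "*" or (
            isinstance(principal, dict)
            and "*" in _as_list(principal.get("AWS"))
        )
        if is_public:
            findings.append(f"statement {index}: public principal")
    return findings
```

A statement granting `s3:*` on `*` would be flagged twice, while one granting `s3:GetObject` on `arn:aws:s3:::example-bucket/uploads/*` to a named role would pass. A check like this only reads the template; it cannot confirm what is actually deployed.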

Authentication And Authorization Logic Frequently Incomplete

Building secure authentication requires understanding threat models, session management, token rotation, privilege separation, and password storage. These concepts aren’t implicit when a developer prompts “add user login.” The model generates code demonstrating login behavior, not code securely enforcing access control.

Missing CSRF protection appears endemic in AI-generated authentication flows. The feature works when tested normally. Nobody tests what happens when an attacker crafts requests from different origins. Weak session handling allows session fixation or hijacking. Permissive authorization checks validate authentication but skip role-based access verification.

The code works for the happy path. Users can log in, sessions persist, and authenticated requests succeed. Under adversarial conditions, authentication schemes can be bypassed completely, or authorization checks allow privilege escalation. Security requirements that would be obvious to experienced engineers never get implemented because they’re not part of “make login work.”
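As one example of a control that “make login work” never asks for, the sketch below shows a session-bound CSRF token built with the Python standard library. It is a minimal illustration of one common scheme (an HMAC of the session identifier), not a description of any particular framework; the names are hypothetical, and the server secret would normally come from a secret store rather than being generated at import time.

```python
import hashlib
import hmac
import secrets

# Assumption for the sketch: a per-process secret. A real deployment
# would load a stable secret from configuration or a secret manager.
SERVER_SECRET = secrets.token_bytes(32)


def issue_csrf_token(session_id: str) -> str:
    """Derive a token that is only valid for this session."""
    mac = hmac.new(SERVER_SECRET, session_id.encode(), hashlib.sha256)
    return mac.hexdigest()


def is_valid_csrf_token(session_id: str, submitted: str) -> bool:
    """Check the token sent with a state-changing request."""
    expected = issue_csrf_token(session_id)
    # Compare as bytes in constant time; str inputs must be ASCII-only
    # for compare_digest, so encoding first avoids a TypeError.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A request forged from another origin cannot read the victim's token, so it fails this check even though the session cookie is attached automatically.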

API Security Gaps From Implicit Assumptions

AI models generate REST APIs responding correctly to well-formed requests. Rate limiting, input validation, response filtering, and error handling all require explicit consideration. The assumption: APIs receive only legitimate traffic from trusted clients.

Production APIs face automated scanners, credential stuffing tools, and enumeration attacks. APIs built through vibe coding rarely include defenses because threat models never entered development conversations. An API endpoint might validate authentication but skip input sanitization, allow unlimited requests per second, or return detailed error messages revealing system internals.
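Rate limiting is one of the defenses that has to be asked for explicitly. A minimal sketch of a token-bucket limiter follows; the class name, parameters, and injectable clock are choices made for this illustration, and a production service would usually keep one bucket per client in shared storage.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow short bursts up to `capacity`, then a steady refill rate."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, spending one token."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(
            self.capacity, self.tokens + elapsed * self.refill_per_second
        )
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(capacity=2, refill_per_second=1.0)`, two immediate requests pass, a third is rejected, and one more is admitted after a second has elapsed.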

Why This Problem Is Accelerating

The vulnerability landscape is growing faster than security teams can respond.

CVE Growth Shows Increasing Production Impact

Georgia Tech’s Systems Software and Security Lab launched the Vibe Security Radar in May 2025, showing growth from 6 CVEs in early tracking to 35 CVEs three months later. As of March 2026, 74 CVEs had been catalogued as traceable to AI coding tools.

Growth reflects both increased adoption and expanding attack surface. More development teams ship AI-generated code without security review. More vulnerabilities reach production. Eventually, exploitation occurs and disclosure follows. The CVE count tracks real security incidents, not theoretical risks.

The trajectory matters more than absolute numbers. Doubling every few months suggests exponential growth. As AI coding tools gain adoption, the vulnerability disclosure rate will accelerate unless security practices change fundamentally.

Platform Vulnerabilities Compound Application Risks

The development tools themselves contain security flaws. CVE-2025-54135 affects Cursor, allowing malicious Model Context Protocol servers to execute arbitrary actions through the IDE. CVE-2025-55284 affects Claude Code, enabling DNS exfiltration where sensitive data leaks through DNS lookups embedded in generated code.

When the development environment itself is compromised, every line of generated code becomes suspect. The trust boundary extends to every MCP server connected to the coding assistant. If a malicious server can execute arbitrary commands or exfiltrate data through the IDE, even perfect code generation becomes a security liability.

Platform vulnerabilities also enable supply chain attacks. Compromise the coding assistant, and you compromise every application built with it. Attackers can inject backdoors, steal credentials, or manipulate code generation to introduce vulnerabilities developers won’t detect during review.

The Scale Problem Makes Manual Review Impractical

A single developer using AI coding tools can generate thousands of lines of code daily. Volume makes line-by-line security review impossible. Traditional code review processes assume human-speed development where reviewers can thoroughly examine each change. Three hundred lines of new code per week is reviewable. Three thousand lines per day is not.

AI-assisted development inverts the assumption. The reviewer becomes the bottleneck unless security validation is automated and integrated directly into the development loop. Security teams can’t hire enough reviewers to match AI generation rates. Automation becomes mandatory, not optional.

What Actually Works

Effective security for AI-generated code requires different approaches than traditional tooling provides.

Behavioral Testing Catches What Scanners Miss

A behavioral test authenticating as one user and attempting to read another user’s data directly through the API would have caught the missing RLS policies immediately. Testing how systems behave under adversarial conditions reveals vulnerabilities static analysis can’t detect.

Behavioral testing requires actually running the application. Send malicious input. Verify security controls activate. Attempt unauthorized access and confirm blocking occurs. Check whether rate limits are enforced. Validate error handling doesn’t leak sensitive information. Each test simulates attacker behavior rather than normal usage.

The approach works for AI-generated code specifically because it doesn’t rely on analyzing code structure. Behavioral tests validate outcomes regardless of how code is written or organized. If the system allows unauthorized access, the test fails, regardless of whether static analysis detected a vulnerability.
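The core of such a test can be sketched in a few lines. In a real system the call would be an HTTP request against a running deployment; here two toy in-memory handlers stand in for the API so the example stays self-contained. All names and records are hypothetical.

```python
RECORDS = {
    1: {"owner": "alice", "email": "alice@example.com"},
    2: {"owner": "bob", "email": "bob@example.com"},
}


class Forbidden(Exception):
    """Raised when the caller may not read the requested record."""


def get_record_no_check(current_user: str, record_id: int) -> dict:
    # Works for every functional test, but never checks ownership.
    return RECORDS[record_id]


def get_record_owner_check(current_user: str, record_id: int) -> dict:
    record = RECORDS[record_id]
    if record["owner"] != current_user:
        raise Forbidden(f"{current_user} may not read record {record_id}")
    return record


def foreign_read_is_blocked(get_record) -> bool:
    """Act as alice, request bob's record, report whether it was denied."""
    try:
        get_record("alice", 2)
    except Forbidden:
        return True
    return False


if __name__ == "__main__":
    print(foreign_read_is_blocked(get_record_no_check))     # False
    print(foreign_read_is_blocked(get_record_owner_check))  # True
```

The test never inspects how either handler is written. It only observes whether the unauthorized read succeeds, which is why it works equally well against a missing database policy.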

Context-Aware Security Validation During Generation

VibeSec from OX Security represents a different approach. Instead of scanning code after generation, VibeSec embeds dynamic organizational security context directly into AI code-generation agents. That context is guided by live signals from each environment, spanning APIs, runtime, cloud, and code, and by OX’s continuously updated AI Data Lake. It runs autonomously in the background, always on, while developers use AI tools.

At the core of VibeSec is the OX Mind, an AI-powered security intelligence engine built on three foundations:

  • The OX AI Data Lake maintains real-time alignment between security measures and company-specific code, cloud infrastructure, APIs, and runtime environments.
  • Environment Mapping analyzes each organization’s unique infrastructure, architecture, and codebase to enable precisely targeted, autonomous preventative actions, and tailors threat models and prioritizations.
  • Policy Integration embeds security policies and organizational priorities into development workflows, ensuring compliance at every stage.

Security shifts left to the actual point of creation. Rather than detecting SQL injection hours after code was written, the system prevents the vulnerable pattern from being generated initially. The AI receives organizational context, existing vulnerability data, and policy requirements while generating code.

Execution-Aware Prioritization Separates Noise From Risk

Not every vulnerability carries equal risk. SQL injection in an admin panel accessible only from internal networks carries different weight than an identical vulnerability in a public-facing API. A hardcoded secret in dead code that never executes is less urgent than one in authentication logic.

OX Security’s approach connects code-level findings to actual execution paths and data exposure. The system maps “this function has a vulnerability” to “this vulnerability can access customer PII through this API endpoint exposed on the public internet.” Teams prioritize based on actual exploitability rather than theoretical CVSS scores.

Context awareness becomes critical for AI-generated code because the sheer volume of findings overwhelms traditional triage. Hundreds of potential issues need ranking by actual risk. Which vulnerabilities are reachable? Which touch sensitive data? Which exist in production versus development branches? Answering these questions requires understanding execution context, not just analyzing code in isolation.
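The questions above can be turned into a simple ranking. The sketch below is only an illustration of the idea: the fields, the weights, and the rule that unreachable findings score zero are assumptions made here, not a description of how OX Security or any other product scores findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    title: str
    reachable: bool               # can user input reach the vulnerable code?
    internet_facing: bool         # is it exposed on the public internet?
    touches_sensitive_data: bool  # does it read or write PII or secrets?
    in_production: bool           # deployed, or only on a development branch?


def risk_score(finding: Finding) -> int:
    """Hypothetical weights; unreachable code is ranked last."""
    if not finding.reachable:
        return 0
    score = 1
    score += 2 if finding.internet_facing else 0
    score += 2 if finding.touches_sensitive_data else 0
    score += 1 if finding.in_production else 0
    return score


def prioritize(findings: list[Finding]) -> list[Finding]:
    """Highest risk first; ties keep their original order."""
    return sorted(findings, key=risk_score, reverse=True)
```

Under these weights, SQL injection in a public API that touches customer data outranks the same flaw in an internal admin panel, and both outrank a hardcoded secret in code that never executes.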

Continuous Validation In Production

Security testing can’t function as a pre-deployment gate when code deploys multiple times daily. Runtime monitoring becomes necessary to detect when AI-generated code behaves unexpectedly under real traffic conditions.

Runtime security doesn’t replace pre-production testing. It adds a layer that catches issues which only manifest in production environments with actual user behavior and attack traffic. Configurations that looked correct in staging might be permissive in production. Rate limits adequate for development traffic might fail under production load. Authorization checks that work in test environments might be bypassed under specific edge cases that only occur with real data.

Continuous validation also adapts to evolving threats. A vulnerability might not be exploitable when code initially deploys. New attack techniques emerge. New CVEs are disclosed in existing dependencies. Runtime monitoring detects when previously acceptable code becomes vulnerable due to external changes.

Choosing An Approach Based On Your Risk Profile

Different organizations face different risk levels and require different security approaches.

If You’re Building Internal Tools With Limited Exposure

Focus on dependency scanning and secrets detection. Internal tools with small user bases and restricted network access can tolerate more vulnerability debt than customer-facing applications handling PII. Prioritize catching vulnerable libraries and leaked credentials.

Use Gitleaks or TruffleHog for secret scanning integrated into CI/CD pipelines. Run SCA tools like Snyk or Dependabot to track known CVEs in dependencies. These tools catch the highest-severity issues without requiring deep security expertise. Automated scanning blocks obvious problems while letting development proceed at AI speed.

Internal tools also benefit from network segmentation and access controls. If the tool only runs on internal networks behind VPN, exploit difficulty increases even when vulnerabilities exist. Defense in depth compensates for less rigorous code security.

If You’re Shipping Customer-Facing Applications

You need behavioral testing and execution-aware prioritization. Static analysis will generate too many findings to address effectively. Most won’t represent real exploitable risk given your actual system architecture and data flows.

Implement automated security tests attempting common attack patterns against your APIs. Use the PBOM (Pipeline Bill of Materials) to maintain Code-to-Runtime traceability, correlating code-level findings with actual data access across the software lifecycle. Focus remediation on issues reaching sensitive data or privileged operations.

Customer-facing applications can’t rely on network controls. Public internet exposure means any vulnerability is potentially exploitable. Risk tolerance drops. Security validation needs to be comprehensive and continuous. Behavioral testing catches configuration issues and logic flaws static analysis misses.

If You’re Operating AI-Heavy Development Workflows

Security must be embedded in the development loop itself rather than existing as separate validation that occurs hours after code generation. Tooling needs to work in real time as code is written, not during CI/CD pipeline runs.

As a core capability of the OX Platform, OX VibeSec provides AI-native security engineering that embeds context directly into AI coding assistants. The context influences every code generation so vulnerabilities never make it past the initial prompt. Every code change also automatically resolves related vulnerabilities in existing code, so the more you develop, the more secure your codebase becomes.

Combine generation-time security with runtime validation catching edge cases that slip through. The goal: security moving at AI speed without creating friction blocking development velocity. Developers continue working in preferred tools. Security happens invisibly in the background. Vulnerabilities get prevented rather than detected.

Conclusion: Eliminating Risk at the Source

Academic research and production incidents confirm AI coding tools generate vulnerable code at measurably higher rates than human developers. The vulnerability classes are predictable: injection flaws, weak cryptography, exposed secrets, and missing authentication. Even the development tools themselves contain security flaws expanding the attack surface. Treating AI-generated code as if it came from an experienced security-aware engineer is the root mistake. 

Statistical models trained on public repositories full of insecure patterns can’t produce secure code without explicit security guidance. Organizations shipping AI-generated code without validation appropriate to the risk profile will experience breaches. The gap between “this code works” and “this code is secure” has always existed. AI coding tools widen it while making it easier to ignore.

  1. What’s The Difference Between Vibe Coding And AI-Assisted Development?

    Vibe coding describes workflows where developers accept AI-generated code with minimal review, prioritizing speed over comprehension. AI-assisted development uses AI as a productivity tool while maintaining human oversight and code understanding.

  2. Why Do Traditional AppSec Tools Fail To Provide Predictive Risk Context For AI Code?

    Traditional tools lack the Unified Control Plane required to see the code journey. The OX Platform eliminates the noise by pinpointing risk to the exact line of code.

  3. Do Small Teams Building Internal Tools Need This Level Of Security?

    Risk scales with exposure, so internal tools with restricted access can tolerate more vulnerability debt than customer-facing applications. But even internal tools should scan for secrets exposure and vulnerable dependencies.

  4. What’s The Highest Risk Area In AI-Generated Code Right Now?

    Authentication and authorization logic. AI models generate code that demonstrates login behavior but rarely implement complete access control. Common gaps include missing CSRF protection, weak session handling, and permissive authorization checks.
