It’s 3:00 pm. You have been heads down all day working on your application. You realize you need source code that talks to a given SaaS vendor, so you ask an AI assistant for a code sample, and it returns one. This is a growing practice among developers, and a growing agentic AI use case. Early in the process, as is common, you’ve created variables to store information. Sometimes, this includes credentials. So far, so good! All is working as intended.
However, just as you are about to fill in those variables, you notice that they are already filled in. And, surprisingly, they work!
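To make the scenario concrete, here is a hypothetical sketch of what that returned sample might look like. The vendor, endpoint, and key below are invented for illustration; the point is the pattern of a generated snippet arriving with a working credential already baked in.

```python
# Hypothetical AI-generated sample for calling a SaaS vendor's REST API.
# The endpoint and credential are invented for illustration; in the scenario
# above, the "placeholder" value turns out to be a real, working secret.
import requests

API_BASE_URL = "https://api.example-saas.com/v1"  # assumed vendor endpoint
API_KEY = "sk_live_9f8e7d6c5b4a3f2e1d0c"          # already filled in, and it works

def list_customers():
    """Fetch the customer list from the vendor's API."""
    response = requests.get(
        f"{API_BASE_URL}/customers",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```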
The scenario described above is not uncommon. It’s happening more and more frequently to developers everywhere. Credentials — or secrets — are exposed and used, sometimes in places they shouldn’t be and by people who shouldn’t have access. Suddenly, secrets are not so secret. People and data are at risk.
This raises the questions: How does this happen? Why is it happening? AI can be magical, but accessing and using stolen secrets is not part of its wizardry.
In this blog, I will dig into how hardcoded secrets make their way into agentic AI and what developers can do to protect themselves from credential exposure.
A Bit of Background on Agentic AI
Rather than assume everyone understands how agentic AI works, let’s take a moment to describe the inner workings. Most of you are familiar with ChatGPT, Claude, Copilot, and other web-based AI tools. These tools take questions (prompts) from users and return some type of result. What most users don’t realize is that behind the scenes, these tools are using agentic AI.
Agentic AI refers to artificial intelligence systems that can act independently to accomplish goals rather than simply respond to direct commands. It’s like a smart digital assistant that can plan and execute a series of actions to complete a complex task autonomously or semi-autonomously.
Most of these tools have their own large language models, or LLMs for short. To oversimplify, an LLM is a very large model built from enormous sets of data that an agent can easily query. Massive amounts of data (web content, books, news articles, academic papers, reference materials, code repositories, and more) go into these systems. Code repositories include publicly available source code from platforms such as GitHub, GitLab, Bitbucket, and many others.
Depending on the AI model you choose (which varies by AI vendor), the way you interact with it will differ. For example, developers might use an IDE tool that connects directly to a model providing code generation or code suggestion capabilities. Another example is a model embedded inside an application you build: the AI could review data entered by a human user and create a summary from it.
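As a rough sketch of that second pattern, the snippet below passes user-entered text to a model’s HTTP API and returns the summary. The endpoint, model name, response shape, and environment variable are placeholders rather than any specific vendor’s API.

```python
# Minimal sketch of embedding a model inside an application to summarize
# user-entered text. Endpoint, model name, and response shape are placeholders;
# a real integration would use the vendor's documented API or SDK.
import os
import requests

LLM_ENDPOINT = "https://llm.example.com/v1/generate"  # hypothetical endpoint
LLM_API_KEY = os.environ["LLM_API_KEY"]               # read at runtime, never hardcoded

def summarize(user_text: str) -> str:
    response = requests.post(
        LLM_ENDPOINT,
        headers={"Authorization": f"Bearer {LLM_API_KEY}"},
        json={"model": "example-model", "prompt": f"Summarize the following:\n{user_text}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["text"]  # assumed response shape
```

Note that even in this sketch, the API key comes from the environment rather than the source file, a habit that matters later in this post.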
How Agentic AI Finds Secrets
In the scenario described in the opening paragraph, when the AI was asked to generate a code sample, multiple steps would automatically occur. First, the AI’s agent would check its own LLM for the answer. If the LLM could not produce an answer, or produced one with a low confidence score, the AI would crawl external sources for a better answer.
During this process, the system might tap hundreds of publicly available sources for information that improves accuracy, including websites, databases, SCM tools, and more. After aggregating the results, the agent analyzes its findings and attempts to return an appropriate answer to the question asked.
In the above case, we asked for code that could help us communicate with a SaaS app. We will never know where or how the AI system found it, but code will be returned. That’s the promise of agentic AI. The problem is that the code returned contained secrets, which means the source material was publicly available. At some point, someone either committed code to a public SCM system or publicly shared it, and the AI was able to find it.
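One practical consequence: treat generated code the way you would treat any untrusted input and check it before you run it. Below is a minimal, assumed sketch of such a check; the regexes are illustrative and far from exhaustive, and dedicated scanners such as gitleaks or truffleHog cover many more credential formats.

```python
# Minimal sketch: flag credential-looking strings in AI-generated code before
# using it. The patterns are illustrative only; a real secret scanner covers
# far more token formats and reduces false positives.
import re

SUSPICIOUS_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(api[_-]?key|secret|token|password)\s*[:=]\s*['\"][^'\"]{8,}['\"]"),
]

def find_possible_secrets(code: str) -> list[str]:
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(code))
    return hits

generated = 'API_KEY = "sk_live_9f8e7d6c5b4a3f2e1d0c"'
if find_possible_secrets(generated):
    print("The generated code contains something that looks like a credential; review it before use.")
```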
Why the Increase in Exposed Secrets?
There are several reasons we are seeing more hardcoded secrets in agentic AI. Some of these include:
- As time passes, AI scrapers have more time to find new sources of information and consume them. GitHub, for example, is massive; consuming every repository would take months or years. So, the more time that passes, the more time the AI scraper has to find new repositories and add them.
- Developers commit source code to public repositories that contain hardcoded secrets.
  - It’s common for developers to hardcode the secrets into projects in their early stages of development. Occasionally, this code, along with the secret, will get accidentally committed to the repository.
- Private repositories are sometimes accidentally misconfigured, changing the settings to “public”.
  - This does not happen often, but when it does, anyone can access it. AI companies can now scan and potentially add the repository to their databases.
How to Prevent Secrets From Being Compromised
So, where do we go from here?
- Create healthy development practices that encourage developers to use secrets managers to store sensitive data.
- As a general rule, do not hardcode secrets into projects; read them from a secrets manager or environment variables at runtime instead (see the sketch after this list).
- Automate frequent security checks, such as secret scanning, on your SCM tools.
- Avoid storing secrets even in private repositories. Why? Because with a few clicks, a private repository can become public.
- When any secrets are identified, whether in public or private repositories, invalidate and re-issue them.
- Create best practices around a secret rotation schedule.
  - Rotating secrets will automatically invalidate any credentials that were leaked — either accidentally or purposely.
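As referenced in the list above, here is a minimal sketch of the difference between hardcoding a secret and resolving it at runtime. The environment variable name is arbitrary; in practice the value would be injected by a secrets manager (Vault, AWS Secrets Manager, and similar) or by your CI/CD pipeline rather than living in source code.

```python
# Anti-pattern: a secret embedded in source code travels with every commit,
# fork, and scraper that ever sees the repository.
# API_KEY = "sk_live_9f8e7d6c5b4a3f2e1d0c"

# Better: resolve the secret at runtime. The variable name below is arbitrary;
# the value should come from a secrets manager or the deployment pipeline.
import os

API_KEY = os.environ.get("VENDOR_API_KEY")
if API_KEY is None:
    raise RuntimeError("VENDOR_API_KEY is not set; configure it in your secrets manager or environment")
```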
The Future of Agentic AI and Secrets
Agentic AI is evolving, and so are the risks. The genie is out of the bottle, and there is no way of putting it back in. Agentic AI is helping developers be more productive. Everyone is using it to some degree, whether they realize it or not. The tools you use, the applications you run — nearly everything in the software development lifecycle today is impacted by AI.
When used properly, AI quickly reduces complexity and provides valuable information at speeds never seen before. However, AI is not without risk, and it is not error-free. The vast majority of AI tools are clear that “AI can make mistakes” and that it’s up to humans to double-check sources and outputs. Not all information that can be grabbed from public sources is correct or useful, yet anything a model can find and train on, it will, good and bad alike. It’s important to review all of AI’s output to make sure it’s accurate.
Remember, we are better together. If you see someone sharing a secret, please let them know. They might have accidentally done something they shouldn’t have.