Press "Enter" to skip to content

To write secure code, be less gullible than your AI

Graphite is an AI code review platform that helps you get context on code changes, fix CI failures, and improve your PRs right from your PR page. Connect with Greg on LinkedIn and keep up with Graphite on their Twitter.

### This Week’s Shoutout

This week’s shoutout goes to user **Xeradd**, who won an Investor badge by dropping a bounty on the question [How to specify x64 emulation flag (EC_CODE) for shared memory sections for ARM64 Windows?](https://stackoverflow.com/questions/). If you’re curious about that, we’ll have an answer linked in the show notes.

### Transcript: Conversation with Greg Foster of Graphite on AI and Security in Software Engineering

**Ryan Donovan:** Urban air mobility can transform the way engineers envision transporting people and goods within metropolitan areas. Matt Campbell, guest host of *The Tech Between Us*, and Bob Johnson, principal at Johnson Consulting and Advisory, explore the evolving landscape of electric vertical takeoff and lift aircraft and discuss which initial applications are likely to take flight. Listen from your favorite podcast platform or visit mouser.com/empoweringinnovation.

**Ryan Donovan:** Hello, and welcome to the Stack Overflow Podcast, a place to talk all things software and technology. I’m your host, Ryan Donovan, and today we’re delving into some of the security breaches triggered by AI-generated code. While there’s been a lot of noise around this topic, my guest today argues that the problem isn’t the AI itself, but rather a lack of proper tooling when shipping that code.

My guest is Greg Foster, CTO and co-founder at Graphite. Welcome to the show, Greg.

**Greg Foster:** Thanks for having me, Ryan. Excited to talk about this.

**Ryan Donovan:** Before we dive deep, tell us a bit about your background. How did you get into software and technology?

**Greg Foster:** Happy to share! I’ve been coding for over half my life now. It all started in high school—I was 15 and needed a job, and I figured I could either bag groceries or code iOS apps, so I picked the latter. I went on to college, did internships at Airbnb, working on infrastructure and dev tools teams, helping build their release management software. Interestingly, I was hired as an iOS engineer but immediately shifted to dev tools, which I loved. For the last five years, I’ve been in New York working with friends to create Graphite, continuing my passion for dev tools.

**Ryan Donovan:** Everybody’s talking about AI-powered code generation now—some people doing “vibe coding” where they don’t even touch the code themselves and just say, “build me an app.” Then we see the ensuing security laughs on Twitter. You’re saying the AI isn’t purely the problem?

**Greg Foster:** It’s nuanced. Fundamentally, there’s a shifting landscape of trust and volume. Normally, when you do code reviews, you trust your teammates to some degree. You carefully vet code for bugs and architecture, but you don’t scrutinize every line on security—assuming teammates aren’t malicious. AI changes this because a computer writing code holds no accountability, and you might be the first person ever to lay eyes on it. Moreover, the volume of code changes is skyrocketing. Developers, including juniors, push many small PRs rapidly, which overloads the review process. This creates a bottleneck and trust deficit.

**Ryan Donovan:** Interesting. Our survey from a few months ago found people use AI more but trust it less, which seems natural since AI generates code based on statistical models of previous code.

**Greg Foster:** Yes, and AI can be quite gullible. Take recent hacks like the Amazon NX hack—prompts told the AI to scour user file systems deeply to find secrets. A human engineer would never do that blindly, but AI systems might follow those instructions unquestioningly. It’s a real challenge.

**Ryan Donovan:** So it’s really a lack of real-world context that AI code generators have. The speed and volume of PRs make human review difficult. Naturally, that calls for tooling solutions.

**Greg Foster:** Exactly. Graphite is all about tooling that helps make code review better. One timeless best practice remains: keep code changes small. Research from Google showed that longer pull requests get disproportionately fewer review comments—in fact, engagement drops steeply beyond about 100 lines of code.

We’ve seen the same data at Graphite. People tend to skim or blindly approve massive PRs, so small, manageable PRs—around 100-500 lines—hit a sweet spot for deep review.

But this requires tooling to manage stacked, incremental commits so developers can maintain flow without submitting giant PRs.

**Ryan Donovan:** That’s a key point. Many AI-generated chunks of code are enormous, unrefined, and not necessarily human-friendly. How do you see developers breaking that down and improving readability?

**Greg Foster:** Another concern is losing context. When you write code yourself over hours, you internalize the intricacies of that module or system. With AI-generated code, you often don’t fully absorb or understand the details. This means reviewers must pay extra-close attention.

Overall, fast, blind shipping of code reduces deep understanding and increases risk, especially for security.

**Ryan Donovan:** Copy-pasting from Stack Overflow has long been a source of vulnerabilities. AI seems to intensify that issue.

**Greg Foster:** Exactly. We used to shame copy-pasting, but now AI-generated snippets can propagate security flaws just as easily. Though these AI systems are generally well-intentioned, they create false confidence and lower the bar for attackers who now can craft malicious code with minimal skill.

**Ryan Donovan:** How do you guard not just the code, but the prompts themselves? Can prompts be sanitized or secured?

**Greg Foster:** It’s tough—probably impossible to secure prompts completely. You could try meta-prompting where one AI judges the security of another’s prompt output, but this is a cat-and-mouse game.

In some cases, suspicious prompts could trigger extra user verification steps, like password confirmation or biometric checks.

Also, if prompts come from untrusted users, they should be sandboxed or highly restricted, similar to executing untrusted code.

**Ryan Donovan:** Browsers already sandbox JavaScript and WebAssembly to prevent dangerous abuse.

**Greg Foster:** Indeed. Some AI-powered browsers or extensions have been exploited by injecting invisible prompts to perform malicious actions. This gullibility is something we should expect and prepare for.

At the end of the day, best practices—like minimizing exposure of secrets and being cautious about input—are more important than ever.

**Ryan Donovan:** You mentioned using LLMs themselves as judges for security scanning. How do you ensure those AI judges are trustworthy?

**Greg Foster:** Good question—“Who watches the watchman?”

Major LLMs today are reasonably reliable if well-prompted. If compromised at root, that’s a whole different challenge.

But in day-to-day use, you can trust security tools running LLMs to find real issues. You can measure their effectiveness through true positive and false positive rates. LLMs are actually pretty good at detecting security vulnerabilities in code, sometimes surpassing humans, who grow distracted or fatigued.

**Ryan Donovan:** Is there still a role for traditional static analysis and linting alongside LLM-based tools?

**Greg Foster:** Absolutely. Great security practice is layered. Keep your unit tests, linters, human code review, and add LLM scanning as a powerful augmentation layer.

Think of LLM-based scanners as “super linters” that run quickly and flexibly across many languages without much setup.

But don’t replace deterministic tests and human judgment—they catch problems LLMs can’t.

**Ryan Donovan:** That sounds like a healthy, balanced approach.

**Greg Foster:** For sure. The combination is greater than its parts. For example, LLMs can even help generate missing unit tests, reducing the barrier for engineers to write more tests.

**Ryan Donovan:** Do you worry developers will start outsourcing their security expertise entirely to AI?

**Greg Foster:** Not really. Much of security engineering involves manual processes, audits, policies, and incident response that AI can only assist, not replace.

For example, at Graphite, our security team implements network proxies, audit logging, and SOC2 compliance—all human-driven.

AI can help surface information faster during incidents, or assist with paperwork, but it won’t replace deep human expertise anytime soon.

**Ryan Donovan:** Every new abstraction layer in software adds complexity that engineers need to manage. AI seems to be another one in that lineage.

**Greg Foster:** Exactly. Engineering isn’t about typing lots of code; it’s about problem-solving, decision-making, and communication. AI just changes the tools we use.

Just like 3D printing shifted manufacturing but didn’t replace craftsmen, AI will change software engineering but not eliminate great engineers.

**Ryan Donovan:** We’re entering a new era of productivity and tooling with AI. How do you see AI tooling evolving?

**Greg Foster:** I see three main areas:

1. **Code Generation:** From simple tab completion to complex agent-driven creation that can even submit PRs directly.

2. **Code Review:** LLMs scanning diffs to find bugs, architectural issues, or security risks.

3. **Background Agents:** Autonomous tools that trigger off existing PRs to enhance them—splitting PRs, adding tests, or suggesting improvements proactively.

On the other hand, core infrastructure like CI, builds, and deployments remain largely unchanged.

This evolution highlights the importance of fundamentals—clean, small, incremental code changes, robust testing, rollbacks, and feature flags. Senior engineers who combine these classic best practices with AI tooling get the most value.

**Ryan Donovan:** Wise words. Thanks so much, Greg, for sharing your insights.

**Greg Foster:** Thank you, Ryan. If folks want to learn more about modern code review, stacking code changes, or applying AI in their workflows, check out [graphite.dev](https://graphite.dev) or follow us on Twitter.

**Ryan Donovan:** And that’s a wrap! Remember, good coding and good security both come from solid fundamentals enhanced by smart tools. For questions or feedback on the podcast, reach out at [email protected] or find me on LinkedIn.

Thanks for listening!

*This transcript has been edited for clarity and readability.*
https://stackoverflow.blog/2025/11/04/to-write-secure-code-be-less-gullible-than-your-ai/

Be First to Comment

Leave a Reply

Your email address will not be published. Required fields are marked *