Most articles about how security research gets done are written in retrospect — after the bug has been found, the CVE assigned, the conference talk given. The version that emerges in retrospect is tidy. There's a clear motivation, a clean methodology, a satisfying result. The retrospective version is honest about the technical work and dishonest about the messy human decisions that came before it.
I want to write the other version. Below is something close to a diary entry from a single Monday morning, in our office, when three of us sat down to decide what we were going to spend the next two months researching. The names are softened. The arguments are otherwise verbatim. If you've ever wondered how research targets actually get chosen inside a small security firm — not in theory, in practice — this is what it looks like.
9:14 AM — the whiteboard
Maya is at the whiteboard. She's our research lead. She writes six candidates in a column, the result of two weeks of background work that the rest of us have only partially read:
- Detection bypasses in popular EDR products
- Memory safety bugs in a widely-deployed Rust networking library
- Authentication issues in a newly-released identity SaaS
- Parser differential attacks against XML signing libraries (continuation of last quarter's work)
- Prompt injection vectors in agentic LLM systems
- Side channels in WebAssembly runtimes
She steps back. "We have time for one. Maybe one-and-a-half if we're lucky. Pick."
The meeting is supposed to last an hour. It will last closer to two.
9:22 AM — David goes first
David has been at the firm the longest. He votes for #2 — memory safety in the Rust library.
His argument: "The library is in everything. Cloudflare uses it. AWS Lambda uses it. Half the modern web infrastructure ships it. If we find a bug, we have impact at a scale that doesn't really exist for the other targets. And nobody's looked closely at it because everyone assumes Rust is safe."
Maya pushes back. "Everyone assumes Rust is safe because Rust mostly is safe. The bugs in Rust libraries tend to be in unsafe blocks, which are a small percentage of code. The hit rate is going to be much lower than you think."
David: "Lower hit rate, higher impact per hit. The expected value pencils out."
Me: "The expected value pencils out if we hit. What's the floor? If we spend two months and find nothing, what do we have?"
David, honestly: "We have a really detailed map of the library's unsafe blocks, which we publish, and which other researchers use to find the bugs we missed. That's a legitimate output. Not as good as finding the bug ourselves, but real."
This is the conversation that doesn't appear in research write-ups. Every research project has a probability distribution over outcomes, and the expected outcome is rarely the most likely one. David is making the case that the floor, the worst likely result, is acceptable. Maya is making the case that the floor might not be acceptable, because two months of opportunity cost is real.
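To make the expected-versus-floor distinction concrete, here is a minimal sketch. The outcome labels loosely follow David's argument, but every probability and value score below is an assumption invented for illustration; nobody in the room put numbers on the whiteboard.

```python
# Hypothetical outcome distribution for a two-month research project.
# Probabilities and value scores are illustrative assumptions, not data.
outcomes = [
    # (label, probability, value on an arbitrary 0-100 scale)
    ("map of unsafe blocks, no bug", 0.70, 15),
    ("minor bug", 0.20, 40),
    ("high-impact bug", 0.10, 100),
]


def expected_value(dist):
    """Probability-weighted average value across all outcomes."""
    return sum(p * v for _, p, v in dist)


def modal_outcome(dist):
    """The single most likely outcome."""
    return max(dist, key=lambda o: o[1])


def floor_outcome(dist, min_probability=0.05):
    """The lowest-value outcome that is still reasonably likely."""
    likely = [o for o in dist if o[1] >= min_probability]
    return min(likely, key=lambda o: o[2])


if __name__ == "__main__":
    print("expected value:", expected_value(outcomes))
    print("modal outcome: ", modal_outcome(outcomes)[0])
    print("floor outcome: ", floor_outcome(outcomes)[0])
```

Under these assumed numbers the expected value is 28.5, while the most likely outcome (and the floor) is worth only 15. The average is pulled up by a bug we would probably not find, which is the gap David and Maya are arguing over.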
9:45 AM — the LLM prompt injection angle
I argue for #5 — prompt injection in agentic LLM systems.
My case: "Six months ago, EchoLeak hit Microsoft 365 Copilot. CVSS 9.3. First major weaponization of prompt injection in production. Every company is now deploying agentic LLMs, and almost none of them have specifically tested for this attack class. The research timing is right. The market timing is right."
Maya: "Counterpoint. Half the prompt injection research being published right now is repackaged versions of the same observation. Add one more voice to that pile and we're just noise."
David: "The interesting research isn't 'prompt injection works' — that's a stale finding now. The interesting research is 'here's a class of agentic system where existing defenses don't apply, and here's the chain to data exfiltration through it.' If we can find a specific agent architecture where the standard mitigations have a gap, that's novel."
Me: "Right. The framing is: what's the next EchoLeak? Which deployed agent system has the structural weakness that's about to become a public incident?"
Maya writes on the whiteboard: "Novel = a class with a specific gap, not just another demo."
That note is the conversation in compressed form. The question isn't whether prompt injection is a real threat. It's whether we can produce work that adds to the public understanding rather than restating it.
10:03 AM — Maya makes her own case
Maya has been arguing against everyone else's pick without naming her own. She finally does.
Her vote: #4 — continue the parser differential research from last quarter.
Her argument: "We already have the harness. We already have the bug class. We've already published one CVE in this space and have credibility with the maintainers. The next bug in the chain is probably six weeks of focused work, with a high probability of landing a CVE and a moderate probability of finding the systemic pattern across the language ecosystem."
David: "It's incremental. We're going to find another instance of the same thing."
Maya: "An incremental finding with high probability is worth more than a novel finding with low probability, for our position as a research-active firm. Our reputation compounds on volume of meaningful disclosures."
This is the most uncomfortable argument in the room, because Maya is making the case that the reputational math matters. The honest version of "what should we research?" includes "what produces the best return for the firm's reputation, our relationships with vendors, and our hiring funnel?" Researchers don't like saying this out loud. It's still true.
10:31 AM — the vote that doesn't happen
We haven't voted. We're just talking. The conversation has shifted from "which target?" to "what are we actually trying to accomplish?"
Maya draws three columns on the whiteboard:
| Goal | Best fit | Worst fit |
|---|---|---|
| Maximum impact (one big bug) | #2 (Rust library) | #4 (incremental parser) |
| Research depth / publishable | #5 (LLM injection) | #1 (EDR bypasses, contractual issues) |
| Predictable output for client work | #4 (incremental parser) | #5 (LLM, novelty risk) |
| New skill development for the team | #5 or #6 (WASM) | #4 (we already know it) |
| Cost / time per finding | #3 (SaaS, fast iteration) | #2 (Rust library, low hit rate) |
"We can't optimize for everything," Maya says. "Which row matters most this quarter?"
Me: "Skill development. The team is leaning toward LLM work for client engagements, and we don't have deep institutional knowledge yet."
David: "Predictable output. We have three contracts starting in February. The team needs to be focused on billable work, not exploratory research that might not land."
Maya: "I want to argue for impact. The reason we do research is to find the bug that matters. If we optimize for predictability we slowly become a pentest mill that calls itself a research firm."
The disagreement is real and it doesn't get resolved in the meeting.
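One way to see why the disagreement doesn't resolve is to treat the whiteboard table as a scoring matrix. The sketch below is my own after-the-fact illustration, not something we did in the room. It assumes a crude scheme (+1 for a row's best fit, -1 for its worst fit, 0 otherwise) and assumes each of us puts all our weight on a single row.

```python
# Best/worst fits transcribed from the whiteboard table.
# The +1 / -1 scoring scheme and the single-row weights are assumptions.
FITS = {
    "impact": {"best": ["#2"], "worst": ["#4"]},
    "depth": {"best": ["#5"], "worst": ["#1"]},
    "predictability": {"best": ["#4"], "worst": ["#5"]},
    "skills": {"best": ["#5", "#6"], "worst": ["#4"]},
    "cost": {"best": ["#3"], "worst": ["#2"]},
}
CANDIDATES = ["#1", "#2", "#3", "#4", "#5", "#6"]


def score(candidate, weights):
    """Sum the weights of goals the candidate fits best, minus those it fits worst."""
    total = 0.0
    for goal, weight in weights.items():
        if candidate in FITS[goal]["best"]:
            total += weight
        elif candidate in FITS[goal]["worst"]:
            total -= weight
    return total


def top_picks(weights):
    """Return every candidate tied for the highest weighted score."""
    scores = {c: score(c, weights) for c in CANDIDATES}
    best = max(scores.values())
    return [c for c in CANDIDATES if scores[c] == best]


# Each person's stated priority, modeled as all weight on one row.
PRIORITIES = {
    "narrator": {"skills": 1.0},
    "David": {"predictability": 1.0},
    "Maya": {"impact": 1.0},
}

if __name__ == "__main__":
    for person, weights in PRIORITIES.items():
        print(person, "->", top_picks(weights))
```

With those weights the three of us land on three different answers: #5 or #6 for me, #4 for David, #2 for Maya. Weighting all five rows equally doesn't settle it either, since #3, #5 and #6 tie. The matrix only returns whatever priorities you feed it.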
10:58 AM — the decision
We split the difference, which is the decision most security firms actually make and almost none write about.
Maya gets eight weeks on #4 — the parser differential continuation. Bounded scope, high probability of CVE, builds on existing work.
I get four weeks on #5 — the LLM agentic system research, with a narrow framing. Not "prompt injection in general" but "find a specific deployed agent architecture where the standard mitigations have a structural gap, and write up the gap with a proof of concept." If I can't make progress on the narrow framing in four weeks, I pivot to client work.
David gets nothing this cycle. He volunteers to take on the heavier client load that frees the rest of us up. He'll get the next research slot.
The Rust library is parked for next quarter. The EDR bypasses are parked indefinitely because of contractual complications with several of our clients. The WebAssembly side channels are parked because the topic is adjacent to existing academic work, and we'd need to spend a month catching up on the literature before we could even start.
The decision takes 104 minutes. The meeting was supposed to take 60.
What the meeting was actually about
The meeting was framed as "pick a research target." It was actually about something else. It was about negotiating, between three people with different incentives and different sources of expertise, what the firm should care about for the next two months.
That negotiation is, more than the technical work itself, what determines what research firms produce. The clichéd version of security research is one person, in a basement, with a laptop, finding a bug. The actual version is two or three people, in a meeting, arguing about which bug to look for, with budget and reputation and team development and client work all pulling in different directions.
The people who don't sit in those meetings tend to assume the technical work is the hard part. The people who do sit in those meetings know the technical work is the part that comes after the harder decision has been made.
How this connects to clients
You don't directly pay us for the meeting above. When you hire us for a pentest, you're paying for the engagement on your specific application. The research work is, in some sense, our overhead. We do it anyway, for four reasons:
One. The harnesses and tools we build during research apply to client work. The custom XML parser fuzzer that produced last quarter's CVE is the same fuzzer we point at your custom XML parser when you have one. Research tooling becomes assessment tooling.
Two. The pattern recognition compounds. The bug class we're studying in our research is the bug class we recognize faster in your codebase. A team that's not actively researching gradually loses the edge on what modern bugs actually look like. We've watched this happen to other firms; we work hard not to let it happen to ours.
Three. The vendor relationships matter. When we find something serious in your application that happens to depend on a vendor library, we have working relationships with several of the maintainers from past disclosures. The disclosure-and-fix cycle is faster for our clients because of work that wasn't done for them.
Four. The reputation that lets us hire the people who do good work is built on the public research. The team you talk to when you hire us is the team that exists because we publish. The day we stop publishing is the day our hiring funnel starts to dry up.
The honest version of "how research time gets allocated"
If you want the one-paragraph summary of the meeting above:
We optimize, in roughly this order of priority: maintaining our pattern recognition for client work, producing publishable output that justifies our team's existence, taking on incremental risk that compounds over time, and very occasionally swinging at a high-impact target that might not land. The allocation across those four goals shifts quarter to quarter based on what's currently happening in the world and what our team needs. It is not a science. It is a series of judgment calls made by people who are individually fallible.
That summary isn't very dramatic. It is, I think, true.
One last thing
If you're a founder reading this and wondering whether "your security partner does research" matters when you're evaluating who to hire — I'd argue it does, but for subtler reasons than the marketing version suggests. It's not that the research itself directly protects you. It's that the kind of team that does research is the kind of team that gets the deep work right when it does your assessment.
The version of "we do research" that doesn't matter is the version where the firm has a research department that's siloed from the client team. The version that does matter is the version where the people who do the research are also the people who do the engagements. The transfer of pattern recognition is what you're buying.
Ask, when you're evaluating: "How does what you've found in your research show up in your client work?" The answer to that question, more than the existence of a research practice itself, tells you whether the research is a marketing decoration or an actual edge.
For us, the answer is the meeting above. The arguments we have about what to research are downstream of the assessments we want to be doing in 18 months. That's the connection. Everything else is detail.