
Insecure Deserialization: The Class That Refuses to Die

In December 2021, a single line in a logging library brought the internet to its knees. Three years later, the same bug class shipped in Apache MINA and in Cisco's enterprise authentication platform, and it rides along in pickle-format models downloaded more than 400 million times a month by AI engineers who didn't read the warnings. This is the "old" bug that refuses to die, and the reason it keeps killing.

BugSwagger Team

December 9, 2021. A Minecraft player types a clever string into the chat. Halfway across the world, a server logs the message. Inside the logging library, eleven characters trigger a chain reaction nobody saw coming — the server fetches code from the player's computer, runs it, and gives the player complete control.

By the time the rest of the internet noticed, the bug — Log4Shell — was already in everything. Apache logged it CVSS 10.0, the maximum. Within 72 hours, attackers were hitting one in ten servers on the public internet. The patching effort cost the industry billions of dollars, with the average incident response engagement alone running over $90,000. Most large enterprises spent weekends rotating credentials and triaging compromised systems. Some are still finding unpatched corners three years later.

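Log4Shell was, at its core, a deserialization-adjacent bug. A logging library performed a JNDI lookup on attacker-supplied text, reached out across the network, fetched a Java object, and loaded it. Strictly speaking that is JNDI injection rather than a gadget chain, but the root cause is the same: untrusted input decided which code the runtime instantiated. Mass casualty event.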

You'd think that after a catastrophe of that magnitude, the security community would have collectively killed this bug class. We did not. In 2024, Apache MINA shipped CVE-2024-52046, another Java deserialization RCE through the same kind of unfiltered ObjectInputStream path. Cisco ISE, the enterprise platform many Fortune 500 companies use to control who can plug into their networks, disclosed multiple insecure deserialization vulnerabilities. And in the AI/ML world, malicious "pickle" files are quietly being uploaded to Hugging Face, where pickle-only models are downloaded more than 400 million times a month; a malicious one executes attacker-controlled code on a data scientist's laptop the instant the model is loaded.

This article is for two readers. If you sign off on the budget, the first sections explain what's at stake in business terms. If you write the code, the technical sections walk through the languages, the gadget chains, the recent CVEs, and the defenses that actually hold.

The 60-second version, if you sign the checks

Software stores data on disk, in caches, in queues, and in transit between services. To do that, it has to convert the in-memory form (an "object") to a stream of bytes ("serialization") and back ("deserialization"). Most languages have a built-in way to do this. Most of those built-in ways are dangerous.

Here's the part that makes the bug class so durable: in many languages, when you ask the runtime to deserialize a byte stream, the byte stream itself describes the type. "Make me a User object," it says. The runtime obediently constructs a User. But the byte stream could just as easily say "Make me a chain of objects whose reconstruction ends in Runtime.exec("rm -rf /")," and the runtime would obey just as enthusiastically. The byte stream is in charge.

If an attacker controls any byte stream that reaches your deserializer — a request body, a cached session, a queued job, an uploaded model file — they control what your application becomes. Often, they get instant remote code execution as your application's process. Often, that's the same process that has access to your database, your cloud credentials, and your customer data.

Why this matters at the executive level:

  • It's the worst outcome category. When this bug fires, it doesn't leak data or modify a setting. It runs attacker code on your server. Same blast radius as a backdoor — except the attacker wrote it for you, and your software faithfully executed it.
  • Log4Shell is the proof of concept for what "industry-wide" looks like. Average response cost: $90,000 per company. Total industry cost: in the billions. Three years later, security teams still mention Log4Shell weekends with a tone usually reserved for natural disasters.
  • The bug class isn't winding down; it's expanding into AI. Pickle files (Python's native serialization format) are the dominant way machine learning models are shipped. Hugging Face hosts millions. Any of them can carry code that runs the moment the model is loaded. Researchers in 2025 documented malicious models on the platform whose payloads include credential theft, system fingerprinting, and reverse shells. The "model" is the gadget chain.
  • Even the scanners can't be trusted. In 2025, JFrog disclosed three zero-days in PickleScan — the most popular tool teams use to check whether a pickle file is malicious. The scanner itself could be bypassed by carefully crafted payloads. The detection layer your team probably built around the problem may not detect anything.
  • The fix is structural and the audit is the work. Replacing native serialization with safe formats (JSON, Protobuf, MessagePack) is straightforward in code. Finding every place your application uses native serialization — across services, dependencies, caching layers, message queues, ML pipelines, IPC — that's the part that takes months. And most teams haven't started.

If any of the following sentences applies to your stack, this bug class is currently a question mark in your security posture: "We cache user objects with pickle." "We use Java RMI." "We load ML models from Hugging Face." "Our queue serializes jobs with the native library." "We have an old service running BinaryFormatter." Each one is a place an attacker can plant a gadget chain.

The story behind the bug

Imagine you're writing a Java application in 2008. You need to send a User object to another server. You call ObjectOutputStream.writeObject(user). Out comes a stream of bytes. The other server calls ObjectInputStream.readObject(). Up pops a User object on the other side. Beautiful. The protocol handles everything — types, fields, nested objects, references.

What the protocol does not handle is the assumption you made when you wrote it: that the byte stream is going to describe a User. Because the byte stream gets to declare its own type. It can declare itself a User. It can also declare itself an instance of org.apache.commons.collections.functors.InvokerTransformer, which, once it is wired into the right container object, can be made to call Runtime.exec("malicious command") while the stream is still being read.

Nobody designed it to do that. The class was for invoking transformations on collections. But the act of reconstructing the object graph (restoring fields and running callbacks such as readObject, without ever calling the class's own constructor) operates on attacker-controlled values. And the right combination of fields, on the right classes, in the right order, triggers code execution.

This is called a "gadget chain." The attacker doesn't write new code; they assemble existing code paths in your dependencies into a chain that ends in RCE. In 2015, a tool called ysoserial packaged dozens of these chains for Apache Commons, Spring, Hibernate, and other widely-used libraries. Suddenly anyone could generate a working exploit by selecting a chain from a menu and pointing it at a vulnerable endpoint.

Java was first. Then .NET (with similar tools for BinaryFormatter). Then PHP (where unserialize had been quietly vulnerable since the early 2000s). Then Python pickle. Then Ruby Marshal. Then Node.js modules like node-serialize. Every language with native serialization has the same bug class. Every fix has been partial. The bug refuses to die because the underlying mechanic — "let the byte stream specify the type" — is a feature, not a defect.
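The mechanic fits in a few lines of Python, so here is a minimal sketch using pickle. The class name LooksLikeData and the expression are made up for illustration, and the payload is deliberately harmless; an attacker would substitute a shell command or a downloader.

```python
import pickle


class LooksLikeData:
    """Hypothetical class: it carries no data, only instructions for the unpickler."""

    def __reduce__(self):
        # __reduce__ tells pickle how to rebuild this object: "call this
        # callable with these arguments". That choice is written into the
        # byte stream, so whoever writes the bytes makes the choice.
        return (eval, ("6 * 7",))


payload = pickle.dumps(LooksLikeData())

# The receiving side only asked for "the object in these bytes".
# It gets back whatever the stream's chosen callable returned.
result = pickle.loads(payload)
print(type(result).__name__, result)  # prints: int 42
```

Note that the receiver never needed the LooksLikeData class. The stream named a callable that already existed on the receiving side, which is the same idea as a gadget chain in miniature.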

How an attacker actually exploits this

The attack is more elegant — and more devastating — than most teams expect. Here's the basic flow.

Step one: find the entry point. Anywhere the application calls a "deserialize this" function on data that came from a network request, a file upload, a cache, a queue, or an integration. In Java, that's ObjectInputStream.readObject, JMS, RMI, JNDI lookups. In .NET, BinaryFormatter, NetDataContractSerializer. In PHP, unserialize. In Python, pickle.loads, yaml.load (without SafeLoader), marshal.loads. The list is long, and the surface keeps expanding.
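The same search works for defenders. As a rough sketch (not a complete audit tool), the Python call sites named above can be located with the standard ast module. The function names and the RISKY_CALLS table are our own, and the scan only catches direct module.function calls, so aliased imports such as "from pickle import loads" will slip past it.

```python
import ast

# (module, function) pairs from the list above; extend for your own stack.
RISKY_CALLS = {
    ("pickle", "load"), ("pickle", "loads"),
    ("marshal", "load"), ("marshal", "loads"),
    ("yaml", "load"),
}


def _uses_safe_loader(call: ast.Call) -> bool:
    """True if a yaml.load call passes SafeLoader positionally or as Loader=."""
    candidates = list(call.args[1:])
    candidates += [kw.value for kw in call.keywords if kw.arg == "Loader"]
    for arg in candidates:
        name = arg.attr if isinstance(arg, ast.Attribute) else getattr(arg, "id", "")
        if name in ("SafeLoader", "CSafeLoader"):
            return True
    return False


def find_risky_calls(source: str, filename: str = "<string>"):
    """Return sorted (filename, line, call) tuples for risky deserializer calls."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if not (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)):
            continue
        key = (func.value.id, func.attr)
        if key not in RISKY_CALLS:
            continue
        if key == ("yaml", "load") and _uses_safe_loader(node):
            continue
        findings.append((filename, node.lineno, f"{key[0]}.{key[1]}"))
    return sorted(findings)


sample = (
    "import pickle, yaml\n"
    "a = pickle.loads(blob)\n"
    "b = yaml.load(text)\n"
    "c = yaml.load(text, Loader=yaml.SafeLoader)\n"
)
for finding in find_risky_calls(sample, "app.py"):
    print(finding)
```

Every hit still needs a human to answer the real question: can data from outside the trust boundary reach this call?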

Step two: pick a gadget chain. Open ysoserial for Java native serialization, or the equivalent for your target: marshalsec for other Java marshalling libraries, ysoserial.net for .NET, PHPGGC for PHP. Browse the menu of pre-built chains. Each one targets a specific library that's probably in the application's classpath. Apache Commons Collections, the canonical victim, was in 90% of Java enterprise applications in 2015. Most of that exposure still exists.

Step three: generate and deliver the payload. One command produces the bytes:

# Generate a payload that executes 'id' on the target
java -jar ysoserial.jar CommonsCollections6 'id' > payload.bin

# Send it to the vulnerable endpoint
curl -X POST --data-binary @payload.bin https://target.example.com/api/upload

The server deserializes. The gadget chain fires. Code runs as the application's user.

Step four: pivot. The attacker now has shell access. They look around. They find database credentials in environment variables, cloud IAM tokens in the metadata service, source code on disk. The classic chain from "single deserialization endpoint" to "full cloud account compromise" runs in about ten minutes for an experienced operator. The original entry point was a feature your team forgot to lock down.

Where this is alive right now

Java

The original. Native serialization has been famous since ysoserial in 2015. Newer Java versions ship filtering APIs (ObjectInputFilter), but legacy applications using ObjectInputStream on attacker-controlled input still exist in massive numbers. RMI, JMX, JNDI, message queues — all classic vectors.

Log4Shell (CVE-2021-44228) was the dramatic example: a logging library, of all things, performed JNDI lookups on attacker-controlled input. A specially crafted log message caused the server to fetch and execute Java code from the attacker's server. CVSS 10.0. Three years of cleanup. Still finding pockets.

Spring4Shell (CVE-2022-22965) followed shortly after. RCE in the Spring framework — the most-used Java web framework on Earth — through abuse of parameter binding and class introspection. Same deserialization-adjacent pattern.

Apache Commons Text (CVE-2022-42889 / "Text4Shell") was the follow-up that the press hyped but that turned out to be narrower than Log4Shell in practice.

CVE-2024-52046 — Apache MINA. The most recent in the line. A critical insecure deserialization vulnerability in Apache MINA's ObjectSerializationDecoder. The decoder processes incoming serialized Java objects without proper class filtering. RCE for anyone who can deliver bytes to a MINA-based service.

Cisco Identity Services Engine (2024–2025 advisories). The enterprise platform Fortune 500 companies use to control network access shipped multiple insecure Java deserialization vulnerabilities. If you wanted to know whether the bug class still hits well-resourced enterprise vendors with mature security programs — yes.

.NET

BinaryFormatter, NetDataContractSerializer, and ObjectStateFormatter have known gadget chains. Microsoft has been telling people not to use BinaryFormatter for years; it is disabled by default in recent .NET releases, and .NET 9 removed the implementation from the runtime. Code that called it is still in production. ysoserial.net packages payloads for all the common targets.

Python

Python's own documentation warns, in a highlighted box at the top of the pickle page, that the module is not secure and that you should only unpickle data you trust. The warning has been ignored thoroughly for two decades.

The dramatic resurgence: machine learning. PyTorch's default model storage format uses pickle. Hugging Face hosts millions of models. A pickle stream can name any importable Python callable and have the unpickler invoke it with attacker-chosen arguments during deserialization. Loading a malicious model is equivalent to running attacker code.

Researchers in 2024–2025 documented malicious pickle models on Hugging Face whose payloads include credential theft, system fingerprinting, and reverse shells. Almost every malicious ML model discovered on the platform uses pickle. Pickle-only models are downloaded over 400 million times a month. That's 400 million chances per month for an AI engineer's laptop — often their corporate laptop, often with VPN access into production — to load attacker code.

And in 2025, JFrog disclosed three zero-day vulnerabilities in PickleScan, the most popular tool for detecting malicious pickles. The scanner itself could be bypassed. The detection layer most teams built around the problem may detect nothing.

Celery (Python's distributed task queue) and Django sessions, when configured with the pickle serializer, and any internal RPC layer using pickle remain reliable findings on engagements.

PHP

unserialize with attacker control has produced bugs across major PHP applications for over a decade. Magic methods (__wakeup, __destruct, __toString) are gadget primitives by design. PHPGGC (PHP Generic Gadget Chains) provides pre-built payloads for WordPress plugins, Laravel, Symfony, Magento, and dozens of other common targets.

Node.js

Native serialization is less common, but applications using node-serialize (which uses eval internally — there's no fixing this), funcster, or homegrown JSON-based class reconstructors fall into the same patterns. We have seen apps run eval on a JSON field that contained type metadata. The bug class adapts to its host language.

Rust and Go

The newer languages were supposed to be immune because they don't have native polymorphic deserialization. They are mostly safer — but not immune. bincode in Rust, encoding/gob in Go, and several MessagePack implementations have had bugs where the type information in the stream produced unexpected behavior. The form changes; the underlying problem — type-implying byte streams from untrusted sources — keeps returning.

The pattern that keeps producing the bugs

The bug class persists because developers reach for deserialization in legitimate-looking contexts where it feels safe — and isn't:

  • Session storage that caches user objects to disk or to Redis. The session is "trusted" because it's our session — until someone tampers with the cookie or breaches the cache.
  • Background job queues that pickle/serialize tasks. The queue is "internal" — until a service that publishes to it is compromised, and now every consumer is RCE-able.
  • Cache layers that store complex objects. The cache is "ours" — until cache poisoning gets data in there from outside.
  • Distributed computing frameworks that ship code over the wire (Spark, Dask, Ray). The cluster is "isolated" — except when the network boundary isn't what people thought it was.
  • IPC mechanisms that "feel" trusted but accept network input. Unix sockets, named pipes, internal RPC.
  • ML model loading. The model is "from a teammate" — except it's actually from a public registry with no provenance check.

Each of these is a place where serialization makes sense as a feature, and where the data path crosses a trust boundary that the developer didn't think hard enough about.

What "actually safe" looks like

  1. Avoid native serialization for cross-boundary data. Use JSON, MessagePack, or Protobuf with explicit schemas. The simpler format has a narrower attack surface. You give up some convenience; you give up almost all of the bug class.
  2. If you must use native serialization, sign the bytes. An HMAC over the serialized blob, verified before deserialization, removes the attacker's ability to inject objects. The attacker can no longer forge a valid serialized payload — they don't have the signing key.
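  3. Use serialization filters. Java has ObjectInputFilter. Python's pickle documentation describes a restricted-unpickler pattern (a restricted_loads helper built on an Unpickler subclass that overrides find_class). Configure them to allow only the specific classes your application actually deserializes. Reject everything else.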
  4. For ML models: switch to SafeTensors. Hugging Face released SafeTensors in 2022 specifically to address pickle's exploitability. Many models are now available in both formats. Pin your loaders to SafeTensors. Block pickle loads at the CI level.
  5. Treat all model sources as untrusted. Even ones from your own internal registry. The supply chain attack against ML models is real and growing.
  6. Sandbox the deserialization layer. If a gadget triggers, sandboxing limits the damage. This is defense-in-depth, not a primary control — but a good last line of defense.
  7. Test for it. ysoserial, marshalsec, ysoserial.net, PHPGGC — the payloads exist. Throw them at every deserialization endpoint your application exposes, every quarter.
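Item 2 above (sign the bytes) can be sketched with the standard hmac module. The envelope layout (tag followed by blob) and the function names are our own choices, and the key is generated inline only for the demo; in practice it would come from a secret store and never travel with the data.

```python
import hashlib
import hmac
import pickle
import secrets

TAG_LEN = hashlib.sha256().digest_size  # 32 bytes for HMAC-SHA256


def sign(blob: bytes, key: bytes) -> bytes:
    """Prefix the serialized blob with an HMAC-SHA256 tag."""
    tag = hmac.new(key, blob, hashlib.sha256).digest()
    return tag + blob


def verify_then_load(envelope: bytes, key: bytes):
    """Check the tag first; only deserialize bytes that we signed ourselves."""
    tag, blob = envelope[:TAG_LEN], envelope[TAG_LEN:]
    expected = hmac.new(key, blob, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if len(tag) != TAG_LEN or not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature; refusing to deserialize")
    return pickle.loads(blob)


key = secrets.token_bytes(32)  # demo only; load from a secret store in practice
envelope = sign(pickle.dumps({"user": "alice", "role": "viewer"}), key)
print(verify_then_load(envelope, key))

tampered = envelope[:-1] + bytes([envelope[-1] ^ 1])
try:
    verify_then_load(tampered, key)
except ValueError as exc:
    print("rejected:", exc)
```

The ordering is the whole point: the signature check happens before pickle.loads ever sees the bytes. The protection lasts only as long as the key stays secret, and it does nothing against a compromised service that holds the key legitimately.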

The boardroom translation table

| What your team says | What it actually means | What it could cost you |
| --- | --- | --- |
| "We use Java's ObjectInputStream for performance" | If any data path into it crosses a trust boundary, you have potential RCE. | Log4Shell-class incident. Average response: $90K+. Industry total: in the billions. |
| "Our ML team loads models from Hugging Face" | Most pickle models execute arbitrary Python on load. Treat every download as code from a stranger. | Credential theft, lateral movement from data-science laptops into production. |
| "We use Celery / Sidekiq / a queue with pickle-based serialization" | If any publisher into that queue gets compromised, every consumer becomes RCE. | Internal pivot path that bypasses all your perimeter controls. |
| "Our session cookies are serialized objects" | If the serialization is native (not JSON with schema), tampering or stealing one cookie may lead to RCE. | Account takeover that escalates to server compromise. |
| "We have PickleScan in our CI" | JFrog disclosed three zero-day bypasses against it in 2025. It may detect nothing. | False sense of security on the riskiest part of your ML pipeline. |

Five questions to put to your engineering team this week

  1. "List every place in our codebase that calls a deserialize function on data we didn't generate." If the team can produce the list in under a day, you have parser hygiene. If not, that audit is your highest-leverage security project for the quarter.
  2. "Are we on Log4j 2.17+ everywhere, including in our build tooling, internal services, and forgotten projects?" Three years on, Log4Shell still has unpatched pockets. Audit anyway.
  3. "How does our ML team verify the models they download? Do we block pickle loads in favor of SafeTensors?" If the answer involves PickleScan as the only defense, you have a 2025-disclosed bypass problem.
  4. "Do our background job queues use native serialization, or JSON/Protobuf with explicit schemas?" Native is the worst case. If yes, the queue is an internal RCE pivot.
  5. "When was the last time we threw ysoserial payloads at our endpoints?" If "never," that is the single highest-leverage test you can add to your next assessment.
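For question 4, it helps to know what the "JSON with explicit schemas" answer looks like in code. The sketch below is a hypothetical queue consumer: the task name, the TASKS registry, and the envelope shape are all assumptions made for illustration. The property that matters is that the message can only select from handlers the consumer already registered, with arguments of the types it expects.

```python
import json


def send_email(to: str, subject: str) -> str:
    """Hypothetical job handler."""
    return f"queued email to {to}: {subject}"


# Explicit registry: task name -> (handler, {argument name: required type}).
TASKS = {
    "send_email": (send_email, {"to": str, "subject": str}),
}


def run_job(raw: bytes):
    """Decode a JSON job envelope and dispatch it, rejecting anything unexpected."""
    job = json.loads(raw)  # JSON yields only dicts, lists, strings, numbers, bools, None
    if not isinstance(job, dict) or set(job) != {"task", "args"}:
        raise ValueError("malformed job envelope")
    task = job["task"]
    if not isinstance(task, str) or task not in TASKS:
        raise ValueError("unknown task")
    handler, schema = TASKS[task]
    args = job["args"]
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ValueError("unexpected arguments")
    for field, expected_type in schema.items():
        if not isinstance(args[field], expected_type):
            raise ValueError(f"wrong type for {field}")
    return handler(**args)


message = b'{"task": "send_email", "args": {"to": "a@example.com", "subject": "hi"}}'
print(run_job(message))
```

Compared with a pickled task, a compromised publisher can still enqueue unwanted send_email jobs, but it cannot make the consumer import and call code the consumer never registered.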

What we test, every engagement

  • Every endpoint that accepts a serialized format: ysoserial / marshalsec / ysoserial.net / PHPGGC payloads, per language.
  • Session cookie tampering: try replacing the serialized session blob with a forged one.
  • Message queues: where the application allows external publishing, test gadget delivery via the queue.
  • Java JNDI lookup endpoints (Log4Shell pattern): test for unbounded JNDI references in logging, error handlers, expression evaluators.
  • Pickle/YAML loads in Python services: identify and test any code path calling pickle.loads or yaml.load without SafeLoader.
  • BinaryFormatter remnants in .NET applications and admin tools.
  • WebSocket and gRPC channels for serialized payloads.
  • File upload paths that process model files, serialized configs, or document templates.
  • ML model loading code paths: verify SafeTensors-only or signed-pickle enforcement.
  • Sandboxing tests: confirm that a successful deserialization gadget cannot reach production credentials.

If your last assessment didn't include the 2024–2026 catalog — Apache MINA, Cisco ISE deserialization, ML pickle supply chain, PickleScan bypasses — your application has not been pressure-tested against the version of this bug class that's currently producing CVEs.

Why this bug class will outlive us all

Every new framework, every new ecosystem, eventually recapitulates the bug class. The form changes — Java in 2015, .NET in 2017, PHP forever, Python in the ML era, Rust and Go in their early years — but the underlying problem doesn't. Type-implying byte streams from untrusted sources will always be dangerous. Languages keep adding new ways to serialize. Frameworks keep choosing convenience over safety. Teams keep assuming the boundary they care about is the boundary the runtime cares about.

For us, that's why "do you use native serialization anywhere?" is the first question we ask on a Java, .NET, PHP, Python, or ML-heavy engagement. The answer is usually yes. The follow-up question, "where, exactly?", is usually where the audit begins.

The bottom line

This bug class brought the internet to its knees once. Today it rides along in pickle-only models that are downloaded over 400 million times a month, each download a chance to run attacker code on an AI engineer's laptop. It is in Apache MINA, in Cisco ISE, in every PHP application still calling unserialize on untrusted input, in every Java application still pointing ObjectInputStream at untrusted input. It is the bug that refuses to die because the feature it abuses, letting bytes describe their own type, is too useful for languages to remove.

The defenses are well-understood: drop native serialization for cross-boundary data, sign bytes when you can't, filter aggressively, switch ML models to SafeTensors, test with ysoserial. The cost of putting them in place this quarter is small. The cost of being the next Log4Shell case study — with your company name in the post-mortem and your weekend in the incident timeline — is not.

If your stack does anything more than parse JSON with a schema validator on input, this conversation belongs on your security roadmap. Not next year. Now.

