The Latest News and Information from Trail of Bits
The Trail of Bits Blog Recent content on The Trail of Bits Blog
- We beat Google’s zero-knowledge proof of quantum cryptanalysis on April 17, 2026 at 11:00 am
Two weeks ago, Google’s Quantum AI group published a zero-knowledge proof of a quantum circuit so optimized that they concluded first-generation quantum computers will break elliptic curve cryptography keys in as little as 9 minutes. Today, Trail of Bits is publishing our own zero-knowledge proof that significantly improves on Google’s across all metrics. Our result is not due to some quantum breakthrough, but rather the exploitation of multiple subtle memory safety and logic vulnerabilities in Google’s Rust prover code. Google has patched their proof, and their scientific claims are unaffected, but this story reflects the unique attack surface that systems introduce when they use zero-knowledge proofs.

Google’s proof uses a zero-knowledge virtual machine (zkVM) to calculate the cost of a quantum circuit on three key metrics: the total number of operations and the Toffoli gate count represent the running time of the circuit, and the number of qubits represents the memory requirements. Google, along with their coauthors from UC Berkeley, the Ethereum Foundation, and Stanford, published proofs for two circuits; one minimizes the number of gates, and the other minimizes qubits. Our proof improves on both.

| Resource Type | Google’s Low-Gate | Google’s Low-Qubit | Our Proof |
|---|---|---|---|
| Total Operations | 17,000,000 | 17,000,000 | 8,300,000 |
| Number of Qubits | 1,425 | 1,175 | 1,164 |
| Toffoli Count | 2,100,000 | 2,700,000 | 0 |

Table 1: Resource upper bounds reported in different proofs for circuits computing the correct output across 9,024 randomly sampled inputs

Our proof fully verifies when using Google’s unpatched verification code. It has the same verification key as their original proofs and is cryptographically indistinguishable from a zero-knowledge proof resulting from actual algorithmic improvements to the quantum circuit. We are releasing the code we developed to forge the proof, and a summary of our proof follows.
Circuit SHA-256 hash: 0x7efe1f62bb14a978322ab9ed41d670fc0fe0f211331032615c910df5a540e999

Groth16 proof bytes: 0x0e78f4db0000000000000000000000000000000000000000000000000000000000000000008cd56e10c2fe24795cff1e1d1f40d3a324528d315674da45d26afb376e8670000000000000000000000000000000000000000000000000000000000000000024ac7f8dd6b1de6279bcce54e8840d8eb20d522bf27dedd776046f6590f33add217db465201c63724e6b460641985543d2b79c3c54daeea688581676a786aafc1dba8604a361acdd9809e268b6d8bc73943a713bb0ed0d96221f73d26def6ea4041d05b077523d9351a48b2ecd984c686b6473df69d20a24296d0a1cba3cdbe92eb13a7cc0ecd92f27f7bf23f9ac859d4293e17216dcbd85d1c7f60a52f65a9d02faef077336acd39e845d534200b575b029d6e3f0afb4f90815557233eab70b0fe88919834dd9beb90d47241f1490dc202e0dce44e4894982b07073c8d4426513732d79e9af9913b254aa29471e1a98fa1b43a1886afb5dbd36988153217aa2

Verification key: 0x00ca4af6cb15dbd83ec3eaab3a0664023828d90a98e650d2d340712f5f3eb0d4

Zero-knowledge virtual machines

Google used Succinct Labs’ SP1 zkVM for their proofs. A zkVM is essentially a way to prove that you know which private inputs for an arbitrary guest program on the zkVM generate some public output. For example, consider this basic Rust guest program.

```rust
#![no_main]
sp1_zkvm::entrypoint!(main);

pub fn main() {
    // Read in private inputs a and b
    let a = sp1_zkvm::io::read::<u32>();
    let b = sp1_zkvm::io::read::<u32>();
    // Add them together
    let c = a + b;
    // Write the public output a + b
    sp1_zkvm::io::commit(&c);
}
```

A user can take the private inputs 2 and 3, run this program on the zkVM, and get a proof that the program ran successfully and that the output was 5. Anyone can verify the proof, but they would get zero knowledge about whether the input was (2, 3), (1, 4), or (6, 0xffffffff). Obviously, this toy program is simple; real programs can be significantly more complicated. Behind the scenes, the Rust guest program compiles down to a RISC-V ELF binary.
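As a brief aside, all three of those candidate input pairs really do produce the same public output: the guest's `u32` addition wraps modulo 2^32 (assuming the guest is compiled with wrapping overflow semantics, as release builds are by default), so (6, 0xffffffff) overflows back around to 5. A quick sketch:

```rust
fn main() {
    // Each candidate private input pair commits the same public output, 5,
    // because u32 addition wraps modulo 2^32 (6 + 0xffffffff wraps to 5).
    for (a, b) in [(2u32, 3u32), (1, 4), (6, 0xffff_ffff)] {
        assert_eq!(a.wrapping_add(b), 5);
    }
    println!("all three input pairs commit the output 5");
}
```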
This simple architecture allows complex program logic to be encoded into provable mathematical relationships. For example, the state of the RISC-V registers after executing an instruction is a deterministic function of their state before execution. Having to prove every step makes generating zkVM proofs resource-intensive and costly, but significant engineering work has enabled proving statements about complex programs.

Google’s zkVM guest

In the case of Google’s zero-knowledge proofs, the private input is the quantum circuit (in a custom assembly language), and the program is a simulator that checks the circuit. Note that these are “circuits” in the quantum sense, not the typical zero-knowledge definition. The public output includes bounds on the number of qubits and gate operations. In general, simulating quantum circuits is difficult, but the “kickmix” circuits defined in this paper refer to a specific subset that can be tested classically.

The following script, adapted from one of Google’s examples, increments a 3-qubit value. It includes three operations and a total of three qubits. Note that the first instruction, CCX, has two inputs (q0 and q1) and computes q2 = q2 ^ (q0 & q1). This is called a Toffoli gate. Toffoli gates are quite useful, but they’re much harder to implement on actual quantum hardware, so the complexity of quantum algorithms is sometimes measured in the number of Toffoli gates (or, more accurately, non-Clifford gates). Circuits like this are serialized into bytes and sent to the zkVM simulator.

```
# Increment a value held in 3 qubits (q2, q1, q0). Sends
# (0, 0, 0) -> (0, 0, 1)
# (0, 0, 1) -> (0, 1, 0)
# …
# (1, 1, 1) -> (0, 0, 0)

# If q0 and q1 are set, flip q2.
CCX q0 q1 q2
# If q0 is set, flip q1.
CX q0 q1
# Flip q0.
X q0
```

To verify that a circuit computes the correct function, the simulator deserializes the circuit, randomly initializes the qubits (e.g., to (1, 0, 1)), iteratively applies every operation in the circuit, and panics unless the final state is as expected (e.g., (1, 1, 0)). The simulator repeats this for many different inputs (9,024 times, to be precise), so proving that the simulator terminated without error is essentially the same as proving that the circuit is correct with high probability.

In Google’s zkVM program, the circuit must compute one elliptic curve point addition, a critical subroutine of Shor’s algorithm for solving the elliptic curve discrete logarithm problem. In addition to checking that the circuit computes the correct function, the simulator also counts the total number of operations, the number of qubits, and the average number of Toffoli gates (some Toffoli gates are conditioned on classical bits and may be skipped during simulation). These performance metrics are checked against specified upper bounds; if they do not exceed them, the upper bounds are committed as public output.

Plan of attack

Since Google’s zero-knowledge proof comes from the results of running a Rust simulator on a private kickmix assembly script, we can create our own zero-knowledge proof by providing our own private input to the same program. If we find some input that causes the simulator to misreport the quantum costs, we’ll have successfully forged a proof. To beat Google’s results on any metric, we have the following goals:

- Must compute elliptic curve point addition correctly
- Preferably reports fewer than 17 million total operations
- Preferably reports fewer than 2.1 million Toffoli gates
- Preferably reports fewer than 1,175 qubits

This turns a quantum computing problem into an application security problem. Any deserialization bugs when parsing the kickmix circuit input are fair game, as are any logic bugs we find in the simulator.
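To make the kickmix semantics concrete, here is a small sketch (my own toy code, not Google's simulator) of how a classical simulator can exhaustively check the 3-qubit increment script above, treating each qubit as a boolean:

```rust
// A toy classical check of the 3-qubit increment circuit.
// Each qubit is a bool; CCX, CX, and X are plain bit operations.
fn apply_increment(q: &mut [bool; 3]) {
    q[2] ^= q[0] & q[1]; // CCX q0 q1 q2: if q0 and q1 are set, flip q2
    q[1] ^= q[0];        // CX q0 q1: if q0 is set, flip q1
    q[0] = !q[0];        // X q0: flip q0
}

fn main() {
    // Like the zkVM guest, apply the circuit and panic on any mismatch
    // (the real simulator samples 9,024 random inputs; with only 3 qubits
    // we can simply test all 8).
    for v in 0u8..8 {
        let mut q = [v & 1 != 0, v >> 1 & 1 != 0, v >> 2 & 1 != 0];
        apply_increment(&mut q);
        let out = q[0] as u8 | (q[1] as u8) << 1 | (q[2] as u8) << 2;
        assert_eq!(out, (v + 1) % 8, "circuit failed on input {v}");
    }
    println!("the circuit passed exhaustive testing");
}
```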
Vulnerability 1: Bypassing the Toffoli counter

One area of concern in the Rust source code was the use of unsafe blocks, which disable important memory safety checks. This was presumably done to reduce the overall cycle count of the zkVM guest program; each additional bounds check inflates the already substantial cost of generating a zero-knowledge proof, particularly for checks that run millions of times. The vulnerability starts in the following two lines of code from program/src/main.rs.

```rust
let private_circuit_bytes = sp1_zkvm::io::read_vec();
let ops = unsafe {
    rkyv::access_unchecked::<rkyv::Archived<Vec<Op>>>(&private_circuit_bytes)
};
```

The first line shows that the private circuit bytes (private_circuit_bytes) are read directly from outside the zkVM, and the use of the rkyv serialization library’s access_unchecked function instructs the library to assume that private_circuit_bytes corresponds to a valid serialization. But data from outside the zkVM is untrusted, so what happens if the bytes, which are meant to represent a vector of circuit operations, are malformed?

The answer is “not much.” There are relative pointer offsets and length fields in the serialization for the Vec type, but I couldn’t see a viable path from manipulating those to getting the prover to underreport resource counts. The Op type is similarly simple, consisting of seven 32-bit fields: one describes the OperationType, and six describe the identifiers of the qubits and classical bits to use as inputs and outputs for the operation. For a while, I was chasing down a bug in how the magic identifier 0xffffffff could bypass the qubit count and trigger an out-of-bounds write in the array of simulated qubit values. I was deep in the details of the Rust heap allocator used by the SP1 zkVM before a colleague pointed out that Google was using SP1’s 64-bit RISC-V architecture rather than the potentially exploitable 32-bit architecture.
That left the kind field, an enum describing which of the 18 supported kickmix OperationType opcodes to apply. When simulating the quantum circuit, the guest program iterates over the vector of operations and determines whether to conditionally execute each operation; if so, it increments the count of Toffoli or Clifford gates, depending on the operation type, and executes the operation. This code is in Simulator::apply_iter.

```rust
match op.kind {
    OperationType::CCZ | OperationType::CCX => {
        self.stats.toffoli_gates += executed_shots;
    }
    OperationType::CX
    | OperationType::CZ
    | OperationType::Swap
    | OperationType::R
    | OperationType::Hmr => {
        self.stats.clifford_gates += executed_shots;
    }
    // Note: X and Z are not considered Clifford gates in the
    // stats because they can be tracked in the classical control system.
    // They don't need to cause something to happen on the quantum computer.
    _ => {}
}

match op.kind {
    OperationType::CCX => {
        let v = cond & self.qubit(op.q_control1) & self.qubit(op.q_control2);
        *self.qubit_mut(op.q_target) ^= v;
    }
    OperationType::CX => {
        let v = cond & self.qubit(op.q_control1);
        *self.qubit_mut(op.q_target) ^= v;
    }
```

What if op.kind falls outside the expected 0–17 range because rkyv was instructed not to check this value during deserialization? This is undefined behavior, so to investigate, I used Ghidra to reverse-engineer the RISC-V ELF binary Google provided with their proof. After identifying the location of this function in the binary, I discovered that the Rust compiler emits a pair of jump tables for these two match expressions. The first jump table determines which gate counter to increment, and the second performs the actual operation. But we maliciously control the value of op.kind, so what if, instead of the normal behavior, we dereference past the end of the first jump table and jump directly to an address from the second jump table?
Then an out-of-range OperationType could still perform the correct operation, but it would completely bypass the Toffoli counter!

Figure 1: In this simplified execution flow, providing an invalid operation type bypasses the Toffoli counter, giving the same functionality while hiding the true cost.

I calculated the necessary offsets, modified Google’s example prover code to inject the invalid operation types, and attempted to simulate a zero-knowledge proof of a simple 64-qubit adder circuit. To my surprise, it worked on the first try.

```
stdout: circuit.average_cliffords_performed() = 0
stdout: circuit.average_non_cliffords_performed() = 0
stdout: The circuit passed fuzz testing.
```

I had been concerned that the RISC-V registers would be in an invalid state when jumping into the wrong table, but this turned out not to be the case. Now I had the primitive I needed to forge a circuit that misreports the number of Toffoli gates; I just had to scale up my attack from the 64-qubit adder circuit to full elliptic curve point addition.

Building a quantum circuit

I now had a virtually unlimited budget for Toffoli operations, and the path forward looked simple. I could implement any kickmix circuit that correctly performs elliptic curve point addition without worrying about the Toffoli count, tweak the operation types before feeding the script to the prover, and then forge a proof for whatever Toffoli upper bound I wanted. I might use more total operations or more qubits than Google’s circuits, but it would be an amusing proof of concept. The only concern was that the prover’s running time is proportional to the total number of operations, so my circuit still needed a reasonably low operation count.

It turns out that programming a quantum computer is far more challenging than I anticipated, because of the requirements of reversibility and uncomputation.

Requirement 1: Reversibility. A quantum circuit is made up of a series of reversible (unitary) gates. For kickmix circuits, think of these as reversible bit operations. For example, c' = c XOR b is allowed because the original value of c can be recovered with c = c' XOR b. On the other hand, c' = c AND b is not allowed because if c' and b are both 0, we cannot know whether c was originally 0 or 1. By itself, AND is not reversible, but with the additional input of a Toffoli gate, it is. The kickmix Toffoli operation CCX q1 q2 q3 updates q3 to q3' = q3 XOR (q1 AND q2), and this operation can be reversed with q3 = q3' XOR (q1 AND q2).

Requirement 2: Uncomputation. To avoid the undesirable effects of entanglement, any auxiliary (or ancilla) qubits used to store intermediate results of computation must be “uncomputed,” or reset to state 0. The reversibility requirement makes this a challenge, since the intermediate result may have been 0 or 1; the intermediate state must be uncomputed from the computation result in order to be reversibly cleared.

As we try to build our reversible elliptic curve point addition circuit with uncomputation, a couple of tools are available. We could use Bennett’s trick, which involves preserving inputs and outputs in spare qubits, then running the full computation a second time in reverse to clear the ancilla qubits. This approach isn’t ideal because it roughly doubles the operation count for each level of the call stack. Another approach is the more efficient measurement-based uncomputation. Google has revealed that this is the technique their circuits use, but it requires much finer-grained algorithmic analysis to apply correctly.

Vulnerability 2: Efficient operations with register aliasing

After struggling to implement elliptic curve point addition while keeping the operation count and qubit count low, I discovered another exploitable vulnerability: register aliasing. Recall the Toffoli (CCX) operation defined in Simulator::apply_iter.
```rust
OperationType::CCX => {
    let v = cond & self.qubit(op.q_control1) & self.qubit(op.q_control2);
    *self.qubit_mut(op.q_target) ^= v;
}
```

There’s no check that the qubit inputs (op.q_control1 and op.q_control2) are different from the qubit output (op.q_target), so tying all three together yields q1 = q1 ^ (q1 & q1) = 0. That is, we can immediately reset a qubit to zero, violating the quantum requirement of reversibility and making uncomputation trivial.1

Figure 2: By setting the output of a kickmix operation to its input, we can build circuits that violate quantum reversibility and implement arbitrary classical logic gates.

In addition, we can use this primitive to create any logic gate we want, like the classical AND gate that violates reversibility or the functionally complete NAND gate. Now that I don’t have to deal with the limitations of quantum circuits, it’s basically Nand2Tetris, except the goal is elliptic curve point addition. I implemented basic logic gates, followed by integer addition and subtraction, modular addition, modular multiplication, modular inversion, and, finally, point addition.

After exploiting a memory corruption issue in unsafe Rust code, implementing elliptic curve operations from the ground up using individual logic gates, and squeezing whatever performance I could out of the non-quantum aspects of the design, I finally had a working kickmix script that passed validation: 0 Toffolis, 8 million operations, and 1,288 qubits. This beats one of Google’s two proofs but falls short of beating the other by just 113 qubits. If I wanted to truly claim that our zero-knowledge proof beat Google’s, I couldn’t leave it there. I needed to find some way to shave off 113 qubits, but I was all out of vulnerabilities.

The final challenge: Euclidean algorithm optimization

Profiling my circuit made it clear that the most expensive operation was modular inversion, and the same is true for many published quantum elliptic curve addition circuits.
My optimized circuit required 4 field elements (1,024 qubits) for the inversion, including some tricks to store intermediate field elements, plus a handful of qubits for control flags and carry bits. If I were to beat Google’s proof, I needed to lose those tricks and do modular inversion using fewer than 2.59 field elements.

One idea is to use Fermat’s little theorem: $x^{-1} \equiv x^{p-2} \pmod{p}$. This replaces inversion with exponentiation, which is just a sequence of modular multiplications. But each multiplication requires three field elements, and this approach requires hundreds of multiplications, well beyond our total qubit and operation budgets.

What many quantum circuits use instead is a variant of the extended Euclidean algorithm (EEA). To compute $x^{-1} \pmod{p}$, this algorithm maintains four variables $(a, u, b, v)$ initialized to $(x, 1, p, 0)$. It proceeds through several iterations that cancel out bits of $a$ and $b$ while performing the same operations on $u$ and $v$, and (assuming $x$ and $p$ are coprime) it terminates with $(a, u, b, v) = (0, 0, 1, x^{-1})$. I based my implementation on the binary EEA, a variant that cancels out the least significant bits of $a$ and $b$ rather than the standard most significant bits. Thanks to Thomas Pornin’s clear exposition of this algorithm, it was relatively easy to reimplement a high-performance version in my circuit, but the qubit overhead was still too high.

Next, I found this recent preprint by Han Luo, Ziyi Yang, Ziruo Wang, Yuexin Su, and Tongyang Li, which came out just days after Google’s announcement. It describes a method to compute modular inverses with the space equivalent of 3 field elements. Many of the techniques went over my head, but they open-sourced their code, so I had a much easier time understanding their paper. Their code included a Qiskit circuit, but I was unsuccessful in integrating it into my exploit.
Despite these difficulties, the paper gave me the key term I needed to shave off the remaining qubits: Proos-Zalka register sharing. The 2003 paper by John Proos and Christof Zalka observes that over the course of the standard EEA, the bit lengths of $a$ and $b$ get smaller while the bit lengths of $u$ and $v$ get larger. Their register-sharing algorithm saves space by limiting the number of qubits allotted to each value at each iteration. This can fail with low probability, but rare failures are tolerable when running Shor’s algorithm. I implemented a classical version of the register-sharing algorithm of Proos and Zalka, and I ended up with 30 million total operations, almost twice Google’s count.

Finally, I had the insight I needed: what if I combined the operation efficiency of the binary EEA with the space efficiency of the Proos-Zalka algorithm? The binary EEA doesn’t have the same bounds on $u$ and $v$ as the standard EEA, but a slight tweak (doubling $v$ instead of halving $u$) does, and it needs only a simple correction factor at the end. This idea is deeply connected to Kaliski’s method, which is considered in papers by Roetteler et al., Gouzien et al., Häner et al., and Litinski. Reversibility constraints would require an extra qubit for each of about 512 iterations, but our implementation doesn’t need to be reversible.

Figure 3: The first 20 and last 5 rounds of the modified binary EEA depict how different variables can share space when performing modular inversion. A final correction factor is not applied here.

Thanks to register sharing, my final modular inversion requires the space of only 2.55 field elements, just under the 2.59 I needed to beat. In total, my elliptic curve point addition circuit uses 8,288,880 operations, 1,164 qubits, 5,980,691 pre-bypass Toffoli gates, and 0 reported Toffoli gates. This is less than half the reported operations of Google’s circuits and just a few qubits fewer than their best variant.
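For readers who want to experiment with the classical core of this, here is a small sketch (my own toy code, using a 10-bit toy prime rather than a 256-bit field, and not the circuit implementation described above) comparing Fermat-based inversion against a textbook binary-EEA-style inversion that cancels low bits:

```rust
// Toy modular inversion two ways: Fermat's little theorem (x^(p-2) mod p)
// versus a binary-EEA-style algorithm that cancels least significant bits.
fn pow_mod(mut b: u64, mut e: u64, p: u64) -> u64 {
    // Square-and-multiply exponentiation mod p.
    let mut r = 1;
    b %= p;
    while e > 0 {
        if e & 1 == 1 { r = r * b % p; }
        b = b * b % p;
        e >>= 1;
    }
    r
}

fn inv_binary_eea(x: u64, p: u64) -> u64 {
    // Invariants: x1 * x == u (mod p) and x2 * x == v (mod p).
    let (mut u, mut v, mut x1, mut x2) = (x, p, 1u64, 0u64);
    while u != 1 && v != 1 {
        // Cancel low zero bits of u and v, halving x1/x2 mod p to match
        // (division by 2 mod an odd p: add p first if the value is odd).
        while u % 2 == 0 {
            u /= 2;
            x1 = if x1 % 2 == 0 { x1 / 2 } else { (x1 + p) / 2 };
        }
        while v % 2 == 0 {
            v /= 2;
            x2 = if x2 % 2 == 0 { x2 / 2 } else { (x2 + p) / 2 };
        }
        // Subtract the smaller from the larger, mirroring on x1/x2.
        if u >= v {
            u -= v;
            x1 = (x1 + p - x2) % p;
        } else {
            v -= u;
            x2 = (x2 + p - x1) % p;
        }
    }
    if u == 1 { x1 % p } else { x2 % p }
}

fn main() {
    let p = 1009; // toy prime standing in for a 256-bit field modulus
    for x in 1..p {
        let inv = inv_binary_eea(x, p);
        assert_eq!(inv, pow_mod(x, p - 2, p)); // agrees with Fermat
        assert_eq!(x * inv % p, 1);            // and is a real inverse
    }
    println!("both inversion methods agree mod {p}");
}
```

The contrast in the two functions mirrors the trade-off in the text: the Fermat route is a long sequence of multiplications, while the binary EEA route works with shrinking values whose bit lengths register sharing can exploit.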
The source code for generating this proof of concept is available here.

What Google’s secret circuit (probably) does

The zero-knowledge properties of the proof make this question unanswerable directly, but framed another way, we can ask which problems documented in prior work Google would have had to overcome to achieve their results. Google’s circuit performs elliptic curve point addition, which requires at least one modular division. In previous circuits, modular inversion is the most expensive step in terms of both gate count and qubit count, so that’s where improvements are needed most. Our register-sharing implementation shows that 2.55 field elements of storage is enough for a nonreversible circuit, but prior quantum implementations of Kaliski’s EEA variant require an extra qubit per iteration to preserve reversibility. This adds 512 qubits of overhead to guarantee that modular inversion is invertible, and a circuit based on Kaliski’s method with Google’s qubit counts would need to solve this problem.

Even the most revolutionary scientific breakthroughs are rooted in published literature, and I think a healthy understanding of prior work can help demystify the risk of a shadowy adversary destabilizing cryptocurrencies with a secret algorithm.

The aftermath

Zero-knowledge proofs are a transformational new technology with wide-ranging impacts, and their application to vulnerability disclosure is still new. Without knowing the details of their circuit, it’s impossible for me to conclude whether Google’s decision to announce this discovery using a zero-knowledge proof is justified. However, I do have experience with both vulnerability disclosure and academic publishing, and that experience points to broader implications for the deployment of zero-knowledge technology. One potentially overlooked aspect of coordinated disclosure is the importance of an embargo period.
Current industry best practices recommend a 30-day buffer between a timely patch becoming available and full disclosure of the technical details. This allows time for patch adoption, benefits defenders who rely on the technical details, and prevents opportunistic exploitation by low-skill attackers. Zero-knowledge proofs can communicate the importance of patching, but they are not a cryptographic replacement for the benefits of eventual disclosure.

In academic publishing, the more details that are available in published work, the easier it is to improve upon that work. Papers that intentionally facilitate replication and clearly state their methods and claims are usually the ones that are later cited and have the greatest impact. Using a zero-knowledge proof still establishes improvement over prior work, but it also signals a confidence that no one else will independently develop the same improvement, and that no one but the authors will be able to improve upon the discovery in future work.

As a direct example of the value of open publishing, I want to highlight Google’s decision to release a well-documented kickmix simulator and thorough proof generation instructions. This is the sole reason I was able to find and demonstrate these vulnerabilities, and their patches simultaneously increase confidence in their zero-knowledge claims and prevent attackers from forging proofs of quantum breakthroughs that spread fear, uncertainty, and doubt.

Zero-knowledge systems are an incredible technology with many applications, but their use introduces a different set of risks than traditional approaches. They aren’t a magic wand that eliminates trust; instead, they redistribute trust from an original domain, such as the opinions of scientific experts, to programming languages, compilers, proof systems, and cryptography experts.
There are many frontiers considering the benefits of zero-knowledge proofs, including electronic voting and age verification, but it’s also critical to consider the risks and plan for what happens when this technology fails.

Acknowledgments

Thank you to Craig Gidney, Ryan Babbush, Tanuj Khattar, and Adam Zalcman from Google for their quick response and for putting up with my naive questions about quantum algorithms, and to Sophie Schmieg for putting us in touch. Finally, this would not have happened without Joe Doyle and the wider Trail of Bits cryptography team, whose suggestions and enthusiasm pushed this project over the finish line.

1. There’s a second bug in the HMR and R instructions, which are meant to reset a qubit to 0 while randomizing the phase. An error in conditional logic makes it possible to reset the qubit without trashing the phase, but register aliasing is a strictly better exploit primitive.
- Master C and C++ with our new Testing Handbook chapter on April 9, 2026 at 11:00 am
We added a new chapter to our Testing Handbook: a comprehensive security checklist for C and C++ code. We’ve identified a broad range of common bug classes, known footguns, and API gotchas across C and C++ codebases and organized them into sections covering Linux, Windows, and seccomp. Whereas other handbook chapters focus on static and dynamic analysis, this chapter offers a strong basis for manual code review.

LLM enthusiasts, rejoice: we’re also developing a Claude skill based on this new chapter. It will turn the checklist into bug-finding prompts that an LLM can run against a codebase, and it will be platform and threat-model aware. Be sure to give it a try when we release it. And after reading the chapter, you can test your C/C++ review skills against two challenges at the end of this post. Be among the first 10 to submit correct answers to win Trail of Bits swag!

What’s in the chapter

The chapter covers five areas: general bug classes, Linux usermode and kernel, Windows usermode and kernel, and seccomp/BPF sandboxes. It starts with language-level issues in the bug classes section (memory safety, integer errors, type confusion, compiler-introduced bugs) and gets progressively more environment-specific.

The Linux usermode section focuses on libc gotchas and is also applicable to most POSIX systems. It ranges from well-known problems with string functions to somewhat lesser-known caveats around privilege dropping and environment variable handling. The Linux kernel is a complicated beast, and no checklist could cover more than a fraction of its intricacies; however, our new chapter gives you a starting point for bootstrapping manual reviews of drivers and modules.

The Windows sections cover DLL planting, unquoted-path vulnerabilities in CreateProcess, and path traversal issues. This last bug class includes concerns like WorstFit Unicode bugs, where characters outside the basic ANSI set can be reinterpreted in ways that bypass path checks entirely.
The kernel section addresses driver-specific concerns such as device access controls, denial of service through improper spinlock usage, security issues that arise from passing handles from usermode to kernelmode, and various sharp edges in Windows kernel APIs.

Linux seccomp and BPF features are often used for sandboxing. While more modern tools like Landlock and namespaces exist for this task, we still see a combination of these older features during audits, and we always uncover a lot of issues. The new chapter covers sandbox bypasses we’ve seen, like io_uring syscalls that execute without the BPF filter ever seeing them, the CLONE_UNTRACED flag that lets a tracee effectively disable seccomp filters, and memory-level race conditions in ptrace-based sandboxes.

Test your review skills

We’ve provided two challenges below that contain real bug classes from the checklist. Try to spot the issues, then submit your answers. If you’re among the first 10 to submit correct answers, you’ll receive Trail of Bits swag. The challenge closes April 17, so get your answers in before then. Stuck? Don’t worry: we’ll publish the answers in a follow-up blog post, so don’t forget to #like and #subscribe, by which we mean add our RSS feed to your reader.

The many quirks of Linux libc

In this simple ping program, there are two libc gotchas that make the program trivially exploitable. Can you find and explain the issues? If you can’t, check out the handbook chapter. Both bugs are covered in the Linux usermode section.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>

#define ALLOWED_IP "127.3.3.1"

int main() {
    char ip_addr[128];
    struct in_addr to_ping_host, trusted_host;

    // get address
    if (!fgets(ip_addr, sizeof(ip_addr), stdin))
        return 1;
    ip_addr[strcspn(ip_addr, "\n")] = 0;

    // verify address
    if (!inet_aton(ip_addr, &to_ping_host))
        return 1;
    char *ip_addr_resolved = inet_ntoa(to_ping_host);

    // prevent SSRF
    if ((ntohl(to_ping_host.s_addr) >> 24) == 127)
        return 1;

    // only allowed
    if (!inet_aton(ALLOWED_IP, &trusted_host))
        return 1;
    char *trusted_resolved = inet_ntoa(trusted_host);
    if (strcmp(ip_addr_resolved, trusted_resolved) != 0)
        return 1;

    // ping
    char cmd[256];
    snprintf(cmd, sizeof(cmd), "ping '%s'", ip_addr);
    system(cmd);
    return 0;
}
```

Windows driver registry gotchas

This Windows Driver Framework (WDF) driver request handler queries product version values from the registry. There are several bugs here, including an easy-to-exploit denial of service, but one of them leads to kernel code execution by messing with the registry values. Can you figure out the bug and how to exploit it?

```c
NTSTATUS InitServiceCallback(
    _In_ WDFREQUEST Request
)
{
    NTSTATUS status;
    PWCHAR regPath = NULL;
    size_t bufferLength = 0;

    // fetch the product registry path from the request
    status = WdfRequestRetrieveInputBuffer(Request, 4, &regPath, &bufferLength);
    if (!NT_SUCCESS(status)) {
        TraceEvents(
            TRACE_LEVEL_ERROR, TRACE_QUEUE,
            "%!FUNC! Failed to retrieve input buffer. Status: %d",
            (int)status
        );
        return status;
    }

    /* check that the buffer size is a null-terminated Unicode (UTF-16)
       string of a sensible size */
    if (bufferLength < 4 || bufferLength > 512 || (bufferLength % 2) != 0 ||
        regPath[(bufferLength / 2) - 1] != L'\0') {
        TraceEvents(
            TRACE_LEVEL_ERROR, TRACE_QUEUE,
            "%!FUNC! Buffer length %d was incorrect.",
            (int)bufferLength
        );
        return STATUS_INVALID_PARAMETER;
    }

    ProductVersionInfo version = { 0 };
    HandlerCallback handlerCallback = NewCallback;
    int readValue = 0;

    // read the major version from the registry
    RTL_QUERY_REGISTRY_TABLE regQueryTable[2];
    RtlZeroMemory(regQueryTable, sizeof(RTL_QUERY_REGISTRY_TABLE) * 2);
    regQueryTable[0].Name = L"MajorVersion";
    regQueryTable[0].EntryContext = &readValue;
    regQueryTable[0].Flags = RTL_QUERY_REGISTRY_DIRECT;
    regQueryTable[0].QueryRoutine = NULL;

    status = RtlQueryRegistryValues(
        RTL_REGISTRY_ABSOLUTE, regPath, regQueryTable, NULL, NULL
    );
    if (!NT_SUCCESS(status)) {
        TraceEvents(
            TRACE_LEVEL_ERROR, TRACE_QUEUE,
            "%!FUNC! Failed to query registry. Status: %d",
            (int)status
        );
        return status;
    }
    TraceEvents(
        TRACE_LEVEL_INFORMATION, TRACE_QUEUE,
        "%!FUNC! Major version is %d",
        (int)readValue
    );
    version.Major = readValue;

    if (version.Major < 3) {
        // versions prior to 3.0 need an additional check
        RtlZeroMemory(regQueryTable, sizeof(RTL_QUERY_REGISTRY_TABLE) * 2);
        regQueryTable[0].Name = L"MinorVersion";
        regQueryTable[0].EntryContext = &readValue;
        regQueryTable[0].Flags = RTL_QUERY_REGISTRY_DIRECT;
        regQueryTable[0].QueryRoutine = NULL;

        status = RtlQueryRegistryValues(
            RTL_REGISTRY_ABSOLUTE, regPath, regQueryTable, NULL, NULL
        );
        if (!NT_SUCCESS(status)) {
            TraceEvents(
                TRACE_LEVEL_ERROR, TRACE_QUEUE,
                "%!FUNC! Failed to query registry. Status: %d",
                (int)status
            );
            return status;
        }
        TraceEvents(
            TRACE_LEVEL_INFORMATION, TRACE_QUEUE,
            "%!FUNC! Minor version is %d",
            (int)readValue
        );
        version.Minor = readValue;

        if (!DoesVersionSupportNewCallback(version)) {
            handlerCallback = OldCallback;
        }
    }

    SetGlobalHandlerCallback(handlerCallback);
}
```

We’re not done yet

Our goal is to continuously update the handbook, including this chapter, so that it remains a key resource for security practitioners and developers who are involved in the source code security review process.
If your favorite gotcha is not there, please send us a PR. Checklist-based review, even combined with skilled-up LLMs, is only a single step in securing a system. Do it, but remember that it’s just a starting point for manual review, not a substitute for deep expertise. If you need help securing your C/C++ systems, contact us.
- What we learned about TEE security from auditing WhatsApp’s Private Inference on April 7, 2026 at 11:00 am
WhatsApp’s new “Private Inference” feature represents one of the most ambitious attempts to combine end-to-end encryption with AI-powered capabilities, such as message summarization. To make this possible, Meta built a system that processes encrypted user messages inside trusted execution environments (TEEs), secure hardware enclaves designed so that not even Meta can access the plaintext. Our now-public audit, conducted before launch, identified several vulnerabilities that compromised WhatsApp’s privacy model, all of which Meta has patched. Our findings show that TEEs aren’t a silver bullet: every unmeasured input and missing validation can become a vulnerability, and to securely deploy TEEs, developers need to measure critical data, validate and never trust any unmeasured data, and test thoroughly to detect when components misbehave.

The challenge of using AI with end-to-end encryption

WhatsApp’s Private Processing attempts to resolve a fundamental tension: WhatsApp is end-to-end encrypted, so Meta’s servers cannot read, alter, or analyze user messages. However, if users also want to opt in to AI-powered features like message summarization, this typically requires sending plaintext data to servers for computationally expensive processing. To solve this, Meta uses TEEs based on AMD’s SEV-SNP and Nvidia’s confidential GPU platforms to process messages in a secure enclave where even Meta can’t access them or learn meaningful information about the message contents.

The stakes in WhatsApp are high, as vulnerabilities could expose millions of users’ private messages. Our review identified 28 issues, including eight high-severity findings that could have enabled attackers to bypass the system’s privacy guarantees. The following sections explore noteworthy findings from the audit, how they were fixed, and the lessons they impart.
Key lessons for TEE deployments

Lesson 1: Never trust data outside your measurement

In TEE systems, an “attestation measurement” is a cryptographic checksum of the code running in the secure enclave; it’s what clients check to ensure they’re interacting with legitimate, unmodified software. We discovered that WhatsApp’s system loaded configuration files containing environment variables after this fingerprint was taken (issue TOB-WAPI-13 in the report). This meant that a malicious insider at Meta could inject an environment variable, such as LD_PRELOAD=/path/to/evil.so, forcing the system to load malicious code when it started up. The attestation would still verify as valid, but the attacker’s malicious code would be running inside, potentially violating the system’s security or privacy guarantees by, for example, logging every message being processed to a secret server.

Meta fixed this by strictly validating environment variables: they can now contain only safe characters (alphanumeric plus a few symbols like dots and dashes), and the system explicitly checks for dangerous variables like LD_PRELOAD. Every piece of data your TEE loads must either be part of the measured boot process or be treated as potentially hostile.

Lesson 2: Do not trust data outside your measurement (have we already mentioned this?)

ACPI tables are configuration data that inform an operating system about the available hardware and how to interact with it. We found these tables weren’t included in the attestation measurement (TOB-WAPI-17), creating a backdoor for attackers. Here’s why this matters: a malicious hypervisor (the software layer that manages virtual machines) could inject fake ACPI tables defining malicious “devices” that can read and write to arbitrary memory locations. When the secure VM boots up, it processes these tables and grants the fake devices access to memory regions that should be protected.
An attacker could use this to extract user messages or encryption keys directly from the VM’s memory, and the attestation report will still verify as valid and untampered. Meta addressed this by implementing a custom bootloader that verifies ACPI table signatures as part of the secure boot process. Now, any tampering with these tables will change the attestation measurement, alerting clients that something is wrong.

Lesson 3: Correctly verify security patch levels

AMD regularly releases security patches for its SEV-SNP firmware, fixing vulnerabilities that could allow attackers to compromise the secure environment. The WhatsApp system did check these patch levels, but it made an important error: it trusted the patch level that the firmware claimed to be running (in the attestation report), rather than verifying it against AMD’s cryptographic certificate (TOB-WAPI-8). An attacker who had compromised an older, vulnerable firmware could simply lie about their patch level. Researchers have publicly demonstrated attacks that can extract encryption keys from older SEV-SNP firmware versions. An attacker could use these published techniques against WhatsApp users to exfiltrate secret data while the client incorrectly believes it’s connected to a secure, updated system.

Meta’s solution was to validate patch levels against the VCEK certificate’s X.509 extensions. These extensions are cryptographically signed data from AMD that can’t be forged by compromised firmware. The client then enforces minimum patch levels based on values set in the WhatsApp client source code.

Lesson 4: Attestations need freshness guarantees

Before our review, when a client connected to the Private Processing system, the server would generate an attestation report proving its identity, but this report didn’t include any timestamp or random value from the client (TOB-WAPI-7).
This meant that an attacker who compromised a TEE once could save its attestation report and TLS keys, then replay them indefinitely. Achieving a one-time compromise of a TEE is typically much more feasible and much less severe than a persistent compromise affecting each individual session. For example, consider an attacker who can extract TLS session keys through a side channel attack or other vulnerability. For a single attack, the impact tends to be short-lived, as the forward security of TLS makes the exploit impactful for only a single TLS session. However, without freshness, that single success becomes a permanent backdoor because the TEE’s attestation report from that compromised session can be replayed indefinitely. In particular, the attacker can now run a fake server anywhere in the world, presenting the stolen attestation to clients who will trust it completely. Every WhatsApp user who connects would send their messages to the attacker’s server, believing it’s a secure Meta TEE.

Meta addressed this issue by including the TLS client_random nonce in every attestation report. Now each attestation is tied to a specific connection and can’t be replayed. When implementing remote-attested transport protocols, we recommend performing attestation over a value derived from the handshake transcript, such as the scheme specified in the IETF draft Remote Attestation with Exported Authenticators.

How Meta fixed the remaining issues

Before their launch, Meta resolved 16 issues completely and partially addressed four others. The remaining eight unresolved issues are low- and informational-severity issues that Meta has deliberately not addressed. Meta provided a justification for each of these decisions, which can be reviewed in appendix F of our audit report. In addition, they’ve implemented broader improvements, such as automated build pipelines with provenance verification and published authorized host identities in external logs.
Beyond individual vulnerabilities: Systemic challenges in TEE deployment

While Meta has resolved these specific issues, our audit revealed the need to solve more complex challenges in securing TEE-based systems.

Physical security matters: The AMD SEV-SNP threat model doesn’t fully protect against advanced physical attacks. Meta needed to implement additional controls around which CPUs could be trusted (TOB-WAPI-10). If you are interested in a more detailed discussion on physical attacks targeting these platforms, check out our webinar, which discusses recently published physical attacks targeting both AMD SEV-SNP and Intel’s SGX/TDX platforms.

Transparency requires reproducibility: For external researchers to verify the system’s security, they need to be able to reproduce and examine the CVM images. Meta has made progress in this area, but achieving full reproducibility remains challenging, as issue TOB-WAPI-18 demonstrates.

Complex systems need comprehensive testing: Many of the issues we found could have been caught with negative testing, specifically testing what happens when components misbehave or when malicious inputs are provided.

The path forward for securely deploying TEEs

Can TEEs enable privacy-preserving AI features? Our audit suggests the answer is yes, but only with rigorous attention to implementation details. The issues we found weren’t fundamental flaws in the TEE model but rather implementation and deployment gaps that a determined attacker could exploit. These are subtle flaws that other TEE deployments are likely to replicate.

This audit shows that while TEEs provide strong isolation primitives, the large host-guest attack surface requires careful design and implementation. Every unmeasured input, every missing validation, and every assumption about the execution environment can become a vulnerability. Your system is only as secure as your TEE implementation and deployment.
For teams building on TEEs, our advice is clear: engage security reviewers early, invest in comprehensive testing (especially negative testing), and remember that security in these systems comes from getting hundreds of details right, not just the big architectural decisions. The promise of confidential computing is compelling. But, as this audit shows, realizing that promise requires rigorous attention to security at every layer of the stack. For more details on the technical findings and Meta’s fixes, see our full audit report. If you’re building systems with TEEs and want to discuss security considerations, we offer free office hours sessions where we can share insights from our extensive experience with these technologies.
- Simplifying MBA obfuscation with CoBRA on April 3, 2026 at 11:00 am
Mixed Boolean-Arithmetic (MBA) obfuscation disguises simple operations like x + y behind tangles of arithmetic and bitwise operators. Malware authors and software protectors rely on it because no standard simplification technique covers both domains simultaneously; algebraic simplifiers don’t understand bitwise logic, and Boolean minimizers can’t handle arithmetic. We’re releasing CoBRA, an open-source tool that simplifies the full range of MBA expressions used in the wild. Point it at an obfuscated expression and it recovers a simplified equivalent:

$ cobra-cli --mba "(x&y)+(x|y)"
x + y

$ cobra-cli --mba "((a^b)|(a^c)) + 65469 * ~((a&(b&c))) + 65470 * (a&(b&c))" --bitwidth 16
67 + (a | b | c)

CoBRA simplifies 99.86% of the 73,000+ expressions drawn from seven independent datasets. It ships as a CLI tool, a C++ library, and an LLVM pass plugin. If you’ve hit MBA obfuscation during malware analysis, reversing software protection schemes, or tearing apart VM-based obfuscators, CoBRA gives you readable expressions back.

Why existing approaches fall short

The core difficulty is that verifying MBA identities requires reasoning about how bits and arithmetic interact under modular wrapping, where values silently overflow and wrap around at fixed bit-widths. An identity like (x ^ y) + 2 * (x & y) == x + y is true precisely because of this interaction, but algebraic simplifiers only see the arithmetic and Boolean minimizers only see the logic; neither can verify it alone. Obfuscators layer these substitutions to build arbitrarily complex expressions from simpler operations.

Previous MBA simplifiers have tackled parts of this problem. SiMBA handles linear expressions well. GAMBA extends support to polynomial cases. Until CoBRA, no single tool achieved high success rates across the full range of MBA expression types that security engineers encounter in the wild.
How CoBRA works

CoBRA uses a worklist-based orchestrator that classifies each input expression and selects the right combination of simplification techniques. The orchestrator manages 36 discrete passes organized across four families—linear, semilinear, polynomial, and mixed—and routes work items based on the expression’s structure.

Most MBA expressions in the wild are linear: sums of bitwise terms like (x & y), (x | y), and ~x, each multiplied by a constant. For these, the orchestrator evaluates the expression on all Boolean inputs to produce a signature, then races multiple recovery techniques against each other and picks the cheapest verified result. Here’s what that looks like for (x ^ y) + 2 * (x & y):

CoBRA linear simplification flow for (x ^ y) + 2 * (x & y):

Step 1: Classification. The input expression is identified as a linear MBA.
Step 2: Truth table generation. Evaluate on all Boolean inputs → [0, 1, 1, 2] truth table.
Step 3a: Pattern match. Scan the identity database.
Step 3b: ANF conversion. Convert to bitwise normal form.
Step 3c: Interpolation. Solve for basis coefficients.
Step 4: Competition. Compare candidate results → winner: x + y (lowest cost).
Step 5: Verification. Spot-check against random 64-bit inputs or prove with Z3 → pass.

When constant masks appear (like x & 0xFF), the expression enters CoBRA’s semi-linear pipeline, which breaks it down into its smallest bitwise building blocks, recovers structural patterns, and reconstructs a simplified result through bit-partitioned assembly. For expressions involving products of bitwise subexpressions (like (x & y) * (x | y)), a decomposition engine extracts polynomial cores and solves residuals. Mixed expressions that combine products with bitwise operations often contain repeated subexpressions. A lifting pass replaces these with temporary variables, simplifying the inner pieces first, then solving the expression that connects them.
Here’s what that looks like for a product identity, (x & y) * (x | y) + (x & ~y) * (~x & y):

CoBRA mixed simplification flow for (x & y) * (x | y) + (x & ~y) * (~x & y):

Step 1: Classification. The input is identified as a mixed MBA.
Step 2: Decomposition. Decompose into the subexpressions (x & y) * (x | y) and (x & ~y) * (~x & y).
Step 3: Lift and solve. Lift the products and solve the inner pieces.
Step 4: Collapse identity. Collapse the product identity → x * y.
Step 5: Verification. Spot-check against random 64-bit inputs or prove with Z3 → pass.

Regardless of which pipeline an expression passes through, the final step is the same: CoBRA verifies every result against random inputs or proves equivalence with Z3. No simplification is returned unless it is confirmed correct.

What you can do with it

CoBRA runs in three modes:

CLI tool: Pass an expression directly and get the simplified form back. Use --bitwidth to set the modular arithmetic width (1 to 64 bits) and --verify for Z3 equivalence proofs.

C++ library: Link against CoBRA’s core library to integrate simplification into your own tools. If you’re building an automated analysis pipeline, the Simplify API takes an expression and returns a simplified result or reports it as unsupported.

LLVM pass plugin: Load libCobraPass.so into opt to deobfuscate MBA patterns directly in LLVM IR. If you’re building deobfuscation pipelines on top of tools like Remill, this integrates directly as a pass. It handles patterns spanning multiple basic blocks, applies a cost gate (replacing instructions only when the simplified form is smaller), and supports LLVM 19 through 22.

Validated against seven independent datasets

We tested CoBRA against 73,066 expressions from SiMBA, GAMBA, OSES, and four other independent sources. These cover the full spectrum of MBA complexity, from two-variable linear expressions to deeply nested mixed-product obfuscations.
Category     Expressions   Simplified   Rate
Linear       ~55,000       ~55,000      ~100%
Semilinear   ~1,000        ~1,000       ~100%
Polynomial   ~5,000        ~4,950       ~99%
Mixed        ~9,000        ~8,900       ~99%
Total        73,066        72,960       99.86%

The 106 unsupported expressions are carry-sensitive mixed-domain cases where bitwise and arithmetic operations interact in ways that current techniques can’t decompose. CoBRA reports these as unsupported rather than guessing wrong. The full benchmark breakdown is in DATASETS.md.

What’s next

CoBRA’s remaining failures fall into two categories: expressions with heavy subexpression duplication that exhaust the worklist budget even with lifting, and carry-sensitive residuals where bitwise masks over arithmetic products create bit-level dependencies that no current decomposition technique can recover. We’re also exploring broader integration options beyond just an LLVM pass, like native plugins for IDA Pro and Binary Ninja.

The source is available on GitHub under the Apache 2.0 license. If you run into expressions CoBRA can’t simplify, please open an issue on the repository. We want the hard problems.
- Mutation testing for the agentic era on April 1, 2026 at 11:00 am
Code coverage is one of the most dangerous quality metrics in software testing. Many developers fail to realize that code coverage lies by omission: it measures execution, not verification. Test suites with high coverage can obfuscate the fact that critical functionality is untested as software develops over time. We saw this when mutation testing uncovered a high-severity Arkis protocol vulnerability, overlooked by coverage metrics, that would have allowed attackers to drain funds.

Today, we’re announcing MuTON and mewt, two new mutation testing tools optimized for agentic use, along with a configuration optimization skill to help agents set up campaigns efficiently. MuTON provides first-class support for TON blockchain languages (FunC, Tolk, and Tact), while mewt is the language-agnostic core that also supports Solidity, Rust, Go, and more. The goal of mutation testing is to systematically introduce bugs (mutants) and check if your tests catch them, flagging hot spots where code is insufficiently tested. However, mutation testing tools have historically been slow and language-specific. MuTON and mewt are built to change that. To understand how, it helps to first understand what they’re replacing.

The regex era

Mutation testing dates to the 1970s, but for a long time, the technique rarely saw much adoption in the blockchain space as a software quality measurement. Testing frameworks are coupled tightly to target languages, making support for new languages expensive. Universalmutator changed this with its regex engine. After a commit on March 10, 2018 added Solidity support, the tool gained immediate traction in the blockchain space. We collaborated with the universalmutator team to advance smart contract testing and highlighted the tool in our 2019 blog post.
Despite (or perhaps because of) its elegant approach and compact codebase, universalmutator generated impressive mutant counts, enabling developers to assess test coverage more thoroughly than simpler tools could. Vyper and other language support followed, establishing universalmutator as the leading mutation testing tool for blockchain.

But regex has fundamental limits. Line-based patterns cannot mutate multi-line statements, a critical gap acknowledged by the original paper. More problematic: without mutant prioritization, the tool wastes time on redundant mutations. When commenting a line triggers no test failures, universalmutator still generates and tests every possible variation of that line, dramatically extending campaign runtime. Printing the results to stdout adds further friction for humans and AI agents reviewing campaigns. Later improvements (including a 2024 switch to comby for better syntactic handling) addressed some pain points, but remaining limitations prompted the development of more focused alternatives. Between 2019 and 2023, several tools emerged to address them, including our own slither-mutate solution. Each took a different approach to the core problems of language comprehension, scalability, and test quality.

slither-mutate: Speed through prioritization

We launched slither-mutate in August 2022, after our wintern, Vishnuram, brought the concept to life. Because Slither already parsed Solidity’s AST and provided a Python API, the groundwork was laid to generate syntactically valid mutations and implement a cleaner tweak-test-restore cycle (earlier tools polluted repositories with mutated files). The tool’s key innovation was mutant prioritization: high-severity mutants replace statements with reverts (exposing unexecuted code paths), medium-severity mutants comment out lines (revealing unverified side effects), and low-severity mutants make subtle changes, such as swapping operators.
The tool skips lower-severity mutants when higher-severity ones already indicate missing coverage on the same line, dramatically reducing campaign runtime, the biggest obstacle to wider mutation testing adoption. By late 2022, we were deploying slither-mutate across most Solidity audits. Two limitations remained. First, tight coupling to Solidity meant there was no path to easily support other blockchain languages. Second, dumping results to stdout persisted as a problem, but adding a database to Slither creates unacceptable friction for the broader Slither user base.

Introducing MuTON and mewt: The tree-sitter era

MuTON, our newest mutation testing tool, provides first-class support for all three TON blockchain languages: Tolk, Tact, and FunC. We’re grateful to the TON Foundation for supporting its development. MuTON is built on mewt, a language-agnostic mutation testing core that also supports Solidity, Rust, and more.

MuTON achieves language comprehension comparable to slither-mutate while supporting multiple languages by using Tree-sitter as its parser. Tree-sitter powers syntax highlighting in modern editors, building a concrete syntax tree that distinguishes language keywords from comments. This allows MuTON to target expressions like if-statements in a well-structured way, handling multi-line statements gracefully. Traditionally, integrating Tree-sitter grammars for new language support takes orders of magnitude longer than writing regex rules, but AI agents paired with bespoke skills invert this calculus, delivering Tree-sitter’s power with regex-like ease of extension.

MuTON stores all mutants and test results in a SQLite database, a quality-of-life improvement that became evident while using slither-mutate but wasn’t feasible to retrofit. Results persist across sessions; campaigns can be paused and resumed without losing progress. If you accidentally close your terminal during a 24-hour campaign, your work survives.
Persistent storage also enables flexible filtering and formatting: print only uncaught mutants in specific files, or translate results to SARIF for improved review. This flexibility helps humans and AI agents explore results, triage findings, and hunt for bugs.

The future of mutation testing

MuTON addresses many historical pain points, but significant friction remains. Three challenges stand between mutation testing and widespread adoption: configuring campaigns for reasonable runtimes, triaging results to separate signal from noise, and generating tests that encode requirements rather than accidents. AI agents, equipped with specialized skills, promise to transform each of these obstacles into routine tasks.

Optimizing configuration

Performance remains the biggest obstacle to mutation testing. If your test suite takes five minutes and you have 1,000 mutants, that’s 83 hours of unavoidable runtime. Mutation testing tools can’t fix slow tests, but smart configuration can dramatically reduce wasted time. MuTON already gives you powerful options to tune campaigns: target critical components instead of everything, use two-phase campaigns that run fast targeted tests first and then retest uncaught mutants with the full suite, configure per-target test commands so mutations in authentication code only trigger authentication tests, or restrict to high- and medium-severity mutations when time is tight. These tools work today and deliver real speedups.

But the decision tree branches endlessly: should you split by component or severity? Two-phase or targeted tests? What timeout accounts for incremental recompilation? We’ve released a configuration optimization skill that guides AI agents through these choices, measuring your test suite, estimating runtimes, and proposing optimal configurations tailored to your project structure. Try it now—it’s available in our public skills repository and makes the process painless.

Triaging results

Not all uncaught mutants matter.
Mutations that change x > 0 to x != 0 are semantic no-ops when x is an unsigned integer. A perfect mutator wouldn’t generate such mutations in the first place, but that would require deeper language-specific understanding than Tree-sitter provides. Manual triage traditionally requires slogging through hundreds of results, checking types, and understanding context to extract actionable insights.

MuTON’s database and flexible filtering already make this dramatically easier. Filter by mutation type or specific files to highlight high-value results. More importantly, these filters make AI-assisted triage token-efficient in ways earlier tools dumping raw output to stdout never could. Even today, asking an agent to review filtered mutation results and summarize true positives delivers 80% of the insights for 1% of the manual work. We’re developing a triage skill that systematically guides agents through result analysis, identifying patterns such as clustered uncaught mutants (a strong bug indicator) versus isolated operator mutations in utility functions (likely false positives or low priority). The skill will help agents flag high-risk areas and explain why specific mutations matter, turning raw results into actionable security insights.

The promise and peril of mutation-driven test generation

At first glance, using mutation testing to guide AI agents in writing tests seems like an elegant solution: test mutants, find escapees, generate tests to catch them, repeat until coverage is complete. But this naive approach harbors a subtle danger: an uncritical agent doesn’t know whether it’s encoding correct behavior or propagating bugs into your test suite. When mutation testing reveals that changing priority >= 2 to priority > 2 alters behavior, should the agent write a test asserting that priority == 2 triggers an action? Maybe.
Or maybe that’s a bug, and now you’ve corrupted your tests with the same incorrect logic, giving false confidence while doubling your maintenance burden. The real challenge isn’t generating tests that just catch mutants; it’s generating tests that encode requirements rather than implementation accidents. We believe the solution lies in building agents that are skeptical, that halt and ask questions when they encounter suspicious or ambiguous patterns, and that demand external validation before crystallizing behavior into tests. It’s a subtle problem that balances AI’s strengths with developers’ limited attention, but we’re working on it. Stay tuned.

Dive in

Ready to test your smart contracts? Install MuTON for TON languages, or mewt for Solidity, Rust, and more. Run a campaign and discover your blind spots. Found a bug in TON language support? File an issue in MuTON. See room for improvement in the core framework or other languages? Join us in the mewt repository. Both projects are open source and welcome contributions. Watch our skills repository for new skills that will guide AI agents through campaign setup and result analysis, transforming mutation testing from a manual slog into a routine part of the development process.
- How we made Trail of Bits AI-native (so far) on March 31, 2026 at 11:00 am
This post is adapted from a talk I gave at [un]prompted, the AI security practitioner conference. Thanks to Gadi Evron for inviting me to speak. You can watch the recorded presentation below or download the slides.

Most companies hand out ChatGPT licenses and wait for the productivity numbers to move. We built a system instead. A year ago, about 5% of Trail of Bits was on board with our AI initiative. The other 95% ranged from passively skeptical to actively resistant. Today we have 94 plugins, 201 skills, 84 specialized agents, and on the right engagements, AI-augmented auditors finding 200 bugs a week. This post is the playbook for how we got there. We open sourced most of it, so you can steal it today.

A recent Fortune article reported that a National Bureau of Economic Research study of 6,000 executives across the U.S., U.K., Germany, and Australia found AI had no measurable impact on employment or productivity. Two-thirds of executives said they use AI, but actual usage came out to 1.5 hours per week, and 90% of firms reported zero impact. Economists are calling it the new Solow paradox, referencing the pattern Robert Solow identified in 1987: “you can see the computer age everywhere but in the productivity statistics.” AI works. Most companies are using it wrong. They give people tools without changing the system. That’s the gap between AI-assisted and AI-native. One is a tool, the other is an operating system.

What AI-native actually means

“AI-native” gets thrown around a lot. The way I think about it, there are three levels:

AI-assisted is where almost everyone starts. You give people access to ChatGPT or Claude. They use it to draft emails, generate boilerplate, summarize documents. It’s a productivity tool. The org doesn’t change. The workflows don’t change. You just do the same things a little faster.

AI-augmented is where you start redesigning workflows. You’re not just using AI as a tool.
You’re putting agents in the loop, changing how work actually flows. Maybe the AI does the first pass on a code review and the human does the second. The process itself is different.

AI-native is the structural shift. The org is designed from the ground up assuming AI is a core participant. Not a tool you pick up, but a teammate that’s always there. Your knowledge management, your delivery model, your expertise, all designed to be consumed and amplified by agents.

At Trail of Bits, what this means concretely: our security expertise compounds as code. Every engagement we do, the skills and workflows we build make the next engagement faster. Every engineer operates with an arsenal of specialized agents built from 14 years of audit knowledge. That’s not “we use AI.” That’s “AI is on the team.”

What people are actually resisting

When I first launched this initiative inside Trail of Bits, there was an incredible amount of pushback. Studies of technology adoption consistently show the same thing: the problem is never the software. It’s people’s unwillingness to accept that something else might be better than their intuition. I had to understand four specific psychological barriers before I could design a system that works within them.

Self-enhancing bias. We overestimate our own judgment. Paul Meehl and Robyn Dawes showed that if you take the variables an expert says they use and build even a crude linear model, the model outperforms the expert. Not because it’s smarter, but because it applies the same weights every time. You don’t. You’re hungover some days, distracted others, and you never notice because you take credit for your wins and blame external factors for your misses. This gets worse with seniority. The more expert you are, the more you trust your gut, and the less you believe a machine could do better. As Jonathan Levav frames it: the more unique you feel you are, the more you resist a machine making decisions for you.

Identity threat.
In one study, researchers showed people the same kitchen automation device framed two ways: “does the cooking for you” versus “helps you cook better.” People who identified as cooks rejected the first framing and accepted the second, for the same device. There’s a symbolic dimension too: people don’t want robots giving them tattoos (human craft), but they’re fine with a tattoo-removing robot (instrumental, no symbolism). Security auditing is symbolic work. AI that replaces skill feels like an attack on who you are.

Intolerance for imperfection. Dietvorst et al. ran a study where participants watched an algorithm outperform a human forecaster. But after seeing the algorithm make one error, they abandoned it and went back to the human, even though the human was demonstrably worse. We forgive our own mistakes but not the machine’s. Their follow-up found the fix: let people modify the algorithm. Even one adjustable parameter was enough to overcome the aversion.

Opacity. A 2021 study in Nature Human Behaviour found that people’s subjective understanding of human judgment is high and of AI judgment is low, but objective understanding of both is near zero. People feel like they understand how a doctor diagnoses. They can’t explain it either. The feeling of not understanding kills the feeling of control.

The remedies that actually worked

We designed the system around the resistance, not against it. For self-enhancing bias, we built a maturity matrix. Nobody likes being told they’re at level 1. But that’s the point: you can’t argue you’re already good enough when there’s a visible ladder. It makes the conversation concrete instead of “I don’t think AI is useful.” It also creates social proof. When you see peers at level 2 or 3, the passive majority starts moving.

For identity threat, we never asked anyone to stop being a security expert. We gave them a new way to express that identity.
When a senior auditor writes a constant-time-analysis skill, they’re not being replaced. They’re becoming more permanent. Their expertise is encoded and reusable. That’s an identity upgrade, not a threat. The maturity matrix reinforces this: level 3 isn’t “uses AI the most.” It’s “invents new ways, builds tools.” The identity of the expert shifts from “I don’t need AI” to “I’m the one who makes the AI dangerous.”

For intolerance for imperfection, we invested heavily in reducing the ways AI can fail embarrassingly. A curated marketplace means no random plugins with backdoors. Sandboxing means Claude Code can’t accidentally delete your work. Guardrails and footgun reduction mean fewer “AI did something stupid” stories circulating in Slack. If someone’s first AI experience is bad, you’ve lost them for months.

For opacity, we wrote an AI Handbook that made everything concrete: here’s what’s approved, here’s what’s not, here are the exceptions, here’s who to ask. Clear rules restored the feeling of control.

And underlying everything: we made adoption visible and fast. Deferred benefits kill adoption. If setup takes an hour and the first result is mediocre, you’ve confirmed every skeptic’s priors. Copy-pasteable configs, one-command setup, standardized toolchain, all designed so the first experience is fast and good. And the CEO going first matters more than people think. The passive 50% watches what leadership actually does, not what it says.

The operating system model

Here’s the actual system we built.
Six parts, each designed to address the barriers I just described:

Barrier | Core problem | What we built
Self-enhancing bias | “I’m already good enough” | Maturity Matrix with visible levels and real consequences
Identity threat | “AI is replacing who I am” | Skills repos + hackathons that reward building, not just using
Intolerance for imperfection | One bad experience = months lost | Curated marketplace, sandboxing, guardrails
Opacity / trust | “I don’t understand how it decides” | AI Handbook that explains the risk model, not just the rules

The six parts:

Pick a standard toolchain so you can support it
Write the rules so risk conversations stop being ad hoc
Create a capability ladder so improvement is expected, measurable, and rewarded
Run tight adoption sprints so the org keeps pace with releases
Package the learnings into reusable artifacts (repos, configs, sandboxes) so the system compounds
Make autonomy safe with sandboxing, guardrails, and hardened defaults

This isn’t a strategy deck we wrote and handed to someone. We built every piece ourselves, open sourced most of it, and iterated on it in production with a 140-person company doing real client work.

Standardize on tools

Step one was boring but critical: we standardized. We got everyone on Claude Code, and we treat it like any other enterprise tool: supported configs, known-good defaults, and a clear path to “this is how we do it here.” If you skip this step, you can’t build anything else. You end up with 40 different workflows and zero leverage.

Write the rules

We wrote an AI Handbook. Not to teach people how to prompt. It’s there to remove ambiguity. The key part is the usage policy: what tools are approved, what isn’t, especially for sensitive data. Cursor can’t be used on client code (except blockchain engagements; use Claude Code or Continue.dev instead). Meeting recorders are disallowed for client meetings conducted under legal privilege. Now, when a client asks what we’re using on their codebase, everyone gives the same answer.
The handbook doesn’t just list what’s approved. It explains the risk model behind each decision, so people understand why. That’s what addresses the opacity barrier: not “just trust this,” but “here’s our reasoning.” Once you have policy, you can safely push harder on adoption.

Make it measurable

We built an AI Maturity Matrix that makes AI usage a first-class professional capability, like “can you use Git” or “can you write tests.”

Trail of Bits AI Maturity Matrix, as of March 2026

It’s not a vibe. It’s a ladder: clear levels, clear expectations, a clear path up, and real consequences for staying stuck. What level 3 looks like depends on your role. An engineer at level 3 builds agent systems that ship PRs and close issues autonomously. A sales rep at level 3 has agents producing pipeline reports and QBR prep without hand-holding. An auditor at level 3 runs agents that execute full analysis passes and produce findings, triage, and report drafts. This is how you avoid two failure modes: leadership wishing adoption into existence, and the org splitting into “AI people” and everyone else.

Create an adoption engine

We run hackathons as a management system: short, focused sprints of 2-3 days with one objective. They’re how we keep pace when the ecosystem changes every week.

Claude Code Hackathon v2: Autonomous Agents

One recent example: “Claude Code Hackathon v2: Autonomous Agents.” The two lines that mattered were:

Objective: Ship the most impactful changes across our AI toolchain and public repos
Twist: Engineers must work in bypass permissions mode (fully autonomous agent, not approve-every-action)

That twist is intentional. It forces everyone to learn the real constraints: sandboxing, guardrails, and how to structure work so agents can succeed. A few design choices matter here: we focus on public repos so we can move fast and show real outcomes. We measure success by activity (issues filed/fixed, PRs reviewed/merged), not lines of code.
Everyone works in pairs, and every change gets reviewed by a buddy. Even the “move fast” sprint has quality control built in.

Capture the work as reusable artifacts

Hackathons create motion. But motion doesn’t compound unless you capture it.

The most important artifact is a skills repo. Skills are reusable, structured workflows, ideally with examples, constraints, and a way to verify output. We maintain an internal skills repo for company-specific workflows and an external skills repo so the broader community can validate and improve what we’re doing.

We also created a curated marketplace, a “known good” place for third-party skills. Once you tell people “go use skills and plugins,” they’ll install random stuff. This is basic enterprise thinking applied to agent tooling: if you want adoption, you need a safe supply chain.

We made defaults copy-pasteable. We built a repo that centralizes recommended Claude Code configuration so onboarding isn’t tribal knowledge. This is where we put known-good settings, recommended patterns for personal ~/.claude/CLAUDE.md, and anything we want to standardize.

We made sandboxing the default. If you want autonomous agents, you need sandboxing. We give people multiple safe lanes: a devcontainer option, native macOS sandboxing, and Dropkit. The point isn’t that everyone uses the same sandbox. The point is everyone has a safe sandbox, and it’s easy to adopt.

We reduced footguns. We hardened defaults through MDM. For example, we rolled out more secure package manager defaults via Jamf, including mandatory package cooldown policies. The easiest way to reduce risk is to make the default path the safe path.

Finally, once you have policy, guardrails, sandboxes, and skills, you can connect agents to real tools. One example we’ve published is an MCP server for Slither.
Even if you don’t care about Slither specifically, the point is: MCP turns your internal tools into something agents can use reliably and your org can govern.

Results so far

Let me give you some numbers on what this system actually produced.

The numbers that got the room’s attention at [un]prompted

Tooling scale: Across our internal and public skills repos, we have 94 plugins containing 201 skills, 84 specialized agents, 29 commands, 125 scripts, and over 414 reference files encoding domain expertise. That’s the compounding effect: every engagement, every auditor, every experiment adds to the arsenal.

The breadth matters. We have skills for writing sales proposals, tracking project hours, onboarding new hires, prepping conference blog posts, and delivering government contract reports. The internal repo has 20+ plugins targeting specific vulnerability classes: ERC-4337, merkle trees, precision loss, slippage, state machines, CUDA/Rust review, integer arithmetic in Go. Each one packages expertise that used to live in someone’s head into something any auditor can invoke.

Delivery impact: For certain clients where the codebase and scope allow it, we went from finding about 15 bugs a week to 200. An auditor runs a fleet of specialized agents doing targeted analysis across an entire codebase in parallel, then validates the results. About 20% of all bugs we report to clients are now initially discovered by AI in some form. They go into real client reports. An auditor validates every one, but the AI is surfacing things humans would have missed or wouldn’t have had time to look for.

Business impact: Our sales team averages $8M in revenue per rep against a consulting industry benchmark of $2-4M. The sales team uses the same skills repos for proposal drafting, competitive positioning, conference prep, and lead enrichment. Same system, same compounding effect.

And this is maybe a year into building the system seriously. The models are getting better every month.
The skills repo grows every week.

Open questions

Here’s what we’re actively working on and don’t have great answers for yet.

Private inference. We want local models for cost and confidentiality, but open models aren’t good enough yet. There’s still a significant gap versus the best closed models on coding benchmarks. We’re evaluating on-prem inference servers to run 230B+ models at full precision. Key insight: speed drives adoption more than capability. Nobody uses a slow model, even if it’s smart. In the meantime, private inference providers like Tinfoil.sh (confidential computing on NVIDIA GPUs, cryptographically verifiable) are getting compelling.

Prompt injection and client code protection. This is an existential question for using AI on client code. The data the agent works on is inherently accessible to it. Today we use blunt instruments: sensitive clients mean no web access. Longer term, we’re looking at agent-native shells like nono and agentsh that enforce policy at the kernel level.

Policy enforcement and continuous learning. We push settings via MDM, but we’re not yet pulling signal back. The goal is to turn the whole company into a feedback loop that improves the operating system weekly. One possible long-term architecture: a master MCP server between agents and internal resources, enforcing policy server-side. We’re not there yet.

The future of consulting. This is the one that keeps me up at night. The consulting business model assumes you’re billing for time, and that time roughly correlates with expertise. But when some people can outperform others by orders of magnitude with the right agent setup, that correlation breaks. The question shifts from “how many hours did the auditor spend” to “did the auditor know where to point the agents and which findings are real.” We don’t have the answer yet. But the nature of how Trail of Bits offers services will probably change in the next 6 to 12 months.
Audit scoping, pricing, deliverables, all of it is on the table. The firms that figure this out first will have a structural advantage, and the ones that keep billing by the hour will watch their margins compress as their competitors ship more in less time. We’re not waiting to find out which side we’re on.

The replicable recipe

If you want to copy this, copy the system, not the specific tools:

Standardize on one agent workflow you can support
Write an AI Handbook so risk decisions aren’t ad hoc
Create a capability ladder so improvement is expected
Run short adoption sprints that force hands-on usage
Capture everything as reusable artifacts: skills + configs + curated supply chain
Make autonomy safe with sandboxing + guardrails + hardened defaults

That’s what we’ve done so far, and it’s already changed how fast we can ship and how quickly we can adapt.

Resources

All of our tooling is open source:

trailofbits/skills – Our public skills repository
trailofbits/skills-curated – Curated third-party skills marketplace
trailofbits/claude-code-config – Recommended Claude Code configurations
trailofbits/claude-code-devcontainer – Devcontainer for sandboxed development
trailofbits/dropkit – macOS sandboxing for agents
trailofbits/slither-mcp – MCP server for Slither

We’re hiring! We’re looking for an AI Systems Engineer to work directly with me on accelerating everything in this post, and a Head of Application Security to lead a team of about 15 exceptionally overperforming consultants. Check out trailofbits.com/careers.
- Try our new dimensional analysis Claude plugin on March 25, 2026 at 11:00 am
We’re releasing a new Claude plugin for developing and auditing code that implements dimensional analysis, a technique we explored in our most recent blog post. Most LLM-based security skills ask the model to find bugs. Our new dimensional-analysis plugin for Claude Code takes a different approach: it uses the LLM to annotate your codebase with dimensional types, then flags mismatches mechanically. In testing against real audit findings, it achieved 93% recall versus 50% for baseline prompts.

You can download and use our new dimensional-analysis plugin by running these commands:

claude plugin marketplace add trailofbits/skills
claude plugin install dimensional-analysis@trailofbits
claude /dimensional-analysis

How our plugin differs from most skills

This plugin release is quite different from the wave of security analysis skills released over the past few weeks. The skills we’ve seen tend to take a relatively simple approach, where the LLM is primed with a set of vulnerability classes, exploration instructions, and example findings, and is then told to try to identify bugs within the scope of the skill. Unfortunately, these approaches tend to produce low-quality results, with precision, recall, and determinism that are often much poorer than simply asking an LLM to “find the bugs in this project.”

What makes dimensional-analysis different is that instead of relying on LLM judgement to search for, identify, and rank vulnerabilities, it uses the LLM as a vocabulary-building/categorization machine that directly annotates the codebase. If the annotations are correct and a dimensional bug is present, that bug shows up as a mismatch between annotations instead of having to rely on an LLM’s judgement to determine how viable a finding is. In effect, this changes the calculus of how the LLM’s reasoning capability is being used, and produces much better results than baseline prompts that overly rely on LLM reasoning capabilities.
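The “annotate, then check mechanically” idea can be sketched in a few lines of Python. This is an illustrative toy, not the plugin’s actual implementation: dimensions become maps of base unit to exponent, arithmetic carries them along, and adding unlike dimensions raises an error instead of being left to model judgement.

```python
from fractions import Fraction

class Dim:
    """Toy dimension: a map of base unit -> exponent (illustrative only)."""
    def __init__(self, **exps):
        # drop zero exponents so cancelled units compare equal
        self.exps = {u: Fraction(e) for u, e in exps.items() if e}

    def __eq__(self, other):
        return self.exps == other.exps

    def __mul__(self, other):
        merged = dict(self.exps)
        for u, e in other.exps.items():
            merged[u] = merged.get(u, Fraction(0)) + e
        return Dim(**merged)

    def __truediv__(self, other):
        return self * Dim(**{u: -e for u, e in other.exps.items()})

    def __add__(self, other):
        # the mechanical check: adding unlike dimensions is an error
        if self != other:
            raise TypeError(f"dimension mismatch: {self.exps} vs {other.exps}")
        return self

tokenA, tokenB = Dim(A=1), Dim(B=1)
price = tokenA / tokenB            # price of B in terms of A: [A]/[B]
assert price * tokenB == tokenA    # dimensions agree: no finding
try:
    tokenA + tokenB                # unlike dimensions: flagged mechanically
except TypeError as err:
    print(err)
```

Once annotations like these are attached to a codebase, a mismatch is a deterministic fact about the annotations rather than a judgement call.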
Benchmarking

We tested dimensional-analysis against a set of dimensional mismatch issues found during several unpublished audits and compared it to a baseline prompt using 10 samples per codebase. For this evaluation, the dimensional-analysis plugin had a recall rate of 93% with a standard deviation of 12%, versus the baseline prompt, which had a recall rate of 50% with a standard deviation of 20%. This means that dimensional-analysis performed both better and more consistently than the baseline prompt.

How it works

If you haven’t already, read our first blog post on the dimensional analysis technique. The plugin works over four main phases: dimension discovery, dimension annotation, dimension propagation, and dimension validation.

In the first phase, a subagent performs dimension discovery, with the goal of identifying a vocabulary of fundamental base units that every numerical term in the system is composed of. During this process, it also identifies a set of common derived units for quick reference by later agents.

Figure 1: A sample of a dimensional vocabulary for a protocol using Uniswap v4 hooks

The dimensional vocabulary is persisted to DIMENSIONAL_UNITS.md, where it can be read by other agents or used during development if you choose to make the annotations a permanent part of your software development lifecycle.

In the second phase, a group of subagents is launched to directly annotate the codebase using the dimensional vocabulary. Each subagent is provided with the DIMENSIONAL_UNITS.md file, a batch of files to annotate, and instructions to annotate state variables, function arguments, variable declarations, and any portions of complex arithmetic. These initial annotations are called “anchor” annotations.
} else if (currentPrice < peakPrice) {
    // D18{1} = (D18{price} - D18{price}) * D18{1} / (D18{price} - D18{price})
    imbalance = ((peakPrice - currentPrice) * imbalanceSlopeData.imbalanceSlopeBelowPeak) /
        (peakPrice - eclpParams.alpha.toUint256());
} else {
    // D18{1} = (D18{price} - D18{price}) * D18{1} / (D18{price} - D18{price})
    imbalance = ((currentPrice - peakPrice) * imbalanceSlopeData.imbalanceSlopeAbovePeak) /
        (eclpParams.beta.toUint256() - peakPrice);
}

Figure 2: A sample of annotated arithmetic from Balancer v3

In the third phase, dimensions are “propagated” across each file to callers and callees. This phase adds extra annotations to low-priority files that didn’t receive annotations on the first pass, and performs the first set of checks to make sure that dimensions agree within the same code context and across files. It’s important to note that a dimensional mismatch at this stage doesn’t necessarily mean a vulnerability is present; sometimes it’s not possible to infer the precise dimension of a called function argument without reading the implementation of the function itself, and the system will over-generalize or make a poor guess. This third phase attempts to “repair” these over-generalized annotations and, if repair is not possible, flags them for triage in the final step.

In the fourth and final phase, the plugin attempts to discover mismatches and perform triage. Dimensional mismatching is checked for during assignment, during arithmetic, across function boundaries, across return paths, and across external calls. Dimensional mismatches are compared against a severity classification based on the nature of the mismatch, and a final report is returned to the user.

What’s next?

If you’re a developer working on an arithmetic-heavy project like a smart contract or blockchain node, we highly recommend running this plugin, then committing DIMENSIONAL_UNITS.md along with all of the annotations created by the plugin.
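For a sense of shape, a committed DIMENSIONAL_UNITS.md might look something like the sketch below. This is a purely hypothetical illustration built from the unit conventions discussed in these posts (D18/D27 fixed-point scales, unit-of-account and token dimensions), not the plugin’s actual output:

```markdown
# DIMENSIONAL_UNITS.md (hypothetical sketch, not real plugin output)

## Base units
- tok(X)  : raw amount of token X
- UoA     : unit of account (e.g., USD)
- share   : vault/pool shares
- s       : seconds

## Common derived units
- D18{UoA/tok}   : 18-decimal fixed-point price of a token
- D27{tok/share} : 27-decimal fixed-point share-to-token exchange rate
- {tok.tok}      : product invariant (e.g., a constant-product K)
```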
Besides finding bugs, these annotations can greatly reduce the time it takes to build a thorough understanding of a complex codebase, and they improve both human and LLM understanding of the semantic meaning of your project’s arithmetic expressions. While new tools are exciting, at this time we don’t believe that this tool can find every source of dimensional error. LLMs are probabilistic, which means there is always going to be some level of error. We’re interested in improving this plugin wherever possible, so if you run it and it misses a dimensional error, please open an issue on our GitHub.
- Spotting issues in DeFi with dimensional analysis on March 24, 2026 at 11:00 am
Using dimensional analysis, you can categorically rule out a whole category of logic and arithmetic bugs that plague DeFi formulas. No code changes required, just better reasoning!

One of the first lessons in physics is learning to think in terms of dimensions. Physicists can often spot a flawed formula in seconds just by checking whether the dimensions make sense. I once had a teacher who even kept a stamp that said “non-homogeneous formula” for that purpose (and it was used a lot on students’ work). Developers can use the same approach to spot incorrect arithmetic in smart contracts.

In this post, we’ll start with the basics of dimensional analysis in physics and then apply the same reasoning to real DeFi formulas. We’ll also show you how this can be implemented in practice, using Reserve Protocol as an example. Along the way, we’ll see why developers need to think explicitly about dimensional safety when writing smart contracts, and why the DeFi ecosystem would benefit from tooling that can automatically catch these classes of bugs. Speaking of which, while putting together this post, we actually built a Claude plugin for this purpose (which we discuss in our follow-up post).

Quantities and dimensions

We will start with two formulas:

$$\textit{Speed} = \textit{distance} + \textit{time}$$

$$\textit{Speed} = \frac{\textit{distance}}{\textit{time}}$$

Which of the two formulas is the correct way to calculate the speed of an object? Clearly, it’s the second one, but not just because you’ve memorized the correct formula. The deeper reason lies in dimensions. Physics recognizes seven fundamental quantities: length (meters), mass (kilograms), time (seconds), electric current (amps), thermodynamic temperature (kelvin), amount of substance (moles), and luminous intensity (candela). Every other physical concept, like speed, force, or energy, is a derived quantity, defined in terms of the fundamental ones.
For example, this is how speed is defined:

$$\textit{Speed} = \textit{distance} / \textit{time}$$

And this is how it’s represented in dimensional terms:

$$\textit{Speed}\text{ (meters/second)} = \frac{\textit{length}\text{ (meters)}}{\textit{time}\text{ (seconds)}}$$

The golden rule is simple: both sides of an equation must have the same dimension. And, just as important, you can’t add or subtract quantities with different dimensions. So if we reason through the incorrect speed formula in terms of dimensions, we’ll get this:

$$\textit{Speed}\text{ (meters/second)} = \textit{length}\text{ (meters)} + \textit{time}\text{ (seconds)}$$

This is clearly nonsense. If dimensions could scream, they would. So we can easily say that this formula can’t be used to calculate anything, speed or otherwise. Note that even when dimensions check out, you must still use consistent units!

Dimensional thinking in DeFi

Now let’s shift the lens. Physics deals with meters, seconds, and kilograms, but DeFi has its own “dimensions”: tokens, prices, liquidity, and so on. Here’s where mistakes start to creep in. Imagine you’re coding an AMM and you write this:

$$K = x + y$$

Does that look right? It shouldn’t. Here, x might represent the number of “token A” and y the number of “token B.” Adding them together is just as meaningless as adding distance and time. They’re different dimensions. At this point, you might object: “Wait, this is exactly how Curve Stable Pools work!” And you’d be right. But the key is in the name: stable. In a stable pool, tokens are designed to maintain near-equal value. Under that assumption, token A and token B are treated as if they were the same “dimension.” This trick makes the formula workable in this special case. But outside of stable pools, blindly adding tokens together is as absurd as writing \(\textit{speed} = \textit{distance} + \textit{time}\).
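The golden rule is easy to mechanize. Here is a minimal sketch (illustrative, not from any real library) that represents dimensions as (length, time) exponent pairs: division subtracts exponents, and addition is only legal between identical dimensions.

```python
# Dimensions as (length, time) exponent pairs: meters = (1, 0), seconds = (0, 1).
METER, SECOND = (1, 0), (0, 1)

def divide(a, b):
    # dividing quantities subtracts dimension exponents
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    # adding is only legal between identical dimensions
    if a != b:
        raise ValueError(f"non-homogeneous formula: {a} vs {b}")
    return a

speed = divide(METER, SECOND)   # (1, -1): length^1 * time^-1, i.e., m/s
assert speed == (1, -1)
try:
    add(METER, SECOND)          # distance + time earns the teacher's stamp
except ValueError as err:
    print(err)
```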
Understanding homogeneous formulas helps you not only find issues but also understand why a formula is structured the way it is. In physics, speed is a derived quantity built from the fundamental quantities of length and time. DeFi has its own derived quantities: liquidity, for example, is built from token balances. For example, in a Uniswap v3 pool with reserves x and y, liquidity is calculated as follows:

$$\textit{Liquidity} = \sqrt{x \cdot y}$$

Dimensionally, this calculation looks like this:

$$\textit{Liquidity} = \sqrt{[A] \cdot [B]}$$

Here, [A] is a dimension that represents the number of token A, and [B] is a dimension that represents the number of token B. On its own, “token A × token B” doesn’t have a direct interpretation, just like “meters × seconds” doesn’t. But within the invariant equation \(k = x \cdot y\), the \(x \cdot y\) part defines a conserved relationship that governs swaps. k and the liquidity are not base dimensions; they are derived ones, combining the balances of multiple tokens into a single pool-wide property.

Why some price formulas don’t work

Example 1

Suppose someone writes this incorrect formula in their protocol:

$$\textit{Price} = \frac{\text{number of token A}}{\textit{liquidity}}$$

We can easily spot the issue with dimensional analysis. This is an example of a correct and straightforward way to define a price:

$$\text{Price of B in terms of A} = \frac{\text{amount of A}}{\text{amount of B}} = \frac{[A]}{[B]}$$

If the formula \(\textit{Price} = \frac{\text{number of token A}}{\textit{liquidity}}\) were correct, the right side of the equation would have the same dimensions as the correct price definition above. But dimensionally, the right side of the formula is as follows:

$$\frac{[A]}{\sqrt{[A] \cdot [B]}} = \frac{\sqrt{[A]} \cdot \sqrt{[A]}}{\sqrt{[A] \cdot [B]}} = \sqrt{\frac{[A]}{[B]}}$$

That’s not a price; it’s the square root of a price. The formula produces something, but it’s not a price.
Consequently, we have different dimensions on the right and left sides of the formula. This means the formula \(\textit{Price} = \frac{\text{number of token A}}{\textit{liquidity}}\) is incorrect. This is discernible without further knowledge of the DEX.

Example 2

Let’s take another example that is harder to spot without dimensional analysis. Which of these formulas is incorrect?

$$K = (\text{number of token A})^2 \cdot \text{Price of B in terms of A}$$

$$K = \frac{(\text{number of token A})^2}{\text{Price of B in terms of A}}$$

Here is a tip: K is often defined as \(\text{number of token A} \cdot \text{number of token B}\). Dimensionally, this means \(K = [A] \cdot [B]\). Now that we have the dimensions of the left side of the equation, let’s check if one of the two formulas has the same dimensions on the right side.

$$K = [A]^2 \cdot \frac{[A]}{[B]} = \frac{[A]^3}{[B]}$$

$$K = \frac{[A]^2}{\frac{[A]}{[B]}} = [A] \cdot [B]$$

So we can see that the first formula can’t be valid, and the second one is dimensionally valid!

Example 3

For an example in a DeFi context, let’s consider a real vulnerability that we identified during the CAP Labs audit (TOB-CAP-17).

function price(address _asset) external view returns (uint256 latestAnswer, uint256 lastUpdated) {
    address capToken = IERC4626(_asset).asset();
    (latestAnswer, lastUpdated) = IOracle(msg.sender).getPrice(capToken);
    uint256 capTokenDecimals = IERC20Metadata(capToken).decimals();
    uint256 pricePerFullShare = IERC4626(_asset).convertToAssets(capTokenDecimals);
    latestAnswer = latestAnswer * pricePerFullShare / capTokenDecimals;
}

Figure 1: Price calculation function in CAP

ERC-4626 explicitly expects a number of shares as the only input of the convertToAssets function. But the CAP Labs implementation sends decimals! That’s exactly the kind of issue that can be identified with a quick dimensional analysis, even without knowing what the codebase does.
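The exponent arithmetic behind these examples can be checked mechanically. The toy Python sketch below (illustrative, not a real tool) stores dimensions as maps of base dimension to Fraction exponents so square roots stay exact; it reproduces Example 1’s square-root-of-a-price result and Example 2’s check of K.

```python
from fractions import Fraction as F

# Toy exponent arithmetic over the base dimensions [A] and [B] (token amounts).
def mul(a, b):
    return {u: a.get(u, F(0)) + b.get(u, F(0)) for u in {*a, *b}}

def div(a, b):
    return mul(a, {u: -e for u, e in b.items()})

def sqrt(a):
    return {u: e / 2 for u, e in a.items()}

A, B = {"A": F(1)}, {"B": F(1)}
price = div(A, B)              # price of B in terms of A: [A]/[B]
liquidity = sqrt(mul(A, B))    # sqrt([A].[B])

# Example 1: token A over liquidity is the square root of a price, not a price
wrong_price = div(A, liquidity)
assert wrong_price == sqrt(price)
assert wrong_price != price

# Example 2: only the second formula matches K = [A].[B]
K = mul(A, B)
assert mul(mul(A, A), price) != K   # [A]^3/[B], invalid
assert div(mul(A, A), price) == K   # [A].[B], dimensionally valid
```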
Real-life best practices

Some programming languages make dimensional safety a first-class feature. For instance, F# has a “units of measure” system: you can declare a value as float<m/s> or float<USD/token>, and the compiler will reject equations where the units don’t align. It’s enforced at compile time. Solidity lacks this feature, so developers must emulate it through comments and naming conventions.

For example, Reserve Protocol’s unit comments are a textbook best practice. They codify dimensional reasoning in its codebase. All state variables and parameters are annotated with unit comments that define how values relate. This practice enforces that assignments in code must preserve matching dimensions, often with nearby comments showing unit equivalences. In Reserve Protocol contracts, each variable carries a comment like the one shown in figure 2. In this example, the comment indicates that the price is represented as a 27-decimal fixed-point unit of account per token. Because both the dimension (UoA/tok) and the numeric scale (D27) are documented, developers and auditors instantly know what a number represents and how to handle it. This eliminates ambiguity, prevents values with different scales from being mixed, and acts as a guardian against subtle formula bugs.
/// Start a new rebalance, ending the currently running auction
/// @dev If caller omits old tokens they will be kept in the basket for mint/redeem but skipped in the rebalance
/// @dev Note that weights will be _slightly_ stale after the fee supply inflation on a 24h boundary
/// @param tokens Tokens to rebalance, MUST be unique
/// @param weights D27{tok/BU} Basket weight ranges for the basket unit definition; cannot be empty [0, 1e54]
/// @param prices D27{UoA/tok} Prices for each token in terms of the unit of account; cannot be empty (0, 1e45]
/// @param limits D18{BU/share} Target number of baskets should have at end of rebalance (0, 1e27]
/// @param auctionLauncherWindow {s} The amount of time the AUCTION_LAUNCHER has to open auctions, can be extended
/// @param ttl {s} The amount of time the rebalance is valid for
function startRebalance(

Figure 2: Example of a comment explaining the dimension of a price in Reserve Protocol smart contracts

This approach is not limited to large or mature protocols. Any smart contract codebase can benefit from explicitly documenting dimensions and units. Developers should treat dimensional annotations as part of the protocol’s safety model rather than as optional documentation. Clearly labeling whether a variable represents tokens, prices, liquidity shares, or fixed-point scaled values makes code easier to review, safer to modify, and significantly simpler to audit.

When designing a dimensional annotation system, a few general principles can help:

Make dimensions explicit and consistent. Decide early how dimensions will be represented (for example, tok, UoA, shares, etc.) and apply the convention uniformly across the codebase.

Always document scale together with dimension. In DeFi, mismatched decimals are often as dangerous as mismatched dimensions. Including fixed-point precision (such as D18 or D27) alongside dimensional annotations removes ambiguity.

Annotate inputs, outputs, and state variables.
Dimension safety breaks down if only storage variables are documented but function parameters and return values are not.
- Prefer clarity over brevity. Slightly longer variable names or comments are far cheaper than subtle arithmetic bugs.
- Document conversions explicitly. Whenever values change dimension or scale (for example, shares to assets or tokens to unit of account), adding a short comment explaining the transformation greatly improves auditability.

These conventions require discipline, but they improve dimensional safety in a language that does not natively support it.

Toward dimensional safety in Solidity

We've taken a first step toward automating this kind of analysis with a Claude plugin for dimensional checking, which we'll introduce in a follow-up post. Beyond that, the ecosystem would benefit from deeper tooling that blends traditional static analysis with the semantic capabilities of LLMs. For example, a Slither-based linting or static analysis tool for Solidity could infer, propagate, and check "units" and "dimensions" across a codebase, flagging mismatches in the same way that Solidity warns about most incompatible types. In the meantime, document your protocol's dimensions and decimals: note in comments what each variable represents, and be explicit about the scale and units of every stored or computed value. These small habits will make your formulas more readable, auditable, and robust. And try out our new Claude plugin for dimensional analysis. For more details, see our follow-up blog post announcing the plugin.
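To make the idea of unit checking concrete outside of F#, here's a minimal Python sketch (entirely hypothetical, not part of any protocol's tooling) of the dimensional bookkeeping that the comment conventions above encode by hand. Units are tracked as exponent maps, so UoA/tok multiplied by tok correctly yields UoA, and mismatched additions are rejected.

```python
class Quantity:
    """A number tagged with unit exponents, e.g. {"UoA": 1, "tok": -1} for UoA/tok."""

    def __init__(self, value, units):
        self.value = value
        self.units = {u: e for u, e in units.items() if e != 0}

    def __add__(self, other):
        # Addition only makes sense when dimensions match exactly.
        if self.units != other.units:
            raise TypeError(f"unit mismatch: {self.units} + {other.units}")
        return Quantity(self.value + other.value, self.units)

    def __mul__(self, other):
        # Multiplication adds exponents: UoA/tok * tok -> UoA.
        units = dict(self.units)
        for u, e in other.units.items():
            units[u] = units.get(u, 0) + e
        return Quantity(self.value * other.value, units)


price = Quantity(2.5, {"UoA": 1, "tok": -1})  # a price in UoA/tok
amount = Quantity(10, {"tok": 1})             # a token amount
total = price * amount                        # dimensionally a UoA value
```

In F#, the compiler rejects a mismatched addition like `price + amount` before the program ever runs; the comment conventions described above aim for the same discipline, enforced by reviewers instead of a type checker.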
- Six mistakes in ERC-4337 smart accounts on March 11, 2026 at 11:00 am
Account abstraction transforms fixed "private key can do anything" models into programmable systems that enable batching, recovery, spending limits, and flexible gas payment. But that programmability introduces risks: a single bug can be as catastrophic as leaking a private key. After auditing dozens of ERC-4337 smart accounts, we've identified six vulnerability patterns that frequently appear. By the end of this post, you'll be able to spot these issues and understand how to prevent them.

How ERC-4337 works

Before we jump into the common vulnerabilities that we often encounter when auditing smart accounts, here's a quick mental model of how ERC-4337 works. There are two kinds of accounts on Ethereum: externally owned accounts (EOAs) and contract accounts. EOAs are simple key-authorized accounts that can't run custom logic. For example, common flows like token interactions require two steps (approve/permit, then execute), which fragments transactions and confuses users. Contract accounts are smart contracts that can enforce rules, but they cannot initiate transactions on their own.

Before account abstraction, if you wanted wallet logic like spending limits, multi-sig, or recovery, you'd deploy a smart contract wallet like Safe. The problem was that an EOA still had to kick off every transaction and pay gas in ETH, so in practice, you were juggling two accounts: one to sign and one to hold funds. ERC-4337 removes that dependency. The smart account itself becomes the primary account. A shared EntryPoint contract and off-chain bundlers replace the EOA's role, and paymasters let you sponsor gas or pay in tokens instead of ETH.

Here's how ERC-4337 works:

Step 1: The user constructs and signs a UserOperation off-chain. This includes the intended action (callData), a nonce, gas parameters, an optional paymaster address, and the user's signature over the entire message.

Step 2: The signed UserOperation is sent to a bundler (think of it as a specialized relayer).
The bundler simulates it locally to check that it won't fail, then batches it with other operations and submits the bundle on-chain to the EntryPoint via handleOps.

Step 3: The EntryPoint contract calls validateUserOp on the smart account, which verifies that the signature is valid and that the account can cover the gas cost. If a paymaster is involved, the EntryPoint also validates that the paymaster agrees to sponsor the fees.

Step 4: Once validation passes, the EntryPoint calls back into the smart account to execute the actual operation.

The following figure shows the EntryPoint flow diagram from ERC-4337:

Figure 1: EntryPoint flow diagram from ERC-4337

If you're not already familiar with ERC-4337 or want to dig into the details we're glossing over here, it's worth reading through the full EIP. The rest of this post assumes you're comfortable with the basics. Now that we've covered how ERC-4337 works, let's explore the common vulnerability patterns we encounter in our audits.

1. Incorrect access control

If anyone can call your account's execute function (or anything that moves funds) directly, they can do anything with your wallet. Only the EntryPoint contract, or a vetted executor module in ERC-7579, should be allowed to trigger privileged paths.
A vulnerable implementation allows anyone to drain the wallet:

function execute(address target, uint256 value, bytes calldata data) external {
    (bool ok,) = target.call{value: value}(data);
    require(ok, "exec failed");
}

Figure 2: Vulnerable execute function

In a safe implementation, the execute function is callable only by the entryPoint:

address public immutable entryPoint;

function execute(address target, uint256 value, bytes calldata data) external {
    require(msg.sender == entryPoint, "not entryPoint");
    (bool ok,) = target.call{value: value}(data);
    require(ok, "exec failed");
}

Figure 3: Safe execute function

Here are some important considerations for access control: For each external or public function, ensure that the proper access controls are set. In addition to the EntryPoint access control, some functions need to restrict access to the account itself. This is because you may frequently want to call functions on your contract to perform administrative tasks like module installation/uninstallation, validator modifications, and upgrades.

2. Incomplete signature validation (specifically the gas fields)

A common and serious vulnerability arises when a smart account verifies only the intended action (for example, the callData) but omits the gas-related fields:

- preVerificationGas
- verificationGasLimit
- callGasLimit
- maxFeePerGas
- maxPriorityFeePerGas

All of these values are part of the payload and must be signed and checked by the validator. Since the EntryPoint contract computes and settles fees using these parameters, any field that is not cryptographically bound to the signature and not sanity-checked can be altered by a bundler or a frontrunner in transit. By inflating these values (for example, preVerificationGas, which directly reimburses calldata/overhead), an attacker can cause the account to overpay and drain ETH.
preVerificationGas is the portion meant to compensate the bundler for work outside validateUserOp, primarily calldata size costs and fixed inclusion overhead. We use preVerificationGas as the example because it's the easiest lever to extract ETH: if it isn't signed or strictly validated/capped, someone can simply bump that single number and get paid more, directly draining the account. Robust implementations must bind the full UserOperation, including all gas fields, into the signature, and enforce conservative caps and consistency checks during validation.

Here's an example of an unsafe validateUserOp function:

function validateUserOp(UserOperation calldata op, bytes32 /*hash*/, uint256 /*missingFunds*/)
    external
    returns (uint256 validationData)
{
    // Only checks that the calldata is "approved"
    require(_isApprovedCall(op.callData, op.signature), "bad sig");
    return 0;
}

Figure 4: Unsafe validateUserOp function

And here's an example of a safe validateUserOp function:

function validateUserOp(UserOperation calldata op, bytes32 userOpHash, uint256 /*missingFunds*/)
    external
    returns (uint256 validationData)
{
    require(_isApprovedCall(userOpHash, op.signature), "bad sig");
    return 0;
}

Figure 5: Safe validateUserOp function

Here are some additional considerations: Ideally, use the userOpHash sent by the EntryPoint contract, which includes the gas fields by spec. If you must allow flexibility, enforce strict caps and reasonability checks on each gas field.

3. State modification during validation

Writing state in validateUserOp and then using it during execution is dangerous since the EntryPoint contract validates all ops in a bundle before executing any of them. For example, if you cache the recovered signer in storage during validation and later use that value in execute, another op's validation can overwrite it before yours runs.
contract VulnerableAccount {
    address public immutable entryPoint;
    address public owner1;
    address public owner2;
    address public pendingSigner;

    modifier onlyEntryPoint() {
        require(msg.sender == entryPoint, "not EP");
        _;
    }

    function validateUserOp(UserOperation calldata op, bytes32 userOpHash, uint256) external returns (uint256) {
        address signer = recover(userOpHash, op.signature);
        require(signer == owner1 || signer == owner2, "unauthorized");
        // DANGEROUS: persists signer; can be clobbered by another validation
        pendingSigner = signer;
        return 0;
    }

    // Later: appends signer into the call; may use the WRONG (overwritten) signer
    function executeWithSigner(address target, uint256 value, bytes calldata data) external onlyEntryPoint {
        bytes memory payload = abi.encodePacked(data, pendingSigner);
        (bool ok,) = target.call{value: value}(payload);
        require(ok, "exec failed");
    }
}

Figure 6: Vulnerable account that changes the state of the account in the validateUserOp function

In figure 6, one owner can validate an operation, but by the time that operation executes, pendingSigner may hold the other owner's address because a later validation in the same bundle overwrote it. Depending on how the execute function is supposed to work, this can be an attack vector.

Here are some important considerations for state modification: Avoid modifying the state of the account during the validation phase. Remember batch semantics: all validations run before any execution, so any "approval" written in validation can be overwritten by a later op's validation. If you must persist temporary data, use a mapping keyed by userOpHash and delete it deterministically after use, but prefer not persisting anything at all.

4. ERC-1271 replay signature attack

ERC-1271 is a standard interface for contracts to validate signatures so that other contracts can ask a smart account, via isValidSignature(bytes32 hash, bytes signature), whether a particular hash has been approved.
A recurring pitfall, highlighted by security researcher curiousapple (read the post-mortem here), is to verify that the owner signed a hash without binding the signature to the specific smart account and chain. If the same owner controls multiple smart accounts, or if the same account exists across chains, a signature created for account A can be replayed against account B or on a different chain. The remedy is to use EIP-712 typed data so the signature is domain-separated by both the smart account address (as verifyingContract) and the chainId. At a minimum, the signed payload must include the account and chain so that a signature cannot be transplanted across accounts or networks. A robust pattern is to wrap whatever needs authorizing inside an EIP-712 struct and recover against the domain; this automatically binds the signature to the correct account and chain.

function isValidSignature(bytes32 hash, bytes calldata sig) external view returns (bytes4) {
    // Replay issue: recovers over a raw hash,
    // not bound to this contract or chainId.
    return ECDSA.recover(hash, sig) == owner ? MAGIC : 0xffffffff;
}

Figure 7: Example of a vulnerable implementation of EIP-1271

function isValidSignature(bytes32 hash, bytes calldata sig) external view returns (bytes4) {
    bytes32 structHash = keccak256(abi.encode(TYPEHASH, hash));
    bytes32 digest = _hashTypedDataV4(structHash);
    return ECDSA.recover(digest, sig) == owner ? MAGIC : 0xffffffff;
}

Figure 8: Safe implementation of EIP-1271

Here are some considerations for ERC-1271 signature validations: Always verify EIP-712 typed data so the domain binds signatures to chainId and the smart account address. Enforce an exact ERC-1271 magic value return (0x1626ba7e) on success; anything else is failure. Test negative cases explicitly: the same signature on a different account, the same signature on a different chain, and the same signature after nonce/owner changes.

5.
Reverts don't save you in ERC-4337

In ERC-4337, once validateUserOp succeeds, the bundler gets paid regardless of whether execution later reverts. This is the same model as normal Ethereum transactions, where miners collect fees even on failed txs, so planning to "revert later" is not a safety net. The success of validateUserOp commits you to paying for gas. This has a subtle consequence: if your validation is too permissive and accepts operations that will inevitably fail during execution, a malicious bundler can submit those operations repeatedly, each time collecting gas fees from your account without anything useful happening.

A related issue we've seen in audits involves paymasters that pay the EntryPoint from a shared pool during validateUserOp, then try to charge the individual user back in postOp. The problem is that postOp can revert (bad state, arithmetic errors, risky external calls), and a revert in postOp does not undo the payment that already happened during validation. An attacker can exploit this by repeatedly passing validation while forcing postOp failures (by withdrawing their ETH from the pool during the execution of the userOp, for example), draining the shared pool. The robust approach is to never rely on postOp for core invariants. Debit fees from a per-user escrow or deposit during validation, so the money is secured before execution even begins. Treat postOp as best-effort bookkeeping: keep it minimal, bounded, and designed to never revert.

Here are some important considerations for ERC-4337: Make postOp minimal and non-reverting: avoid external calls and complex logic, and instead treat it as best-effort bookkeeping. Test both success and revert paths. Remember that once validateUserOp returns successfully, the account will pay for the gas.

6.
Old ERC-4337 accounts vs ERC-7702

ERC-7702 allows an EOA to temporarily act as a smart account by activating code for the duration of a single transaction, which effectively runs your wallet implementation in the EOA's context. This is powerful, but it opens an initialization race. If your logic expects an initialize(owner) call, an attacker who spots the 7702 delegation can frontrun with their own initialization transaction and set themselves as the owner. The straightforward mitigation is to permit initialization only when the account is executing as itself in that 7702-powered call. In practice, require msg.sender == address(this) during initialization.

function initialize(address newOwner) external {
    // Only callable when the account executes as itself (e.g., under 7702)
    require(msg.sender == address(this), "init: only self");
    require(owner == address(0), "already inited");
    owner = newOwner;
}

Figure 9: Example of a safe initialize function for an ERC-7702 smart account

This works because, during the 7702 transaction, calls executed by the EOA-as-contract have msg.sender == address(this), while a random external transaction cannot satisfy that condition.

Here are some important considerations for ERC-7702: Require msg.sender == address(this) and owner == address(0) in initialize; make it single-use and impossible for external callers. Create separate smart accounts for ERC-7702-enabled EOAs and non-7702 accounts to isolate initialization and management flows.

Quick security checks before you ship

Use this condensed list as a pre-merge gate for every smart account change. These checks block some common AA failures we see in audits and production incidents. Run them across all account variants, paymaster paths, and gas configurations before you ship.

- Use the EntryPoint's userOpHash for validation.
- Restrict execute/privileged functions to the EntryPoint (and self where needed).
- Keep validateUserOp stateless: don't write to storage.
- Force EIP-712 for ERC-1271 and other signed messages.
- Make postOp minimal, bounded, and non-reverting.
- For ERC-7702, allow init only when msg.sender == address(this), and only once.
- Add multiple end-to-end tests covering both success and revert paths.

If you need help securely implementing smart accounts, contact us for an audit.
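The batch semantics behind mistake #3 are easy to misjudge, so here is a tiny Python model (purely illustrative, not real EntryPoint or bundler code; the names are invented) of how "all validations run before any execution" lets one op's validation clobber state another op relies on:

```python
class Account:
    """Toy smart account that persists the recovered signer, as in figure 6."""

    def __init__(self, owners):
        self.owners = owners
        self.pending_signer = None  # DANGEROUS: one shared slot for all ops

    def validate(self, signer):
        # Stand-in for validateUserOp: checks authorization, then writes state.
        assert signer in self.owners, "unauthorized"
        self.pending_signer = signer

    def execute(self):
        # Stand-in for executeWithSigner: uses whatever validation left behind.
        return self.pending_signer


account = Account({"alice", "bob"})
ops = ["alice", "bob"]  # two UserOperations in one bundle

# EntryPoint-style handleOps: every op is validated first...
for signer in ops:
    account.validate(signer)

# ...and only then is anything executed. Alice's op now runs as Bob.
results = [account.execute() for _ in ops]
```

Because Bob's validation ran after Alice's, both executions see `pending_signer == "bob"`; keying temporary data by userOpHash (or persisting nothing) avoids this.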
- mquire: Linux memory forensics without external dependencies on February 25, 2026 at 12:00 pm
If you've ever done Linux memory forensics, you know the frustration: without debug symbols that match the exact kernel version, you're stuck. These symbols aren't typically installed on production systems and must be sourced from external repositories, which quickly become outdated when systems receive updates. And if no one has published symbols for the specific kernel build in your memory dump, analysis stalls before it starts. Today, we're open-sourcing mquire, a tool that eliminates this dependency entirely. mquire analyzes Linux memory dumps without requiring any external debug information. It works by extracting everything it needs directly from the memory dump itself. This means you can analyze unknown kernels, custom builds, or any Linux distribution, without preparation and without hunting for symbol files. For forensic analysts and incident responders, this is a significant shift: mquire delivers reliable memory analysis even when traditional tools can't.

The problem with traditional memory forensics

Memory forensics tools like Volatility are essential for security researchers and incident responders. However, these tools require debug symbols (or "profiles") specific to the exact kernel version in the memory dump. Without matching symbols, analysis options are limited or impossible. In practice, this creates real obstacles. You need to either source symbols from third-party repositories that may not have your specific kernel version, generate symbols yourself (which requires access to the original system, often unavailable during incident response), or hope that someone has already created a profile for that distribution and kernel combination. mquire takes a different approach: it extracts both type information and symbol addresses directly from the memory dump, making analysis possible without any external dependencies.
How mquire works

mquire combines two sources of information that modern Linux kernels embed within themselves:

- Type information from BTF: BPF Type Format is a compact format for type and debug information originally designed for eBPF's "compile once, run everywhere" architecture. BTF provides structural information about the kernel, including type definitions for kernel structures, field offsets and sizes, and type relationships. We've repurposed this for memory forensics.
- Symbol addresses from Kallsyms: This is the same data that populates /proc/kallsyms on a running system—the memory locations of kernel symbols. By scanning the memory dump for Kallsyms data, mquire can locate the exact addresses of kernel structures without external symbol files.

By combining type information with symbol locations, mquire can find and parse complex kernel data structures like process lists, memory mappings, open file handles, and cached file data.

Kernel requirements

- BTF support: Kernel 4.18 or newer with BTF enabled (most modern distributions enable it by default)
- Kallsyms support: Kernel 6.4 or newer (due to format changes in scripts/kallsyms.c)

These features have been consistently enabled on major distributions since they're requirements for modern BPF tooling.

Built for exploration

After initialization, mquire provides an interactive SQL interface, an approach directly inspired by osquery. This is something I've wanted to build ever since my first Querycon, where I discussed forensics capabilities with other osquery maintainers. The idea of bringing osquery's intuitive, SQL-based exploration model to memory forensics has been on my mind for years, and mquire is the realization of that vision.
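To make the Kallsyms side concrete: the /proc/kallsyms text that mquire's scanner reconstructs has a simple shape, one "<hex address> <type> <name> [module]" record per line. Here's a minimal parsing sketch in Python (the sample addresses below are invented for illustration):

```python
def parse_kallsyms(text):
    """Map symbol names to addresses, ignoring the optional [module] column."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        addr, _sym_type, name = parts[0], parts[1], parts[2]
        symbols[name] = int(addr, 16)
    return symbols


sample = """\
ffffffff81000000 T _text
ffffffff82a1b2c0 D init_task
ffffffffc0a00000 t helper_fn [some_module]"""

syms = parse_kallsyms(sample)
# With init_task's address plus BTF's layout of struct task_struct, a tool
# can start walking the kernel's process list inside a raw dump.
```

The hard part mquire solves is locating and decoding this data inside a raw memory image rather than reading it from a friendly procfs file, but the end product is the same name-to-address map.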
You can run one-off queries from the command line or explore interactively:

$ mquire query --format json snapshot.lime 'SELECT comm, command_line FROM tasks WHERE command_line NOT NULL and comm LIKE "%systemd%" LIMIT 2;'
{
  "column_order": ["comm", "command_line"],
  "row_list": [
    {"comm": {"String": "systemd"}, "command_line": {"String": "/sbin/init splash"}},
    {"comm": {"String": "systemd-oomd"}, "command_line": {"String": "/usr/lib/systemd/systemd-oomd"}}
  ]
}

Figure 1: mquire listing tasks containing systemd

The SQL interface enables relational queries across different data sources. For example, you can join process information with open file handles in a single query:

$ mquire query --format json snapshot.lime 'SELECT tasks.pid, task_open_files.path FROM task_open_files JOIN tasks ON tasks.tgid = task_open_files.tgid WHERE task_open_files.path LIKE "%.sqlite" LIMIT 2;'
{
  "column_order": ["pid", "path"],
  "row_list": [
    {"path": {"String": "/home/alessandro/snap/firefox/common/.mozilla/firefox/4f1wza57.default/cookies.sqlite"}, "pid": {"SignedInteger": 2481}},
    {"path": {"String": "/home/alessandro/snap/firefox/common/.mozilla/firefox/4f1wza57.default/cookies.sqlite"}, "pid": {"SignedInteger": 2846}}
  ]
}

Figure 2: Finding processes with open SQLite databases

This relational approach lets you reconstruct complete file paths from kernel dentry objects and connect them with their originating processes—context that would require multiple commands with traditional tools.
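Because the tables are exposed through SQL, you can prototype the same join logic anywhere SQLite runs. The sketch below uses mock, invented data (the schemas are simplified versions of mquire's tasks and task_open_files tables) to mirror the query in figure 2:

```python
import sqlite3

# In-memory database with simplified stand-ins for mquire's tables.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tasks (pid INTEGER, tgid INTEGER, comm TEXT);
CREATE TABLE task_open_files (tgid INTEGER, path TEXT);
INSERT INTO tasks VALUES (2481, 2481, 'firefox'), (1, 1, 'systemd');
INSERT INTO task_open_files VALUES
    (2481, '/home/user/cookies.sqlite'),
    (1, '/run/systemd/journal/socket');
""")

# Same join shape as figure 2: which processes hold SQLite files open?
rows = db.execute("""
    SELECT tasks.pid, task_open_files.path
    FROM task_open_files
    JOIN tasks ON tasks.tgid = task_open_files.tgid
    WHERE task_open_files.path LIKE '%.sqlite'
""").fetchall()
```

The join on tgid is what connects per-process file handles back to the owning process, which is exactly the cross-table context that traditionally requires running and correlating multiple plugin commands.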
Current capabilities

mquire currently provides the following tables:

- os_version and system_info: Basic system identification
- tasks: Running processes with PIDs, command lines, and binary paths
- task_open_files: Open files organized by process
- memory_mappings: Memory regions mapped by each process
- boot_time: System boot timestamp
- dmesg: Kernel ring buffer messages
- kallsyms: Kernel symbol addresses
- kernel_modules: Loaded kernel modules
- network_connections: Active network connections
- network_interfaces: Network interface information
- syslog_file: System logs read directly from the kernel's file cache (works even if log files have been deleted, as long as they're still cached in memory)
- log_messages: Internal mquire log messages

mquire also includes a .dump command that extracts files from the kernel's file cache. This can recover files directly from memory, which is useful when files have been deleted from disk but remain in the cache. You can run it from the interactive shell or via the command line:

mquire command snapshot.lime '.dump /output/directory'

For developers building custom analysis tools, the mquire library crate provides a reusable API for kernel memory analysis.

Use cases

mquire is designed for:

- Incident response: Analyze memory dumps from compromised systems without needing to source matching debug symbols.
- Forensic analysis: Examine what was running and what files were accessed, even on unknown or custom kernels.
- Malware analysis: Study process behavior and file operations from memory snapshots.
- Security research: Explore kernel internals without specialized setup.

Limitations and future work

mquire can only access kernel-level information; BTF doesn't provide information about user space data structures. Additionally, the Kallsyms scanner depends on the data format from the kernel's scripts/kallsyms.c; if future kernel versions change this format, the scanner heuristics may need updates.
We're considering several enhancements, including expanded table support to provide deeper system insight, improved caching for better performance, and DMA-based external memory acquisition for real-time analysis of physical systems.

Get started

mquire is available on GitHub with prebuilt binaries for Linux. To acquire a memory dump, you can use LiME:

insmod ./lime-x.x.x-xx-generic.ko 'path=/path/to/dump.raw format=padded'

Then you can run mquire:

# Interactive session
$ mquire shell /path/to/dump.raw

# Single query
$ mquire query /path/to/dump.raw 'SELECT * FROM os_version;'

# Discover available tables
$ mquire query /path/to/dump.raw '.schema'

We welcome contributions and feedback. Try mquire and let us know what you think.