"Trust but verify" is a common adage. "Hindsight is 20/20" is another one. The best bugs are those hiding in plain sight.
Compiler bugs are located deep in the supply chain, making their effects far more widespread than normal protocol bugs. Numerous contracts across different chains were compiled with vulnerable Vyper versions - it was a race against blackhats.
Here's how it all happened.
Timeline
As a note, I'll use the "we" pronoun loosely here. I think I personally made some insightful contributions towards the initial vulnerability discovery but countless others helped far more throughout the entire process.
13:10 UTC pETH/ETH was drained of $11M.
13:19 UTC Michal posted in ETHSecurity about a sudden drop in pETH price.
Igor first noticed something was off. Thanks to him, we dug deeper.
But how did the bot reenter into add_liquidity() from remove_liquidity()?
14:01 UTC A warroom was formed around this comment.
14:07 UTC We decompiled the JPEGd contract with our favorite decompiler and noted that the two functions used different reentrancy guard storage slots.
// Dispatch table entry for add_liquidity(uint256[2],uint256)
label_0057:
if (storage[0x00]) { revert(memory[0x00:0x00]); }
storage[0x00] = 0x01;
// Dispatch table entry for remove_liquidity(uint256,uint256[2])
label_1AF3:
if (storage[0x02]) { revert(memory[0x00:0x00]); }
storage[0x02] = 0x01;
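To make the consequence of those two slots concrete, here is a minimal Python sketch of the guards. It is a toy model, not the pool's code: the `Pool` class, the `callback` parameter (standing in for whatever external call hands control back to the caller), and `reentry_succeeds` are all hypothetical names introduced for illustration. Only the slot numbers 0x00 and 0x02 come from the decompiled output above.

```python
class ReentrancyError(Exception):
    """Raised when a guarded function is entered while its lock slot is set."""


class Pool:
    """Toy contract whose guards read and write numbered storage slots."""

    def __init__(self, add_slot, remove_slot):
        self.storage = {}              # slot number -> value, unset means 0
        self.add_slot = add_slot       # slot checked by add_liquidity
        self.remove_slot = remove_slot # slot checked by remove_liquidity
        self.entered = []              # call trace, for inspection

    def _enter(self, slot):
        # Mirrors: if (storage[slot]) { revert } ; storage[slot] = 0x01
        if self.storage.get(slot, 0):
            raise ReentrancyError(f"slot {slot:#04x} is locked")
        self.storage[slot] = 1

    def _exit(self, slot):
        self.storage[slot] = 0

    def add_liquidity(self):
        self._enter(self.add_slot)
        try:
            self.entered.append("add_liquidity")
        finally:
            self._exit(self.add_slot)

    def remove_liquidity(self, callback):
        self._enter(self.remove_slot)
        try:
            self.entered.append("remove_liquidity")
            callback()  # hypothetical stand-in for the external call
        finally:
            self._exit(self.remove_slot)


def reentry_succeeds(add_slot, remove_slot):
    """Return True if add_liquidity can be entered from inside remove_liquidity."""
    pool = Pool(add_slot, remove_slot)
    try:
        pool.remove_liquidity(pool.add_liquidity)
    except ReentrancyError:
        return False
    return True
```

With the slots from the decompilation, `reentry_succeeds(0x00, 0x02)` returns True, because the guard set by `remove_liquidity` is never the one `add_liquidity` checks. With a shared slot, `reentry_succeeds(0x00, 0x00)` returns False.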
14:27 UTC We confirmed this behavior with a simple local test contract.
@external
@nonreentrant("lock")
def test(addr: address) -> bool:
    return True

@external
@nonreentrant("lock")
def test2(addr: address) -> bool:
    return False
This was not just another reentrancy bug.
At this point, we realized just how impactful this would be. There was a blackout of information, and we deleted public messages on the nature of the vulnerability.
14:37 UTC Wavey helped identify the vulnerable commit and affected versions. This was also confirmed by me and Charles by manually inspecting the Vyper compiler output.
It was a race with the hackers.
Thankfully, people were still confusing this for read-only reentrancy. This message from the "Web3 Security Alerts" channel is one example:
Alchemix and Metronome DAO also been hacked due to this read-only reentrancy bug: https://twitter.com/hexagate_/status/1685677801813217280
Michael identified the alETH and msETH pools, which were also compiled with 0.2.15, as potentially vulnerable.
14:50 UTC msETH/ETH was drained.
15:34 UTC alETH/ETH was drained.
15:43 UTC We identified that CRV/ETH, compiled with Vyper version 0.3.0, was vulnerable. It was critical that we kept the nature of the affected contracts secret for as long as possible.
16:11 UTC We began working on a whitehat exploit.
Unfortunately, too many groups were doing independent research in parallel and rumors were spreading. At 16:44 UTC, we decided to release a public statement on affected versions.
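For anyone triaging their own deployments, a check against a list of affected versions can be sketched in a few lines of Python. The version set below is an assumption for illustration: 0.2.15 is named in the timeline, and 0.2.16 and 0.3.0 are taken to be the other affected releases. The function names are hypothetical, and the parser only handles plain `major.minor.patch` strings with an optional leading `v` or trailing `+commit...` suffix.

```python
# Assumed affected releases for this sketch (see the note above).
AFFECTED_VYPER_VERSIONS = {(0, 2, 15), (0, 2, 16), (0, 3, 0)}


def parse_version(text):
    """Turn '0.2.15', 'v0.2.15' or '0.2.15+commit.abc' into an int tuple."""
    core = text.strip().lstrip("vV").split("+", 1)[0]
    parts = core.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected major.minor.patch, got {text!r}")
    return tuple(int(part) for part in parts)


def is_affected(version_text):
    """Return True if the compiler version is in the assumed affected set."""
    return parse_version(version_text) in AFFECTED_VYPER_VERSIONS
```

A version match is only a first filter. A contract is exposed only if it also relies on a reentrancy lock shared across functions, so a hit here means "look closer", not "vulnerable".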
By 18:32 UTC, we had a proof of concept exploit to be used in a potential whitehat recovery. bpak from Chainlight was also working on an exploit in parallel, and shared it at 19:06 UTC.
Five minutes later at 19:11 UTC, somebody else stole the funds.
The attack structure was largely different from either of our proofs of concept, so it was unlikely to have been a leak from our group. Regardless, this was pretty demoralizing.
Nevertheless, there was more ground to cover.
21:26 UTC Addison proposed an ambitious plan to recover the remaining assets in the CRVETH pool.
if you send like 30k crv to the crv/eth pool
you can then update admin fee
and then the crv/eth rate is like .15 eth per crv
so you can basically drain whole pool for few hundred K crv
21:52 UTC bpak had produced a working proof of concept which could recover 3100 ETH.
Ten minutes later at 22:02 UTC, we were beaten again. By some freak coincidence, the CRV admin fee bot had claimed fees and the pool was drained.[1]
Blame
Blame is a strong word. It's not productive to point fingers. At the same time, I think it's useful to think about what could have gone better.
Races
In both cases, whitehat efforts were beaten by less than half an hour. Sometimes every second really does count.
There likely could have been better preparation and resources for executing on these attacks. At the same time, this seems like a double-edged sword. Is it really a good idea to aggregate information on how to execute a hack? Who should we trust?
On the other hand, I think the process was quite efficient. We went from initial suspicions to identifying vulnerable variants in 2 hours and 4 minutes.
Information Leakage
I was both an auditor and a whitehat.
There's a strong culture of publishing in auditing. We're paid for technical thought leadership and deep understanding of vulnerabilities. One way to demonstrate this is by publishing the "scoop" on hacks in the wild. Researchers cost a lot and the return on investment is publicity.
On the other hand, there's a compelling argument that early disclosure of the affected versions had a material impact on the whitehat recovery.
Half an hour more could have saved $18M.
Auditors don't pay for externalities created by their reporting. Instead, they get rewarded with likes, retweets, and publicity. Seems like a hard problem.
Next Steps
I disagree with takes like "we need formal verification to solve this". This bug could have been caught with a unit test. Formal verification is very useful for many bug classes, but I'm not convinced it's as useful for relatively simple, non-optimizing compilers.
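As a rough illustration of how small such a test could be, here is a Python sketch of the property it would check: every function decorated with the same lock key must end up with the same storage slot. This is a toy model, not the Vyper compiler's code. Both allocators are hypothetical, and the per-function behaviour of the buggy one is an assumption about how a mismatch like the one in the timeline could arise.

```python
def allocate_per_function(functions):
    """Buggy toy allocator: every guarded function gets a fresh slot."""
    slots, next_slot = {}, 0
    for name, key in functions:
        if key is None:          # function has no nonreentrant decorator
            continue
        slots[name] = next_slot
        next_slot += 1
    return slots


def allocate_per_key(functions):
    """Intended behaviour: functions sharing a key share one slot."""
    slots, slot_for_key = {}, {}
    for name, key in functions:
        if key is None:
            continue
        if key not in slot_for_key:
            slot_for_key[key] = len(slot_for_key)
        slots[name] = slot_for_key[key]
    return slots


def keys_share_slots(functions, allocate):
    """The unit-test property: one lock key maps to exactly one slot."""
    slots = allocate(functions)
    first_slot_for_key = {}
    for name, key in functions:
        if key is None:
            continue
        # Remember the first slot seen for this key; any later mismatch fails.
        if first_slot_for_key.setdefault(key, slots[name]) != slots[name]:
            return False
    return True


# The two functions from the local test contract, both using the key "lock".
FUNCTIONS = [("test", "lock"), ("test2", "lock")]
```

Here `keys_share_slots(FUNCTIONS, allocate_per_key)` holds, while `keys_share_slots(FUNCTIONS, allocate_per_function)` fails because `test` and `test2` land in slots 0 and 1.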
It's important to note that this bug had been patched since November 2021.
I think this Vyper 0day is less about the skill of the Vyper team or the language itself and more about processes.
The bug was fixed many versions of Vyper ago. The actual oversight was not realizing the potential impact on projects at the time it was fixed.
Unfortunately, public goods get easily forgotten. With immutable contracts, projects can have implicit dependencies on code written years ago. Protocol developers and security experts should stay up to date on security developments across the entire execution stack.
Footnotes
1. Thankfully, these funds were later returned.