This blog post is meant as a security-focused introduction to Solana. We explore how exactly Solana's runtime operates, what degree of control an attacker has, and where the relevant security boundaries lie. That being said, we believe this is an important resource for all developers. With vulnerabilities putting millions in assets at risk, understanding what happens under the hood, even in passing, is crucial to writing safer code.
As an aside, we would recommend also checking out Neodyme's Security Workshop. It presents a good introduction to many different Solana vulnerability classes and is good for diving right in. In contrast, this blog post aims to start from fundamentals, presenting an overview of the Solana execution model from a security researcher's perspective. With a strong understanding of how Solana contracts are executed, we believe that an astute researcher should be able to independently find such vulnerability classes, and more.
Execution Model
Understanding is crucial to finding vulnerabilities. When approaching a new contract, you should strive for a deep understanding of the contract, its various interactions, and any implicit assumptions.
Hence, the first step is to understand how Solana programs even work. The Solana documentation on the programming model is another good resource.
Solana programs are eBPF ELF files that are loaded by an onchain program, the BPF Loader. The ELF data is stored as part of an account onchain. An account can be thought of as a file, where the name of the account is the pubkey. Accounts in Solana are referenced by their pubkey, and the relationship between pubkeys and accounts is a one-to-one mapping. Like files, accounts can have both data (arbitrary raw bytes) and additional metadata, such as writable or executable, that we will cover later. For more information on accounts, see the documentation.
At a high level, interactions with onchain programs happen in the form of a program invocation. A Solana invocation specifies the following information:
- the program to call
- a list of accounts
- a list of bytes, instruction data
Note how this execution model has no concept of methods. While it is naturally useful to expose different methods for different functions, this is not implemented at the execution level. Instead, the program itself parses and interprets the instruction data, for example, by decoding it into an enum.
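To make the enum-based dispatch concrete, here is a minimal sketch. The `VaultInstruction` enum, its variants, and the wire format (a one-byte tag followed by a little-endian `u64`) are all hypothetical choices of ours for illustration, not something Solana prescribes.

```rust
/// Hypothetical instruction set for an illustrative program.
#[derive(Debug, PartialEq)]
pub enum VaultInstruction {
    Deposit { amount: u64 },
    Withdraw { amount: u64 },
}

/// Parses raw instruction data laid out as: 1-byte tag, then an 8-byte
/// little-endian amount. Returns None for an unknown tag or a wrong length
/// instead of panicking, since this data is fully attacker controlled.
pub fn parse_instruction(data: &[u8]) -> Option<VaultInstruction> {
    let (&tag, rest) = data.split_first()?;
    if rest.len() != 8 {
        return None;
    }
    let mut raw = [0u8; 8];
    raw.copy_from_slice(rest);
    let amount = u64::from_le_bytes(raw);
    match tag {
        0 => Some(VaultInstruction::Deposit { amount }),
        1 => Some(VaultInstruction::Withdraw { amount }),
        _ => None,
    }
}
```

The "method" being called is simply whatever the first byte decodes to, so every branch has to validate its own inputs.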
The memory map is predefined and is as follows:
- Program code starts at 0x100000000
- Stack data starts at 0x200000000
- Heap data starts at 0x300000000
- Program input parameters start at 0x400000000
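As a rough illustration of this layout, the sketch below maps a virtual address to the region whose start address precedes it. The constant and type names are our own, and region sizes are deliberately not modelled, so treat it as a reading aid rather than a description of the VM's actual bounds checks.

```rust
/// Start addresses of the four regions listed above.
pub const PROGRAM_START: u64 = 0x1_0000_0000;
pub const STACK_START: u64 = 0x2_0000_0000;
pub const HEAP_START: u64 = 0x3_0000_0000;
pub const INPUT_START: u64 = 0x4_0000_0000;

#[derive(Debug, PartialEq)]
pub enum Region {
    Program,
    Stack,
    Heap,
    Input,
}

/// Classifies an address by start address only. Region lengths are not
/// modelled (an assumption of this sketch), so any address at or above
/// INPUT_START is reported as Input.
pub fn region_of(addr: u64) -> Option<Region> {
    match addr {
        a if a < PROGRAM_START => None,
        a if a < STACK_START => Some(Region::Program),
        a if a < HEAP_START => Some(Region::Stack),
        a if a < INPUT_START => Some(Region::Heap),
        _ => Some(Region::Input),
    }
}
```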
Because most Solana programs are written in Rust, memory corruption is not a very common issue. Hence, we won't be diving further into memory layout or other lower level details.
BPF programs define a common entrypoint.
#[no_mangle]
pub unsafe extern "C" fn entrypoint(input: *mut u8) -> u64;
This method will be called with a binary blob representing the serialized instruction and account data. Solana also provides utility functions to deserialize this data.
pub unsafe fn deserialize<'a>(input: *mut u8) -> (&'a Pubkey, Vec<AccountInfo<'a>>, &'a [u8]) {
    let mut offset: usize = 0;
    // Number of accounts present
    #[allow(clippy::cast_ptr_alignment)]
    let num_accounts = *(input.add(offset) as *const u64) as usize;
    offset += size_of::<u64>();
Note that this binary blob is not attacker controlled. Instead, this data is serialized by another program called the BPF Loader which is part of the Solana runtime.
/// Deserialize the input arguments
///
/// The integer arithmetic in this method is safe when called on a buffer that was
/// serialized by runtime. Use with buffers serialized otherwise is unsupported and
/// done at one's own risk.
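To see why the unchecked arithmetic matters, compare it with a bounds-checked version of the same first step. This is a sketch of ours, not part of the Solana SDK: `read_num_accounts` is a hypothetical helper, and we assume the count is stored little-endian, which matches the native byte order of the BPF target.

```rust
/// Reads the leading u64 account count from a serialized input buffer.
/// Returns the count and the offset of the next unread byte, or None if
/// the buffer is too short to contain the count at all.
pub fn read_num_accounts(input: &[u8]) -> Option<(usize, usize)> {
    const WIDTH: usize = std::mem::size_of::<u64>();
    // `get` performs the bounds check that the raw pointer read skips.
    let head = input.get(..WIDTH)?;
    let mut raw = [0u8; WIDTH];
    raw.copy_from_slice(head);
    Some((u64::from_le_bytes(raw) as usize, WIDTH))
}
```

The runtime's deserializer can skip this check only because it trusts the serializer on the other side.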
That being said, an attacker has a very large degree of control over this data. Let's take a closer look at what exactly this data is.
Recall the return signature of the deserialize function.
(&'a Pubkey, Vec<AccountInfo<'a>>, &'a [u8])
The first part of the tuple represents the id of the running program. This is often used to check for proper ownership of accounts. For example, it's generally only safe to operate on accounts that you own to ensure data integrity.
The second part of the tuple is a list of account information.
pub struct AccountInfo<'a> {
    pub key: &'a Pubkey,
    pub is_signer: bool,
    pub is_writable: bool,
    pub lamports: Rc<RefCell<&'a mut u64>>,
    pub data: Rc<RefCell<&'a mut [u8]>>,
    pub owner: &'a Pubkey,
    pub executable: bool,
    pub rent_epoch: Epoch,
}
This is the metadata associated with any given account. In other words, this represents all of the information that a Solana onchain program can know about an account. Some of the more important fields include:
- key: The pubkey corresponding to this account
- is_signer: Whether there is a signature for this account, often used to allow privileged operations. See the docs.
- is_writable: Whether the account can be modified, both with respect to lamports and account data
- lamports: A lamport is 0.000000001 SOL. Accounts have an associated amount of lamports.
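Since balances are always stored as integer lamports, converting to SOL is just a division by 10^9. A small sketch using integer arithmetic to avoid floating-point rounding (the constant is defined locally here, and `format_sol` is a name of our choosing):

```rust
/// Number of lamports in one SOL (1 lamport = 0.000000001 SOL).
pub const LAMPORTS_PER_SOL: u64 = 1_000_000_000;

/// Formats a lamport balance as a decimal SOL string with nine
/// fractional digits, without going through floating point.
pub fn format_sol(lamports: u64) -> String {
    format!(
        "{}.{:09}",
        lamports / LAMPORTS_PER_SOL,
        lamports % LAMPORTS_PER_SOL
    )
}
```

For example, `format_sol(2_039_280)` yields `"0.002039280"`.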
While the choice of which accounts to pass in is entirely attacker controlled, the metadata associated with each account is not. For example, you can only specify an account as is_signer if you can generate a valid signature for it. Account data is also only modifiable by the account owner.
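In practice this means a program can trust the metadata but must still check it. The sketch below shows the kind of checks a program might run before mutating a state account on behalf of an authority. `AccountView`, `CheckError`, and `check_accounts` are simplified stand-ins of ours, not SDK types.

```rust
/// Simplified stand-in for a 32-byte public key.
pub type Pubkey = [u8; 32];

/// Hypothetical, trimmed-down view of the metadata a program receives.
pub struct AccountView {
    pub key: Pubkey,
    pub owner: Pubkey,
    pub is_signer: bool,
    pub is_writable: bool,
}

#[derive(Debug, PartialEq)]
pub enum CheckError {
    MissingSignature,
    WrongOwner,
    NotWritable,
}

/// Checks to run before modifying `state` on behalf of `authority`.
pub fn check_accounts(
    program_id: &Pubkey,
    authority: &AccountView,
    state: &AccountView,
) -> Result<(), CheckError> {
    // The runtime only sets is_signer when a valid signature exists.
    if !authority.is_signer {
        return Err(CheckError::MissingSignature);
    }
    // Only data in accounts owned by this program was written by it.
    if &state.owner != program_id {
        return Err(CheckError::WrongOwner);
    }
    if !state.is_writable {
        return Err(CheckError::NotWritable);
    }
    Ok(())
}
```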
Instruction data is a list of bytes and is entirely attacker controlled.
To summarize, we are able to call the entrypoint of any program with:
- Any accounts we choose
- Any instruction data
You might already see how type confusion is (or was before Anchor) a huge issue on Solana. Because there is no execution-level typing of accounts, it's very easy for a malicious user to pass in an account of the wrong type. Ways to enforce type information include hardcoding the pubkey of the account, storing a type tag in the account data, or both.
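The type tag approach can be sketched as follows, assuming the first byte of account data is reserved as a discriminator. The `AccountType` variants and helper names are hypothetical, chosen only for this example.

```rust
/// Hypothetical one-byte type tag stored at the start of account data.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(u8)]
pub enum AccountType {
    Uninitialized = 0,
    Config = 1,
    UserBalance = 2,
}

/// Decodes the tag, returning None for empty data or an unknown value.
pub fn account_type(data: &[u8]) -> Option<AccountType> {
    match *data.first()? {
        0 => Some(AccountType::Uninitialized),
        1 => Some(AccountType::Config),
        2 => Some(AccountType::UserBalance),
        _ => None,
    }
}

/// Returns the payload after the tag only if the tag matches `expected`,
/// so a UserBalance account cannot be passed where a Config is required.
pub fn expect_type(data: &[u8], expected: AccountType) -> Result<&[u8], &'static str> {
    match account_type(data) {
        Some(found) if found == expected => Ok(&data[1..]),
        Some(_) => Err("account has a different type tag"),
        None => Err("missing or unknown type tag"),
    }
}
```

A tag check like this only helps when combined with an owner check, since an attacker can write any tag into an account owned by a program they control.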
Onchain Programs
Solana also provides a number of native programs. An important thing to remember is that these native programs operate at a less privileged level than the execution model. In other words, any restrictions imposed by the execution model, such as not being able to write to readonly accounts, apply to the native programs as well.
A full list of programs can be found here.
The primary native program you'll likely interact with is the System Program.
11111111111111111111111111111111
These instructions are processed in system_instruction_processor.rs
match instruction {
    SystemInstruction::CreateAccount {
        lamports,
        space,
        owner,
    } => {
A full list of the available instructions can be found by reading the source.
To demonstrate some of the previous concepts, we will go over two instructions, CreateAccount and Transfer.
CreateAccount requires the lamports, space, and owner to initialize the account with.
pub fn create_account(
    from_pubkey: &Pubkey,
    to_pubkey: &Pubkey,
    lamports: u64,
    space: u64,
    owner: &Pubkey,
) -> Instruction {
    let account_metas = vec![
        AccountMeta::new(*from_pubkey, true),
        AccountMeta::new(*to_pubkey, true),
    ];
    Instruction::new_with_bincode(
        system_program::id(),
        &SystemInstruction::CreateAccount {
            lamports,
            space,
            owner: *owner,
        },
        account_metas,
    )
}
Both the from and to pubkeys are specified as signers (the boolean passed into AccountMeta). This means that we need to have a valid signature for both.
AccountMeta::new(*from_pubkey, /*is_signer=*/true),
AccountMeta::new(*to_pubkey, true),
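This is because the lamports used to create the account will come from the from_pubkey account, thus lowering the balance of that account and requiring a signature. The new account must sign as well, which prevents anyone from creating an account at an address they do not control.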
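The signer flags can be modelled with a small stand-in for AccountMeta. In this sketch, `Meta`, `create_account_metas`, and `required_signers` are our own simplified names. The helper lists which pubkeys a transaction would need signatures from.

```rust
/// Simplified stand-in for a 32-byte public key.
pub type Pubkey = [u8; 32];

/// Local stand-in for an account reference inside an instruction.
#[derive(Debug, Clone, PartialEq)]
pub struct Meta {
    pub pubkey: Pubkey,
    pub is_signer: bool,
    pub is_writable: bool,
}

impl Meta {
    /// Writable account, using the same (pubkey, is_signer) argument order
    /// as the AccountMeta::new calls shown above.
    pub fn writable(pubkey: Pubkey, is_signer: bool) -> Self {
        Meta { pubkey, is_signer, is_writable: true }
    }
}

/// Account list for a CreateAccount-style instruction: both the funding
/// account and the new account are marked as signers.
pub fn create_account_metas(from: Pubkey, to: Pubkey) -> Vec<Meta> {
    vec![Meta::writable(from, true), Meta::writable(to, true)]
}

/// Pubkeys that must provide a signature for the instruction to be valid.
pub fn required_signers(metas: &[Meta]) -> Vec<Pubkey> {
    metas
        .iter()
        .filter(|meta| meta.is_signer)
        .map(|meta| meta.pubkey)
        .collect()
}
```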
Internally, CreateAccount will call into allocate_and_assign, which allocates space for the account and assigns ownership.
allocate(invoke_context, signers, to_account, to_address, space)?;
assign(invoke_context, signers, to_account, to_address, owner)
Note that you can also do these steps separately with the Assign and Allocate instructions.
Assign {
    owner: Pubkey,
},
...
Allocate {
    space: u64,
},
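The two steps can be modelled on a toy account type. This is a simplified sketch under our own assumptions: the `Account` struct, the error names, and the exact checks are chosen for illustration and do not reproduce the real System Program logic.

```rust
/// Simplified stand-in for a 32-byte public key.
pub type Pubkey = [u8; 32];

/// The System Program id (base58 `11111111111111111111111111111111`)
/// is the all-zero key.
pub const SYSTEM_PROGRAM: Pubkey = [0u8; 32];

/// Toy account model: just an owner and a data buffer.
pub struct Account {
    pub owner: Pubkey,
    pub data: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum SysError {
    MissingSignature,
    AlreadyInUse,
}

/// Gives the account `space` zeroed bytes. In this sketch we refuse
/// accounts that already hold data or belong to another program.
pub fn allocate(account: &mut Account, is_signer: bool, space: usize) -> Result<(), SysError> {
    if !is_signer {
        return Err(SysError::MissingSignature);
    }
    if !account.data.is_empty() || account.owner != SYSTEM_PROGRAM {
        return Err(SysError::AlreadyInUse);
    }
    account.data = vec![0u8; space];
    Ok(())
}

/// Hands ownership of the account to `owner`.
pub fn assign(account: &mut Account, is_signer: bool, owner: Pubkey) -> Result<(), SysError> {
    if !is_signer {
        return Err(SysError::MissingSignature);
    }
    account.owner = owner;
    Ok(())
}

/// Both steps in sequence, mirroring the two calls shown above.
pub fn allocate_and_assign(
    account: &mut Account,
    is_signer: bool,
    space: usize,
    owner: Pubkey,
) -> Result<(), SysError> {
    allocate(account, is_signer, space)?;
    assign(account, is_signer, owner)
}
```

Note the ordering: once ownership is assigned away, the toy `allocate` refuses to touch the account again.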
In contrast, consider the instruction for Transfer.
pub fn transfer(from_pubkey: &Pubkey, to_pubkey: &Pubkey, lamports: u64) -> Instruction {
    let account_metas = vec![
        AccountMeta::new(*from_pubkey, true),
        AccountMeta::new(*to_pubkey, false),
    ];
    Instruction::new_with_bincode(
        system_program::id(),
        &SystemInstruction::Transfer { lamports },
        account_metas,
    )
}
Note how only from_pubkey needs a signature. Intuitively this makes sense because we don't need permission to transfer funds into an account (at least in Solana's permission model).
The transfer implementation in system_instruction_processor.rs indeed checks for the signature.
if !instruction_context
    .is_signer(instruction_context.get_number_of_program_accounts() + from_account_index)?
{
    ic_msg!(
        invoke_context,
        "Transfer: `from` account {} must sign",
        instruction_context.get_instruction_account_key(
            invoke_context.transaction_context,
            from_account_index
        )?,
    );
    return Err(InstructionError::MissingRequiredSignature);
}
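Stripped of the runtime plumbing, the shape of this check is simple. The following is a toy model of ours (the `Wallet` type and error names are hypothetical) showing the signer check followed by a checked balance update:

```rust
/// Toy account model carrying only what a transfer needs.
pub struct Wallet {
    pub lamports: u64,
    pub is_signer: bool,
}

#[derive(Debug, PartialEq)]
pub enum TransferError {
    MissingRequiredSignature,
    InsufficientFunds,
    Overflow,
}

/// Moves `lamports` from `from` to `to`. Only `from` has to sign;
/// receiving funds needs no permission.
pub fn transfer(from: &mut Wallet, to: &mut Wallet, lamports: u64) -> Result<(), TransferError> {
    if !from.is_signer {
        return Err(TransferError::MissingRequiredSignature);
    }
    // Compute both new balances before writing either, so a failure
    // leaves the accounts untouched.
    let remaining = from
        .lamports
        .checked_sub(lamports)
        .ok_or(TransferError::InsufficientFunds)?;
    let credited = to
        .lamports
        .checked_add(lamports)
        .ok_or(TransferError::Overflow)?;
    from.lamports = remaining;
    to.lamports = credited;
    Ok(())
}
```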
Note that because the System Program runs at a higher level than the execution runtime, the System Program can only transfer funds out of accounts that it owns. In other words, transfer does not work on arbitrary Solana accounts, even if you can generate a valid signature for them. By default all user pubkeys are owned by the System Program.
$ solana account 6ZRCB7AAqGre6c72PRz3MHLC73VMYvJ8bi9KHf1HFpNk
Public Key: 6ZRCB7AAqGre6c72PRz3MHLC73VMYvJ8bi9KHf1HFpNk
Balance: 1152042.467511623 SOL
Owner: 11111111111111111111111111111111
Executable: false
Rent Epoch: 288
An example of an account that is not owned by the System Program is a token account, which stores how many tokens a user has.
$ solana account 4W4dtYi4rXTegfR6byhmP17VJo4MaUWWCt3zzLnZVKGZ
Public Key: 4W4dtYi4rXTegfR6byhmP17VJo4MaUWWCt3zzLnZVKGZ
Balance: 0.00203928 SOL
Owner: TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA
Executable: false
Rent Epoch: 288
Length: 165 (0xa5) bytes
0000: 06 83 10 86 1a 98 32 7d 05 50 57 4d 84 41 8a a6 ......2}.PWM.A..
0010: e1 0c 33 52 dd aa 7f d7 f5 81 52 cc ee b2 38 87 ..3R......R...8.
0020: 52 98 60 10 57 37 39 df 4b 58 ba 50 e3 9c f3 f3 R.`.W79.KX.P....
0030: 35 b8 9c c7 d1 cb 1d 32 b5 de 04 ef a0 68 c9 39 5......2.....h.9
0040: 40 0d 03 00 00 00 00 00 00 00 00 00 00 00 00 00 @...............
0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0060: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 ................
0070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0090: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00a0: 00 00 00 00 00
This is an example of a program derived address.
An astute reader might also notice that the original pubkey is embedded in the token account data starting at position 0x20. Intuitively, the token account must store its owner somewhere, and this is often done by storing the owner pubkey in the account data.
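Reading those fields back out of the raw bytes is straightforward. The sketch below assumes the leading part of the token account layout is a 32-byte mint at 0x00, the 32-byte owner at 0x20 (as noted above), and a little-endian u64 amount at 0x40. The mint and amount offsets are our understanding of the layout rather than something shown in this post, and the names are our own.

```rust
/// Assumed offsets of the leading token account fields.
pub const MINT_OFFSET: usize = 0x00;
pub const OWNER_OFFSET: usize = 0x20;
pub const AMOUNT_OFFSET: usize = 0x40;

#[derive(Debug, PartialEq)]
pub struct TokenAccountHead {
    pub mint: [u8; 32],
    pub owner: [u8; 32],
    pub amount: u64,
}

/// Parses the first three fields, returning None if the data is too short.
pub fn parse_head(data: &[u8]) -> Option<TokenAccountHead> {
    let mut mint = [0u8; 32];
    let mut owner = [0u8; 32];
    let mut amount = [0u8; 8];
    mint.copy_from_slice(data.get(MINT_OFFSET..OWNER_OFFSET)?);
    owner.copy_from_slice(data.get(OWNER_OFFSET..AMOUNT_OFFSET)?);
    amount.copy_from_slice(data.get(AMOUNT_OFFSET..AMOUNT_OFFSET + 8)?);
    Some(TokenAccountHead {
        mint,
        owner,
        amount: u64::from_le_bytes(amount),
    })
}
```

Under this assumed layout, the bytes `40 0d 03 00 00 00 00 00` at 0x40 in the dump above decode to an amount of 200000 raw token units.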
Closing Thoughts
There are many intricacies to Solana's programming model which we will perhaps explore in future blog posts. For example, how do user accounts created with solana-keygen work? What exactly is the account validation scheme?
While it may be tempting to treat Solana as a black box, the code itself is all open source. We believe a crucial part of security is building a deep understanding of the underlying system. This requires digging into the Solana runtime. We hope this and future posts present an interesting perspective on the Solana runtime.
Please reach out if you found this useful or have any additional thoughts to share.