Jun 10, 2025

Cosmos Security: An Otter's Guide

From infinite loops and map determinism to AnteHandler missteps and storage key collisions, we highlight real-world vulnerabilities and actionable advice for building safer Cosmos-based projects.

Introduction

The Cosmos SDK is an "L1 toolkit" for developers. It is an open-source framework for building application-specific L1 chains while giving developers flexibility and control over the entire runtime environment. Unfortunately, with that convenience, security can become an afterthought.

In this comprehensive blog post, we break down security issues that are often overlooked by developers, supported by real-world examples from live projects. Our goal is to provide a practical exploration of security vulnerabilities while also offering insights on how developers can identify and address these issues on their own.

It's Loopin' Time

There are notable differences between building app-specific L1s with the SDK and building contracts on established L1 chains. In particular, it is crucial to recognize that maintaining the stability of the blockchain itself is now the developer's responsibility.

Below, we demonstrate the differences between writing smart contracts in Solidity and developing an L1 with the Cosmos SDK.

Here is a simple example for reference:

function sumWithStride(
    uint64 start,
    uint64 stride,
    uint64[] memory arr
) public returns (uint64) {
    uint64 idx = start;
    uint64 sum = 0;
    uint64 end = uint64(arr.length);

    while (idx < end) {
        sum += arr[idx];
        idx += stride;
    }
    return sum;
}

And the equivalent message handler in the Cosmos SDK (Go):

type MsgSumWithStrideParams struct {
    Start uint64
    Stride uint64
    Arr []uint64
}

type MsgSumWithStrideResponse struct {
    Sum uint64
}

func (ms msgServer) SumWithStride(
    goCtx context.Context,
    msg *MsgSumWithStrideParams,
) (*MsgSumWithStrideResponse, error) {
    sum := uint64(0)
    end := uint64(len(msg.Arr))
    for idx := msg.Start; idx < end; idx += msg.Stride {
        sum += msg.Arr[idx]
    }
    return &MsgSumWithStrideResponse{Sum: sum}, nil
}

The provided Solidity / Cosmos snippets feature a public function that calculates the sum of an array's elements using a provided starting index and a stride. It is crucial to note that this function lacks robustness: a keen observer might have already identified that if the user supplies a stride value of 0, the code will result in an infinite loop.

While an infinite loop is not ideal in Solidity, it may still be tolerable: the underlying blockchain on which a smart contract operates monitors the gas and computation budget and will terminate the execution at a certain point. Interestingly, these "unhandled error" patterns are quite common in contracts.

However, the same logic does not directly apply to Cosmos. In Cosmos, developers are responsible for implementing the entire L1, and there is no underlying computation budget tracker that automatically stops code execution. As a result, a logic DoS or infinite loop can directly cause the custom Cosmos L1 chain to halt or stall.

This toy scenario captures the importance of attention to error handling, edge cases, and overall robustness in Cosmos.
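A minimal hardening sketch of the Cosmos handler above, under the same toy setup (ErrInvalidStride and the flat per-iteration gas cost are illustrative, not part of the original): reject a zero stride up front and explicitly charge gas per iteration so that long loops stay bounded by the transaction's gas limit.

func (ms msgServer) SumWithStride(
    goCtx context.Context,
    msg *MsgSumWithStrideParams,
) (*MsgSumWithStrideResponse, error) {
    ctx := sdk.UnwrapSDKContext(goCtx)

    // Reject the degenerate input that would otherwise loop forever.
    if msg.Stride == 0 {
        return nil, errorsmod.Wrap(ErrInvalidStride, "stride must be non-zero")
    }

    sum := uint64(0)
    end := uint64(len(msg.Arr))
    for idx := msg.Start; idx < end; idx += msg.Stride {
        // Charge a flat per-iteration cost so the loop is bounded by the
        // transaction's gas limit rather than running unmetered.
        ctx.GasMeter().ConsumeGas(10, "sum_with_stride_iteration")
        sum += msg.Arr[idx]
    }
    return &MsgSumWithStrideResponse{Sum: sum}, nil
}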

Real-World Examples

Now, let's examine a few real-world instances.

In the case of this CosmWasm bug, the helper method write_to_contract negligently calls the untrusted Wasm function "allocate".

fn write_to_contract<A: BackendApi, S: Storage, Q: Querier>(
    env: &Environment<A, S, Q>,
    input: &[u8],
) -> VmResult<u32> {
    let out_size = to_u32(input.len())?;
    let result = env.call_function1("allocate", &[out_size.into()])?;
    let target_ptr = ref_to_u32(&result)?;
    if target_ptr == 0 {
        return Err(CommunicationError::zero_address().into());
    }
    write_region(&env.memory(), target_ptr, input)?;
    Ok(target_ptr)
}

As users have complete control over allocate, it is possible to re-enter write_to_contract repeatedly through other imported functions. This can deplete the host stack and ultimately lead to a DoS.

Additional real-world examples include not returning proper values for malformed txs.

Order Was the Dream of Man

Unlike Solidity, which is a domain-specific language for smart contracts, Go is a general-purpose language. Therefore, developers must be mindful of specific footguns. One notable instance is non-determinism.

Consider a scenario where there is a requirement to emit an event for every entry in a map. It might be tempting to implement this as demonstrated below:

type ObjectMap map[string]string

func EmitEntries(ctx sdk.Context, objectMap ObjectMap) {
    for key, value := range objectMap {
        ctx.EventManager().EmitEvent(
            sdk.NewEvent(
                "MapContext",
                sdk.NewAttribute(key, value),
            ),
        )
    }
}

It's important to note that Go map iteration order is unspecified by design, as stated in the Go documentation quoted below. Running the same code on different validators may therefore produce different event orders, potentially causing consensus failures.

When iterating over a map with a range loop, the iteration order is not specified and is not guaranteed to be the same from one iteration to the next.

To iterate deterministically, developers must explicitly sort the keys of the map and then fetch the values using the sorted key slice before emitting them.

type ObjectMap map[string]string

func EmitEntries(ctx sdk.Context, objectMap ObjectMap) {
    var keys []string
    for key := range objectMap {
        keys = append(keys, key)
    }
    sort.Strings(keys)

    for _, key := range keys {
        ctx.EventManager().EmitEvent(
            sdk.NewEvent(
                "MapContext",
                sdk.NewAttribute(key, objectMap[key]),
            ),
        )
    }
}

Hidden code within external Go dependencies makes it difficult to fully avoid these language quirks. It is crucial to remain vigilant and not underestimate this lingering bug class.

Real-World Examples

A real-world example of maps causing determinism problems can be found here: the result of buildCommitInfo is inconsistent due to iteration over the rs.stores map.

func (rs *Store) buildCommitInfo(
    version int64,
) *types.CommitInfo {
    storeInfos := []types.StoreInfo{}
    for key, store := range rs.stores {
        if store.GetStoreType() == types.StoreTypeTransient {
            continue
        }
        storeInfos = append(storeInfos, types.StoreInfo{
            Name:     key.Name(),
            CommitId: store.LastCommitID(),
        })
    }
    return &types.CommitInfo{
        Version:    version,
        StoreInfos: storeInfos,
    }
}

Other factors contributing to determinism issues include the use of time-sensitive functions (such as wall-clock time) and race conditions.
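As a small illustration of the time-sensitivity point (IsExpired is a hypothetical helper): wall-clock time differs on every node, while the block time carried in the SDK context is agreed upon through consensus and is therefore deterministic.

func IsExpired(ctx sdk.Context, deadline time.Time) bool {
    // Non-deterministic: every validator evaluates time.Now() at a slightly
    // different wall-clock instant, so nodes can disagree on the result.
    // return time.Now().After(deadline)

    // Deterministic: the block time is identical on every node.
    return ctx.BlockTime().After(deadline)
}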

Thou Shalt Not Pass...Or Should You?

When developing smart contracts, it is common to delegate certain low-level tasks (such as parsing msg.value, msg.sender, and collecting transaction fees) to the underlying blockchain.

On Cosmos, there is no underlying blockchain to rely on, since the application is the L1 itself! To simplify the development of such middleware-like functionality, the Cosmos SDK introduces AnteHandler decorators. While some pre-written decorators are provided, all other data extraction from transactions and blockchain state must be implemented by the developers themselves.

To provide context, let's first understand how an AnteHandler is processed. Each AnteHandler is a state transition function that can:

  1. Transform the block state in relation to transaction and execution context.
  2. Determine the course of action for the transaction.
    1. Pass the transaction to the next AnteHandler.
    2. Return an error for the transaction.

The bad news is that developing an AnteHandler is not the easiest task. For instance, let's consider a scenario where we need to ensure all signers involved in a transaction have a balance greater than X at the time of transaction execution.

The AnteHandle implementation may look something like this:

const (
    MIN_BALANCE = 100
)

func (abd AccountBalanceDecorator) AnteHandle(
    ctx sdk.Context,
    tx sdk.Tx,
    simulate bool,
    next sdk.AnteHandler,
) (sdk.Context, error) {
    sigTx, ok := tx.(authsigning.SigVerifiableTx)
    if !ok {
        return ctx, errorsmod.Wrap(
            sdkerrors.ErrTxDecode,
            "invalid tx type",
        )
    }

    signers := sigTx.GetSigners()
    for _, signer := range signers {
        balance := abd.bk.GetBalance(ctx, signer, ATOM)
        if balance.Amount.LT(math.NewInt(MIN_BALANCE)) {
            return ctx, errorsmod.Wrap(
                ErrInsufficientBalance,
                "Insufficient Balance",
            )
        }
    }

    return next(ctx, tx, simulate)
}

Where should this custom AnteHandler be placed relative to other AnteHandlers provided by cosmos-sdk? Considering that we are only concerned with transactions that satisfy our check, inserting it right after the SetUpContextDecorator should work, right?

anteDecorators := []sdk.AnteDecorator{
    NewSetUpContextDecorator(), // outermost AnteDecorator. SetUpContext must be called first
    // INSERT HERE
    NewExtensionOptionsDecorator(options.ExtensionOptionChecker),
    NewValidateBasicDecorator(),
    NewTxTimeoutHeightDecorator(),
    NewValidateMemoDecorator(options.AccountKeeper),
    NewConsumeGasForTxSizeDecorator(options.AccountKeeper),
    NewDeductFeeDecorator(options.AccountKeeper, options.BankKeeper, options.FeegrantKeeper, options.TxFeeChecker),
    NewSetPubKeyDecorator(options.AccountKeeper), // SetPubKeyDecorator must be called before all signature verification decorators
    NewValidateSigCountDecorator(options.AccountKeeper),
    NewSigGasConsumeDecorator(options.AccountKeeper, options.SigGasConsumer),
    NewSigVerificationDecorator(options.AccountKeeper, options.SignModeHandler),
    NewIncrementSequenceDecorator(options.AccountKeeper),
}

Unfortunately, that order wouldn't work. This is because other AnteHandlers later in the chain, most notably DeductFeeDecorator, modify account balances when collecting transaction fees. By placing our decorator at the very start of the chain, we might pass the check and later have the signers' balances deducted before reaching the end of the decorator chain and starting transaction execution. Consequently, the invariant we intended to enforce may no longer hold, rendering our check useless.

The easiest "mitigation" is to move our decorator further down the decorator chain. We say this lightly because it's important to consider various factors, such as whether nested msgs are allowed (e.g., when the authz module is present), as this precaution alone might not be enough to fully resolve the issue. Without a comprehensive understanding of the entire system, there is a risk that mistakes will still be made in the AnteHandler chain.
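For illustration, one possible ordering is sketched below, assuming a hypothetical NewAccountBalanceDecorator constructor for the decorator above. Placing it at the end of the chain means the balance check runs after fee deduction and the other default decorators, so it observes the state that message execution will actually see (the authz caveat above still applies).

anteDecorators := []sdk.AnteDecorator{
    NewSetUpContextDecorator(),
    NewExtensionOptionsDecorator(options.ExtensionOptionChecker),
    NewValidateBasicDecorator(),
    NewTxTimeoutHeightDecorator(),
    NewValidateMemoDecorator(options.AccountKeeper),
    NewConsumeGasForTxSizeDecorator(options.AccountKeeper),
    NewDeductFeeDecorator(options.AccountKeeper, options.BankKeeper, options.FeegrantKeeper, options.TxFeeChecker),
    NewSetPubKeyDecorator(options.AccountKeeper),
    NewValidateSigCountDecorator(options.AccountKeeper),
    NewSigGasConsumeDecorator(options.AccountKeeper, options.SigGasConsumer),
    NewSigVerificationDecorator(options.AccountKeeper, options.SignModeHandler),
    NewIncrementSequenceDecorator(options.AccountKeeper),
    // Custom balance check runs last, after fees have been deducted.
    NewAccountBalanceDecorator(options.BankKeeper),
}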

Real-World Examples

An instance of AnteHandler misuse is a theft-of-funds bug that was exploited on Cronos.

In this scenario, msgs are multiplexed to different AnteHandler sets through the user-controlled ExtensionOptionsEthereumTx option. However, due to a lack of tx validation, if a MsgEthereumTx does not have ExtensionOptionsEthereumTx specified, it will be routed to non-Ethereum AnteHandlers, failing to collect fees from users as intended. Consequently, attackers can exploit the fee refund at the end of transaction processing to steal funds.

func NewAnteHandler(
    ak evmtypes.AccountKeeper,
    bankKeeper evmtypes.BankKeeper,
    evmKeeper EVMKeeper,
    feeGrantKeeper authante.FeegrantKeeper,
    channelKeeper channelkeeper.Keeper,
    signModeHandler authsigning.SignModeHandler,
) sdk.AnteHandler {
    return func(
        ctx sdk.Context, tx sdk.Tx, sim bool,
    ) (newCtx sdk.Context, err error) {
        var anteHandler sdk.AnteHandler

        defer Recover(ctx.Logger(), &err)

        txWithExtensions, ok := tx.(authante.HasExtensionOptionsTx)
        if ok {
            opts := txWithExtensions.GetExtensionOptions()
            if len(opts) > 0 {
                switch typeURL := opts[0].GetTypeUrl(); typeURL {
                case "/ethermint.evm.v1.ExtensionOptionsEthereumTx":
                    // handle as *evmtypes.MsgEthereumTx

                    anteHandler = sdk.ChainAnteDecorators(
                        NewEthSetUpContextDecorator(), // outermost AnteDecorator. SetUpContext must be called first
                        ...
                        NewEthIncrementSenderSequenceDecorator(ak), // innermost AnteDecorator.
                    )

                default:
                    return ctx, stacktrace.Propagate(
                        sdkerrors.Wrap(sdkerrors.ErrUnknownExtensionOptions, typeURL),
                        "rejecting tx with unsupported extension option",
                    )
                }

                return anteHandler(ctx, tx, sim)
            }
        }

        // SHOULD CHECK TX IS NOT MsgEthereumTx HERE

        switch tx.(type) {
        case sdk.Tx:
            anteHandler = sdk.ChainAnteDecorators(
                authante.NewSetUpContextDecorator(), // outermost AnteDecorator. SetUpContext must be called first
                 ...
                authante.NewIncrementSequenceDecorator(ak), // innermost AnteDecorator
            )
        default:
            return ctx, stacktrace.Propagate(
                sdkerrors.Wrapf(sdkerrors.ErrUnknownRequest, "invalid transaction type: %T", tx),
                "transaction is not an SDK tx",
            )
        }

        return anteHandler(ctx, tx, sim)
    }
}
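A hedged sketch of the missing validation flagged by the "SHOULD CHECK" comment above: before falling through to the standard Cosmos AnteHandler chain, reject any transaction that carries a MsgEthereumTx without the Ethereum extension option. The exact placement and wording are illustrative.

// Inside the default (non-Ethereum) branch, before building the standard
// AnteHandler chain:
for _, msg := range tx.GetMsgs() {
    if _, ok := msg.(*evmtypes.MsgEthereumTx); ok {
        return ctx, stacktrace.Propagate(
            sdkerrors.Wrap(
                sdkerrors.ErrUnknownRequest,
                "MsgEthereumTx must be submitted via ExtensionOptionsEthereumTx",
            ),
            "rejecting MsgEthereumTx routed to the non-Ethereum AnteHandler chain",
        )
    }
}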

Additional examples of incorrect AnteHandler usage include yet more bypassable checks, loss of funds, and incorrect data passing between blockchains.

Errors? Panics? I Can Handle It

Smart contract developers are used to not properly handling errors. This is acceptable since most underlying blockchains revert all state changes when execution fails.

Cosmos is designed to provide a similar experience. Whenever a message handler returns an error, changes to the persistent state are dropped. Panics are handled similarly: a recovery handler wrapped around message execution converts panics into errors for downstream processing.
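A hedged sketch of this pattern (not the SDK's exact implementation): message execution runs against a branched, cached store, panics are converted into ordinary errors, and the cached writes are only committed on success.

func runMsg(ctx sdk.Context, handle func(sdk.Context) error) (err error) {
    // Execute against a branched (cached) store; writes persist only on success.
    cacheCtx, writeCache := ctx.CacheContext()

    defer func() {
        if r := recover(); r != nil {
            // Convert panics into ordinary errors; the cached writes are
            // simply discarded.
            err = fmt.Errorf("recovered from panic: %v", r)
        }
    }()

    if err = handle(cacheCtx); err != nil {
        return err // state changes stay in the discarded cache
    }

    writeCache() // commit the branched writes only on success
    return nil
}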

This design is pretty neat and allows developers to write code in a rather lazy way. For instance, the following code works perfectly fine: if k.keeper.TotalReward() returns zero, the resulting division-by-zero panic is recovered and the msg execution simply rolls back as if nothing happened.

func (k msgServer) AllocateReward(
    goCtx context.Context,
    msg *types.MsgAllocateReward,
) (*types.MsgAllocateRewardResponse, error) {

    rewardPerShare := k.keeper.Shares() / k.keeper.TotalReward()
    k.keeper.DistributeReward(rewardPerShare)

    return &types.MsgAllocateRewardResponse{}, nil
}

However, the same assumption does not always hold. Certain parts of Cosmos, such as PreBlocker, BeginBlocker, and EndBlocker, are not protected by the error handling mechanism. So, if we move the reward distribution logic into BeginBlocker to automatically distribute rewards at the start of each block, panics raised by division by 0 will halt the chain.

func BeginBlocker(ctx context.Context, keeper keeper.Keeper) error {
    rewardPerShare := keeper.Shares() / keeper.TotalReward()
    keeper.DistributeReward(rewardPerShare)

    return nil
}
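A minimal hardening sketch for the same toy BeginBlocker: guard the zero denominator explicitly so the chain keeps producing blocks instead of halting.

func BeginBlocker(ctx context.Context, keeper keeper.Keeper) error {
    totalReward := keeper.TotalReward()
    if totalReward == 0 {
        // Nothing to distribute yet; skip this block instead of panicking
        // and halting the chain.
        return nil
    }

    rewardPerShare := keeper.Shares() / totalReward
    keeper.DistributeReward(rewardPerShare)

    return nil
}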

Real-World Examples

Recently, developers have become increasingly aware of unprotected ABCI functions, but this doesn't stop DoS bugs from manifesting. So what is the catch?

The problem lies in the lack of proper understanding of utility functions. The example here implements a bridge that mints wrapped BTC tokens in the PreBlocker when bridging events are observed. Notably, errors returned by bankKeeper.SendCoinsFromModuleToAccount will be bubbled up through PreBlocker and halt the chain. It turns out an attacker can force SendCoinsFromModuleToAccount to return an error by setting the recipient to a BlockedAddr, rendering the code susceptible to DoS attacks.

func (pbh *PreBlockHandler) PreBlocker() sdk.PreBlocker {
    return func(
        ctx sdk.Context,
        req *cmtabci.RequestFinalizeBlock,
    ) (*sdk.ResponsePreBlock, error) {
        ...
        err := pbh.bridgeKeeper.AcceptAssetsLocked(ctx, events)
        if err != nil {
            return nil, fmt.Errorf("cannot accept AssetsLocked events: %w", err)
        }
        ...
    }
}

func (k Keeper) AcceptAssetsLocked(
    ctx sdk.Context,
    events types.AssetsLockedEvents,
) error {
    ...
    for _, event := range events {
        recipient, err := sdk.AccAddressFromBech32(event.Recipient)
        if err != nil {
            return fmt.Errorf("failed to parse recipient address: %w", err)
        }

        if bytes.Equal(event.TokenBytes(), sourceBTCToken) {
            err = k.mintBTC(ctx, recipient, event.Amount)
            if err != nil {
                return fmt.Errorf(
                    "failed to mint BTC for event %v: %w",
                    event.Sequence,
                    err,
                )
            }
        } else {
            ...
        }
    }
    ...
}

func (k Keeper) mintBTC(
    ctx sdk.Context,
    recipient sdk.AccAddress,
    amount math.Int,
) error {
    ...
    err = k.bankKeeper.SendCoinsFromModuleToAccount(
        ctx,
        types.ModuleName,
        recipient,
        coins,
    )
    if err != nil {
        return fmt.Errorf("failed to send coins: %w", err)
    }
    ...
}

func (k BaseKeeper) SendCoinsFromModuleToAccount(
    ctx context.Context, senderModule string, recipientAddr sdk.AccAddress, amt sdk.Coins,
) error {
    ...
    if k.BlockedAddr(recipientAddr) {
        return errorsmod.Wrapf(sdkerrors.ErrUnauthorized, "%s is not allowed to receive funds", recipientAddr)
    }
    ...
}
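One hedged mitigation sketch for the bridge example above, assuming the module's bank keeper interface exposes the same BlockedAddr check shown in the snippet: treat a blocked recipient as a per-event failure rather than a chain-halting error.

// Inside AcceptAssetsLocked's per-event loop, before calling mintBTC:
if k.bankKeeper.BlockedAddr(recipient) {
    // Do not let a single malicious recipient halt the chain: skip (or
    // divert) this event instead of bubbling an error up to PreBlocker.
    ctx.Logger().Error(
        "skipping AssetsLocked event with blocked recipient",
        "recipient", event.Recipient,
        "sequence", event.Sequence,
    )
    continue
}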

This shows that even well-known bug classes still resurface from time to time due to unforeseen invariant violations. Additional examples include improper decimal handling in the group module.

Same, Same... But Different

Cosmos exposes several consensus-level interfaces, such as PrepareProposal, ProcessProposal, ExtendVote, and VerifyVoteExtension. These ABCI methods allow developers to customize how blocks are constructed, as well as inject supplementary data into each block.

Two of the best-known attack surfaces are

  1. PrepareProposal (ExtendVote) outputs being rejected due to ProcessProposal (VerifyVoteExtension) over-validating, resulting in liveness failures.
  2. Malicious proposals and vote extensions not created through the PrepareProposal (ExtendVote) are accepted due to ProcessProposal (VerifyVoteExtension) under-validating.

In essence, any mismatch between the two handlers in a pair will likely lead to security issues.

There are also a few lesser-known variants of these issues. One instance is the validation of VoteExtensions within PrepareProposal. To provide context, we start with a primer on the CometBFT consensus and vote extensions.

Consensus starts with a leader creating a proposal and then broadcasting it to each validator. Validators then proceed to vote on whether or not to accept the proposal. During the voting phase, ExtendVote is called to attach additional data to the votes. Once a validator collects enough valid votes that pass VerifyVoteExtension, a proposal is considered accepted and can be committed. After committing the proposal, a new leader starts to create the next proposal, bringing us back to the point where we started.

So, where are the attached vote extension data used? It turns out a leader should include the vote extensions of the previous consensus round in its proposal. It might be tempting to conclude that all vote extensions an honest leader accepted have passed the VerifyVoteExtension check and are therefore valid. Thus, we can directly inject all vote extensions into our proposal.

Unfortunately, CometBFT directly accepts late precommits without passing them through VerifyVoteExtension. This exposes a time window where Byzantine validators can smuggle malicious votes into the next leader's cache, luring the leader into including invalid vote extensions in its proposal.

func (cs *State) addVote(vote *types.Vote, peerID p2p.ID) (added bool, err error) {
    ...

    // A precommit for the previous height?
    // These come in while we wait timeoutCommit
    if vote.Height+1 == cs.Height && vote.Type == types.PrecommitType {
        ...
        // Late precommits are not checked by VerifyVoteExtension
        added, err = cs.LastCommit.AddVote(vote)
        ...
        return added, err
    }
    extEnabled := cs.state.ConsensusParams.Feature.VoteExtensionsEnabled(vote.Height)
    if extEnabled {
        ...
        if vote.Type == types.PrecommitType && !vote.BlockID.IsNil() &&
            !bytes.Equal(vote.ValidatorAddress, myAddr) { // Skip the VerifyVoteExtension call if the vote was issued by this validator.
            ...
            err := cs.blockExec.VerifyVoteExtension(context.TODO(), vote)
            ...
        }
    } else {
        ...
    }
    ...
}

If developers are not aware of the subtle details regarding vote extension handling in CometBFT, it is quite easy to overlook implementing protections against these attacks.

Real-World Examples

An example of the bug we just described is shown here. PrepareProposal only checks that each vote is properly signed by a validator (via ValidateVoteExtensions) but does not verify it against the rules in VerifyVoteExtension, leaving the leader vulnerable to including malicious vote extensions in its proposal.

func (h *Handlers) PrepareProposalHandler() sdk.PrepareProposalHandler {
    return func(ctx sdk.Context, req *abcitypes.RequestPrepareProposal) (*abcitypes.ResponsePrepareProposal, error) {
        ...
        var injection []byte
        if req.Height > ctx.ConsensusParams().Abci.VoteExtensionsEnableHeight && collectSigs {
            //Fails to verify vote extensions with VerifyVoteExtension rules
            err := baseapp.ValidateVoteExtensions(ctx, h.stakingKeeper, req.Height, ctx.ChainID(), req.LocalLastCommit)
            if err != nil {
                return nil, err
            }
            injection, err = json.Marshal(req.LocalLastCommit)
            if err != nil {
                h.logger.Error("failed to marshal extended votes", "err", err)
                return nil, err
            }
            ...
        }
        defaultRes, err := h.defaultPrepareProposal(ctx, req)
        ...
        proposalTxs := defaultRes.Txs
        if injection != nil {
            proposalTxs = append([][]byte{injection}, proposalTxs...)
            h.logger.Debug("injected local last commit", "height", req.Height)
        }
        return &abcitypes.ResponsePrepareProposal{
            Txs: proposalTxs,
        }, nil
    }
}
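A hedged sketch of the missing step: before injecting req.LocalLastCommit, re-run the application's own vote-extension rules on each vote and reject (or drop) the ones that fail. Here, verifyExtension is a hypothetical helper wrapping the same logic used by the VerifyVoteExtension handler.

// Inside PrepareProposalHandler, before marshalling req.LocalLastCommit:
for _, vote := range req.LocalLastCommit.Votes {
    // Signature checks in ValidateVoteExtensions alone do not guarantee the
    // extension payload satisfies the application's rules.
    if err := verifyExtension(ctx, vote.Validator.Address, vote.VoteExtension); err != nil {
        return nil, fmt.Errorf("invalid vote extension from %X: %w", vote.Validator.Address, err)
    }
}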

Aside from the more complex variant, pure validation mismatches are also still prevalent despite being a well-known attack surface. This stems from Proposal (Vote) rejections by various obscure checks hidden within CometBFT. For example, this commit fixes a bug where PrepareProposal may return a proposal larger than MaxTxBytes, which will later get rejected by CometBFT.
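A minimal sketch of the corresponding guard (candidateTxs is illustrative): when assembling the proposal, track the cumulative size and stop adding transactions once req.MaxTxBytes would be exceeded.

// Inside PrepareProposal, when assembling proposalTxs:
var (
    proposalTxs [][]byte
    totalBytes  int64
)
for _, tx := range candidateTxs {
    size := int64(len(tx))
    if totalBytes+size > req.MaxTxBytes {
        break // stay within the size CometBFT will accept
    }
    totalBytes += size
    proposalTxs = append(proposalTxs, tx)
}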

The Keymaker

State (persistent storage) is another crucial component of state machines. Cosmos relies on a custom key-value store called KVStore to handle state efficiently. In a KVStore, keys and values are both represented as simple byte slices, requiring developers to handle serialization and deserialization of more intricate structures when working with storage.

The complexity behind proper data serialization often results in flawed code and security vulnerabilities. Below, we showcase relatively simple (but buggy) implementations and progressively address and mitigate the issues until the code is deemed safe from exploits.

Let's start by considering a scenario where we need to store the PositionMap structure shown below in storage.

type VaultId uint64
type Username string
type PositionName string
type Position struct {
    data []byte
}
type PositionMap map[VaultId]map[Username]map[PositionName]Position

Given that there are three levels of keys in PositionMap, we need to serialize these three map keys into a hierarchically searchable storage key. The most straightforward approach is to convert all fields into strings and concatenate them together.

storageKey := fmt.Sprintf(
    "%d%s%s",
    vaultId,
    username,
    positionName,
)

Although plain concatenation allows us to easily construct a storage key, it becomes apparent that this implementation is prone to key collisions.

vaultId = 1,  username = "2a", positionName = "b"
    => storageKey = "12ab"

vaultId = 12, username = "a",  positionName = "b"
    => storageKey = "12ab"

So, how can we mitigate this issue? Perhaps we can add a field separator between each field, which would resemble the following:

const (
    Seperator = "|"
)

storageKey := fmt.Sprintf(
    "%d%s%s%s%s",
    vaultId,
    Seperator,
    username,
    Seperator,
    positionName,
)

Inserting a separator helps prevent most accidental collisions, but does it completely solve the problem?

Sadly, it doesn't. Since username and positionName are both strings that may contain arbitrary characters (including the separator), collisions can still happen.

vaultId = 1, username = "a|", positionName = "b"
    => storageKey = "1|a||b"

vaultId = 1, username = "a",  positionName = "|b"
    => storageKey = "1|a||b"

To further mitigate this, we could encode all fields so that the separator cannot appear inside any individual field, making field injection impossible.

const (
    Seperator = "|"
)


usernameEncoded := make(
    []byte,
    hex.EncodedLen(len(username)),
)
hex.Encode(usernameEncoded, []byte(username))

positionNameEncoded := make(
    []byte,
    hex.EncodedLen(len(positionName)),
)
hex.Encode(positionNameEncoded, []byte(positionName))

storageKey := fmt.Sprintf(
    "%d%s%s%s%s",
    vaultId,
    Seperator,
    usernameEncoded,
    Seperator,
    positionNameEncoded,
)

We did it. We finally eliminated all potential storageKey collisions.

Until now, our focus has primarily been on storing a single structure. We recognize that in real-world applications, we frequently encounter scenarios where multiple structures must be stored as persistent states.

In the Cosmos framework, it is common for each module to own a few KVStores and have individual Keepers managing access to them. It's also important to note that each KVStore should be independent of the others, freeing developers from having to worry about key collisions between different modules.

With that being said, what if we have to maintain more than one structure within the same KVStore?

To demonstrate this scenario, we introduce the AddressMap structure, which will be stored in the same KVStore we previously used.

type VaultId uint64
type Username string

type PositionName string
type Position struct {
    data []byte
}
type PositionMap map[VaultId]map[Username]map[PositionName]Position

type AddressName string
type Address struct {
    data []byte
}
type AddressMap map[VaultId]map[Username]map[AddressName]Address

Referencing the previous examples, it is necessary to sanitize/encode each key field and add separators between fields to prevent key collisions. Putting these measures into practice, we arrive at the implementation below:

const (
    Seperator = "|"
)

func PositionMapKey(
    vaultId uint64,
    username, positionName []byte,
) (key []byte) {
    usernameEncoded := make(
        []byte,
        hex.EncodedLen(len(username)),
    )
    hex.Encode(usernameEncoded, username)

    positionNameEncoded := make(
        []byte,
        hex.EncodedLen(len(positionName)),
    )
    hex.Encode(positionNameEncoded, positionName)

    key = []byte(fmt.Sprintf(
        "%d%s%s%s%s",
        vaultId,
        Seperator,
        usernameEncoded,
        Seperator,
        positionNameEncoded,
    ))
    return key
}

func AddressMapKey(
    vaultId uint64,
    username, addressName []byte,
) (key []byte) {
    usernameEncoded := make(
        []byte,
        hex.EncodedLen(len(username)),
    )
    hex.Encode(usernameEncoded, username)

    addressNameEncoded := make(
        []byte,
        hex.EncodedLen(len(addressName)),
    )
    hex.Encode(addressNameEncoded, addressName)

    key = []byte(fmt.Sprintf(
        "%d%s%s%s%s",
        vaultId,
        Seperator,
        usernameEncoded,
        Seperator,
        addressNameEncoded,
    ))
    return key
}

Unfortunately, when dealing with more than one structure within the same KVStore, the previous implementation is not enough to guarantee key uniqueness. While it still effectively prevents key collisions within each individual structure, it does not prevent cross-structure key collisions.

vaultId = 1, username = "a", positionName = "b"
    => PositionMapKey = "1|61|62"

vaultId = 1, username = "a", addressName = "b"
    => AddressMapKey  = "1|61|62"

To prevent this, add a structure-specific prefix to the start of each key to act as a domain separator.

const (
    Seperator = "|"
    PositionMapPrefix = "\x01"
    AddressMapPrefix = "\x02"
)

func PositionMapKey(
    vaultId uint64,
    username, positionName []byte,
) (key []byte) {
    usernameEncoded := make(
        []byte,
        hex.EncodedLen(len(username)),
    )
    hex.Encode(usernameEncoded, username)

    positionNameEncoded := make(
        []byte,
        hex.EncodedLen(len(positionName)),
    )
    hex.Encode(positionNameEncoded, positionName)

    key = []byte(fmt.Sprintf(
        "%s%d%s%s%s%s",
        PositionMapPrefix,
        vaultId,
        Seperator,
        usernameEncoded,
        Seperator,
        positionNameEncoded,
    ))
    return key
}

func AddressMapKey(
    vaultId uint64,
    username, addressName []byte,
) (key []byte) {
    usernameEncoded := make(
        []byte,
        hex.EncodedLen(len(username)),
    )
    hex.Encode(usernameEncoded, username)

    addressNameEncoded := make(
        []byte,
        hex.EncodedLen(len(addressName)),
    )
    hex.Encode(addressNameEncoded, addressName)

    key = []byte(fmt.Sprintf(
        "%s%d%s%s%s%s",
        AddressMapPrefix,
        vaultId,
        Seperator,
        usernameEncoded,
        Seperator,
        addressNameEncoded,
    ))
    return key
}

We now have a proper example of how to serialize storage keys.

Nonetheless, there is more to storage than just this. As previously mentioned, stored structures are expected to support their original functionality. In the case of a map, data should still be retrievable through its original keys.

Let's look at a case where we want to retrieve all map[Username]map[PositionName]Position associated with a VaultId from the storage. How can we safely accomplish this?

Fortunately, the Cosmos-SDK provides APIs to fetch all entries associated with a storageKey prefix. Below is an example of an attempt to fetch data with vaultId:

func FetchPositionMapWithVaultId(
    vaultId uint64,
) (map[Username]map[PositionName]Position, error) {
    values := map[Username]map[PositionName]Position{}
    i := sdk.KVStorePrefixIterator(
        kvStore,
        []byte(fmt.Sprintf("%s%d", PositionMapPrefix, vaultId)),
    )
    defer i.Close()

    for ; i.Valid(); i.Next() {
        // Key layout: <prefix><vaultId>|<hex(username)>|<hex(positionName)>
        k := strings.Split(string(i.Key()), Seperator)

        username := make([]byte, hex.DecodedLen(len(k[1])))
        _, err := hex.Decode(username, []byte(k[1]))
        if err != nil {
            return nil, err
        }

        positionName := make([]byte, hex.DecodedLen(len(k[2])))
        _, err = hex.Decode(positionName, []byte(k[2]))
        if err != nil {
            return nil, err
        }

        if _, ok := values[Username(username)]; !ok {
            values[Username(username)] = make(map[PositionName]Position)
        }

        values[Username(username)][PositionName(positionName)] = Position{
            data: i.Value(),
        }
    }
    return values, nil
}

By now, you may have already noticed that this implementation suffers from field malleability issues. Imagine a scenario where both vaultId = 1 and vaultId = 10 coexist. If we try to fetch data under vaultId = 1, all entries under vaultId = 10 will also be returned simply because 1 is a prefix of 10. To fix this, we must once again append the Separator to the iterator prefix.

i := sdk.KVStorePrefixIterator(
    kvStore,
    []byte(fmt.Sprintf("%s%d%s", PositionMapPrefix, vaultId, Seperator)),
)

At first, identifying these serialization issues may seem easy. But as data structures and KVStore usage grow more complex, developers can unintentionally overlook storage key construction and parsing mistakes.

Storage keys continue to be a tedious and persistent issue when building on Cosmos. It is crucial to approach development with awareness and care to prevent bugs from creeping into code.

Real-World Examples

The Cosmos-SDK previously lacked protection against KVStore key collisions. This prior oversight allowed developers to unintentionally create two KVStores that were not independent of each other.

func NewKVStoreKeys(names ...string) map[string]*KVStoreKey {
    keys := make(map[string]*KVStoreKey)
    for _, name := range names {
        keys[name] = NewKVStoreKey(name)
    }

    return keys
}

Thanks to the diligence of core developers, checks are now enforced and the Cosmos-SDK will refuse to run if any KVStore keys are prefixes of one another. This frees developers from having to worry about key collisions at the KVStore level.

Additional storage key issues, like subtle bugs in the Cosmos-SDK itself, have resulted in incorrect iterator behavior.

Notably, the gradual adoption of the collections storage helpers since Cosmos SDK v0.50 has made it a lot more difficult to write buggy storage code. This demonstrates the importance of keeping up to date with the latest SDK developments to leverage architectural security improvements.
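For reference, here is a hedged sketch of what the earlier positions mapping could look like with collections (the Position value type and its codec are assumptions; Position would need to be protobuf-encodable): composite keys are declared once, and prefixing plus per-field key encoding are handled by the library instead of hand-rolled separators.

import (
    "cosmossdk.io/collections"

    "github.com/cosmos/cosmos-sdk/codec"
)

type Keeper struct {
    // Positions is keyed by (vaultId, username, positionName). The library
    // handles per-structure prefixing and per-field key encoding.
    Positions collections.Map[collections.Triple[uint64, string, string], Position]
}

func NewKeeper(cdc codec.BinaryCodec, sb *collections.SchemaBuilder) Keeper {
    return Keeper{
        Positions: collections.NewMap(
            sb,
            collections.NewPrefix(1), // per-structure domain separation
            "positions",
            collections.TripleKeyCodec(
                collections.Uint64Key,
                collections.StringKey,
                collections.StringKey,
            ),
            codec.CollValue[Position](cdc), // assumes Position is a proto message
        ),
    }
}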

Conclusion

The Cosmos SDK is a powerful tool for those who want to create custom blockchains. However, this flexibility brings about great responsibility. Developers must pay close attention to nuances, as these can expose a large number of potential attack surfaces.

To recap, we discussed some of the more basic parts of the Cosmos SDK, showcasing common mistakes developers tend to make. Yet, it is important to note that we've only covered the tip of the iceberg. Other attack surfaces, such as authentication in relation to the IBC interface, are fundamental and absolutely worth looking into.