I noticed core-rs stores state tree prefixes in hexadecimal serialization both on-chain and in the database. So, instead of storing hex path 0x0000…, it would store 0x30303030…
While it is a simpler way of storing a sequence of nibbles, it is obviously not as efficient as it could be. For example, the in-memory representation of “AddressNibbles” uses contiguous sequences of nibbles with an 8-bit length indicator. Ethereum goes even further and uses a 4-bit odd/even indicator.
I’m going to explore a few ways how we can improve efficiency, in this post.
If time allows it, I’ll write up some proof-of-concept Go code.
A more efficient encoding for the database
I can imagine historical reasons for storing prefixes with hex encoding. Perhaps browser engines imposed restrictions on core-js that only ASCII-compatible keys can be stored in the database. Since all the database code now directly depends on LMDB, I don’t think this is a problem anymore.
Whether changing the key encoding on Nimiq 1.0 nodes is worth doing is debatable. core-rs nodes have hit production usage already and such a change would require a full re-sync or complex upgrading code.
However, for Nimiq 2.0 (core-rs-albatross) it seems to make sense. There is no blockchain to lose and it’s in heavy development anyways. Cutting the key size in half could allow for some good performance gains.
A comment by @paberr in the core-rs database code supports this, by the way
impl AsDatabaseBytes for AddressNibbles {
fn as_database_bytes(&self) -> Cow<[u8]> {
// TODO: Improve AddressNibbles, so that no serialization is needed.
let v = Serialize::serialize_to_vec(&self);
Cow::Owned(v)
}
}
Here are a few encoding suggestions.
Prefixing a 4-bit odd/even indicator
- The first nibble (“prefix”) is the indicator.
- Prefix
0b0000
/0x0
means an odd number of nibbles follows, - Prefix
0b1000
/0x8
means an even number of nibbles follows. - A single
0x0
nibble is used as padding at the end.
Example encodings:
-
[]
=>80
-
[1]
=>01
-
[8, 4]
=>8840
-
[8, 8, 4]
=>0884
Converting between the compressed form and the nibble list is fairly simple,
and this encoding is space-efficient. However, it would create an unnatural
split on the key index, and make in-order iteration impossible.
Suffixing a 4-bit odd/even indicator
- The last nibble (“suffix”) is the indicator.
- Suffix
0b0000
/0x0
means an odd number of nibbles precedes, - Suffix
0b11111111
/0xFF
means an even number of nibbles precedes.
Example encodings:
-
[]
=>FF
-
[1]
=>10
-
[8, 4]
=>84FF
-
[8, 8, 4]
=>8840
Conversion is slightly more complicated here in this case.
I like this one the most.
Prefixing the 8-bit length
This encoding is a bit more exotic, but it would allow for efficiently iterating the list of accounts/leaves in-order, without touching branches (select by prefix 0x28...
).
It’s probably also less efficient that the other proposed encodings.
Changing the “on-chain” tree
Changing the accounts tree itself is probably not productive.
However it might decrease the amount of data sent over the wire when doing a full light sync.