Clarify interpretation of literals.

This is based on discussions on Gitter.
Alessandro Coglio 2021-09-15 16:52:56 -07:00 committed by chriseth
parent c9f98f2cc2
commit 0ac441ac73

@@ -174,9 +174,32 @@ whitespace, i.e. there is no terminating ``;`` or newline required.
Literals
--------
-As literals, you can use integer constants in decimal or hexadecimal notation
-or strings as ASCII (`"abc"`) or HEX strings (`hex"616263"`) of up to
-32 bytes length.
+As literals, you can use:
+- Integer constants in decimal or hexadecimal notation.
+- ASCII strings (e.g. ``"abc"``), which may contain hex escapes ``\xNN`` and Unicode escapes ``\uNNNN`` where ``N`` are hexadecimal digits.
+- Hex strings (e.g. ``hex"616263"``).
+In the EVM dialect of Yul, literals represent 256-bit words as follows:
+- Decimal or hexadecimal constants must be less than ``2**256``.
+  They represent the 256-bit word with that value as an unsigned integer in big endian encoding.
+- An ASCII string is first viewed as a byte sequence, by viewing
+  a non-escape ASCII character as a single byte whose value is the ASCII code,
+  an escape ``\xNN`` as single byte with that value, and
+  an escape ``\uNNNN`` as the UTF-8 sequence of bytes for that code point.
+  The byte sequence must not exceed 32 bytes.
+  The byte sequence is padded with zeros on the right to reach 32 bytes in length;
+  in other words, the string is stored left-aligned.
+  The padded byte sequence represents a 256-bit word whose most significant 8 bits are the ones from the first byte,
+  i.e. the bytes are interpreted in big endian form.
+- A hex string is first viewed as a byte sequence, by viewing
+  each pair of contiguous hex digits as a byte.
+  The byte sequence must not exceed 32 bytes (i.e. 64 hex digits), and is treated as above.
When compiling for the EVM, this will be translated into an
appropriate ``PUSHi`` instruction. In the following example,
@@ -184,8 +207,7 @@ appropriate ``PUSHi`` instruction. In the following example,
bitwise ``and`` with the string "abc" is computed.
The final value is assigned to a local variable called ``x``.
-Strings are stored left-aligned and cannot be longer than 32 bytes.
-The limit does not apply to string literals passed to builtin functions that require
+The 32-byte limit above does not apply to string literals passed to builtin functions that require
literal arguments (e.g. ``setimmutable`` or ``loadimmutable``). Those strings never end up in the
generated bytecode.
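
Reviewer note: the literal-to-word rules introduced in the first hunk can be checked with a small Python sketch. This is an illustration only, not the compiler's implementation. The function names are hypothetical, escape sequences are assumed to have been decoded into bytes already, and `bytes.fromhex` is slightly more permissive than a Yul hex string (it tolerates whitespace between bytes).

```python
WORD_BYTES = 32  # an EVM word is 256 bits


def integer_literal_to_word(value: int) -> int:
    """Decimal/hex constants must be below 2**256 and denote themselves."""
    if not 0 <= value < 2**256:
        raise ValueError("integer literal does not fit in 256 bits")
    return value


def string_literal_to_word(data: bytes) -> int:
    """Interpret an already-unescaped byte sequence as a 256-bit word.

    The bytes are left-aligned: zero bytes are appended on the right until
    the sequence is 32 bytes long, then it is read as a big-endian integer.
    """
    if len(data) > WORD_BYTES:
        raise ValueError("string literal exceeds 32 bytes")
    padded = data.ljust(WORD_BYTES, b"\x00")
    return int.from_bytes(padded, "big")


def hex_literal_to_word(digits: str) -> int:
    """Each pair of hex digits is one byte; then treated like a string."""
    return string_literal_to_word(bytes.fromhex(digits))


# "abc" and hex"616263" denote the same word, with 0x61 in the top byte.
abc_word = string_literal_to_word(b"abc")
same_word = hex_literal_to_word("616263")

# A \uNNNN escape contributes its UTF-8 bytes, so U+00E9 uses two bytes.
e_acute_bytes = "\u00e9".encode("utf-8")
```

Under these assumptions `abc_word == same_word`, and shifting `abc_word` right by 248 bits recovers `0x61`, which matches the "most significant 8 bits are the ones from the first byte" wording in the patch.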