diff --git a/docs/yul.rst b/docs/yul.rst
index d0126e65b..e4819ff22 100644
--- a/docs/yul.rst
+++ b/docs/yul.rst
@@ -174,9 +174,32 @@ whitespace, i.e. there is no terminating ``;`` or newline required.
 Literals
 --------
 
-As literals, you can use integer constants in decimal or hexadecimal notation
-or strings as ASCII (`"abc"`) or HEX strings (`hex"616263"`) of up to
-32 bytes length.
+As literals, you can use:
+
+- Integer constants in decimal or hexadecimal notation.
+
+- ASCII strings (e.g. ``"abc"``), which may contain hex escapes ``\xNN`` and Unicode escapes ``\uNNNN`` where ``N`` are hexadecimal digits.
+
+- Hex strings (e.g. ``hex"616263"``).
+
+In the EVM dialect of Yul, literals represent 256-bit words as follows:
+
+- Decimal or hexadecimal constants must be less than ``2**256``.
+  They represent the 256-bit word with that value as an unsigned integer in big endian encoding.
+
+- An ASCII string is first viewed as a byte sequence, by viewing
+  a non-escape ASCII character as a single byte whose value is the ASCII code,
+  an escape ``\xNN`` as single byte with that value, and
+  an escape ``\uNNNN`` as the UTF-8 sequence of bytes for that code point.
+  The byte sequence must not exceed 32 bytes.
+  The byte sequence is padded with zeros on the right to reach 32 bytes in length;
+  in other words, the string is stored left-aligned.
+  The padded byte sequence represents a 256-bit word whose most significant 8 bits are the ones from the first byte,
+  i.e. the bytes are interpreted in big endian form.
+
+- A hex string is first viewed as a byte sequence, by viewing
+  each pair of contiguous hex digits as a byte.
+  The byte sequence must not exceed 32 bytes (i.e. 64 hex digits), and is treated as above.
 
 When compiling for the EVM, this will be translated into an
 appropriate ``PUSHi`` instruction. In the following example,
@@ -184,8 +207,7 @@ appropriate ``PUSHi`` instruction. In the following example,
 bitwise ``and`` with the string "abc" is computed.
 The final value is assigned to a local variable called ``x``.
 
-Strings are stored left-aligned and cannot be longer than 32 bytes.
-The limit does not apply to string literals passed to builtin functions that require
+The 32-byte limit above does not apply to string literals passed to builtin functions that require
 literal arguments (e.g. ``setimmutable`` or ``loadimmutable``). Those strings never end up in the
 generated bytecode.
 
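Review note: the literal rules introduced by this patch can be checked with a small model. The following is a minimal Python sketch, not code from the compiler. All function names are hypothetical, and the parser is a simplifying assumption that handles only the ``\xNN`` and ``\uNNNN`` escapes named in the patch; any other escape sequence is treated as plain characters.

```python
WORD_BYTES = 32  # an EVM word is 256 bits


def word_from_bytes(data: bytes) -> int:
    """Pad on the right with zeros (left-aligned) and read as big endian."""
    if len(data) > WORD_BYTES:
        raise ValueError("literal exceeds 32 bytes")
    return int.from_bytes(data.ljust(WORD_BYTES, b"\x00"), "big")


def word_from_integer(value: int) -> int:
    """Decimal or hexadecimal constants must be less than 2**256."""
    if not 0 <= value < 2**256:
        raise ValueError("integer literal out of range")
    return value


def bytes_from_ascii_literal(body: str) -> bytes:
    """Turn the text between the quotes into a byte sequence.

    Hypothetical helper: only \\xNN and \\uNNNN escapes are recognised here.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body.startswith("\\x", i):
            # \xNN becomes a single byte with that value.
            out.append(int(body[i + 2:i + 4], 16))
            i += 4
        elif body.startswith("\\u", i):
            # \uNNNN becomes the UTF-8 bytes of that code point.
            out += chr(int(body[i + 2:i + 6], 16)).encode("utf-8")
            i += 6
        else:
            code = ord(body[i])
            if code > 127:
                raise ValueError("non-ASCII character in ASCII string literal")
            out.append(code)
            i += 1
    return bytes(out)


def word_from_ascii_literal(body: str) -> int:
    return word_from_bytes(bytes_from_ascii_literal(body))


def word_from_hex_string(digits: str) -> int:
    """Each pair of contiguous hex digits is one byte."""
    return word_from_bytes(bytes.fromhex(digits))


# "abc" and hex"616263" describe the same left-aligned word.
assert word_from_ascii_literal("abc") == word_from_hex_string("616263")
```

Under this model, ``"abc"`` and ``hex"616263"`` yield the same word, with the three bytes in the most significant positions and 29 zero bytes after them, which matches the "stored left-aligned" wording in the patch.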