From 1d6b42eaa400f0b59cb74c541fe2d3c63bdeffee Mon Sep 17 00:00:00 2001 From: chriseth Date: Thu, 9 Jan 2020 18:35:22 +0100 Subject: [PATCH] Combine Yul documentation sections. --- docs/assembly.rst | 618 ++-------------------------- docs/contributing.rst | 2 +- docs/yul.rst | 908 ++++++++++++++++++++++++++++-------------- 3 files changed, 650 insertions(+), 878 deletions(-) diff --git a/docs/assembly.rst b/docs/assembly.rst index 2bc89ad57..67c1ab6f6 100644 --- a/docs/assembly.rst +++ b/docs/assembly.rst @@ -1,41 +1,20 @@ -################# -Solidity Assembly -################# +.. _inline-assembly: + +############### +Inline Assembly +############### .. index:: ! assembly, ! asm, ! evmasm -Solidity defines an assembly language that you can use without Solidity and also -as "inline assembly" inside Solidity source code. This guide starts with describing -how to use inline assembly, how it differs from standalone assembly -(sometimes also referred to by its proper name "Yul"), and -specifies assembly itself. - -.. _inline-assembly: - -Inline Assembly -=============== You can interleave Solidity statements with inline assembly in a language close -to the one of the virtual machine. This gives you more fine-grained control, -especially when you are enhancing the language by writing libraries. +to the one of the Ethereum virtual machine. This gives you more fine-grained control, +which is especially useful when you are enhancing the language by writing libraries. -As the EVM is a stack machine, it is often hard to address the correct stack slot -and provide arguments to opcodes at the correct point on the stack. Solidity's inline -assembly helps you do this, and with other issues that arise when writing manual assembly. +The language used for inline assembly in Solidity is called `Yul `_ +and it is documented in its own section. This section will only cover +how the inline assembly code can interface with the surrounding Solidity code. 
-For inline assembly, the stack is actually not visible at all, but if you look -closer, there is always a very direct translation from inline assembly to -the stack based EVM opcode stream. - -Inline assembly has the following features: - -* functional-style opcodes: ``mul(1, add(2, 3))`` -* assembly-local variables: ``let x := add(2, 3) let y := mload(0x40) x := add(x, y)`` -* access to external variables: ``function f(uint x) public { assembly { x := sub(x, 1) } }`` -* loops: ``for { let i := 0 } lt(i, x) { i := add(i, 1) } { y := mul(2, y) }`` -* if statements: ``if slt(x, 0) { x := sub(0, x) }`` -* switch statements: ``switch x case 0 { y := mul(x, 2) } default { y := 0 }`` -* function calls: ``function f(x) -> y { switch x case 0 { y := 1 } default { y := mul(x, f(sub(x, 1))) } }`` .. warning:: Inline assembly is a way to access the Ethereum Virtual Machine @@ -43,24 +22,14 @@ Inline assembly has the following features: features and checks of Solidity. You should only use it for tasks that need it, and only if you are confident with using it. -Syntax ------- -Assembly parses comments, literals and identifiers in the same way as Solidity, so you can use the -usual ``//`` and ``/* */`` comments. There is one exception: Identifiers in inline assembly can contain -``.``. Inline assembly is marked by ``assembly { ... }`` and inside -these curly braces, you can use the following (see the later sections for more details): +An inline assembly block is marked by ``assembly { ... }``, where the code inside +the curly braces is code in the `Yul `_ language. - - literals, i.e. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 characters) - - opcodes in functional style, e.g. ``add(1, mload(0))`` - - variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned) - - identifiers (assembly-local variables and externals if used as inline assembly), e.g. ``add(3, x)``, ``sstore(x_slot, 2)`` - - assignments, e.g. 
``x := add(y, 3)`` - - blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }`` +The inline assembly code can access local Solidity variables as explained below. -Inline assembly manages local variables and control-flow. Because of that, -opcodes that interfere with these features are not available. This includes -the ``dup`` and ``swap`` instructions as well as ``jump`` instructions and labels. +Different inline assembly blocks share no namespace, i.e. it is not possible +to call a Yul function or access a Yul variable defined in a different inline assembly block. Example ------- @@ -146,238 +115,20 @@ efficient code, for example: } -.. _opcodes: - -Opcodes -------- - -This document does not want to be a full description of the Ethereum virtual machine, but the -following list can be used as a quick reference of its opcodes. - -If an opcode takes arguments, they are given in parentheses. -Opcodes marked with ``-`` do not return a result, -those marked with ``*`` are special in a certain way and all others return exactly one value. -Opcodes marked with ``F``, ``H``, ``B``, ``C`` or ``I`` are present since Frontier, Homestead, -Byzantium, Constantinople or Istanbul, respectively. - -In the following, ``mem[a...b)`` signifies the bytes of memory starting at position ``a`` up to -but not including position ``b`` and ``storage[p]`` signifies the storage contents at slot ``p``. - -In the grammar, opcodes are represented as pre-defined identifiers ("built-in functions"). 
- -+-------------------------+-----+---+-----------------------------------------------------------------+ -| Instruction | | | Explanation | -+=========================+=====+===+=================================================================+ -| stop() + `-` | F | stop execution, identical to return(0, 0) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| add(x, y) | | F | x + y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| sub(x, y) | | F | x - y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| mul(x, y) | | F | x * y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| div(x, y) | | F | x / y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| sdiv(x, y) | | F | x / y, for signed numbers in two's complement | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| mod(x, y) | | F | x % y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| smod(x, y) | | F | x % y, for signed numbers in two's complement | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| exp(x, y) | | F | x to the power of y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| not(x) | | F | ~x, every bit of x is negated | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| lt(x, y) | | F | 1 if x < y, 0 otherwise | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| gt(x, y) | | F | 1 if x > y, 0 otherwise | 
-+-------------------------+-----+---+-----------------------------------------------------------------+ -| slt(x, y) | | F | 1 if x < y, 0 otherwise, for signed numbers in two's complement | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| sgt(x, y) | | F | 1 if x > y, 0 otherwise, for signed numbers in two's complement | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| eq(x, y) | | F | 1 if x == y, 0 otherwise | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| iszero(x) | | F | 1 if x == 0, 0 otherwise | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| and(x, y) | | F | bitwise "and" of x and y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| or(x, y) | | F | bitwise "or" of x and y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| xor(x, y) | | F | bitwise "xor" of x and y | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| byte(n, x) | | F | nth byte of x, where the most significant byte is the 0th byte | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| shl(x, y) | | C | logical shift left y by x bits | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| shr(x, y) | | C | logical shift right y by x bits | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| sar(x, y) | | C | signed arithmetic shift right y by x bits | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| addmod(x, y, m) | | F | (x + y) % m with arbitrary precision 
arithmetic | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| mulmod(x, y, m) | | F | (x * y) % m with arbitrary precision arithmetic | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| signextend(i, x) | | F | sign extend from (i*8+7)th bit counting from least significant | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| keccak256(p, n) | | F | keccak(mem[p...(p+n))) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| pc() | | F | current position in code | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| pop(x) | `-` | F | discard value x | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| mload(p) | | F | mem[p...(p+32)) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| mstore(p, v) | `-` | F | mem[p...(p+32)) := v | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| mstore8(p, v) | `-` | F | mem[p] := v & 0xff (only modifies a single byte) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| sload(p) | | F | storage[p] | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| sstore(p, v) | `-` | F | storage[p] := v | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| msize() | | F | size of memory, i.e. 
largest accessed memory index | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| gas() | | F | gas still available to execution | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| address() | | F | address of the current contract / execution context | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| balance(a) | | F | wei balance at address a | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| selfbalance() | | I | equivalent to balance(address()), but cheaper | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| caller() | | F | call sender (excluding ``delegatecall``) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| callvalue() | | F | wei sent together with the current call | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| calldataload(p) | | F | call data starting from position p (32 bytes) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| calldatasize() | | F | size of call data in bytes | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| calldatacopy(t, f, s) | `-` | F | copy s bytes from calldata at position f to mem at position t | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| codesize() | | F | size of the code of the current contract / execution context | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| codecopy(t, f, s) | `-` | F | copy s bytes from code at position f to mem at position t | 
-+-------------------------+-----+---+-----------------------------------------------------------------+ -| extcodesize(a) | | F | size of the code at address a | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| extcodecopy(a, t, f, s) | `-` | F | like codecopy(t, f, s) but take code at address a | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| returndatasize() | | B | size of the last returndata | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| returndatacopy(t, f, s) | `-` | B | copy s bytes from returndata at position f to mem at position t | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| extcodehash(a) | | C | code hash of address a | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| create(v, p, n) | | F | create new contract with code mem[p...(p+n)) and send v wei | -| | | | and return the new address | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| create2(v, p, n, s) | | C | create new contract with code mem[p...(p+n)) at address | -| | | | keccak256(0xff . this . s . keccak256(mem[p...(p+n))) | -| | | | and send v wei and return the new address, where ``0xff`` is a | -| | | | 1 byte value, ``this`` is the current contract's address | -| | | | as a 20 byte value and ``s`` is a big-endian 256-bit value | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| call(g, a, v, in, | | F | call contract at address a with input mem[in...(in+insize)) | -| insize, out, outsize) | | | providing g gas and v wei and output area | -| | | | mem[out...(out+outsize)) returning 0 on error (eg. 
out of gas) | -| | | | and 1 on success | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| callcode(g, a, v, in, | | F | identical to ``call`` but only use the code from a and stay | -| insize, out, outsize) | | | in the context of the current contract otherwise | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| delegatecall(g, a, in, | | H | identical to ``callcode`` but also keep ``caller`` | -| insize, out, outsize) | | | and ``callvalue`` | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| staticcall(g, a, in, | | B | identical to ``call(g, a, 0, in, insize, out, outsize)`` but do | -| insize, out, outsize) | | | not allow state modifications | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| return(p, s) | `-` | F | end execution, return data mem[p...(p+s)) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| revert(p, s) | `-` | B | end execution, revert state changes, return data mem[p...(p+s)) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| selfdestruct(a) | `-` | F | end execution, destroy current contract and send funds to a | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| invalid() | `-` | F | end execution with invalid instruction | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| log0(p, s) | `-` | F | log without topics and data mem[p...(p+s)) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| log1(p, s, t1) | `-` | F | log with topic t1 and data mem[p...(p+s)) | 
-+-------------------------+-----+---+-----------------------------------------------------------------+ -| log2(p, s, t1, t2) | `-` | F | log with topics t1, t2 and data mem[p...(p+s)) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| log3(p, s, t1, t2, t3) | `-` | F | log with topics t1, t2, t3 and data mem[p...(p+s)) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| log4(p, s, t1, t2, t3, | `-` | F | log with topics t1, t2, t3, t4 and data mem[p...(p+s)) | -| t4) | | | | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| chainid() | | I | ID of the executing chain (EIP 1344) | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| origin() | | F | transaction sender | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| gasprice() | | F | gas price of the transaction | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| blockhash(b) | | F | hash of block nr b - only for last 256 blocks excluding current | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| coinbase() | | F | current mining beneficiary | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| timestamp() | | F | timestamp of the current block in seconds since the epoch | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| number() | | F | current block number | -+-------------------------+-----+---+-----------------------------------------------------------------+ -| difficulty() | | F | difficulty of the current block | 
-+-------------------------+-----+---+-----------------------------------------------------------------+ -| gaslimit() | | F | block gas limit of the current block | -+-------------------------+-----+---+-----------------------------------------------------------------+ - -Literals --------- - -You can use integer constants by typing them in decimal or hexadecimal notation and an -appropriate ``PUSHi`` instruction will automatically be generated. The following creates code -to add 2 and 3 resulting in 5 and then computes the bitwise ``AND`` with the string "abc". -The final value is assigned to a local variable called ``x``. -Strings are stored left-aligned and cannot be longer than 32 bytes. - -.. code:: - - assembly { let x := and("abc", add(3, 2)) } - - -Functional Style ------------------ - -For a sequence of opcodes, it is often hard to see what the actual -arguments for certain opcodes are. In the following example, -``3`` is added to the contents in memory at position ``0x80``. - -.. code:: - - 3 0x80 mload add 0x80 mstore - -Solidity inline assembly has a "functional style" notation where the same code -would be written as follows: - -.. code:: - - mstore(0x80, add(mload(0x80), 3)) - -If you read the code from right to left, you end up with exactly the same -sequence of constants and opcodes, but it is much clearer where the -values end up. - -If you care about the exact stack layout, just note that the -syntactically first argument for a function or opcode will be put at the -top of the stack. Access to External Variables, Functions and Libraries ----------------------------------------------------- You can access Solidity variables and other identifiers by using their name. -For variables stored in the memory data location, this pushes the address, and not the value -onto the stack. 
Variables stored in the storage data location are different, as they might not -occupy a full storage slot, so their "address" is composed of a slot and a byte-offset + +Local variables of value type are directly usable in inline assembly. + +Local variables that refer to memory or calldata evaluate to the +address of the variable in memory, resp. calldata, not the value itself. + +For local storage variables or state variables, a single Yul identifier +is not sufficient, since they do not necessarily occupy a single full storage slot. +Therefore, their "address" is composed of a slot and a byte-offset inside that slot. To retrieve the slot pointed to by the variable ``x``, you use ``x_slot``, and to retrieve the byte-offset you use ``x_offset``. @@ -391,7 +142,9 @@ Local Solidity variables are available for assignments, for example: uint b; function f(uint x) public view returns (uint r) { assembly { - r := mul(x, sload(b_slot)) // ignore the offset, we know it is zero + // We ignore the storage slot offset, we know it is zero + // in this special case. + r := mul(x, sload(b_slot)) } } } @@ -407,177 +160,19 @@ Local Solidity variables are available for assignments, for example: To clean signed types, you can use the ``signextend`` opcode: ``assembly { signextend(, x) }`` -Declaring Assembly-Local Variables ----------------------------------- -You can use the ``let`` keyword to declare variables that are only visible in -inline assembly and actually only in the current ``{...}``-block. What happens -is that the ``let`` instruction will create a new stack slot that is reserved -for the variable and automatically removed again when the end of the block -is reached. You need to provide an initial value for the variable which can -be just ``0``, but it can also be a complex functional-style expression. 
- -Since 0.6.0 the name of a declared variable may not end in ``_offset`` or ``_slot`` +Since Solidity 0.6.0 the name of an inline assembly variable may not end in ``_offset`` or ``_slot`` and it may not shadow any declaration visible in the scope of the inline assembly block (including variable, contract and function declarations). Similarly, if the name of a declared variable contains a dot ``.``, the prefix up to the ``.`` may not conflict with any declaration visible in the scope of the inline assembly block. -.. code:: - - pragma solidity >=0.4.16 <0.7.0; - - contract C { - function f(uint x) public view returns (uint b) { - assembly { - let v := add(x, 1) - mstore(0x80, v) - { - let y := add(sload(v), 1) - b := y - } // y is "deallocated" here - b := add(b, v) - } // v is "deallocated" here - } - } - - -Assignments ------------ Assignments are possible to assembly-local variables and to function-local variables. Take care that when you assign to variables that point to memory or storage, you will only change the pointer and not the data. -Variables can only be assigned expressions that result in exactly one value. -If you want to assign the values returned from a function that has -multiple return parameters, you have to provide multiple variables. -.. code:: - - { - let v := 0 - let g := add(v, 2) - function f() -> a, b { } - let c, d := f() - } - -If --- - -The if statement can be used for conditionally executing code. -There is no "else" part, consider using "switch" (see below) if -you need multiple alternatives. - -.. code:: - - { - if eq(value, 0) { revert(0, 0) } - } - -The curly braces for the body are required. - -Switch ------- - -You can use a switch statement as a very basic version of "if/else". -It takes the value of an expression and compares it to several constants. -The branch corresponding to the matching constant is taken.
Contrary to the -error-prone behaviour of some programming languages, control flow does -not continue from one case to the next. There can be a fallback or default -case called ``default``. - -.. code:: - - { - let x := 0 - switch calldataload(4) - case 0 { - x := calldataload(0x24) - } - default { - x := calldataload(0x44) - } - sstore(0, div(x, 2)) - } - -The list of cases does not require curly braces, but the body of a -case does require them. - -Loops ------ - -Assembly supports a simple for-style loop. For-style loops have -a header containing an initializing part, a condition and a post-iteration -part. The condition has to be a functional-style expression, while -the other two are blocks. If the initializing part -declares any variables, the scope of these variables is extended into the -body (including the condition and the post-iteration part). - -The ``break`` and ``continue`` statements can be used to exit the loop -or skip to the post-part, respectively. - -The following example computes the sum of an area in memory. - -.. code:: - - { - let x := 0 - for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } { - x := add(x, mload(i)) - } - } - -For loops can also be written so that they behave like while loops: -Simply leave the initialization and post-iteration parts empty. - -.. code:: - - { - let x := 0 - let i := 0 - for { } lt(i, 0x100) { } { // while(i < 0x100) - x := add(x, mload(i)) - i := add(i, 0x20) - } - } - -Functions ---------- - -Assembly allows the definition of low-level functions. These take their -arguments (and a return PC) from the stack and also put the results onto the -stack. Calling a function looks the same way as executing a functional-style -opcode. - -Functions can be defined anywhere and are visible in the block they are -declared in. Inside a function, you cannot access local variables -defined outside of that function. 
- -If you call a function that returns multiple values, you have to assign -them to a tuple using ``a, b := f(x)`` or ``let a, b := f(x)``. - -The ``leave`` statement can be used to exit the current function. It -works like the ``return`` statement in other languages just that it does -not take a value to return, it just exits the functions and the function -will return whatever values are currently assigned to the return variable(s). - -The following example implements the power function by square-and-multiply. - -.. code:: - - { - function power(base, exponent) -> result { - switch exponent - case 0 { result := 1 } - case 1 { result := base } - default { - result := power(mul(base, base), div(exponent, 2)) - switch mod(exponent, 2) - case 1 { result := mul(base, result) } - } - } - } Things to Avoid --------------- @@ -593,7 +188,8 @@ Conventions in Solidity ----------------------- In contrast to EVM assembly, Solidity has types which are narrower than 256 bits, -e.g. ``uint24``. For efficiency, most arithmetic operations ignore the fact that types can be shorter than 256 +e.g. ``uint24``. For efficiency, most arithmetic operations ignore the fact that +types can be shorter than 256 bits, and the higher-order bits are cleaned when necessary, i.e., shortly before they are written to memory or before comparisons are performed. This means that if you access such a variable @@ -630,157 +226,3 @@ first slot of the array and followed by the array elements. to allow better convertibility between statically- and dynamically-sized arrays, so do not rely on this. - -Standalone Assembly -=================== - -The assembly language described as inline assembly above can also be used -standalone and in fact, the plan is to use it as an intermediate language -for the Solidity compiler. In this form, it tries to achieve several goals: - -1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity. -2. 
The translation from assembly to bytecode should contain as few "surprises" as possible. -3. Control flow should be easy to detect to help in formal verification and optimization. - -In order to achieve the first and last goal, assembly provides high-level constructs -like ``for`` loops, ``if`` and ``switch`` statements and function calls. It should be possible -to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``, -``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow -and the last two obfuscate control flow. Furthermore, functional statements of -the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like -``7 y x add mul`` because in the first form, it is much easier to see which -operand is used for which opcode. - -The second goal is achieved by compiling the -higher level constructs to bytecode in a very regular way. -The only non-local operation performed -by the assembler is name lookup of user-defined identifiers (functions, variables, ...), -which follow very simple and regular scoping rules and cleanup of local variables from the stack. - -Scoping: An identifier that is declared (label, variable, function, assembly) -is only visible in the block where it was declared (including nested blocks -inside the current block). It is not legal to access local variables across -function borders, even if they would be in scope. Shadowing is not allowed. -Local variables cannot be accessed before they were declared, but -functions and assemblies can. Assemblies are special blocks that are used -for e.g. returning runtime code or creating contracts. No identifier from an -outer assembly is visible in a sub-assembly. - -If control flow passes over the end of a block, pop instructions are inserted -that match the number of local variables declared in that block. 
-Whenever a local variable is referenced, the code generator needs -to know its current relative position in the stack and thus it needs to -keep track of the current so-called stack height. Since all local variables -are removed at the end of a block, the stack height before and after the block -should be the same. If this is not the case, compilation fails. - -Using ``switch``, ``for`` and functions, it should be possible to write -complex code without using ``jump`` or ``jumpi`` manually. This makes it much -easier to analyze the control flow, which allows for improved formal -verification and optimization. - -Furthermore, if manual jumps are allowed, computing the stack height is rather complicated. -The position of all local variables on the stack needs to be known, otherwise -neither references to local variables nor removing local variables automatically -from the stack at the end of a block will work properly. - -Example: - -We will follow an example compilation from Solidity to assembly. 
-We consider the runtime bytecode of the following Solidity program:: - - pragma solidity >=0.4.16 <0.7.0; - - - contract C { - function f(uint x) public pure returns (uint y) { - y = 1; - for (uint i = 0; i < x; i++) - y = 2 * y; - } - } - -The following assembly will be generated:: - - { - mstore(0x40, 0x80) // store the "free memory pointer" - // function dispatcher - switch div(calldataload(0), exp(2, 226)) - case 0xb3de648b { - let r := f(calldataload(4)) - let ret := $allocate(0x20) - mstore(ret, r) - return(ret, 0x20) - } - default { revert(0, 0) } - // memory allocator - function $allocate(size) -> pos { - pos := mload(0x40) - mstore(0x40, add(pos, size)) - } - // the contract function - function f(x) -> y { - y := 1 - for { let i := 0 } lt(i, x) { i := add(i, 1) } { - y := mul(2, y) - } - } - } - - -Assembly Grammar ----------------- - -The tasks of the parser are the following: - -- Turn the byte stream into a token stream, discarding C++-style comments - (a special comment exists for source references, but we will not explain it here). -- Turn the token stream into an AST according to the grammar below -- Register identifiers with the block they are defined in (annotation to the - AST node) and note from which point on, variables can be accessed. - -The assembly lexer follows the one defined by Solidity itself. - -Whitespace is used to delimit tokens and it consists of the characters -Space, Tab and Linefeed. Comments are regular JavaScript/C++ comments and -are interpreted in the same way as Whitespace. 
- -Grammar:: - - AssemblyBlock = '{' AssemblyItem* '}' - AssemblyItem = - Identifier | - AssemblyBlock | - AssemblyExpression | - AssemblyLocalDefinition | - AssemblyAssignment | - AssemblyIf | - AssemblySwitch | - AssemblyFunctionDefinition | - AssemblyFor | - 'break' | - 'continue' | - 'leave' | - SubAssembly - AssemblyExpression = AssemblyCall | Identifier | AssemblyLiteral - AssemblyLiteral = NumberLiteral | StringLiteral | HexLiteral - Identifier = [a-zA-Z_$] [a-zA-Z_0-9.]* - AssemblyCall = Identifier '(' ( AssemblyExpression ( ',' AssemblyExpression )* )? ')' - AssemblyLocalDefinition = 'let' IdentifierOrList ( ':=' AssemblyExpression )? - AssemblyAssignment = IdentifierOrList ':=' AssemblyExpression - IdentifierOrList = Identifier | '(' IdentifierList ')' - IdentifierList = Identifier ( ',' Identifier)* - AssemblyIf = 'if' AssemblyExpression AssemblyBlock - AssemblySwitch = 'switch' AssemblyExpression AssemblyCase* - ( 'default' AssemblyBlock )? - AssemblyCase = 'case' AssemblyExpression AssemblyBlock - AssemblyFunctionDefinition = 'function' Identifier '(' IdentifierList? ')' - ( '->' '(' IdentifierList ')' )? AssemblyBlock - AssemblyFor = 'for' ( AssemblyBlock | AssemblyExpression ) - AssemblyExpression ( AssemblyBlock | AssemblyExpression ) AssemblyBlock - SubAssembly = 'assembly' Identifier AssemblyBlock - NumberLiteral = HexNumber | DecimalNumber - HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'') - StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"' - HexNumber = '0x' [0-9a-fA-F]+ - DecimalNumber = [0-9]+ diff --git a/docs/contributing.rst b/docs/contributing.rst index 1b50ccaaa..dbfcebe89 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -328,7 +328,7 @@ Whiskers compiler in various places to aid readability, and thus maintainability and verifiability, of the code. The syntax comes with a substantial difference to Mustache. 
The template markers ``{{`` and ``}}`` are -replaced by ``<`` and ``>`` in order to aid parsing and avoid conflicts with :ref:`inline-assembly` +replaced by ``<`` and ``>`` in order to aid parsing and avoid conflicts with :ref:`yul` (The symbols ``<`` and ``>`` are invalid in inline assembly, while ``{`` and ``}`` are used to delimit blocks). Another limitation is that lists are only resolved one depth and they do not recurse. This may change in the future. diff --git a/docs/yul.rst b/docs/yul.rst index 739a83996..9f0ff5283 100644 --- a/docs/yul.rst +++ b/docs/yul.rst @@ -1,53 +1,74 @@ +.. _yul: + ### Yul ### -.. _yul: - .. index:: ! assembly, ! asm, ! evmasm, ! yul, julia, iulia Yul (previously also called JULIA or IULIA) is an intermediate language that can be compiled to bytecode for different backends. -Support for EVM 1.0, EVM 1.5 and eWASM is planned, and it is designed to be a usable common denominator of all three -platforms. It can already be used for "inline assembly" inside Solidity and future versions of the Solidity compiler -will use Yul as an intermediate language. Yul is a good target for high-level optimisation stages that can benefit all target platforms equally. +Support for EVM 1.0, EVM 1.5 and eWASM is planned, and it is designed to +be a usable common denominator of all three +platforms. It can already be used in stand-alone mode and +for "inline assembly" inside Solidity +and there is an experimental implementation of the Solidity compiler +that uses Yul as an intermediate language. Yul is a good target for +high-level optimisation stages that can benefit all target platforms equally. 
-With the "inline assembly" flavour, Yul can be used as a language setting -for the :ref:`standard-json interface `: +Motivation and High-level Description +===================================== -:: +The design of Yul tries to achieve several goals: - { - "language": "Yul", - "sources": { "input.yul": { "content": "{ sstore(0, 1) }" } }, - "settings": { - "outputSelection": { "*": { "*": ["*"], "": [ "*" ] } }, - "optimizer": { "enabled": true, "details": { "yul": true } } - } - } +1. Programs written in Yul should be readable, even if the code is generated by a compiler from Solidity or another high-level language. +2. Control flow should be easy to understand to help in manual inspection, formal verification and optimization. +3. The translation from Yul to bytecode should be as straightforward as possible. +4. Yul should be suitable for whole-program optimization. -And on the command line interface with the ``--strict-assembly`` parameter. +In order to achieve the first and second goal, Yul provides high-level constructs +like ``for`` loops, ``if`` and ``switch`` statements and function calls. These should +be sufficient for adequately representing the control flow for assembly programs. +Therefore, no explicit statements for ``SWAP``, ``DUP``, ``JUMP`` and ``JUMPI`` +are provided, because the first two obfuscate the data flow +and the last two obfuscate control flow. Furthermore, functional statements of +the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like +``7 y x add mul`` because in the first form, it is much easier to see which +operand is used for which opcode. -.. warning:: +Even though it was designed for stack machines, Yul does not expose the complexity of the stack itself. +The programmer or auditor should not have to worry about the stack. 
- Yul is in active development and bytecode generation is fully implemented only for untyped Yul (everything is ``u256``) - and with EVM 1.0 as target, :ref:`EVM opcodes ` are used as built-in functions. +The third goal is achieved by compiling the +higher level constructs to bytecode in a very regular way. +The only non-local operations performed +by the assembler are the name lookup of user-defined identifiers (functions, variables, ...) +and the cleanup of local variables from the stack. -The core components of Yul are functions, blocks, variables, literals, -for-loops, if-statements, switch-statements, expressions and assignments to variables. +To avoid confusion between concepts like values and references, +Yul is statically typed. At the same time, there is a default type +(usually the integer word of the target machine) that can always +be omitted to help readability. -Yul is typed, both variables and literals must specify the type with postfix -notation. The supported types are ``bool``, ``u8``, ``s8``, ``u32``, ``s32``, -``u64``, ``s64``, ``u128``, ``s128``, ``u256`` and ``s256``. +To keep the language simple and flexible, Yul does not have +any built-in operations, functions or types in its pure form. +These are added together with their semantics when specifying a dialect of Yul, +which allows specializing Yul to the requirements of different +target platforms and feature sets. -Yul in itself does not even provide operators. If the EVM is targeted, -opcodes will be available as built-in functions, but they can be reimplemented -if the backend changes. For a list of mandatory built-in functions, see the section below. +Currently, there is only one specified dialect of Yul. This dialect uses +the EVM opcodes as builtin functions +(see below) and defines only the type ``u256``, which is the native 256-bit +type of the EVM. Because of that, we will not provide types in the examples below.
-The following example program assumes that the EVM opcodes ``mul``, ``div`` -and ``mod`` are available either natively or as functions and computes exponentiation. -As per the warning above, the following code is untyped and can be compiled using ``solc --strict-assembly``. + +Simple Example +============== + +The following example program is written in the EVM dialect and computes exponentiation. +It can be compiled using ``solc --strict-assembly``. The builtin functions +``mul`` and ``div`` compute product and division, respectively. .. code:: @@ -67,8 +88,8 @@ As per the warning above, the following code is untyped and can be compiled usin } It is also possible to implement the same function using a for-loop -instead of with recursion. Here, we need the EVM opcodes ``lt`` (less-than) -and ``add`` to be available. +instead of with recursion. Here, ``lt(a, b)`` computes whether ``a`` is less than ``b`` +(a less-than comparison). .. code:: @@ -83,10 +104,326 @@ and ``add`` to be available. } } + + + +Stand-Alone Usage +================= + +You can use Yul in its stand-alone form in the EVM dialect using the Solidity compiler. +This will use the `Yul object notation `_ so that it is possible to refer +to code as data to deploy contracts. This Yul mode is available for the commandline compiler +(use ``--strict-assembly``) and for the :ref:`standard-json interface `: + +:: + + { + "language": "Yul", + "sources": { "input.yul": { "content": "{ sstore(0, 1) }" } }, + "settings": { + "outputSelection": { "*": { "*": ["*"], "": [ "*" ] } }, + "optimizer": { "enabled": true, "details": { "yul": true } } + } + } + +.. warning:: + + Yul is in active development and bytecode generation is only fully implemented for the EVM dialect of Yul + with EVM 1.0 as target. + + +Informal Description of Yul +=========================== + +In the following, we will talk about each individual aspect +of the Yul language. In examples, we will use the default EVM dialect.
+ +Syntax +------ + +Yul parses comments, literals and identifiers in the same way as Solidity, +so you can e.g. use ``//`` and ``/* */`` to denote comments. +There is one exception: Identifiers in Yul can contain dots: ``.``. + +Yul can specify "objects" that consist of code, data and sub-objects. +Please see `Yul Objects `_ below for details on that. +In this section, we are only concerned with the code part of such an object. +This code part always consists of a curly-braces +delimited block. Most tools support specifying just a code block +where an object is expected. + +Inside a code block, the following elements can be used +(see the later sections for more details): + + - literals, e.g. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 characters) + - calls to builtin functions, e.g. ``add(1, mload(0))`` + - variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned) + - identifiers (variables), e.g. ``add(3, x)`` + - assignments, e.g. ``x := add(y, 3)`` + - blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }`` + - if statements, e.g. ``if lt(a, b) { sstore(0, 1) }`` + - switch statements, e.g. ``switch mload(0) case 0 { revert(0, 0) } default { mstore(0, 1) }`` + - for loops, e.g. ``for { let i := 0} lt(i, 10) { i := add(i, 1) } { mstore(i, 7) }`` + - function definitions, e.g. ``function f(a, b) -> c { c := add(a, b) }`` + +Multiple syntactical elements can follow each other simply separated by +whitespace, i.e. there is no terminating ``;`` or newline required. + +Literals +-------- + +You can use integer constants in decimal or hexadecimal notation. +When compiling for the EVM, this will be translated into an +appropriate ``PUSHi`` instruction. In the following example, +``3`` and ``2`` are added resulting in 5 and then the +bitwise ``and`` with the string "abc" is computed. +The final value is assigned to a local variable called ``x``.
+Strings are stored left-aligned and cannot be longer than 32 bytes. + +.. code:: + + let x := and("abc", add(3, 2)) + +Unless it is the default type, the type of a literal +has to be specified after a colon: + +.. code:: + + let x := and("abc":uint32, add(3:uint256, 2:uint256)) + + +Function Calls +-------------- + +Both built-in and user-defined functions (see below) can be called +in the same way as shown in the previous example. +If the function returns a single value, it can be directly used +inside an expression again. If it returns multiple values, +they have to be assigned to local variables. + +.. code:: + + mstore(0x80, add(mload(0x80), 3)) + // Here, the user-defined function `f` returns + // two values. The definition of the function + // is missing from the example. + let x, y := f(1, mload(0)) + +For built-in functions of the EVM, functional expressions +can be directly translated to a stream of opcodes: +You just read the expression from right to left to obtain the +opcodes. In the case of the first line in the example, this +is ``PUSH1 3 PUSH1 0x80 MLOAD ADD PUSH1 0x80 MSTORE``. + +For calls to user-defined functions, the arguments are also +put on the stack from right to left and this is the order +in which argument lists are evaluated. The return values, +though, are expected on the stack from left to right, +i.e. in this example, ``y`` is on top of the stack and ``x`` +is below it. + +Variable Declarations +--------------------- + +You can use the ``let`` keyword to declare variables. +A variable is only visible inside the +``{...}``-block it was defined in. When compiling to the EVM, +a new stack slot is created that is reserved +for the variable and automatically removed again when the end of the block +is reached. You can provide an initial value for the variable. +If you do not provide a value, the variable will be initialized to zero. 
+ +Since variables are stored on the stack, they do not directly +influence memory or storage, but they can be used as pointers +to memory or storage locations in the built-in functions +``mstore``, ``mload``, ``sstore`` and ``sload``. +Future dialects might introduce specific types for such pointers. + +When a variable is referenced, its current value is copied. +For the EVM, this translates to a ``DUP`` instruction. + +.. code:: + + { + let zero := 0 + let v := calldataload(zero) + { + let y := add(sload(v), 1) + v := y + } // y is "deallocated" here + sstore(v, zero) + } // v and zero are "deallocated" here + + +If the declared variable should have a type different from the default type, +you denote that following a colon. You can also declare multiple +variables in one statement when you assign from a function call +that returns multiple values. + +.. code:: + + { + let zero:uint32 := 0:uint32 + let v:uint256, t:uint32 := f() + let x, y := g() + } + +Depending on the optimiser settings, the compiler can free the stack slot +as soon as the variable has been used for +the last time, even though it is still in scope. + + +Assignments +----------- + +Variables can be assigned to after their definition using the +``:=`` operator. It is possible to assign multiple +variables at the same time. For this, the number and types of the +values have to match. +If you want to assign the values returned from a function that has +multiple return parameters, you have to provide multiple variables. + +.. code:: + + let v := 0 + // re-assign v + v := 2 + let t := add(v, 2) + function f() -> a, b { } + // assign multiple values + v, t := f() + + +If +-- + +The if statement can be used for conditionally executing code. +No "else" block can be defined. Consider using "switch" instead (see below) if +you need multiple alternatives. + +.. code:: + + if eq(value, 0) { revert(0, 0) } + +The curly braces for the body are required.
+ +Switch +------ + +You can use a switch statement as an extended version of the if statement. +It takes the value of an expression and compares it to several literal constants. +The branch corresponding to the matching constant is taken. +Contrary to other programming languages, for safety reasons, control flow does +not continue from one case to the next. There can be a fallback or default +case called ``default`` which is taken if none of the literal constants matches. + +.. code:: + + { + let x := 0 + switch calldataload(4) + case 0 { + x := calldataload(0x24) + } + default { + x := calldataload(0x44) + } + sstore(0, div(x, 2)) + } + +The list of cases is not enclosed by curly braces, but the body of a +case does require them. + +Loops +----- + +Yul supports for-loops which consist of +a header containing an initializing part, a condition, a post-iteration +part and a body. The condition has to be an expression, while +the other three are blocks. If the initializing part +declares any variables at the top level, the scope of these variables extends to all other +parts of the loop. + +The ``break`` and ``continue`` statements can be used in the body to exit the loop +or skip to the post-part, respectively. + +The following example computes the sum of an area in memory. + +.. code:: + + { + let x := 0 + for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } { + x := add(x, mload(i)) + } + } + +For loops can also be used as a replacement for while loops: +Simply leave the initialization and post-iteration parts empty. + +.. code:: + + { + let x := 0 + let i := 0 + for { } lt(i, 0x100) { } { // while(i < 0x100) + x := add(x, mload(i)) + i := add(i, 0x20) + } + } + +Function Declarations +--------------------- + +Yul allows the definition of functions. These should not be confused with functions +in Solidity since they are never part of an external interface of a contract and +are part of a namespace separate from the one for Solidity functions. 
+ +For the EVM, Yul functions take their +arguments (and a return PC) from the stack and also put the results onto the +stack. User-defined functions and built-in functions are called in exactly the same way. + +Functions can be defined anywhere and are visible in the block they are +declared in. Inside a function, you cannot access local variables +defined outside of that function. + +Functions declare parameters and return variables, similar to Solidity. +To return a value, you assign it to the return variable(s). + +If you call a function that returns multiple values, you have to assign +them to multiple variables using ``a, b := f(x)`` or ``let a, b := f(x)``. + +The ``leave`` statement can be used to exit the current function. It +works like the ``return`` statement in other languages, except that it does +not take a value to return; it just exits the function, and the function +will return whatever values are currently assigned to the return variable(s). + +Note that the EVM dialect has a built-in function called ``return`` that +quits the full execution context (internal message call) and not just +the current Yul function. + +The following example implements the power function by square-and-multiply. + +.. code:: + + { + function power(base, exponent) -> result { + switch exponent + case 0 { result := 1 } + case 1 { result := base } + default { + result := power(mul(base, base), div(exponent, 2)) + switch mod(exponent, 2) + case 1 { result := mul(base, result) } + } + } + } + Specification of Yul ==================== -This chapter describes Yul code. It is usually placed inside a Yul object, which is described in the following chapter. +This chapter describes Yul code formally. Yul code is usually placed inside Yul objects, +which are explained in their own chapter. Grammar:: @@ -128,11 +465,10 @@ Grammar:: Identifier '(' ( Expression ( ',' Expression )* )?
')' Identifier = [a-zA-Z_$] [a-zA-Z_$0-9.]* IdentifierList = Identifier ( ',' Identifier)* - TypeName = Identifier | BuiltinTypeName - BuiltinTypeName = 'bool' | [us] ( '8' | '32' | '64' | '128' | '256' ) - TypedIdentifierList = Identifier ':' TypeName ( ',' Identifier ':' TypeName )* + TypeName = Identifier + TypedIdentifierList = Identifier ( ':' TypeName )? ( ',' Identifier ( ':' TypeName )? )* Literal = - (NumberLiteral | StringLiteral | HexLiteral | TrueLiteral | FalseLiteral) ':' TypeName + (NumberLiteral | StringLiteral | HexLiteral | TrueLiteral | FalseLiteral) ( ':' TypeName )? NumberLiteral = HexNumber | DecimalNumber HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'') StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"' @@ -141,19 +477,23 @@ Grammar:: HexNumber = '0x' [0-9a-fA-F]+ DecimalNumber = [0-9]+ + Restrictions on the Grammar --------------------------- +Apart from those directly imposed by the grammar, the following +restrictions apply: + Switches must have at least one case (including the default case). -If all possible values of the expression are covered, a default case should -not be allowed (i.e. a switch with a ``bool`` expression that has both a -true and a false case should not allow a default case). All case values need to -have the same type. +All case values need to have the same type and distinct values. +If all possible values of the expression type are covered, a default case is +not allowed (i.e. a switch with a ``bool`` expression that has both a +true and a false case does not allow a default case). Every expression evaluates to zero or more values. Identifiers and Literals evaluate to exactly one value and function calls evaluate to a number of values equal to the -number of return values of the function called. +number of return variables of the function called.
In variable declarations and assignments, the right-hand-side expression (if present) has to evaluate to a number of values equal to the number of @@ -168,13 +508,22 @@ In all other situations, expressions have to evaluate to exactly one value. The ``continue`` and ``break`` statements can only be used inside loop bodies and have to be in the same function as the loop (or both have to be at the -top level). -The ``leave`` statement can only be used inside a function. +top level). The ``continue`` and ``break`` statements cannot be used +in other parts of a loop, not even when it is scoped inside a second loop's body. + The condition part of the for-loop has to evaluate to exactly one value. + +The ``leave`` statement can only be used inside a function. + Functions cannot be defined anywhere inside for loop init blocks. Literals cannot be larger than the their type. The largest type defined is 256-bit wide. +During assignments and function calls, the types of the respective values have to match. +There is no implicit type conversion. Type conversion in general can only be achieved +if the dialect provides an appropriate built-in function that takes a value of one +type and returns a value of a different type. + Scoping Rules ------------- @@ -186,9 +535,11 @@ introduce new identifiers into these scopes. Identifiers are visible in the block they are defined in (including all sub-nodes and sub-blocks). -As an exception, identifiers defined directly in the "init" part of the for-loop -(the first block) are visible in all other parts of the for-loop -(but not outside of the loop). +As an exception, the scope of the "init" part of the for-loop +(the first block) extends across all other parts of the for loop. +This means that variables declared in the init part (but not inside a +block inside the init part) are visible in all other parts of the for-loop. + Identifiers declared in the other parts of the for loop respect the regular syntactical scoping rules.
@@ -197,7 +548,7 @@ to ``{ I... for {} C { P... } { B... } }``. The parameters and return parameters of functions are visible in the -function body and their names cannot overlap. +function body and their names have to be distinct. Variables can only be referenced after their declaration. In particular, variables cannot be referenced in the right hand side of their own variable @@ -215,13 +566,14 @@ Formal Specification -------------------- We formally specify Yul by providing an evaluation function E overloaded -on the various nodes of the AST. Any functions can have side effects, so +on the various nodes of the AST. As builtin functions can have side effects, E takes two state objects and the AST node and returns two new state objects and a variable number of other values. The two state objects are the global state object (which in the context of the EVM is the memory, storage and state of the blockchain) and the local state object (the state of local variables, i.e. a segment of the stack in the EVM). + If the AST node is a statement, E returns the two state objects and a "mode", which is used for the ``break``, ``continue`` and ``leave`` statements. If the AST node is an expression, E returns the two state objects and @@ -336,248 +688,207 @@ We will use a destructuring notation for the AST nodes. E(G, L, n: DecimalNumber) = G, L, dec(n), where dec is the decimal decoding function -Type Conversion Functions -------------------------- +.. _opcodes: -Yul has no support for implicit type conversion and therefore functions exist to provide explicit conversion. -When converting a larger type to a shorter type a runtime exception can occur in case of an overflow. +EVM Dialect +----------- -Truncating conversions are supported between the following types: - - ``bool`` - - ``u32`` - - ``u64`` - - ``u256`` - - ``s256`` +The default dialect of Yul is currently the EVM dialect for the currently selected version of the EVM. +
The only type available in this dialect +is ``u256``, the 256-bit native type of the Ethereum Virtual Machine. +Since it is the default type of this dialect, it can be omitted. -For each of these a type conversion function exists having the prototype in the form of ``to(x:) -> y:``, -such as ``u32tobool(x:u32) -> y:bool``, ``u256tou32(x:u256) -> y:u32`` or ``s256tou256(x:s256) -> y:u256``. +The following table lists all builtin functions +(depending on the EVM version) and provides a short description of the +semantics of the function / opcode. +This document is not intended to be a full description of the Ethereum virtual machine. +Please refer to a different document if you are interested in the precise semantics. -.. note:: +Opcodes marked with ``-`` do not return a result and all others return exactly one value. +Opcodes marked with ``F``, ``H``, ``B``, ``C`` or ``I`` have been present since Frontier, Homestead, +Byzantium, Constantinople or Istanbul, respectively. - ``u32tobool(x:u32) -> y:bool`` can be implemented as ``y := not(iszerou256(x))`` and - ``booltou32(x:bool) -> y:u32`` can be implemented as ``switch x case true:bool { y := 1:u32 } case false:bool { y := 0:u32 }`` +In the following, ``mem[a...b)`` signifies the bytes of memory starting at position ``a`` up to +but not including position ``b`` and ``storage[p]`` signifies the storage contents at slot ``p``. -Low-level Functions -------------------- +Since Yul manages local variables and control-flow, +opcodes that interfere with these features are not available. This includes +the ``dup`` and ``swap`` instructions as well as ``jump`` instructions, labels and the ``push`` instructions.
-The following functions must be available: ++-------------------------+-----+---+-----------------------------------------------------------------+ +| Instruction | | | Explanation | ++=========================+=====+===+=================================================================+ +| stop() | `-` | F | stop execution, identical to return(0, 0) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| add(x, y) | | F | x + y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| sub(x, y) | | F | x - y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| mul(x, y) | | F | x * y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| div(x, y) | | F | x / y or 0 if y == 0 | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| sdiv(x, y) | | F | x / y, for signed numbers in two's complement, 0 if y == 0 | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| mod(x, y) | | F | x % y, 0 if y == 0 | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| smod(x, y) | | F | x % y, for signed numbers in two's complement, 0 if y == 0 | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| exp(x, y) | | F | x to the power of y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| not(x) | | F | bitwise "not" of x (every bit of x is negated) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| lt(x, y) | | F | 1 if x < y, 0 otherwise |
++-------------------------+-----+---+-----------------------------------------------------------------+ +| gt(x, y) | | F | 1 if x > y, 0 otherwise | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| slt(x, y) | | F | 1 if x < y, 0 otherwise, for signed numbers in two's complement | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| sgt(x, y) | | F | 1 if x > y, 0 otherwise, for signed numbers in two's complement | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| eq(x, y) | | F | 1 if x == y, 0 otherwise | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| iszero(x) | | F | 1 if x == 0, 0 otherwise | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| and(x, y) | | F | bitwise "and" of x and y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| or(x, y) | | F | bitwise "or" of x and y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| xor(x, y) | | F | bitwise "xor" of x and y | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| byte(n, x) | | F | nth byte of x, where the most significant byte is the 0th byte | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| shl(x, y) | | C | logical shift left y by x bits | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| shr(x, y) | | C | logical shift right y by x bits | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| sar(x, y) | | C | signed arithmetic shift right y by x bits | 
++-------------------------+-----+---+-----------------------------------------------------------------+ +| addmod(x, y, m) | | F | (x + y) % m with arbitrary precision arithmetic, 0 if m == 0 | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| mulmod(x, y, m) | | F | (x * y) % m with arbitrary precision arithmetic, 0 if m == 0 | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| signextend(i, x) | | F | sign extend from (i*8+7)th bit counting from least significant | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| keccak256(p, n) | | F | keccak(mem[p...(p+n))) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| pc() | | F | current position in code | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| pop(x) | `-` | F | discard value x | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| mload(p) | | F | mem[p...(p+32)) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| mstore(p, v) | `-` | F | mem[p...(p+32)) := v | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| mstore8(p, v) | `-` | F | mem[p] := v & 0xff (only modifies a single byte) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| sload(p) | | F | storage[p] | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| sstore(p, v) | `-` | F | storage[p] := v | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| msize() | | F | size of memory, i.e. 
largest accessed memory index | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| gas() | | F | gas still available to execution | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| address() | | F | address of the current contract / execution context | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| balance(a) | | F | wei balance at address a | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| selfbalance() | | I | equivalent to balance(address()), but cheaper | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| caller() | | F | call sender (excluding ``delegatecall``) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| callvalue() | | F | wei sent together with the current call | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| calldataload(p) | | F | call data starting from position p (32 bytes) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| calldatasize() | | F | size of call data in bytes | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| calldatacopy(t, f, s) | `-` | F | copy s bytes from calldata at position f to mem at position t | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| codesize() | | F | size of the code of the current contract / execution context | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| codecopy(t, f, s) | `-` | F | copy s bytes from code at position f to mem at position t | 
++-------------------------+-----+---+-----------------------------------------------------------------+ +| extcodesize(a) | | F | size of the code at address a | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| extcodecopy(a, t, f, s) | `-` | F | like codecopy(t, f, s) but take code at address a | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| returndatasize() | | B | size of the last returndata | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| returndatacopy(t, f, s) | `-` | B | copy s bytes from returndata at position f to mem at position t | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| extcodehash(a) | | C | code hash of address a | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| create(v, p, n) | | F | create new contract with code mem[p...(p+n)) and send v wei | +| | | | and return the new address | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| create2(v, p, n, s) | | C | create new contract with code mem[p...(p+n)) at address | +| | | | keccak256(0xff . this . s . keccak256(mem[p...(p+n))) | +| | | | and send v wei and return the new address, where ``0xff`` is a | +| | | | 1 byte value, ``this`` is the current contract's address | +| | | | as a 20 byte value and ``s`` is a big-endian 256-bit value | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| call(g, a, v, in, | | F | call contract at address a with input mem[in...(in+insize)) | +| insize, out, outsize) | | | providing g gas and v wei and output area | +| | | | mem[out...(out+outsize)) returning 0 on error (eg. 
out of gas) | +| | | | and 1 on success | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| callcode(g, a, v, in, | | F | identical to ``call`` but only use the code from a and stay | +| insize, out, outsize) | | | in the context of the current contract otherwise | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| delegatecall(g, a, in, | | H | identical to ``callcode`` but also keep ``caller`` | +| insize, out, outsize) | | | and ``callvalue`` | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| staticcall(g, a, in, | | B | identical to ``call(g, a, 0, in, insize, out, outsize)`` but do | +| insize, out, outsize) | | | not allow state modifications | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| return(p, s) | `-` | F | end execution, return data mem[p...(p+s)) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| revert(p, s) | `-` | B | end execution, revert state changes, return data mem[p...(p+s)) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| selfdestruct(a) | `-` | F | end execution, destroy current contract and send funds to a | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| invalid() | `-` | F | end execution with invalid instruction | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| log0(p, s) | `-` | F | log without topics and data mem[p...(p+s)) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| log1(p, s, t1) | `-` | F | log with topic t1 and data mem[p...(p+s)) | 
++-------------------------+-----+---+-----------------------------------------------------------------+ +| log2(p, s, t1, t2) | `-` | F | log with topics t1, t2 and data mem[p...(p+s)) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| log3(p, s, t1, t2, t3) | `-` | F | log with topics t1, t2, t3 and data mem[p...(p+s)) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| log4(p, s, t1, t2, t3, | `-` | F | log with topics t1, t2, t3, t4 and data mem[p...(p+s)) | +| t4) | | | | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| chainid() | | I | ID of the executing chain (EIP 1344) | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| origin() | | F | transaction sender | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| gasprice() | | F | gas price of the transaction | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| blockhash(b) | | F | hash of block nr b - only for last 256 blocks excluding current | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| coinbase() | | F | current mining beneficiary | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| timestamp() | | F | timestamp of the current block in seconds since the epoch | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| number() | | F | current block number | ++-------------------------+-----+---+-----------------------------------------------------------------+ +| difficulty() | | F | difficulty of the current block | 
++-------------------------+-----+---+-----------------------------------------------------------------+ +| gaslimit() | | F | block gas limit of the current block | ++-------------------------+-----+---+-----------------------------------------------------------------+ -+---------------------------------------------------------------------------------------------------------------+ -| *Logic* | -+---------------------------------------------+-----------------------------------------------------------------+ -| not(x:bool) ‑> z:bool | logical not | -+---------------------------------------------+-----------------------------------------------------------------+ -| and(x:bool, y:bool) ‑> z:bool | logical and | -+---------------------------------------------+-----------------------------------------------------------------+ -| or(x:bool, y:bool) ‑> z:bool | logical or | -+---------------------------------------------+-----------------------------------------------------------------+ -| xor(x:bool, y:bool) ‑> z:bool | xor | -+---------------------------------------------+-----------------------------------------------------------------+ -| *Arithmetic* | -+---------------------------------------------+-----------------------------------------------------------------+ -| addu256(x:u256, y:u256) ‑> z:u256 | x + y | -+---------------------------------------------+-----------------------------------------------------------------+ -| subu256(x:u256, y:u256) ‑> z:u256 | x - y | -+---------------------------------------------+-----------------------------------------------------------------+ -| mulu256(x:u256, y:u256) ‑> z:u256 | x * y | -+---------------------------------------------+-----------------------------------------------------------------+ -| divu256(x:u256, y:u256) ‑> z:u256 | x / y | -+---------------------------------------------+-----------------------------------------------------------------+ -| divs256(x:s256, y:s256) ‑> z:s256 | x / y, for signed numbers 
in two's complement | -+---------------------------------------------+-----------------------------------------------------------------+ -| modu256(x:u256, y:u256) ‑> z:u256 | x % y | -+---------------------------------------------+-----------------------------------------------------------------+ -| mods256(x:s256, y:s256) ‑> z:s256 | x % y, for signed numbers in two's complement | -+---------------------------------------------+-----------------------------------------------------------------+ -| signextendu256(i:u256, x:u256) ‑> z:u256 | sign extend from (i*8+7)th bit counting from least significant | -+---------------------------------------------+-----------------------------------------------------------------+ -| expu256(x:u256, y:u256) ‑> z:u256 | x to the power of y | -+---------------------------------------------+-----------------------------------------------------------------+ -| addmodu256(x:u256, y:u256, m:u256) ‑> z:u256| (x + y) % m with arbitrary precision arithmetic | -+---------------------------------------------+-----------------------------------------------------------------+ -| mulmodu256(x:u256, y:u256, m:u256) ‑> z:u256| (x * y) % m with arbitrary precision arithmetic | -+---------------------------------------------+-----------------------------------------------------------------+ -| ltu256(x:u256, y:u256) ‑> z:bool | true if x < y, false otherwise | -+---------------------------------------------+-----------------------------------------------------------------+ -| gtu256(x:u256, y:u256) ‑> z:bool | true if x > y, false otherwise | -+---------------------------------------------+-----------------------------------------------------------------+ -| lts256(x:s256, y:s256) ‑> z:bool | true if x < y, false otherwise | -| | (for signed numbers in two's complement) | -+---------------------------------------------+-----------------------------------------------------------------+ -| gts256(x:s256, y:s256) ‑> z:bool | true if x > y, false 
otherwise | -| | (for signed numbers in two's complement) | -+---------------------------------------------+-----------------------------------------------------------------+ -| equ256(x:u256, y:u256) ‑> z:bool | true if x == y, false otherwise | -+---------------------------------------------+-----------------------------------------------------------------+ -| iszerou256(x:u256) ‑> z:bool | true if x == 0, false otherwise | -+---------------------------------------------+-----------------------------------------------------------------+ -| notu256(x:u256) ‑> z:u256 | ~x, every bit of x is negated | -+---------------------------------------------+-----------------------------------------------------------------+ -| andu256(x:u256, y:u256) ‑> z:u256 | bitwise and of x and y | -+---------------------------------------------+-----------------------------------------------------------------+ -| oru256(x:u256, y:u256) ‑> z:u256 | bitwise or of x and y | -+---------------------------------------------+-----------------------------------------------------------------+ -| xoru256(x:u256, y:u256) ‑> z:u256 | bitwise xor of x and y | -+---------------------------------------------+-----------------------------------------------------------------+ -| shlu256(x:u256, y:u256) ‑> z:u256 | logical left shift of x by y | -+---------------------------------------------+-----------------------------------------------------------------+ -| shru256(x:u256, y:u256) ‑> z:u256 | logical right shift of x by y | -+---------------------------------------------+-----------------------------------------------------------------+ -| sars256(x:s256, y:u256) ‑> z:u256 | arithmetic right shift of x by y | -+---------------------------------------------+-----------------------------------------------------------------+ -| byte(n:u256, x:u256) ‑> v:u256 | nth byte of x, where the most significant byte is the 0th byte | -| | Cannot this be just replaced by and256(shr256(n, x), 0xff) and | -| | let 
it be optimised out by the EVM backend? | -+---------------------------------------------+-----------------------------------------------------------------+ -| *Memory and storage* | -+---------------------------------------------+-----------------------------------------------------------------+ -| mload(p:u256) ‑> v:u256 | mem[p..(p+32)) | -+---------------------------------------------+-----------------------------------------------------------------+ -| mstore(p:u256, v:u256) | mem[p..(p+32)) := v | -+---------------------------------------------+-----------------------------------------------------------------+ -| mstore8(p:u256, v:u256) | mem[p] := v & 0xff - only modifies a single byte | -+---------------------------------------------+-----------------------------------------------------------------+ -| sload(p:u256) ‑> v:u256 | storage[p] | -+---------------------------------------------+-----------------------------------------------------------------+ -| sstore(p:u256, v:u256) | storage[p] := v | -+---------------------------------------------+-----------------------------------------------------------------+ -| msize() ‑> size:u256 | size of memory, i.e. largest accessed memory index, albeit due | -| | due to the memory extension function, which extends by words, | -| | this will always be a multiple of 32 bytes | -+---------------------------------------------+-----------------------------------------------------------------+ -| *Execution control* | -+---------------------------------------------+-----------------------------------------------------------------+ -| create(v:u256, p:u256, n:u256) | create new contract with code mem[p..(p+n)) and send v wei | -| | and return the new address | -+---------------------------------------------+-----------------------------------------------------------------+ -| create2(v:u256, p:u256, n:u256, s:u256) | create new contract with code mem[p...(p+n)) at address | -| | keccak256(0xff . this . s . 
keccak256(mem[p...(p+n))) | -| | and send v wei and return the new address, where ``0xff`` is a | -| | 8 byte value, ``this`` is the current contract's address | -| | as a 20 byte value and ``s`` is a big-endian 256-bit value | -+---------------------------------------------+-----------------------------------------------------------------+ -| call(g:u256, a:u256, v:u256, in:u256, | call contract at address a with input mem[in..(in+insize)) | -| insize:u256, out:u256, | providing g gas and v wei and output area | -| outsize:u256) | mem[out..(out+outsize)) returning 0 on error (eg. out of gas) | -| ‑> r:u256 | and 1 on success | -+---------------------------------------------+-----------------------------------------------------------------+ -| callcode(g:u256, a:u256, v:u256, in:u256, | identical to ``call`` but only use the code from a | -| insize:u256, out:u256, | and stay in the context of the | -| outsize:u256) ‑> r:u256 | current contract otherwise | -+---------------------------------------------+-----------------------------------------------------------------+ -| delegatecall(g:u256, a:u256, in:u256, | identical to ``callcode``, | -| insize:u256, out:u256, | but also keep ``caller`` | -| outsize:u256) ‑> r:u256 | and ``callvalue`` | -+---------------------------------------------+-----------------------------------------------------------------+ -| abort() | abort (equals to invalid instruction on EVM) | -+---------------------------------------------+-----------------------------------------------------------------+ -| return(p:u256, s:u256) | end execution, return data mem[p..(p+s)) | -+---------------------------------------------+-----------------------------------------------------------------+ -| revert(p:u256, s:u256) | end execution, revert state changes, return data mem[p..(p+s)) | -+---------------------------------------------+-----------------------------------------------------------------+ -| selfdestruct(a:u256) | end execution, destroy 
current contract and send funds to a | -+---------------------------------------------+-----------------------------------------------------------------+ -| log0(p:u256, s:u256) | log without topics and data mem[p..(p+s)) | -+---------------------------------------------+-----------------------------------------------------------------+ -| log1(p:u256, s:u256, t1:u256) | log with topic t1 and data mem[p..(p+s)) | -+---------------------------------------------+-----------------------------------------------------------------+ -| log2(p:u256, s:u256, t1:u256, t2:u256) | log with topics t1, t2 and data mem[p..(p+s)) | -+---------------------------------------------+-----------------------------------------------------------------+ -| log3(p:u256, s:u256, t1:u256, t2:u256, | log with topics t, t2, t3 and data mem[p..(p+s)) | -| t3:u256) | | -+---------------------------------------------+-----------------------------------------------------------------+ -| log4(p:u256, s:u256, t1:u256, t2:u256, | log with topics t1, t2, t3, t4 and data mem[p..(p+s)) | -| t3:u256, t4:u256) | | -+---------------------------------------------+-----------------------------------------------------------------+ -| *State queries* | -+---------------------------------------------+-----------------------------------------------------------------+ -| blockcoinbase() ‑> address:u256 | current mining beneficiary | -+---------------------------------------------+-----------------------------------------------------------------+ -| blockdifficulty() ‑> difficulty:u256 | difficulty of the current block | -+---------------------------------------------+-----------------------------------------------------------------+ -| blockgaslimit() ‑> limit:u256 | block gas limit of the current block | -+---------------------------------------------+-----------------------------------------------------------------+ -| blockhash(b:u256) ‑> hash:u256 | hash of block nr b - only for last 256 blocks excluding 
current | -+---------------------------------------------+-----------------------------------------------------------------+ -| blocknumber() ‑> block:u256 | current block number | -+---------------------------------------------+-----------------------------------------------------------------+ -| blocktimestamp() ‑> timestamp:u256 | timestamp of the current block in seconds since the epoch | -+---------------------------------------------+-----------------------------------------------------------------+ -| txorigin() ‑> address:u256 | transaction sender | -+---------------------------------------------+-----------------------------------------------------------------+ -| txgasprice() ‑> price:u256 | gas price of the transaction | -+---------------------------------------------+-----------------------------------------------------------------+ -| gasleft() ‑> gas:u256 | gas still available to execution | -+---------------------------------------------+-----------------------------------------------------------------+ -| balance(a:u256) ‑> v:u256 | wei balance at address a | -+---------------------------------------------+-----------------------------------------------------------------+ -| this() ‑> address:u256 | address of the current contract / execution context | -+---------------------------------------------+-----------------------------------------------------------------+ -| caller() ‑> address:u256 | call sender (excluding delegatecall) | -+---------------------------------------------+-----------------------------------------------------------------+ -| callvalue() ‑> v:u256 | wei sent together with the current call | -+---------------------------------------------+-----------------------------------------------------------------+ -| calldataload(p:u256) ‑> v:u256 | call data starting from position p (32 bytes) | -+---------------------------------------------+-----------------------------------------------------------------+ -| calldatasize() ‑> v:u256 
| size of call data in bytes | -+---------------------------------------------+-----------------------------------------------------------------+ -| calldatacopy(t:u256, f:u256, s:u256) | copy s bytes from calldata at position f to mem at position t | -+---------------------------------------------+-----------------------------------------------------------------+ -| codesize() ‑> size:u256 | size of the code of the current contract / execution context | -+---------------------------------------------+-----------------------------------------------------------------+ -| codecopy(t:u256, f:u256, s:u256) | copy s bytes from code at position f to mem at position t | -+---------------------------------------------+-----------------------------------------------------------------+ -| extcodesize(a:u256) ‑> size:u256 | size of the code at address a | -+---------------------------------------------+-----------------------------------------------------------------+ -| extcodecopy(a:u256, t:u256, f:u256, s:u256) | like codecopy(t, f, s) but take code at address a | -+---------------------------------------------+-----------------------------------------------------------------+ -| extcodehash(a:u256) | code hash of address a | -+---------------------------------------------+-----------------------------------------------------------------+ -| *Others* | -+---------------------------------------------+-----------------------------------------------------------------+ -| discard(unused:bool) | discard value | -+---------------------------------------------+-----------------------------------------------------------------+ -| discardu256(unused:u256) | discard value | -+---------------------------------------------+-----------------------------------------------------------------+ -| splitu256tou64(x:u256) ‑> (x1:u64, x2:u64, | split u256 to four u64's | -| x3:u64, x4:u64) | | 
-+---------------------------------------------+-----------------------------------------------------------------+ -| combineu64tou256(x1:u64, x2:u64, x3:u64, | combine four u64's into a single u256 | -| x4:u64) ‑> (x:u256) | | -+---------------------------------------------+-----------------------------------------------------------------+ -| keccak256(p:u256, s:u256) ‑> v:u256 | keccak(mem[p...(p+s))) | -+---------------------------------------------+-----------------------------------------------------------------+ -| *Object access* | | -+---------------------------------------------+-----------------------------------------------------------------+ -| datasize(name:string) ‑> size:u256 | size of the data object in bytes, name has to be string literal | -+---------------------------------------------+-----------------------------------------------------------------+ -| dataoffset(name:string) ‑> offset:u256 | offset of the data object inside the data area in bytes, | -| | name has to be string literal | -+---------------------------------------------+-----------------------------------------------------------------+ -| datacopy(dst:u256, src:u256, len:u256) | copy len bytes from the data area starting at offset src bytes | -| | to memory at position dst | -+---------------------------------------------+-----------------------------------------------------------------+ +There are three additional functions, ``datasize(x)``, ``dataoffset(x)`` and ``datacopy(t, f, l)``, +which are used to access other parts of a Yul object. -Backends --------- +``datasize`` and ``dataoffset`` can only take string literals (the names of other objects) +as arguments and return the size and offset in the data area, respectively. +For the EVM, the ``datacopy`` function is equivalent to ``codecopy``. -Backends or targets are the translators from Yul to a specific bytecode. Each of the backends can expose functions -prefixed with the name of the backend. 
We reserve ``evm_`` and ``ewasm_`` prefixes for the two proposed backends.
-
-Backend: EVM
-------------
-
-The EVM target will have all the underlying EVM opcodes exposed with the `evm_` prefix.
-
-Backend: "EVM 1.5"
-------------------
-
-TBD
-
-Backend: eWASM
---------------
-
-TBD
+.. _yul-object:
 
 Specification of Yul Object
 ===========================
@@ -603,13 +914,16 @@ An example Yul Object is shown below:
 
 .. code::
 
-    // Code consists of a single object. A single "code" node is the code of the object.
+    // A contract consists of a single object with sub-objects representing
+    // the code to be deployed or other contracts it can create.
+    // The single "code" node is the executable code of the object.
     // Every (other) named object or data section is serialized and
     // made accessible to the special built-in functions datacopy / dataoffset / datasize
     // Access to nested objects can be performed by joining the names using ``.``.
     // The current object and sub-objects and data items inside the current object
     // are in scope without nested access.
     object "Contract1" {
+        // This is the constructor code of the contract.
         code {
             function allocate(size) -> ptr {
                 ptr := mload(0x40)
@@ -620,15 +934,14 @@ An example Yul Object is shown below:
             // first create "runtime.Contract2"
             let size := datasize("runtime.Contract2")
             let offset := allocate(size)
-            // This will turn into a memory->memory copy for eWASM and
-            // a codecopy for EVM
+            // This will turn into codecopy for EVM
             datacopy(offset, dataoffset("runtime.Contract2"), size)
             // constructor parameter is a single number 0x1234
             mstore(add(offset, size), 0x1234)
             pop(create(offset, add(size, 32), 0))
 
-            // now return the runtime object (this is
-            // constructor code)
+            // now return the runtime object (the currently
+            // executing code is the constructor code)
             size := datasize("runtime")
             offset := allocate(size)
             // This will turn into a memory->memory copy for eWASM and
@@ -651,8 +964,7 @@ An example Yul Object is shown below:
 
                 let size := datasize("Contract2")
                 let offset := allocate(size)
-                // This will turn into a memory->memory copy for eWASM and
-                // a codecopy for EVM
+                // This will turn into codecopy for EVM
                 datacopy(offset, dataoffset("Contract2"), size)
                 // constructor parameter is a single number 0x1234
                 mstore(add(offset, size), 0x1234)
@@ -676,3 +988,21 @@ An example Yul Object is shown below:
             }
         }
     }
+
+Yul Optimizer
+=============
+
+The Yul optimizer operates on Yul code and uses the same language for input, output and
+intermediate states. This allows for easy debugging and verification of the optimizer.
+
+Please see the
+`documentation in the source code `_
+for more details about its internals.
+
+If you want to use Solidity in stand-alone Yul mode, you activate the optimizer using ``--optimize``:
+
+::
+
+    solc --strict-assembly --optimize
+
+In Solidity mode, the Yul optimizer is activated together with the regular optimizer.
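
Note on the opcode table added by this patch: several rows are easy to misread, in particular the argument order of ``shl``, ``shr`` and ``sar`` (the shift amount comes first) and the big-endian indexing of ``byte``. The following is a minimal Python sketch of those rows, not part of the patch and not the compiler's implementation. It assumes every input is already an unsigned 256-bit integer; the out-of-range results for shifts and ``byte`` follow the usual EVM behaviour rather than anything stated in the table, and the helper ``to_signed`` is a hypothetical name introduced only for this illustration.

```python
# Reference model for a few rows of the opcode table (illustrative only).
# Assumption: all arguments are unsigned integers in the range [0, 2**256).

WORD_BITS = 256
MASK = (1 << WORD_BITS) - 1


def to_signed(value):
    """Hypothetical helper: interpret a 256-bit word as two's complement."""
    return value - (1 << WORD_BITS) if value >> (WORD_BITS - 1) else value


def shl(x, y):
    """Logical shift left y by x bits (the shift amount is the FIRST argument)."""
    return (y << x) & MASK if x < WORD_BITS else 0


def shr(x, y):
    """Logical shift right y by x bits."""
    return y >> x if x < WORD_BITS else 0


def sar(x, y):
    """Signed arithmetic shift right y by x bits; the sign bit is replicated."""
    return (to_signed(y) >> min(x, WORD_BITS - 1)) & MASK


def byte(n, x):
    """nth byte of x, where the most significant byte is the 0th byte."""
    return (x >> (8 * (31 - n))) & 0xFF if n < 32 else 0


def signextend(i, x):
    """Sign extend from the (i*8+7)th bit, counting from the least significant."""
    if i >= 31:
        return x
    sign_bit = 8 * i + 7
    low_mask = (1 << (sign_bit + 1)) - 1
    if (x >> sign_bit) & 1:
        return x | (MASK ^ low_mask)  # fill the upper bits with ones
    return x & low_mask               # clear the upper bits


def addmod(x, y, m):
    """(x + y) % m without wrapping at 2**256 first; 0 if m == 0."""
    return (x + y) % m if m else 0


def mulmod(x, y, m):
    """(x * y) % m without wrapping at 2**256 first; 0 if m == 0."""
    return (x * y) % m if m else 0
```

For example, ``shl(1, 3)`` evaluates to ``6`` in this model, because ``3`` is shifted left by ``1`` bit, whereas reading the arguments in the opposite order would give ``8``.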