Combine Yul documentation sections.

This commit is contained in:
chriseth 2020-01-09 18:35:22 +01:00
parent 3d4a2219a6
commit 1d6b42eaa4
3 changed files with 650 additions and 878 deletions

View File

@ -1,41 +1,20 @@
################# .. _inline-assembly:
Solidity Assembly
################# ###############
Inline Assembly
###############
.. index:: ! assembly, ! asm, ! evmasm .. index:: ! assembly, ! asm, ! evmasm
Solidity defines an assembly language that you can use without Solidity and also
as "inline assembly" inside Solidity source code. This guide starts with describing
how to use inline assembly, how it differs from standalone assembly
(sometimes also referred to by its proper name "Yul"), and
specifies assembly itself.
.. _inline-assembly:
Inline Assembly
===============
You can interleave Solidity statements with inline assembly in a language close You can interleave Solidity statements with inline assembly in a language close
to the one of the virtual machine. This gives you more fine-grained control, to the one of the Ethereum virtual machine. This gives you more fine-grained control,
especially when you are enhancing the language by writing libraries. which is especially useful when you are enhancing the language by writing libraries.
As the EVM is a stack machine, it is often hard to address the correct stack slot The language used for inline assembly in Solidity is called `Yul <yul>`_
and provide arguments to opcodes at the correct point on the stack. Solidity's inline and it is documented in its own section. This section will only cover
assembly helps you do this, and with other issues that arise when writing manual assembly. how the inline assembly code can interface with the surrounding Solidity code.
For inline assembly, the stack is actually not visible at all, but if you look
closer, there is always a very direct translation from inline assembly to
the stack based EVM opcode stream.
Inline assembly has the following features:
* functional-style opcodes: ``mul(1, add(2, 3))``
* assembly-local variables: ``let x := add(2, 3) let y := mload(0x40) x := add(x, y)``
* access to external variables: ``function f(uint x) public { assembly { x := sub(x, 1) } }``
* loops: ``for { let i := 0 } lt(i, x) { i := add(i, 1) } { y := mul(2, y) }``
* if statements: ``if slt(x, 0) { x := sub(0, x) }``
* switch statements: ``switch x case 0 { y := mul(x, 2) } default { y := 0 }``
* function calls: ``function f(x) -> y { switch x case 0 { y := 1 } default { y := mul(x, f(sub(x, 1))) } }``
.. warning:: .. warning::
Inline assembly is a way to access the Ethereum Virtual Machine Inline assembly is a way to access the Ethereum Virtual Machine
@ -43,24 +22,14 @@ Inline assembly has the following features:
features and checks of Solidity. You should only use it for features and checks of Solidity. You should only use it for
tasks that need it, and only if you are confident with using it. tasks that need it, and only if you are confident with using it.
Syntax
------
Assembly parses comments, literals and identifiers in the same way as Solidity, so you can use the An inline assembly block is marked by ``assembly { ... }``, where the code inside
usual ``//`` and ``/* */`` comments. There is one exception: Identifiers in inline assembly can contain the curly braces is code in the `Yul <yul>`_ language.
``.``. Inline assembly is marked by ``assembly { ... }`` and inside
these curly braces, you can use the following (see the later sections for more details):
- literals, i.e. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 characters) The inline assembly code can access local Solidity variables as explained below.
- opcodes in functional style, e.g. ``add(1, mload(0))``
- variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned)
- identifiers (assembly-local variables and externals if used as inline assembly), e.g. ``add(3, x)``, ``sstore(x_slot, 2)``
- assignments, e.g. ``x := add(y, 3)``
- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
Inline assembly manages local variables and control-flow. Because of that, Different inline assembly blocks share no namespace, i.e. it is not possible
opcodes that interfere with these features are not available. This includes to call a Yul function or access a Yul variable defined in a different inline assembly block.
the ``dup`` and ``swap`` instructions as well as ``jump`` instructions and labels.
Example Example
------- -------
@ -146,238 +115,20 @@ efficient code, for example:
} }
.. _opcodes:
Opcodes
-------
This document does not want to be a full description of the Ethereum virtual machine, but the
following list can be used as a quick reference of its opcodes.
If an opcode takes arguments, they are given in parentheses.
Opcodes marked with ``-`` do not return a result,
those marked with ``*`` are special in a certain way and all others return exactly one value.
Opcodes marked with ``F``, ``H``, ``B``, ``C`` or ``I`` are present since Frontier, Homestead,
Byzantium, Constantinople or Istanbul, respectively.
In the following, ``mem[a...b)`` signifies the bytes of memory starting at position ``a`` up to
but not including position ``b`` and ``storage[p]`` signifies the storage contents at slot ``p``.
In the grammar, opcodes are represented as pre-defined identifiers ("built-in functions").
+-------------------------+-----+---+-----------------------------------------------------------------+
| Instruction | | | Explanation |
+=========================+=====+===+=================================================================+
| stop() + `-` | F | stop execution, identical to return(0, 0) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| add(x, y) | | F | x + y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sub(x, y) | | F | x - y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mul(x, y) | | F | x * y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| div(x, y) | | F | x / y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sdiv(x, y) | | F | x / y, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mod(x, y) | | F | x % y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| smod(x, y) | | F | x % y, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| exp(x, y) | | F | x to the power of y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| not(x) | | F | ~x, every bit of x is negated |
+-------------------------+-----+---+-----------------------------------------------------------------+
| lt(x, y) | | F | 1 if x < y, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gt(x, y) | | F | 1 if x > y, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| slt(x, y) | | F | 1 if x < y, 0 otherwise, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sgt(x, y) | | F | 1 if x > y, 0 otherwise, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| eq(x, y) | | F | 1 if x == y, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| iszero(x) | | F | 1 if x == 0, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| and(x, y) | | F | bitwise "and" of x and y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| or(x, y) | | F | bitwise "or" of x and y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| xor(x, y) | | F | bitwise "xor" of x and y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| byte(n, x) | | F | nth byte of x, where the most significant byte is the 0th byte |
+-------------------------+-----+---+-----------------------------------------------------------------+
| shl(x, y) | | C | logical shift left y by x bits |
+-------------------------+-----+---+-----------------------------------------------------------------+
| shr(x, y) | | C | logical shift right y by x bits |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sar(x, y) | | C | signed arithmetic shift right y by x bits |
+-------------------------+-----+---+-----------------------------------------------------------------+
| addmod(x, y, m) | | F | (x + y) % m with arbitrary precision arithmetic |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mulmod(x, y, m) | | F | (x * y) % m with arbitrary precision arithmetic |
+-------------------------+-----+---+-----------------------------------------------------------------+
| signextend(i, x) | | F | sign extend from (i*8+7)th bit counting from least significant |
+-------------------------+-----+---+-----------------------------------------------------------------+
| keccak256(p, n) | | F | keccak(mem[p...(p+n))) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| pc() | | F | current position in code |
+-------------------------+-----+---+-----------------------------------------------------------------+
| pop(x) | `-` | F | discard value x |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mload(p) | | F | mem[p...(p+32)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mstore(p, v) | `-` | F | mem[p...(p+32)) := v |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mstore8(p, v) | `-` | F | mem[p] := v & 0xff (only modifies a single byte) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sload(p) | | F | storage[p] |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sstore(p, v) | `-` | F | storage[p] := v |
+-------------------------+-----+---+-----------------------------------------------------------------+
| msize() | | F | size of memory, i.e. largest accessed memory index |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gas() | | F | gas still available to execution |
+-------------------------+-----+---+-----------------------------------------------------------------+
| address() | | F | address of the current contract / execution context |
+-------------------------+-----+---+-----------------------------------------------------------------+
| balance(a) | | F | wei balance at address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| selfbalance() | | I | equivalent to balance(address()), but cheaper |
+-------------------------+-----+---+-----------------------------------------------------------------+
| caller() | | F | call sender (excluding ``delegatecall``) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| callvalue() | | F | wei sent together with the current call |
+-------------------------+-----+---+-----------------------------------------------------------------+
| calldataload(p) | | F | call data starting from position p (32 bytes) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| calldatasize() | | F | size of call data in bytes |
+-------------------------+-----+---+-----------------------------------------------------------------+
| calldatacopy(t, f, s) | `-` | F | copy s bytes from calldata at position f to mem at position t |
+-------------------------+-----+---+-----------------------------------------------------------------+
| codesize() | | F | size of the code of the current contract / execution context |
+-------------------------+-----+---+-----------------------------------------------------------------+
| codecopy(t, f, s) | `-` | F | copy s bytes from code at position f to mem at position t |
+-------------------------+-----+---+-----------------------------------------------------------------+
| extcodesize(a) | | F | size of the code at address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| extcodecopy(a, t, f, s) | `-` | F | like codecopy(t, f, s) but take code at address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| returndatasize() | | B | size of the last returndata |
+-------------------------+-----+---+-----------------------------------------------------------------+
| returndatacopy(t, f, s) | `-` | B | copy s bytes from returndata at position f to mem at position t |
+-------------------------+-----+---+-----------------------------------------------------------------+
| extcodehash(a) | | C | code hash of address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| create(v, p, n) | | F | create new contract with code mem[p...(p+n)) and send v wei |
| | | | and return the new address |
+-------------------------+-----+---+-----------------------------------------------------------------+
| create2(v, p, n, s) | | C | create new contract with code mem[p...(p+n)) at address |
| | | | keccak256(0xff . this . s . keccak256(mem[p...(p+n))) |
| | | | and send v wei and return the new address, where ``0xff`` is a |
| | | | 1 byte value, ``this`` is the current contract's address |
| | | | as a 20 byte value and ``s`` is a big-endian 256-bit value |
+-------------------------+-----+---+-----------------------------------------------------------------+
| call(g, a, v, in, | | F | call contract at address a with input mem[in...(in+insize)) |
| insize, out, outsize) | | | providing g gas and v wei and output area |
| | | | mem[out...(out+outsize)) returning 0 on error (eg. out of gas) |
| | | | and 1 on success |
+-------------------------+-----+---+-----------------------------------------------------------------+
| callcode(g, a, v, in, | | F | identical to ``call`` but only use the code from a and stay |
| insize, out, outsize) | | | in the context of the current contract otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| delegatecall(g, a, in, | | H | identical to ``callcode`` but also keep ``caller`` |
| insize, out, outsize) | | | and ``callvalue`` |
+-------------------------+-----+---+-----------------------------------------------------------------+
| staticcall(g, a, in, | | B | identical to ``call(g, a, 0, in, insize, out, outsize)`` but do |
| insize, out, outsize) | | | not allow state modifications |
+-------------------------+-----+---+-----------------------------------------------------------------+
| return(p, s) | `-` | F | end execution, return data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| revert(p, s) | `-` | B | end execution, revert state changes, return data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| selfdestruct(a) | `-` | F | end execution, destroy current contract and send funds to a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| invalid() | `-` | F | end execution with invalid instruction |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log0(p, s) | `-` | F | log without topics and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log1(p, s, t1) | `-` | F | log with topic t1 and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log2(p, s, t1, t2) | `-` | F | log with topics t1, t2 and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log3(p, s, t1, t2, t3) | `-` | F | log with topics t1, t2, t3 and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log4(p, s, t1, t2, t3, | `-` | F | log with topics t1, t2, t3, t4 and data mem[p...(p+s)) |
| t4) | | | |
+-------------------------+-----+---+-----------------------------------------------------------------+
| chainid() | | I | ID of the executing chain (EIP 1344) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| origin() | | F | transaction sender |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gasprice() | | F | gas price of the transaction |
+-------------------------+-----+---+-----------------------------------------------------------------+
| blockhash(b) | | F | hash of block nr b - only for last 256 blocks excluding current |
+-------------------------+-----+---+-----------------------------------------------------------------+
| coinbase() | | F | current mining beneficiary |
+-------------------------+-----+---+-----------------------------------------------------------------+
| timestamp() | | F | timestamp of the current block in seconds since the epoch |
+-------------------------+-----+---+-----------------------------------------------------------------+
| number() | | F | current block number |
+-------------------------+-----+---+-----------------------------------------------------------------+
| difficulty() | | F | difficulty of the current block |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gaslimit() | | F | block gas limit of the current block |
+-------------------------+-----+---+-----------------------------------------------------------------+
Literals
--------
You can use integer constants by typing them in decimal or hexadecimal notation and an
appropriate ``PUSHi`` instruction will automatically be generated. The following creates code
to add 2 and 3 resulting in 5 and then computes the bitwise ``AND`` with the string "abc".
The final value is assigned to a local variable called ``x``.
Strings are stored left-aligned and cannot be longer than 32 bytes.
.. code::
assembly { let x := and("abc", add(3, 2)) }
Functional Style
-----------------
For a sequence of opcodes, it is often hard to see what the actual
arguments for certain opcodes are. In the following example,
``3`` is added to the contents in memory at position ``0x80``.
.. code::
3 0x80 mload add 0x80 mstore
Solidity inline assembly has a "functional style" notation where the same code
would be written as follows:
.. code::
mstore(0x80, add(mload(0x80), 3))
If you read the code from right to left, you end up with exactly the same
sequence of constants and opcodes, but it is much clearer where the
values end up.
If you care about the exact stack layout, just note that the
syntactically first argument for a function or opcode will be put at the
top of the stack.
Access to External Variables, Functions and Libraries Access to External Variables, Functions and Libraries
----------------------------------------------------- -----------------------------------------------------
You can access Solidity variables and other identifiers by using their name. You can access Solidity variables and other identifiers by using their name.
For variables stored in the memory data location, this pushes the address, and not the value
onto the stack. Variables stored in the storage data location are different, as they might not Local variables of value type are directly usable in inline assembly.
occupy a full storage slot, so their "address" is composed of a slot and a byte-offset
Local variables that refer to memory or calldata evaluate to the
address of the variable in memory, resp. calldata, not the value itself.
For local storage variables or state variables, a single Yul identifier
is not sufficient, since they do not necessarily occupy a single full storage slot.
Therefore, their "address" is composed of a slot and a byte-offset
inside that slot. To retrieve the slot pointed to by the variable ``x``, you inside that slot. To retrieve the slot pointed to by the variable ``x``, you
use ``x_slot``, and to retrieve the byte-offset you use ``x_offset``. use ``x_slot``, and to retrieve the byte-offset you use ``x_offset``.
@ -391,7 +142,9 @@ Local Solidity variables are available for assignments, for example:
uint b; uint b;
function f(uint x) public view returns (uint r) { function f(uint x) public view returns (uint r) {
assembly { assembly {
r := mul(x, sload(b_slot)) // ignore the offset, we know it is zero // We ignore the storage slot offset, we know it is zero
// in this special case.
r := mul(x, sload(b_slot))
} }
} }
} }
@ -407,177 +160,19 @@ Local Solidity variables are available for assignments, for example:
To clean signed types, you can use the ``signextend`` opcode: To clean signed types, you can use the ``signextend`` opcode:
``assembly { signextend(<num_bytes_of_x_minus_one>, x) }`` ``assembly { signextend(<num_bytes_of_x_minus_one>, x) }``
Declaring Assembly-Local Variables
----------------------------------
You can use the ``let`` keyword to declare variables that are only visible in Since Solidity 0.6.0 the name of a inline assembly variable may not end in ``_offset`` or ``_slot``
inline assembly and actually only in the current ``{...}``-block. What happens
is that the ``let`` instruction will create a new stack slot that is reserved
for the variable and automatically removed again when the end of the block
is reached. You need to provide an initial value for the variable which can
be just ``0``, but it can also be a complex functional-style expression.
Since 0.6.0 the name of a declared variable may not end in ``_offset`` or ``_slot``
and it may not shadow any declaration visible in the scope of the inline assembly block and it may not shadow any declaration visible in the scope of the inline assembly block
(including variable, contract and function declarations). Similarly, if the name of a declared (including variable, contract and function declarations). Similarly, if the name of a declared
variable contains a dot ``.``, the prefix up to the ``.`` may not conflict with any variable contains a dot ``.``, the prefix up to the ``.`` may not conflict with any
declaration visible in the scope of the inline assembly block. declaration visible in the scope of the inline assembly block.
.. code::
pragma solidity >=0.4.16 <0.7.0;
contract C {
function f(uint x) public view returns (uint b) {
assembly {
let v := add(x, 1)
mstore(0x80, v)
{
let y := add(sload(v), 1)
b := y
} // y is "deallocated" here
b := add(b, v)
} // v is "deallocated" here
}
}
Assignments
-----------
Assignments are possible to assembly-local variables and to function-local Assignments are possible to assembly-local variables and to function-local
variables. Take care that when you assign to variables that point to variables. Take care that when you assign to variables that point to
memory or storage, you will only change the pointer and not the data. memory or storage, you will only change the pointer and not the data.
Variables can only be assigned expressions that result in exactly one value.
If you want to assign the values returned from a function that has
multiple return parameters, you have to provide multiple variables.
.. code::
{
let v := 0
let g := add(v, 2)
function f() -> a, b { }
let c, d := f()
}
If
--
The if statement can be used for conditionally executing code.
There is no "else" part, consider using "switch" (see below) if
you need multiple alternatives.
.. code::
{
if eq(value, 0) { revert(0, 0) }
}
The curly braces for the body are required.
Switch
------
You can use a switch statement as a very basic version of "if/else".
It takes the value of an expression and compares it to several constants.
The branch corresponding to the matching constant is taken. Contrary to the
error-prone behaviour of some programming languages, control flow does
not continue from one case to the next. There can be a fallback or default
case called ``default``.
.. code::
{
let x := 0
switch calldataload(4)
case 0 {
x := calldataload(0x24)
}
default {
x := calldataload(0x44)
}
sstore(0, div(x, 2))
}
The list of cases does not require curly braces, but the body of a
case does require them.
Loops
-----
Assembly supports a simple for-style loop. For-style loops have
a header containing an initializing part, a condition and a post-iteration
part. The condition has to be a functional-style expression, while
the other two are blocks. If the initializing part
declares any variables, the scope of these variables is extended into the
body (including the condition and the post-iteration part).
The ``break`` and ``continue`` statements can be used to exit the loop
or skip to the post-part, respectively.
The following example computes the sum of an area in memory.
.. code::
{
let x := 0
for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
x := add(x, mload(i))
}
}
For loops can also be written so that they behave like while loops:
Simply leave the initialization and post-iteration parts empty.
.. code::
{
let x := 0
let i := 0
for { } lt(i, 0x100) { } { // while(i < 0x100)
x := add(x, mload(i))
i := add(i, 0x20)
}
}
Functions
---------
Assembly allows the definition of low-level functions. These take their
arguments (and a return PC) from the stack and also put the results onto the
stack. Calling a function looks the same way as executing a functional-style
opcode.
Functions can be defined anywhere and are visible in the block they are
declared in. Inside a function, you cannot access local variables
defined outside of that function.
If you call a function that returns multiple values, you have to assign
them to a tuple using ``a, b := f(x)`` or ``let a, b := f(x)``.
The ``leave`` statement can be used to exit the current function. It
works like the ``return`` statement in other languages just that it does
not take a value to return, it just exits the functions and the function
will return whatever values are currently assigned to the return variable(s).
The following example implements the power function by square-and-multiply.
.. code::
{
function power(base, exponent) -> result {
switch exponent
case 0 { result := 1 }
case 1 { result := base }
default {
result := power(mul(base, base), div(exponent, 2))
switch mod(exponent, 2)
case 1 { result := mul(base, result) }
}
}
}
Things to Avoid Things to Avoid
--------------- ---------------
@ -593,7 +188,8 @@ Conventions in Solidity
----------------------- -----------------------
In contrast to EVM assembly, Solidity has types which are narrower than 256 bits, In contrast to EVM assembly, Solidity has types which are narrower than 256 bits,
e.g. ``uint24``. For efficiency, most arithmetic operations ignore the fact that types can be shorter than 256 e.g. ``uint24``. For efficiency, most arithmetic operations ignore the fact that
types can be shorter than 256
bits, and the higher-order bits are cleaned when necessary, bits, and the higher-order bits are cleaned when necessary,
i.e., shortly before they are written to memory or before comparisons are performed. i.e., shortly before they are written to memory or before comparisons are performed.
This means that if you access such a variable This means that if you access such a variable
@ -630,157 +226,3 @@ first slot of the array and followed by the array elements.
to allow better convertibility between statically- and dynamically-sized arrays, so to allow better convertibility between statically- and dynamically-sized arrays, so
do not rely on this. do not rely on this.
Standalone Assembly
===================
The assembly language described as inline assembly above can also be used
standalone and in fact, the plan is to use it as an intermediate language
for the Solidity compiler. In this form, it tries to achieve several goals:
1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity.
2. The translation from assembly to bytecode should contain as few "surprises" as possible.
3. Control flow should be easy to detect to help in formal verification and optimization.
In order to achieve the first and last goal, assembly provides high-level constructs
like ``for`` loops, ``if`` and ``switch`` statements and function calls. It should be possible
to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``,
``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow
and the last two obfuscate control flow. Furthermore, functional statements of
the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
``7 y x add mul`` because in the first form, it is much easier to see which
operand is used for which opcode.
The second goal is achieved by compiling the
higher level constructs to bytecode in a very regular way.
The only non-local operation performed
by the assembler is name lookup of user-defined identifiers (functions, variables, ...),
which follow very simple and regular scoping rules and cleanup of local variables from the stack.
Scoping: An identifier that is declared (label, variable, function, assembly)
is only visible in the block where it was declared (including nested blocks
inside the current block). It is not legal to access local variables across
function borders, even if they would be in scope. Shadowing is not allowed.
Local variables cannot be accessed before they were declared, but
functions and assemblies can. Assemblies are special blocks that are used
for e.g. returning runtime code or creating contracts. No identifier from an
outer assembly is visible in a sub-assembly.
If control flow passes over the end of a block, pop instructions are inserted
that match the number of local variables declared in that block.
Whenever a local variable is referenced, the code generator needs
to know its current relative position in the stack and thus it needs to
keep track of the current so-called stack height. Since all local variables
are removed at the end of a block, the stack height before and after the block
should be the same. If this is not the case, compilation fails.
Using ``switch``, ``for`` and functions, it should be possible to write
complex code without using ``jump`` or ``jumpi`` manually. This makes it much
easier to analyze the control flow, which allows for improved formal
verification and optimization.
Furthermore, if manual jumps are allowed, computing the stack height is rather complicated.
The position of all local variables on the stack needs to be known, otherwise
neither references to local variables nor removing local variables automatically
from the stack at the end of a block will work properly.
Example:
We will follow an example compilation from Solidity to assembly.
We consider the runtime bytecode of the following Solidity program::
pragma solidity >=0.4.16 <0.7.0;
contract C {
function f(uint x) public pure returns (uint y) {
y = 1;
for (uint i = 0; i < x; i++)
y = 2 * y;
}
}
The following assembly will be generated::
{
mstore(0x40, 0x80) // store the "free memory pointer"
// function dispatcher
switch div(calldataload(0), exp(2, 226))
case 0xb3de648b {
let r := f(calldataload(4))
let ret := $allocate(0x20)
mstore(ret, r)
return(ret, 0x20)
}
default { revert(0, 0) }
// memory allocator
function $allocate(size) -> pos {
pos := mload(0x40)
mstore(0x40, add(pos, size))
}
// the contract function
function f(x) -> y {
y := 1
for { let i := 0 } lt(i, x) { i := add(i, 1) } {
y := mul(2, y)
}
}
}
Assembly Grammar
----------------
The tasks of the parser are the following:
- Turn the byte stream into a token stream, discarding C++-style comments
(a special comment exists for source references, but we will not explain it here).
- Turn the token stream into an AST according to the grammar below
- Register identifiers with the block they are defined in (annotation to the
AST node) and note from which point on, variables can be accessed.
The assembly lexer follows the one defined by Solidity itself.
Whitespace is used to delimit tokens and it consists of the characters
Space, Tab and Linefeed. Comments are regular JavaScript/C++ comments and
are interpreted in the same way as Whitespace.
Grammar::
AssemblyBlock = '{' AssemblyItem* '}'
AssemblyItem =
Identifier |
AssemblyBlock |
AssemblyExpression |
AssemblyLocalDefinition |
AssemblyAssignment |
AssemblyIf |
AssemblySwitch |
AssemblyFunctionDefinition |
AssemblyFor |
'break' |
'continue' |
'leave' |
SubAssembly
AssemblyExpression = AssemblyCall | Identifier | AssemblyLiteral
AssemblyLiteral = NumberLiteral | StringLiteral | HexLiteral
Identifier = [a-zA-Z_$] [a-zA-Z_0-9.]*
AssemblyCall = Identifier '(' ( AssemblyExpression ( ',' AssemblyExpression )* )? ')'
AssemblyLocalDefinition = 'let' IdentifierOrList ( ':=' AssemblyExpression )?
AssemblyAssignment = IdentifierOrList ':=' AssemblyExpression
IdentifierOrList = Identifier | '(' IdentifierList ')'
IdentifierList = Identifier ( ',' Identifier)*
AssemblyIf = 'if' AssemblyExpression AssemblyBlock
AssemblySwitch = 'switch' AssemblyExpression AssemblyCase*
( 'default' AssemblyBlock )?
AssemblyCase = 'case' AssemblyExpression AssemblyBlock
AssemblyFunctionDefinition = 'function' Identifier '(' IdentifierList? ')'
( '->' '(' IdentifierList ')' )? AssemblyBlock
AssemblyFor = 'for' ( AssemblyBlock | AssemblyExpression )
AssemblyExpression ( AssemblyBlock | AssemblyExpression ) AssemblyBlock
SubAssembly = 'assembly' Identifier AssemblyBlock
NumberLiteral = HexNumber | DecimalNumber
HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'
HexNumber = '0x' [0-9a-fA-F]+
DecimalNumber = [0-9]+

View File

@ -328,7 +328,7 @@ Whiskers
compiler in various places to aid readability, and thus maintainability and verifiability, of the code. compiler in various places to aid readability, and thus maintainability and verifiability, of the code.
The syntax comes with a substantial difference to Mustache. The template markers ``{{`` and ``}}`` are The syntax comes with a substantial difference to Mustache. The template markers ``{{`` and ``}}`` are
replaced by ``<`` and ``>`` in order to aid parsing and avoid conflicts with :ref:`inline-assembly` replaced by ``<`` and ``>`` in order to aid parsing and avoid conflicts with :ref:`yul`
(The symbols ``<`` and ``>`` are invalid in inline assembly, while ``{`` and ``}`` are used to delimit blocks). (The symbols ``<`` and ``>`` are invalid in inline assembly, while ``{`` and ``}`` are used to delimit blocks).
Another limitation is that lists are only resolved one depth and they do not recurse. This may change in the future. Another limitation is that lists are only resolved one depth and they do not recurse. This may change in the future.

File diff suppressed because it is too large Load Diff