.. _yul:
###
Yul
###
.. index:: ! assembly, ! asm, ! evmasm, ! yul, julia, iulia
Yul (previously also called JULIA or IULIA) is an intermediate language that can be
compiled to bytecode for different backends.
Support for EVM 1.0, EVM 1.5 and eWASM is planned, and Yul is designed to
be a usable common denominator of all three
platforms. It can already be used in stand-alone mode and
for "inline assembly" inside Solidity,
and there is an experimental implementation of the Solidity compiler
that uses Yul as an intermediate language. Yul is a good target for
high-level optimisation stages that can benefit all target platforms equally.
Motivation and High-level Description
=====================================
The design of Yul tries to achieve several goals:
1. Programs written in Yul should be readable, even if the code is generated by a compiler from Solidity or another high-level language.
2. Control flow should be easy to understand to help in manual inspection, formal verification and optimization.
3. The translation from Yul to bytecode should be as straightforward as possible.
4. Yul should be suitable for whole-program optimization.
In order to achieve the first and second goal, Yul provides high-level constructs
like ``for`` loops, ``if`` and ``switch`` statements and function calls. These should
be sufficient for adequately representing the control flow for assembly programs.
Therefore, no explicit statements for ``SWAP``, ``DUP``, ``JUMP`` and ``JUMPI``
are provided, because the first two obfuscate the data flow
and the last two obfuscate control flow. Furthermore, functional statements of
the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
``7 y x add mul`` because in the first form, it is much easier to see which
operand is used for which opcode.
Even though it was designed for stack machines, Yul does not expose the complexity of the stack itself.
The programmer or auditor should not have to worry about the stack.
The third goal is achieved by compiling the
higher level constructs to bytecode in a very regular way.
The only non-local operation performed
by the assembler is name lookup of user-defined identifiers (functions, variables, ...)
and cleanup of local variables from the stack.
To avoid confusion between concepts like values and references,
Yul is statically typed. At the same time, there is a default type
(usually the integer word of the target machine) that can always
be omitted to help readability.
To keep the language simple and flexible, Yul does not have
any built-in operations, functions or types in its pure form.
These are added together with their semantics when specifying a dialect of Yul,
which allows specializing Yul to the requirements of different
target platforms and feature sets.
Currently, there is only one specified dialect of Yul. This dialect uses
the EVM opcodes as builtin functions
(see below) and defines only the type ``u256``, which is the native 256-bit
type of the EVM. Because of that, we will not provide types in the examples below.
Simple Example
==============
The following example program is written in the EVM dialect and computes exponentiation.
It can be compiled using ``solc --strict-assembly``. The builtin functions
``mul`` and ``div`` compute product and division, respectively.
.. code-block:: yul
{
function power(base, exponent) -> result
{
switch exponent
case 0 { result := 1 }
case 1 { result := base }
default
{
result := power(mul(base, base), div(exponent, 2))
switch mod(exponent, 2)
case 1 { result := mul(base, result) }
}
}
}
It is also possible to implement the same function using a for-loop
instead of recursion. Here, ``lt(a, b)`` computes whether ``a`` is less than ``b``.
.. code-block:: yul
{
function power(base, exponent) -> result
{
result := 1
for { let i := 0 } lt(i, exponent) { i := add(i, 1) }
{
result := mul(result, base)
}
}
}
Stand-Alone Usage
=================
You can use Yul in its stand-alone form in the EVM dialect using the Solidity compiler.
This will use the :ref:`Yul object notation <yul-object>` so that it is possible to refer
to code as data to deploy contracts. This Yul mode is available for the commandline compiler
(use ``--strict-assembly``) and for the :ref:`standard-json interface <compiler-api>`:
.. code-block:: json
{
"language": "Yul",
"sources": { "input.yul": { "content": "{ sstore(0, 1) }" } },
"settings": {
"outputSelection": { "*": { "*": ["*"], "": [ "*" ] } },
"optimizer": { "enabled": true, "details": { "yul": true } }
}
}
.. warning::
Yul is in active development and bytecode generation is only fully implemented for the EVM dialect of Yul
with EVM 1.0 as target.
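As a rough sketch of how the object notation lets code refer to other code as data,
the following hypothetical object deploys a runtime part that stores a value.
The object names ``Example`` and ``Runtime`` are made up for illustration, and the sketch
assumes the ``datacopy``, ``dataoffset`` and ``datasize`` builtins that the EVM dialect
provides for object access. The Yul object section remains the authoritative description.

.. code-block:: yul

    object "Example" {
        code {
            // Constructor code: copy the code of the sub-object "Runtime"
            // into memory and return it as the code to be deployed.
            datacopy(0, dataoffset("Runtime"), datasize("Runtime"))
            return(0, datasize("Runtime"))
        }
        object "Runtime" {
            code {
                // Deployed code: store 1 in storage slot 0.
                sstore(0, 1)
            }
        }
    }
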
Informal Description of Yul
===========================
In the following, we will talk about each individual aspect
of the Yul language. In examples, we will use the default EVM dialect.
Syntax
------
Yul parses comments, literals and identifiers in the same way as Solidity,
so you can e.g. use ``//`` and ``/* */`` to denote comments.
There is one exception: Identifiers in Yul can contain dots: ``.``.
Yul can specify "objects" that consist of code, data and sub-objects.
Please see :ref:`Yul Objects <yul-object>` below for details on that.
In this section, we are only concerned with the code part of such an object.
This code part always consists of a curly-braces
delimited block. Most tools support specifying just a code block
where an object is expected.
Inside a code block, the following elements can be used
(see the later sections for more details):
- literals, e.g. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 bytes)
- calls to builtin functions, e.g. ``add(1, mload(0))``
- variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned)
- identifiers (variables), e.g. ``add(3, x)``
- assignments, e.g. ``x := add(y, 3)``
- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
- if statements, e.g. ``if lt(a, b) { sstore(0, 1) }``
- switch statements, e.g. ``switch mload(0) case 0 { revert(0, 0) } default { mstore(0, 1) }``
- for loops, e.g. ``for { let i := 0} lt(i, 10) { i := add(i, 1) } { mstore(i, 7) }``
- function definitions, e.g. ``function f(a, b) -> c { c := add(a, b) }``
Multiple syntactical elements can follow each other simply separated by
whitespace, i.e. there is no terminating ``;`` or newline required.
Literals
--------
You can use integer constants in decimal or hexadecimal notation.
When compiling for the EVM, this will be translated into an
appropriate ``PUSHi`` instruction. In the following example,
``3`` and ``2`` are added resulting in 5 and then the
bitwise ``and`` with the string "abc" is computed.
The final value is assigned to a local variable called ``x``.
Strings are stored left-aligned and cannot be longer than 32 bytes.
.. code-block:: yul
let x := and("abc", add(3, 2))
Unless it is the default type, the type of a literal
has to be specified after a colon:
.. code-block:: yul
let x := and("abc":uint32, add(3:uint256, 2:uint256))
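To make the left alignment of string literals more concrete, here is a small sketch
in the EVM dialect. It assumes an EVM version that provides ``shl`` (Constantinople or later).
The variable names and the storage slots used to record the results are arbitrary.

.. code-block:: yul

    {
        // "abc" is stored left-aligned: the bytes 0x61 0x62 0x63 occupy the
        // three most significant bytes of the 32-byte word.
        let s := "abc"
        // Shifting 0x616263 left by 29 bytes (232 bits) yields the same word,
        // so `same` is 1.
        let same := eq(s, shl(232, 0x616263))
        // The low-order bytes of s are zero, so the bitwise "and" with the
        // small number 5 is 0.
        let x := and(s, add(3, 2))
        sstore(0, same)
        sstore(1, x)
    }
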
Function Calls
--------------
Both built-in and user-defined functions (see below) can be called
in the same way as shown in the previous example.
If the function returns a single value, it can be directly used
inside an expression again. If it returns multiple values,
they have to be assigned to local variables.
.. code-block:: yul
mstore(0x80, add(mload(0x80), 3))
// Here, the user-defined function `f` returns
// two values. The definition of the function
// is missing from the example.
let x, y := f(1, mload(0))
For built-in functions of the EVM, functional expressions
can be directly translated to a stream of opcodes:
You just read the expression from right to left to obtain the
opcodes. In the case of the first line in the example, this
is ``PUSH1 3 PUSH1 0x80 MLOAD ADD PUSH1 0x80 MSTORE``.
For calls to user-defined functions, the arguments are also
put on the stack from right to left and this is the order
in which argument lists are evaluated. The return values,
though, are expected on the stack from left to right,
i.e. in this example, ``y`` is on top of the stack and ``x``
is below it.
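The following self-contained sketch shows both situations described above.
The function ``divmod`` is a hypothetical helper introduced only for this example.

.. code-block:: yul

    {
        // Hypothetical helper: returns quotient and remainder of a / b.
        function divmod(a, b) -> quotient, remainder {
            quotient := div(a, b)
            remainder := mod(a, b)
        }
        // A single return value can be used directly inside an expression.
        mstore(0x80, add(mload(0x80), 3))
        // Multiple return values have to be bound to variables.
        // Here q is 3 and r is 2.
        let q, r := divmod(17, 5)
        sstore(0, q)
        sstore(1, r)
    }
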
Variable Declarations
---------------------
You can use the ``let`` keyword to declare variables.
A variable is only visible inside the
``{...}``-block it was defined in. When compiling to the EVM,
a new stack slot is created that is reserved
for the variable and automatically removed again when the end of the block
is reached. You can provide an initial value for the variable.
If you do not provide a value, the variable will be initialized to zero.
Since variables are stored on the stack, they do not directly
influence memory or storage, but they can be used as pointers
to memory or storage locations in the built-in functions
``mstore``, ``mload``, ``sstore`` and ``sload``.
Future dialects might introduce specific types for such pointers.
When a variable is referenced, its current value is copied.
For the EVM, this translates to a ``DUP`` instruction.
.. code-block:: yul
{
let zero := 0
let v := calldataload(zero)
{
let y := add(sload(v), 1)
v := y
} // y is "deallocated" here
sstore(v, zero)
} // v and zero are "deallocated" here
If the declared variable should have a type different from the default type,
you denote that following a colon. You can also declare multiple
variables in one statement when you assign from a function call
that returns multiple values.
.. code-block:: yul
{
let zero:uint32 := 0:uint32
let v:uint256, t:uint32 := f()
let x, y := g()
}
Depending on the optimiser settings, the compiler can free a variable's stack slot
as soon as the variable has been used for
the last time, even though it is still in scope.
Assignments
-----------
Variables can be assigned to after their definition using the
``:=`` operator. It is possible to assign multiple
variables at the same time. For this, the number and types of the
values have to match.
If you want to assign the values returned from a function that has
multiple return parameters, you have to provide multiple variables.
.. code-block:: yul
let v := 0
// re-assign v
v := 2
let t := add(v, 2)
function f() -> a, b { }
// assign multiple values
v, t := f()
If
--
The if statement can be used for conditionally executing code.
No "else" block can be defined. Consider using "switch" instead (see below) if
you need multiple alternatives.
.. code-block:: yul
if eq(value, 0) { revert(0, 0) }
The curly braces for the body are required.
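Since there is no "else", two alternatives can be expressed with a ``switch`` on the
condition, as sketched here. The threshold ``10`` and the storage slot are arbitrary
choices for illustration.

.. code-block:: yul

    {
        let value := calldataload(0)
        // Plain if without an alternative.
        if eq(value, 0) { revert(0, 0) }
        // Two alternatives: lt returns 1 or 0, so the default case
        // plays the role of "else".
        switch lt(value, 10)
        case 1 { sstore(0, value) }   // value < 10
        default { sstore(0, 10) }     // value >= 10
    }
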
Switch
------
You can use a switch statement as an extended version of the if statement.
It takes the value of an expression and compares it to several literal constants.
The branch corresponding to the matching constant is taken.
Contrary to other programming languages, for safety reasons, control flow does
not continue from one case to the next. There can be a fallback or default
case called ``default`` which is taken if none of the literal constants matches.
.. code-block:: yul
{
let x := 0
switch calldataload(4)
case 0 {
x := calldataload(0x24)
}
default {
x := calldataload(0x44)
}
sstore(0, div(x, 2))
}
The list of cases is not enclosed by curly braces, but the body of a
case does require them.
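The absence of fall-through can be seen in the following sketch, where exactly one
branch assigns ``result``. The constants and the variable names are illustrative only.

.. code-block:: yul

    {
        let kind := calldataload(0)
        let result := 0
        switch kind
        case 0 { result := 10 }
        // Only executed if kind is 1; control never continues here from case 0.
        case 1 { result := 20 }
        // Executed for every other value of kind.
        default { result := 30 }
        sstore(0, result)
    }
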
Loops
-----
Yul supports for-loops which consist of
a header containing an initializing part, a condition, a post-iteration
part and a body. The condition has to be an expression, while
the other three are blocks. If the initializing part
declares any variables at the top level, the scope of these variables extends to all other
parts of the loop.
The ``break`` and ``continue`` statements can be used in the body to exit the loop
or skip to the post-part, respectively.
The following example computes the sum of an area in memory.
.. code-block:: yul
{
let x := 0
for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
x := add(x, mload(i))
}
}
For loops can also be used as a replacement for while loops:
Simply leave the initialization and post-iteration parts empty.
.. code-block:: yul
{
let x := 0
let i := 0
for { } lt(i, 0x100) { } { // while(i < 0x100)
x := add(x, mload(i))
i := add(i, 0x20)
}
}
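The ``break`` and ``continue`` statements described above could be used as in the
following sketch. It counts the non-zero words in a memory area and stops at a
sentinel value. The sentinel ``0xff`` and the memory layout are assumptions made
for the example.

.. code-block:: yul

    {
        let count := 0
        for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
            let word := mload(i)
            // continue skips to the post-iteration part, so i is still advanced.
            if iszero(word) { continue }
            // break exits the loop immediately.
            if eq(word, 0xff) { break }
            count := add(count, 1)
        }
        mstore(0x100, count)
    }
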
Function Declarations
---------------------
Yul allows the definition of functions. These should not be confused with functions
in Solidity since they are never part of an external interface of a contract and
are part of a namespace separate from the one for Solidity functions.
For the EVM, Yul functions take their
arguments (and a return PC) from the stack and also put the results onto the
stack. User-defined functions and built-in functions are called in exactly the same way.
Functions can be defined anywhere and are visible in the block they are
declared in. Inside a function, you cannot access local variables
defined outside of that function.
Functions declare parameters and return variables, similar to Solidity.
To return a value, you assign it to the return variable(s).
If you call a function that returns multiple values, you have to assign
them to multiple variables using ``a, b := f(x)`` or ``let a, b := f(x)``.
The ``leave`` statement can be used to exit the current function. It
works like the ``return`` statement in other languages, except that it does
not take a value to return: it just exits the function, and the function
returns whatever values are currently assigned to the return variable(s).
Note that the EVM dialect has a built-in function called ``return`` that
quits the full execution context (internal message call) and not just
the current Yul function.
The following example implements the power function by square-and-multiply.
.. code-block:: yul
{
function power(base, exponent) -> result {
switch exponent
case 0 { result := 1 }
case 1 { result := base }
default {
result := power(mul(base, base), div(exponent, 2))
switch mod(exponent, 2)
case 1 { result := mul(base, result) }
}
}
}
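The ``leave`` statement described above could be used as in the following sketch.
The function ``firstZeroWord`` and its parameter names are hypothetical.

.. code-block:: yul

    {
        // Hypothetical helper: returns the position of the first zero word
        // in mem[first...limit), or limit if there is none.
        function firstZeroWord(first, limit) -> pos {
            pos := first
            for { } lt(pos, limit) { pos := add(pos, 0x20) } {
                // leave exits the function; the current value of pos is returned.
                if iszero(mload(pos)) { leave }
            }
        }
        mstore(0, firstZeroWord(0x80, 0x180))
    }
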
Specification of Yul
====================
This chapter describes Yul code formally. Yul code is usually placed inside Yul objects,
which are explained in their own chapter.
.. code-block:: none
Block = '{' Statement* '}'
Statement =
Block |
FunctionDefinition |
VariableDeclaration |
Assignment |
If |
Expression |
Switch |
ForLoop |
BreakContinue |
Leave
FunctionDefinition =
'function' Identifier '(' TypedIdentifierList? ')'
( '->' TypedIdentifierList )? Block
VariableDeclaration =
'let' TypedIdentifierList ( ':=' Expression )?
Assignment =
IdentifierList ':=' Expression
Expression =
FunctionCall | Identifier | Literal
If =
'if' Expression Block
Switch =
'switch' Expression ( Case+ Default? | Default )
Case =
'case' Literal Block
Default =
'default' Block
ForLoop =
'for' Block Expression Block Block
BreakContinue =
'break' | 'continue'
Leave = 'leave'
FunctionCall =
Identifier '(' ( Expression ( ',' Expression )* )? ')'
Identifier = [a-zA-Z_$] [a-zA-Z_$0-9.]*
IdentifierList = Identifier ( ',' Identifier)*
TypeName = Identifier
TypedIdentifierList = Identifier ( ':' TypeName )? ( ',' Identifier ( ':' TypeName )? )*
Literal =
(NumberLiteral | StringLiteral | HexLiteral | TrueLiteral | FalseLiteral) ( ':' TypeName )?
NumberLiteral = HexNumber | DecimalNumber
HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'
TrueLiteral = 'true'
FalseLiteral = 'false'
HexNumber = '0x' [0-9a-fA-F]+
DecimalNumber = [0-9]+
Restrictions on the Grammar
---------------------------
Apart from those directly imposed by the grammar, the following
restrictions apply:
Switches must have at least one case (including the default case).
All case values need to have the same type and distinct values.
If all possible values of the expression type are covered, a default case is
not allowed (i.e. a switch with a ``bool`` expression that has both a
true and a false case does not allow a default case).
Every expression evaluates to zero or more values. Identifiers and Literals
evaluate to exactly
one value and function calls evaluate to a number of values equal to the
number of return variables of the function called.
In variable declarations and assignments, the right-hand-side expression
(if present) has to evaluate to a number of values equal to the number of
variables on the left-hand-side.
This is the only situation where an expression evaluating
to more than one value is allowed.
Expressions that are also statements (i.e. at the block level) have to
evaluate to zero values.
In all other situations, expressions have to evaluate to exactly one value.
The ``continue`` and ``break`` statements can only be used inside loop bodies
and have to be in the same function as the loop (or both have to be at the
top level). The ``continue`` and ``break`` statements cannot be used
in other parts of a loop, not even when it is scoped inside a second loop's body.
The condition part of the for-loop has to evaluate to exactly one value.
The ``leave`` statement can only be used inside a function.
Functions cannot be defined anywhere inside for loop init blocks.
Literals cannot be larger than their type. The largest type defined is 256-bit wide.
During assignments and function calls, the types of the respective values have to match.
There is no implicit type conversion. Type conversion in general can only be achieved
if the dialect provides an appropriate built-in function that takes a value of one
type and returns a value of a different type.
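The rules about how many values an expression may evaluate to can be illustrated with
a short sketch. The functions ``pair`` and ``store`` are hypothetical helpers, and the
rejected statements are shown only in comments.

.. code-block:: yul

    {
        function pair() -> lo, hi { lo := 1 hi := 2 }
        function store(value) { sstore(0, value) }

        // An expression evaluating to two values is allowed only on the
        // right-hand side of a declaration or assignment with two variables.
        let a, b := pair()
        a, b := pair()

        // Expressions used as statements have to evaluate to zero values.
        store(a)          // fine: store returns nothing
        pop(add(a, b))    // add returns one value, which pop discards

        // The following statements would be rejected:
        //   let c := pair()      (two values for one variable)
        //   add(a, b)            (unused value at the block level)
        //   mstore(0, pair())    (argument evaluates to two values)
    }
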
Scoping Rules
-------------
Scopes in Yul are tied to Blocks (exceptions are functions and the for loop
as explained below) and all declarations
(``FunctionDefinition``, ``VariableDeclaration``)
introduce new identifiers into these scopes.
Identifiers are visible in
the block they are defined in (including all sub-nodes and sub-blocks).
As an exception, the scope of the "init" part of the for-loop
(the first block) extends across all other parts of the for loop.
This means that variables declared in the init part (but not inside a
block inside the init part) are visible in all other parts of the for-loop.
Identifiers declared in the other parts of the for loop respect the regular
syntactical scoping rules.
This means a for-loop of the form ``for { I... } C { P... } { B... }`` is equivalent
to ``{ I... for {} C { P... } { B... } }``.
The parameters and return parameters of functions are visible in the
function body and their names have to be distinct.
Variables can only be referenced after their declaration. In particular,
variables cannot be referenced in the right hand side of their own variable
declaration.
Functions can be referenced already before their declaration (if they are visible).
Shadowing is disallowed, i.e. you cannot declare an identifier at a point
where another identifier with the same name is also visible, even if it is
not accessible.
Inside functions, it is not possible to access a variable that was declared
outside of that function.
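A few of the scoping rules above, sketched in one block. The function ``double`` and
all variable names are made up for the example.

.. code-block:: yul

    {
        // Functions can be called before their definition.
        let y := double(21)
        function double(n) -> twice {
            // Variables declared outside of the function, such as y,
            // cannot be accessed here.
            twice := mul(n, 2)
        }

        // Variables declared in the init block are visible in the
        // condition, the post-iteration block and the body.
        for { let i := 0 } lt(i, y) { i := add(i, 1) } {
            // t is only visible inside the body.
            let t := add(i, y)
            mstore(0, t)
        }
        // i and t are out of scope here. Declaring a second variable
        // named y in this block would be rejected as shadowing.
        sstore(0, y)
    }
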
Formal Specification
--------------------
We formally specify Yul by providing an evaluation function E overloaded
on the various nodes of the AST. As builtin functions can have side effects,
E takes two state objects and the AST node and returns two new
state objects and a variable number of other values.
The two state objects are the global state object
(which in the context of the EVM is the memory, storage and state of the
blockchain) and the local state object (the state of local variables, i.e. a
segment of the stack in the EVM).
If the AST node is a statement, E returns the two state objects and a "mode",
which is used for the ``break``, ``continue`` and ``leave`` statements.
If the AST node is an expression, E returns the two state objects and
as many values as the expression evaluates to.
The exact nature of the global state is unspecified for this high level
description. The local state ``L`` is a mapping of identifiers ``i`` to values ``v``,
denoted as ``L[i] = v``.
For an identifier ``v``, let ``$v`` be the name of the identifier.
We will use a destructuring notation for the AST nodes.
.. code-block:: none
E(G, L, <{St1, ..., Stn}>: Block) =
let G1, L1, mode = E(G, L, St1, ..., Stn)
let L2 be a restriction of L1 to the identifiers of L
G1, L2, mode
E(G, L, St1, ..., Stn: Statement) =
if n is zero:
G, L, regular
else:
let G1, L1, mode = E(G, L, St1)
if mode is regular then
E(G1, L1, St2, ..., Stn)
otherwise
G1, L1, mode
E(G, L, FunctionDefinition) =
G, L, regular
E(G, L, <let var_1, ..., var_n := rhs>: VariableDeclaration) =
E(G, L, <var_1, ..., var_n := rhs>: Assignment)
E(G, L, <let var_1, ..., var_n>: VariableDeclaration) =
let L1 be a copy of L where L1[$var_i] = 0 for i = 1, ..., n
G, L1, regular
E(G, L, <var_1, ..., var_n := rhs>: Assignment) =
let G1, L1, v1, ..., vn = E(G, L, rhs)
let L2 be a copy of L1 where L2[$var_i] = vi for i = 1, ..., n
G1, L2, regular
E(G, L, <for { i1, ..., in } condition post body>: ForLoop) =
if n >= 1:
let G1, L1, mode = E(G, L, i1, ..., in)
// mode has to be regular or leave due to the syntactic restrictions
if mode is leave then
G1, L1 restricted to variables of L, leave
otherwise
let G2, L2, mode = E(G1, L1, for {} condition post body)
G2, L2 restricted to variables of L, mode
else:
let G1, L1, v = E(G, L, condition)
if v is false:
G1, L1, regular
else:
let G2, L2, mode = E(G1, L1, body)
if mode is break:
G2, L2, regular
otherwise if mode is leave:
G2, L2, leave
else:
G3, L3, mode = E(G2, L2, post)
if mode is leave:
G3, L3, leave
otherwise
E(G3, L3, for {} condition post body)
E(G, L, break: BreakContinue) =
G, L, break
E(G, L, continue: BreakContinue) =
G, L, continue
E(G, L, leave: Leave) =
G, L, leave
E(G, L, <if condition body>: If) =
let G0, L0, v = E(G, L, condition)
if v is true:
E(G0, L0, body)
else:
G0, L0, regular
E(G, L, <switch condition case l1:t1 st1 ... case ln:tn stn>: Switch) =
E(G, L, switch condition case l1:t1 st1 ... case ln:tn stn default {})
E(G, L, <switch condition case l1:t1 st1 ... case ln:tn stn default st'>: Switch) =
let G0, L0, v = E(G, L, condition)
// i = 1 .. n
// Evaluate literals, context doesn't matter
let _, _, v1 = E(G0, L0, l1)
...
let _, _, vn = E(G0, L0, ln)
if there exists smallest i such that vi = v:
E(G0, L0, sti)
else:
E(G0, L0, st')
E(G, L, <name>: Identifier) =
G, L, L[$name]
E(G, L, <fname(arg1, ..., argn)>: FunctionCall) =
G1, L1, vn = E(G, L, argn)
...
G(n-1), L(n-1), v2 = E(G(n-2), L(n-2), arg2)
Gn, Ln, v1 = E(G(n-1), L(n-1), arg1)
Let <function fname (param1, ..., paramn) -> ret1, ..., retm block>
be the function of name $fname visible at the point of the call.
Let L' be a new local state such that
L'[$parami] = vi and L'[$reti] = 0 for all i.
Let G'', L'', mode = E(Gn, L', block)
G'', Ln, L''[$ret1], ..., L''[$retm]
E(G, L, l: HexLiteral) = G, L, hexString(l),
where hexString decodes l from hex and left-aligns it into 32 bytes
E(G, L, l: StringLiteral) = G, L, utf8EncodeLeftAligned(l),
where utf8EncodeLeftAligned performs a utf8 encoding of l
and aligns it left into 32 bytes
E(G, L, n: HexNumber) = G, L, hex(n)
where hex is the hexadecimal decoding function
E(G, L, n: DecimalNumber) = G, L, dec(n),
where dec is the decimal decoding function
.. _opcodes:
EVM Dialect
-----------
The default dialect of Yul currently is the EVM dialect for the currently selected version of the EVM.
The only type available in this dialect
is ``u256``, the 256-bit native type of the Ethereum Virtual Machine.
Since it is the default type of this dialect, it can be omitted.
The following table lists all builtin functions
(depending on the EVM version) and provides a short description of the
semantics of the function / opcode.
This document is not intended to be a full description of the Ethereum virtual machine.
Please refer to a different document if you are interested in the precise semantics.
Opcodes marked with ``-`` do not return a result and all others return exactly one value.
Opcodes marked with ``F``, ``H``, ``B``, ``C`` or ``I`` are present since Frontier, Homestead,
Byzantium, Constantinople or Istanbul, respectively.
In the following, ``mem[a...b)`` signifies the bytes of memory starting at position ``a`` up to
but not including position ``b`` and ``storage[p]`` signifies the storage contents at slot ``p``.
Since Yul manages local variables and control-flow,
opcodes that interfere with these features are not available. This includes
the ``dup`` and ``swap`` instructions as well as ``jump`` instructions, labels and the ``push`` instructions.
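As a brief illustration of the builtin functions and of the ``mem[a...b)`` notation,
the following sketch hashes the call data and returns the hash. The memory offsets are
arbitrary choices for the example, and no particular memory layout convention is assumed.

.. code-block:: yul

    {
        // Copy the whole call data to mem[0x80...(0x80+size)).
        let size := calldatasize()
        calldatacopy(0x80, 0, size)
        // keccak256(p, n) hashes mem[p...(p+n)).
        let hash := keccak256(0x80, size)
        // storage[0] := hash
        sstore(0, hash)
        // Write the hash to mem[0...32) and end execution, returning that range.
        mstore(0, hash)
        return(0, 32)
    }
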
+-------------------------+-----+---+-----------------------------------------------------------------+
| Instruction | | | Explanation |
+=========================+=====+===+=================================================================+
| stop() | `-` | F | stop execution, identical to return(0, 0) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| add(x, y) | | F | x + y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sub(x, y) | | F | x - y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mul(x, y) | | F | x * y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| div(x, y) | | F | x / y or 0 if y == 0 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sdiv(x, y) | | F | x / y, for signed numbers in two's complement, 0 if y == 0 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mod(x, y) | | F | x % y, 0 if y == 0 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| smod(x, y) | | F | x % y, for signed numbers in two's complement, 0 if y == 0 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| exp(x, y) | | F | x to the power of y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| not(x) | | F | bitwise "not" of x (every bit of x is negated) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| lt(x, y) | | F | 1 if x < y, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gt(x, y) | | F | 1 if x > y, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| slt(x, y) | | F | 1 if x < y, 0 otherwise, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sgt(x, y) | | F | 1 if x > y, 0 otherwise, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| eq(x, y) | | F | 1 if x == y, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| iszero(x) | | F | 1 if x == 0, 0 otherwise |
+-------------------------+-----+---+-----------------------------------------------------------------+
| and(x, y) | | F | bitwise "and" of x and y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| or(x, y) | | F | bitwise "or" of x and y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| xor(x, y) | | F | bitwise "xor" of x and y |
+-------------------------+-----+---+-----------------------------------------------------------------+
| byte(n, x) | | F | nth byte of x, where the most significant byte is the 0th byte |
+-------------------------+-----+---+-----------------------------------------------------------------+
| shl(x, y) | | C | logical shift left y by x bits |
+-------------------------+-----+---+-----------------------------------------------------------------+
| shr(x, y) | | C | logical shift right y by x bits |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sar(x, y) | | C | signed arithmetic shift right y by x bits |
+-------------------------+-----+---+-----------------------------------------------------------------+
| addmod(x, y, m) | | F | (x + y) % m with arbitrary precision arithmetic, 0 if m == 0 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mulmod(x, y, m) | | F | (x * y) % m with arbitrary precision arithmetic, 0 if m == 0 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| signextend(i, x) | | F | sign extend from (i*8+7)th bit counting from least significant |
+-------------------------+-----+---+-----------------------------------------------------------------+
| keccak256(p, n) | | F | keccak(mem[p...(p+n))) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| pc() | | F | current position in code |
+-------------------------+-----+---+-----------------------------------------------------------------+
| pop(x) | `-` | F | discard value x |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mload(p) | | F | mem[p...(p+32)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mstore(p, v) | `-` | F | mem[p...(p+32)) := v |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mstore8(p, v) | `-` | F | mem[p] := v & 0xff (only modifies a single byte) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sload(p) | | F | storage[p] |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sstore(p, v) | `-` | F | storage[p] := v |
+-------------------------+-----+---+-----------------------------------------------------------------+
| msize() | | F | size of memory, i.e. largest accessed memory index |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gas() | | F | gas still available to execution |
+-------------------------+-----+---+-----------------------------------------------------------------+
| address() | | F | address of the current contract / execution context |
+-------------------------+-----+---+-----------------------------------------------------------------+
| balance(a) | | F | wei balance at address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| selfbalance() | | I | equivalent to balance(address()), but cheaper |
+-------------------------+-----+---+-----------------------------------------------------------------+
| caller() | | F | call sender (excluding ``delegatecall``) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| callvalue() | | F | wei sent together with the current call |
+-------------------------+-----+---+-----------------------------------------------------------------+
| calldataload(p) | | F | call data starting from position p (32 bytes) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| calldatasize() | | F | size of call data in bytes |
+-------------------------+-----+---+-----------------------------------------------------------------+
| calldatacopy(t, f, s) | `-` | F | copy s bytes from calldata at position f to mem at position t |
+-------------------------+-----+---+-----------------------------------------------------------------+
| codesize() | | F | size of the code of the current contract / execution context |
+-------------------------+-----+---+-----------------------------------------------------------------+
| codecopy(t, f, s) | `-` | F | copy s bytes from code at position f to mem at position t |
+-------------------------+-----+---+-----------------------------------------------------------------+
| extcodesize(a) | | F | size of the code at address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| extcodecopy(a, t, f, s) | `-` | F | like codecopy(t, f, s) but take code at address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| returndatasize() | | B | size of the last returndata |
+-------------------------+-----+---+-----------------------------------------------------------------+
| returndatacopy(t, f, s) | `-` | B | copy s bytes from returndata at position f to mem at position t |
+-------------------------+-----+---+-----------------------------------------------------------------+
| extcodehash(a) | | C | code hash of address a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| create(v, p, n) | | F | create new contract with code mem[p...(p+n)) and send v wei |
| | | | and return the new address |
+-------------------------+-----+---+-----------------------------------------------------------------+
| create2(v, p, n, s) | | C | create new contract with code mem[p...(p+n)) at address |
| | | | keccak256(0xff . this . s . keccak256(mem[p...(p+n))) |
| | | | and send v wei and return the new address, where ``0xff`` is a |
| | | | 1 byte value, ``this`` is the current contract's address |
| | | | as a 20 byte value and ``s`` is a big-endian 256-bit value |
+-------------------------+-----+---+-----------------------------------------------------------------+
| call(g, a, v, in, | | F | call contract at address a with input mem[in...(in+insize)) |
| insize, out, outsize) | | | providing g gas and v wei and output area |
| | | | mem[out...(out+outsize)) returning 0 on error (e.g. out of gas) |
| | | | and 1 on success |
| | | | :ref:`See more <yul-call-return-area>` |
+-------------------------+-----+---+-----------------------------------------------------------------+
| callcode(g, a, v, in, | | F | identical to ``call`` but only use the code from a and stay |
| insize, out, outsize) | | | in the context of the current contract otherwise |
| | | | :ref:`See more <yul-call-return-area>` |
+-------------------------+-----+---+-----------------------------------------------------------------+
| delegatecall(g, a, in, | | H | identical to ``callcode`` but also keep ``caller`` |
| insize, out, outsize) | | | and ``callvalue`` |
| | | | :ref:`See more <yul-call-return-area>` |
+-------------------------+-----+---+-----------------------------------------------------------------+
| staticcall(g, a, in, | | B | identical to ``call(g, a, 0, in, insize, out, outsize)`` but do |
| insize, out, outsize) | | | not allow state modifications |
| | | | :ref:`See more <yul-call-return-area>` |
+-------------------------+-----+---+-----------------------------------------------------------------+
| return(p, s) | `-` | F | end execution, return data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| revert(p, s) | `-` | B | end execution, revert state changes, return data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| selfdestruct(a) | `-` | F | end execution, destroy current contract and send funds to a |
+-------------------------+-----+---+-----------------------------------------------------------------+
| invalid() | `-` | F | end execution with invalid instruction |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log0(p, s) | `-` | F | log without topics and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log1(p, s, t1) | `-` | F | log with topic t1 and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log2(p, s, t1, t2) | `-` | F | log with topics t1, t2 and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log3(p, s, t1, t2, t3) | `-` | F | log with topics t1, t2, t3 and data mem[p...(p+s)) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| log4(p, s, t1, t2, t3, | `-` | F | log with topics t1, t2, t3, t4 and data mem[p...(p+s)) |
| t4) | | | |
+-------------------------+-----+---+-----------------------------------------------------------------+
| chainid() | | I | ID of the executing chain (EIP 1344) |
+-------------------------+-----+---+-----------------------------------------------------------------+
| origin() | | F | transaction sender |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gasprice() | | F | gas price of the transaction |
+-------------------------+-----+---+-----------------------------------------------------------------+
| blockhash(b) | | F | hash of block nr b - only for last 256 blocks excluding current |
+-------------------------+-----+---+-----------------------------------------------------------------+
| coinbase() | | F | current mining beneficiary |
+-------------------------+-----+---+-----------------------------------------------------------------+
| timestamp() | | F | timestamp of the current block in seconds since the epoch |
+-------------------------+-----+---+-----------------------------------------------------------------+
| number() | | F | current block number |
+-------------------------+-----+---+-----------------------------------------------------------------+
| difficulty() | | F | difficulty of the current block |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gaslimit() | | F | block gas limit of the current block |
+-------------------------+-----+---+-----------------------------------------------------------------+
.. _yul-call-return-area:

.. note::
    The ``call*`` instructions use the ``out`` and ``outsize`` parameters to define an area in memory where
    the return or failure data is placed. How much of this area is written depends on how many bytes the
    called contract returns. If it returns more than ``outsize`` bytes, only the first ``outsize`` bytes are
    written; you can access the rest of the data using the ``returndatacopy`` opcode. If it returns fewer than
    ``outsize`` bytes, the remaining bytes of the area are not touched at all and retain the values they had
    before the call. Use the ``returndatasize`` opcode to check which part of this memory area contains the
    return data. The same applies if the call fails (it returns ``0``): any failure data, such as revert data,
    is written to the area and can also be retrieved using ``returndatacopy``.
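
The handling of the output area described in the note above can be sketched as follows.
This is only an illustration: it assumes that the address to call is passed in the first
word of the call data, and the 32-byte output area at memory position 0 is an arbitrary choice.

.. code-block:: yul

    {
        // Assumption for this sketch: the target address is the first calldata word.
        let target := calldataload(0)
        // Call with empty input and a 32-byte output area at memory position 0.
        let success := call(gas(), target, 0, 0, 0, 0, 0x20)
        // Number of bytes the callee actually returned.
        let size := returndatasize()
        if iszero(success) {
            // Forward the complete failure data, not only the first 32 bytes.
            returndatacopy(0, 0, size)
            revert(0, size)
        }
        // If fewer than 32 bytes were returned, part of the output area
        // still holds whatever was in memory before the call.
        if lt(size, 0x20) { revert(0, 0) }
        return(0, 0x20)
    }

Checking ``returndatasize()`` before reading the output area avoids mistaking stale memory
contents for return data.
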
In some internal dialects, there are additional functions:
datasize, dataoffset, datacopy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The functions ``datasize(x)``, ``dataoffset(x)`` and ``datacopy(t, f, l)``
are used to access other parts of a Yul object.
``datasize`` and ``dataoffset`` can only take string literals (the names of other objects)
as arguments and return the size and offset in the data area, respectively.
For the EVM, the ``datacopy`` function is equivalent to ``codecopy``.
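
As a minimal sketch of how these functions fit together, the following object copies one of its
data sections to memory and returns it. The names ``Example`` and ``Message`` are made up for this
illustration; the object syntax itself is specified in the section on Yul objects below.

.. code-block:: yul

    object "Example" {
        code {
            // Size in bytes of the data section named "Message".
            let size := datasize("Message")
            // Copy the section to memory position 0.
            // On the EVM this behaves like codecopy.
            datacopy(0, dataoffset("Message"), size)
            return(0, size)
        }
        data "Message" "Hello, World!"
    }
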
setimmutable, loadimmutable
^^^^^^^^^^^^^^^^^^^^^^^^^^^
The functions ``setimmutable("name", value)`` and ``loadimmutable("name")`` are
used for the immutable mechanism in Solidity and do not map nicely to pure Yul.
The function ``setimmutable`` assumes that the runtime code of a contract
is currently copied to memory at offset zero. The call to ``setimmutable("name", value)``
will store ``value`` at all points in memory that contain a call to
``loadimmutable("name")``.
.. _yul-object:
Specification of Yul Object
===========================
Yul objects are used to group named code and data sections.
The functions ``datasize``, ``dataoffset`` and ``datacopy``
can be used to access these sections from within code.
Hex strings can be used to specify data in hex encoding,
regular strings in native encoding. For code,
``datacopy`` will access its assembled binary representation.
.. code-block:: none

    Object = 'object' StringLiteral '{' Code ( Object | Data )* '}'
    Code = 'code' Block
    Data = 'data' StringLiteral ( HexLiteral | StringLiteral )
    HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
    StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'

Above, ``Block`` refers to ``Block`` in the Yul code grammar explained in the previous chapter.
An example Yul Object is shown below:
.. code-block:: yul

    // A contract consists of a single object with sub-objects representing
    // the code to be deployed or other contracts it can create.
    // The single "code" node is the executable code of the object.
    // Every (other) named object or data section is serialized and
    // made accessible to the special built-in functions datacopy / dataoffset / datasize.
    // The current object, sub-objects and data items inside the current object
    // are in scope.
    object "Contract1" {
        // This is the constructor code of the contract.
        code {
            function allocate(size) -> ptr {
                ptr := mload(0x40)
                if iszero(ptr) { ptr := 0x60 }
                mstore(0x40, add(ptr, size))
            }

            // first create "Contract2"
            let size := datasize("Contract2")
            let offset := allocate(size)
            // This will turn into codecopy for EVM
            datacopy(offset, dataoffset("Contract2"), size)
            // constructor parameter is a single number 0x1234
            mstore(add(offset, size), 0x1234)
            // create(v, p, n): send 0 wei, code and parameter are in mem[offset...(offset+size+32))
            pop(create(0, offset, add(size, 32)))

            // now return the runtime object (the currently
            // executing code is the constructor code)
            size := datasize("runtime")
            offset := allocate(size)
            // This will turn into a memory->memory copy for eWASM and
            // a codecopy for EVM
            datacopy(offset, dataoffset("runtime"), size)
            return(offset, size)
        }

        data "Table2" hex"4123"

        object "runtime" {
            code {
                function allocate(size) -> ptr {
                    ptr := mload(0x40)
                    if iszero(ptr) { ptr := 0x60 }
                    mstore(0x40, add(ptr, size))
                }

                // runtime code
                mstore(0, "Hello, World!")
                return(0, 0x20)
            }
        }

        // Embedded object. Use case is that the outside is a factory contract,
        // and Contract2 is the code to be created by the factory
        object "Contract2" {
            code {
                // code here ...
            }

            object "runtime" {
                code {
                    // code here ...
                }
            }

            data "Table1" hex"4123"
        }
    }

Yul Optimizer
=============
The Yul optimizer operates on Yul code and uses the same language for input, output and
intermediate states. This allows for easy debugging and verification of the optimizer.
Please see the
`documentation in the source code <https://github.com/ethereum/solidity/blob/develop/libyul/optimiser/README.md>`_
for more details about its internals.
If you want to use Solidity in stand-alone Yul mode, you activate the optimizer using ``--optimize``:
.. code-block:: sh

    solc --strict-assembly --optimize
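
For illustration, a small stand-alone Yul program such as the following could serve as input
to the command above. The ``power`` function and the choice to read its arguments from call data
are assumptions made for this sketch, and no particular optimizer output is implied.

.. code-block:: yul

    {
        // Computes base ** exponent by repeated multiplication.
        function power(base, exponent) -> result {
            result := 1
            for { let i := 0 } lt(i, exponent) { i := add(i, 1) } {
                result := mul(result, base)
            }
        }
        // Assumption: base and exponent are the first two calldata words.
        sstore(0, power(calldataload(0), calldataload(0x20)))
    }
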
In Solidity mode, the Yul optimizer is activated together with the regular optimizer.
Optimization step sequence
--------------------------
By default, the Yul optimizer applies its predefined sequence of optimization steps to the generated assembly.
You can override this sequence and supply your own using the ``--yul-optimizations`` option when compiling
in Solidity mode:
.. code-block:: sh

    solc --optimize --ir-optimized --yul-optimizations 'dhfoD[xarrscLMcCTU]uljmul'

By enclosing part of the sequence in square brackets (``[]``), you tell the optimizer to repeatedly
apply that part until it no longer improves the size of the resulting assembly.
You can use brackets multiple times in a single sequence, but they cannot be nested.
The following optimization steps are available:
============ ===============================
Abbreviation Full name
============ ===============================
f            `BlockFlattener`
l            `CircularReferencesPruner`
c            `CommonSubexpressionEliminator`
C            `ConditionalSimplifier`
U            `ConditionalUnsimplifier`
n            `ControlFlowSimplifier`
D            `DeadCodeEliminator`
v            `EquivalentFunctionCombiner`
e            `ExpressionInliner`
j            `ExpressionJoiner`
s            `ExpressionSimplifier`
x            `ExpressionSplitter`
I            `ForLoopConditionIntoBody`
O            `ForLoopConditionOutOfBody`
o            `ForLoopInitRewriter`
i            `FullInliner`
g            `FunctionGrouper`
h            `FunctionHoister`
T            `LiteralRematerialiser`
L            `LoadResolver`
M            `LoopInvariantCodeMotion`
r            `RedundantAssignEliminator`
m            `Rematerialiser`
V            `SSAReverser`
a            `SSATransform`
t            `StructuralSimplifier`
u            `UnusedPruner`
d            `VarDeclInitializer`
============ ===============================
Some steps depend on properties ensured by `BlockFlattener`, `FunctionGrouper` and `ForLoopInitRewriter`.
For this reason, the Yul optimizer always applies these three steps before applying any steps supplied by the user.