mirror of
https://github.com/ethereum/solidity
synced 2023-10-03 13:03:40 +00:00
1018 lines
48 KiB
ReStructuredText
.. _yul:
|
|
|
|
###
|
|
Yul
|
|
###
|
|
|
|
.. index:: ! assembly, ! asm, ! evmasm, ! yul, julia, iulia
|
|
|
|
Yul (previously also called JULIA or IULIA) is an intermediate language that can be
|
|
compiled to bytecode for different backends.
|
|
|
|
Support for EVM 1.0, EVM 1.5 and eWASM is planned, and it is designed to
|
|
be a usable common denominator of all three
|
|
platforms. It can already be used in stand-alone mode and
|
|
for "inline assembly" inside Solidity
|
|
and there is an experimental implementation of the Solidity compiler
|
|
that uses Yul as an intermediate language. Yul is a good target for
|
|
high-level optimisation stages that can benefit all target platforms equally.
|
|
|
|
Motivation and High-level Description
|
|
=====================================
|
|
|
|
The design of Yul tries to achieve several goals:
|
|
|
|
1. Programs written in Yul should be readable, even if the code is generated by a compiler from Solidity or another high-level language.
|
|
2. Control flow should be easy to understand to help in manual inspection, formal verification and optimization.
|
|
3. The translation from Yul to bytecode should be as straightforward as possible.
|
|
4. Yul should be suitable for whole-program optimization.
|
|
|
|
In order to achieve the first and second goal, Yul provides high-level constructs
|
|
like ``for`` loops, ``if`` and ``switch`` statements and function calls. These should
|
|
be sufficient for adequately representing the control flow for assembly programs.
|
|
Therefore, no explicit statements for ``SWAP``, ``DUP``, ``JUMP`` and ``JUMPI``
|
|
are provided, because the first two obfuscate the data flow
|
|
and the last two obfuscate control flow. Furthermore, functional statements of
|
|
the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
|
|
``7 y x add mul`` because in the first form, it is much easier to see which
|
|
operand is used for which opcode.
|
|
|
|
Even though it was designed for stack machines, Yul does not expose the complexity of the stack itself.
|
|
The programmer or auditor should not have to worry about the stack.
|
|
|
|
The third goal is achieved by compiling the
|
|
higher level constructs to bytecode in a very regular way.
|
|
The only non-local operation performed
|
|
by the assembler is name lookup of user-defined identifiers (functions, variables, ...)
|
|
and cleanup of local variables from the stack.
|
|
|
|
To avoid confusion between concepts like values and references,
|
|
Yul is statically typed. At the same time, there is a default type
|
|
(usually the integer word of the target machine) that can always
|
|
be omitted to help readability.
|
|
|
|
To keep the language simple and flexible, Yul does not have
|
|
any built-in operations, functions or types in its pure form.
|
|
These are added together with their semantics when specifying a dialect of Yul,
|
|
which allows specializing Yul to the requirements of different
|
|
target platforms and feature sets.
|
|
|
|
Currently, there is only one specified dialect of Yul. This dialect uses
|
|
the EVM opcodes as builtin functions
|
|
(see below) and defines only the type ``u256``, which is the native 256-bit
|
|
type of the EVM. Because of that, we will not provide types in the examples below.
|
|
|
|
|
|
Simple Example
|
|
==============
|
|
|
|
The following example program is written in the EVM dialect and computes exponentiation.
|
|
It can be compiled using ``solc --strict-assembly``. The builtin functions
|
|
``mul`` and ``div`` compute product and division, respectively.
|
|
|
|
.. code::

    {
        function power(base, exponent) -> result
        {
            switch exponent
            case 0 { result := 1 }
            case 1 { result := base }
            default
            {
                result := power(mul(base, base), div(exponent, 2))
                switch mod(exponent, 2)
                    case 1 { result := mul(base, result) }
            }
        }
    }
|
|
|
|
It is also possible to implement the same function using a for-loop
instead of recursion. Here, ``lt(a, b)`` computes whether ``a`` is less than ``b``.
|
|
|
|
.. code::

    {
        function power(base, exponent) -> result
        {
            result := 1
            for { let i := 0 } lt(i, exponent) { i := add(i, 1) }
            {
                result := mul(result, base)
            }
        }
    }
|
|
|
|
|
|
|
|
|
|
Stand-Alone Usage
|
|
=================
|
|
|
|
You can use Yul in its stand-alone form in the EVM dialect using the Solidity compiler.
|
|
This will use the :ref:`Yul object notation <yul-object>` so that it is possible to refer
|
|
to code as data to deploy contracts. This Yul mode is available for the commandline compiler
|
|
(use ``--strict-assembly``) and for the :ref:`standard-json interface <compiler-api>`:
|
|
|
|
::

    {
        "language": "Yul",
        "sources": { "input.yul": { "content": "{ sstore(0, 1) }" } },
        "settings": {
            "outputSelection": { "*": { "*": ["*"], "": [ "*" ] } },
            "optimizer": { "enabled": true, "details": { "yul": true } }
        }
    }
|
|
|
|
.. warning::

    Yul is in active development and bytecode generation is only fully implemented for the EVM dialect of Yul
    with EVM 1.0 as target.
|
|
|
|
|
|
Informal Description of Yul
|
|
===========================
|
|
|
|
In the following, we will talk about each individual aspect
|
|
of the Yul language. In examples, we will use the default EVM dialect.
|
|
|
|
Syntax
|
|
------
|
|
|
|
Yul parses comments, literals and identifiers in the same way as Solidity,
|
|
so you can e.g. use ``//`` and ``/* */`` to denote comments.
|
|
There is one exception: Identifiers in Yul can contain dots: ``.``.
|
|
|
|
Yul can specify "objects" that consist of code, data and sub-objects.
|
|
Please see :ref:`Yul Objects <yul-object>` below for details on that.
|
|
In this section, we are only concerned with the code part of such an object.
|
|
This code part always consists of a curly-braces
|
|
delimited block. Most tools support specifying just a code block
|
|
where an object is expected.
|
|
|
|
Inside a code block, the following elements can be used
|
|
(see the later sections for more details):
|
|
|
|
- literals, e.g. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 characters)
|
|
- calls to builtin functions, e.g. ``add(1, mload(0))``
|
|
- variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned)
|
|
- identifiers (variables), e.g. ``add(3, x)``
|
|
- assignments, e.g. ``x := add(y, 3)``
|
|
- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
|
|
- if statements, e.g. ``if lt(a, b) { sstore(0, 1) }``
|
|
- switch statements, e.g. ``switch mload(0) case 0 { revert(0, 0) } default { mstore(0, 1) }``
|
|
- for loops, e.g. ``for { let i := 0} lt(i, 10) { i := add(i, 1) } { mstore(i, 7) }``
|
|
- function definitions, e.g. ``function f(a, b) -> c { c := add(a, b) }``
|
|
|
|
Multiple syntactical elements can follow each other simply separated by
|
|
whitespace, i.e. there is no terminating ``;`` or newline required.
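
As a small illustration of this rule (a sketch in the EVM dialect, written for this section), the following block
places three statements on a single line, separated only by spaces:

.. code::

    {
        // A declaration, another declaration and a function call, with no separators.
        let x := 1 let y := 2 mstore(0, add(x, y))
    }

Putting each statement on its own line is usually easier to read, but the parser treats both layouts the same way.
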
|
|
|
|
Literals
|
|
--------
|
|
|
|
You can use integer constants in decimal or hexadecimal notation.
|
|
When compiling for the EVM, this will be translated into an
|
|
appropriate ``PUSHi`` instruction. In the following example,
|
|
``3`` and ``2`` are added resulting in 5 and then the
|
|
bitwise ``and`` with the string "abc" is computed.
|
|
The final value is assigned to a local variable called ``x``.
|
|
Strings are stored left-aligned and cannot be longer than 32 bytes.
|
|
|
|
.. code::

    let x := and("abc", add(3, 2))
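
To make the left-alignment of strings concrete, the following sketch (EVM dialect, variable names chosen only
for illustration) compares ``"abc"`` with the 32-byte word it stands for. Because the string occupies the
most significant bytes and ``5`` only has low-order bits set, the ``and`` in the example above evaluates to zero.

.. code::

    {
        // "abc" is the bytes 0x61 0x62 0x63 followed by 29 zero bytes.
        let s := "abc"
        let same := eq(s, 0x6162630000000000000000000000000000000000000000000000000000000000)
        // same is 1 here.
        let x := and(s, add(3, 2))
        // x is 0 here: the two operands have no set bits in common.
    }
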
|
|
|
|
Unless it is the default type, the type of a literal
|
|
has to be specified after a colon:
|
|
|
|
.. code::

    let x := and("abc":uint32, add(3:uint256, 2:uint256))
|
|
|
|
|
|
Function Calls
|
|
--------------
|
|
|
|
Both built-in and user-defined functions (see below) can be called
|
|
in the same way as shown in the previous example.
|
|
If the function returns a single value, it can be directly used
|
|
inside an expression again. If it returns multiple values,
|
|
they have to be assigned to local variables.
|
|
|
|
.. code::

    mstore(0x80, add(mload(0x80), 3))
    // Here, the user-defined function `f` returns
    // two values. The definition of the function
    // is missing from the example.
    let x, y := f(1, mload(0))
|
|
|
|
For built-in functions of the EVM, functional expressions
|
|
can be directly translated to a stream of opcodes:
|
|
You just read the expression from right to left to obtain the
|
|
opcodes. In the case of the first line in the example, this
|
|
is ``PUSH1 3 PUSH1 0x80 MLOAD ADD PUSH1 0x80 MSTORE``.
|
|
|
|
For calls to user-defined functions, the arguments are also
|
|
put on the stack from right to left and this is the order
|
|
in which argument lists are evaluated. The return values,
|
|
though, are expected on the stack from left to right,
|
|
i.e. in this example, ``y`` is on top of the stack and ``x``
|
|
is below it.
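
The right-to-left evaluation order becomes observable once arguments have side effects. The following sketch
assumes that storage slot 0 is initially zero; the helper functions ``next`` and ``pair`` are hypothetical and
exist only for this illustration.

.. code::

    {
        // Increments storage[0] and returns the new value.
        function next() -> v {
            v := add(sload(0), 1)
            sstore(0, v)
        }
        // Returns its two arguments unchanged.
        function pair(a, b) -> first, second {
            first := a
            second := b
        }
        // The second argument is evaluated first, so it receives 1
        // and the first argument receives 2.
        let x, y := pair(next(), next())
        // x is 2 and y is 1 here.
    }
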
|
|
|
|
Variable Declarations
|
|
---------------------
|
|
|
|
You can use the ``let`` keyword to declare variables.
|
|
A variable is only visible inside the
|
|
``{...}``-block it was defined in. When compiling to the EVM,
|
|
a new stack slot is created that is reserved
|
|
for the variable and automatically removed again when the end of the block
|
|
is reached. You can provide an initial value for the variable.
|
|
If you do not provide a value, the variable will be initialized to zero.
|
|
|
|
Since variables are stored on the stack, they do not directly
|
|
influence memory or storage, but they can be used as pointers
|
|
to memory or storage locations in the built-in functions
|
|
``mstore``, ``mload``, ``sstore`` and ``sload``.
|
|
Future dialects might introduce specific types for such pointers.
|
|
|
|
When a variable is referenced, its current value is copied.
|
|
For the EVM, this translates to a ``DUP`` instruction.
|
|
|
|
.. code::

    {
        let zero := 0
        let v := calldataload(zero)
        {
            let y := add(sload(v), 1)
            v := y
        } // y is "deallocated" here
        sstore(v, zero)
    } // v and zero are "deallocated" here
|
|
|
|
|
|
If the declared variable should have a type different from the default type,
|
|
you denote that following a colon. You can also declare multiple
|
|
variables in one statement when you assign from a function call
|
|
that returns multiple values.
|
|
|
|
.. code::

    {
        let zero:uint32 := 0:uint32
        let v:uint256, t:uint32 := f()
        let x, y := g()
    }
|
|
|
|
Depending on the optimiser settings, the compiler can free a stack slot
as soon as the variable has been used for
the last time, even though it is still in scope.
|
|
|
|
|
|
Assignments
|
|
-----------
|
|
|
|
Variables can be assigned to after their definition using the
|
|
``:=`` operator. It is possible to assign multiple
|
|
variables at the same time. For this, the number and types of the
|
|
values have to match.
|
|
If you want to assign the values returned from a function that has
|
|
multiple return parameters, you have to provide multiple variables.
|
|
|
|
.. code::

    let v := 0
    // re-assign v
    v := 2
    let t := add(v, 2)
    function f() -> a, b { }
    // assign multiple values
    v, t := f()
|
|
|
|
|
|
If
|
|
--
|
|
|
|
The if statement can be used for conditionally executing code.
|
|
No "else" block can be defined. Consider using "switch" instead (see below) if
|
|
you need multiple alternatives.
|
|
|
|
.. code::

    if eq(value, 0) { revert(0, 0) }
|
|
|
|
The curly braces for the body are required.
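
Since there is no "else", a two-way branch can be written as a ``switch`` over a condition that is either 0 or 1.
The following is a minimal sketch of that pattern in the EVM dialect; the variable name ``value`` and the choice of
storage slot are assumptions made for the example.

.. code::

    {
        let value := calldataload(0)
        // iszero returns 1 or 0, so the two branches below act as "if" and "else".
        switch iszero(value)
        case 1 { revert(0, 0) }
        default { sstore(0, value) }
    }
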
|
|
|
|
Switch
|
|
------
|
|
|
|
You can use a switch statement as an extended version of the if statement.
|
|
It takes the value of an expression and compares it to several literal constants.
|
|
The branch corresponding to the matching constant is taken.
|
|
Contrary to other programming languages, for safety reasons, control flow does
|
|
not continue from one case to the next. There can be a fallback or default
|
|
case called ``default`` which is taken if none of the literal constants matches.
|
|
|
|
.. code::

    {
        let x := 0
        switch calldataload(4)
        case 0 {
            x := calldataload(0x24)
        }
        default {
            x := calldataload(0x44)
        }
        sstore(0, div(x, 2))
    }
|
|
|
|
The list of cases is not enclosed by curly braces, but the body of a
|
|
case does require them.
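
The absence of fall-through and the optional ``default`` can be seen in the following sketch
(EVM dialect; the names ``selector`` and ``hits`` are chosen only for this illustration):

.. code::

    {
        let selector := calldataload(0)
        let hits := 0
        switch selector
        case 1 { hits := add(hits, 1) }
        case 2 { hits := add(hits, 1) }
        // If selector is 1, only the first branch runs, so hits is 1 and not 2.
        // For any value other than 1 or 2, no branch runs and hits stays 0.
        sstore(0, hits)
    }
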
|
|
|
|
Loops
|
|
-----
|
|
|
|
Yul supports for-loops which consist of
|
|
a header containing an initializing part, a condition, a post-iteration
|
|
part and a body. The condition has to be an expression, while
|
|
the other three are blocks. If the initializing part
|
|
declares any variables at the top level, the scope of these variables extends to all other
|
|
parts of the loop.
|
|
|
|
The ``break`` and ``continue`` statements can be used in the body to exit the loop
|
|
or skip to the post-part, respectively.
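
As a sketch of how the two statements interact with the post-iteration part, the loop below sums the non-zero
words of a memory area and stops at a sentinel value. The sentinel ``0xff`` is an arbitrary choice for this example.

.. code::

    {
        let sum := 0
        for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
            let word := mload(i)
            // Skip zero words. The post-part still runs, so i advances.
            if iszero(word) { continue }
            // Leave the loop entirely. The post-part does not run again.
            if eq(word, 0xff) { break }
            sum := add(sum, word)
        }
    }
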
|
|
|
|
The following example computes the sum of an area in memory.
|
|
|
|
.. code::

    {
        let x := 0
        for { let i := 0 } lt(i, 0x100) { i := add(i, 0x20) } {
            x := add(x, mload(i))
        }
    }
|
|
|
|
For loops can also be used as a replacement for while loops:
|
|
Simply leave the initialization and post-iteration parts empty.
|
|
|
|
.. code::

    {
        let x := 0
        let i := 0
        for { } lt(i, 0x100) { } {     // while(i < 0x100)
            x := add(x, mload(i))
            i := add(i, 0x20)
        }
    }
|
|
|
|
Function Declarations
|
|
---------------------
|
|
|
|
Yul allows the definition of functions. These should not be confused with functions
|
|
in Solidity since they are never part of an external interface of a contract and
|
|
are part of a namespace separate from the one for Solidity functions.
|
|
|
|
For the EVM, Yul functions take their
|
|
arguments (and a return PC) from the stack and also put the results onto the
|
|
stack. User-defined functions and built-in functions are called in exactly the same way.
|
|
|
|
Functions can be defined anywhere and are visible in the block they are
|
|
declared in. Inside a function, you cannot access local variables
|
|
defined outside of that function.
|
|
|
|
Functions declare parameters and return variables, similar to Solidity.
|
|
To return a value, you assign it to the return variable(s).
|
|
|
|
If you call a function that returns multiple values, you have to assign
|
|
them to multiple variables using ``a, b := f(x)`` or ``let a, b := f(x)``.
|
|
|
|
The ``leave`` statement can be used to exit the current function. It
works like the ``return`` statement in other languages, except that it does
not take a value to return: it just exits the function, and the function
returns whatever values are currently assigned to the return variable(s).
|
|
|
|
Note that the EVM dialect has a built-in function called ``return`` that
|
|
quits the full execution context (internal message call) and not just
|
|
the current Yul function.
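
The following sketch shows ``leave`` used for an early exit. The function name ``firstNonZero`` and the scanned
memory range are assumptions made for this example.

.. code::

    {
        // Returns the position of the first non-zero word in mem[0...0x100),
        // or 0x100 if all words are zero.
        function firstNonZero() -> pos {
            for { pos := 0 } lt(pos, 0x100) { pos := add(pos, 0x20) } {
                // leave exits the function; pos keeps its current value.
                if mload(pos) { leave }
            }
        }
        sstore(0, firstNonZero())
    }

Because the result lives in the return variable, no value has to be passed to ``leave``.
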
|
|
|
|
The following example implements the power function by square-and-multiply.
|
|
|
|
.. code::

    {
        function power(base, exponent) -> result {
            switch exponent
            case 0 { result := 1 }
            case 1 { result := base }
            default {
                result := power(mul(base, base), div(exponent, 2))
                switch mod(exponent, 2)
                    case 1 { result := mul(base, result) }
            }
        }
    }
|
|
|
|
Specification of Yul
|
|
====================
|
|
|
|
This chapter describes Yul code formally. Yul code is usually placed inside Yul objects,
|
|
which are explained in their own chapter.
|
|
|
|
Grammar::

    Block = '{' Statement* '}'
    Statement =
        Block |
        FunctionDefinition |
        VariableDeclaration |
        Assignment |
        If |
        Expression |
        Switch |
        ForLoop |
        BreakContinue |
        Leave
    FunctionDefinition =
        'function' Identifier '(' TypedIdentifierList? ')'
        ( '->' TypedIdentifierList )? Block
    VariableDeclaration =
        'let' TypedIdentifierList ( ':=' Expression )?
    Assignment =
        IdentifierList ':=' Expression
    Expression =
        FunctionCall | Identifier | Literal
    If =
        'if' Expression Block
    Switch =
        'switch' Expression ( Case+ Default? | Default )
    Case =
        'case' Literal Block
    Default =
        'default' Block
    ForLoop =
        'for' Block Expression Block Block
    BreakContinue =
        'break' | 'continue'
    Leave = 'leave'
    FunctionCall =
        Identifier '(' ( Expression ( ',' Expression )* )? ')'
    Identifier = [a-zA-Z_$] [a-zA-Z_$0-9.]*
    IdentifierList = Identifier ( ',' Identifier)*
    TypeName = Identifier
    TypedIdentifierList = Identifier ( ':' TypeName )? ( ',' Identifier ( ':' TypeName )? )*
    Literal =
        (NumberLiteral | StringLiteral | HexLiteral | TrueLiteral | FalseLiteral) ( ':' TypeName )?
    NumberLiteral = HexNumber | DecimalNumber
    HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
    StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'
    TrueLiteral = 'true'
    FalseLiteral = 'false'
    HexNumber = '0x' [0-9a-fA-F]+
    DecimalNumber = [0-9]+
|
|
|
|
|
|
Restrictions on the Grammar
|
|
---------------------------
|
|
|
|
Apart from those directly imposed by the grammar, the following
|
|
restrictions apply:
|
|
|
|
Switches must have at least one case (including the default case).
|
|
All case values need to have the same type and distinct values.
|
|
If all possible values of the expression type are covered, a default case is
|
|
not allowed (i.e. a switch with a ``bool`` expression that has both a
true and a false case does not allow a default case).
|
|
|
|
Every expression evaluates to zero or more values. Identifiers and Literals
|
|
evaluate to exactly
|
|
one value and function calls evaluate to a number of values equal to the
|
|
number of return variables of the function called.
|
|
|
|
In variable declarations and assignments, the right-hand-side expression
|
|
(if present) has to evaluate to a number of values equal to the number of
|
|
variables on the left-hand-side.
|
|
This is the only situation where an expression evaluating
|
|
to more than one value is allowed.
|
|
|
|
Expressions that are also statements (i.e. at the block level) have to
|
|
evaluate to zero values.
|
|
|
|
In all other situations, expressions have to evaluate to exactly one value.
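
The rules about the number of values can be illustrated as follows. This is a sketch in the EVM dialect;
the function ``f`` is hypothetical.

.. code::

    {
        function f() -> a, b { a := 1 b := 2 }
        // mload returns one value, so it cannot stand alone as a statement.
        // pop consumes that value and returns nothing.
        pop(mload(0))
        // sstore returns no value and can be used directly as a statement.
        sstore(0, 1)
        // A call returning two values is only allowed on the right-hand side
        // of a declaration or assignment with two variables.
        let x, y := f()
        // Everywhere else, exactly one value is required, e.g. as an argument.
        mstore(0, add(x, y))
    }
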
|
|
|
|
The ``continue`` and ``break`` statements can only be used inside loop bodies
|
|
and have to be in the same function as the loop (or both have to be at the
|
|
top level). The ``continue`` and ``break`` statements cannot be used
|
|
in other parts of a loop, not even when it is scoped inside a second loop's body.
|
|
|
|
The condition part of the for-loop has to evaluate to exactly one value.
|
|
|
|
The ``leave`` statement can only be used inside a function.
|
|
|
|
Functions cannot be defined anywhere inside for loop init blocks.
|
|
|
|
Literals cannot be larger than their type. The largest type defined is 256-bit wide.
|
|
|
|
During assignments and function calls, the types of the respective values have to match.
|
|
There is no implicit type conversion. Type conversion in general can only be achieved
|
|
if the dialect provides an appropriate built-in function that takes a value of one
|
|
type and returns a value of a different type.
|
|
|
|
Scoping Rules
|
|
-------------
|
|
|
|
Scopes in Yul are tied to Blocks (exceptions are functions and the for loop
|
|
as explained below) and all declarations
|
|
(``FunctionDefinition``, ``VariableDeclaration``)
|
|
introduce new identifiers into these scopes.
|
|
|
|
Identifiers are visible in
|
|
the block they are defined in (including all sub-nodes and sub-blocks).
|
|
|
|
As an exception, the scope of the "init" part of the for-loop
|
|
(the first block) extends across all other parts of the for loop.
|
|
This means that variables declared in the init part (but not inside a
|
|
block inside the init part) are visible in all other parts of the for-loop.
|
|
|
|
Identifiers declared in the other parts of the for loop respect the regular
|
|
syntactical scoping rules.
|
|
|
|
This means a for-loop of the form ``for { I... } C { P... } { B... }`` is equivalent
|
|
to ``{ I... for {} C { P... } { B... } }``.
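
Written out for a concrete loop, the equivalence looks as follows. Both inner blocks are a sketch with arbitrary
loop bounds; they can each declare ``i`` because the scope of the first ``i`` ends with its loop.

.. code::

    {
        {
            // Original form: i is declared in the init part.
            for { let i := 0 } lt(i, 3) { i := add(i, 1) } { mstore(0, i) }
        }
        {
            // Equivalent form: the init part is moved into an enclosing block.
            {
                let i := 0
                for { } lt(i, 3) { i := add(i, 1) } { mstore(0, i) }
            }
        }
    }
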
|
|
|
|
|
|
The parameters and return parameters of functions are visible in the
|
|
function body and their names have to be distinct.
|
|
|
|
Variables can only be referenced after their declaration. In particular,
|
|
variables cannot be referenced in the right hand side of their own variable
|
|
declaration.
|
|
Functions can be referenced already before their declaration (if they are visible).
|
|
|
|
Shadowing is disallowed, i.e. you cannot declare an identifier at a point
|
|
where another identifier with the same name is also visible, even if it is
|
|
not accessible.
|
|
|
|
Inside functions, it is not possible to access a variable that was declared
|
|
outside of that function.
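
The following sketch collects these scoping rules in one place. Lines that would be rejected are shown as
comments; the names are chosen only for illustration.

.. code::

    {
        let x := 1
        // f is declared further down but can already be called here.
        let y := f(x)
        function f(a) -> r {
            // x and y are visible here but not accessible:
            // r := add(a, x)   // rejected: x was declared outside of f
            // let x := 2       // rejected: shadows the outer x
            r := add(a, 1)
        }
        {
            // let y := 3       // rejected: shadows the outer y
            let z := add(x, y)  // accepted: x and y are accessible in sub-blocks
        }
    }
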
|
|
|
|
Formal Specification
|
|
--------------------
|
|
|
|
We formally specify Yul by providing an evaluation function E overloaded
|
|
on the various nodes of the AST. As builtin functions can have side effects,
|
|
E takes two state objects and the AST node and returns two new
|
|
state objects and a variable number of other values.
|
|
The two state objects are the global state object
|
|
(which in the context of the EVM is the memory, storage and state of the
|
|
blockchain) and the local state object (the state of local variables, i.e. a
|
|
segment of the stack in the EVM).
|
|
|
|
If the AST node is a statement, E returns the two state objects and a "mode",
|
|
which is used for the ``break``, ``continue`` and ``leave`` statements.
|
|
If the AST node is an expression, E returns the two state objects and
|
|
as many values as the expression evaluates to.
|
|
|
|
|
|
The exact nature of the global state is unspecified for this high level
|
|
description. The local state ``L`` is a mapping of identifiers ``i`` to values ``v``,
|
|
denoted as ``L[i] = v``.
|
|
|
|
For an identifier ``v``, let ``$v`` be the name of the identifier.
|
|
|
|
We will use a destructuring notation for the AST nodes.
|
|
|
|
.. code::

    E(G, L, <{St1, ..., Stn}>: Block) =
        let G1, L1, mode = E(G, L, St1, ..., Stn)
        let L2 be a restriction of L1 to the identifiers of L
        G1, L2, mode
    E(G, L, St1, ..., Stn: Statement) =
        if n is zero:
            G, L, regular
        else:
            let G1, L1, mode = E(G, L, St1)
            if mode is regular then
                E(G1, L1, St2, ..., Stn)
            otherwise
                G1, L1, mode
    E(G, L, FunctionDefinition) =
        G, L, regular
    E(G, L, <let var_1, ..., var_n := rhs>: VariableDeclaration) =
        E(G, L, <var_1, ..., var_n := rhs>: Assignment)
    E(G, L, <let var_1, ..., var_n>: VariableDeclaration) =
        let L1 be a copy of L where L1[$var_i] = 0 for i = 1, ..., n
        G, L1, regular
    E(G, L, <var_1, ..., var_n := rhs>: Assignment) =
        let G1, L1, v1, ..., vn = E(G, L, rhs)
        let L2 be a copy of L1 where L2[$var_i] = vi for i = 1, ..., n
        G1, L2, regular
    E(G, L, <for { i1, ..., in } condition post body>: ForLoop) =
        if n >= 1:
            let G1, L1, mode = E(G, L, i1, ..., in)
            // mode has to be regular or leave due to the syntactic restrictions
            if mode is leave then
                G1, L1 restricted to variables of L, leave
            otherwise
                let G2, L2, mode = E(G1, L1, for {} condition post body)
                G2, L2 restricted to variables of L, mode
        else:
            let G1, L1, v = E(G, L, condition)
            if v is false:
                G1, L1, regular
            else:
                let G2, L2, mode = E(G1, L1, body)
                if mode is break:
                    G2, L2, regular
                otherwise if mode is leave:
                    G2, L2, leave
                else:
                    let G3, L3, mode = E(G2, L2, post)
                    if mode is leave:
                        G3, L3, leave
                    otherwise
                        E(G3, L3, for {} condition post body)
    E(G, L, break: BreakContinue) =
        G, L, break
    E(G, L, continue: BreakContinue) =
        G, L, continue
    E(G, L, leave: Leave) =
        G, L, leave
    E(G, L, <if condition body>: If) =
        let G0, L0, v = E(G, L, condition)
        if v is true:
            E(G0, L0, body)
        else:
            G0, L0, regular
    E(G, L, <switch condition case l1:t1 st1 ... case ln:tn stn>: Switch) =
        E(G, L, switch condition case l1:t1 st1 ... case ln:tn stn default {})
    E(G, L, <switch condition case l1:t1 st1 ... case ln:tn stn default st'>: Switch) =
        let G0, L0, v = E(G, L, condition)
        // i = 1 .. n
        // Evaluate literals, context doesn't matter
        let _, _, v1 = E(G0, L0, l1)
        ...
        let _, _, vn = E(G0, L0, ln)
        if there exists smallest i such that vi = v:
            E(G0, L0, sti)
        else:
            E(G0, L0, st')

    E(G, L, <name>: Identifier) =
        G, L, L[$name]
    E(G, L, <fname(arg1, ..., argn)>: FunctionCall) =
        G1, L1, vn = E(G, L, argn)
        ...
        G(n-1), L(n-1), v2 = E(G(n-2), L(n-2), arg2)
        Gn, Ln, v1 = E(G(n-1), L(n-1), arg1)
        Let <function fname (param1, ..., paramn) -> ret1, ..., retm block>
        be the function of name $fname visible at the point of the call.
        Let L' be a new local state such that
        L'[$parami] = vi and L'[$reti] = 0 for all i.
        Let G'', L'', mode = E(Gn, L', block)
        G'', Ln, L''[$ret1], ..., L''[$retm]
    E(G, L, l: HexLiteral) = G, L, hexString(l),
        where hexString decodes l from hex and left-aligns it into 32 bytes
    E(G, L, l: StringLiteral) = G, L, utf8EncodeLeftAligned(l),
        where utf8EncodeLeftAligned performs a utf8 encoding of l
        and aligns it left into 32 bytes
    E(G, L, n: HexNumber) = G, L, hex(n),
        where hex is the hexadecimal decoding function
    E(G, L, n: DecimalNumber) = G, L, dec(n),
        where dec is the decimal decoding function
|
|
|
|
.. _opcodes:
|
|
|
|
EVM Dialect
|
|
-----------
|
|
|
|
The default dialect of Yul currently is the EVM dialect for the currently selected
version of the EVM. The only type available in this dialect
|
|
is ``u256``, the 256-bit native type of the Ethereum Virtual Machine.
|
|
Since it is the default type of this dialect, it can be omitted.
|
|
|
|
The following table lists all builtin functions
|
|
(depending on the EVM version) and provides a short description of the
|
|
semantics of the function / opcode.
|
|
This document is not intended to be a full description of the Ethereum virtual machine.
|
|
Please refer to a different document if you are interested in the precise semantics.
|
|
|
|
Opcodes marked with ``-`` do not return a result and all others return exactly one value.
|
|
Opcodes marked with ``F``, ``H``, ``B``, ``C`` or ``I`` are present since Frontier, Homestead,
|
|
Byzantium, Constantinople or Istanbul, respectively.
|
|
|
|
In the following, ``mem[a...b)`` signifies the bytes of memory starting at position ``a`` up to
|
|
but not including position ``b`` and ``storage[p]`` signifies the storage contents at slot ``p``.
|
|
|
|
Since Yul manages local variables and control-flow,
|
|
opcodes that interfere with these features are not available. This includes
|
|
the ``dup`` and ``swap`` instructions as well as ``jump`` instructions, labels and the ``push`` instructions.
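
A few rows of the table below are easy to misread, in particular the argument order of the shift and ``byte``
builtins. The following sketch spells them out with concrete values. It assumes an EVM version of at least
Constantinople, because it uses ``shl``; the variable names are arbitrary.

.. code::

    {
        let a := div(7, 0)         // 0: division by zero yields zero
        let b := mod(7, 0)         // 0: modulo zero yields zero as well
        let c := shl(4, 1)         // 16: the shift amount comes first, the value second
        let d := byte(31, 0x1234)  // 0x34: byte 31 is the least significant byte
        let e := byte(0, 0x1234)   // 0: byte 0 is the most significant byte
    }
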
|
|
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
| Instruction             |     |   | Explanation                                                     |
+=========================+=====+===+=================================================================+
| stop()                  | `-` | F | stop execution, identical to return(0, 0)                       |
+-------------------------+-----+---+-----------------------------------------------------------------+
| add(x, y)               |     | F | x + y                                                           |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sub(x, y)               |     | F | x - y                                                           |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mul(x, y)               |     | F | x * y                                                           |
+-------------------------+-----+---+-----------------------------------------------------------------+
| div(x, y)               |     | F | x / y or 0 if y == 0                                            |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sdiv(x, y)              |     | F | x / y, for signed numbers in two's complement, 0 if y == 0      |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mod(x, y)               |     | F | x % y, 0 if y == 0                                              |
+-------------------------+-----+---+-----------------------------------------------------------------+
| smod(x, y)              |     | F | x % y, for signed numbers in two's complement, 0 if y == 0      |
+-------------------------+-----+---+-----------------------------------------------------------------+
| exp(x, y)               |     | F | x to the power of y                                             |
+-------------------------+-----+---+-----------------------------------------------------------------+
| not(x)                  |     | F | bitwise "not" of x (every bit of x is negated)                  |
+-------------------------+-----+---+-----------------------------------------------------------------+
| lt(x, y)                |     | F | 1 if x < y, 0 otherwise                                         |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gt(x, y)                |     | F | 1 if x > y, 0 otherwise                                         |
+-------------------------+-----+---+-----------------------------------------------------------------+
| slt(x, y)               |     | F | 1 if x < y, 0 otherwise, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sgt(x, y)               |     | F | 1 if x > y, 0 otherwise, for signed numbers in two's complement |
+-------------------------+-----+---+-----------------------------------------------------------------+
| eq(x, y)                |     | F | 1 if x == y, 0 otherwise                                        |
+-------------------------+-----+---+-----------------------------------------------------------------+
| iszero(x)               |     | F | 1 if x == 0, 0 otherwise                                        |
+-------------------------+-----+---+-----------------------------------------------------------------+
| and(x, y)               |     | F | bitwise "and" of x and y                                        |
+-------------------------+-----+---+-----------------------------------------------------------------+
| or(x, y)                |     | F | bitwise "or" of x and y                                         |
+-------------------------+-----+---+-----------------------------------------------------------------+
| xor(x, y)               |     | F | bitwise "xor" of x and y                                        |
+-------------------------+-----+---+-----------------------------------------------------------------+
| byte(n, x)              |     | F | nth byte of x, where the most significant byte is the 0th byte  |
+-------------------------+-----+---+-----------------------------------------------------------------+
| shl(x, y)               |     | C | logical shift left y by x bits                                  |
+-------------------------+-----+---+-----------------------------------------------------------------+
| shr(x, y)               |     | C | logical shift right y by x bits                                 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sar(x, y)               |     | C | signed arithmetic shift right y by x bits                       |
+-------------------------+-----+---+-----------------------------------------------------------------+
| addmod(x, y, m)         |     | F | (x + y) % m with arbitrary precision arithmetic, 0 if m == 0    |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mulmod(x, y, m)         |     | F | (x * y) % m with arbitrary precision arithmetic, 0 if m == 0    |
+-------------------------+-----+---+-----------------------------------------------------------------+
| signextend(i, x)        |     | F | sign extend from (i*8+7)th bit counting from least significant  |
+-------------------------+-----+---+-----------------------------------------------------------------+
| keccak256(p, n)         |     | F | keccak(mem[p...(p+n)))                                          |
+-------------------------+-----+---+-----------------------------------------------------------------+
| pc()                    |     | F | current position in code                                        |
+-------------------------+-----+---+-----------------------------------------------------------------+
| pop(x)                  | `-` | F | discard value x                                                 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mload(p)                |     | F | mem[p...(p+32))                                                 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mstore(p, v)            | `-` | F | mem[p...(p+32)) := v                                            |
+-------------------------+-----+---+-----------------------------------------------------------------+
| mstore8(p, v)           | `-` | F | mem[p] := v & 0xff (only modifies a single byte)                |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sload(p)                |     | F | storage[p]                                                      |
+-------------------------+-----+---+-----------------------------------------------------------------+
| sstore(p, v)            | `-` | F | storage[p] := v                                                 |
+-------------------------+-----+---+-----------------------------------------------------------------+
| msize()                 |     | F | size of memory, i.e. largest accessed memory index              |
+-------------------------+-----+---+-----------------------------------------------------------------+
| gas()                   |     | F | gas still available to execution                                |
+-------------------------+-----+---+-----------------------------------------------------------------+
| address()               |     | F | address of the current contract / execution context             |
+-------------------------+-----+---+-----------------------------------------------------------------+
| balance(a)              |     | F | wei balance at address a                                        |
+-------------------------+-----+---+-----------------------------------------------------------------+
| selfbalance()           |     | I | equivalent to balance(address()), but cheaper                   |
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| caller() | | F | call sender (excluding ``delegatecall``) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| callvalue() | | F | wei sent together with the current call |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| calldataload(p) | | F | call data starting from position p (32 bytes) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| calldatasize() | | F | size of call data in bytes |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| calldatacopy(t, f, s) | `-` | F | copy s bytes from calldata at position f to mem at position t |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| codesize() | | F | size of the code of the current contract / execution context |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| codecopy(t, f, s) | `-` | F | copy s bytes from code at position f to mem at position t |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| extcodesize(a) | | F | size of the code at address a |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| extcodecopy(a, t, f, s) | `-` | F | like codecopy(t, f, s) but take code at address a |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| returndatasize() | | B | size of the last returndata |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| returndatacopy(t, f, s) | `-` | B | copy s bytes from returndata at position f to mem at position t |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| extcodehash(a) | | C | code hash of address a |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| create(v, p, n) | | F | create new contract with code mem[p...(p+n)) and send v wei |
|
|
| | | | and return the new address |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| create2(v, p, n, s) | | C | create new contract with code mem[p...(p+n)) at address |
|
|
| | | | keccak256(0xff . this . s . keccak256(mem[p...(p+n)))) |
|
|
| | | | and send v wei and return the new address, where ``0xff`` is a |
|
|
| | | | 1 byte value, ``this`` is the current contract's address |
|
|
| | | | as a 20 byte value and ``s`` is a big-endian 256-bit value |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| call(g, a, v, in, | | F | call contract at address a with input mem[in...(in+insize)) |
|
|
| insize, out, outsize) | | | providing g gas and v wei and output area |
|
|
| | | | mem[out...(out+outsize)) returning 0 on error (e.g. out of gas) |
|
|
| | | | and 1 on success |
|
|
| | | | :ref:`See more <yul-call-return-area>` |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| callcode(g, a, v, in, | | F | identical to ``call`` but only use the code from a and stay |
|
|
| insize, out, outsize) | | | in the context of the current contract otherwise |
|
|
| | | | :ref:`See more <yul-call-return-area>` |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| delegatecall(g, a, in, | | H | identical to ``callcode`` but also keep ``caller`` |
|
|
| insize, out, outsize) | | | and ``callvalue`` |
|
|
| | | | :ref:`See more <yul-call-return-area>` |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| staticcall(g, a, in, | | B | identical to ``call(g, a, 0, in, insize, out, outsize)`` but do |
|
|
| insize, out, outsize) | | | not allow state modifications |
|
|
| | | | :ref:`See more <yul-call-return-area>` |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| return(p, s) | `-` | F | end execution, return data mem[p...(p+s)) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| revert(p, s) | `-` | B | end execution, revert state changes, return data mem[p...(p+s)) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| selfdestruct(a) | `-` | F | end execution, destroy current contract and send funds to a |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| invalid() | `-` | F | end execution with invalid instruction |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| log0(p, s) | `-` | F | log without topics and data mem[p...(p+s)) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| log1(p, s, t1) | `-` | F | log with topic t1 and data mem[p...(p+s)) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| log2(p, s, t1, t2) | `-` | F | log with topics t1, t2 and data mem[p...(p+s)) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| log3(p, s, t1, t2, t3) | `-` | F | log with topics t1, t2, t3 and data mem[p...(p+s)) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| log4(p, s, t1, t2, t3, | `-` | F | log with topics t1, t2, t3, t4 and data mem[p...(p+s)) |
|
|
| t4) | | | |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| chainid() | | I | ID of the executing chain (EIP 1344) |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| origin() | | F | transaction sender |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| gasprice() | | F | gas price of the transaction |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| blockhash(b) | | F | hash of block nr b - only for last 256 blocks excluding current |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| coinbase() | | F | current mining beneficiary |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| timestamp() | | F | timestamp of the current block in seconds since the epoch |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| number() | | F | current block number |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| difficulty() | | F | difficulty of the current block |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
| gaslimit() | | F | block gas limit of the current block |
|
|
+-------------------------+-----+---+-----------------------------------------------------------------+
|
|
|
|
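
The argument order of some of the builtins in the table is easy to get wrong, in particular for the
shift functions, which take the shift amount first. The following is a minimal sketch that applies a few
of them to constants; the variable names are arbitrary and only chosen for this illustration, and the
values in the comments follow from the descriptions in the table.

.. code::

    {
        // shl, shr and sar take the number of bits first and the value second.
        let a := shl(8, 0x01)          // 0x0100
        let b := shr(4, 0xf0)          // 0x0f
        // sub(0, 2) is -2 in two's complement; sar keeps the sign, so the result is -1.
        let c := sar(1, sub(0, 2))     // all 256 bits set
        // byte counts from the most significant byte, so index 31 is the lowest byte.
        let d := byte(31, 0x1234)      // 0x34
        // signextend(0, x) extends from bit 7 (the top bit of the lowest byte).
        let e := signextend(0, 0xff)   // all 256 bits set
    }
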
There are three additional functions, ``datasize(x)``, ``dataoffset(x)`` and ``datacopy(t, f, l)``,
which are used to access other parts of a Yul object.

``datasize`` and ``dataoffset`` can only take string literals (the names of other objects)
as arguments and return the size and offset in the data area, respectively.
For the EVM, the ``datacopy`` function is equivalent to ``codecopy``.
|
|
|
|
.. _yul-call-return-area:
|
|
|
|
.. note::
    The ``call*`` instructions use the ``out`` and ``outsize`` parameters to define an area in memory where
    the return or failure data is placed. How much of this area is written depends on how many bytes the
    called contract returns. If it returns more than ``outsize`` bytes, only the first ``outsize`` bytes are
    written; you can access the rest of the data using the ``returndatacopy`` opcode. If it returns fewer
    bytes, the remaining bytes of the area are not touched at all and retain the values they had before the
    call, so you need to use the ``returndatasize`` opcode to check which part of the area contains the
    return data. If the call fails (it returns ``0``) because the callee reverted, the failure data is placed
    in the area in the same way and can also be retrieved using ``returndatacopy``; for other failures, such
    as running out of gas, no data is returned and the area is left unchanged.
|
|
|
|
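
One way to avoid guessing the size of the output area is to pass an empty area to ``call`` and copy the
data afterwards, once its size is known. The sketch below assumes a Byzantium-compatible EVM (for
``returndatasize`` and ``returndatacopy``) and, purely for illustration, reads the target address from the
first word of the call data; it also uses memory from position 0 without any allocation scheme.

.. code::

    {
        // Hypothetical input: the address to call is passed as the first calldata word.
        let target := calldataload(0)
        // Forward all remaining gas, send no wei, pass no input and reserve no output area.
        let success := call(gas(), target, 0, 0, 0, 0, 0)
        // The size of the return or failure data is only known after the call.
        let size := returndatasize()
        returndatacopy(0, 0, size)
        // On failure, pass the failure data on to our own caller.
        if iszero(success) { revert(0, size) }
        return(0, size)
    }
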
.. _yul-object:
|
|
|
|
Specification of Yul Object
===========================

Yul objects are used to group named code and data sections.
The functions ``datasize``, ``dataoffset`` and ``datacopy``
can be used to access these sections from within code.
Hex strings can be used to specify data in hex encoding,
regular strings in native encoding. For code,
``datacopy`` will access its assembled binary representation.

Grammar::

    Object = 'object' StringLiteral '{' Code ( Object | Data )* '}'
    Code = 'code' Block
    Data = 'data' StringLiteral ( HexLiteral | StringLiteral )
    HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
    StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'

Above, ``Block`` refers to ``Block`` in the Yul code grammar explained in the previous chapter.

An example Yul Object is shown below:
|
|
|
|
.. code::

    // A contract consists of a single object with sub-objects representing
    // the code to be deployed or other contracts it can create.
    // The single "code" node is the executable code of the object.
    // Every (other) named object or data section is serialized and
    // made accessible to the special built-in functions datacopy / dataoffset / datasize
    // The current object, sub-objects and data items inside the current object
    // are in scope.
    object "Contract1" {
        // This is the constructor code of the contract.
        code {
            function allocate(size) -> ptr {
                ptr := mload(0x40)
                if iszero(ptr) { ptr := 0x60 }
                mstore(0x40, add(ptr, size))
            }

            // first create "Contract2"
            let size := datasize("Contract2")
            let offset := allocate(size)
            // This will turn into codecopy for EVM
            datacopy(offset, dataoffset("Contract2"), size)
            // constructor parameter is a single number 0x1234
            mstore(add(offset, size), 0x1234)
            // create(v, p, n): send 0 wei, code plus constructor parameter start at offset
            pop(create(0, offset, add(size, 32)))

            // now return the runtime object (the currently
            // executing code is the constructor code)
            size := datasize("runtime")
            offset := allocate(size)
            // This will turn into a memory->memory copy for eWASM and
            // a codecopy for EVM
            datacopy(offset, dataoffset("runtime"), size)
            return(offset, size)
        }

        data "Table2" hex"4123"

        object "runtime" {
            code {
                function allocate(size) -> ptr {
                    ptr := mload(0x40)
                    if iszero(ptr) { ptr := 0x60 }
                    mstore(0x40, add(ptr, size))
                }

                // runtime code

                mstore(0, "Hello, World!")
                return(0, 0x20)
            }
        }

        // Embedded object. Use case is that the outside is a factory contract,
        // and Contract2 is the code to be created by the factory
        object "Contract2" {
            code {
                // code here ...
            }

            object "runtime" {
                code {
                    // code here ...
                }
            }

            data "Table1" hex"4123"
        }
    }
|
|
|
|
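
Data sections are accessed in the same way as sub-objects. As a smaller sketch of the grammar above,
the following object deploys a runtime that returns the contents of a data section; the names
``Lookup`` and ``Table`` are hypothetical and only chosen for this illustration, and memory is used
from position 0 without any allocation scheme.

.. code::

    object "Lookup" {
        code {
            // Constructor: copy the runtime object to memory and return it for deployment.
            let size := datasize("runtime")
            datacopy(0, dataoffset("runtime"), size)
            return(0, size)
        }

        object "runtime" {
            code {
                // "Table" is a data item of the current object and therefore in scope.
                let size := datasize("Table")
                datacopy(0, dataoffset("Table"), size)
                return(0, size)
            }

            data "Table" hex"4123"
        }
    }
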
Yul Optimizer
=============

The Yul optimizer operates on Yul code and uses the same language for input, output and
intermediate states. This allows for easy debugging and verification of the optimizer.

Please see the
`documentation in the source code <https://github.com/ethereum/solidity/blob/develop/libyul/optimiser/README.md>`_
for more details about its internals.

If you want to use Solidity in stand-alone Yul mode, you activate the optimizer using ``--optimize``:

::

    solc --strict-assembly --optimize

In Solidity mode, the Yul optimizer is activated together with the regular optimizer.
|