Mirror of https://github.com/ethereum/solidity, synced 2023-10-03 13:03:40 +00:00

Move explanatory sections and other small changes.

parent e92af89ec8
commit ceac5c5a0c

@ -94,177 +94,11 @@ you really know what you are doing.
}
}

Syntax
------

Assembly parses comments, literals and identifiers exactly as Solidity, so you can use the
usual ``//`` and ``/* */`` comments. Inline assembly is marked by ``assembly { ... }`` and inside
these curly braces, the following can be used (see the later sections for more details):

@ -273,7 +107,7 @@ these curly braces, the following can be used (see the later sections for more d
- opcode in functional style, e.g. ``add(1, mload(0))``
- labels, e.g. ``name:``
- variable declarations, e.g. ``let x := 7`` or ``let x := add(y, 3)``
- identifiers (labels or assembly-local variables and externals if used as inline assembly), e.g. ``jump(name)``, ``3 x add``
- assignments (in "instruction style"), e.g. ``3 =: x``
- assignments in functional style, e.g. ``x := add(y, 3)``
- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
@ -535,7 +369,7 @@ jumps easier. The following code computes an element in the Fibonacci series.

Please note that automatically accessing stack variables can only work if the
assembler knows the current stack height. This fails to work if the jump source
and target have different stack heights. It is still fine to use such jumps, but
you should not access any stack variables (not even assembly-local variables) in that case.
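
The effect can be illustrated with a small Python model. It assumes each jump is recorded as a pair of target label and stack height at the jump source. The helper ``check_jump_targets`` is hypothetical and does not describe how the assembler works internally. It reports the labels that are reached with more than one stack height, which are exactly the places where a variable's stack offset is ambiguous.

.. code:: python

    from collections import defaultdict


    def check_jump_targets(jumps: list[tuple[str, int]]) -> dict[str, set[int]]:
        """Return the labels reached with more than one stack height.

        Each jump is (target_label, stack_height_at_source). At such a label the
        offset of a stack variable is not well defined, so variables should not
        be accessed there.
        """
        heights: dict[str, set[int]] = defaultdict(set)
        for label, height in jumps:
            heights[label].add(height)
        return {label: hs for label, hs in heights.items() if len(hs) > 1}


    # `loop` is always entered with height 2; `exit` with heights 2 and 3.
    print(check_jump_targets([("loop", 2), ("loop", 2), ("exit", 2), ("exit", 3)]))

This model still allows the jump to ``exit``. It only flags that code after ``exit:`` should not refer to stack variables by name.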

Furthermore, the stack height analyser goes through the code opcode by opcode
@ -593,11 +427,12 @@ Assignments are possible to assembly-local variables and to function-local
variables. Take care that when you assign to variables that point to
memory or storage, you will only change the pointer and not the data.

There are two kinds of assignments: functional-style and instruction-style.
For functional-style assignments (``variable := value``), you need to provide a value in a
functional-style expression that results in exactly one stack value,
and for instruction-style assignments (``=: variable``), the value is just taken from the stack top.
For both ways, the colon points to the name of the variable. The assignment
is performed by replacing the variable's value on the stack by the new value.
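
One way to replace a variable's value on the stack is a ``SWAPn`` followed by a ``POP``. The sketch below is a minimal Python model of that idea, not the assembler's code. It assumes the stack is a list with the top as the last element and that the new value has already been pushed. The helper ``assign`` is hypothetical.

.. code:: python

    def assign(stack: list, var_index: int) -> str:
        """Overwrite stack[var_index] with the value currently on top.

        SWAPn moves the new value into the variable's slot (and the old value
        to the top), then POP discards the old value. Returns the opcodes used.
        """
        n = len(stack) - 1 - var_index          # distance from the top
        if not 1 <= n <= 16:
            raise ValueError("variable must be reachable with SWAP1..SWAP16")
        stack[-1], stack[var_index] = stack[var_index], stack[-1]
        stack.pop()
        return f"swap{n} pop"


    # x = 1, y = 2, and the new value 7 (e.g. from add(y, 5)) is on top.
    stack = [1, 2, 7]
    print(assign(stack, 0), stack)              # assigns 7 to x

Afterwards the stack height is the same as before the value was computed, which is what lets the assembler keep tracking variable positions.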

.. code::

@ -615,7 +450,7 @@ You can use a switch statement as a very basic version of "if/else".
It takes the value of an expression and compares it to several constants.
The branch corresponding to the matching constant is taken. Contrary to the
error-prone behaviour of some programming languages, control flow does
not continue from one case to the next. There can be a fallback or default
case called ``default``.

.. code::
@ -623,8 +458,12 @@ case called ``default``.

    assembly {
        let x := 0
        switch calldataload(4)
        case 0: {
            x := calldataload(0x24)
        }
        default: {
            x := calldataload(0x44)
        }
        sstore(0, div(x, 2))
    }

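The semantics of this example can be mirrored in Python: exactly one branch runs, there is no fall-through, and ``default`` is optional. This is a sketch under stated assumptions, not the compiler. Call data is a ``bytes`` object, ``calldataload`` zero-pads past the end as the EVM does, storage is a plain ``dict``, and the helpers ``switch``, ``run_example`` and ``words`` are hypothetical.

.. code:: python

    def calldataload(calldata: bytes, offset: int) -> int:
        """Read a 32-byte big-endian word, zero-padded past the end of call data."""
        return int.from_bytes(calldata[offset:offset + 32].ljust(32, b"\x00"), "big")


    def switch(value: int, cases: dict, default=None):
        """Run exactly one branch: the matching case, else the default (if any)."""
        branch = cases.get(value, default)
        return branch() if branch is not None else None


    def run_example(calldata: bytes, storage: dict) -> None:
        """Python rendering of the assembly switch example above."""
        x = switch(
            calldataload(calldata, 4),
            {0: lambda: calldataload(calldata, 0x24)},
            default=lambda: calldataload(calldata, 0x44),
        )
        storage[0] = x // 2                      # sstore(0, div(x, 2))


    def words(*values: int) -> bytes:
        """Concatenate 32-byte big-endian words."""
        return b"".join(v.to_bytes(32, "big") for v in values)


    storage: dict = {}
    run_example(b"\x00" * 4 + words(0, 10, 20), storage)   # matches case 0
    print(storage[0])
    run_example(b"\x00" * 4 + words(1, 10, 20), storage)   # falls to default
    print(storage[0])
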
@ -724,8 +563,175 @@ first slot of the array and then only the array elements follow.
please do not rely on that.

Specification
=============

Standalone Assembly
===================

The assembly language described as inline assembly above can also be used
standalone and, in fact, the plan is to use it as an intermediate language
for the Solidity compiler. In this form, it tries to achieve several goals:

1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity.
2. The translation from assembly to bytecode should contain as few "surprises" as possible.
3. Control flow should be easy to detect to help in formal verification and optimization.

In order to achieve the first and last goals, assembly provides high-level constructs
like ``for`` loops, ``switch`` statements and function calls. It should be possible
to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``,
``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow
and the last two obfuscate control flow. Furthermore, functional statements of
the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
``7 y x add mul`` because in the first form, it is much easier to see which
operand is used for which opcode.
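
To make the correspondence between the two notations concrete, here is a minimal Python sketch (not the compiler's code) that flattens a functional-style expression into instruction style. It assumes expressions are modelled as nested tuples ``(opcode, arg1, arg2, ...)``, and the helper name ``to_instruction_style`` is hypothetical. Arguments are emitted last-to-first so that the first argument ends up on top of the stack. That is why ``mul(add(x, y), 7)`` becomes ``7 y x add mul``.

.. code:: python

    from typing import Union

    # An expression is a leaf (identifier or literal) or a tuple
    # (opcode, arg1, arg2, ...). This representation is assumed for illustration.
    Expr = Union[str, int, tuple]


    def to_instruction_style(expr: Expr) -> list[str]:
        """Flatten a functional-style expression into an opcode stream.

        Arguments are emitted in reverse order, so the first argument is
        pushed last and sits on top of the stack when the opcode runs.
        """
        if not isinstance(expr, tuple):
            return [str(expr)]
        opcode, *args = expr
        out: list[str] = []
        for arg in reversed(args):
            out.extend(to_instruction_style(arg))
        out.append(opcode)
        return out


    print(" ".join(to_instruction_style(("mul", ("add", "x", "y"), 7))))
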

The second goal is achieved by introducing a desugaring phase that only removes
the higher-level constructs in a very regular way and still allows inspecting
the generated low-level assembly code. The only non-local operations performed
by the assembler are name lookup of user-defined identifiers (functions, variables, ...),
which follows very simple and regular scoping rules, and cleanup of local variables from the stack.

Scoping: An identifier that is declared (label, variable, function, assembly)
is only visible in the block where it was declared (including nested blocks
inside the current block). It is not legal to access local variables across
function boundaries, even if they would be in scope. Shadowing is allowed, but
two identifiers with the same name cannot be declared in the same block.
Local variables cannot be accessed before they were declared, but labels,
functions and assemblies can. Assemblies are special blocks that are used,
for example, for returning runtime code or creating contracts. No identifier from an
outer assembly is visible in a sub-assembly.
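
These rules can be modelled with a stack of per-block symbol tables. The Python sketch below is hypothetical: ``ScopeStack`` and ``ScopeError`` are illustrative names, not part of the compiler. Lookups search from the innermost block outwards, an inner block may shadow an outer name, declaring a name twice in one block is rejected, and a local variable cannot be reached across a function body boundary. Hoisting of labels and functions and the sub-assembly rule are left out for brevity.

.. code:: python

    class ScopeError(Exception):
        """Raised when a declaration or lookup violates the scoping rules."""


    class ScopeStack:
        """Hypothetical model of block scoping; kinds are e.g. "variable", "function"."""

        def __init__(self) -> None:
            # (declarations in this block, whether the block is a function body)
            self.blocks: list[tuple[dict[str, str], bool]] = [({}, False)]

        def enter(self, function_body: bool = False) -> None:
            self.blocks.append(({}, function_body))

        def leave(self) -> None:
            self.blocks.pop()

        def declare(self, name: str, kind: str) -> None:
            names, _ = self.blocks[-1]
            if name in names:
                raise ScopeError(f"{name!r} is already declared in this block")
            names[name] = kind

        def lookup(self, name: str) -> str:
            crossed_function = False
            for names, is_function_body in reversed(self.blocks):
                if name in names:
                    if names[name] == "variable" and crossed_function:
                        raise ScopeError(f"{name!r} is a local of an enclosing function")
                    return names[name]
                if is_function_body:
                    crossed_function = True
            raise ScopeError(f"{name!r} is not visible here")


    scopes = ScopeStack()
    scopes.declare("x", "variable")
    scopes.declare("helper", "function")
    scopes.enter()
    scopes.declare("x", "variable")      # shadowing in a nested block is allowed
    scopes.leave()
    scopes.enter(function_body=True)
    print(scopes.lookup("helper"))       # functions stay visible inside a body
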

If control flow passes over the end of a block, pop instructions are inserted
that match the number of local variables declared in that block, unless the
``}`` is directly preceded by an opcode that does not have a continuing control
flow path. Whenever a local variable is referenced, the code generator needs
to know its current relative position in the stack and thus it needs to
keep track of the current so-called stack height.
At the end of a block, this implicit stack height is always reduced by the number
of local variables, whether there is a continuing control flow or not.

This means that the stack height before and after the block should be the same.
If this is not the case, a warning is issued
unless the last instruction in the block did not have a continuing control flow path.
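
The bookkeeping described above can be sketched in Python. This is a hypothetical model, not the code generator. ``BlockTracker`` is an illustrative name, ``let`` always pushes a literal ``0``, and the set of opcodes treated as having no continuing control flow is an assumption. On leaving a block it emits one ``pop`` per local unless the last instruction terminates. It always lowers the implicit height and warns if the height no longer matches the one at block entry.

.. code:: python

    from dataclasses import dataclass, field

    # Opcodes this sketch treats as having no continuing control flow path
    # (an assumption for illustration).
    TERMINATING = {"jump", "return", "stop"}


    @dataclass
    class BlockTracker:
        """Hypothetical model of per-block stack-height bookkeeping."""
        height: int = 0
        code: list[str] = field(default_factory=list)
        warnings: list[str] = field(default_factory=list)
        frames: list[list[int]] = field(default_factory=list)  # [entry height, locals]

        def enter_block(self) -> None:
            self.frames.append([self.height, 0])

        def let(self, name: str) -> None:
            # `let name := 0` pushes one new slot owned by the current block.
            self.code.append(f"0 // let {name}")
            self.frames[-1][1] += 1
            self.height += 1

        def emit(self, opcode: str, delta: int = 0) -> None:
            # `delta` is the net change in stack height caused by `opcode`.
            self.code.append(opcode)
            self.height += delta

        def leave_block(self) -> None:
            entry, n_locals = self.frames.pop()
            terminated = bool(self.code) and self.code[-1] in TERMINATING
            if not terminated:
                self.code.extend(["pop"] * n_locals)
            # The implicit height drops by the number of locals in either case.
            self.height -= n_locals
            if self.height != entry and not terminated:
                self.warnings.append(f"stack height {entry} before block, {self.height} after")


    t = BlockTracker()
    t.enter_block()
    t.let("a")
    t.let("b")
    t.leave_block()
    print(t.code, t.height, t.warnings)
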

Why do we use higher-level constructs like ``switch``, ``for`` and functions:

Using ``switch``, ``for`` and functions, it should be possible to write
complex code without using ``jump`` or ``jumpi`` manually. This makes it much
easier to analyze the control flow, which allows for improved formal
verification and optimization.
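
To see how a ``for`` loop turns into labels and jumps without the programmer writing them, here is a small Python sketch. It produces the same shape as the ``$fun_f`` part of the desugared example in this section. The function ``desugar_for`` is hypothetical, and a real desugarer would also have to generate unique label names for each loop.

.. code:: python

    def desugar_for(init: list[str], cond: str, post: list[str], body: list[str]) -> list[str]:
        """Rewrite `for { init } cond { post } { body }` into labels and jumps.

        The init statements share the loop's outer block, so their variables
        are popped when that block ends.
        """
        return [
            "{",
            *init,
            "$for_begin:",
            f"jumpi($for_end, iszero({cond}))",
            "{", *body, "}",
            "$for_continue:",
            "{", *post, "}",
            "jump($for_begin)",
            "$for_end:",
            "}",
        ]


    for line in desugar_for(["let i := 0"], "lt(i, x)", ["i := add(i, 1)"], ["y := mul(2, y)"]):
        print(line)
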

Furthermore, if manual jumps are allowed, computing the stack height is rather complicated.
The position of all local variables on the stack needs to be known; otherwise,
neither references to local variables nor removing local variables automatically
from the stack at the end of a block will work properly. Because of that,
every label that is preceded by an instruction that ends or diverts control flow
should be annotated with the current stack layout. This annotation is performed
automatically during the desugaring phase.
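
Once a label annotation fixes the stack layout, the code generator can work out which ``DUPn`` copies a variable to the top. A minimal Python sketch follows, assuming the layout is a list ordered from bottom to top. The helper ``dup_for`` is hypothetical. The layout used is the one the desugared example in this section gives at ``$ret2 [ret]:``, namely ``$0, r, ret``.

.. code:: python

    def dup_for(layout: list[str], name: str) -> str:
        """Return the DUPn that copies `name` to the top of the stack.

        `layout` lists stack slots from bottom to top; DUP1 copies the top slot.
        """
        if name not in layout:
            raise KeyError(f"{name!r} is not on the stack")
        # Search from the top so that the innermost (shadowing) slot wins.
        depth = layout[::-1].index(name) + 1
        if depth > 16:
            raise ValueError(f"{name!r} is too deep for DUP1..DUP16")
        return f"dup{depth}"


    layout = ["$0", "r", "ret"]          # stack at `$ret2 [ret]:`, top is last
    print(dup_for(layout, "ret"), dup_for(layout, "r"), dup_for(layout, "$0"))

The same depth computation breaks if the real stack has a different height than the annotation claims, which is why the annotation must be exact.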

Example:

We will follow an example compilation from Solidity to desugared assembly.
We consider the runtime bytecode of the following Solidity program::

    contract C {
        function f(uint x) returns (uint y) {
            y = 1;
            for (uint i = 0; i < x; i++)
                y = 2 * y;
        }
    }

The following assembly will be generated::

    {
        mstore(0x40, 0x60) // store the "free memory pointer"
        // function dispatcher: the selector is the top 4 bytes (256 - 32 = 224 bits)
        switch div(calldataload(0), exp(2, 224))
        case 0xb3de648b: {
            let (r) := f(calldataload(4))
            let ret := $allocate(0x20)
            mstore(ret, r)
            return(ret, 0x20)
        }
        default: { jump(invalidJumpLabel) }
        // memory allocator
        function $allocate(size) -> (pos) {
            pos := mload(0x40)
            mstore(0x40, add(pos, size))
        }
        // the contract function
        function f(x) -> (y) {
            y := 1
            for { let i := 0 } lt(i, x) { i := add(i, 1) } {
                y := mul(2, y)
            }
        }
    }
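
The dispatcher relies on the function selector occupying the first four bytes of the call data. Since ``calldataload(0)`` reads a 256-bit big-endian word, dividing it by 2^224 (a right shift by 256 - 32 bits) leaves exactly those four bytes. A dividend of 2^226 would drop two selector bits. The Python check below uses the selector ``0xb3de648b`` from the example with made-up call data for ``f(5)``. The ``calldataload`` helper only imitates the EVM opcode.

.. code:: python

    def calldataload(calldata: bytes, offset: int) -> int:
        """Read a 32-byte big-endian word, zero-padding past the end (like the EVM)."""
        return int.from_bytes(calldata[offset:offset + 32].ljust(32, b"\x00"), "big")


    selector = 0xb3de648b
    # Call data for f(5): 4-byte selector followed by one 32-byte argument.
    calldata = selector.to_bytes(4, "big") + (5).to_bytes(32, "big")

    first_word = calldataload(calldata, 0)
    print(hex(first_word // 2**224))              # the selector
    print(calldataload(calldata, 4))              # the argument x
    print(first_word // 2**226 == selector)       # a 226-bit shift loses two bits
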

After the desugaring phase it looks as follows::

    {
        mstore(0x40, 0x60)
        {
            let $0 := div(calldataload(0), exp(2, 224))
            jumpi($case1, eq($0, 0xb3de648b))
            jump($caseDefault)
            $case1:
            {
                // the function call - we put return label and arguments on the stack
                $ret1 calldataload(4) jump($fun_f)
                $ret1 [r]: // a label with a [...]-annotation resets the stack height
                           // to "current block + number of local variables". It also
                           // introduces a variable, r:
                           // r is at top of stack, $0 is below (from enclosing block)
                $ret2 0x20 jump($fun_allocate)
                $ret2 [ret]: // stack here: $0, r, ret (top)
                mstore(ret, r)
                return(ret, 0x20)
                // although it is useless, the jump is automatically inserted,
                // since the desugaring process does not analyze control-flow
                jump($endswitch)
            }
            $caseDefault:
            {
                jump(invalidJumpLabel)
                jump($endswitch)
            }
            $endswitch:
        }
        jump($afterFunction)
        $fun_allocate:
        {
            $start[$retpos, size]:
            // output variables live in the same scope as the arguments.
            let pos := 0
            {
                pos := mload(0x40)
                mstore(0x40, add(pos, size))
            }
            swap1 pop swap1 jump
        }
        $fun_f:
        {
            $start[$retpos, x]:
            let y := 0
            {
                y := 1
                {
                    let i := 0
                    $for_begin:
                    jumpi($for_end, iszero(lt(i, x)))
                    {
                        y := mul(2, y)
                    }
                    $for_continue:
                    { i := add(i, 1) }
                    jump($for_begin)
                    $for_end:
                } // Here, a pop instruction is inserted for i
            }
            swap1 pop swap1 jump
        }
        $afterFunction:
        stop
    }
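
To check that the function epilogue ``swap1 pop swap1 jump`` leaves the caller with just the return value, the Python sketch below replays the stack effects of ``$fun_f`` on a list whose last element is the top of the stack. It is a hand-written model, not generated code. ``run_fun_f`` and ``swap`` are hypothetical helpers. The model includes the initial assignment ``y := 1`` from ``f`` and reduces ``mul`` modulo 2^256 as the EVM does.

.. code:: python

    def swap(stack: list, n: int) -> None:
        """EVM SWAPn: exchange the top item with the (n+1)-th item from the top."""
        stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]


    def run_fun_f(stack: list) -> str:
        """Replay `$fun_f` on [..., $retpos, x]; return the jump target, leave [..., y]."""
        stack.append(0)                             # let y := 0 -> [..., $retpos, x, y]
        stack[-1] = 1                               # y := 1
        stack.append(0)                             # let i := 0 -> [..., $retpos, x, y, i]
        while stack[-1] < stack[-3]:                # jumpi($for_end, iszero(lt(i, x)))
            stack[-2] = (2 * stack[-2]) % 2**256    # y := mul(2, y)
            stack[-1] += 1                          # i := add(i, 1)
        stack.pop()                                 # pop inserted for i
        swap(stack, 1)                              # swap1 -> [..., $retpos, y, x]
        stack.pop()                                 # pop   -> [..., $retpos, y]
        swap(stack, 1)                              # swap1 -> [..., y, $retpos]
        return stack.pop()                          # jump consumes $retpos


    stack = ["$0", "$ret1", 5]                      # caller pushed $ret1 and x = 5
    target = run_fun_f(stack)
    print(target, stack)                            # control returns to $ret1 with r on top
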

Assembly happens in four stages:

@ -734,6 +740,9 @@ Assembly happens in four stages:
3. Opcode stream generation
4. Bytecode generation

We will specify steps one to three in a pseudo-formal way. More formal
specifications will follow.
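
As a rough illustration of stages 3 and 4 for a single statement, the Python sketch below turns ``mstore(0x40, 0x60)`` from the example into an opcode stream and then into bytes. It is a toy under stated assumptions, not the assembler. The two-entry opcode table and the helper names are made up for illustration, and only one-byte literals (``PUSH1``, opcode 0x60) are supported. ``MSTORE`` is opcode 0x52 and expects the memory offset on top of the stack.

.. code:: python

    # Tiny subset of the EVM opcode table (illustrative, not complete).
    OPCODES = {"mstore": 0x52, "stop": 0x00}
    PUSH1 = 0x60


    def opcode_stream(op: str, *args: int) -> list:
        """Stage 3: functional call -> opcode stream (last argument pushed first)."""
        stream: list = [("PUSH1", arg) for arg in reversed(args)]
        stream.append(op)
        return stream


    def assemble(stream: list) -> bytes:
        """Stage 4: opcode stream -> bytecode (only one-byte literals supported)."""
        out = bytearray()
        for item in stream:
            if isinstance(item, tuple):
                _, value = item
                if not 0 <= value <= 0xFF:
                    raise ValueError("this sketch only supports PUSH1 literals")
                out += bytes([PUSH1, value])
            else:
                out.append(OPCODES[item])
        return bytes(out)


    code = assemble(opcode_stream("mstore", 0x40, 0x60))
    print(code.hex())

The result, ``6060604052``, is the prefix familiar from contracts compiled by Solidity at the time.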

Parsing / Grammar
-----------------