mirror of
https://github.com/ethereum/solidity
synced 2023-10-03 13:03:40 +00:00
Move explanatory sections and other small changes.
parent
e92af89ec8
commit
ceac5c5a0c
@ -94,177 +94,11 @@ you really know what you are doing.
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
Standalone Assembly
|
|
||||||
===================
|
|
||||||
|
|
||||||
This assembly language tries to achieve several goals:
|
|
||||||
|
|
||||||
1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity.
|
|
||||||
2. The translation from assembly to bytecode should contain as few "surprises" as possible.
|
|
||||||
3. Control flow should be easy to detect to help in formal verification and optimization.
|
|
||||||
|
|
||||||
In order to achieve the first and last goal, assembly provides high-level constructs
|
|
||||||
like ``for`` loops, ``switch`` statements and function calls. It should be possible
|
|
||||||
to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``,
|
|
||||||
``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow
|
|
||||||
and the last two obfuscate control flow. Furthermore, functional statements of
|
|
||||||
the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
|
|
||||||
``7 y x add mul`` because in the first form, it is much easier to see which
|
|
||||||
operand is used for which opcode.
|
|
||||||
|
|
||||||
The second goal is achieved by introducing a desugaring phase that only removes
|
|
||||||
the higher level constructs in a very regular way and still allows inspecting
|
|
||||||
the generated low-level assembly code. The only non-local operation performed
|
|
||||||
by the assembler is name lookup of user-defined identifiers (functions, variables, ...),
|
|
||||||
which follow very simple and regular scoping rules and cleanup of local variables from the stack.
|
|
||||||
|
|
||||||
Scoping: An identifier that is declared (label, variable, function, assembly)
|
|
||||||
is only visible in the block where it was declared (including nested blocks
|
|
||||||
inside the current block). It is not legal to access local variables across
|
|
||||||
function borders, even if they would be in scope. Shadowing is allowed, but
|
|
||||||
two identifiers with the same name cannot be declared in the same block.
|
|
||||||
Local variables cannot be accessed before they were declared, but labels,
|
|
||||||
functions and assemblies can. Assemblies are special blocks that are used
|
|
||||||
for e.g. returning runtime code or creating contracts. No identifier from an
|
|
||||||
outer assembly is visible in a sub-assembly.
|
|
||||||
|
|
||||||
If control flow passes over the end of a block, pop instructions are inserted
|
|
||||||
that match the number of local variables declared in that block, unless the
|
|
||||||
``}`` is directly preceded by an opcode that does not have a continuing control
|
|
||||||
flow path. Whenever a local variable is referenced, the code generator needs
|
|
||||||
to know its current relative position in the stack and thus it needs to
|
|
||||||
keep track of the current so-called stack height.
|
|
||||||
At the end of a block, this implicit stack height is always reduced by the number
|
|
||||||
of local variables whether there is a continuing control flow or not.
|
|
||||||
|
|
||||||
This means that the stack height before and after the block should be the same.
|
|
||||||
If this is not the case, a warning is issued,
|
|
||||||
unless the last instruction in the block did not have a continuing control flow path.
|
|
||||||
|
|
||||||
Why do we use higher-level constructs like ``switch``, ``for`` and functions:
|
|
||||||
|
|
||||||
Using ``switch``, ``for`` and functions, it should be possible to write
|
|
||||||
complex code without using ``jump`` or ``jumpi`` manually. This makes it much
|
|
||||||
easier to analyze the control flow, which allows for improved formal
|
|
||||||
verification and optimization.
|
|
||||||
|
|
||||||
Furthermore, if manual jumps are allowed, computing the stack height is rather complicated.
|
|
||||||
The position of all local variables on the stack needs to be known, otherwise
|
|
||||||
neither references to local variables nor removing local variables automatically
|
|
||||||
from the stack at the end of a block will work properly. Because of that,
|
|
||||||
every label that is preceded by an instruction that ends or diverts control flow
|
|
||||||
should be annotated with the current stack layout. This annotation is performed
|
|
||||||
automatically during the desugaring phase.
|
|
||||||
|
|
||||||
Example:
|
|
||||||
|
|
||||||
We will follow an example compilation from Solidity to desugared assembly.
|
|
||||||
We consider the runtime bytecode of the following Solidity program::
|
|
||||||
|
|
||||||
contract C {
|
|
||||||
function f(uint x) returns (uint y) {
|
|
||||||
y = 1;
|
|
||||||
for (uint i = 0; i < x; i++)
|
|
||||||
y = 2 * y;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
The following assembly will be generated::
|
|
||||||
|
|
||||||
{
|
|
||||||
mstore(0x40, 0x60) // store the "free memory pointer"
|
|
||||||
// function dispatcher
|
|
||||||
switch div(calldataload(0), exp(2, 224))
|
|
||||||
case 0xb3de648b: {
|
|
||||||
let (r,) = f(calldataload(4))
|
|
||||||
let ret := $allocate(0x20)
|
|
||||||
mstore(ret, r)
|
|
||||||
return(ret, 0x20)
|
|
||||||
}
|
|
||||||
default: { jump(invalidJumpLabel) }
|
|
||||||
// memory allocator
|
|
||||||
function $allocate(size) -> (pos) {
|
|
||||||
pos := mload(0x40)
|
|
||||||
mstore(0x40, add(pos, size))
|
|
||||||
}
|
|
||||||
// the contract function
|
|
||||||
function f(x) -> (y) {
|
|
||||||
y := 1
|
|
||||||
for { let i := 0 } lt(i, x) { i := add(i, 1) } {
|
|
||||||
y := mul(2, y)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
After the desugaring phase it looks as follows::
|
|
||||||
|
|
||||||
{
|
|
||||||
mstore(0x40, 0x60)
|
|
||||||
{
|
|
||||||
let $0 := div(calldataload(0), exp(2, 224))
|
|
||||||
jumpi($case1, eq($0, 0xb3de648b))
|
|
||||||
jump($caseDefault)
|
|
||||||
$case1:
|
|
||||||
{
|
|
||||||
// the function call - we put return label and arguments on the stack
|
|
||||||
$ret1 calldataload(4) jump($fun_f)
|
|
||||||
$ret1 [r]: // a label with a [...]-annotation resets the stack height
|
|
||||||
// to "current block + number of local variables". It also
|
|
||||||
// introduces a variable, r:
|
|
||||||
// r is at top of stack, $0 is below (from enclosing block)
|
|
||||||
$ret2 0x20 jump($fun_allocate)
|
|
||||||
$ret2 [ret]: // stack here: $0, r, ret (top)
|
|
||||||
mstore(ret, r)
|
|
||||||
return(ret, 0x20)
|
|
||||||
// although it is useless, the jump is automatically inserted,
|
|
||||||
// since the desugaring process does not analyze control-flow
|
|
||||||
jump($endswitch)
|
|
||||||
}
|
|
||||||
$caseDefault:
|
|
||||||
{
|
|
||||||
jump(invalidJumpLabel)
|
|
||||||
jump($endswitch)
|
|
||||||
}
|
|
||||||
$endswitch:
|
|
||||||
}
|
|
||||||
jump($afterFunction)
|
|
||||||
$fun_allocate:
|
|
||||||
{
|
|
||||||
$start[$retpos, size]:
|
|
||||||
// output variables live in the same scope as the arguments.
|
|
||||||
let pos := 0
|
|
||||||
{
|
|
||||||
pos := mload(0x40)
|
|
||||||
mstore(0x40, add(pos, size))
|
|
||||||
}
|
|
||||||
swap1 pop swap1 jump
|
|
||||||
}
|
|
||||||
$fun_f:
|
|
||||||
{
|
|
||||||
start [$retpos, x]:
|
|
||||||
let y := 0
|
|
||||||
{
|
|
||||||
let i := 0
|
|
||||||
$for_begin:
|
|
||||||
jumpi($for_end, iszero(lt(i, x)))
|
|
||||||
{
|
|
||||||
y := mul(2, y)
|
|
||||||
}
|
|
||||||
$for_continue:
|
|
||||||
{ i := add(i, 1) }
|
|
||||||
jump($for_begin)
|
|
||||||
$for_end:
|
|
||||||
} // Here, a pop instruction is inserted for i
|
|
||||||
swap1 pop swap1 jump
|
|
||||||
}
|
|
||||||
$afterFunction:
|
|
||||||
stop
|
|
||||||
}
|
|
||||||
|
|
||||||
Syntax
|
Syntax
|
||||||
------
|
------
|
||||||
|
|
||||||
Inline assembly parses comments, literals and identifiers exactly as Solidity, so you can use the
|
Assembly parses comments, literals and identifiers exactly as Solidity, so you can use the
|
||||||
usual ``//`` and ``/* */`` comments. Inline assembly is marked by ``assembly { ... }`` and inside
|
usual ``//`` and ``/* */`` comments. Inline assembly is marked by ``assembly { ... }`` and inside
|
||||||
these curly braces, the following can be used (see the later sections for more details)
|
these curly braces, the following can be used (see the later sections for more details)
|
||||||
|
|
||||||
@ -273,7 +107,7 @@ these curly braces, the following can be used (see the later sections for more d
|
|||||||
- opcode in functional style, e.g. ``add(1, mload(0))``
|
- opcode in functional style, e.g. ``add(1, mload(0))``
|
||||||
- labels, e.g. ``name:``
|
- labels, e.g. ``name:``
|
||||||
- variable declarations, e.g. ``let x := 7`` or ``let x := add(y, 3)``
|
- variable declarations, e.g. ``let x := 7`` or ``let x := add(y, 3)``
|
||||||
- identifiers (externals, labels or assembly-local variables), e.g. ``jump(name)``, ``3 x add``
|
- identifiers (labels or assembly-local variables and externals if used as inline assembly), e.g. ``jump(name)``, ``3 x add``
|
||||||
- assignments (in "instruction style"), e.g. ``3 =: x``
|
- assignments (in "instruction style"), e.g. ``3 =: x``
|
||||||
- assignments in functional style, e.g. ``x := add(y, 3)``
|
- assignments in functional style, e.g. ``x := add(y, 3)``
|
||||||
- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
|
- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
|
||||||
@ -535,7 +369,7 @@ jumps easier. The following code computes an element in the Fibonacci series.
|
|||||||
|
|
||||||
Please note that automatically accessing stack variables can only work if the
|
Please note that automatically accessing stack variables can only work if the
|
||||||
assembler knows the current stack height. This fails to work if the jump source
|
assembler knows the current stack height. This fails to work if the jump source
|
||||||
and target have different stack heights. It is still fine to use such jumps,
|
and target have different stack heights. It is still fine to use such jumps, but
|
||||||
you should just not access any stack variables (even assembly variables) in that case.
|
you should just not access any stack variables (even assembly variables) in that case.
|
||||||
|
|
||||||
Furthermore, the stack height analyser goes through the code opcode by opcode
|
Furthermore, the stack height analyser goes through the code opcode by opcode
|
||||||
@ -593,11 +427,12 @@ Assignments are possible to assembly-local variables and to function-local
|
|||||||
variables. Take care that when you assign to variables that point to
|
variables. Take care that when you assign to variables that point to
|
||||||
memory or storage, you will only change the pointer and not the data.
|
memory or storage, you will only change the pointer and not the data.
|
||||||
|
|
||||||
There are two kinds of assignments: Functional-style and instruction-style.
|
There are two kinds of assignments: functional-style and instruction-style.
|
||||||
For functional-style assignments (``variable := value``), you need to provide a value in a
|
For functional-style assignments (``variable := value``), you need to provide a value in a
|
||||||
functional-style expression that results in exactly one stack value
|
functional-style expression that results in exactly one stack value
|
||||||
and for instruction-style (``=: variable``), the value is just taken from the stack top.
|
and for instruction-style (``=: variable``), the value is just taken from the stack top.
|
||||||
For both ways, the colon points to the name of the variable.
|
For both ways, the colon points to the name of the variable. The assignment
|
||||||
|
is performed by replacing the variable's value on the stack by the new value.
|
||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
|
|
||||||
@ -615,7 +450,7 @@ You can use a switch statement as a very basic version of "if/else".
|
|||||||
It takes the value of an expression and compares it to several constants.
|
It takes the value of an expression and compares it to several constants.
|
||||||
The branch corresponding to the matching constant is taken. Contrary to the
|
The branch corresponding to the matching constant is taken. Contrary to the
|
||||||
error-prone behaviour of some programming languages, control flow does
|
error-prone behaviour of some programming languages, control flow does
|
||||||
not continue from one case to the next. There is a fallback or default
|
not continue from one case to the next. There can be a fallback or default
|
||||||
case called ``default``.
|
case called ``default``.
|
||||||
|
|
||||||
.. code::
|
.. code::
|
||||||
@ -623,8 +458,12 @@ case called ``default``.
|
|||||||
assembly {
|
assembly {
|
||||||
let x := 0
|
let x := 0
|
||||||
switch calldataload(4)
|
switch calldataload(4)
|
||||||
case 0: { x := calldataload(0x24) }
|
case 0: {
|
||||||
default: { x := calldataload(0x44) }
|
x := calldataload(0x24)
|
||||||
|
}
|
||||||
|
default: {
|
||||||
|
x := calldataload(0x44)
|
||||||
|
}
|
||||||
sstore(0, div(x, 2))
|
sstore(0, div(x, 2))
|
||||||
}
|
}
|
||||||
|
|
||||||
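For comparison, the ``switch`` example in the hunk above can be sketched in the inline assembly syntax accepted by current Solidity compilers. This is a minimal sketch under stated assumptions, not part of the commit: the contract name ``SwitchSketch`` and the function name ``pick`` are hypothetical, and the modern form (the ``case`` keyword with no colon after the constant) is assumed to replace the dialect shown in the diff.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical wrapper contract; only the assembly block mirrors the docs.
contract SwitchSketch {
    function pick() public {
        assembly {
            let x := 0
            // Compare the first argument word against the constants below.
            switch calldataload(4)
            case 0 {
                x := calldataload(0x24)
            }
            default {
                x := calldataload(0x44)
            }
            // Control flow never falls through from one case to the next.
            sstore(0, div(x, 2))
        }
    }
}
```

Exactly one branch runs, so ``x`` holds either the word at offset ``0x24`` or the word at offset ``0x44`` when it is halved and stored.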
@ -675,13 +514,13 @@ The following example implements the power function by square-and-multiply.
|
|||||||
assembly {
|
assembly {
|
||||||
function power(base, exponent) -> (result) {
|
function power(base, exponent) -> (result) {
|
||||||
switch exponent
|
switch exponent
|
||||||
0: { result := 1 }
|
0: { result := 1 }
|
||||||
1: { result := base }
|
1: { result := base }
|
||||||
default: {
|
default: {
|
||||||
result := power(mul(base, base), div(exponent, 2))
|
result := power(mul(base, base), div(exponent, 2))
|
||||||
switch mod(exponent, 2)
|
switch mod(exponent, 2)
|
||||||
1: { result := mul(base, result) }
|
1: { result := mul(base, result) }
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
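The square-and-multiply function from the hunk above can be sketched in the inline assembly syntax of current Solidity compilers. This is an illustration under stated assumptions, not the text of the commit: the contract name ``PowerSketch`` and the assembly function name ``pow`` are hypothetical, and the modern form (``case`` keywords, no colons, no parentheses around the return variable) is assumed in place of the dialect in the diff.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical wrapper contract around an assembly-level function.
contract PowerSketch {
    function power(uint256 base, uint256 exponent)
        public
        pure
        returns (uint256 result)
    {
        assembly {
            // Recursive square-and-multiply; arithmetic wraps modulo 2**256.
            function pow(b, e) -> r {
                switch e
                case 0 { r := 1 }
                case 1 { r := b }
                default {
                    // Square the base and halve the exponent.
                    r := pow(mul(b, b), div(e, 2))
                    // An odd exponent needs one extra multiplication.
                    switch mod(e, 2)
                    case 1 { r := mul(b, r) }
                }
            }
            result := pow(base, exponent)
        }
    }
}
```

The assembly function is named ``pow`` rather than ``power`` so that it does not clash with the surrounding Solidity function. No overflow check is performed inside the assembly block.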
@ -724,8 +563,175 @@ first slot of the array and then only the array elements follow.
|
|||||||
please do not rely on that.
|
please do not rely on that.
|
||||||
|
|
||||||
|
|
||||||
Specification
|
Standalone Assembly
|
||||||
=============
|
===================
|
||||||
|
|
||||||
|
The assembly language described as inline assembly above can also be used
|
||||||
|
standalone and in fact, the plan is to use it as an intermediate language
|
||||||
|
for the Solidity compiler. In this form, it tries to achieve several goals:
|
||||||
|
|
||||||
|
1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity.
|
||||||
|
2. The translation from assembly to bytecode should contain as few "surprises" as possible.
|
||||||
|
3. Control flow should be easy to detect to help in formal verification and optimization.
|
||||||
|
|
||||||
|
In order to achieve the first and last goal, assembly provides high-level constructs
|
||||||
|
like ``for`` loops, ``switch`` statements and function calls. It should be possible
|
||||||
|
to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``,
|
||||||
|
``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow
|
||||||
|
and the last two obfuscate control flow. Furthermore, functional statements of
|
||||||
|
the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
|
||||||
|
``7 y x add mul`` because in the first form, it is much easier to see which
|
||||||
|
operand is used for which opcode.
|
||||||
|
|
||||||
|
The second goal is achieved by introducing a desugaring phase that only removes
|
||||||
|
the higher level constructs in a very regular way and still allows inspecting
|
||||||
|
the generated low-level assembly code. The only non-local operation performed
|
||||||
|
by the assembler is name lookup of user-defined identifiers (functions, variables, ...),
|
||||||
|
which follow very simple and regular scoping rules and cleanup of local variables from the stack.
|
||||||
|
|
||||||
|
Scoping: An identifier that is declared (label, variable, function, assembly)
|
||||||
|
is only visible in the block where it was declared (including nested blocks
|
||||||
|
inside the current block). It is not legal to access local variables across
|
||||||
|
function borders, even if they would be in scope. Shadowing is allowed, but
|
||||||
|
two identifiers with the same name cannot be declared in the same block.
|
||||||
|
Local variables cannot be accessed before they were declared, but labels,
|
||||||
|
functions and assemblies can. Assemblies are special blocks that are used
|
||||||
|
for e.g. returning runtime code or creating contracts. No identifier from an
|
||||||
|
outer assembly is visible in a sub-assembly.
|
||||||
|
|
||||||
|
If control flow passes over the end of a block, pop instructions are inserted
|
||||||
|
that match the number of local variables declared in that block, unless the
|
||||||
|
``}`` is directly preceded by an opcode that does not have a continuing control
|
||||||
|
flow path. Whenever a local variable is referenced, the code generator needs
|
||||||
|
to know its current relative position in the stack and thus it needs to
|
||||||
|
keep track of the current so-called stack height.
|
||||||
|
At the end of a block, this implicit stack height is always reduced by the number
|
||||||
|
of local variables whether there is a continuing control flow or not.
|
||||||
|
|
||||||
|
This means that the stack height before and after the block should be the same.
|
||||||
|
If this is not the case, a warning is issued,
|
||||||
|
unless the last instruction in the block did not have a continuing control flow path.
|
||||||
|
|
||||||
|
Why do we use higher-level constructs like ``switch``, ``for`` and functions:
|
||||||
|
|
||||||
|
Using ``switch``, ``for`` and functions, it should be possible to write
|
||||||
|
complex code without using ``jump`` or ``jumpi`` manually. This makes it much
|
||||||
|
easier to analyze the control flow, which allows for improved formal
|
||||||
|
verification and optimization.
|
||||||
|
|
||||||
|
Furthermore, if manual jumps are allowed, computing the stack height is rather complicated.
|
||||||
|
The position of all local variables on the stack needs to be known, otherwise
|
||||||
|
neither references to local variables nor removing local variables automatically
|
||||||
|
from the stack at the end of a block will work properly. Because of that,
|
||||||
|
every label that is preceded by an instruction that ends or diverts control flow
|
||||||
|
should be annotated with the current stack layout. This annotation is performed
|
||||||
|
automatically during the desugaring phase.
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
We will follow an example compilation from Solidity to desugared assembly.
|
||||||
|
We consider the runtime bytecode of the following Solidity program::
|
||||||
|
|
||||||
|
contract C {
|
||||||
|
function f(uint x) returns (uint y) {
|
||||||
|
y = 1;
|
||||||
|
for (uint i = 0; i < x; i++)
|
||||||
|
y = 2 * y;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
The following assembly will be generated::
|
||||||
|
|
||||||
|
{
|
||||||
|
mstore(0x40, 0x60) // store the "free memory pointer"
|
||||||
|
// function dispatcher
|
||||||
|
switch div(calldataload(0), exp(2, 224))
|
||||||
|
case 0xb3de648b: {
|
||||||
|
let (r) = f(calldataload(4))
|
||||||
|
let ret := $allocate(0x20)
|
||||||
|
mstore(ret, r)
|
||||||
|
return(ret, 0x20)
|
||||||
|
}
|
||||||
|
default: { jump(invalidJumpLabel) }
|
||||||
|
// memory allocator
|
||||||
|
function $allocate(size) -> (pos) {
|
||||||
|
pos := mload(0x40)
|
||||||
|
mstore(0x40, add(pos, size))
|
||||||
|
}
|
||||||
|
// the contract function
|
||||||
|
function f(x) -> (y) {
|
||||||
|
y := 1
|
||||||
|
for { let i := 0 } lt(i, x) { i := add(i, 1) } {
|
||||||
|
y := mul(2, y)
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
After the desugaring phase it looks as follows::
|
||||||
|
|
||||||
|
{
|
||||||
|
mstore(0x40, 0x60)
|
||||||
|
{
|
||||||
|
let $0 := div(calldataload(0), exp(2, 224))
|
||||||
|
jumpi($case1, eq($0, 0xb3de648b))
|
||||||
|
jump($caseDefault)
|
||||||
|
$case1:
|
||||||
|
{
|
||||||
|
// the function call - we put return label and arguments on the stack
|
||||||
|
$ret1 calldataload(4) jump($fun_f)
|
||||||
|
$ret1 [r]: // a label with a [...]-annotation resets the stack height
|
||||||
|
// to "current block + number of local variables". It also
|
||||||
|
// introduces a variable, r:
|
||||||
|
// r is at top of stack, $0 is below (from enclosing block)
|
||||||
|
$ret2 0x20 jump($fun_allocate)
|
||||||
|
$ret2 [ret]: // stack here: $0, r, ret (top)
|
||||||
|
mstore(ret, r)
|
||||||
|
return(ret, 0x20)
|
||||||
|
// although it is useless, the jump is automatically inserted,
|
||||||
|
// since the desugaring process does not analyze control-flow
|
||||||
|
jump($endswitch)
|
||||||
|
}
|
||||||
|
$caseDefault:
|
||||||
|
{
|
||||||
|
jump(invalidJumpLabel)
|
||||||
|
jump($endswitch)
|
||||||
|
}
|
||||||
|
$endswitch:
|
||||||
|
}
|
||||||
|
jump($afterFunction)
|
||||||
|
$fun_allocate:
|
||||||
|
{
|
||||||
|
$start[$retpos, size]:
|
||||||
|
// output variables live in the same scope as the arguments.
|
||||||
|
let pos := 0
|
||||||
|
{
|
||||||
|
pos := mload(0x40)
|
||||||
|
mstore(0x40, add(pos, size))
|
||||||
|
}
|
||||||
|
swap1 pop swap1 jump
|
||||||
|
}
|
||||||
|
$fun_f:
|
||||||
|
{
|
||||||
|
start [$retpos, x]:
|
||||||
|
let y := 0
|
||||||
|
{
|
||||||
|
let i := 0
|
||||||
|
$for_begin:
|
||||||
|
jumpi($for_end, iszero(lt(i, x)))
|
||||||
|
{
|
||||||
|
y := mul(2, y)
|
||||||
|
}
|
||||||
|
$for_continue:
|
||||||
|
{ i := add(i, 1) }
|
||||||
|
jump($for_begin)
|
||||||
|
$for_end:
|
||||||
|
} // Here, a pop instruction is inserted for i
|
||||||
|
swap1 pop swap1 jump
|
||||||
|
}
|
||||||
|
$afterFunction:
|
||||||
|
stop
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
Assembly happens in four stages:
|
Assembly happens in four stages:
|
||||||
|
|
||||||
@ -734,6 +740,9 @@ Assembly happens in four stages:
|
|||||||
3. Opcode stream generation
|
3. Opcode stream generation
|
||||||
4. Bytecode generation
|
4. Bytecode generation
|
||||||
|
|
||||||
|
We will specify steps one to three in a pseudo-formal way. More formal
|
||||||
|
specifications will follow.
|
||||||
|
|
||||||
|
|
||||||
Parsing / Grammar
|
Parsing / Grammar
|
||||||
-----------------
|
-----------------
|
||||||
|