Move explanatory sections and other small changes.

2023-10-03 13:03:40 +00:00 · 2017-01-09 15:15:30 +01:00 · 2017-01-09 15:15:30 +01:00 · ceac5c5a0c
commit ceac5c5a0c
parent e92af89ec8
1 changed files with 192 additions and 183 deletions
--- a/docs/assembly.rst
+++ b/docs/assembly.rst
@@ -94,177 +94,11 @@ you really know what you are doing.
        }
    }
 Standalone Assembly
 ===================
 This assembly language tries to achieve several goals:
 1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity.
 2. The translation from assembly to bytecode should contain as few "surprises" as possible.
 3. Control flow should be easy to detect to help in formal verification and optimization.
 In order to achieve the first and last goal, assembly provides high-level constructs
 like ``for`` loops, ``switch`` statements and function calls. It should be possible
 to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``,
 ``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow
 and the last two obfuscate control flow. Furthermore, functional statements of
 the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
 ``7 y x add mul`` because in the first form, it is much easier to see which
 operand is used for which opcode.
 The second goal is achieved by introducing a desugaring phase that only removes
 the higher level constructs in a very regular way and still allows inspecting
 the generated low-level assembly code. The only non-local operations performed
 by the assembler are name lookup of user-defined identifiers (functions, variables, ...),
 which follow very simple and regular scoping rules, and cleanup of local variables from the stack.
 Scoping: An identifier that is declared (label, variable, function, assembly)
 is only visible in the block where it was declared (including nested blocks
 inside the current block). It is not legal to access local variables across
 function borders, even if they would be in scope. Shadowing is allowed, but
 two identifiers with the same name cannot be declared in the same block.
 Local variables cannot be accessed before they were declared, but labels,
 functions and assemblies can. Assemblies are special blocks that are used
 for e.g. returning runtime code or creating contracts. No identifier from an
 outer assembly is visible in a sub-assembly.
 If control flow passes over the end of a block, pop instructions are inserted
 that match the number of local variables declared in that block, unless the
 ``}`` is directly preceded by an opcode that does not have a continuing control
 flow path. Whenever a local variable is referenced, the code generator needs
 to know its current relative position in the stack and thus it needs to
 keep track of the current so-called stack height.
 At the end of a block, this implicit stack height is always reduced by the number
 of local variables whether there is a continuing control flow or not.
 This means that the stack height before and after the block should be the same.
 If this is not the case, a warning is issued,
 unless the last instruction in the block did not have a continuing control flow path.
 Why do we use higher-level constructs like ``switch``, ``for`` and functions:
 Using ``switch``, ``for`` and functions, it should be possible to write
 complex code without using ``jump`` or ``jumpi`` manually. This makes it much
 easier to analyze the control flow, which allows for improved formal
 verification and optimization.
 Furthermore, if manual jumps are allowed, computing the stack height is rather complicated.
 The position of all local variables on the stack needs to be known, otherwise
 neither references to local variables nor removing local variables automatically
 from the stack at the end of a block will work properly. Because of that,
 every label that is preceded by an instruction that ends or diverts control flow
 should be annotated with the current stack layout. This annotation is performed
 automatically during the desugaring phase.
 Example:
 We will follow an example compilation from Solidity to desugared assembly.
 We consider the runtime bytecode of the following Solidity program::
    contract C {
      function f(uint x) returns (uint y) {
        y = 1;
        for (uint i = 0; i < x; i++)
          y = 2 * y;
      }
    }
 The following assembly will be generated::
    {
      mstore(0x40, 0x60) // store the "free memory pointer"
      // function dispatcher
       switch div(calldataload(0), exp(2, 224))
        case 0xb3de648b: {
          let (r,) = f(calldataload(4))
          let ret := $allocate(0x20)
          mstore(ret, r)
          return(ret, 0x20)
        }
        default: { jump(invalidJumpLabel) }
      // memory allocator
      function $allocate(size) -> (pos) {
        pos := mload(0x40)
        mstore(0x40, add(pos, size))
      }
      // the contract function
      function f(x) -> (y) {
        y := 1
        for { let i := 0 } lt(i, x) { i := add(i, 1) } {
          y := mul(2, y)
        }
      }
    }
 After the desugaring phase it looks as follows::
    {
      mstore(0x40, 0x60)
      {
         let $0 := div(calldataload(0), exp(2, 224))
        jumpi($case1, eq($0, 0xb3de648b))
        jump($caseDefault)
        $case1:
        {
          // the function call - we put return label and arguments on the stack
          $ret1 calldataload(4) jump($fun_f)
          $ret1 [r]: // a label with a [...]-annotation resets the stack height
                    // to "current block + number of local variables". It also
                    // introduces a variable, r:
                    // r is at top of stack, $0 is below (from enclosing block)
          $ret2 0x20 jump($fun_allocate)
          $ret2 [ret]: // stack here: $0, r, ret (top)
          mstore(ret, r)
          return(ret, 0x20)
          // although it is useless, the jump is automatically inserted,
          // since the desugaring process does not analyze control-flow
          jump($endswitch)
        }
        $caseDefault:
        {
          jump(invalidJumpLabel)
          jump($endswitch)
        }
        $endswitch:
      }
      jump($afterFunction)
      $fun_allocate:
      {
        $start[$retpos, size]:
        // output variables live in the same scope as the arguments.
        let pos := 0
        {
          pos := mload(0x40)
          mstore(0x40, add(pos, size))
        }
        swap1 pop swap1 jump
      }
      $fun_f:
      {
        start [$retpos, x]:
        let y := 0
        {
          let i := 0
          $for_begin:
          jumpi($for_end, iszero(lt(i, x)))
          {
            y := mul(2, y)
          }
          $for_continue:
          { i := add(i, 1) }
          jump($for_begin)
          $for_end:
        } // Here, a pop instruction is inserted for i
        swap1 pop swap1 jump
      }
      $afterFunction:
      stop
    }
 Syntax
 ------
-Inline assembly parses comments, literals and identifiers exactly as Solidity, so you can use the
+Assembly parses comments, literals and identifiers exactly as Solidity, so you can use the
 usual ``//`` and ``/* */`` comments. Inline assembly is marked by ``assembly { ... }`` and inside
 these curly braces, the following can be used (see the later sections for more details)
@@ -273,7 +107,7 @@ these curly braces, the following can be used (see the later sections for more d
 - opcode in functional style, e.g. ``add(1, mload(0))``
 - labels, e.g. ``name:``
 - variable declarations, e.g. ``let x := 7`` or ``let x := add(y, 3)``
- - identifiers (externals, labels or assembly-local variables), e.g. ``jump(name)``, ``3 x add``
+ - identifiers (labels or assembly-local variables and externals if used as inline assembly), e.g. ``jump(name)``, ``3 x add``
 - assignments (in "instruction style"), e.g. ``3 =: x``
 - assignments in functional style, e.g. ``x := add(y, 3)``
 - blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }``
@@ -535,7 +369,7 @@ jumps easier. The following code computes an element in the Fibonacci series.
 Please note that automatically accessing stack variables can only work if the
 assembler knows the current stack height. This fails to work if the jump source
-and target have different stack heights. It is still fine to use such jumps,
+and target have different stack heights. It is still fine to use such jumps, but
 you should just not access any stack variables (even assembly variables) in that case.
 Furthermore, the stack height analyser goes through the code opcode by opcode
@@ -593,11 +427,12 @@ Assignments are possible to assembly-local variables and to function-local
 variables. Take care that when you assign to variables that point to
 memory or storage, you will only change the pointer and not the data.
-There are two kinds of assignments: Functional-style and instruction-style.
+There are two kinds of assignments: functional-style and instruction-style.
 For functional-style assignments (``variable := value``), you need to provide a value in a
 functional-style expression that results in exactly one stack value
 and for instruction-style (``=: variable``), the value is just taken from the stack top.
-For both ways, the colon points to the name of the variable.
+For both ways, the colon points to the name of the variable. The assignment
 is performed by replacing the variable's value on the stack by the new value.
 .. code::
@@ -615,7 +450,7 @@ You can use a switch statement as a very basic version of "if/else".
 It takes the value of an expression and compares it to several constants.
 The branch corresponding to the matching constant is taken. Contrary to the
 error-prone behaviour of some programming languages, control flow does
-not continue from one case to the next. There is a fallback or default
+not continue from one case to the next. There can be a fallback or default
 case called ``default``.
 .. code::
@@ -623,8 +458,12 @@ case called ``default``.
    assembly {
        let x := 0
        switch calldataload(4)
-            case 0: { x := calldataload(0x24) }
-            default: { x := calldataload(0x44) }
+        case 0: {
+            x := calldataload(0x24)
+        }
+        default: {
+            x := calldataload(0x44)
+        }
        sstore(0, div(x, 2))
    }
@@ -675,13 +514,13 @@ The following example implements the power function by square-and-multiply.
    assembly {
        function power(base, exponent) -> (result) {
            switch exponent
-                0: { result := 1 }
+            0: { result := 1 }
-                1: { result := base }
+            1: { result := base }
-                default: {
+            default: {
-                    result := power(mul(base, base), div(exponent, 2))
+                result := power(mul(base, base), div(exponent, 2))
-                    switch mod(exponent, 2)
+                switch mod(exponent, 2)
-                        1: { result := mul(base, result) }
+                    1: { result := mul(base, result) }
-                }
+            }
        }
    }
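As a cross-check of the square-and-multiply recursion above, here is a minimal Python model of the same control flow. It is only a sketch: the name ``WORD`` and the explicit modulo are assumptions added here to mimic 256-bit wrap-around arithmetic, and nothing in it is part of the assembler itself.

```python
WORD = 2**256  # assumed word size: EVM arithmetic wraps at 256 bits


def power(base: int, exponent: int) -> int:
    """Mirror the assembly `power` function: one branch per switch case."""
    if exponent == 0:          # 0: { result := 1 }
        return 1
    if exponent == 1:          # 1: { result := base }
        return base % WORD
    # default: square the base and halve the exponent
    result = power((base * base) % WORD, exponent // 2)
    if exponent % 2 == 1:      # inner switch: one extra multiplication for odd exponents
        result = (base * result) % WORD
    return result


if __name__ == "__main__":
    print(power(2, 10))
    print(power(3, 5))
```

The recursion depth grows with the logarithm of the exponent, which is the point of the technique: each level squares the base once and performs at most one additional multiplication.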
@@ -724,8 +563,175 @@ first slot of the array and then only the array elements follow.
    please do not rely on that.
-Specification
+Standalone Assembly
-=============
+===================
 The assembly language described as inline assembly above can also be used
 standalone and in fact, the plan is to use it as an intermediate language
 for the Solidity compiler. In this form, it tries to achieve several goals:
 1. Programs written in it should be readable, even if the code is generated by a compiler from Solidity.
 2. The translation from assembly to bytecode should contain as few "surprises" as possible.
 3. Control flow should be easy to detect to help in formal verification and optimization.
 In order to achieve the first and last goal, assembly provides high-level constructs
 like ``for`` loops, ``switch`` statements and function calls. It should be possible
 to write assembly programs that do not make use of explicit ``SWAP``, ``DUP``,
 ``JUMP`` and ``JUMPI`` statements, because the first two obfuscate the data flow
 and the last two obfuscate control flow. Furthermore, functional statements of
 the form ``mul(add(x, y), 7)`` are preferred over pure opcode statements like
 ``7 y x add mul`` because in the first form, it is much easier to see which
 operand is used for which opcode.
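The relation between the two notations can be sketched in a few lines of Python. This is an illustration only: the tuple encoding of expressions and the helper name ``to_instruction_style`` are assumptions made for this sketch, not something the assembler provides.

```python
def to_instruction_style(expr):
    """Flatten a functional expression into instruction (stack) order.

    An expression is either a leaf (identifier or literal) or a tuple
    (opcode, arg1, arg2, ...). Arguments are emitted last-to-first so that
    the first argument ends up on top of the stack when the opcode runs.
    """
    if not isinstance(expr, tuple):
        return [str(expr)]
    opcode, *args = expr
    ops = []
    for arg in reversed(args):
        ops.extend(to_instruction_style(arg))
    ops.append(opcode)
    return ops


if __name__ == "__main__":
    expr = ("mul", ("add", "x", "y"), 7)   # mul(add(x, y), 7)
    print(" ".join(to_instruction_style(expr)))
```

For ``mul(add(x, y), 7)`` the sketch yields ``7 y x add mul``, the opcode sequence quoted above. The flattened form loses the visual grouping of operands, which is why the functional form is easier to read.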
 The second goal is achieved by introducing a desugaring phase that only removes
 the higher level constructs in a very regular way and still allows inspecting
 the generated low-level assembly code. The only non-local operations performed
 by the assembler are name lookup of user-defined identifiers (functions, variables, ...),
 which follow very simple and regular scoping rules, and cleanup of local variables from the stack.
 Scoping: An identifier that is declared (label, variable, function, assembly)
 is only visible in the block where it was declared (including nested blocks
 inside the current block). It is not legal to access local variables across
 function borders, even if they would be in scope. Shadowing is allowed, but
 two identifiers with the same name cannot be declared in the same block.
 Local variables cannot be accessed before they were declared, but labels,
 functions and assemblies can. Assemblies are special blocks that are used
 for e.g. returning runtime code or creating contracts. No identifier from an
 outer assembly is visible in a sub-assembly.
 If control flow passes over the end of a block, pop instructions are inserted
 that match the number of local variables declared in that block, unless the
 ``}`` is directly preceded by an opcode that does not have a continuing control
 flow path. Whenever a local variable is referenced, the code generator needs
 to know its current relative position in the stack and thus it needs to
 keep track of the current so-called stack height.
 At the end of a block, this implicit stack height is always reduced by the number
 of local variables whether there is a continuing control flow or not.
 This means that the stack height before and after the block should be the same.
 If this is not the case, a warning is issued,
 unless the last instruction in the block did not have a continuing control flow path.
 Why do we use higher-level constructs like ``switch``, ``for`` and functions:
 Using ``switch``, ``for`` and functions, it should be possible to write
 complex code without using ``jump`` or ``jumpi`` manually. This makes it much
 easier to analyze the control flow, which allows for improved formal
 verification and optimization.
 Furthermore, if manual jumps are allowed, computing the stack height is rather complicated.
 The position of all local variables on the stack needs to be known, otherwise
 neither references to local variables nor removing local variables automatically
 from the stack at the end of a block will work properly. Because of that,
 every label that is preceded by an instruction that ends or diverts control flow
 should be annotated with the current stack layout. This annotation is performed
 automatically during the desugaring phase.
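The bookkeeping described above (tracking the stack height, resolving variable references relative to it, and popping locals at the end of a block) can be modelled with a small Python sketch. Everything here is hypothetical: the statement encoding (``("let", name, value)``, ``("use", name)``, nested lists for blocks) and the pseudo-opcode strings are assumptions chosen for illustration and ignore jumps entirely.

```python
def compile_block(block, scope=None, height=0, out=None):
    """Emit pseudo-opcodes for a nested block while tracking stack height.

    Returns the stack height after the block, which equals the height
    before it because every local variable is popped at the closing brace.
    """
    scope = dict(scope or {})      # copy: names declared here stay local to this block
    out = [] if out is None else out
    start = height
    for stmt in block:
        if isinstance(stmt, list):             # nested block
            height = compile_block(stmt, scope, height, out)
        elif stmt[0] == "let":                 # ("let", name, value) occupies one new slot
            out.append(f"push {stmt[2]}")
            scope[stmt[1]] = height            # remember the absolute slot of the variable
            height += 1
        elif stmt[0] == "use":                 # ("use", name): copy to top, then consume
            depth = height - scope[stmt[1]]    # 1 means top of stack
            out.extend([f"dup{depth}", "pop"])
    out.extend(["pop"] * (height - start))     # clean up the locals of this block
    return start


if __name__ == "__main__":
    ops = []
    program = [("let", "x", 3), [("let", "y", 4), ("use", "x")], ("use", "x")]
    final_height = compile_block(program, out=ops)
    print(ops, final_height)
```

In the inner block ``x`` sits below ``y`` and is reached with ``dup2``. After the inner block has popped ``y``, the same variable is reached with ``dup1``. This is why the code generator has to know the height at every reference, and why unannotated manual jumps make the computation hard.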
 Example:
 We will follow an example compilation from Solidity to desugared assembly.
 We consider the runtime bytecode of the following Solidity program::
    contract C {
      function f(uint x) returns (uint y) {
        y = 1;
        for (uint i = 0; i < x; i++)
          y = 2 * y;
      }
    }
 The following assembly will be generated::
    {
      mstore(0x40, 0x60) // store the "free memory pointer"
      // function dispatcher
       switch div(calldataload(0), exp(2, 224))
      case 0xb3de648b: {
        let (r) = f(calldataload(4))
        let ret := $allocate(0x20)
        mstore(ret, r)
        return(ret, 0x20)
      }
      default: { jump(invalidJumpLabel) }
      // memory allocator
      function $allocate(size) -> (pos) {
        pos := mload(0x40)
        mstore(0x40, add(pos, size))
      }
      // the contract function
      function f(x) -> (y) {
        y := 1
        for { let i := 0 } lt(i, x) { i := add(i, 1) } {
          y := mul(2, y)
        }
      }
    }
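The dispatcher isolates the four-byte function selector from the first 32-byte calldata word. Since the selector occupies the top 32 bits of a 256-bit word, the divisor that isolates it is ``2**224``. The Python sketch below models this under stated assumptions: ``calldataload`` is a hypothetical helper that zero-pads reads past the end of the data, and the call data is built by hand for illustration.

```python
SELECTOR_F = 0xB3DE648B  # selector constant taken from the dispatcher's case label


def calldataload(calldata: bytes, offset: int) -> int:
    """Read a 32-byte big-endian word, zero-padded past the end of calldata."""
    word = calldata[offset:offset + 32].ljust(32, b"\x00")
    return int.from_bytes(word, "big")


def selector(calldata: bytes) -> int:
    """Model div(calldataload(0), exp(2, 224)): keep only the top 4 bytes."""
    return calldataload(calldata, 0) // 2**224


if __name__ == "__main__":
    # Hand-built call data: selector followed by one 32-byte argument (x = 5).
    calldata = SELECTOR_F.to_bytes(4, "big") + (5).to_bytes(32, "big")
    print(hex(selector(calldata)))     # selector used by the switch
    print(calldataload(calldata, 4))   # argument passed to f
```

The argument starts at offset 4, directly after the selector, which is why the dispatcher calls ``f(calldataload(4))``.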
 After the desugaring phase it looks as follows::
    {
      mstore(0x40, 0x60)
      {
         let $0 := div(calldataload(0), exp(2, 224))
        jumpi($case1, eq($0, 0xb3de648b))
        jump($caseDefault)
        $case1:
        {
          // the function call - we put return label and arguments on the stack
          $ret1 calldataload(4) jump($fun_f)
          $ret1 [r]: // a label with a [...]-annotation resets the stack height
                    // to "current block + number of local variables". It also
                    // introduces a variable, r:
                    // r is at top of stack, $0 is below (from enclosing block)
          $ret2 0x20 jump($fun_allocate)
          $ret2 [ret]: // stack here: $0, r, ret (top)
          mstore(ret, r)
          return(ret, 0x20)
          // although it is useless, the jump is automatically inserted,
          // since the desugaring process does not analyze control-flow
          jump($endswitch)
        }
        $caseDefault:
        {
          jump(invalidJumpLabel)
          jump($endswitch)
        }
        $endswitch:
      }
      jump($afterFunction)
      $fun_allocate:
      {
        $start[$retpos, size]:
        // output variables live in the same scope as the arguments.
        let pos := 0
        {
          pos := mload(0x40)
          mstore(0x40, add(pos, size))
        }
        swap1 pop swap1 jump
      }
      $fun_f:
      {
        start [$retpos, x]:
        let y := 0
        {
          let i := 0
          $for_begin:
          jumpi($for_end, iszero(lt(i, x)))
          {
            y := mul(2, y)
          }
          $for_continue:
          { i := add(i, 1) }
          jump($for_begin)
          $for_end:
        } // Here, a pop instruction is inserted for i
        swap1 pop swap1 jump
      }
      $afterFunction:
      stop
    }
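To see what the example computes, the loop of ``f`` can be replayed in Python. This is a sketch rather than a translation of the desugared output: the comments map each line to the corresponding statement of the assembly function ``f``, and the constant ``WORD`` is an assumption that mimics 256-bit wrap-around.

```python
WORD = 2**256  # assumed word size: EVM arithmetic wraps at 256 bits


def f(x: int) -> int:
    """Replay the loop of the assembly function f(x) -> (y)."""
    y = 1                        # y := 1
    i = 0                        # let i := 0
    while i < x:                 # loop condition lt(i, x)
        y = (2 * y) % WORD       # loop body: y := mul(2, y)
        i = (i + 1) % WORD       # post-iteration part: i := add(i, 1)
    return y                     # i is popped at the end of the block, y is returned


if __name__ == "__main__":
    print(f(0), f(10))
```

The function returns 2 to the power ``x`` modulo ``2**256``, so any ``x`` of 256 or more yields zero.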
 Assembly happens in four stages:
@@ -734,6 +740,9 @@ Assembly happens in four stages:
 3. Opcode stream generation
 4. Bytecode generation
 We will specify steps one to three in a pseudo-formal way. More formal
 specifications will follow.
 Parsing / Grammar
 -----------------