diff --git a/docs/060-breaking-changes.rst b/docs/060-breaking-changes.rst index 32397677c..915a9f335 100644 --- a/docs/060-breaking-changes.rst +++ b/docs/060-breaking-changes.rst @@ -89,14 +89,14 @@ New Features This section lists things that were not possible prior to Solidity 0.6.0 or were more difficult to achieve. - * The :ref:`try/catch statement ` allows you to react on failed external calls. - * ``struct`` and ``enum`` types can be declared at file level. - * Array slices can be used for calldata arrays, for example ``abi.decode(msg.data[4:], (uint, uint))`` - is a low-level way to decode the function call payload. - * Natspec supports multiple return parameters in developer documentation, enforcing the same naming check as ``@param``. - * Yul and Inline Assembly have a new statement called ``leave`` that exits the current function. - * Conversions from ``address`` to ``address payable`` are now possible via ``payable(x)``, where - ``x`` must be of type ``address``. +* The :ref:`try/catch statement ` allows you to react to failed external calls. +* ``struct`` and ``enum`` types can be declared at file level. +* Array slices can be used for calldata arrays, for example ``abi.decode(msg.data[4:], (uint, uint))`` + is a low-level way to decode the function call payload. +* Natspec supports multiple return parameters in developer documentation, enforcing the same naming check as ``@param``. +* Yul and Inline Assembly have a new statement called ``leave`` that exits the current function. +* Conversions from ``address`` to ``address payable`` are now possible via ``payable(x)``, where + ``x`` must be of type ``address``. Interface Changes diff --git a/docs/abi-spec.rst b/docs/abi-spec.rst index d412a2a2e..0c38e0f27 100644 --- a/docs/abi-spec.rst +++ b/docs/abi-spec.rst @@ -77,8 +77,9 @@ The following (fixed-size) array type exists: - ``[M]``: a fixed-length array of ``M`` elements, ``M >= 0``, of the given type. - .. 
note:: - While this ABI specification can express fixed-length arrays with zero elements, they're not supported by the compiler. + .. note:: + + While this ABI specification can express fixed-length arrays with zero elements, they're not supported by the compiler. The following non-fixed-size types exist: @@ -124,13 +125,13 @@ Design Criteria for the Encoding The encoding is designed to have the following properties, which are especially useful if some arguments are nested arrays: - 1. The number of reads necessary to access a value is at most the depth of the value - inside the argument array structure, i.e. four reads are needed to retrieve ``a_i[k][l][r]``. In a - previous version of the ABI, the number of reads scaled linearly with the total number of dynamic - parameters in the worst case. +1. The number of reads necessary to access a value is at most the depth of the value + inside the argument array structure, i.e. four reads are needed to retrieve ``a_i[k][l][r]``. In a + previous version of the ABI, the number of reads scaled linearly with the total number of dynamic + parameters in the worst case. - 2. The data of a variable or array element is not interleaved with other data and it is - relocatable, i.e. it only uses relative "addresses". +2. The data of a variable or array element is not interleaved with other data and it is + relocatable, i.e. it only uses relative "addresses". Formal Specification of the Encoding @@ -312,21 +313,21 @@ these are directly the values we want to pass, whereas for the dynamic types ``u we use the offset in bytes to the start of their data area, measured from the start of the value encoding (i.e. not counting the first four bytes containing the hash of the function signature). 
These are: - - ``0x0000000000000000000000000000000000000000000000000000000000000123`` (``0x123`` padded to 32 bytes) - - ``0x0000000000000000000000000000000000000000000000000000000000000080`` (offset to start of data part of second parameter, 4*32 bytes, exactly the size of the head part) - - ``0x3132333435363738393000000000000000000000000000000000000000000000`` (``"1234567890"`` padded to 32 bytes on the right) - - ``0x00000000000000000000000000000000000000000000000000000000000000e0`` (offset to start of data part of fourth parameter = offset to start of data part of first dynamic parameter + size of data part of first dynamic parameter = 4\*32 + 3\*32 (see below)) +- ``0x0000000000000000000000000000000000000000000000000000000000000123`` (``0x123`` padded to 32 bytes) +- ``0x0000000000000000000000000000000000000000000000000000000000000080`` (offset to start of data part of second parameter, 4*32 bytes, exactly the size of the head part) +- ``0x3132333435363738393000000000000000000000000000000000000000000000`` (``"1234567890"`` padded to 32 bytes on the right) +- ``0x00000000000000000000000000000000000000000000000000000000000000e0`` (offset to start of data part of fourth parameter = offset to start of data part of first dynamic parameter + size of data part of first dynamic parameter = 4\*32 + 3\*32 (see below)) After this, the data part of the first dynamic argument, ``[0x456, 0x789]`` follows: - - ``0x0000000000000000000000000000000000000000000000000000000000000002`` (number of elements of the array, 2) - - ``0x0000000000000000000000000000000000000000000000000000000000000456`` (first element) - - ``0x0000000000000000000000000000000000000000000000000000000000000789`` (second element) +- ``0x0000000000000000000000000000000000000000000000000000000000000002`` (number of elements of the array, 2) +- ``0x0000000000000000000000000000000000000000000000000000000000000456`` (first element) +- ``0x0000000000000000000000000000000000000000000000000000000000000789`` (second 
element) Finally, we encode the data part of the second dynamic argument, ``"Hello, world!"``: - - ``0x000000000000000000000000000000000000000000000000000000000000000d`` (number of elements (bytes in this case): 13) - - ``0x48656c6c6f2c20776f726c642100000000000000000000000000000000000000`` (``"Hello, world!"`` padded to 32 bytes on the right) +- ``0x000000000000000000000000000000000000000000000000000000000000000d`` (number of elements (bytes in this case): 13) +- ``0x48656c6c6f2c20776f726c642100000000000000000000000000000000000000`` (``"Hello, world!"`` padded to 32 bytes on the right) All together, the encoding is (newline after function selector and each 32-bytes for clarity): @@ -348,14 +349,14 @@ with values ``([[1, 2], [3]], ["one", "two", "three"])`` but start from the most First we encode the length and data of the first embedded dynamic array ``[1, 2]`` of the first root array ``[[1, 2], [3]]``: - - ``0x0000000000000000000000000000000000000000000000000000000000000002`` (number of elements in the first array, 2; the elements themselves are ``1`` and ``2``) - - ``0x0000000000000000000000000000000000000000000000000000000000000001`` (first element) - - ``0x0000000000000000000000000000000000000000000000000000000000000002`` (second element) +- ``0x0000000000000000000000000000000000000000000000000000000000000002`` (number of elements in the first array, 2; the elements themselves are ``1`` and ``2``) +- ``0x0000000000000000000000000000000000000000000000000000000000000001`` (first element) +- ``0x0000000000000000000000000000000000000000000000000000000000000002`` (second element) Then we encode the length and data of the second embedded dynamic array ``[3]`` of the first root array ``[[1, 2], [3]]``: - - ``0x0000000000000000000000000000000000000000000000000000000000000001`` (number of elements in the second array, 1; the element is ``3``) - - ``0x0000000000000000000000000000000000000000000000000000000000000003`` (first element) +- 
``0x0000000000000000000000000000000000000000000000000000000000000001`` (number of elements in the second array, 1; the element is ``3``) +- ``0x0000000000000000000000000000000000000000000000000000000000000003`` (first element) Then we need to find the offsets ``a`` and ``b`` for their respective dynamic arrays ``[1, 2]`` and ``[3]``. To calculate the offsets we can take a look at the encoded data of the first root array ``[[1, 2], [3]]`` @@ -380,12 +381,12 @@ thus ``b = 0x00000000000000000000000000000000000000000000000000000000000000a0``. Then we encode the embedded strings of the second root array: - - ``0x0000000000000000000000000000000000000000000000000000000000000003`` (number of characters in word ``"one"``) - - ``0x6f6e650000000000000000000000000000000000000000000000000000000000`` (utf8 representation of word ``"one"``) - - ``0x0000000000000000000000000000000000000000000000000000000000000003`` (number of characters in word ``"two"``) - - ``0x74776f0000000000000000000000000000000000000000000000000000000000`` (utf8 representation of word ``"two"``) - - ``0x0000000000000000000000000000000000000000000000000000000000000005`` (number of characters in word ``"three"``) - - ``0x7468726565000000000000000000000000000000000000000000000000000000`` (utf8 representation of word ``"three"``) +- ``0x0000000000000000000000000000000000000000000000000000000000000003`` (number of characters in word ``"one"``) +- ``0x6f6e650000000000000000000000000000000000000000000000000000000000`` (utf8 representation of word ``"one"``) +- ``0x0000000000000000000000000000000000000000000000000000000000000003`` (number of characters in word ``"two"``) +- ``0x74776f0000000000000000000000000000000000000000000000000000000000`` (utf8 representation of word ``"two"``) +- ``0x0000000000000000000000000000000000000000000000000000000000000005`` (number of characters in word ``"three"``) +- ``0x7468726565000000000000000000000000000000000000000000000000000000`` (utf8 representation of word ``"three"``) In 
parallel to the first root array, since strings are dynamic elements we need to find their offsets ``c``, ``d`` and ``e``: @@ -416,11 +417,11 @@ and have the same encodings for a function with a signature ``g(string[],uint[][ Then we encode the length of the first root array: - - ``0x0000000000000000000000000000000000000000000000000000000000000002`` (number of elements in the first root array, 2; the elements themselves are ``[1, 2]`` and ``[3]``) +- ``0x0000000000000000000000000000000000000000000000000000000000000002`` (number of elements in the first root array, 2; the elements themselves are ``[1, 2]`` and ``[3]``) Then we encode the length of the second root array: - - ``0x0000000000000000000000000000000000000000000000000000000000000003`` (number of strings in the second root array, 3; the strings themselves are ``"one"``, ``"two"`` and ``"three"``) +- ``0x0000000000000000000000000000000000000000000000000000000000000003`` (number of strings in the second root array, 3; the strings themselves are ``"one"``, ``"two"`` and ``"three"``) Finally we find the offsets ``f`` and ``g`` for their respective root dynamic arrays ``[[1, 2], [3]]`` and ``["one", "two", "three"]``, and assemble parts in the correct order: @@ -761,18 +762,19 @@ As an example, the encoding of ``int16(-1), bytes1(0x42), uint16(0x03), string(" ^^^^^^^^^^^^^^^^^^^^^^^^^^ string("Hello, world!") without a length field More specifically: - - During the encoding, everything is encoded in-place. This means that there is - no distinction between head and tail, as in the ABI encoding, and the length - of an array is not encoded. - - The direct arguments of ``abi.encodePacked`` are encoded without padding, - as long as they are not arrays (or ``string`` or ``bytes``). - - The encoding of an array is the concatenation of the - encoding of its elements **with** padding. - - Dynamically-sized types like ``string``, ``bytes`` or ``uint[]`` are encoded - without their length field. 
- - The encoding of ``string`` or ``bytes`` does not apply padding at the end - unless it is part of an array or struct (then it is padded to a multiple of - 32 bytes). + +- During the encoding, everything is encoded in-place. This means that there is + no distinction between head and tail, as in the ABI encoding, and the length + of an array is not encoded. +- The direct arguments of ``abi.encodePacked`` are encoded without padding, + as long as they are not arrays (or ``string`` or ``bytes``). +- The encoding of an array is the concatenation of the + encoding of its elements **with** padding. +- Dynamically-sized types like ``string``, ``bytes`` or ``uint[]`` are encoded + without their length field. +- The encoding of ``string`` or ``bytes`` does not apply padding at the end + unless it is part of an array or struct (then it is padded to a multiple of + 32 bytes). In general, the encoding is ambiguous as soon as there are two dynamically-sized elements, because of the missing length field. @@ -801,13 +803,13 @@ Indexed event parameters that are not value types, i.e. arrays and structs are n stored directly but instead a keccak256-hash of an encoding is stored. This encoding is defined as follows: - - the encoding of a ``bytes`` and ``string`` value is just the string contents - without any padding or length prefix. - - the encoding of a struct is the concatenation of the encoding of its members, - always padded to a multiple of 32 bytes (even ``bytes`` and ``string``). - - the encoding of an array (both dynamically- and statically-sized) is - the concatenation of the encoding of its elements, always padded to a multiple - of 32 bytes (even ``bytes`` and ``string``) and without any length prefix +- the encoding of a ``bytes`` and ``string`` value is just the string contents + without any padding or length prefix. 
+- the encoding of a struct is the concatenation of the encoding of its members, + always padded to a multiple of 32 bytes (even ``bytes`` and ``string``). +- the encoding of an array (both dynamically- and statically-sized) is + the concatenation of the encoding of its elements, always padded to a multiple + of 32 bytes (even ``bytes`` and ``string``) and without any length prefix. In the above, as usual, a negative number is padded by sign extension and not zero padded. ``bytesNN`` types are padded on the right while ``uintNN`` / ``intNN`` are padded on the left. diff --git a/docs/bugs.rst b/docs/bugs.rst index 73700adf3..75a23e499 100644 --- a/docs/bugs.rst +++ b/docs/bugs.rst @@ -19,16 +19,16 @@ which can be used to check which bugs affect a specific version of the compiler. Contract source verification tools and also other tools interacting with contracts should consult this list according to the following criteria: - - It is mildly suspicious if a contract was compiled with a nightly - compiler version instead of a released version. This list does not keep - track of unreleased or nightly versions. - - It is also mildly suspicious if a contract was compiled with a version that was - not the most recent at the time the contract was created. For contracts - created from other contracts, you have to follow the creation chain - back to a transaction and use the date of that transaction as creation date. - - It is highly suspicious if a contract was compiled with a compiler that - contains a known bug and the contract was created at a time where a newer - compiler version containing a fix was already released. +- It is mildly suspicious if a contract was compiled with a nightly + compiler version instead of a released version. This list does not keep + track of unreleased or nightly versions. +- It is also mildly suspicious if a contract was compiled with a version that was + not the most recent at the time the contract was created. 
For contracts + created from other contracts, you have to follow the creation chain + back to a transaction and use the date of that transaction as the creation date. +- It is highly suspicious if a contract was compiled with a compiler that + contains a known bug and the contract was created at a time when a newer + compiler version containing a fix was already released. The JSON file of known bugs below is an array of objects, one for each bug, with the following keys: diff --git a/docs/contracts/libraries.rst b/docs/contracts/libraries.rst index eee9de765..a6a062501 100644 --- a/docs/contracts/libraries.rst +++ b/docs/contracts/libraries.rst @@ -223,14 +223,14 @@ following an internal naming schema and arguments of types not supported in the The following identifiers are used for the types in the signatures: - - Value types, non-storage ``string`` and non-storage ``bytes`` use the same identifiers as in the contract ABI. - - Non-storage array types follow the same convention as in the contract ABI, i.e. ``[]`` for dynamic arrays and - ``[M]`` for fixed-size arrays of ``M`` elements. - - Non-storage structs are referred to by their fully qualified name, i.e. ``C.S`` for ``contract C { struct S { ... } }``. - - Storage pointer mappings use ``mapping( => ) storage`` where ```` and ```` are - the identifiers for the key and value types of the mapping, respectively. - - Other storage pointer types use the type identifier of their corresponding non-storage type, but append a single space - followed by ``storage`` to it. +- Value types, non-storage ``string`` and non-storage ``bytes`` use the same identifiers as in the contract ABI. +- Non-storage array types follow the same convention as in the contract ABI, i.e. ``[]`` for dynamic arrays and + ``[M]`` for fixed-size arrays of ``M`` elements. +- Non-storage structs are referred to by their fully qualified name, i.e. ``C.S`` for ``contract C { struct S { ... } }``. 
+- Storage pointer mappings use ``mapping( => ) storage`` where ```` and ```` are + the identifiers for the key and value types of the mapping, respectively. +- Other storage pointer types use the type identifier of their corresponding non-storage type, but append a single space + followed by ``storage`` to it. The argument encoding is the same as for the regular contract ABI, except for storage pointers, which are encoded as a ``uint256`` value referring to the storage slot to which they point. diff --git a/docs/internals/optimizer.rst b/docs/internals/optimizer.rst index 142a89602..64c8b1460 100644 --- a/docs/internals/optimizer.rst +++ b/docs/internals/optimizer.rst @@ -269,11 +269,11 @@ backtracking. All components of the Yul-based optimizer module are explained below. The following transformation steps are the main components: - - SSA Transform - - Common Subexpression Eliminator - - Expression Simplifier - - Redundant Assign Eliminator - - Full Function Inliner +- SSA Transform +- Common Subexpression Eliminator +- Expression Simplifier +- Redundant Assign Eliminator +- Full Function Inliner Optimizer Steps --------------- @@ -281,36 +281,36 @@ Optimizer Steps This is a list of all steps the Yul-based optimizer sorted alphabetically. You can find more information on the individual steps and their sequence below. - - :ref:`block-flattener`. - - :ref:`circular-reference-pruner`. - - :ref:`common-subexpression-eliminator`. - - :ref:`conditional-simplifier`. - - :ref:`conditional-unsimplifier`. - - :ref:`control-flow-simplifier`. - - :ref:`dead-code-eliminator`. - - :ref:`equivalent-function-combiner`. - - :ref:`expression-joiner`. - - :ref:`expression-simplifier`. - - :ref:`expression-splitter`. - - :ref:`for-loop-condition-into-body`. - - :ref:`for-loop-condition-out-of-body`. - - :ref:`for-loop-init-rewriter`. - - :ref:`functional-inliner`. - - :ref:`function-grouper`. - - :ref:`function-hoister`. - - :ref:`function-specializer`. 
- - :ref:`literal-rematerialiser`. - - :ref:`load-resolver`. - - :ref:`loop-invariant-code-motion`. - - :ref:`redundant-assign-eliminator`. - - :ref:`reasoning-based-simplifier`. - - :ref:`rematerialiser`. - - :ref:`SSA-reverser`. - - :ref:`SSA-transform`. - - :ref:`structural-simplifier`. - - :ref:`unused-function-parameter-pruner`. - - :ref:`unused-pruner`. - - :ref:`var-decl-initializer`. +- :ref:`block-flattener`. +- :ref:`circular-reference-pruner`. +- :ref:`common-subexpression-eliminator`. +- :ref:`conditional-simplifier`. +- :ref:`conditional-unsimplifier`. +- :ref:`control-flow-simplifier`. +- :ref:`dead-code-eliminator`. +- :ref:`equivalent-function-combiner`. +- :ref:`expression-joiner`. +- :ref:`expression-simplifier`. +- :ref:`expression-splitter`. +- :ref:`for-loop-condition-into-body`. +- :ref:`for-loop-condition-out-of-body`. +- :ref:`for-loop-init-rewriter`. +- :ref:`functional-inliner`. +- :ref:`function-grouper`. +- :ref:`function-hoister`. +- :ref:`function-specializer`. +- :ref:`literal-rematerialiser`. +- :ref:`load-resolver`. +- :ref:`loop-invariant-code-motion`. +- :ref:`redundant-assign-eliminator`. +- :ref:`reasoning-based-simplifier`. +- :ref:`rematerialiser`. +- :ref:`SSA-reverser`. +- :ref:`SSA-transform`. +- :ref:`structural-simplifier`. +- :ref:`unused-function-parameter-pruner`. +- :ref:`unused-pruner`. +- :ref:`var-decl-initializer`. Selecting Optimizations ----------------------- @@ -589,8 +589,8 @@ For any variable ``a`` that is assigned to somewhere in the code (variables that are declared with value and never re-assigned are not modified) perform the following transforms: - - replace ``let a := v`` by ``let a_i := v let a := a_i`` - - replace ``a := v`` by ``let a_i := v a := a_i`` where ``i`` is a number such that ``a_i`` is yet unused. +- replace ``let a := v`` by ``let a_i := v let a := a_i`` +- replace ``a := v`` by ``let a_i := v a := a_i`` where ``i`` is a number such that ``a_i`` is yet unused. 
Furthermore, always record the current value of ``i`` used for ``a`` and replace each reference to ``a`` by ``a_i``. @@ -677,9 +677,9 @@ joins, the two mappings coming from the two branches are combined in the followi Statements that are only in one mapping or have the same state are used unchanged. Conflicting values are resolved in the following way: - - "unused", "undecided" -> "undecided" - - "unused", "used" -> "used" - - "undecided, "used" -> "used" +- "unused", "undecided" -> "undecided" +- "unused", "used" -> "used" +- "undecided", "used" -> "used" For for-loops, the condition, body and post-part are visited twice, taking the joining control-flow at the condition into account. @@ -735,10 +735,10 @@ is side-effect free and its evaluation only depends on the values of variables and the call-constant state of the environment. Most expressions are movable. The following parts make an expression non-movable: - - function calls (might be relaxed in the future if all statements in the function are movable) - - opcodes that (can) have side-effects (like ``call`` or ``selfdestruct``) - - opcodes that read or write memory, storage or external state information - - opcodes that depend on the current PC, memory size or returndata size +- function calls (might be relaxed in the future if all statements in the function are movable) +- opcodes that (can) have side-effects (like ``call`` or ``selfdestruct``) +- opcodes that read or write memory, storage or external state information +- opcodes that depend on the current PC, memory size or returndata size DataflowAnalyzer ^^^^^^^^^^^^^^^^ @@ -836,8 +836,8 @@ ReasoningBasedSimplifier This optimizer uses SMT solvers to check whether ``if`` conditions are constant. - - If ``constraints AND condition`` is UNSAT, the condition is never true and the whole body can be removed. - - If ``constraints AND NOT condition`` is UNSAT, the condition is always true and can be replaced by ``1``. 
+- If ``constraints AND condition`` is UNSAT, the condition is never true and the whole body can be removed. +- If ``constraints AND NOT condition`` is UNSAT, the condition is always true and can be replaced by ``1``. The simplifications above can only be applied if the condition is movable. @@ -872,13 +872,13 @@ we cannot assign a specific value. Current features: - - switch cases: insert " := " - - after if statement with terminating control-flow, insert " := 0" +- switch cases: insert " := " +- after if statement with terminating control-flow, insert " := 0" Future features: - - allow replacements by "1" - - take termination of user-defined functions into account +- allow replacements by "1" +- take termination of user-defined functions into account Works best with SSA form and if dead code removal has run before. @@ -898,15 +898,15 @@ ControlFlowSimplifier Simplifies several control-flow structures: - - replace if with empty body with pop(condition) - - remove empty default switch case - - remove empty switch case if no default case exists - - replace switch with no cases with pop(expression) - - turn switch with single case into if - - replace switch with only default case with pop(expression) and body - - replace switch with const expr with matching case body - - replace ``for`` with terminating control flow and without other break/continue by ``if`` - - remove ``leave`` at the end of a function. +- replace if with empty body with pop(condition) +- remove empty default switch case +- remove empty switch case if no default case exists +- replace switch with no cases with pop(expression) +- turn switch with single case into if +- replace switch with only default case with pop(expression) and body +- replace switch with const expr with matching case body +- replace ``for`` with terminating control flow and without other break/continue by ``if`` +- remove ``leave`` at the end of a function. None of these operations depend on the data flow. 
The StructuralSimplifier performs similar tasks that do depend on data flow. @@ -956,13 +956,13 @@ StructuralSimplifier This is a general step that performs various kinds of simplifications on a structural level: - - replace if statement with empty body by ``pop(condition)`` - - replace if statement with true condition by its body - - remove if statement with false condition - - turn switch with single case into if - - replace switch with only default case by ``pop(expression)`` and body - - replace switch with literal expression by matching case body - - replace for loop with false condition by its initialization part +- replace if statement with empty body by ``pop(condition)`` +- replace if statement with true condition by its body +- remove if statement with false condition +- turn switch with single case into if +- replace switch with only default case by ``pop(expression)`` and body +- replace switch with literal expression by matching case body +- replace for loop with false condition by its initialization part This component uses the Dataflow Analyzer. @@ -1008,8 +1008,8 @@ declarations inside conditional branches will not be moved out of the loop. Requirements: - - The Disambiguator, ForLoopInitRewriter and FunctionHoister must be run upfront. - - Expression splitter and SSA transform should be run upfront to obtain better result. +- The Disambiguator, ForLoopInitRewriter and FunctionHoister must be run upfront. +- Expression splitter and SSA transform should be run upfront to obtain a better result. Function-Level Optimizations @@ -1089,15 +1089,15 @@ FunctionalInliner This component of the optimizer performs restricted function inlining by inlining functions that can be inlined inside functional expressions, i.e. functions that: - - return a single value. - - have a body like ``r := ``. - - neither reference themselves nor ``r`` in the right hand side. +- return a single value. +- have a body like ``r := ``. 
+- neither reference themselves nor ``r`` in the right hand side. Furthermore, for all parameters, all of the following need to be true: - - The argument is movable. - - The parameter is either referenced less than twice in the function body, or the argument is rather cheap - ("cost" of at most 1, like a constant up to 0xff). +- The argument is movable. +- The parameter is either referenced less than twice in the function body, or the argument is rather cheap + ("cost" of at most 1, like a constant up to 0xff). Example: The function to be inlined has the form of ``function f(...) -> r { r := E }`` where ``E`` is an expression that does not reference ``r`` and all arguments in the function call are movable expressions. diff --git a/docs/internals/source_mappings.rst b/docs/internals/source_mappings.rst index 0ee215533..fbb84e2d1 100644 --- a/docs/internals/source_mappings.rst +++ b/docs/internals/source_mappings.rst @@ -56,8 +56,8 @@ used in a single modifier. In order to compress these source mappings especially for bytecode, the following rules are used: - - If a field is empty, the value of the preceding element is used. - - If a ``:`` is missing, all following fields are considered empty. +- If a field is empty, the value of the preceding element is used. +- If a ``:`` is missing, all following fields are considered empty. This means the following source mappings represent the same information: diff --git a/docs/ir/ir-breaking-changes.rst b/docs/ir/ir-breaking-changes.rst index 7d6b6d54f..29b548654 100644 --- a/docs/ir/ir-breaking-changes.rst +++ b/docs/ir/ir-breaking-changes.rst @@ -11,180 +11,187 @@ Semantic Only Changes This section lists the changes that are semantic-only, thus potentially hiding new and different behavior in existing code. - * When storage structs are deleted, every storage slot that contains a member of the struct is set to zero entirely. Formally, padding space was left untouched. 
-Consequently, if the padding space within a struct is used to store data (e.g. in the context of a contract upgrade), you have to be aware that ``delete`` will now also clear the added member (while it wouldn't have been cleared in the past). +- When storage structs are deleted, every storage slot that contains a member of the struct is set to zero entirely. Formerly, padding space was left untouched. + Consequently, if the padding space within a struct is used to store data (e.g. in the context of a contract upgrade), you have to be aware that ``delete`` will now also clear the added member (while it wouldn't have been cleared in the past). -.. code-block:: solidity + .. code-block:: solidity - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >0.7.0; + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >0.7.0; - contract C { - struct S { - uint64 y; - uint64 z; - } - S s; - function f() public { - // ... - delete s; - // s occupies only first 16 bytes of the 32 bytes slot - // delete will write zero to the full slot - } - } + contract C { + struct S { + uint64 y; + uint64 z; + } + S s; + function f() public { + // ... + delete s; + // s occupies only first 16 bytes of the 32 bytes slot + // delete will write zero to the full slot + } + } -We have the same behavior for implicit delete, for example when array of structs is shortened. + We have the same behavior for implicit delete, for example when an array of structs is shortened. - * Function modifiers are implemented in a slightly different way regarding function parameters. - This especially has an effect if the placeholder ``_;`` is evaluated multiple times in a modifier. - In the old code generator, each function parameter has a fixed slot on the stack. If the function - is run multiple times because ``_;`` is used multiple times or used in a loop, then a change to the - function parameter's value is visible in the next execution of the function.
- The new code generator implements modifiers using actual functions and passes function parameters on. - This means that multiple executions of a function will get the same values for the parameters. +- Function modifiers are implemented in a slightly different way regarding function parameters. + This especially has an effect if the placeholder ``_;`` is evaluated multiple times in a modifier. + In the old code generator, each function parameter has a fixed slot on the stack. If the function + is run multiple times because ``_;`` is used multiple times or used in a loop, then a change to the + function parameter's value is visible in the next execution of the function. + The new code generator implements modifiers using actual functions and passes function parameters on. + This means that multiple executions of a function will get the same values for the parameters. -.. code-block:: solidity + .. code-block:: solidity - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >=0.7.0; - contract C { - function f(uint _a) public pure mod() returns (uint _r) { - _r = _a++; - } - modifier mod() { _; _; } - } + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >=0.7.0; + contract C { + function f(uint _a) public pure mod() returns (uint _r) { + _r = _a++; + } + modifier mod() { _; _; } + } -If you execute ``f(0)`` in the old code generator, it will return ``2``, while -it will return ``1`` when using the new code generator. + If you execute ``f(0)`` in the old code generator, it will return ``2``, while + it will return ``1`` when using the new code generator. - * The order of contract initialization has changed in case of inheritance. +- The order of contract initialization has changed in case of inheritance. -The order used to be: - - All state variables are zero-initialized at the beginning. - - Evaluate base constructor arguments from most derived to most base contract. 
- - Initialize all state variables in the whole inheritance hierarchy from most base to most derived. - - Run the constructor, if present, for all contracts in the linearized hierarchy from most base to most derived. + The order used to be: -New order: - - All state variables are zero-initialized at the beginning. - - Evaluate base constructor arguments from most derived to most base contract. - - For every contract in order from most base to most derived in the linearized hierarchy execute: - 1. If present at declaration, initial values are assigned to state variables. - 2. Constructor, if present. + - All state variables are zero-initialized at the beginning. + - Evaluate base constructor arguments from most derived to most base contract. + - Initialize all state variables in the whole inheritance hierarchy from most base to most derived. + - Run the constructor, if present, for all contracts in the linearized hierarchy from most base to most derived. + + New order: + + - All state variables are zero-initialized at the beginning. + - Evaluate base constructor arguments from most derived to most base contract. + - For every contract in order from most base to most derived in the linearized hierarchy execute: + + 1. If present at declaration, initial values are assigned to state variables. + 2. Constructor, if present. This causes differences in some contracts, for example: -.. code-block:: solidity + .. code-block:: solidity - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >0.7.0; + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >0.7.0; - contract A { - uint x; - constructor() { - x = 42; - } - function f() public view returns(uint256) { - return x; - } - } - contract B is A { - uint public y = f(); - } + contract A { + uint x; + constructor() { + x = 42; + } + function f() public view returns(uint256) { + return x; + } + } + contract B is A { + uint public y = f(); + } -Previously, ``y`` would be set to 0. 
This is due to the fact that we would first initialize state variables: First, ``x`` is set to 0, and when initializing ``y``, ``f()`` would return 0 causing ``y`` to be 0 as well. -With the new rules, ``y`` will be set to 42. We first initialize ``x`` to 0, then call A's constructor which sets ``x`` to 42. Finally, when initializing ``y``, ``f()`` returns 42 causing ``y`` to be 42. + Previously, ``y`` would be set to 0. This is due to the fact that we would first initialize state variables: First, ``x`` is set to 0, and when initializing ``y``, ``f()`` would return 0 causing ``y`` to be 0 as well. + With the new rules, ``y`` will be set to 42. We first initialize ``x`` to 0, then call A's constructor which sets ``x`` to 42. Finally, when initializing ``y``, ``f()`` returns 42 causing ``y`` to be 42. - * Copying ``bytes`` arrays from memory to storage is implemented in a different way. The old code generator always copies full words, while the new one cuts the byte array after its end. The old behaviour can lead to dirty data being copied after the end of the array (but still in the same storage slot). -This causes differences in some contracts, for example: +- Copying ``bytes`` arrays from memory to storage is implemented in a different way. The old code generator always copies full words, while the new one cuts the byte array after its end. The old behaviour can lead to dirty data being copied after the end of the array (but still in the same storage slot). + This causes differences in some contracts, for example: -.. code-block:: solidity + .. 
code-block:: solidity - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >0.8.0; + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >0.8.0; - contract C { - bytes x; - function f() public returns (uint _r) { - bytes memory m = "tmp"; - assembly { - mstore(m, 8) - mstore(add(m, 32), "deadbeef15dead") - } - x = m; - assembly { - _r := sload(x.slot) - } - } - } + contract C { + bytes x; + function f() public returns (uint _r) { + bytes memory m = "tmp"; + assembly { + mstore(m, 8) + mstore(add(m, 32), "deadbeef15dead") + } + x = m; + assembly { + _r := sload(x.slot) + } + } + } -Previously ``f()`` would return ``0x6465616462656566313564656164000000000000000000000000000000000010`` (it has correct length, and correct first 8 elements, but then it contains dirty data which was set via assembly). -Now it is returning ``0x6465616462656566000000000000000000000000000000000000000000000010`` (it has correct length, and correct elements, but does not contain superfluous data). + Previously ``f()`` would return ``0x6465616462656566313564656164000000000000000000000000000000000010`` (it has correct length, and correct first 8 elements, but then it contains dirty data which was set via assembly). + Now it is returning ``0x6465616462656566000000000000000000000000000000000000000000000010`` (it has correct length, and correct elements, but does not contain superfluous data). -.. index:: ! evaluation order; expression + .. index:: ! evaluation order; expression -* For the old code generator, the evaluation order of expressions is unspecified. +- For the old code generator, the evaluation order of expressions is unspecified. For the new code generator, we try to evaluate in source order (left to right), but do not guarantee it. This can lead to semantic differences. -For example: + For example: -.. code-block:: solidity + .. 
code-block:: solidity - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >0.8.0; - contract C { - function preincr_u8(uint8 _a) public pure returns (uint8) { - return ++_a + _a; - } - } + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >0.8.0; + contract C { + function preincr_u8(uint8 _a) public pure returns (uint8) { + return ++_a + _a; + } + } -The function ``preincr_u8(1)`` returns the following values: -- Old code generator: 3 (``1 + 2``) but the return value is unspecified in general -- New code generator: 4 (``2 + 2``) but the return value is not guaranteed + The function ``preincr_u8(1)`` returns the following values: -.. index:: ! evaluation order; function arguments + - Old code generator: 3 (``1 + 2``) but the return value is unspecified in general + - New code generator: 4 (``2 + 2``) but the return value is not guaranteed -On the other hand, function argument expressions are evaluated in the same order by both code generators with the exception of the global functions ``addmod`` and ``mulmod``. -For example: + .. index:: ! evaluation order; function arguments -.. code-block:: solidity + On the other hand, function argument expressions are evaluated in the same order by both code generators with the exception of the global functions ``addmod`` and ``mulmod``. + For example: - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >0.8.0; - contract C { - function add(uint8 _a, uint8 _b) public pure returns (uint8) { - return _a + _b; - } - function g(uint8 _a, uint8 _b) public pure returns (uint8) { - return add(++_a + ++_b, _a + _b); - } - } + .. 
code-block:: solidity -The function ``g(1, 2)`` returns the following values: -- Old code generator: ``10`` (``add(2 + 3, 2 + 3)``) but the return value is unspecified in general -- New code generator: ``10`` but the return value is not guaranteed + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >0.8.0; + contract C { + function add(uint8 _a, uint8 _b) public pure returns (uint8) { + return _a + _b; + } + function g(uint8 _a, uint8 _b) public pure returns (uint8) { + return add(++_a + ++_b, _a + _b); + } + } -The arguments to the global functions ``addmod`` and ``mulmod`` are evaluated right-to-left by the old code generator -and left-to-right by the new code generator. -For example: + The function ``g(1, 2)`` returns the following values: -:: - // SPDX-License-Identifier: GPL-3.0 - pragma solidity >0.8.0; - contract C { - function f() public pure returns (uint256 aMod, uint256 mMod) { - uint256 x = 3; - // Old code gen: add/mulmod(5, 4, 3) - // New code gen: add/mulmod(4, 5, 5) - aMod = addmod(++x, ++x, x); - mMod = mulmod(++x, ++x, x); - } - } + - Old code generator: ``10`` (``add(2 + 3, 2 + 3)``) but the return value is unspecified in general + - New code generator: ``10`` but the return value is not guaranteed -The function ``f()`` returns the following values: -- Old code generator: ``aMod = 0`` and ``mMod = 2`` -- New code generator: ``aMod = 4`` and ``mMod = 0`` + The arguments to the global functions ``addmod`` and ``mulmod`` are evaluated right-to-left by the old code generator + and left-to-right by the new code generator. 
+ For example: + + :: + + // SPDX-License-Identifier: GPL-3.0 + pragma solidity >0.8.0; + contract C { + function f() public pure returns (uint256 aMod, uint256 mMod) { + uint256 x = 3; + // Old code gen: add/mulmod(5, 4, 3) + // New code gen: add/mulmod(4, 5, 5) + aMod = addmod(++x, ++x, x); + mMod = mulmod(++x, ++x, x); + } + } + + The function ``f()`` returns the following values: + + - Old code generator: ``aMod = 0`` and ``mMod = 2`` + - New code generator: ``aMod = 4`` and ``mMod = 0`` Internals @@ -234,6 +241,7 @@ For example: } The function ``f(1)`` returns the following values: + - Old code generator: (``fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe``, ``00000000000000000000000000000000000000000000000000000000000000fe``) - New code generator: (``00000000000000000000000000000000000000000000000000000000000000fe``, ``00000000000000000000000000000000000000000000000000000000000000fe``) diff --git a/docs/natspec-format.rst b/docs/natspec-format.rst index a49ddd895..297bbc8c6 100644 --- a/docs/natspec-format.rst +++ b/docs/natspec-format.rst @@ -166,9 +166,9 @@ Inheritance Notes Functions without NatSpec will automatically inherit the documentation of their base function. Exceptions to this are: - * When the parameter names are different. - * When there is more than one base function. - * When there is an explicit ``@inheritdoc`` tag which specifies which contract should be used to inherit. +* When the parameter names are different. +* When there is more than one base function. +* When there is an explicit ``@inheritdoc`` tag which specifies which contract should be used to inherit. .. _header-output: diff --git a/docs/types/value-types.rst b/docs/types/value-types.rst index caa00a83e..8ec1edf27 100644 --- a/docs/types/value-types.rst +++ b/docs/types/value-types.rst @@ -128,10 +128,10 @@ The modulo operation ``a % n`` yields the remainder ``r`` after the division of by the operand ``n``, where ``q = int(a / n)`` and ``r = a - (n * q)``. 
This means that modulo results in the same sign as its left operand (or zero) and ``a % n == -(-a % n)`` holds for negative ``a``: - * ``int256(5) % int256(2) == int256(1)`` - * ``int256(5) % int256(-2) == int256(1)`` - * ``int256(-5) % int256(2) == int256(-1)`` - * ``int256(-5) % int256(-2) == int256(-1)`` +* ``int256(5) % int256(2) == int256(1)`` +* ``int256(5) % int256(-2) == int256(1)`` +* ``int256(-5) % int256(2) == int256(-1)`` +* ``int256(-5) % int256(-2) == int256(-1)`` .. note:: Modulo with zero causes a :ref:`Panic error`. This check can **not** be disabled through ``unchecked { ... }``. @@ -184,8 +184,8 @@ Address The address type comes in two flavours, which are largely identical: - - ``address``: Holds a 20 byte value (size of an Ethereum address). - - ``address payable``: Same as ``address``, but with the additional members ``transfer`` and ``send``. +- ``address``: Holds a 20 byte value (size of an Ethereum address). +- ``address payable``: Same as ``address``, but with the additional members ``transfer`` and ``send``. The idea behind this distinction is that ``address payable`` is an address you can send Ether to, while a plain ``address`` cannot be sent Ether. 
@@ -510,15 +510,15 @@ String literals can only contain printable ASCII characters, which means the cha Additionally, string literals also support the following escape characters: - - ``\`` (escapes an actual newline) - - ``\\`` (backslash) - - ``\'`` (single quote) - - ``\"`` (double quote) - - ``\n`` (newline) - - ``\r`` (carriage return) - - ``\t`` (tab) - - ``\xNN`` (hex escape, see below) - - ``\uNNNN`` (unicode escape, see below) +- ``\`` (escapes an actual newline) +- ``\\`` (backslash) +- ``\'`` (single quote) +- ``\"`` (double quote) +- ``\n`` (newline) +- ``\r`` (carriage return) +- ``\t`` (tab) +- ``\xNN`` (hex escape, see below) +- ``\uNNNN`` (unicode escape, see below) ``\xNN`` takes a hex value and inserts the appropriate byte, while ``\uNNNN`` takes a Unicode codepoint and inserts an UTF-8 sequence. @@ -660,9 +660,9 @@ their parameter types are identical, their return types are identical, their internal/external property is identical and the state mutability of ``A`` is more restrictive than the state mutability of ``B``. In particular: - - ``pure`` functions can be converted to ``view`` and ``non-payable`` functions - - ``view`` functions can be converted to ``non-payable`` functions - - ``payable`` functions can be converted to ``non-payable`` functions +- ``pure`` functions can be converted to ``view`` and ``non-payable`` functions +- ``view`` functions can be converted to ``non-payable`` functions +- ``payable`` functions can be converted to ``non-payable`` functions No other conversions between function types are possible. 
diff --git a/docs/units-and-global-variables.rst b/docs/units-and-global-variables.rst index 2e912ae84..df79d1b13 100644 --- a/docs/units-and-global-variables.rst +++ b/docs/units-and-global-variables.rst @@ -29,11 +29,11 @@ Suffixes like ``seconds``, ``minutes``, ``hours``, ``days`` and ``weeks`` after literal numbers can be used to specify units of time where seconds are the base unit and units are considered naively in the following way: - * ``1 == 1 seconds`` - * ``1 minutes == 60 seconds`` - * ``1 hours == 60 minutes`` - * ``1 days == 24 hours`` - * ``1 weeks == 7 days`` +* ``1 == 1 seconds`` +* ``1 minutes == 60 seconds`` +* ``1 hours == 60 minutes`` +* ``1 days == 24 hours`` +* ``1 weeks == 7 days`` Take care if you perform calendar calculations using these units, because not every year equals 365 days and not even every day has 24 hours diff --git a/docs/using-the-compiler.rst b/docs/using-the-compiler.rst index a023a8877..5f261c165 100644 --- a/docs/using-the-compiler.rst +++ b/docs/using-the-compiler.rst @@ -30,8 +30,8 @@ set it to ``--optimize-runs=1``. If you expect many transactions and do not care output size, set ``--optimize-runs`` to a high number. This parameter has effects on the following (this might change in the future): - - the size of the binary search in the function dispatch routine - - the way constants like large numbers or strings are stored +- the size of the binary search in the function dispatch routine +- the way constants like large numbers or strings are stored .. index:: allowed paths, --allow-paths, base path, --base-path diff --git a/docs/yul.rst b/docs/yul.rst index 093bf6597..b8557475f 100644 --- a/docs/yul.rst +++ b/docs/yul.rst @@ -157,16 +157,16 @@ where an object is expected. Inside a code block, the following elements can be used (see the later sections for more details): - - literals, i.e. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 characters) - - calls to builtin functions, e.g. 
``add(1, mload(0))`` - - variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned) - - identifiers (variables), e.g. ``add(3, x)`` - - assignments, e.g. ``x := add(y, 3)`` - - blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }`` - - if statements, e.g. ``if lt(a, b) { sstore(0, 1) }`` - - switch statements, e.g. ``switch mload(0) case 0 { revert() } default { mstore(0, 1) }`` - - for loops, e.g. ``for { let i := 0} lt(i, 10) { i := add(i, 1) } { mstore(i, 7) }`` - - function definitions, e.g. ``function f(a, b) -> c { c := add(a, b) }``` +- literals, e.g. ``0x123``, ``42`` or ``"abc"`` (strings up to 32 characters) +- calls to builtin functions, e.g. ``add(1, mload(0))`` +- variable declarations, e.g. ``let x := 7``, ``let x := add(y, 3)`` or ``let x`` (initial value of 0 is assigned) +- identifiers (variables), e.g. ``add(3, x)`` +- assignments, e.g. ``x := add(y, 3)`` +- blocks where local variables are scoped inside, e.g. ``{ let x := 3 { let y := add(x, 1) } }`` +- if statements, e.g. ``if lt(a, b) { sstore(0, 1) }`` +- switch statements, e.g. ``switch mload(0) case 0 { revert() } default { mstore(0, 1) }`` +- for loops, e.g. ``for { let i := 0} lt(i, 10) { i := add(i, 1) } { mstore(i, 7) }`` +- function definitions, e.g. ``function f(a, b) -> c { c := add(a, b) }`` Multiple syntactical elements can follow each other simply separated by whitespace, i.e. there is no terminating ``;`` or newline required. @@ -985,9 +985,10 @@ that are not known to the Yul compiler. It also allows you to create bytecode sequences that will not be modified by the optimizer.
The functions are ``verbatim_<n>i_<m>o("<data>", ...)``, where - - ``n`` is a decimal between 0 and 99 that specifies the number of input stack slots / variables - - ``m`` is a decimal between 0 and 99 that specifies the number of output stack slots / variables - - ``data`` is a string literal that contains the sequence of bytes + +- ``n`` is a decimal between 0 and 99 that specifies the number of input stack slots / variables +- ``m`` is a decimal between 0 and 99 that specifies the number of output stack slots / variables +- ``data`` is a string literal that contains the sequence of bytes If you for example want to define a function that multiplies the input by two, without the optimizer touching the constant two, you can use @@ -1022,15 +1023,15 @@ verbatim bytecode that are not checked by the compiler. Violations of these restrictions can result in undefined behaviour. - - Control-flow should not jump into or out of verbatim blocks, - but it can jump within the same verbatim block. - - Stack contents apart from the input and output parameters - should not be accessed. - - The stack height difference should be exactly ``m - n`` - (output slots minus input slots). - - Verbatim bytecode cannot make any assumptions about the - surrounding bytecode. All required parameters have to be - passed in as stack variables.
The optimizer does not analyze verbatim bytecode and always assumes that it modifies all aspects of state and thus can only diff --git a/libsolidity/codegen/ir/README.md b/libsolidity/codegen/ir/README.md index 468ecd269..cfabd83b9 100644 --- a/libsolidity/codegen/ir/README.md +++ b/libsolidity/codegen/ir/README.md @@ -6,5 +6,5 @@ with EVM dialect. The main semantic differences to the legacy code generator are the following: - - Arithmetic operations cause a failing assertion if the result is not in range. - - Resizing a storage array to a length larger than 2**64 causes a failing assertion. \ No newline at end of file +- Arithmetic operations cause a failing assertion if the result is not in range. +- Resizing a storage array to a length larger than 2**64 causes a failing assertion.