mirror of
				https://github.com/ethereum/solidity
				synced 2023-10-03 13:03:40 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			312 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			312 lines
		
	
	
		
			12 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| 
 | |
| .. index: ir breaking changes
 | |
| 
 | |
| .. _ir-breaking-changes:
 | |
| 
 | |
| *********************************
 | |
| Solidity IR-based Codegen Changes
 | |
| *********************************
 | |
| 
 | |
| Solidity can generate EVM bytecode in two different ways:
 | |
| Either directly from Solidity to EVM opcodes ("old codegen") or through
 | |
| an intermediate representation ("IR") in Yul ("new codegen" or "IR-based codegen").
 | |
| 
 | |
| The IR-based code generator was introduced with an aim to not only allow
 | |
| code generation to be more transparent and auditable but also
 | |
| to enable more powerful optimization passes that span across functions.
 | |
| 
 | |
| You can enable it on the command line using ``--via-ir``
 | |
| or with the option ``{"viaIR": true}`` in standard-json and we
 | |
| encourage everyone to try it out!
 | |
| 
 | |
| For several reasons, there are tiny semantic differences between the old
 | |
| and the IR-based code generator, mostly in areas where we would not
 | |
| expect people to rely on this behaviour anyway.
 | |
| This section highlights the main differences between the old and the IR-based codegen.
 | |
| 
 | |
| Semantic Only Changes
 | |
| =====================
 | |
| 
 | |
| This section lists the changes that are semantic-only, thus potentially
 | |
| hiding new and different behavior in existing code.
 | |
| 
 | |
| - The order of state variable initialization has changed in case of inheritance.
 | |
| 
 | |
|   The order used to be:
 | |
| 
 | |
|   - All state variables are zero-initialized at the beginning.
 | |
|   - Evaluate base constructor arguments from most derived to most base contract.
 | |
|   - Initialize all state variables in the whole inheritance hierarchy from most base to most derived.
 | |
|   - Run the constructor, if present, for all contracts in the linearized hierarchy from most base to most derived.
 | |
| 
 | |
|   New order:
 | |
| 
 | |
|   - All state variables are zero-initialized at the beginning.
 | |
|   - Evaluate base constructor arguments from most derived to most base contract.
 | |
|   - For every contract in order from most base to most derived in the linearized hierarchy:
 | |
| 
 | |
|       1. Initialize state variables.
 | |
|       2. Run the constructor (if present).
 | |
| 
 | |
|   This causes differences in contracts where the initial value of a state
 | |
|   variable relies on the result of the constructor in another contract:
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.7.1;
 | |
| 
 | |
|       contract A {
 | |
|           uint x;
 | |
|           constructor() {
 | |
|               x = 42;
 | |
|           }
 | |
|           function f() public view returns(uint256) {
 | |
|               return x;
 | |
|           }
 | |
|       }
 | |
|       contract B is A {
 | |
|           uint public y = f();
 | |
|       }
 | |
| 
 | |
|   Previously, ``y`` would be set to 0. This is due to the fact that we would first initialize state variables: First, ``x`` is set to 0, and when initializing ``y``, ``f()`` would return 0 causing ``y`` to be 0 as well.
 | |
|   With the new rules, ``y`` will be set to 42. We first initialize ``x`` to 0, then call A's constructor which sets ``x`` to 42. Finally, when initializing ``y``, ``f()`` returns 42 causing ``y`` to be 42.
 | |
| 
 | |
| - When storage structs are deleted, every storage slot that contains
 | |
|   a member of the struct is set to zero entirely. Formerly, padding space
 | |
|   was left untouched.
 | |
|   Consequently, if the padding space within a struct is used to store data
 | |
|   (e.g. in the context of a contract upgrade), you have to be aware that
 | |
|   ``delete`` will now also clear the added member (while it wouldn't
 | |
|   have been cleared in the past).
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.7.1;
 | |
| 
 | |
|       contract C {
 | |
|           struct S {
 | |
|               uint64 y;
 | |
|               uint64 z;
 | |
|           }
 | |
|           S s;
 | |
|           function f() public {
 | |
|               // ...
 | |
|               delete s;
 | |
|               // s occupies only first 16 bytes of the 32 bytes slot
 | |
|               // delete will write zero to the full slot
 | |
|           }
 | |
|       }
 | |
| 
 | |
|   We have the same behavior for implicit delete, for example when array of structs is shortened.
 | |
| 
 | |
| - Function modifiers are implemented in a slightly different way regarding function parameters and return variables.
 | |
|   This especially has an effect if the placeholder ``_;`` is evaluated multiple times in a modifier.
 | |
|   In the old code generator, each function parameter and return variable has a fixed slot on the stack.
 | |
|   If the function is run multiple times because ``_;`` is used multiple times or used in a loop, then a
 | |
|   change to the function parameter's or return variable's value is visible in the next execution of the function.
 | |
|   The new code generator implements modifiers using actual functions and passes function parameters on.
 | |
|   This means that multiple evaluations of a function's body will get the same values for the parameters,
 | |
|   and the effect on return variables is that they are reset to their default (zero) value for each
 | |
|   execution.
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.7.0;
 | |
|       contract C {
 | |
|           function f(uint a) public pure mod() returns (uint r) {
 | |
|               r = a++;
 | |
|           }
 | |
|           modifier mod() { _; _; }
 | |
|       }
 | |
| 
 | |
|   If you execute ``f(0)`` in the old code generator, it will return ``2``, while
 | |
|   it will return ``1`` when using the new code generator.
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.7.1 <0.9.0;
 | |
| 
 | |
|       contract C {
 | |
|           bool active = true;
 | |
|           modifier mod()
 | |
|           {
 | |
|               _;
 | |
|               active = false;
 | |
|               _;
 | |
|           }
 | |
|           function foo() external mod() returns (uint ret)
 | |
|           {
 | |
|               if (active)
 | |
|                   ret = 1; // Same as ``return 1``
 | |
|           }
 | |
|       }
 | |
| 
 | |
|   The function ``C.foo()`` returns the following values:
 | |
| 
 | |
|   - Old code generator: ``1`` as the return variable is initialized to ``0`` only once before the first ``_;``
 | |
|     evaluation and then overwritten by the ``return 1;``. It is not initialized again for the second ``_;``
 | |
|     evaluation and ``foo()`` does not explicitly assign it either (due to ``active == false``), thus it keeps
 | |
|     its first value.
 | |
|   - New code generator: ``0`` as all parameters, including return parameters, will be re-initialized before
 | |
|     each ``_;`` evaluation.
 | |
| 
 | |
|   .. index:: ! evaluation order; expression
 | |
| 
 | |
| - For the old code generator, the evaluation order of expressions is unspecified.
 | |
|   For the new code generator, we try to evaluate in source order (left to right), but do not guarantee it.
 | |
|   This can lead to semantic differences.
 | |
| 
 | |
|   For example:
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.8.1;
 | |
|       contract C {
 | |
|           function preincr_u8(uint8 a) public pure returns (uint8) {
 | |
|               return ++a + a;
 | |
|           }
 | |
|       }
 | |
| 
 | |
|   The function ``preincr_u8(1)`` returns the following values:
 | |
| 
 | |
|   - Old code generator: 3 (``1 + 2``) but the return value is unspecified in general
 | |
|   - New code generator: 4 (``2 + 2``) but the return value is not guaranteed
 | |
| 
 | |
|   .. index:: ! evaluation order; function arguments
 | |
| 
 | |
|   On the other hand, function argument expressions are evaluated in the same order
 | |
|   by both code generators with the exception of the global functions ``addmod`` and ``mulmod``.
 | |
|   For example:
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.8.1;
 | |
|       contract C {
 | |
|           function add(uint8 a, uint8 b) public pure returns (uint8) {
 | |
|               return a + b;
 | |
|           }
 | |
|           function g(uint8 a, uint8 b) public pure returns (uint8) {
 | |
|               return add(++a + ++b, a + b);
 | |
|           }
 | |
|       }
 | |
| 
 | |
|   The function ``g(1, 2)`` returns the following values:
 | |
| 
 | |
|   - Old code generator: ``10`` (``add(2 + 3, 2 + 3)``) but the return value is unspecified in general
 | |
|   - New code generator: ``10`` but the return value is not guaranteed
 | |
| 
 | |
|   The arguments to the global functions ``addmod`` and ``mulmod`` are evaluated right-to-left by the old code generator
 | |
|   and left-to-right by the new code generator.
 | |
|   For example:
 | |
| 
 | |
|   .. code-block:: solidity
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >=0.8.1;
 | |
|       contract C {
 | |
|           function f() public pure returns (uint256 aMod, uint256 mMod) {
 | |
|               uint256 x = 3;
 | |
|               // Old code gen: add/mulmod(5, 4, 3)
 | |
|               // New code gen: add/mulmod(4, 5, 5)
 | |
|               aMod = addmod(++x, ++x, x);
 | |
|               mMod = mulmod(++x, ++x, x);
 | |
|           }
 | |
|       }
 | |
| 
 | |
|   The function ``f()`` returns the following values:
 | |
| 
 | |
|   - Old code generator: ``aMod = 0`` and ``mMod = 2``
 | |
|   - New code generator: ``aMod = 4`` and ``mMod = 0``
 | |
| 
 | |
| - The new code generator imposes a hard limit of ``type(uint64).max``
 | |
|   (``0xffffffffffffffff``) for the free memory pointer. Allocations that would
 | |
|   increase its value beyond this limit revert. The old code generator does not
 | |
|   have this limit.
 | |
| 
 | |
|   For example:
 | |
| 
 | |
|   .. code-block:: solidity
 | |
|       :force:
 | |
| 
 | |
|       // SPDX-License-Identifier: GPL-3.0
 | |
|       pragma solidity >0.8.0;
 | |
|       contract C {
 | |
|           function f() public {
 | |
|               uint[] memory arr;
 | |
|               // allocation size: 576460752303423481
 | |
|               // assumes freeMemPtr points to 0x80 initially
 | |
|               uint solYulMaxAllocationBeforeMemPtrOverflow = (type(uint64).max - 0x80 - 31) / 32;
 | |
|               // freeMemPtr overflows UINT64_MAX
 | |
|               arr = new uint[](solYulMaxAllocationBeforeMemPtrOverflow);
 | |
|           }
 | |
|       }
 | |
| 
 | |
|   The function `f()` behaves as follows:
 | |
| 
 | |
|   - Old code generator: runs out of gas while zeroing the array contents after the large memory allocation
 | |
|   - New code generator: reverts due to free memory pointer overflow (does not run out of gas)
 | |
| 
 | |
| 
 | |
| Internals
 | |
| =========
 | |
| 
 | |
| Internal function pointers
 | |
| --------------------------
 | |
| 
 | |
| .. index:: function pointers
 | |
| 
 | |
| The old code generator uses code offsets or tags for values of internal function pointers. This is especially complicated since
 | |
| these offsets are different at construction time and after deployment and the values can cross this border via storage.
 | |
| Because of that, both offsets are encoded at construction time into the same value (into different bytes).
 | |
| 
 | |
| In the new code generator, function pointers use internal IDs that are allocated in sequence. Since calls via jumps are not possible,
 | |
| calls through function pointers always have to use an internal dispatch function that uses the ``switch`` statement to select
 | |
| the right function.
 | |
| 
 | |
| The ID ``0`` is reserved for uninitialized function pointers which then cause a panic in the dispatch function when called.
 | |
| 
 | |
| In the old code generator, internal function pointers are initialized with a special function that always causes a panic.
 | |
| This causes a storage write at construction time for internal function pointers in storage.
 | |
| 
 | |
| Cleanup
 | |
| -------
 | |
| 
 | |
| .. index:: cleanup, dirty bits
 | |
| 
 | |
| The old code generator only performs cleanup before an operation whose result could be affected by the values of the dirty bits.
 | |
| The new code generator performs cleanup after any operation that can result in dirty bits.
 | |
| The hope is that the optimizer will be powerful enough to eliminate redundant cleanup operations.
 | |
| 
 | |
| For example:
 | |
| 
 | |
| .. code-block:: solidity
 | |
|     :force:
 | |
| 
 | |
|     // SPDX-License-Identifier: GPL-3.0
 | |
|     pragma solidity >=0.8.1;
 | |
|     contract C {
 | |
|         function f(uint8 a) public pure returns (uint r1, uint r2)
 | |
|         {
 | |
|             a = ~a;
 | |
|             assembly {
 | |
|                 r1 := a
 | |
|             }
 | |
|             r2 = a;
 | |
|         }
 | |
|     }
 | |
| 
 | |
| The function ``f(1)`` returns the following values:
 | |
| 
 | |
| - Old code generator: (``fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe``, ``00000000000000000000000000000000000000000000000000000000000000fe``)
 | |
| - New code generator: (``00000000000000000000000000000000000000000000000000000000000000fe``, ``00000000000000000000000000000000000000000000000000000000000000fe``)
 | |
| 
 | |
| Note that, unlike the new code generator, the old code generator does not perform a cleanup after the bit-not assignment (``a = ~a``).
 | |
| This results in different values being assigned (within the inline assembly block) to return value ``r1`` between the old and new code generators.
 | |
| However, both code generators perform a cleanup before the new value of ``a`` is assigned to ``r2``.
 |