mirror of
https://github.com/ethereum/solidity
synced 2023-10-03 13:03:40 +00:00
344 lines
13 KiB
ReStructuredText
344 lines
13 KiB
ReStructuredText
|
|
.. index: ir breaking changes
|
|
|
|
*********************************
|
|
Solidity IR-based Codegen Changes
|
|
*********************************
|
|
|
|
Solidity can generate EVM bytecode in two different ways:
|
|
Either directly from Solidity to EVM opcodes ("old codegen") or through
|
|
an intermediate representation ("IR") in Yul ("new codegen" or "IR-based codegen").
|
|
|
|
The IR-based code generator was introduced with an aim to not only allow
|
|
code generation to be more transparent and auditable but also
|
|
to enable more powerful optimization passes that span across functions.
|
|
|
|
Currently, the IR-based code generator is still marked experimental,
|
|
but it supports all language features and has received a lot of testing,
|
|
so we consider it almost ready for production use.
|
|
|
|
You can enable it on the command line using ``--experimental-via-ir``
|
|
or with the option ``{"viaIR": true}`` in standard-json and we
|
|
encourage everyone to try it out!
|
|
|
|
For several reasons, there are tiny semantic differences between the old
|
|
and the IR-based code generator, mostly in areas where we would not
|
|
expect people to rely on this behaviour anyway.
|
|
This section highlights the main differences between the old and the IR-based codegen.
|
|
|
|
Semantic Only Changes
|
|
=====================
|
|
|
|
This section lists the changes that are semantic-only, thus potentially
|
|
hiding new and different behavior in existing code.
|
|
|
|
- When storage structs are deleted, every storage slot that contains
|
|
a member of the struct is set to zero entirely. Formerly, padding space
|
|
was left untouched.
|
|
Consequently, if the padding space within a struct is used to store data
|
|
(e.g. in the context of a contract upgrade), you have to be aware that
|
|
``delete`` will now also clear the added member (while it wouldn't
|
|
have been cleared in the past).
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.7.1;
|
|
|
|
contract C {
|
|
struct S {
|
|
uint64 y;
|
|
uint64 z;
|
|
}
|
|
S s;
|
|
function f() public {
|
|
// ...
|
|
delete s;
|
|
// s occupies only first 16 bytes of the 32 bytes slot
|
|
// delete will write zero to the full slot
|
|
}
|
|
}
|
|
|
|
We have the same behavior for implicit delete, for example when array of structs is shortened.
|
|
|
|
- Function modifiers are implemented in a slightly different way regarding function parameters and return variables.
|
|
This especially has an effect if the placeholder ``_;`` is evaluated multiple times in a modifier.
|
|
In the old code generator, each function parameter and return variable has a fixed slot on the stack.
|
|
If the function is run multiple times because ``_;`` is used multiple times or used in a loop, then a
|
|
change to the function parameter's or return variable's value is visible in the next execution of the function.
|
|
The new code generator implements modifiers using actual functions and passes function parameters on.
|
|
This means that multiple evaluations of a function's body will get the same values for the parameters,
|
|
and the effect on return variables is that they are reset to their default (zero) value for each
|
|
execution.
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.7.0;
|
|
contract C {
|
|
function f(uint _a) public pure mod() returns (uint _r) {
|
|
_r = _a++;
|
|
}
|
|
modifier mod() { _; _; }
|
|
}
|
|
|
|
If you execute ``f(0)`` in the old code generator, it will return ``2``, while
|
|
it will return ``1`` when using the new code generator.
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.7.1 <0.9.0;
|
|
|
|
contract C {
|
|
bool active = true;
|
|
modifier mod()
|
|
{
|
|
_;
|
|
active = false;
|
|
_;
|
|
}
|
|
function foo() external mod() returns (uint ret)
|
|
{
|
|
if (active)
|
|
ret = 1; // Same as ``return 1``
|
|
}
|
|
}
|
|
|
|
The function ``C.foo()`` returns the following values:
|
|
|
|
- Old code generator: ``1`` as the return variable is initialized to ``0`` only once before the first ``_;``
|
|
evaluation and then overwritten by the ``return 1;``. It is not initialized again for the second ``_;``
|
|
evaluation and ``foo()`` does not explicitly assign it either (due to ``active == false``), thus it keeps
|
|
its first value.
|
|
- New code generator: ``0`` as all parameters, including return parameters, will be re-initialized before
|
|
each ``_;`` evaluation.
|
|
|
|
- The order of contract initialization has changed in case of inheritance.
|
|
|
|
The order used to be:
|
|
|
|
- All state variables are zero-initialized at the beginning.
|
|
- Evaluate base constructor arguments from most derived to most base contract.
|
|
- Initialize all state variables in the whole inheritance hierarchy from most base to most derived.
|
|
- Run the constructor, if present, for all contracts in the linearized hierarchy from most base to most derived.
|
|
|
|
New order:
|
|
|
|
- All state variables are zero-initialized at the beginning.
|
|
- Evaluate base constructor arguments from most derived to most base contract.
|
|
- For every contract in order from most base to most derived in the linearized hierarchy execute:
|
|
|
|
1. If present at declaration, initial values are assigned to state variables.
|
|
2. Constructor, if present.
|
|
|
|
This causes differences in some contracts, for example:
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.7.1;
|
|
|
|
contract A {
|
|
uint x;
|
|
constructor() {
|
|
x = 42;
|
|
}
|
|
function f() public view returns(uint256) {
|
|
return x;
|
|
}
|
|
}
|
|
contract B is A {
|
|
uint public y = f();
|
|
}
|
|
|
|
Previously, ``y`` would be set to 0. This is due to the fact that we would first initialize state variables: First, ``x`` is set to 0, and when initializing ``y``, ``f()`` would return 0 causing ``y`` to be 0 as well.
|
|
With the new rules, ``y`` will be set to 42. We first initialize ``x`` to 0, then call A's constructor which sets ``x`` to 42. Finally, when initializing ``y``, ``f()`` returns 42 causing ``y`` to be 42.
|
|
|
|
- Copying ``bytes`` arrays from memory to storage is implemented in a different way.
|
|
The old code generator always copies full words, while the new one cuts the byte
|
|
array after its end. The old behaviour can lead to dirty data being copied after
|
|
the end of the array (but still in the same storage slot).
|
|
This causes differences in some contracts, for example:
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.8.1;
|
|
|
|
contract C {
|
|
bytes x;
|
|
function f() public returns (uint _r) {
|
|
bytes memory m = "tmp";
|
|
assembly {
|
|
mstore(m, 8)
|
|
mstore(add(m, 32), "deadbeef15dead")
|
|
}
|
|
x = m;
|
|
assembly {
|
|
_r := sload(x.slot)
|
|
}
|
|
}
|
|
}
|
|
|
|
Previously ``f()`` would return ``0x6465616462656566313564656164000000000000000000000000000000000010``
|
|
(it has correct length, and correct first 8 elements, but then it contains dirty data which was set via assembly).
|
|
Now it is returning ``0x6465616462656566000000000000000000000000000000000000000000000010`` (it has
|
|
correct length, and correct elements, but does not contain superfluous data).
|
|
|
|
.. index:: ! evaluation order; expression
|
|
|
|
- For the old code generator, the evaluation order of expressions is unspecified.
|
|
For the new code generator, we try to evaluate in source order (left to right), but do not guarantee it.
|
|
This can lead to semantic differences.
|
|
|
|
For example:
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.8.1;
|
|
contract C {
|
|
function preincr_u8(uint8 _a) public pure returns (uint8) {
|
|
return ++_a + _a;
|
|
}
|
|
}
|
|
|
|
The function ``preincr_u8(1)`` returns the following values:
|
|
|
|
- Old code generator: 3 (``1 + 2``) but the return value is unspecified in general
|
|
- New code generator: 4 (``2 + 2``) but the return value is not guaranteed
|
|
|
|
.. index:: ! evaluation order; function arguments
|
|
|
|
On the other hand, function argument expressions are evaluated in the same order
|
|
by both code generators with the exception of the global functions ``addmod`` and ``mulmod``.
|
|
For example:
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.8.1;
|
|
contract C {
|
|
function add(uint8 _a, uint8 _b) public pure returns (uint8) {
|
|
return _a + _b;
|
|
}
|
|
function g(uint8 _a, uint8 _b) public pure returns (uint8) {
|
|
return add(++_a + ++_b, _a + _b);
|
|
}
|
|
}
|
|
|
|
The function ``g(1, 2)`` returns the following values:
|
|
|
|
- Old code generator: ``10`` (``add(2 + 3, 2 + 3)``) but the return value is unspecified in general
|
|
- New code generator: ``10`` but the return value is not guaranteed
|
|
|
|
The arguments to the global functions ``addmod`` and ``mulmod`` are evaluated right-to-left by the old code generator
|
|
and left-to-right by the new code generator.
|
|
For example:
|
|
|
|
.. code-block:: solidity
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.8.1;
|
|
contract C {
|
|
function f() public pure returns (uint256 aMod, uint256 mMod) {
|
|
uint256 x = 3;
|
|
// Old code gen: add/mulmod(5, 4, 3)
|
|
// New code gen: add/mulmod(4, 5, 5)
|
|
aMod = addmod(++x, ++x, x);
|
|
mMod = mulmod(++x, ++x, x);
|
|
}
|
|
}
|
|
|
|
The function ``f()`` returns the following values:
|
|
|
|
- Old code generator: ``aMod = 0`` and ``mMod = 2``
|
|
- New code generator: ``aMod = 4`` and ``mMod = 0``
|
|
|
|
- The new code generator imposes a hard limit of ``type(uint64).max``
|
|
(``0xffffffffffffffff``) for the free memory pointer. Allocations that would
|
|
increase its value beyond this limit revert. The old code generator does not
|
|
have this limit.
|
|
|
|
For example:
|
|
|
|
.. code-block:: solidity
|
|
:force:
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >0.8.0;
|
|
contract C {
|
|
function f() public {
|
|
uint[] memory arr;
|
|
// allocation size: 576460752303423481
|
|
// assumes freeMemPtr points to 0x80 initially
|
|
uint solYulMaxAllocationBeforeMemPtrOverflow = (type(uint64).max - 0x80 - 31) / 32;
|
|
// freeMemPtr overflows UINT64_MAX
|
|
arr = new uint[](solYulMaxAllocationBeforeMemPtrOverflow);
|
|
}
|
|
}
|
|
|
|
The function `f()` behaves as follows:
|
|
|
|
- Old code generator: runs out of gas while zeroing the array contents after the large memory allocation
|
|
- New code generator: reverts due to free memory pointer overflow (does not run out of gas)
|
|
|
|
|
|
Internals
|
|
=========
|
|
|
|
Internal function pointers
|
|
--------------------------
|
|
|
|
.. index:: function pointers
|
|
|
|
The old code generator uses code offsets or tags for values of internal function pointers. This is especially complicated since
|
|
these offsets are different at construction time and after deployment and the values can cross this border via storage.
|
|
Because of that, both offsets are encoded at construction time into the same value (into different bytes).
|
|
|
|
In the new code generator, function pointers use internal IDs that are allocated in sequence. Since calls via jumps are not possible,
|
|
calls through function pointers always have to use an internal dispatch function that uses the ``switch`` statement to select
|
|
the right function.
|
|
|
|
The ID ``0`` is reserved for uninitialized function pointers which then cause a panic in the dispatch function when called.
|
|
|
|
In the old code generator, internal function pointers are initialized with a special function that always causes a panic.
|
|
This causes a storage write at construction time for internal function pointers in storage.
|
|
|
|
Cleanup
|
|
-------
|
|
|
|
.. index:: cleanup, dirty bits
|
|
|
|
The old code generator only performs cleanup before an operation whose result could be affected by the values of the dirty bits.
|
|
The new code generator performs cleanup after any operation that can result in dirty bits.
|
|
The hope is that the optimizer will be powerful enough to eliminate redundant cleanup operations.
|
|
|
|
For example:
|
|
|
|
.. code-block:: solidity
|
|
:force:
|
|
|
|
// SPDX-License-Identifier: GPL-3.0
|
|
pragma solidity >=0.8.1;
|
|
contract C {
|
|
function f(uint8 _a) public pure returns (uint _r1, uint _r2)
|
|
{
|
|
_a = ~_a;
|
|
assembly {
|
|
_r1 := _a
|
|
}
|
|
_r2 = _a;
|
|
}
|
|
}
|
|
|
|
The function ``f(1)`` returns the following values:
|
|
|
|
- Old code generator: (``fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffe``, ``00000000000000000000000000000000000000000000000000000000000000fe``)
|
|
- New code generator: (``00000000000000000000000000000000000000000000000000000000000000fe``, ``00000000000000000000000000000000000000000000000000000000000000fe``)
|
|
|
|
Note that, unlike the new code generator, the old code generator does not perform a cleanup after the bit-not assignment (``_a = ~_a``).
|
|
This results in different values being assigned (within the inline assembly block) to return value ``_r1`` between the old and new code generators.
|
|
However, both code generators perform a cleanup before the new value of ``_a`` is assigned to ``_r2``.
|