.. Mirror of https://github.com/ethereum/solidity, synced 2023-10-03 13:03:40 +00:00

.. index:: optimizer, common subexpression elimination, constant propagation

*************
The Optimiser
*************

This section discusses the optimiser that was first added to Solidity,
which operates on opcode streams. For information on the new Yul-based optimiser,
please see the `README on GitHub <https://github.com/ethereum/solidity/blob/develop/libyul/optimiser/README.md>`_.

The Solidity optimiser operates on assembly. It splits the sequence of instructions into basic blocks
at ``JUMP`` and ``JUMPDEST`` instructions. Inside these blocks, the optimiser
analyses the instructions and records every modification to the stack,
memory, or storage as an expression which consists of an instruction and
a list of arguments which are pointers to other expressions. The optimiser
uses a component called ``CommonSubexpressionEliminator`` that, amongst other
tasks, finds expressions that are always equal (on every input) and combines
them into an expression class. The optimiser first tries to find each new
expression in a list of already known expressions. If this does not work,
it simplifies the expression according to rules like
``constant + constant = sum_of_constants`` or ``X * 1 = X``. Since this is
a recursive process, we can also apply the latter rule if the second factor
is a more complex expression where we know that it always evaluates to one.

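The simplification and expression-class lookup described above can be sketched as follows. This is a minimal illustration in Python, not the actual ``CommonSubexpressionEliminator``: the names ``Expr``, ``simplify`` and ``ExpressionClasses`` are hypothetical, and only the two rules mentioned in the text (plus their obvious mirror images) are modelled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

WORD = 2**256  # EVM words are 256 bits wide, so arithmetic wraps around


@dataclass(frozen=True)
class Expr:
    """Hypothetical expression node: an instruction plus its arguments."""
    op: str
    args: Tuple["Node", ...]


# A node is a constant, a named block input, or a nested expression.
Node = Union[int, str, Expr]


def simplify(node: Node) -> Node:
    """Simplify bottom-up, so rules also fire on simplified arguments."""
    if not isinstance(node, Expr):
        return node
    args = tuple(simplify(a) for a in node.args)
    if node.op == "ADD" and all(isinstance(a, int) for a in args):
        return (args[0] + args[1]) % WORD  # constant + constant
    if node.op == "MUL" and all(isinstance(a, int) for a in args):
        return (args[0] * args[1]) % WORD  # constant * constant
    if node.op == "MUL" and args[1] == 1:
        return args[0]  # X * 1 = X
    if node.op == "MUL" and args[0] == 1:
        return args[1]  # 1 * X = X
    return Expr(node.op, args)


class ExpressionClasses:
    """Assigns the same class id to expressions that simplify identically."""

    def __init__(self) -> None:
        self._ids: Dict[Node, int] = {}

    def class_of(self, node: Node) -> int:
        key = simplify(node)
        # Reuse the id of a known expression, otherwise allocate a new one.
        return self._ids.setdefault(key, len(self._ids))


classes = ExpressionClasses()
one = Expr("ADD", (0, 1))  # a "more complex" factor that always equals 1
assert classes.class_of(Expr("MUL", ("x", one))) == classes.class_of("x")
```

Because ``simplify`` recurses into the arguments first, ``x * (0 + 1)`` ends up in the same class as ``x``, which is the recursive case the paragraph describes.
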
Modifications to storage and memory locations have to erase knowledge about
storage and memory locations which are not known to be different. If we first
write to location ``x`` and then to location ``y`` and both are input variables, the
second could overwrite the first, so we do not know what is stored at ``x`` after
we wrote to ``y``. If, however, simplification of the expression ``x - y`` evaluates to a
non-zero constant, we know that we can keep our knowledge about what is stored at ``x``.

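The rule for keeping or discarding knowledge after a write can be illustrated with a small Python sketch. It models storage slots only and assumes a simplified location format, a pair of symbolic base and constant offset; the names ``Location``, ``difference`` and ``store`` are hypothetical and not taken from the compiler.

```python
from typing import Dict, Optional, Tuple

# Assumed model: ("x", 0) stands for x, ("x", 1) for x + 1,
# and ("", 7) for the plain constant 7.
Location = Tuple[str, int]


def difference(a: Location, b: Location) -> Optional[int]:
    """Return a - b if it simplifies to a constant, otherwise None."""
    if a[0] == b[0]:
        return a[1] - b[1]
    return None


def store(known: Dict[Location, object], target: Location,
          value: object) -> Dict[Location, object]:
    """Record a write and forget every entry that might alias the target."""
    kept: Dict[Location, object] = {}
    for loc, val in known.items():
        diff = difference(loc, target)
        # Keep the entry only if the locations are provably different.
        if diff is not None and diff != 0:
            kept[loc] = val
    kept[target] = value
    return kept


# x and y are unrelated inputs: writing to y erases what we knew about x.
known = store({}, ("x", 0), "a")
known = store(known, ("y", 0), "b")
assert ("x", 0) not in known

# x and x + 1 differ by a non-zero constant: both entries survive.
known = store({}, ("x", 0), "a")
known = store(known, ("x", 1), "b")
assert known == {("x", 0): "a", ("x", 1): "b"}
```
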
After this process, we know which expressions have to be on the stack at
the end, and have a list of modifications to memory and storage. This information
is stored together with the basic blocks and is used to link them. Furthermore,
knowledge about the stack, storage and memory configuration is forwarded to
the next block(s). If we know the targets of all ``JUMP`` and ``JUMPI`` instructions,
we can build a complete control flow graph of the program. If there is even
one target we do not know (this can happen because, in principle, jump targets can
be computed from inputs), we have to erase all knowledge about the input state
of a block, as it can be the target of the unknown ``JUMP``. If the optimiser
finds a ``JUMPI`` whose condition evaluates to a constant, it transforms it
to an unconditional jump.

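How knowledge might be forwarded between blocks can be sketched as below. This is an illustration under stated assumptions rather than the compiler's implementation: blocks are identified by hypothetical names, an unknown jump target is represented as ``None``, and the knowledge entering a block is assumed to be the set of facts shared by all of its predecessors.

```python
from typing import Dict, List, Optional, Set


def predecessors(
    successors: Dict[str, List[Optional[str]]]
) -> Optional[Dict[str, Set[str]]]:
    """Build predecessor sets, or return None if any jump target is unknown."""
    preds: Dict[str, Set[str]] = {name: set() for name in successors}
    for name, targets in successors.items():
        for target in targets:
            if target is None:
                return None  # no complete control flow graph can be built
            preds[target].add(name)
    return preds


def input_knowledge(block: str,
                    successors: Dict[str, List[Optional[str]]],
                    output_knowledge: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Facts that hold on entry to a block: those every predecessor agrees on."""
    preds = predecessors(successors)
    if preds is None or not preds[block]:
        return {}  # unknown jump somewhere, or entry block: assume nothing
    facts = [set(output_knowledge[p].items()) for p in preds[block]]
    return dict(set.intersection(*facts))


succ = {"A": ["B", "C"], "B": ["C"], "C": []}
out = {"A": {"s0": 1, "s1": 2}, "B": {"s0": 1, "s1": 3}, "C": {}}
assert input_knowledge("C", succ, out) == {"s0": 1}

succ["B"] = [None]  # B now ends in a jump to a computed, unknown target
assert input_knowledge("C", succ, out) == {}
```
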
As the last step, the code in each block is re-generated. The optimiser creates
a dependency graph from the expressions on the stack at the end of the block,
and it drops every operation that is not part of this graph. It generates code
that applies the modifications to memory and storage in the order they were
made in the original code (dropping modifications which were found not to be
needed). Finally, it generates all values that are required to be on the
stack in the correct place.

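The dependency-graph step can be pictured with the following Python sketch. It assumes a hypothetical representation in which each operation has a name, an opcode and a list of argument names; ``needed_operations`` is an illustrative helper, not a function of the compiler. In a fuller model, the retained storage and memory modifications would be added to ``required`` alongside the final stack contents.

```python
from typing import Dict, List, Tuple


def needed_operations(ops: Dict[str, Tuple[str, List[str]]],
                      required: List[str]) -> List[str]:
    """Return the operations reachable from the required results, in original order."""
    needed = set()
    pending = list(required)
    while pending:
        name = pending.pop()
        if name in needed or name not in ops:
            continue  # already visited, or a block input rather than an operation
        needed.add(name)
        pending.extend(ops[name][1])  # follow the argument pointers
    # dicts preserve insertion order, so this keeps the original sequence
    return [name for name in ops if name in needed]


ops = {
    "t1": ("ADD", ["a", "b"]),
    "t2": ("MUL", ["t1", "c"]),
    "t3": ("SUB", ["a", "c"]),   # never used by the final stack: dropped
    "t4": ("ADD", ["t2", "a"]),
}
assert needed_operations(ops, ["t4"]) == ["t1", "t2", "t4"]
```
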
These steps are applied to each basic block and the newly generated code
is used as replacement if it is smaller. If a basic block is split at a
``JUMPI`` and, during the analysis, the condition evaluates to a constant,
the ``JUMPI`` is replaced depending on the value of the constant. Thus code like

::

    uint x = 7;
    data[7] = 9;
    if (data[x] != x + 2)
      return 2;
    else
      return 1;

is still simplified to the following code, even though the instructions contained
a jump at the beginning of the process:

::

    data[7] = 9;
    return 1;

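The reasoning behind this example can be traced with a short Python sketch. It is only an illustration: storage knowledge is modelled as a plain dictionary from slot to value (the real slot of ``data[7]`` is computed differently), and ``resolve_branch`` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple


def resolve_branch(known_storage: Dict[int, int], x: int) -> Tuple[str, ...]:
    """Decide what replaces the JUMPI guarding `data[x] != x + 2`."""
    loaded: Optional[int] = known_storage.get(x)  # data[x], if its content is known
    if loaded is None:
        # The condition is not a constant: the conditional jump has to stay.
        return ("JUMPI", "return 2", "return 1")
    condition = loaded != x + 2
    # A constant condition selects exactly one branch.
    return ("JUMP", "return 2" if condition else "return 1")


# `data[7] = 9` records {7: 9}; with x = 7 the condition 9 != 9 is false.
assert resolve_branch({7: 9}, 7) == ("JUMP", "return 1")
```

Without the recorded write the lookup fails and the branch is kept, which is why the knowledge about storage has to survive until the condition is analysed.
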