Merge pull request #5145 from ethereum/hashLinker

Hash linker
chriseth 2018-10-12 15:53:45 +02:00 committed by GitHub
commit 94526b2d92
11 changed files with 114 additions and 20 deletions

View File

@ -17,6 +17,7 @@ Breaking Changes:
* Commandline interface: Remove obsolete ``--formal`` option.
* Commandline interface: Rename the ``--julia`` option to ``--yul``.
* Commandline interface: Require ``-`` if standard input is used as source.
* Commandline interface: Use hash of library name for link placeholder instead of name itself.
* Compiler interface: Disallow remappings with empty prefix.
* Control Flow Analyzer: Consider mappings as well when checking for uninitialized return values.
* Control Flow Analyzer: Turn warning about returning uninitialized storage pointers into an error.

View File

@ -158,6 +158,14 @@ Command Line and JSON Interfaces
node was replaced by a field called ``kind`` which can have the
value ``"constructor"``, ``"fallback"`` or ``"function"``.
* In unlinked binary hex files, library address placeholders are now
the first 34 hex characters of the keccak256 hash of the fully qualified
library name, surrounded by ``$...$``. Previously,
just the fully qualified library name was used.
This reduces the chances of collisions, especially when long paths are used.
Binary files now also contain a list of mappings from these placeholders
to the fully qualified names.
Constructors
------------

View File

@ -41,14 +41,26 @@ If there are multiple matches due to remappings, the one with the longest common
For security reasons the compiler has restrictions what directories it can access. Paths (and their subdirectories) of source files specified on the commandline and paths defined by remappings are allowed for import statements, but everything else is rejected. Additional paths (and their subdirectories) can be allowed via the ``--allow-paths /sample/path,/another/sample/path`` switch.
If your contracts use :ref:`libraries <libraries>`, you will notice that the bytecode contains substrings of the form ``__LibraryName______``. You can use ``solc`` as a linker meaning that it will insert the library addresses for you at those points:
If your contracts use :ref:`libraries <libraries>`, you will notice that the bytecode contains substrings of the form ``__$53aea86b7d70b31448b230b20ae141a537$__``. These are placeholders for the actual library addresses.
The placeholder is a 34 character prefix of the hex encoding of the keccak256 hash of the fully qualified library name.
The bytecode file will also contain lines of the form ``// <placeholder> -> <fq library name>`` at the end to help
identify which libraries the placeholders represent. Note that the fully qualified library name
is the path of its source file and the library name separated by ``:``.
You can use ``solc`` as a linker meaning that it will insert the library addresses for you at those points:
Either add ``--libraries "Math:0x12345678901234567890 Heap:0xabcdef0123456"`` to your command to provide an address for each library or store the string in a file (one library per line) and run ``solc`` using ``--libraries fileName``.
Either add ``--libraries "file.sol:Math:0x1234567890123456789012345678901234567890 file.sol:Heap:0xabCD567890123456789012345678901234567890"`` to your command to provide an address for each library or store the string in a file (one library per line) and run ``solc`` using ``--libraries fileName``.
If ``solc`` is called with the option ``--link``, all input files are interpreted to be unlinked binaries (hex-encoded) in the ``__LibraryName____``-format given above and are linked in-place (if the input is read from stdin, it is written to stdout). All options except ``--libraries`` are ignored (including ``-o``) in this case.
If ``solc`` is called with the option ``--link``, all input files are interpreted to be unlinked binaries (hex-encoded) in the ``__$53aea86b7d70b31448b230b20ae141a537$__``-format given above and are linked in-place (if the input is read from stdin, it is written to stdout). All options except ``--libraries`` are ignored (including ``-o``) in this case.
If ``solc`` is called with the option ``--standard-json``, it will expect a JSON input (as explained below) on the standard input, and return a JSON output on the standard output. This is the recommended interface for more complex and especially automated uses.
.. note::
The library placeholder used to be the fully qualified name of the library itself
instead of the hash of it. This format is still supported by ``solc --link`` but
the compiler will no longer output it. This change was made to reduce
the likelihood of a collision between libraries, since only the first 36 characters
of the fully qualified library name could be used.
.. _evm-version:
.. index:: ! EVM version, compile target
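
The documentation change above says that unlinked binary files end with lines of the form ``// <placeholder> -> <fq library name>``. As a rough illustration of how a tool might read those lines back, here is a minimal sketch. The function ``parseLibraryHints`` is hypothetical and not part of ``solc``; it assumes the whole file content is already in a string and that each hint sits on its own line.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of solc): collects the
// "// <placeholder> -> <fq library name>" hint lines from the text of a .bin file.
// Returns a map from placeholder ("$" + 34 hex characters + "$") to library name.
std::map<std::string, std::string> parseLibraryHints(std::string const& _binText)
{
	std::map<std::string, std::string> hints;
	std::istringstream lines(_binText);
	std::string const prefix = "// ";
	std::string const arrow = " -> ";
	std::string line;
	while (std::getline(lines, line))
	{
		if (line.compare(0, prefix.size(), prefix) != 0)
			continue; // bytecode or empty line
		size_t const pos = line.find(arrow, prefix.size());
		if (pos == std::string::npos)
			continue; // a comment, but not a hint
		// find() started at prefix.size(), so the placeholder span is never negative.
		assert(pos >= prefix.size());
		std::string const placeholder = line.substr(prefix.size(), pos - prefix.size());
		if (placeholder.size() != 36)
			continue; // not "$" + 34 hex characters + "$"
		hints[placeholder] = line.substr(pos + arrow.size());
	}
	return hints;
}
```

Calling ``parseLibraryHints`` on the content of a ``.bin`` file yields one entry per distinct library reference; a file without library references yields an empty map.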

View File

@ -76,18 +76,18 @@ bytes dev::fromHex(std::string const& _s, WhenError _throw)
bool dev::passesAddressChecksum(string const& _str, bool _strict)
{
string s = _str.substr(0, 2) == "0x" ? _str.substr(2) : _str;
string s = _str.substr(0, 2) == "0x" ? _str : "0x" + _str;
if (s.length() != 40)
if (s.length() != 42)
return false;
if (!_strict && (
_str.find_first_of("abcdef") == string::npos ||
_str.find_first_of("ABCDEF") == string::npos
s.find_first_of("abcdef") == string::npos ||
s.find_first_of("ABCDEF") == string::npos
))
return true;
return _str == dev::getChecksummedAddress(_str);
return s == dev::getChecksummedAddress(s);
}
string dev::getChecksummedAddress(string const& _addr)

View File

@ -94,7 +94,7 @@ void dev::writeFile(std::string const& _file, bytesConstRef _data, bool _writeDe
{
// create directory if not existent
fs::path p(_file);
if (!fs::exists(p.parent_path()))
if (!p.parent_path().empty() && !fs::exists(p.parent_path()))
{
fs::create_directories(p.parent_path());
try

View File

@ -21,6 +21,7 @@
#include <libevmasm/LinkerObject.h>
#include <libdevcore/CommonData.h>
#include <libdevcore/SHA3.h>
using namespace dev;
using namespace dev::eth;
@ -50,14 +51,19 @@ string LinkerObject::toHex() const
for (auto const& ref: linkReferences)
{
size_t pos = ref.first * 2;
string const& name = ref.second;
string hash = libraryPlaceholder(ref.second);
hex[pos] = hex[pos + 1] = hex[pos + 38] = hex[pos + 39] = '_';
for (size_t i = 0; i < 36; ++i)
hex[pos + 2 + i] = i < name.size() ? name[i] : '_';
hex[pos + 2 + i] = hash.at(i);
}
return hex;
}
string LinkerObject::libraryPlaceholder(string const& _libraryName)
{
return "$" + keccak256(_libraryName).hex().substr(0, 34) + "$";
}
h160 const*
LinkerObject::matchLibrary(
string const& _linkRefName,
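
The arithmetic behind the placeholder in the hunk above can be sketched in isolation: 34 hex characters of the hash, wrapped in ``$`` (36 characters), wrapped again in ``__`` (40 characters, the hex length of a 20-byte address). The helper below is hypothetical. It assumes the caller already has the hex digest of keccak256 over the fully qualified library name, since the standard library provides no keccak256.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: _digestHex is assumed to be the hex encoding of
// keccak256(<fully qualified library name>), computed elsewhere.
// Returns the placeholder as it appears in the unlinked hex output.
std::string placeholderInBytecode(std::string const& _digestHex)
{
	// Only the first 34 hex characters (17 bytes) of the digest are used.
	assert(_digestHex.size() >= 34);
	std::string const identifier = "$" + _digestHex.substr(0, 34) + "$"; // 36 characters
	return "__" + identifier + "__"; // 40 characters, same width as a hex address
}
```

Keeping the total width at 40 characters means the placeholder occupies exactly the space the linked address will later fill, so linking never shifts any offsets in the bytecode.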

View File

@ -50,6 +50,11 @@ struct LinkerObject
/// addresses by placeholders.
std::string toHex() const;
/// @returns a 36 character string that is used as a placeholder for the library
/// address (enclosed by `__` on both sides). The placeholder is the hex representation
/// of the first 17 bytes of the keccak-256 hash of @a _libraryName, enclosed by `$`.
static std::string libraryPlaceholder(std::string const& _libraryName);
private:
static h160 const* matchLibrary(
std::string const& _linkRefName,

View File

@ -226,21 +226,21 @@ void CommandLineInterface::handleBinary(string const& _contract)
if (m_args.count(g_argBinary))
{
if (m_args.count(g_argOutputDir))
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin", m_compiler->object(_contract).toHex());
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin", objectWithLinkRefsHex(m_compiler->object(_contract)));
else
{
cout << "Binary: " << endl;
cout << m_compiler->object(_contract).toHex() << endl;
cout << objectWithLinkRefsHex(m_compiler->object(_contract)) << endl;
}
}
if (m_args.count(g_argBinaryRuntime))
{
if (m_args.count(g_argOutputDir))
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin-runtime", m_compiler->runtimeObject(_contract).toHex());
createFile(m_compiler->filesystemFriendlyName(_contract) + ".bin-runtime", objectWithLinkRefsHex(m_compiler->runtimeObject(_contract)));
else
{
cout << "Binary of the runtime part: " << endl;
cout << m_compiler->runtimeObject(_contract).toHex() << endl;
cout << objectWithLinkRefsHex(m_compiler->runtimeObject(_contract)) << endl;
}
}
}
@ -482,9 +482,23 @@ bool CommandLineInterface::parseLibraryOption(string const& _input)
string addrString(lib.begin() + colon + 1, lib.end());
boost::trim(libName);
boost::trim(addrString);
if (addrString.substr(0, 2) == "0x")
addrString = addrString.substr(2);
if (addrString.empty())
{
cerr << "Empty address provided for library \"" << libName << "\": " << endl;
cerr << "Note that there should not be any whitespace after the colon." << endl;
return false;
}
else if (addrString.length() != 40)
{
cerr << "Invalid length for address for library \"" << libName << "\": " << addrString.length() << " instead of 40 characters." << endl;
return false;
}
if (!passesAddressChecksum(addrString, false))
{
cerr << "Invalid checksum on library address \"" << libName << "\": " << addrString << endl;
cerr << "Invalid checksum on address for library \"" << libName << "\": " << addrString << endl;
cerr << "The correct checksum is " << dev::getChecksummedAddress(addrString) << endl;
return false;
}
bytes binAddr = fromHex(addrString);
@ -569,7 +583,7 @@ Allowed options)",
g_argLibraries.c_str(),
po::value<vector<string>>()->value_name("libs"),
"Direct string or file containing library addresses. Syntax: "
"<libraryName>: <address> [, or whitespace] ...\n"
"<libraryName>:<address> [, or whitespace] ...\n"
"Address is interpreted as a hex string optionally prefixed by 0x."
)
(
@ -1056,8 +1070,12 @@ bool CommandLineInterface::link()
{
string const& name = library.first;
// Library placeholders are 40 hex digits (20 bytes) that start and end with '__'.
// This leaves 36 characters for the library name, while too short library names are
// padded on the right with '_' and too long names are truncated.
// This leaves 36 characters for the library identifier. The identifier used to
// be just the cropped or '_'-padded library name, but this changed to
// the cropped hex representation of the hash of the library name.
// We support both ways of linking here.
librariesReplacements["__" + eth::LinkerObject::libraryPlaceholder(name) + "__"] = library.second;
string replacement = "__";
for (size_t i = 0; i < placeholderSize - 4; ++i)
replacement.push_back(i < name.size() ? name[i] : '_');
@ -1087,6 +1105,11 @@ bool CommandLineInterface::link()
cerr << "Reference \"" << name << "\" in file \"" << src.first << "\" still unresolved." << endl;
it += placeholderSize;
}
// Remove hints for resolved libraries.
for (auto const& library: m_libraries)
boost::algorithm::erase_all(src.second, "\n" + libraryPlaceholderHint(library.first));
while (!src.second.empty() && *prev(src.second.end()) == '\n')
src.second.resize(src.second.size() - 1);
}
return true;
}
@ -1100,6 +1123,23 @@ void CommandLineInterface::writeLinkedFiles()
writeFile(src.first, src.second);
}
string CommandLineInterface::libraryPlaceholderHint(string const& _libraryName)
{
return "// " + eth::LinkerObject::libraryPlaceholder(_libraryName) + " -> " + _libraryName;
}
string CommandLineInterface::objectWithLinkRefsHex(eth::LinkerObject const& _obj)
{
string out = _obj.toHex();
if (!_obj.linkReferences.empty())
{
out += "\n";
for (auto const& linkRef: _obj.linkReferences)
out += "\n" + libraryPlaceholderHint(linkRef.second);
}
return out;
}
bool CommandLineInterface::assemble(
AssemblyStack::Language _language,
AssemblyStack::Machine _targetMachine
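
The linking steps in the hunks above (replace each placeholder by an address, drop the hints of resolved libraries, trim trailing newlines) can be sketched as a standalone function. This is a simplified illustration and not the implementation in ``CommandLineInterface::link()``: the name ``linkText`` is hypothetical, it handles only the new hash-based placeholders, and it assumes the addresses are given as 40 hex characters without a ``0x`` prefix.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch, not solc's link(): _addresses maps an identifier
// ("$" + 34 hex characters + "$") to a 40-character hex address without "0x".
std::string linkText(std::string _text, std::map<std::string, std::string> const& _addresses)
{
	for (auto const& entry: _addresses)
	{
		std::string const& identifier = entry.first;
		std::string const& address = entry.second;
		assert(identifier.size() == 36 && address.size() == 40);

		// Replace every occurrence of the 40-character placeholder by the address.
		std::string const placeholder = "__" + identifier + "__";
		for (
			size_t pos = _text.find(placeholder);
			pos != std::string::npos;
			pos = _text.find(placeholder, pos + address.size())
		)
			_text.replace(pos, placeholder.size(), address);

		// Remove the hint line of this library, including the newline before it.
		std::string const hintStart = "\n// " + identifier + " -> ";
		for (
			size_t pos = _text.find(hintStart);
			pos != std::string::npos;
			pos = _text.find(hintStart, pos)
		)
		{
			size_t const end = _text.find('\n', pos + 1);
			_text.erase(pos, end == std::string::npos ? std::string::npos : end - pos);
		}
	}
	// Drop the newlines left over once the hints are gone.
	while (!_text.empty() && _text.back() == '\n')
		_text.pop_back();
	return _text;
}
```

Placeholders and hints whose identifier is not in the map are left untouched, which mirrors the behaviour described above where unresolved references stay in the file.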

View File

@ -54,6 +54,10 @@ public:
private:
bool link();
void writeLinkedFiles();
/// @returns the ``// <identifier> -> name`` hint for library placeholders.
static std::string libraryPlaceholderHint(std::string const& _libraryName);
/// @returns the full object with library placeholder hints in hex.
static std::string objectWithLinkRefsHex(eth::LinkerObject const& _obj);
bool assemble(AssemblyStack::Language _language, AssemblyStack::Machine _targetMachine);

View File

@ -233,6 +233,24 @@ echo '' | "$SOLC" - --link --libraries a:0x90f20564390eAe531E810af625A22f51385Cd
printTask "Testing long library names..."
echo '' | "$SOLC" - --link --libraries aveeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeerylonglibraryname:0x90f20564390eAe531E810af625A22f51385Cd222 >/dev/null
printTask "Testing linking itself..."
SOLTMPDIR=$(mktemp -d)
(
cd "$SOLTMPDIR"
set -e
echo 'library L { function f() public pure {} } contract C { function f() public pure { L.f(); } }' > x.sol
"$SOLC" --bin -o . x.sol 2>/dev/null
# Explanation and placeholder should be there
grep -q '//' C.bin && grep -q '__' C.bin
# But not in library file.
grep -q -v '[/_]' L.bin
# Now link
"$SOLC" --link --libraries x.sol:L:0x90f20564390eAe531E810af625A22f51385Cd222 C.bin
# Now the placeholder and explanation should be gone.
grep -q -v '[/_]' C.bin
)
rm -rf "$SOLTMPDIR"
printTask "Testing overwriting files..."
SOLTMPDIR=$(mktemp -d)
(

View File

@ -94,7 +94,7 @@ BOOST_AUTO_TEST_CASE(all_assembly_items)
BOOST_CHECK_EQUAL(
_assembly.assemble().toHex(),
"5b6001600220606773__someLibrary___________________________"
"5b6001600220606773__$bf005014d9d0f534b8fcb268bd84c491a2$__"
"6000567f556e75736564206665617475726520666f722070757368696e"
"6720737472696e605f6001605e73000000000000000000000000000000000000000000fe"
"fe010203044266eeaa"