# Delving into the unknown This write-up summarises how to debug what appears to be a mischievous Lotus instance during our Testground tests. It also goes enumerates which assets are useful to report suspicious behaviours upstream, in a way that they are actionable. ## Querying the Lotus RPC API The `local:docker` and `cluster:k8s` map ports that you specify in the composition.toml, so you can access them externally. All our compositions should carry this fragment: ```toml [global.run_config] exposed_ports = ["6060", "1234", "2345"] ``` This tells Testground to expose the following ports: * `6060` => Go pprof. * `1234` => Lotus full node RPC. * `2345` => Lotus storage miner RPC. ### local:docker 1. Install the `lotus` binary on your host. 2. Find the container that you want to connect to in `docker ps`. * Note that our _container names_ are slightly long, and they're the last field on every line, so if your terminal is wrapping text, the port numbers will end up ABOVE the friendly/recognizable container name (e.g. `tg-lotus-soup-deals-e2e-acfc60bc1727-miners-1`). * The testground output displays the _container ID_ inside coloured angle brackets, so if you spot something spurious in a particular node, you can hone in on that one, e.g. `<< 54dd5ad916b2 >>`. ``` ⟩ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 54dd5ad916b2 be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32788->1234/tcp, 0.0.0.0:32783->2345/tcp, 0.0.0.0:32773->6060/tcp, 0.0.0.0:32777->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-clients-2 53757489ce71 be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32792->1234/tcp, 0.0.0.0:32790->2345/tcp, 0.0.0.0:32781->6060/tcp, 0.0.0.0:32786->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-clients-1 9d3e83b71087 be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32791->1234/tcp, 0.0.0.0:32789->2345/tcp, 0.0.0.0:32779->6060/tcp, 0.0.0.0:32784->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-clients-0 7bd60e75ed0e be3c18d7f0d4 "/testplan" 10 seconds ago Up 8 seconds 0.0.0.0:32787->1234/tcp, 0.0.0.0:32782->2345/tcp, 0.0.0.0:32772->6060/tcp, 0.0.0.0:32776->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-miners-1 dff229d7b342 be3c18d7f0d4 "/testplan" 10 seconds ago Up 9 seconds 0.0.0.0:32778->1234/tcp, 0.0.0.0:32774->2345/tcp, 0.0.0.0:32769->6060/tcp, 0.0.0.0:32770->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-miners-0 4cd67690e3b8 be3c18d7f0d4 "/testplan" 11 seconds ago Up 8 seconds 0.0.0.0:32785->1234/tcp, 0.0.0.0:32780->2345/tcp, 0.0.0.0:32771->6060/tcp, 0.0.0.0:32775->6060/tcp tg-lotus-soup-deals-e2e-acfc60bc1727-bootstrapper-0 aeb334adf88d iptestground/sidecar:edge "testground sidecar …" 43 hours ago Up About an hour 0.0.0.0:32768->6060/tcp testground-sidecar c1157500282b influxdb:1.8 "/entrypoint.sh infl…" 43 hours ago Up 25 seconds 0.0.0.0:8086->8086/tcp testground-influxdb 99ca4c07fecc redis "docker-entrypoint.s…" 43 hours ago Up About an hour 0.0.0.0:6379->6379/tcp testground-redis bf25c87488a5 bitnami/grafana "/run.sh" 43 hours ago Up 26 seconds 0.0.0.0:3000->3000/tcp testground-grafana cd1d6383eff7 goproxy/goproxy "/goproxy" 45 hours ago Up About a minute 8081/tcp testground-goproxy ``` 3. Take note of the port mapping. Imagine in the output above, we want to query `54dd5ad916b2`. We'd use `localhost:32788`, as it forwards to the container's 1234 port (Lotus Full Node RPC). 4. Run your Lotus CLI command setting the `FULLNODE_API_INFO` env variable, which is a multiaddr: ```sh $ FULLNODE_API_INFO=":/ip4/127.0.0.1/tcp/$port/http" lotus chain list [...] ``` ### cluster:k8s WIP @nonsense. ### Useful commands / checks * **Making sure miners are on the same chain:** compare outputs of `lotus chain list`. * **Checking deals:** `lotus client list-deals`. * **Sector queries:** WIP @nonsense. ## Viewing logs of a particular container (local:docker) This works for both started and stopped containers. Just get the container ID (in double angle brackets in Testground output, on every log line), and do a: ```shell script $ docker logs $container_id ``` ## Accessing the golang instrumentation Testground exposes a pprof endpoint under local port 6060, which both `local:docker` and `cluster:k8s` map. For `local:docker`, see above to figure out which host port maps to the container's 6060 port. ## Acquiring a goroutine dump When things appear to be stuck, get a goroutine dump. ```shell script $ wget -o goroutine.out http://localhost:${pprof_port}/debug/pprof/goroutine?debug=2 ``` You can use whyrusleeping/stackparse to extract a summary: ```shell script $ go get https://github.com/whyrusleeping/stackparse $ stackparse --summary goroutine.out ``` ## Acquiring a CPU profile When the CPU appears to be spiking/rallying, grab a CPU profile. ```shell script $ wget -o profile.out http://localhost:${pprof_port}/debug/pprof/profile ``` Analyse it using `go tool pprof`. Usually, generating a `png` graph is useful: ```shell script $ go tool pprof profile.out File: testground Type: cpu Time: Jul 3, 2020 at 12:00am (WEST) Duration: 30.07s, Total samples = 2.81s ( 9.34%) Entering interactive mode (type "help" for commands, "o" for options) (pprof) png Generating report in profile003.png ``` ## Submitting actionable reports / findings This is useful both internally (within the Oni team, so that peers can help) and externally (when submitting a finding upstream). We don't need to play the full bug-hunting game on Lotus, but it's tremendously useful to provide the necessary data so that any reports are actionable. These include: * test outputs (use `testground collect`). * stack traces that appear in logs (whether panics or not). * output of relevant Lotus CLI commands. * if this is some kind of blockage / deadlock, goroutine dumps. * if this is a CPU hotspot, a CPU profile would be useful. * if this is a memory issue, a heap dump would be useful.