This is showing up in profiles so I figured I'd optimize it a bit:
1. Avoid holding locks while recording metrics.
2. Slightly reduce allocations by re-using the metrics "mutators".
Also, use the passed context for better tracing.
This is unlikely to make a huge difference, but it may help RPC
providers a _tiny_ bit and doesn't really move the complexity needle.