This upgrade should bring some significant performance improvements to
instrumentation. These are mostly achieved by disabling features (by
default) that are likely not widely used by default – collection of
callstacks and support for fibers that wasn't used for anything in
particular yet. For callstack collection it might be worthwhile to
provide a mechanism to enable this at runtime by calling
`TracyLayer::with_stackdepth`.
These should bring the cost of a single span down from 30+µs per span to
a more reasonable 1.5µs or so and down to the ns scale for events (on my
1st gen Ryzen machine, anyway.) There is still a fair amount of overhead
over plain tracy_client instrumentation in formatting and such, but
dealing with it requires significant effort and this is a
straightforward improvement to have for the time being.
Co-authored-by: Simonas Kazlauskas <git@kazlauskas.me>