Thanks for this amazing project! When using NCCL over MSCCL++ v0.6.0, I noticed that the current implementation chooses an execution plan for a collective operation using only the following condition:

`bytes >= p.key.minMessageSize && bytes < p.key.maxMessageSize && inPlace == p.key.isInPlace`

It does not consider whether NVLS is available, the world size (each execution plan only works for a particular world size), or other factors. It would be helpful to have an interface or environment variables that let users control which execution plan is selected.
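To make the request concrete, here is a rough sketch of what a more selective matcher could look like. It keeps the existing size and in-place checks and adds a world-size check, an NVLS check, and a name-based override. Everything except the three fields in the condition above is hypothetical: `worldSize`, `requiresNvls`, `RuntimeInfo`, `selectPlan`, and the `MSCCLPP_FORCE_PLAN` environment variable are illustrative names, not existing MSCCL++ APIs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical plan key: the first three fields mirror the condition used
// today; worldSize and requiresNvls are the extra criteria proposed here.
struct PlanKey {
  size_t minMessageSize;
  size_t maxMessageSize;
  bool isInPlace;
  int worldSize;      // hypothetical: plan is only valid for this many ranks
  bool requiresNvls;  // hypothetical: plan relies on NVLS multicast
};

struct ExecutionPlanEntry {
  std::string name;
  PlanKey key;
};

// Hypothetical runtime facts the selector would need (detection not shown).
struct RuntimeInfo {
  int worldSize;
  bool nvlsSupported;
};

// Returns the first matching plan, or std::nullopt so the caller can fall
// back to the default NCCL path.
std::optional<ExecutionPlanEntry> selectPlan(const std::vector<ExecutionPlanEntry>& plans,
                                             size_t bytes, bool inPlace,
                                             const RuntimeInfo& rt) {
  // Hypothetical user override: if set, only the plan with this name is eligible.
  const char* forced = std::getenv("MSCCLPP_FORCE_PLAN");
  for (const auto& p : plans) {
    if (forced != nullptr && p.name != forced) continue;
    // Existing condition from v0.6.0.
    bool sizeOk = bytes >= p.key.minMessageSize && bytes < p.key.maxMessageSize;
    bool placeOk = inPlace == p.key.isInPlace;
    // Proposed additional checks.
    bool worldOk = p.key.worldSize == rt.worldSize;
    bool nvlsOk = !p.key.requiresNvls || rt.nvlsSupported;
    if (sizeOk && placeOk && worldOk && nvlsOk) return p;
  }
  return std::nullopt;
}
```

With this kind of matcher, an NVLS plan registered for 8 ranks would be skipped on a machine without NVLS support or in a 16-rank job, instead of being picked purely on message size.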