
@khangng-ampere (Contributor)

As discussed on OpenBMC Discord, according to the DSP0236 spec,
there is no good way to discover all active endpoints aside from
polling/scanning the entire network.

This adds an initial routing table polling mechanism. For each bridge,
mctpd will send Get Routing Table Entries and add the endpoints to D-Bus.
If some of the endpoints are bridges, mctpd recursively enables
polling for those bridges.
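The recursive polling described above can be sketched roughly as follows. This is a minimal, hypothetical model, not mctpd code: `struct route_entry`, `struct bridge_table`, `get_routing_table()`, `poll_bridge()` and `discover_from_bus_owner()` are illustrative names, and each GRTE reply is simplified to a list of single downstream EIDs with a bridge flag (real entries describe EID ranges).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified GRTE entry: one downstream EID plus a bridge flag. */
struct route_entry {
	uint8_t eid;
	bool is_bridge;
};

/* Hypothetical canned GRTE reply for one bridge. */
struct bridge_table {
	uint8_t bridge_eid;
	const struct route_entry *entries;
	size_t n_entries;
};

/* Stand-in for sending GRTE to `eid`; NULL means no reply. */
static const struct bridge_table *get_routing_table(const struct bridge_table *tables,
						    size_t n_tables, uint8_t eid)
{
	for (size_t i = 0; i < n_tables; i++)
		if (tables[i].bridge_eid == eid)
			return &tables[i];
	return NULL;
}

/* Poll `bridge`, record every EID it reports in known[] (where mctpd would
 * publish a D-Bus endpoint), and recurse into reported bridges. known[] also
 * stops the recursion from revisiting EIDs if routes form a loop. */
static void poll_bridge(const struct bridge_table *tables, size_t n_tables,
			uint8_t bridge, bool known[256])
{
	const struct bridge_table *t = get_routing_table(tables, n_tables, bridge);

	if (!t)
		return;
	for (size_t i = 0; i < t->n_entries; i++) {
		const struct route_entry *e = &t->entries[i];

		if (known[e->eid])
			continue;
		known[e->eid] = true;
		if (e->is_bridge)
			poll_bridge(tables, n_tables, e->eid, known);
	}
}

/* Kick off discovery from the bus owner, which is itself treated as a bridge. */
static void discover_from_bus_owner(const struct bridge_table *tables, size_t n_tables,
				    uint8_t bus_owner, bool known[256])
{
	known[bus_owner] = true;
	poll_bridge(tables, n_tables, bus_owner, known);
}
```

With tables where bridge 8 reports EIDs 9 and 10 (a bridge), and bridge 10 reports EIDs 11 and 12, starting from EID 8 marks exactly EIDs 8 through 12 as known.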

To kick-start the process, mctpd enables polling on its bus owner, which
it discovers when it receives the Set Endpoint ID message.

The mctp.routing_table_polling_interval_ms config option can be used to
tune the delay between polls.
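A sketch of how the bus owner and the polling interval could fit together, assuming a simple timestamp-driven scheduler. `struct poll_state`, `on_set_endpoint_id()` and `poll_due()` are hypothetical names; the daemon itself would more likely arm an sd_event timer than compare timestamps by hand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical polling state for one network. */
struct poll_state {
	bool polling_enabled;
	uint8_t bus_owner_eid;
	uint64_t interval_ms;  /* from mctp.routing_table_polling_interval_ms */
	uint64_t next_poll_ms; /* absolute time of the next GRTE request */
};

/* A Set Endpoint ID request comes from the bus owner: remember its EID and
 * schedule the first poll one interval from now. */
static void on_set_endpoint_id(struct poll_state *s, uint8_t src_eid, uint64_t now_ms)
{
	s->bus_owner_eid = src_eid;
	s->polling_enabled = true;
	s->next_poll_ms = now_ms + s->interval_ms;
}

/* Returns true when a poll is due, re-arming the schedule for the next interval. */
static bool poll_due(struct poll_state *s, uint64_t now_ms)
{
	if (!s->polling_enabled || now_ms < s->next_poll_ms)
		return false;
	s->next_poll_ms = now_ms + s->interval_ms;
	return true;
}
```

Nothing is polled until the bus owner is known, which matches the kick-start behaviour described above.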


I have just rebased this patch on top of my other patches, so I thought I should open a PR for it as well.

It is still a bit rough and depends on other PRs, so I am marking this as a draft.

@jk-ozlabs (Member)

We have a PR (#85) for the polling implementation at present. There was some earlier discussion that we don't have reliable enough information in Get Routing Table Entries to assume that downstream endpoints are present.

So, we'll probably need to discuss the approach first. The linked Discord chat was some time ago :)

As a summary of my thoughts here: Get Routing Table Entries would be more efficient than individual-endpoint polling, but I'm not convinced we get accurate enough data from that. For example, a bridge may indicate that it has a downstream route to a range of EIDs (through data in the GRTE), but those individual endpoints may not exist at all.
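To make the concern concrete: as we read DSP0236, each GRTE entry carries an EID range size and a starting EID, so one entry can advertise many EIDs without saying which are populated. The sketch below assumes that byte layout (range size, starting EID, entry type/port byte, transport binding ID, media type ID, physical address size, then the address); verify it against the specification before relying on it. `struct grte_entry`, `grte_parse_entry()` and `grte_candidate_eids()` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed part of one routing table entry, per our reading of DSP0236. */
struct grte_entry {
	uint8_t eid_range_size;
	uint8_t starting_eid;
	uint8_t entry_type; /* bits [7:6] of the third byte */
	uint8_t port;       /* bits [4:0] of the third byte */
	uint8_t phys_addr_size;
};

/* Parse one entry at buf; returns bytes consumed, or 0 if truncated. */
static size_t grte_parse_entry(const uint8_t *buf, size_t len, struct grte_entry *e)
{
	if (len < 6)
		return 0;
	e->eid_range_size = buf[0];
	e->starting_eid = buf[1];
	e->entry_type = buf[2] >> 6;
	e->port = buf[2] & 0x1f;
	/* buf[3]: physical transport binding ID, buf[4]: physical media type ID */
	e->phys_addr_size = buf[5];
	if (len < 6u + e->phys_addr_size)
		return 0;
	return 6u + e->phys_addr_size;
}

/* Expand the advertised range into candidate EIDs. These are routes only:
 * nothing here proves that an endpoint actually answers at each EID. */
static size_t grte_candidate_eids(const struct grte_entry *e, uint8_t *out, size_t max)
{
	size_t n = 0;

	for (unsigned int k = 0; k < e->eid_range_size && n < max; k++) {
		unsigned int eid = e->starting_eid + k;

		if (eid > 0xfe) /* 0xff is the broadcast EID */
			break;
		out[n++] = (uint8_t)eid;
	}
	return n;
}
```

A single entry with a range size of 4 starting at 0x10 yields four candidate EIDs (0x10 to 0x13), whether or not any device sits behind them.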

Also, given mctpd does not implement GRTE commands itself, I figure we'll need some infrastructure for that too.

@khangng-ampere (Contributor, Author)

> We have a PR (#85) for the polling implementation at present.

Oh, I totally missed the PR. I will review it then!

@khangng-ampere (Contributor, Author)

> There was some earlier discussion that we don't have reliable enough information in Get Routing Table Entries to assume that downstream endpoints are present.

> So, we'll probably need to discuss the approach first. The linked Discord chat was some time ago :)

> As a summary of my thoughts here: Get Routing Table Entries would be more efficient than individual-endpoint polling, but I'm not convinced we get accurate enough data from that. For example, a bridge may indicate that it has a downstream route to a range of EIDs (through data in the GRTE), but those individual endpoints may not exist at all.

I agree it is not accurate, but the same is true of the endpoint objects already published on D-Bus. In normal operation, even without this PR (or the other one), endpoints can go offline and their D-Bus objects become outdated. Applications consuming the D-Bus endpoint list already have to handle that through the .Recovery mechanism, so I think the same approach applies here.

> Also, given mctpd does not implement GRTE commands itself, I figure we'll need some infrastructure for that too.

Agreed.

@jk-ozlabs (Member)

Perhaps we could use the GRTE responses as first-pass data on what to start polling, plus a configuration option to "just poll the entire downstream pool" for cases where bridges do not support (or do not reliably support) GRTE.
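A rough sketch of that suggestion, with hypothetical names throughout: GRTE-advertised EIDs form the first-pass poll list (clamped to the bridge's allocated pool), while a hypothetical `poll_full_pool` option, or a bridge whose GRTE is unusable, falls back to polling every EID in the pool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bridge inputs: the EID pool allocated to the bridge and the
 * EIDs its GRTE response advertised (if GRTE worked at all). */
struct bridge_info {
	uint8_t pool_start;
	uint8_t pool_size;
	bool grte_supported; /* false if GRTE failed or is not trusted */
	const uint8_t *grte_eids;
	size_t n_grte_eids;
};

/* Fill `out` with the EIDs to poll individually; returns how many were written. */
static size_t select_poll_targets(const struct bridge_info *b, bool poll_full_pool,
				  uint8_t *out, size_t max)
{
	size_t n = 0;

	if (poll_full_pool || !b->grte_supported) {
		/* Fallback: every EID in the bridge's downstream pool. */
		for (unsigned int k = 0; k < b->pool_size && n < max; k++)
			out[n++] = (uint8_t)(b->pool_start + k);
		return n;
	}
	/* First pass: only GRTE-advertised EIDs that fall inside the pool. */
	for (size_t i = 0; i < b->n_grte_eids && n < max; i++) {
		uint8_t eid = b->grte_eids[i];

		if (eid >= b->pool_start && eid - b->pool_start < b->pool_size)
			out[n++] = eid;
	}
	return n;
}
```

Either way, each selected EID would still be polled individually, so a route advertised in GRTE is not by itself taken as proof that an endpoint exists.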

In tests, replace all sd_event timer usages with
trio.testing.MockClock-backed I/O sources.

Because some timeouts are still real (the D-Bus call to the mctpd
subprocess, the mctpd SO_RCVTIMEO wait, ...), autojump_threshold (how
long all tasks must be blocked before the clock jumps ahead to the next
Trio timeout) is set high enough to accommodate them.

Signed-off-by: Khang D Nguyen <khangng@os.amperecomputing.com>
This commit adds support for Discovery Notify messages, specified in
DSP0236 section 12.14.

In our implementation, a Discovery Notify message is sent when mctpd
sets an interface role from Unknown to Endpoint.

To avoid Discovery Notify messages getting lost, retry sending them a
few times, with a delay between attempts.
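A minimal sketch of that retry loop, assuming the transport and the delay are supplied as callbacks. `struct notify_ops`, `send_discovery_notify_with_retry()` and the `fake_link` test double are hypothetical, not mctpd's API; in the daemon the delay would presumably be an event-loop timer rather than a blocking sleep. The attempt count and delay stay parameters because the commit message does not specify them.

```c
#include <stdbool.h>

/* Hypothetical hooks: send() returns true once a Discovery Notify response
 * arrives; sleep_ms() waits between attempts. */
struct notify_ops {
	bool (*send)(void *ctx);
	void (*sleep_ms)(void *ctx, long ms);
};

/* Try up to max_attempts times, waiting delay_ms between failed attempts.
 * Returns the attempt number that succeeded, or -1 if all were lost. */
static int send_discovery_notify_with_retry(const struct notify_ops *ops, void *ctx,
					    int max_attempts, long delay_ms)
{
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (ops->send(ctx))
			return attempt;
		if (attempt < max_attempts)
			ops->sleep_ms(ctx, delay_ms);
	}
	return -1;
}

/* Example test double: the first `drops` sends are lost. */
struct fake_link {
	int drops;
	int sends;
	long slept_ms;
};

static bool fake_send(void *ctx)
{
	struct fake_link *l = ctx;

	l->sends++;
	return l->sends > l->drops;
}

static void fake_sleep(void *ctx, long ms)
{
	struct fake_link *l = ctx;

	l->slept_ms += ms;
}
```

With two dropped messages, the third attempt succeeds after two delays; if every attempt is lost, the function gives up and reports -1 instead of retrying forever.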

Signed-off-by: Khang D Nguyen <khangng@os.amperecomputing.com>
As discussed on OpenBMC Discord [1], according to the DSP0236 spec,
there is no good way to discover all active endpoints aside from
polling/scanning the entire network.

This adds an initial routing table polling mechanism. For each bridge,
mctpd will send Get Routing Table Entries and add the endpoints to D-Bus.
If some of the endpoints are bridges, mctpd recursively enables
polling for those bridges.

To kick-start the process, mctpd enables polling on its bus owner, which
it discovers when it receives the Set Endpoint ID message.

The mctp.routing_table_polling_interval_ms config option can be used to
tune the delay between polls. The config files have been updated
accordingly: the default config file uses a reasonable delay, and the
test config file uses a shorter one so that tests run quickly.


Tested: For the following topology:

    ┌───────┐  ┌───────┐            
    │       ┼──► EID 9 │            
    │       │  └───────┘            
    │       │                       
    │       │                       
    │ EID 8 │                       
    │       │  ┌────────┐  ┌──────┐ 
    │       │  │        ┼──►EID 11│ 
    │       │  │        │  └──────┘ 
    │       ┼──► EID 10 │           
    │       │  │        │  ┌──────┐ 
    │       │  │        ├──►EID 12│ 
    └───────┘  └────────┘  └──────┘ 

The resulting D-Bus object tree should contain:

- /au/com/codeconstruct/mctp1/networks/1/endpoints/8
- /au/com/codeconstruct/mctp1/networks/1/endpoints/9
- /au/com/codeconstruct/mctp1/networks/1/endpoints/10
- /au/com/codeconstruct/mctp1/networks/1/endpoints/11
- /au/com/codeconstruct/mctp1/networks/1/endpoints/12
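When checking that result programmatically, it can help to map between EIDs and the object paths listed above. A small hypothetical helper pair, assuming only the path format shown in the list:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MCTP_ENDPOINT_PATH_FMT "/au/com/codeconstruct/mctp1/networks/%u/endpoints/%u"

/* Format the object path for endpoint `eid` on network `net`.
 * Returns 0 on success, -1 if `buf` is too small. */
static int endpoint_object_path(char *buf, size_t len, unsigned int net, uint8_t eid)
{
	int n = snprintf(buf, len, MCTP_ENDPOINT_PATH_FMT, net, (unsigned int)eid);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Recover the EID from the last path component, or -1 if it is not a valid EID. */
static int endpoint_path_eid(const char *path)
{
	const char *last = strrchr(path, '/');
	char *end;
	unsigned long v;

	if (!last || last[1] == '\0')
		return -1;
	v = strtoul(last + 1, &end, 10);
	if (*end != '\0' || v > 0xff)
		return -1;
	return (int)v;
}
```

For example, network 1 and EID 12 produce the last path in the expected list.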


[1] https://discord.com/channels/775381525260664832/778790638563885086/1282947932093415446

Signed-off-by: Khang D Nguyen <khangng@os.amperecomputing.com>
khangng-ampere force-pushed the khangng/push-kpnxqpytummr branch from 46b49a4 to 7ab1b57 on October 31, 2025 04:03