nishitha-burman
Contributor

No description provided.

Explainer for decoder error
Added links and made edits to sample code.
Contributor

@gabrielsanbrito gabrielsanbrito left a comment

Just leaving some small comments. I think that the explainer is looking good overall.

console.warn("Decoder error: decoder fell back or failed");

// Log telemetry signal
logMetric("decoder_error");
Contributor

Should we augment this example showing how different errors could be handled based on RTCRtpSenderErrorEvent::errorCode? For example: "software decoding fallback" vs "decoder unavailable".

Contributor Author

@nishitha-burman nishitha-burman Oct 7, 2025

I decided not to add scenario-specific error codes because they may introduce fingerprinting vectors. Right now the event fires when there is an error, when a fallback happens, or when a software decoder is not available.

Contributor

But aren't they part of your proposal? I was reading the slide deck that you mentioned above. I just want to make sure that the example shows how clients could fully leverage this new event. I am not sure that this adds fingerprinting concerns, though (I would say that it is not a risk).

Also, is there any value for the clients of this new API in being able to differentiate between "an error", "fallback happens", and when "software decoder is not available"?

Contributor

Error sounds like a terminal state. In the case of a decoder error, it might be terminal. However, in the case of a fallback, streaming may continue. Should we create separate events for decoder fallback and decoder error to differentiate between these states? After decoder fallback, is it possible to transition back to hardware decode? If so, do we need a way to signal this transition to inform developers that software decode is no longer used?

Contributor Author

You both raise a good point. Perhaps we can differentiate between fallback and error without adding new fingerprinting vectors.

@gabrielsanbrito to answer your question about whether there is value in differentiating: I do think there is, because site developers can take different actions depending on the issue and their scenario. For example, if there is a decoder error, the developer can switch to a different codec or profile, or trigger error UI for end users. If a decoder fallback happens, the developer can adapt quality dynamically, pause non-critical efforts, and warn users that performance may be reduced.

@SteveBeckerMSFT instead of a separate event, how about an error code that differentiates between the states?
As for transitioning back to hardware decode, I am not sure whether this is possible; I will look into it.
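
To make the idea of a differentiating error code concrete, here is a minimal sketch of how a page might branch on it. Nothing in it is part of the proposal: the `errorCode` field, its values (`"fallback"` and `"error"`), the function name, and the action labels are all hypothetical, chosen only to mirror the actions listed above.

```javascript
// Hypothetical sketch: the `errorCode` field and its values are assumptions
// made for illustration and are not defined by the explainer.
function planDecoderResponse(event) {
  switch (event.errorCode) {
    case "fallback":
      // Decoding continues in software, so degrade gracefully.
      return ["reduce-quality", "pause-non-critical-work", "warn-user"];
    case "error":
      // Decoding cannot continue with the current configuration.
      return ["switch-codec-or-profile", "show-error-ui"];
    default:
      // Unknown or missing code: only record it.
      return ["log-only"];
  }
}

// Plain objects stand in for real events here.
console.log(planDecoderResponse({ errorCode: "fallback" }));
console.log(planDecoderResponse({ errorCode: "error" }));
```

Returning a list of action labels rather than acting directly keeps the decision logic testable without a live peer connection.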

@SteveBeckerMSFT: decoderstatechange is something I'd consider more useful for cases where the decoder switches, e.g. from AV1 to VP9 in a multiparty scenario, basically informing the application that the codec and its characteristics changed without the need for polling getStats. I like it!

decodeerror as a name SGTM, but I think we want an error (EncodingError, which is more specific than DataError and OperationError?) so we can differentiate between fatal ("you need to switch from H265 to H264") and non-fatal ("we went to SW and this is going to drain your battery"), similar to slide 20.
From that slide we want timestamp, which should be named rtpTimestamp because that is what pinpoints where in the bitstream the error occurred.
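
As a rough illustration of why an RTP timestamp helps with pinpointing, the sketch below converts the distance between two RTP timestamps into seconds. It assumes the 90 kHz clock rate that RTP video payload formats use and treats the timestamp as an unsigned 32-bit value that can wrap around. The function and constant names are hypothetical.

```javascript
// Clock rate used by RTP video payload formats (ticks per second).
const VIDEO_CLOCK_RATE_HZ = 90000;

// Seconds elapsed from `fromTs` to `toTs`, both unsigned 32-bit RTP
// timestamps. `>>> 0` reduces the difference modulo 2^32, so a single
// wraparound between the two values is handled.
function rtpElapsedSeconds(fromTs, toTs) {
  const ticks = (toTs - fromTs) >>> 0;
  return ticks / VIDEO_CLOCK_RATE_HZ;
}

console.log(rtpElapsedSeconds(0, 180000)); // 2
console.log(rtpElapsedSeconds(4294967295, 89999)); // 1 (timestamp wrapped)
```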

Contributor Author

@fippo is it common for the decoder to go between hardware <-> software and between codecs?

@nishitha-burman currently hardware->software due to decode errors can happen only once since there is no mechanism to switch back to hardware and retry yet.

Switching codecs is common in multiparty scenarios where clients might be picking the "best" codec for a certain group of peers (ideally hardware accelerated) and then a browser not supporting that codec joins and forces a downgrade.

Contributor Author

Thanks everyone for the feedback! I chatted with Steve and we are thinking of updating the API shape to the following:

partial interface RTCRtpReceiver {
  attribute EventHandler ondecoderstatechange;
};

interface RTCDecoderStateChangeEvent : Event {
  constructor(DOMString type, RTCDecoderStateChangeEventInit eventInitDict);

  // Media timeline reference
  readonly attribute unsigned long rtpTimestamp;

  // Codec now in effect after the change.
  readonly attribute DOMString codecString;

  // Aligned with MediaCapabilitiesInfo; powerEfficient changes primarily based on hardware/software decoder
  // https://www.w3.org/TR/media-capabilities/#media-capabilities-info
  readonly attribute boolean powerEfficient;
};
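
A minimal sketch of how an application might consume this event shape, assuming it ships as written above. Only `rtpTimestamp`, `codecString`, and `powerEfficient` come from the proposed IDL; `describeDecoderChange`, the returned labels, and the sample codec strings are hypothetical. Because the event reports only the state now in effect, the sketch assumes the page keeps the previous state itself.

```javascript
// Compare the previous and current decoder state and label what changed.
// `prev` is null until the first event has been seen.
function describeDecoderChange(prev, next) {
  if (!prev) {
    return ["initial-state"];
  }
  const changes = [];
  if (prev.codecString !== next.codecString) {
    changes.push("codec-changed");
  }
  if (prev.powerEfficient && !next.powerEfficient) {
    // Likely a move from hardware to software decoding.
    changes.push("power-efficiency-lost");
  }
  return changes;
}

// In a page this might be wired up roughly as follows (not executed here):
//   let last = null;
//   receiver.ondecoderstatechange = (e) => {
//     console.log(e.rtpTimestamp, describeDecoderChange(last, e));
//     last = { codecString: e.codecString, powerEfficient: e.powerEfficient };
//   };

// Plain objects with sample values stand in for real events.
const before = { rtpTimestamp: 1000, codecString: "av01.0.04M.08", powerEfficient: true };
const after = { rtpTimestamp: 4000, codecString: "vp09.00.10.08", powerEfficient: false };
console.log(describeDecoderChange(before, after));
```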

* Web Developers: Positive
* [Xbox Cloud Gaming](https://github.com/w3c/webrtc-stats/pull/725#discussion_r1093134014) & Nvidia GeForce Now have direct use cases.
* Chromium: Positive; actively pursuing proposal.
* WebKit & Gecko: Overall positive feedback, but privacy/fingerprinting is a common concern.
Contributor

Do we have links to their official positions on this? Mozilla, WebKit

Contributor

We generally don't/can't request those until after the initial explainer is checked in.

And @xingri can tell us what GFN thinks!

@xingri xingri Oct 17, 2025

Thanks @fippo for sharing this.
Thanks @nishitha-burman for the proposal to surface decoder errors to developers.

This is a much-needed improvement for media diagnostics and reliability. We’ve had some internal discussions inside NVIDIA and would like to share the following feedback:

  1. Differentiating Between Fallback and Error
    We agree with the comment in the PR that it’s important to distinguish between a decoder error and a fallback. These are conceptually different, and conflating them could lead to misinterpretation in telemetry or user-facing diagnostics. Additionally, we’d like clarity on whether an error that leads to a fallback would trigger multiple events. Understanding which events are terminal (i.e., indicate unrecoverable failure) versus transitional (e.g., fallback to software decoding) is crucial for building robust error handling logic.

  2. What Triggers a Fallback?
    When discussing fallback scenarios, it would be helpful to clarify what conditions, besides decoder errors, might prompt a fallback to software decoding. For example:

    • Hardware limitations or unsupported configurations.
    • Performance-related decisions (e.g., slow decode).
    • Queue flushes or resource constraints.

    Providing a taxonomy of fallback triggers would help developers better interpret the context of these events.

  3. Linking Decoder Anomalies to Outcomes
    We’re also interested in understanding how developers can determine whether a decoder anomaly resulted in a fallback or a failure. For instance:

    • Could slow decode or queue flushes be surfaced as errors?
    • Would there be a way to correlate anomalies with fallback decisions?

    It may be valuable to include metadata such as the frame number where the fallback occurred.

Contributor Author

Thank you for your feedback @xingri! I just replied to the conversation above with a new proposal based on the feedback we received. Let me know if that addresses your concerns, especially around what triggers a fallback.

* Without this signal, developers cannot confidently diagnose or reduce fallback incidence.

## Proposed Approach
Introduce an event on [`RTCRtpReceiver`](https://developer.mozilla.org/en-US/docs/Web/API/RTCRtpReceiver) ([see slide 30](https://docs.google.com/presentation/d/1FpCAlxvRuC0e52JrthMkx-ILklB5eHszbk8D3FIuSZ0/edit?slide=id.g2452ff65d17_0_71#slide=id.g2452ff65d17_0_71)) that fires when:
Contributor

Should we give a name to the new event?

Contributor Author

In the example code I have it as decodererror; I'll add the API shape to make this clear. Based on the discussions above, though, this name may change.

Contributor

@SteveBeckerMSFT SteveBeckerMSFT left a comment

Looks good, but I have a few questions we should answer about what error states and state transitions we should expose to web developers.

@fippo fippo left a comment

overall LGTM!

@nishitha-burman nishitha-burman removed the request for review from slightlyoff October 20, 2025 16:29