Conversation

@dean-long
Member

@dean-long dean-long commented Oct 24, 2025

The problem is that code called from a signal handler, like SharedRuntime::handle_unsafe_access(), can call os::malloc(), and when NMT is enabled, we try to get a stack backtrace. But os::get_native_stack() does not know how to walk through signal handler frames.

This fix introduces FirstNativeFrameMark, which the POSIX signal handler uses to mark the frame at which the POSIX version of os::get_native_stack() should stop walking.
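
As a rough illustration of the idea (not the code in this PR), the sketch below models a scoped mark that records a stop address for the current thread, plus a walker that treats the marked frame as the last one it reports. The names `SimFrame`, `FirstFrameMarkSketch`, `walk`, and `demo_walk_with_mark` are hypothetical, the frames are simulated rather than read from a real stack, and the stack is assumed to grow downward.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simulated native frame: a stack pointer and a label. Purely illustrative.
struct SimFrame {
  std::uintptr_t sp;
  std::string name;
};

// Hypothetical stand-in for the FirstNativeFrameMark idea: while an instance
// is alive, the current thread remembers a stack address to stop walking at.
class FirstFrameMarkSketch {
  static thread_local std::uintptr_t _mark;  // 0 means "no mark set"
  std::uintptr_t _saved;                     // previous mark, restored on exit
 public:
  explicit FirstFrameMarkSketch(std::uintptr_t sp) : _saved(_mark) { _mark = sp; }
  ~FirstFrameMarkSketch() { _mark = _saved; }
  FirstFrameMarkSketch(const FirstFrameMarkSketch&) = delete;
  FirstFrameMarkSketch& operator=(const FirstFrameMarkSketch&) = delete;
  static std::uintptr_t current() { return _mark; }
};
thread_local std::uintptr_t FirstFrameMarkSketch::_mark = 0;

// Walk from the newest frame toward older callers. Older frames have larger
// sp (assumed downward-growing stack). The marked frame is reported as the
// final entry, and nothing older than it is visited.
std::vector<std::string> walk(const std::vector<SimFrame>& newest_first) {
  std::vector<std::string> out;
  const std::uintptr_t mark = FirstFrameMarkSketch::current();
  for (const SimFrame& f : newest_first) {
    out.push_back(f.name);
    if (mark != 0 && f.sp >= mark) break;
  }
  return out;
}

// Simulated scenario: an allocation made from inside a signal handler.
std::vector<std::string> demo_walk_with_mark() {
  const std::vector<SimFrame> stack = {
    {0x1000, "os::malloc"},
    {0x1100, "handle_unsafe_access"},
    {0x1200, "signal_handler"},
    {0x1300, "<signal trampoline>"},
    {0x1400, "interrupted_code"},
  };
  FirstFrameMarkSketch mark(0x1200);  // set on entry to the handler
  return walk(stack);
}
```

In this model the walk reports the allocation, its caller, and the handler frame, and never touches the trampoline or the interrupted code.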


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8358725: RunThese30M: assert(nm->insts_contains_inclusive(original_pc)) failed: original PC must be in the main code section of the compiled method (or must be immediately following it) (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27985/head:pull/27985
$ git checkout pull/27985

Update a local copy of the PR:
$ git checkout pull/27985
$ git pull https://git.openjdk.org/jdk.git pull/27985/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27985

View PR using the GUI difftool:
$ git pr show -t 27985

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27985.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 24, 2025

👋 Welcome back dlong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 24, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 24, 2025
@openjdk

openjdk bot commented Oct 24, 2025

@dean-long The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 24, 2025
@mlbridge

mlbridge bot commented Oct 24, 2025

Webrevs

@dholmes-ora
Member

If we only print the stack up to the signal handler, and not all the way to the allocation, then won't the resulting stack trace be confusing for the reader?

> The problem is that code called from a signal handler, like SharedRuntime::handle_unsafe_access(), can call os::malloc(),

Yeah and it really should not do that. :(

@dean-long
Member Author

> If we only print the stack up to the signal handler, and not all the way to the allocation, then won't the resulting stack trace be confusing for the reader?

The stack trace starts at the most recent stack frames (the allocation) and works backwards through the callers (signal handler). So the signal handler frame looks like a special entry or "first" frame, similar to a thread start function or libc init code.

@dean-long
Member Author

> Yeah and it really should not do that. :(

I agree, but I am not addressing that in this fix. There might be other legitimate reasons for wanting a stack trace during a signal handler, other than NMT allocation tracking, though off the top of my head I can't think of any :-)

@dholmes-ora
Member

> If we only print the stack up to the signal handler, and not all the way to the allocation, then won't the resulting stack trace be confusing for the reader?

> The stack trace starts at the most recent stack frames (the allocation) and works backwards through the callers (signal handler). So the signal handler frame looks like a special entry or "first" frame, similar to a thread start function or libc init code.

Sorry I'm confused about which part of the stack - that preceding the signal handler, or that after - will be printed after this fix.

@dean-long
Member Author

> Sorry I'm confused about which part of the stack - that preceding the signal handler, or that after - will be printed after this fix.

After, temporally: the frames that ran after the signal handler was entered. The history from before the signal handler was entered is dropped from the trace. This should match the meaning of os::is_first_C_frame().

Comment on lines +573 to +575
// We are called from a signal handler, so stop the stack backtrace here.
// See os::is_first_C_frame() in os::get_native_stack().
os::FirstNativeFrameMark fnfm;
Member

Won't this break stack-walking in hs_err file generation when we get a SEGV for example?

Member Author

Yes, I suppose so. Good catch. We shouldn't consider the "first frame" marker if we are starting before it.

Member

The hs_err stack trace only seems to report from before the signal handler, though I'm unclear if that is because the signal context sets the initial frame, or because we skip over things till we get to that point.

Contributor

If we call report_and_die from the signal handler, we use the signal context as the starting point for walking the stack. So it shouldn't affect the hs_err stack trace unless we crashed in the signal handler itself, somewhere before calling report_and_die (and we don't hit that crash the second time). I guess we still want to support that case.

The other case I was thinking of is when we call report_and_die due to hitting an assert and there is no context set, but then we crash before printing the stack. The second attempt to print the stack is then done within the signal handler, and VMError::_context is still nullptr (it is only set the first time), so we start walking from the current frame. I tested this case with this patch applied, and we walk the full stack, including the signal handler, fine. I was confused at first, but then realized we execute the secondary signal handler, which doesn't have this mark.

Member Author

I can confirm David's concern that if we get a SEGV, the stack backtrace contains only a single frame. One solution would be to do something more like anchor frames, which are chained. Imagine wanting to start a stack walk in the middle of the stack between anchor frame A and anchor frame B, and stop at the anchor frame boundary. That is comparable to what error reporting is attempting to do by providing a saved context as a starting point.
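
To make the chained-boundary idea concrete, here is a minimal sketch under stated assumptions. It is not HotSpot's anchor-frame mechanism and not code from this PR. Boundary marks are kept in a per-thread chain, and a walk that begins at a given stack pointer stops only at the nearest boundary at or above its starting point, so a mark that lies below the start is ignored. All names (`SimFrame`, `ChainedMarkSketch`, `walk_from`, `demo_walk`) are hypothetical, frames are simulated, and the stack is assumed to grow downward.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Simulated native frame: a stack pointer and a label. Purely illustrative.
struct SimFrame {
  std::uintptr_t sp;
  std::string name;
};

// Hypothetical chained boundary marks. Each mark links to the one that was
// active when it was created, so newer marks (lower sp) sit at the top.
class ChainedMarkSketch {
  static thread_local ChainedMarkSketch* _top;
  ChainedMarkSketch* const _prev;
  const std::uintptr_t _sp;
 public:
  explicit ChainedMarkSketch(std::uintptr_t sp) : _prev(_top), _sp(sp) { _top = this; }
  ~ChainedMarkSketch() { _top = _prev; }
  ChainedMarkSketch(const ChainedMarkSketch&) = delete;
  ChainedMarkSketch& operator=(const ChainedMarkSketch&) = delete;

  // Nearest boundary at or above start_sp, or 0 if the walk starts above
  // every mark (for example, from a saved signal context).
  static std::uintptr_t boundary_for(std::uintptr_t start_sp) {
    for (const ChainedMarkSketch* m = _top; m != nullptr; m = m->_prev) {
      if (m->_sp >= start_sp) return m->_sp;
    }
    return 0;
  }
};
thread_local ChainedMarkSketch* ChainedMarkSketch::_top = nullptr;

// Walk toward older frames, beginning at start_sp and stopping at the
// boundary that applies to that starting point.
std::vector<std::string> walk_from(const std::vector<SimFrame>& newest_first,
                                   std::uintptr_t start_sp) {
  std::vector<std::string> out;
  const std::uintptr_t boundary = ChainedMarkSketch::boundary_for(start_sp);
  for (const SimFrame& f : newest_first) {
    if (f.sp < start_sp) continue;  // frames newer than the starting point
    out.push_back(f.name);
    if (boundary != 0 && f.sp >= boundary) break;
  }
  return out;
}

// Simulated stack with one mark set at the signal handler frame.
std::vector<std::string> demo_walk(std::uintptr_t start_sp) {
  const std::vector<SimFrame> stack = {
    {0x1000, "os::malloc"},
    {0x1100, "handle_unsafe_access"},
    {0x1200, "signal_handler"},
    {0x1300, "<signal trampoline>"},
    {0x1400, "crashing_code"},
    {0x1500, "thread_entry"},
  };
  ChainedMarkSketch handler_mark(0x1200);
  return walk_from(stack, start_sp);
}
```

In this model, `demo_walk(0x1000)` corresponds to the NMT case and stops at the handler frame, while `demo_walk(0x1400)` corresponds to starting from a saved context and walks past the handler's mark to the oldest frame.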

@dean-long dean-long marked this pull request as draft October 29, 2025 22:33
@openjdk openjdk bot removed the rfr Pull request is ready for review label Oct 29, 2025
Labels

hotspot-runtime hotspot-runtime-dev@openjdk.org

3 participants