8358725: RunThese30M: assert(nm->insts_contains_inclusive(original_pc)) failed: original PC must be in the main code section of the compiled method (or must be immediately following it) #27985

dean-long · 2025-10-24T23:09:20Z

The problem is that code called from a signal handler, like SharedRuntime::handle_unsafe_access(), can call os::malloc(), and when NMT is enabled, we try to get a stack backtrace. But os::get_native_stack() does not know how to walk through signal handler frames.

This fix introduces FirstNativeFrameMark to be used by the POSIX version of os::get_native_stack() to set a frame to stop at in the POSIX signal handler.
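
To illustrate the idea of a scoped "stop here" mark, here is a minimal sketch. It is not the code from this PR: the frame chain is simulated, and `Frame`, `stop_frame`, `FirstFrameMarkSketch`, and `walk` are hypothetical names chosen for the example. The mark in the PR is declared without arguments in the signal handler; the sketch passes the frame explicitly to keep the simulation simple.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a native frame: a name plus a link to its caller.
struct Frame {
  const char* name;
  const Frame* caller;  // nullptr for the outermost frame
};

// Per-thread frame at which a backtrace should stop (nullptr means no mark).
static thread_local const Frame* stop_frame = nullptr;

// RAII mark (assumed design): while alive, walks on this thread treat the
// given frame as the first frame. The previous mark is restored on exit.
class FirstFrameMarkSketch {
  const Frame* _saved;
 public:
  explicit FirstFrameMarkSketch(const Frame* f) : _saved(stop_frame) {
    stop_frame = f;
  }
  ~FirstFrameMarkSketch() { stop_frame = _saved; }
  FirstFrameMarkSketch(const FirstFrameMarkSketch&) = delete;
  FirstFrameMarkSketch& operator=(const FirstFrameMarkSketch&) = delete;
};

// Collect frame names from `start` toward the callers. The marked frame is
// recorded, then the walk stops.
std::vector<const char*> walk(const Frame* start) {
  std::vector<const char*> out;
  for (const Frame* f = start; f != nullptr; f = f->caller) {
    out.push_back(f->name);
    if (f == stop_frame) break;
  }
  return out;
}
```

With a mark placed on the handler frame, a walk that starts in an allocation frame ends at the handler; once the mark goes out of scope, the same walk reaches the outermost frame again.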

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8358725: RunThese30M: assert(nm->insts_contains_inclusive(original_pc)) failed: original PC must be in the main code section of the compiled method (or must be immediately following it) (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27985/head:pull/27985
$ git checkout pull/27985

Update a local copy of the PR:
$ git checkout pull/27985
$ git pull https://git.openjdk.org/jdk.git pull/27985/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27985

View PR using the GUI difftool:
$ git pr show -t 27985

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27985.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-10-24T23:10:47Z

👋 Welcome back dlong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-10-24T23:11:42Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-10-24T23:12:23Z

@dean-long The following label will be automatically applied to this pull request:

hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-10-24T23:16:11Z

Webrevs

00: Full (d5a0b53e)

dholmes-ora · 2025-10-27T02:56:36Z

If we only print the stack up to the signal handler, and not all the way to the allocation, then won't the resulting stack trace be confusing for the reader?

The problem is that code called from a signal handler, like SharedRuntime::handle_unsafe_access(), can call os::malloc(),

Yeah and it really should not do that. :(

dean-long · 2025-10-28T01:32:14Z

If we only print the stack up to the signal handler, and not all the way to the allocation, then won't the resulting stack trace be confusing for the reader?

The stack trace starts at the most recent stack frames (the allocation) and works backwards through the callers (signal handler). So the signal handler frame looks like a special entry or "first" frame, similar to a thread start function or libc init code.
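
The walk order can be sketched as follows. This is an illustration only, with a simulated stack listed innermost frame first; `backtrace_sketch` and `example_trace` are hypothetical helpers, and the frame names below the signal handler are assumed for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Walk a simulated stack from the innermost frame outward and stop once the
// frame named `first` has been recorded. `stack` is ordered innermost first.
std::vector<std::string> backtrace_sketch(const std::vector<std::string>& stack,
                                          const std::string& first) {
  std::vector<std::string> trace;
  for (const std::string& frame : stack) {
    trace.push_back(frame);
    if (frame == first) break;  // treated like a thread start function
  }
  return trace;
}

// Example stack: an allocation made while handling a signal.
std::vector<std::string> example_trace() {
  const std::vector<std::string> stack = {
      "os::malloc",
      "SharedRuntime::handle_unsafe_access",
      "signal handler",
      "interrupted compiled code",  // assumed frame, not reported
      "thread entry"};              // assumed frame, not reported
  return backtrace_sketch(stack, "signal handler");
}
```

The resulting trace holds the allocation, its caller, and the signal handler; the frames that were running before the signal arrived do not appear.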

dean-long · 2025-10-28T01:37:08Z

Yeah and it really should not do that. :(

I agree, but I am not addressing that in this fix. There might be other legitimate reasons for wanting a stack trace during a signal handler, other than NMT allocation tracking, though off the top of my head I can't think of any :-)

dholmes-ora · 2025-10-28T03:35:10Z

If we only print the stack up to the signal handler, and not all the way to the allocation, then won't the resulting stack trace be confusing for the reader?

The stack trace starts at the most recent stack frames (the allocation) and works backwards through the callers (signal handler). So the signal handler frame looks like a special entry or "first" frame, similar to a thread start function or libc init code.

Sorry I'm confused about which part of the stack - that preceding the signal handler, or that after - will be printed after this fix.

dean-long · 2025-10-28T04:01:32Z

Sorry I'm confused about which part of the stack - that preceding the signal handler, or that after - will be printed after this fix.

After, temporally. The history before the signal handler happened is erased. This should match the meaning of os::is_first_C_frame().

dholmes-ora · 2025-10-29T05:01:28Z

src/hotspot/os/posix/signals_posix.cpp

+  // We are called from a signal handler, so stop the stack backtrace here.
+  // See os::is_first_C_frame() in os::get_native_stack().
+  os::FirstNativeFrameMark fnfm;


Won't this break stack-walking in hs_err file generation when we get a SEGV for example?

dean-long · 2025-10-29

Yes, I suppose so. Good catch. We shouldn't consider the "first frame" marker if we are starting before it.

dholmes-ora · 2025-10-30

The hs_err stack trace only seems to report from before the signal handler, though I'm unclear if that is because the signal context sets the initial frame, or because we skip over things till we get to that point.

pchilano · 2025-10-30

If we call report_and_die from the signal handler we use the signal context as the starting point to walk the stack. So it shouldn’t affect the hs_err stack trace unless we crashed in the signal handler itself somewhere before calling report_and_die (and that we don’t hit the second time). I guess we still want to support that case.

The other case I was thinking was if we call report_and_die due to hitting an assert and there is no context set, but then we crash before printing the stack. So the second attempt to print the stack is done within the signal handler and VMError::_context is still nullptr (only set first time), so we start walking from the current frame. I tested this case with this patch applied and we walk the full stack including the signal handler fine. I was confused at first but then realized we execute the secondary signal handler, which doesn’t have this mark.

dean-long · 2025-11-01

I can confirm David's concern, that if we get a SEGV, then the stack backtrace only contains a single frame. One solution would be to do something more like anchor frames, which are chained. Imagine wanting to start a stack walk in the middle of the stack between anchor frame A and anchor frame B, and stop at the anchor frame boundary. That is comparable to what error reporting is attempting to do by providing a saved context as a starting point.
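
One way to picture the chained-anchor idea is the sketch below. It is an assumption-laden illustration, not HotSpot's anchor frame mechanism: frames are represented only by simulated stack addresses, the stack is assumed to grow downward (callers have higher addresses), and `walk_between_anchors` is a hypothetical helper. The point it shows is that a walk only stops at an anchor that lies at or beyond its starting point, so a walk that starts from a saved context older than the signal handler ignores the handler's mark.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simulated frame address. Callers (older frames) have higher addresses.
using Addr = std::uintptr_t;

// Return the frames visited by a walk that begins at `start` and moves toward
// the callers. `frames` is sorted by ascending address (innermost first);
// `anchors` holds the addresses of boundary frames.
std::vector<Addr> walk_between_anchors(const std::vector<Addr>& frames,
                                       const std::vector<Addr>& anchors,
                                       Addr start) {
  // Only an anchor at or above the starting point can bound this walk. An
  // anchor below `start` belongs to younger frames and is ignored.
  bool have_stop = false;
  Addr stop = 0;
  for (Addr a : anchors) {
    if (a >= start && (!have_stop || a < stop)) {
      stop = a;
      have_stop = true;
    }
  }

  std::vector<Addr> visited;
  for (Addr f : frames) {
    if (f < start) continue;            // younger than the starting frame
    visited.push_back(f);
    if (have_stop && f == stop) break;  // reached the anchor boundary
  }
  return visited;
}
```

For frames at 100 through 500 with a single anchor at 300 standing in for the signal handler, a walk from 100 (an allocation inside the handler) stops at 300, while a walk from 400 (a saved context for the interrupted code) continues to 500.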

dean-long added 4 commits October 23, 2025 23:32

stop stack walk at signal handler

5e18428

missing include

8716124

sort includes

d3e5ca4

remove added blank lines

d5a0b53

openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Oct 24, 2025

openjdk bot added the rfr Pull request is ready for review label Oct 24, 2025

dholmes-ora reviewed Oct 29, 2025

dean-long marked this pull request as draft October 29, 2025 22:33

openjdk bot removed the rfr Pull request is ready for review label Oct 29, 2025

dholmes-ora Oct 29, 2025

dean-long Oct 29, 2025

dholmes-ora Oct 30, 2025

pchilano Oct 30, 2025

dean-long Nov 1, 2025
