
Conversation

@shipilev (Member) commented Sep 23, 2025

Forked from JDK-8366681: there are still some cleanups/performance improvements possible. The current selection code is a bit hairy, and it turns out the changes I made for the previous patch improve performance.

Notable improvements:

  1. Push the compilation level filters downwards. This allows compiling A2 from T2/T3 code more easily, and allows implementing policies for compiling at any A* level based on observing the top-compiled T* levels.
  2. Sort methods by hotness and code size. This appears to have a positive effect on shorter workloads, I suspect because we avoid a lot of C1 compilations by preloading the hottest code first.
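As a rough illustration of point 2, the selection order can be sketched as a two-key sort. The struct and field names below are hypothetical stand-ins for the precompiler's per-method records, not the actual HotSpot types:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a per-method record in the precompiler;
// field names are illustrative only.
struct MethodRecord {
  int hotness;    // e.g. invocation + backedge counts from training data
  int code_size;  // size of the compiled code
  int id;
};

// Sort hottest methods first; break ties by larger code size, so the
// most expensive-to-JIT methods are preloaded earliest.
static void sort_for_preload(std::vector<MethodRecord>& methods) {
  std::sort(methods.begin(), methods.end(),
            [](const MethodRecord& a, const MethodRecord& b) {
              if (a.hotness != b.hotness) return a.hotness > b.hotness;
              return a.code_size > b.code_size;
            });
}
```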

Additional testing:

  • Performance tests (see comments)
  • Linux x86_64 server fastdebug, runtime/cds

Progress

  • Change must not contain extraneous whitespace
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JDK-8368465: [leyden] Improve precompiler method selection code (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/leyden.git pull/99/head:pull/99
$ git checkout pull/99

Update a local copy of the PR:
$ git checkout pull/99
$ git pull https://git.openjdk.org/leyden.git pull/99/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 99

View PR using the GUI difftool:
$ git pr show -t 99

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/leyden/pull/99.diff

Using Webrev

Link to Webrev Comment

@shipilev (Member, Author) commented Sep 23, 2025

javac test (1000 iterations trained, 50 iterations production)

# --- Before

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:AOTCache=app.aot -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
  Time (mean ± σ):     338.2 ms ±   3.5 ms    [User: 742.4 ms, System: 120.6 ms]
  Range (min … max):   332.3 ms … 342.9 ms    10 runs
 
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:+UnlockExperimentalVMOptions -XX:+PreloadOnly -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):     497.2 ms ±   4.1 ms    [User: 491.6 ms, System: 55.5 ms]
  Range (min … max):   489.7 ms … 502.3 ms    10 runs
 

# --- After

Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -XX:AOTCache=app.aot -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar JavacBenchApp 50
  Time (mean ± σ):     322.8 ms ±   2.2 ms    [User: 511.0 ms, System: 101.3 ms]
  Range (min … max):   319.1 ms … 325.9 ms    10 runs
 
Benchmark 1: build/linux-x86_64-server-release/images/jdk/bin/java -Xms64m -Xmx1g -XX:+UseSerialGC -cp JavacBenchApp.jar -XX:+UnlockExperimentalVMOptions -XX:+PreloadOnly -XX:AOTCache=app.aot JavacBenchApp 50
  Time (mean ± σ):     483.0 ms ±   4.5 ms    [User: 476.9 ms, System: 55.4 ms]
  Range (min … max):   476.5 ms … 492.0 ms    10 runs

User time improves significantly; I think that is because we manage to preload the hottest code before C1 discovers it needs to compile it. Note how loading takes half of the workload time, and in "before" C1 was more active:

Before: (image: plot-javac50-before)

After: (image: plot-javac50-after)

bridgekeeper bot commented Sep 23, 2025

👋 Welcome back shade! A progress list of the required criteria for merging this PR into premain will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@shipilev (Member, Author) commented Sep 23, 2025

Larger benchmarks all improve with 1-core tests.

quarkus-getting-started:

Run,Old CDS + AOT,New CDS + AOT
1,235,190
2,234,198
3,224,189
4,227,185
5,224,199
6,226,192
7,234,193
8,219,181
9,218,196
10,214,197
Geomean,225.40,191.92 (1.17x improvement)
Stdev,6.87,5.57

helidon-quickstart-se

Run,Old CDS + AOT,New CDS + AOT
1,196,167
2,191,166
3,195,166
4,196,173
5,199,165
6,203,169
7,199,168
8,200,162
9,198,167
10,199,173
Geomean,197.58,167.57 (1.18x improvement)
Stdev,3.10,3.23

micronaut-first-app

Run,Old CDS + AOT,New CDS + AOT
1,239,239
2,247,246
3,258,237
4,262,229
5,250,234
6,233,233
7,249,236
8,256,239
9,250,233
10,259,241
Geomean,250.15,236.66 (1.06x improvement)

spring-boot-getting-started:

Run,Old CDS + AOT,New CDS + AOT
1,494,450
2,489,440
3,492,441
4,494,426
5,480,440
6,501,441
7,521,441
8,483,443
9,486,435
10,490,441
Geomean,492.88,439.76 (1.12x improvement)
Stdev,10.93,5.78

spring-petclinic:

Run,Old CDS + AOT,New CDS + AOT
1,2691,2407
2,2659,2451
3,2559,2439
4,2645,2433
5,2667,2475
6,2650,2447
7,2647,2454
8,2667,2439
9,2635,2440
10,2649,2446
Geomean,2646.69,2443.05 (1.08x improvement)
Stdev,32.84,16.34

openjdk bot commented Sep 23, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 23, 2025
mlbridge bot commented Sep 23, 2025

Webrevs

@shipilev (Member, Author) commented:

Ready for review, folks. The performance benefits of doing this are quite apparent.

@shipilev shipilev requested review from iwanowww, veresov and vnkozlov and removed request for iwanowww, veresov and vnkozlov October 17, 2025 07:05
@vnkozlov (Collaborator) left a comment

A few questions.

    if (mtd != nullptr) {
      MethodData* md = mtd->final_profile();
      if (md != nullptr) {
        count += md->backedge_count();
vnkozlov (Collaborator):

Hmm, this will put methods with hot loops up front.

shipilev (Member, Author):

Yes, this is intentional: this effectively puts the methods that are profitable to (pre)load first, so they: a) do not linger in the interpreter too long; b) do not trigger JIT compilation before AOT code is able to (pre)load. The methods with hot back-branches are those methods :)
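A minimal sketch of that metric, assuming a profile type with invocation and backedge counters (the struct here is an illustrative stand-in, not HotSpot's MethodData, beyond the backedge_count() usage shown in the excerpt above):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for a method's training-run profile.
struct FakeProfile {
  long invocation_count;
  long backedge_count;
};

// Hotness metric: methods with hot loops accumulate large backedge
// counts, so they sort ahead of straight-line code with similar
// invocation counts.
static long hotness(const FakeProfile* md) {
  long count = 0;
  if (md != nullptr) {
    count += md->invocation_count;
    count += md->backedge_count;  // dominates for loopy methods
  }
  return count;
}
```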


      default: fatal("%d", _search_level);
    }
    // Otherwise, break the tie by code size: largest methods go first.
vnkozlov (Collaborator):

What is the reason for larger methods going first? Can we use the compile ID here instead?

@shipilev (Member, Author) commented Oct 17, 2025:

So my logic was the same as with the hot methods. If we have lost the game of "preload the AOT code before JIT gets triggered", and JIT got triggered, we then want to prioritize larger methods, as they are more likely to take more time to JIT-compile. In other words, I think if you lost to the JIT timing-wise, you want to preempt the fattest JIT compiles first. But it is only a bet. If we ever record compilation time in nmethods/profiles, we could use that to break the tie.

vnkozlov (Collaborator):

I am not sure I understand how a different order of pre/AOT-compilation can affect performance of the production run. We bulk-load all "Preload" AOT code, so ordering should not matter for it, even if we load in the selected order. It is one thread which does the loading, and it is blocking. (I am actually playing with spreading this preload across all compiler threads; I did not see much effect on startup.)

The only explanation is that preload happens only when C2 compiler threads are initialized (Preload AOT code is C2-compiled code), and this happens simultaneously with C1 thread initialization, so C1 threads could become available for C1 compilation before we finish preloading, especially on machines with a small number of cores. I did observe that we start loading A1 and A2 code first (even normal C1 compilations) before we start preloading AP4.

Is that what you are trying to solve here?

The invocation counters should be roughly the same for methods without loops (10000 to trigger C2 compilation). They could be different if code was deoptimized and ran in the interpreter. The only difference is the backedge counter. So in this sense you push methods with hot loops up front, as we discussed in the other comment. This may affect performance, but it would depend on the application.

I agree with ordering by size (or time spent in compilation), but only for methods which do not have A1 or A2 code. Which should not be the case: if we have AP4, we will have A1 and A2 for it. I am still not convinced by this.

Maybe we should try to move AOTCodeCache::preload_code() just after SystemDictionary::compute_java_loaders(), because it does not depend on training data. Then we can have AP4 sooner. Mmapping directly into the CodeCache would also speed up preloading; it is on our to-do list.

@vnkozlov (Collaborator) commented:

I think we have a small performance "issue" in how we replace existing JITed code with new code, to which AOT code loading could be more sensitive. We deoptimize the old code before the new code is set under the NMethodState_lock:
https://github.com/openjdk/leyden/blob/premain/src/hotspot/share/ci/ciEnv.cpp#L1058

If the lock is held by another thread, we may deoptimize the previous code and go into the interpreter before the new code is set for use.

This is present in mainline, but with normal JIT compilation replacement it may not be noticeable.
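The window described above can be sketched as follows. This is a deliberately simplified, single-threaded model of the ordering; the type and function names are stand-ins, not HotSpot's nmethod machinery:

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of code replacement: the old entry is invalidated
// before the new one is published, leaving a window where callers fall
// back to the interpreter. In HotSpot the publication step happens
// under NMethodState_lock; this sketch only models the ordering.
struct Method {
  const void* code;  // current compiled entry point, or nullptr
};

static void replace_code(Method* m, const void* new_code) {
  m->code = nullptr;   // deoptimize: old code is invalidated first
  // <-- window: a call dispatched here runs in the interpreter
  m->code = new_code;  // publish the replacement
}
```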

@vnkozlov (Collaborator) commented:

Another suggestion for this concurrent preloading would be to split the AP4 preload code. One set is the current one, which needs to wait for compute_java_loaders(). A new one (much smaller) would be for simple methods of classes which are loaded first (String, for example), which we can preload much sooner.

bridgekeeper bot commented Nov 15, 2025

@shipilev This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
