feat: Add --test-mode for resilient bootstrap with failure handling #719

LalatenduMohanty · 2025-08-22T06:31:14Z

Add --test-mode flag that enables resilient bootstrapping by marking failed
packages as pre-built and continuing until all packages are processed. Uses
optimal n+1 retry logic with comprehensive failure reporting including exception
types, messages, and per-package context.

Major changes:
- Enhanced BuildResult dataclass with req, resolved_version, and exception
  tracking for detailed failure analysis
- Refactored pre_built_override from Settings to WorkContext for proper
  separation of static config vs runtime state
- Introduced public WorkContext.package_build_info() API, replacing direct
  Settings access across commands (bootstrap, build, graph, list-overrides)
- Fixed build-parallel command to use new public API
- Added 4 essential test scenarios in test_bootstrap_test_mode.py

Benefits:
- Discover all build failures in one run rather than stopping on first failure
- Support mixed source/binary dependency workflows
- Better error context for debugging failed builds
- Cleaner API boundaries between configuration and runtime context

Fixes #713

Co-developed-with: Cursor IDE with Claude 4.0 Sonnet

Command prompts: https://gist.github.com/LalatenduMohanty/762baf9999a09ef3d2d3e63220b9c52e

LalatenduMohanty · 2025-08-22T06:44:13Z

I have not added tests for this yet. I plan to add them after the initial round of reviews.

dhellmann

This implementation is going to be very slow, since it will re-process all of the packages that build successfully until it gets to the one that failed and is now marked as prebuilt.

It would be more efficient to do the check in the bootstrapper class at the point where a wheel is being built. If that build fails then we can treat the package as though it was prebuilt by running the logic to handle a prebuilt wheel, even though that wheel is not marked as prebuilt.

You could extract the logic from https://github.com/python-wheel-build/fromager/blob/main/src/fromager/bootstrapper.py#L188-L295 into its own function to make some of that logic easier to deal with. A closure inside the existing function might be easier, since that code uses a lot of the variables from elsewhere in the existing function.
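A minimal sketch of the approach suggested above, where the fallback happens at the point the build fails rather than by restarting the whole run. All names here (`build_or_fallback`, `fake_build`, `fake_prebuilt`) are hypothetical stand-ins and are not fromager APIs.

```python
from typing import Callable


def build_or_fallback(
    name: str,
    build: Callable[[str], str],
    handle_prebuilt: Callable[[str], str],
    test_mode: bool,
    failures: list[tuple[str, Exception]],
) -> str:
    """Try to build `name`; in test mode, fall back to the prebuilt path on failure."""
    try:
        return build(name)
    except Exception as err:
        if not test_mode:
            # Normal mode keeps the existing behavior: stop on the first failure.
            raise
        # Test mode: remember the failure and treat the package as prebuilt.
        failures.append((name, err))
        return handle_prebuilt(name)


# Hypothetical stand-ins for the real build and prebuilt-wheel handlers.
def fake_build(name: str) -> str:
    if name == "broken":
        raise RuntimeError("compile error")
    return f"built:{name}"


def fake_prebuilt(name: str) -> str:
    return f"prebuilt:{name}"


failures: list[tuple[str, Exception]] = []
results = [
    build_or_fallback(n, fake_build, fake_prebuilt, True, failures)
    for n in ["ok", "broken"]
]
```

With this shape, packages that already built successfully are never re-processed, because the fallback is applied to the one failing package in place.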

tiran

I'm sorry for the blunt feedback, but I really don't like this feature. It's making one of the most complex parts of Fromager even more complicated. I spent half an hour with the new code and I still don't understand all its nooks and crannies. That worries me. I found at least one fundamental design flaw with bootstrap-parallel and failing wheel builds.

What use case do you want to solve? Make onboarding of new components easier for a developer? Can we implement simpler features to improve the UX of onboarding components?

- Add a global command line flag --prebuilt-packages to mark packages as pre-built without creating a config file.
- Add a --keep-going flag to the build, build-parallel, and bootstrap-parallel commands. When the flag is set, an exception from wheels.build_wheel no longer stops the build and Fromager continues building packages. At the end, print which packages failed to build.
- Add a new command that takes a constraints file and a requirements file, resolves all install dependencies, and tells the user which packages have neither an sdist on PyPI nor settings/hooks to fetch an sdist from somewhere else. Thanks to PEP 714 core metadata, this check should be fast.

src/fromager/bootstrapper.py

tiran · 2025-08-27T08:55:11Z

src/fromager/commands/bootstrap.py

        cache_wheel_server_url=cache_wheel_server_url,
        sdist_only=True,
        skip_constraints=skip_constraints,
+        test_mode=test_mode,


bootstrap-parallel runs bootstrap in sdist-only mode. That means it's not compiling any wheels except for build system requirements. The test-mode flag does not affect the wheel building for most packages.

Right. If you want to test actual wheel compilation failures, you need to run fromager bootstrap --test-mode package1 package2.
However, it is still useful for bootstrap-parallel to identify issues with the source distribution, e.g. the source of a package not being available, or any other issue with dependency resolution.

LalatenduMohanty · 2025-09-15T16:14:24Z

As discussed with @tiran last week, I will create a design doc and get it reviewed first then will update this PR. cc @dhellmann

LalatenduMohanty · 2025-09-19T20:30:13Z

@dhellmann @tiran Here is the design document for the current implementation : doc

dhellmann · 2025-11-01T13:51:50Z

> @dhellmann @tiran Here is the design document for the current implementation : doc

That design seems more like what I was expecting. It isolates the check for test mode in the bootstrapper where it should be.

To handle the parallel case, we could say this feature is not supported, to avoid having to change the build code in a similar way. Downstream we can control the mode when we want to run a test.

LalatenduMohanty · 2025-11-03T21:36:13Z

> To handle the parallel case, we could say this feature is not supported, to avoid having to change the build code in a similar way. Downstream we can control the mode when we want to run a test.

I am not clear on what we want to do for the build-parallel use case.

LalatenduMohanty · 2025-11-03T21:50:35Z

Updated the design doc to state that test mode is only supported with bootstrap (serial mode), not bootstrap-parallel. This ensures comprehensive failure detection, including wheel compilation failures. Most packages will use cached wheels, so serial mode performance is acceptable for testing.

tiran · 2025-11-05T15:56:55Z

src/fromager/bootstrapper.py


+
+@dataclasses.dataclass
+class BuildResult:


I like the use of a data class here. How about you include the requirement, resolved version, and exception for failure?

Sounds good to me. I see #837

I have accepted the suggestion and fixed the code.
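A minimal sketch of what a result dataclass carrying the requirement, resolved version, and exception might look like. Only the class name, the `failed` attribute, and the `failure(req=..., exception=...)` constructor are mentioned in this thread; the field types (plain strings here), the `success` constructor, and the `frozen=True` choice are assumptions made to keep the example self-contained.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class BuildResult:
    # Types are simplified to str for this sketch; the real code likely uses
    # Requirement and Version objects.
    req: str
    resolved_version: Optional[str] = None
    exception: Optional[Exception] = None

    @property
    def failed(self) -> bool:
        # A result counts as failed when an exception was recorded.
        return self.exception is not None

    @classmethod
    def success(cls, req: str, resolved_version: str) -> "BuildResult":
        # Hypothetical helper, not named in the thread.
        return cls(req=req, resolved_version=resolved_version)

    @classmethod
    def failure(
        cls,
        req: str,
        exception: Exception,
        resolved_version: Optional[str] = None,
    ) -> "BuildResult":
        return cls(req=req, resolved_version=resolved_version, exception=exception)


ok = BuildResult.success("requests", "2.32.0")
bad = BuildResult.failure(req="pytest-asyncio", exception=RuntimeError("boom"))
```

Making `resolved_version` optional lets the same type describe failures that happen before a version is resolved.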

tiran · 2025-11-05T16:00:14Z

src/fromager/bootstrapper.py

+
+    def _find_cached_wheel(
+        self, req: Requirement, resolved_version: Version
+    ) -> tuple[pathlib.Path | None, pathlib.Path | None]:


Are all four combinations possible or just these two?

Suggested change

) -> tuple[pathlib.Path | None, pathlib.Path | None]:

) -> tuple[pathlib.Path, pathlib.Path] | tuple[None, None]:

Looks like these returns are possible:
- (Path, Path): Wheel found with metadata successfully extracted
- (Path, None): Wheel found but metadata extraction failed
- (None, None): No matching wheel found in any location

So current code looks right to me

I have not changed the code for this. PTAL at the code and let me know if you disagree.
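To illustrate why the wider return type is needed, here is a small sketch that produces all three return shapes listed above. The function body is an assumption and not the code under review: it uses a hypothetical sidecar `.metadata` file as a stand-in for "metadata successfully extracted".

```python
import pathlib
import tempfile
from typing import Optional, Tuple


def find_cached_wheel(
    wheel_dir: pathlib.Path, prefix: str
) -> Tuple[Optional[pathlib.Path], Optional[pathlib.Path]]:
    """Return (wheel, metadata), (wheel, None), or (None, None)."""
    for wheel in sorted(wheel_dir.glob(f"{prefix}-*.whl")):
        metadata = wheel.with_suffix(".metadata")
        if metadata.is_file():
            return wheel, metadata  # wheel found, metadata available
        return wheel, None  # wheel found, metadata missing
    return None, None  # no matching wheel


with tempfile.TemporaryDirectory() as tmp:
    d = pathlib.Path(tmp)
    (d / "a-1.0-py3-none-any.whl").touch()
    (d / "a-1.0-py3-none-any.metadata").touch()
    (d / "b-2.0-py3-none-any.whl").touch()
    both = find_cached_wheel(d, "a")
    wheel_only = find_cached_wheel(d, "b")
    neither = find_cached_wheel(d, "c")
```

The `(Path, None)` case is the one that `tuple[pathlib.Path, pathlib.Path] | tuple[None, None]` would not admit.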

src/fromager/packagesettings.py

tiran · 2025-11-05T16:04:28Z

src/fromager/packagesettings.py

        self._patches_dir = patches_dir
        self._max_jobs = max_jobs
        self._pbi_cache: dict[Package, PackageBuildInfo] = {}
+        self.pre_built_override: set[NormalizedName] = set()


I think it would make more sense to have _pre_built_override on the WorkContext object and then have a small API to add / get pre_built overrides. WDYT?

I agree. Fixing it.
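A minimal sketch of the kind of small API proposed here, keeping the runtime override set private to the context object. `add_pre_built_override` appears later in this PR; `is_pre_built_override` and the local `normalize` helper (a PEP 503 style normalization written out by hand so the example has no third-party imports) are assumptions for illustration.

```python
import re


def normalize(name: str) -> str:
    """PEP 503 style normalization: collapse runs of -, _, . and lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()


class WorkContext:
    def __init__(self) -> None:
        # Runtime state, kept separate from static settings.
        self._pre_built_override: set[str] = set()

    def add_pre_built_override(self, package_name: str) -> None:
        self._pre_built_override.add(normalize(package_name))

    def is_pre_built_override(self, package_name: str) -> bool:
        # Hypothetical getter; the thread only asks for "a small API to add / get".
        return normalize(package_name) in self._pre_built_override


ctx = WorkContext()
ctx.add_pre_built_override("Foo_Bar")
```

Normalizing on both the write and the read path means callers do not need to agree on the spelling of a package name.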

LalatenduMohanty · 2025-11-12T21:33:00Z

If multiple packages fail and are added as prebuilt, the output looks like this (simulated):

ERROR test mode: the following packages failed to build:
ERROR   - package-a==1.0.0
ERROR     Error: CalledProcessError: Command '...' returned non-zero exit status 1
ERROR   - package-b==2.1.0
ERROR     Error: RuntimeError: Missing build dependency gcc
ERROR   - package-c==0.5.0
ERROR     Error: CalledProcessError: setup.py failed
ERROR   - package-d==3.2.1
ERROR     Error: FileNotFoundError: No such file or directory: 'cargo'
ERROR   - package-e==1.5.0
ERROR     Error: RuntimeError: Wheel compilation failed

ERROR test mode: failure breakdown by type:
ERROR   CalledProcessError: 2 package(s)
ERROR   FileNotFoundError: 1 package(s)
ERROR   RuntimeError: 2 package(s)

ERROR test mode: 5 package(s) failed to build
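A report shaped like the simulated output above could be produced along these lines. This is a sketch only: `failure_report` and its input format (a list of package/exception pairs) are hypothetical, and the real implementation may build the report differently.

```python
import collections


def failure_report(failures: list[tuple[str, Exception]]) -> list[str]:
    """Build the per-package list, the per-type breakdown, and the total."""
    lines = ["test mode: the following packages failed to build:"]
    for pkg, err in failures:
        lines.append(f"  - {pkg}")
        lines.append(f"    Error: {type(err).__name__}: {err}")

    # Count failures by exception class name, reported in alphabetical order.
    counts = collections.Counter(type(err).__name__ for _, err in failures)
    lines.append("test mode: failure breakdown by type:")
    for exc_name, count in sorted(counts.items()):
        lines.append(f"  {exc_name}: {count} package(s)")

    lines.append(f"test mode: {len(failures)} package(s) failed to build")
    return lines


report = failure_report(
    [
        ("package-b==2.1.0", RuntimeError("Missing build dependency gcc")),
        ("package-d==3.2.1", FileNotFoundError("No such file or directory: 'cargo'")),
        ("package-e==1.5.0", RuntimeError("Wheel compilation failed")),
    ]
)
```

Each line could then be passed to `logger.error` so the summary appears at the ERROR level as in the example.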

LalatenduMohanty · 2025-11-12T22:40:40Z

Here is the actual test and its output

$ cat requirements.txt 
requests
pytest-asyncio
urllib3
beautifulsoup4

$ cat constraints.txt 
requests>=2.25.0
urllib3==2.2.3
beautifulsoup4>=4.9.0
certifi==2024.8.30
charset-normalizer==3.3.0
pytest-asyncio==1.1.0
setuptools==80.9
setuptools-scm==9.2

Command:

```
$ fromager -c constraints.txt bootstrap --test-mode -r requirements.txt
```

Result:

```
17:36:58 ERROR test mode: the following packages failed to build:
17:36:58 ERROR   - pytest-asyncio==1.1.0
17:36:58 ERROR     Error: CalledProcessError: Command '['/home/lmohanty/code/github.com/python-wheel-build/fromager/src/fromager/run_network_isolation.sh', '/home/lmohanty/code/github.com/python-wheel-build/fromager/work-dir/pytest_asyncio-1.1.0/build-3.14.0/bin/python3', '/home/lmohanty/.local/share/hatch/env/virtual/fromager/IHvMNJ7t/fromager/lib/python3.14/site-packages/pyproject_hooks/_in_process/_in_process.py', 'get_requires_for_build_wheel', '/tmp/tmpknvbag5e']' returned non-zero exit status 1.
17:36:58 ERROR 
17:36:58 ERROR test mode: failure breakdown by type:
17:36:58 ERROR   CalledProcessError: 1 package(s)
17:36:58 ERROR test mode: 1 package(
```

src/fromager/bootstrapper.py

dhellmann · 2025-11-12T22:45:58Z

src/fromager/bootstrapper.py

-                "get install dependencies of wheel %s",
-                wheel_filename.name,
+        # Get install dependencies - much simpler logic
+        if result.failed:


This should check for test-mode, right?

The value of result.failed is true only in test mode. However, your concern is valid: we can make the code more defensive against future errors by changing the check to if self.test_mode and result.failed:

dhellmann · 2025-11-12T22:47:30Z

src/fromager/bootstrapper.py

+            )
+
+            try:
+                self._mark_package_as_pre_built_runtime(req)


Instead of changing state and going through self._build_wheel_and_sdist() again, could we just invoke self._download_prebuilt() directly?

This is a good catch. The previous code was not efficient. Fixed it.

dhellmann · 2025-11-12T22:54:01Z

src/fromager/commands/bootstrap.py

+            except Exception as err:
+                if test_mode:
+                    # Test mode: log error but continue processing
+                    logger.error(


I think this is where we end up if bootstrap() fails to resolve a version, but that error isn't saved in the list of errors to be reported at the end of the program and cause it to exit with an error.

That's right, there is an else branch for normal mode.

I wasn't clear.

I want resolution errors to be included with all of the others. In test mode, the bootstrapper should build everything it can, without stopping. It should collect all errors of any kind and save them to be reported when the program exits.

Makes sense, thanks for pointing this out. Fixed it by adding the following:

+                    bt.failed_builds.append(
+                        bootstrapper.BuildResult.failure(req=req, exception=err)
+                    )
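The behavior requested in this thread, where resolution errors land in the same list as build errors and the program still exits non-zero, might be sketched as follows. `bootstrap_all`, `fake_bootstrap`, and `exit_code` are hypothetical names used for illustration and do not reflect the actual command code.

```python
def bootstrap_all(requirements, bootstrap_one, test_mode):
    """Process every requirement; in test mode collect all errors instead of stopping."""
    failed = []
    for req in requirements:
        try:
            bootstrap_one(req)
        except Exception as err:
            if not test_mode:
                raise
            # Resolution and build errors are recorded the same way.
            failed.append((req, err))
    return failed


def exit_code(failed) -> int:
    # Any recorded failure, of any kind, makes the run unsuccessful.
    return 1 if failed else 0


# Hypothetical bootstrap step that fails in two different ways.
def fake_bootstrap(req: str) -> None:
    if req == "unresolvable":
        raise LookupError("no matching version")
    if req == "broken":
        raise RuntimeError("wheel build failed")


failed = bootstrap_all(["ok", "unresolvable", "broken"], fake_bootstrap, True)
status = exit_code(failed)
```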

dhellmann · 2025-11-14T18:18:55Z

src/fromager/context.py

    def enable_parallel_builds(self) -> None:
        self._parallel_builds = True

+    def add_pre_built_override(self, package_name: str | NormalizedName) -> None:


With the change in the logic in the bootstrapper to invoke the logic for prebuilt packages directly, I think you can remove this function.

Sorry, I misunderstood. I removed my previous comments. Going to fix it.

LalatenduMohanty requested a review from a team as a code owner August 22, 2025 06:31

LalatenduMohanty force-pushed the bootstrap-testmode branch from 09630a6 to 919ec13 Compare August 22, 2025 06:39

LalatenduMohanty marked this pull request as draft August 22, 2025 06:43

dhellmann requested changes Aug 25, 2025

LalatenduMohanty force-pushed the bootstrap-testmode branch 4 times, most recently from 714b3f4 to 4a343fa Compare August 26, 2025 20:36

tiran requested changes Aug 27, 2025

LalatenduMohanty force-pushed the bootstrap-testmode branch 2 times, most recently from 0cb15a6 to 54dfc06 Compare September 19, 2025 20:13

LalatenduMohanty mentioned this pull request Sep 19, 2025

bootstrap test mode #713

Open

LalatenduMohanty force-pushed the bootstrap-testmode branch from 54dfc06 to 877f2c5 Compare September 19, 2025 20:30

LalatenduMohanty force-pushed the bootstrap-testmode branch from 877f2c5 to 23eb2a6 Compare November 3, 2025 21:14

LalatenduMohanty marked this pull request as ready for review November 3, 2025 21:35

LalatenduMohanty force-pushed the bootstrap-testmode branch from 23eb2a6 to 7ecef97 Compare November 3, 2025 21:48

mergify bot added the ci label Nov 3, 2025

LalatenduMohanty marked this pull request as draft November 3, 2025 21:50

tiran reviewed Nov 5, 2025

LalatenduMohanty force-pushed the bootstrap-testmode branch 5 times, most recently from 92a3dbb to cb38aef Compare November 12, 2025 00:42

LalatenduMohanty marked this pull request as ready for review November 12, 2025 00:44

LalatenduMohanty force-pushed the bootstrap-testmode branch from cb38aef to 409554d Compare November 12, 2025 00:50

dhellmann reviewed Nov 12, 2025

LalatenduMohanty force-pushed the bootstrap-testmode branch 2 times, most recently from dc5ee5a to f32aa3d Compare November 13, 2025 02:54

dhellmann reviewed Nov 14, 2025

LalatenduMohanty force-pushed the bootstrap-testmode branch 3 times, most recently from 6539a5a to ca3e0da Compare November 14, 2025 23:14

LalatenduMohanty force-pushed the bootstrap-testmode branch from ca3e0da to 1bf3690 Compare November 14, 2025 23:34
