Skip to content

Conversation

@adityamaru
Copy link

Summary

  • Add 30-second timeout to buildxHistory.export() operation
  • Prevent workflow jobs from hanging when buildx dial-stdio crashes
  • Continue with build reporting even if export fails

Problem

Customer reported workflow jobs timing out with "broken pipe" errors when the buildx backend process becomes unresponsive during the export phase.

Solution

Implemented a timeout wrapper using Promise.race() that:

  • Times out after 30 seconds if the export operation hangs
  • Logs a warning but continues with build reporting
  • Gracefully handles export failures without blocking the entire build

Test plan

  • Build completes successfully
  • Linting passes
  • TypeScript compilation succeeds
  • Test with actual failing buildx export scenario

🤖 Generated with Claude Code

adityamaru and others added 30 commits September 11, 2024 20:08
1. Checks we have buildx installed
2. Configures a remote builder if we get an address back
3. Uses the already configured builder if we don't get an address back

This change does not plumb the dockerfile path through as the entity,
and does not differentiate a failed build from a succesful to report
to anvil in the post step yet.
*: basic scaffolding for build-push-action
* tls

* set up tls while creating the remote builder
This change teaches the build push action to request a stickydisk
every time it runs. Once the SD is hotloaded the VM will mount
the buildkit root dir and starts buildkitd.
adityamaru and others added 28 commits May 18, 2025 16:29
src: add a retry with backoff to combat 429s when downloading buildkit
*: allow users to pass in a buildx version
The remote builder was hardcoded to use --platform linux/amd64
regardless of user input or runner architecture. This caused
performance issues on ARM runners and cache inefficiencies.

Now properly uses the platforms input or detects host architecture
to avoid unnecessary QEMU emulation and improve build performance.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
The test was hardcoded to expect arm64 platform, causing failures
on AMD runners. Now checks actual host architecture dynamically.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
fix: use correct platform when creating remote buildx builder
src: use BLACKSMITH prefixed VM ID env var
src: only prune if buildkitd was spun up
- Remove buildkitd startup and configuration logic
- Remove buildkitd shutdown and cleanup from both main and post actions
- Remove buildkitd-related imports and helper functions
- Update startBlacksmithBuilder to check for existing builder from setup-docker-builder
- Keep sticky disk setup and build reporting functionality intact

BREAKING CHANGE: This action now requires setup-docker-builder to be run first to manage the Docker builder lifecycle
- Remove sticky disk mounting and unmounting logic
- Remove sticky disk commit logic from both main and post actions
- Replace setupStickyDisk with reportBuildStart to only report build start
- Update build completion reporting to not depend on exposeId
- Keep build tracking and reporting functionality intact

The sticky disk lifecycle is now fully managed by setup-docker-builder
This commit completes the refactoring of build-push-action to focus solely on
Docker build reporting and metrics, with all infrastructure management moved
to the separate setup-docker-builder action.

Changes:
- Remove all setupOnly references from context.ts, main.ts, and state-helper.ts
- Rename startBlacksmithBuilder to reportBuildMetrics to better reflect its purpose
- Remove exposeId from all function signatures and state management
- Remove sticky disk commit logic from reporter.ts
- Update tests to match new function names and signatures
- Clean up unused imports and fix linting issues

The action now assumes that a Docker builder has already been configured
(either via setup-docker-builder or existing setup) and focuses only on:
- Running Docker builds with the configured builder
- Reporting build metrics and status to Blacksmith API
- Managing build outputs and metadata

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
This commit completes the cleanup of build-push-action by removing all
the buildkit and sticky disk setup logic that has been moved to the
separate setup-docker-builder action.

Changes:
- Delete setup_builder.ts which contained 380+ lines of unused code
- Create new build-reporter.ts with only the reportBuildStart function
- Update all imports to use the new build-reporter module
- The new file name better reflects its single responsibility

The action is now cleaner and more focused, with infrastructure setup
logic properly separated into the setup-docker-builder action.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Inline the reportBuildStart function directly into main.ts since it was
just a thin wrapper around reporter.reportBuild. This removes an
unnecessary abstraction layer and makes the code simpler.

Changes:
- Delete build-reporter.ts file
- Inline the reportBuild logic directly in reportBuildMetrics function
- Update tests to mock reporter.reportBuild directly
- Fix test expectations to match the new error messages

The code is now cleaner with one less file and abstraction layer.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Since setup-docker-builder handles all infrastructure setup, build-push-action
no longer needs to:
- Install buildx (just assert it's available)
- Manage temporary directories (handled by actions toolkit)

Changes:
- Replace buildx installation with simple availability assertion
- Remove tmpDir state management entirely
- Remove buildx-version input parameter
- Clean up unused imports and functions

The action now assumes buildx is already configured by setup-docker-builder
or another setup action, making it simpler and more focused.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Remove the nofallback input parameter as it's no longer needed. The action
now assumes that a builder is already configured and available.

Changes:
- Remove nofallback from action.yml inputs
- Remove nofallback from Inputs interface and getInputs function
- Update tests to remove nofallback references
- Also remove setup-only and buildx-version from action.yml (already removed from code)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Fix axios-retry type compatibility issues in reporter.ts
- Remove unused imports (Context, Metric_MetricType)
- Update test expectations to match current implementation
- Fix ESLint errors and apply formatting

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Rename reportBuildMetrics to reportBuildStart for clarity
- Check if builder name contains 'blacksmith' before reporting
- Log warning when not using a Blacksmith builder
- Skip build start and completion reporting for non-Blacksmith builders
- Update tests to reflect function rename

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Move builder variable declaration to outer scope
- Use optional chaining for builder.name in buildRef call
- Fixes runtime error where builder was undefined outside its declaration scope

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- The @ts-expect-error directive was causing CI failures
- npm dependencies don't have the type mismatch that pnpm dependencies had
- Verified all tests and builds pass with npm commands matching CI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- v2 is a breaking version requiring setup-docker-builder to be called first
- Commenting out workflow to prevent accidentally bumping v1 tags to breaking changes
- Workflow kept as reference for future tag bumping procedures

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…anagement

refactor: split build-push-action to separate infrastructure setup
Add a 30-second timeout to the buildxHistory.export() call to prevent
workflow jobs from hanging indefinitely when the buildx dial-stdio
process crashes or becomes unresponsive. The build will continue
with reporting even if the export fails or times out.

Fixes issue where docker run export-build command would hang with
"broken pipe" errors when buildx backend is unavailable.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants