@Alex-Welsh
Member

Previously, `terraform apply` simply ran up to 5 times, waiting 60 seconds after each failure.
All AIOs are triggered at the same time, so they might conflict with each other.

Now, it runs up to 6 times. After 3 failed attempts, it waits for 2 hours (the cloud is probably at capacity, so this gives other jobs time to finish).
Normal waits are randomised between 1 and 3 minutes.
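
For readers who want the retry schedule spelled out, here is a minimal Python sketch of the behaviour described above. It is not the code from this PR: the function names, the injected `apply` and `sleep` callables, and the reading that the 2 hour wait happens once, right after the third failed attempt, are all assumptions made for illustration.

```python
import random
from typing import Callable, Optional

MAX_ATTEMPTS = 6                   # total attempts, per the description
LONG_WAIT_AFTER = 3                # assumed: long wait follows the 3rd failure only
LONG_WAIT_SECONDS = 2 * 60 * 60    # 2 hours
SHORT_WAIT_RANGE = (60, 180)       # randomised normal wait, 1 to 3 minutes


def wait_before_retry(failed_attempts: int, rng: random.Random) -> int:
    """Return how many seconds to wait after the given number of failures."""
    if failed_attempts == LONG_WAIT_AFTER:
        return LONG_WAIT_SECONDS
    return rng.randint(*SHORT_WAIT_RANGE)


def apply_with_retries(
    apply: Callable[[], bool],
    sleep: Callable[[float], None],
    rng: Optional[random.Random] = None,
) -> bool:
    """Call apply() until it succeeds or MAX_ATTEMPTS is reached.

    apply is a hypothetical stand-in for running `terraform apply`;
    it should return True on success. sleep is injected so the
    schedule can be checked without actually waiting.
    """
    rng = rng or random.Random()
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if apply():
            return True
        if attempt < MAX_ATTEMPTS:  # no point waiting after the last attempt
            sleep(wait_before_retry(attempt, rng))
    return False
```

In real use, `sleep` would be `time.sleep` and `apply` would wrap the actual command. The random short wait is what keeps AIO jobs that start together from retrying in lockstep.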

@Alex-Welsh Alex-Welsh requested a review from a team as a code owner October 14, 2025 10:45
@gemini-code-assist

Note

Gemini is unable to generate a review for this pull request due to the file types involved not being currently supported.

@Alex-Welsh
Member Author

The one time I want to test a CI failure, every AIO works perfectly first time 🙃

Member

@priteau priteau left a comment

The jump from 60-180 seconds to 7200 seconds is huge (an exponential backoff is more conventional), but let's see how it works. We can keep tuning it.
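
As a point of comparison, the exponential backoff mentioned in this comment could look roughly like the Python sketch below. This is not what the PR implements; the 60 second base, the 7200 second cap, the function names, and the choice of "full jitter" (a uniform draw between zero and the ceiling) are illustrative assumptions.

```python
import random
from typing import Optional


def backoff_ceiling(failed_attempts: int, base: float = 60.0, cap: float = 7200.0) -> float:
    """Upper bound on the wait: base doubles per failure, limited by cap."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    return min(cap, base * 2 ** (failed_attempts - 1))


def backoff_delay(
    failed_attempts: int,
    base: float = 60.0,
    cap: float = 7200.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Pick a random wait between 0 and the ceiling ("full jitter")."""
    rng = rng or random.Random()
    return rng.uniform(0, backoff_ceiling(failed_attempts, base, cap))
```

With these assumed values the ceilings grow as 60, 120, 240, 480 and 960 seconds over the first five failures, so the wait ramps up gradually instead of jumping straight to 2 hours.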

@Alex-Welsh Alex-Welsh merged commit b57e27d into stackhpc/2025.1 Oct 23, 2025
52 of 56 checks passed
@Alex-Welsh Alex-Welsh deleted the better-aio-retries branch October 23, 2025 16:05
4 participants