Merged
.github/workflows/stackhpc-all-in-one.yml: 12 changes (10 additions, 2 deletions)
@@ -190,15 +190,23 @@ jobs:
 - name: Terraform Apply
   id: tf_apply
   run: |
-    for attempt in $(seq 5); do
+    # Try up to 6 times to create the infrastructure, destroying and retrying if it fails.
+    # If it fails 3 times, wait 2 hours before trying again.
+    # The cloud is likely just at capacity, so wait until other jobs finish.
+    for attempt in $(seq 6); do
       if terraform apply -auto-approve; then
         echo "Created infrastructure on attempt $attempt"
         exit 0
       fi
       echo "Failed to create infrastructure on attempt $attempt"
       sleep 10
       terraform destroy -auto-approve
-      sleep 60
+      if [ "$attempt" -eq 3 ]; then
+        echo "Sleeping for 2 hours after 3 failed attempts..."
+        sleep 7200
+      else
+        sleep $(shuf -i 60-180 -n 1)
+      fi
     done
     echo "Failed to create infrastructure after $attempt attempts"
     exit 1
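The retry policy in this step can be sketched as plain shell functions so it can be exercised without real infrastructure or real multi-hour sleeps. The policy is six attempts. After each failure the step pauses 10 s and runs terraform destroy, then waits 2 hours after the third failure and a random 60-180 s otherwise. This is a hypothetical refactor, not what the PR merged: the names create_infra, destroy_infra, wait_seconds, retry_delay and provision_with_retries are illustrative assumptions. The side-effecting calls live in small functions so that a test can redefine them.

```shell
#!/usr/bin/env bash
# Hypothetical refactor of the "Terraform Apply" retry loop (not the PR's code).
# Side effects are isolated in small functions so tests can override them.

create_infra()  { terraform apply -auto-approve; }
destroy_infra() { terraform destroy -auto-approve; }
wait_seconds()  { sleep "$1"; }

# Print how many seconds to wait after failed attempt number $1:
# 2 hours after the 3rd failure, otherwise a random 60-180 s (GNU shuf).
retry_delay() {
  if [ "$1" -eq 3 ]; then
    echo 7200
  else
    shuf -i 60-180 -n 1
  fi
}

# Try create_infra up to $1 times (default 6). After each failure:
# pause 10 s, tear down, then wait retry_delay seconds.
# Returns 0 on success and 1 if every attempt failed.
provision_with_retries() {
  local max_attempts=${1:-6}
  local attempt
  for attempt in $(seq "$max_attempts"); do
    if create_infra; then
      echo "Created infrastructure on attempt $attempt"
      return 0
    fi
    echo "Failed to create infrastructure on attempt $attempt"
    wait_seconds 10
    destroy_infra
    wait_seconds "$(retry_delay "$attempt")"
  done
  echo "Failed to create infrastructure after $max_attempts attempts"
  return 1
}
```

With these definitions sourced, a run: body could end with provision_with_retries || exit 1. Across six failed attempts the loop sleeps 6 × 10 s, plus 7200 s, plus five random waits of 60-180 s. That totals 7,560 to 8,160 s (about 2.1 to 2.3 hours) before Terraform's own run time is counted. As in the merged step, the final failed attempt also waits before the loop gives up.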