Persistent storage #871
Replies: 3 comments
-
I am trying to build a Workflow that, unfortunately, is not well served by the cache construct. There are at least two significant problems with the way caches are implemented:

GitHub caches are a reasonable solution if (1) your Workflow regularly needs most or all of the cached content, and (2) the thing you are caching remains stable over time. That is a fairly narrow use case, and if you're doing something not ideally suited to it, things get ugly quickly; caching is not a general or flexible mechanism that lends itself to a wide variety of uses. A persistent storage volume, as @cynicaljoy described, is exactly what I want. Conceptually, it serves the same need as volumes in a Docker container. Not only would that let me implement caching and persistent state in a manner that suits my application, it would also be flexible enough to support the immutable-caching scenario (just offer an API call to toggle read-only status) for the occasions when that makes sense. (Frankly, I don't understand why this approach wasn't implemented to begin with. The current way of doing things is so much more complex and harder to understand than "just" mounting a volume that there must be some non-obvious reasoning behind it.)
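For context, the cache construct being criticized here archives the listed paths under an exact key and restores them on later runs; a typical invocation (the path and key names are illustrative) looks something like this:

```yaml
# actions/cache saves/restores an archive keyed by a string;
# a key miss falls back to the restore-keys prefixes or a cold start.
- uses: actions/cache@v4
  with:
    path: ~/.cache/my-app                         # illustrative path
    key: my-app-${{ hashFiles('**/lockfile') }}   # illustrative key
    restore-keys: |
      my-app-
```

The whole archive is uploaded and downloaded around each run, which is what makes it awkward for workloads that only touch part of the content or whose content changes constantly.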
-
Hi, I built ZeroFS and just made a GitHub Action for it that provides persistent volumes: https://github.com/marketplace/actions/zerofs-volume

```yaml
- uses: Barre/zerofs@v1
  with:
    object-store-url: 's3://bucket/path'
    encryption-password: ${{ secrets.ZEROFS_PASSWORD }}
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    mount-path: '/mnt/persistent'
```

It mounts S3 (or any S3-compatible storage) as a regular filesystem via NFS. Your workflows just read/write files normally and the data persists between runs. Main differences from cache/artifacts: there are no explicit upload/download steps, and the volume is a live, writable filesystem rather than an archived snapshot that has to be saved and restored.
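Assuming the mount above succeeds, later steps can treat /mnt/persistent as an ordinary directory; a minimal illustration (the log file name is made up):

```yaml
# Writes land in the S3-backed filesystem, not the runner's disk,
# so they are still there on the next workflow run.
- run: date >> /mnt/persistent/run-history.log
- run: cat /mnt/persistent/run-history.log
```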
-
The GitHub-hosted runner can run in any region by default; from what I see, the free runners are in US regions. To see the region a job is running in, you can make an HTTP query to the instance metadata service. Mounting your own storage may add latency if the GitHub runner happens to be in a different region than the storage.

There is a feature request to specify the region for GitHub-hosted runners, but there is already a way to do it: under a paid GitHub subscription I set up a larger runner with an Azure Virtual Network (VNET), and the region of that VNET determines the region of the GitHub-hosted runner. To avoid networking costs and keep delays minimal, I set up a file share in the same region. For authentication I created a managed identity in Azure with a federated credential for the GitHub repository, branch, or environment, and the workflow logs in to that managed identity. For cleanup I added a step at the end of the job.
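A rough sketch of how those pieces can fit together in a workflow. The region query uses the Azure Instance Metadata Service, which is available on Azure-backed runners; the login shown is the azure/login action with OIDC federation, a common way to sign in to a federated managed identity (the comment does not say which mechanism it used); the runner label, mount, and cleanup commands are placeholders:

```yaml
on: workflow_dispatch

permissions:
  id-token: write   # required for OIDC federation to the managed identity
  contents: read

jobs:
  build:
    runs-on: my-larger-runner   # placeholder: a larger runner attached to the VNET
    steps:
      # Ask the Azure Instance Metadata Service which region this VM is in.
      - name: Show runner region
        run: |
          curl -s -H "Metadata: true" \
            "http://169.254.169.254/metadata/instance/compute/location?api-version=2021-02-01&format=text"

      # Log in to the federated managed identity (one common approach).
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      # ... mount the file share in the same region and run the actual job ...

      # Cleanup runs even if an earlier step failed.
      - name: Cleanup
        if: always()
        run: echo "unmount the share / tidy up here"   # placeholder
```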
-
Rather than cache/artifact assets needing to be constantly uploaded and downloaded, wouldn't it be better to have a persistent storage volume, owned by the repo owner, that gets mounted on every execution?
Whether anything is stored on the volume would be entirely up to the workflow. The volume's usage would be accounted for as part of the "Storage for Actions and Packages" within our account billing.
This should result in less complexity in the Workflow definition file as well as faster Workflow execution times.
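Purely to illustrate the request (this is hypothetical syntax, not an existing GitHub Actions feature), such a volume might be declared once and mounted implicitly on every run:

```yaml
# Hypothetical syntax for the requested feature -- not real Actions YAML.
jobs:
  build:
    runs-on: ubuntu-latest
    volumes:
      - name: build-state        # hypothetical repo-owned volume
        mount-path: /mnt/state
    steps:
      - run: ls /mnt/state       # contents would persist between runs
```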