Persistent storage #871
Replies: 3 comments
-
I am trying to build a Workflow that, unfortunately, is not well served by the cache construct. There are at least two significant problems with the way caches are implemented:

GitHub caches are a reasonable solution if (1) your Workflow regularly needs most or all of the cached content, and (2) the thing you are caching remains stable over time. That is a fairly narrow use case, and if you're doing something not ideally suited to it, things get ugly quickly; caching is not a general or flexible mechanism that lends itself to a wide variety of uses. A persistent storage volume, as @cynicaljoy described, is exactly what I want. Conceptually, it serves the same need as volumes in a Docker container. Not only would that let me implement caching and persistent state in a manner that suits my application, it would also be flexible enough to support the immutable-caching scenario (just offer an API call to toggle read-only status) for the occasions when that makes sense. (Frankly, I don't understand why this approach wasn't implemented to begin with. The current way of doing things is so much more complex and harder to understand than "just" mounting a volume that there must be some non-obvious reasoning behind it.)
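For context, the cache construct being criticized here archives the listed paths under an exact key and restores them on later runs; a typical invocation (the path and key names are illustrative) looks something like this:

```yaml
# actions/cache saves/restores an archive keyed by a string;
# a key miss falls back to the restore-keys prefixes or a cold start.
- uses: actions/cache@v4
  with:
    path: ~/.cache/my-app                         # illustrative path
    key: my-app-${{ hashFiles('**/lockfile') }}   # illustrative key
    restore-keys: |
      my-app-
```

The whole archive is uploaded and downloaded around each run, which is what makes it awkward for workloads that only touch part of the content or whose content changes constantly.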
-
Hi, I built ZeroFS and just made a GitHub Action for it that provides persistent volumes: https://github.com/marketplace/actions/zerofs-volume

```yaml
- uses: Barre/zerofs@v1
  with:
    object-store-url: 's3://bucket/path'
    encryption-password: ${{ secrets.ZEROFS_PASSWORD }}
    aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
    aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    mount-path: '/mnt/persistent'
```

It mounts S3 (or any S3-compatible storage) as a regular filesystem via NFS. Your workflows just read/write files normally and the data persists between runs. Main differences from cache/artifacts: there are no explicit upload/download steps, and the volume is a live, writable filesystem rather than an archived snapshot that has to be saved and restored.
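Assuming the mount above succeeds, later steps can treat /mnt/persistent as an ordinary directory; a minimal illustration (the log file name is made up):

```yaml
# Writes land in the S3-backed filesystem, not the runner's disk,
# so they are still there on the next workflow run.
- run: date >> /mnt/persistent/run-history.log
- run: cat /mnt/persistent/run-history.log
```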
-
The GitHub-hosted runner can run in any region by default; from what I see, the free runners are in US regions. To see the region a job is running in, you can make an HTTP query to the instance metadata service. Mounting your own storage may add latency if the GitHub runner happens to be in a different region than the storage.

There is a feature request to specify the region for GitHub-hosted runners, but there is already a way to do it: under a paid GitHub subscription I set up a larger runner with an Azure Virtual Network (VNET), and the region of that VNET determines the region of the GitHub-hosted runner. To avoid networking costs and keep delays minimal, I set up a file share in the same region. For authentication I created a managed identity in Azure with a federated credential for the GitHub repository, branch, or environment, and the workflow logs in to that managed identity. For cleanup I added a step at the end of the job.
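A rough sketch of how those pieces can fit together in a workflow. The region query uses the Azure Instance Metadata Service, which is available on Azure-backed runners; the login shown is the azure/login action with OIDC federation, a common way to sign in to a federated managed identity (the comment does not say which mechanism it used); the runner label, mount, and cleanup commands are placeholders:

```yaml
on: workflow_dispatch

permissions:
  id-token: write   # required for OIDC federation to the managed identity
  contents: read

jobs:
  build:
    runs-on: my-larger-runner   # placeholder: a larger runner attached to the VNET
    steps:
      # Ask the Azure Instance Metadata Service which region this VM is in.
      - name: Show runner region
        run: |
          curl -s -H "Metadata: true" \
            "http://169.254.169.254/metadata/instance/compute/location?api-version=2021-02-01&format=text"

      # Log in to the federated managed identity (one common approach).
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      # ... mount the file share in the same region and run the actual job ...

      # Cleanup runs even if an earlier step failed.
      - name: Cleanup
        if: always()
        run: echo "unmount the share / tidy up here"   # placeholder
```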
-
Rather than cache/artifact assets needing to be constantly uploaded and downloaded, wouldn't it be better to have a persistent storage volume, owned by the repo owner, that gets mounted on every execution?
Whether anything is stored on the volume would be entirely up to the workflow. The volume's usage would be accounted for as part of the "Storage for Actions and Packages" within our account billing.
This should result in less complexity in the Workflow definition file as well as faster Workflow execution times.
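Purely to illustrate the request (this is hypothetical syntax, not an existing GitHub Actions feature), such a volume might be declared once and mounted implicitly on every run:

```yaml
# Hypothetical syntax for the requested feature -- not real Actions YAML.
jobs:
  build:
    runs-on: ubuntu-latest
    volumes:
      - name: build-state        # hypothetical repo-owned volume
        mount-path: /mnt/state
    steps:
      - run: ls /mnt/state       # contents would persist between runs
```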