From 26d4853837e8677e1a7e7eaa0eaa4ad5d1de6567 Mon Sep 17 00:00:00 2001
From: Mo King
Date: Thu, 23 Oct 2025 08:23:36 -0400
Subject: [PATCH] Add fast storage documentation

---
 .../endpoints/endpoint-configurations.mdx |  2 +-
 serverless/overview.mdx                   |  2 +-
 storage/network-volumes.mdx               | 37 +++++++++++++------
 3 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/serverless/endpoints/endpoint-configurations.mdx b/serverless/endpoints/endpoint-configurations.mdx
index fd99c71b..995c0c26 100644
--- a/serverless/endpoints/endpoint-configurations.mdx
+++ b/serverless/endpoints/endpoint-configurations.mdx
@@ -230,7 +230,7 @@ Use these strategies to reduce worker startup times:
 
 1. **Embed models in Docker images:** Package your ML models directly within your worker container image instead of downloading them in your handler function. This strategy places models on the worker's high-speed local storage (SSD/NVMe), dramatically reducing the time needed to load models into GPU memory. This approach is optimal for production environments, though extremely large models (500GB+) may require network volume storage.
 
-2. **Store large models on network volumes:** For flexibility during development, save large models to a [network volume](/storage/network-volumes) using a Pod or one-time handler, then mount this volume to your Serverless workers. While network volumes offer slower model loading compared to embedding models directly, they can speed up your workflow by enabling rapid iteration and seamless switching between different models and configurations.
+2. **Store large models on network volumes:** For flexibility during development, save large models to a [network volume](/storage/network-volumes) using a Pod or one-time handler, then mount this volume to your Serverless workers. While network volumes offer slower model loading compared to embedding models directly, they can speed up your workflow by enabling rapid iteration and seamless switching between different models and configurations. For even faster loading times, consider using [high-performance network volumes](/storage/network-volumes#high-performance-storage), which offer up to 2x faster read/write speeds.
 
 3. **Maintain active workers:** Set active worker counts above zero to completely eliminate cold starts. These workers remain ready to process requests instantly and cost up to 30% less when idle compared to standard (flex) workers.
 
diff --git a/serverless/overview.mdx b/serverless/overview.mdx
index 37991597..c84d358e 100644
--- a/serverless/overview.mdx
+++ b/serverless/overview.mdx
@@ -100,7 +100,7 @@ flowchart TD
 
 A "cold start" refers to the time between when an endpoint with no running workers receives a request, and when a worker is fully "warmed up" and ready to handle the request. This generally involves starting the container, loading models into GPU memory, and initializing runtime environments. Larger models take longer to load into memory, increasing cold start time, and request response time by extension.
 
-Minimizing cold starts is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using [cached models](/serverless/endpoints/model-caching), enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), and by setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero.
+Minimizing cold starts is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using [cached models](/serverless/endpoints/model-caching), enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero, or using [high-performance network volumes](/storage/network-volumes#high-performance-storage) to speed up model and dataset loading.
 
 ### [Load balancing endpoints](/serverless/load-balancing/overview)
 
diff --git a/storage/network-volumes.mdx b/storage/network-volumes.mdx
index 9bcc961b..4f38d734 100644
--- a/storage/network-volumes.mdx
+++ b/storage/network-volumes.mdx
@@ -26,19 +26,18 @@ If your account lacks sufficient funds to cover storage costs, your network volu
 
 
 
-## Create a network volume
+## High-performance storage
 
-
-
+High-performance network volumes offer up to 2x faster read/write speeds compared to standard network volumes. This can reduce cold start times for Serverless workers and improve performance for Pod and Instant Cluster workflows that require reading and writing large models or many small files.
 
-To create a new network volume:
+High-performance storage costs \$0.14 per GB per month (compared to \$0.07 per GB per month for standard storage) and is currently available in the `CA-MTL-4` (Montreal, Canada) datacenter.
 
-1. Navigate to the [Storage page](https://www.console.runpod.io/user/storage) in the Runpod console.
-2. Select **New Network Volume**.
-3. Configure your volume:
-   - Select a datacenter for your volume. Datacenter location does not affect pricing, but determines which GPU types and endpoints your network volume can be used with.
-   - Provide a descriptive name for your volume (e.g., "project-alpha-data" or "shared-models").
-   - Specify the desired size for the volume in gigabytes (GB).
+{/* Table to support future DCs:
+| Datacenter | Region |
+| ---------- | ------ |
+| CA-MTL-4 | Montreal, Canada | */}
+
+## Create a network volume
 
 
@@ -46,7 +45,21 @@ Network volume size can be increased later, but cannot be decreased.
 
 
 
-4. Select **Create Network Volume**.
+
+
+
+To create a new network volume:
+
+1. Navigate to the [Storage page](https://www.console.runpod.io/user/storage) in the Runpod console.
+2. Click **New Network Volume**.
+3. (Optional) At the top of the **Datacenter** section, filter for datacenters that support these features:
+   - [High Performance](#high-performance-storage)
+   - [Global Networking](/pods/networking)
+   - [S3 Compatible](/storage/s3-api)
+4. Select a datacenter for your volume. Datacenter location does not affect pricing, but determines which GPU types and endpoints your network volume can be used with.
+5. Provide a descriptive name for your volume (e.g., "project-alpha-data" or "shared-models").
+6. Specify the desired size for the volume in gigabytes (GB).
+7. Click **Create Network Volume**.
 
 You can edit and delete your network volumes using the [Storage page](https://www.console.runpod.io/user/storage).
 
@@ -325,4 +338,4 @@ The destination Pod should show similar disk usage to the source Pod if all file
 
 You can run the `rsync` command multiple times if the transfer is interrupted. The `--inplace` flag ensures that `rsync` resumes from where it left off rather than starting over.
 
-
\ No newline at end of file
+
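As a companion to the `--inplace` note in the final hunk, here is a minimal sketch of the resumable Pod-to-Pod transfer pattern those docs describe. The paths, host address, and SSH port are illustrative placeholders, not values from the patch:

```bash
# Copy a directory from the source Pod to a destination Pod over SSH.
# --inplace writes updates directly into existing destination files, so
# re-running the command after an interruption resumes the transfer
# instead of re-copying completed files from scratch.
rsync -avz --inplace --progress -e "ssh -p <ssh-port>" \
  /workspace/data/ \
  root@<destination-pod-ip>:/workspace/data/
```

The command is safe to repeat: files that already match on the destination are skipped, and partially transferred files are picked up where the previous run stopped.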
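Similarly, since the new creation steps call out an **S3 Compatible** datacenter filter, the sketch below shows one hypothetical way to seed a model onto such a volume with the standard AWS CLI. The endpoint URL, volume ID, and datacenter ID are placeholders; the actual values and credential setup are covered in [/storage/s3-api](/storage/s3-api):

```bash
# Upload a model file to a network volume through an S3-compatible API.
# Assumes access keys are already configured (e.g., via `aws configure`).
# <s3-endpoint-url>, <volume-id>, and <datacenter-id> are placeholders.
aws s3 cp ./model.safetensors s3://<volume-id>/models/model.safetensors \
  --endpoint-url <s3-endpoint-url> \
  --region <datacenter-id>
```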