serverless/endpoints/endpoint-configurations.mdx (2 changes: 1 addition & 1 deletion)
@@ -230,7 +230,7 @@ Use these strategies to reduce worker startup times:

1. **Embed models in Docker images:** Package your ML models directly within your worker container image instead of downloading them in your handler function. This strategy places models on the worker's high-speed local storage (SSD/NVMe), dramatically reducing the time needed to load models into GPU memory. This approach is optimal for production environments, though extremely large models (500GB+) may require network volume storage. (See the handler sketch after this list for how strategies 1 and 2 look in code.)

- 2. **Store large models on network volumes:** For flexibility during development, save large models to a [network volume](/storage/network-volumes) using a Pod or one-time handler, then mount this volume to your Serverless workers. While network volumes offer slower model loading compared to embedding models directly, they can speed up your workflow by enabling rapid iteration and seamless switching between different models and configurations.
+ 2. **Store large models on network volumes:** For flexibility during development, save large models to a [network volume](/storage/network-volumes) using a Pod or one-time handler, then mount this volume to your Serverless workers. While network volumes offer slower model loading compared to embedding models directly, they can speed up your workflow by enabling rapid iteration and seamless switching between different models and configurations. For even faster loading times, consider using [high performance network volumes](/storage/network-volumes#high-performance-storage), which offer up to 2x faster read/write speeds.

3. **Maintain active workers:** Set active worker counts above zero to completely eliminate cold starts. These workers remain ready to process requests instantly and cost up to 30% less when idle compared to standard (flex) workers.
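
Taken together, strategies 1 and 2 mostly come down to which path the handler loads its model from. Below is a minimal Python sketch of that pattern; the model paths, file names, and `load_model` helper are hypothetical placeholders (the `/runpod-volume` mount point and `runpod.serverless.start` come from the Runpod docs and SDK), and a real worker would substitute its ML framework's loader:

```python
import os

import runpod  # Runpod's Python SDK for Serverless workers

# Strategy 1: a model baked into the Docker image at build time, served from
# the worker's local SSD/NVMe storage. (Hypothetical path.)
BAKED_MODEL_PATH = "/models/my-model.bin"

# Strategy 2: a model stored on an attached network volume, which Serverless
# workers mount at /runpod-volume. (Hypothetical file name.)
VOLUME_MODEL_PATH = "/runpod-volume/models/my-model.bin"

# Prefer the baked-in copy when it exists; fall back to the network volume.
MODEL_PATH = BAKED_MODEL_PATH if os.path.exists(BAKED_MODEL_PATH) else VOLUME_MODEL_PATH


def load_model(path: str) -> bytes:
    # Placeholder loader: a real handler would call its ML framework here
    # (e.g. torch.load(path)). Reading raw bytes keeps the sketch runnable.
    with open(path, "rb") as f:
        return f.read()


# Loading at module import means the cost is paid once, during worker
# startup, rather than on every request.
model = load_model(MODEL_PATH)


def handler(job):
    # job["input"] carries the JSON payload sent with the request.
    return {"model_bytes": len(model), "echo": job["input"]}


runpod.serverless.start({"handler": handler})
```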

serverless/overview.mdx (2 changes: 1 addition & 1 deletion)
@@ -100,7 +100,7 @@ flowchart TD

A "cold start" refers to the time between when an endpoint with no running workers receives a request, and when a worker is fully "warmed up" and ready to handle the request. This generally involves starting the container, loading models into GPU memory, and initializing runtime environments. Larger models take longer to load into memory, increasing cold start time, and request response time by extension.

- Minimizing cold starts is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using [cached models](/serverless/endpoints/model-caching), enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero.
+ Minimizing cold starts is key to creating a responsive and cost-effective endpoint. You can reduce cold starts by using [cached models](/serverless/endpoints/model-caching), enabling [FlashBoot](/serverless/endpoints/endpoint-configurations#flashboot), setting [active worker counts](/serverless/endpoints/endpoint-configurations#active-min-workers) above zero, or using [high performance network volumes](/storage/network-volumes#high-performance-storage) to speed up model and dataset loading.
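
One way to see cold start overhead concretely is to time the same request against an endpoint that has scaled to zero, then again once a worker is warm. A rough sketch, assuming the third-party `requests` package and an already-deployed endpoint (the endpoint ID, API key, and input payload are placeholders; `/runsync` is the endpoint's synchronous run route):

```python
import time

import requests  # third-party HTTP client: pip install requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder
API_KEY = "your-runpod-api-key"   # placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
PAYLOAD = {"input": {"prompt": "warmup"}}  # shape depends on your handler

# The first call hits an endpoint with no running workers (cold start);
# the second reuses the now-warm worker.
for label in ("cold", "warm"):
    start = time.monotonic()
    requests.post(URL, headers=HEADERS, json=PAYLOAD, timeout=600)
    print(f"{label}: {time.monotonic() - start:.1f}s")
```

The gap between the two timings is roughly the cold start cost the strategies above aim to eliminate.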

### [Load balancing endpoints](/serverless/load-balancing/overview)

storage/network-volumes.mdx (23 changes: 19 additions & 4 deletions)
@@ -26,6 +26,17 @@ If your account lacks sufficient funds to cover storage costs, your network volu

</Warning>

+ ## High performance storage
+
+ High performance network volumes offer up to 2x faster read/write speeds compared to standard network volumes. This can reduce cold start times for Serverless workers and improve performance for Pod and Instant Cluster workflows that require reading and writing large models or many small files.
+
+ High performance storage costs \$0.14 per GB per month (compared to \$0.07 per GB per month for standard storage) and is currently available in the `CA-MTL-4` (Montreal, Canada) data center.
+
+ {/* Table to support future DCs:
+ | Datacenter | Region |
+ | ---------- | ------ |
+ | CA-MTL-4   | Montreal, Canada | */}
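
To see what these speeds mean for your own workload, you can time a large sequential read from a mounted volume. A minimal sketch; the file path is a placeholder (Serverless workers mount network volumes at `/runpod-volume`, while Pods typically mount them at `/workspace`):

```python
import time

# Path to a large file on the mounted volume (placeholder).
PATH = "/runpod-volume/models/my-model.bin"
CHUNK = 64 * 1024 * 1024  # read in 64 MiB chunks

start = time.monotonic()
total = 0
with open(PATH, "rb") as f:
    while chunk := f.read(CHUNK):
        total += len(chunk)
elapsed = time.monotonic() - start

print(f"read {total / 1e9:.2f} GB in {elapsed:.1f}s "
      f"({total / 1e9 / elapsed:.2f} GB/s)")
```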

## Create a network volume

<Warning>
@@ -41,10 +52,14 @@ To create a new network volume:

1. Navigate to the [Storage page](https://www.console.runpod.io/user/storage) in the Runpod console.
2. Click **New Network Volume**.
- 3. Select a datacenter for your volume. Datacenter location does not affect pricing, but determines which GPU types and endpoints your network volume can be used with.
- 4. Provide a descriptive name for your volume (e.g., "project-alpha-data" or "shared-models").
- 5. Specify the desired size for the volume in gigabytes (GB).
- 6. Click **Create Network Volume**.
+ 3. (Optional) At the top of the **Datacenter** section, you can filter for datacenters that support these features:
+    - [High Performance](#high-performance-storage)
+    - [Global Networking](/pods/networking)
+    - [S3 Compatible](/storage/s3-api)
+ 4. Select a datacenter for your volume. Datacenter location does not affect standard pricing, but determines which GPU types and endpoints your network volume can be used with.
+ 5. Provide a descriptive name for your volume (e.g., "project-alpha-data" or "shared-models").
+ 6. Specify the desired size for the volume in gigabytes (GB).
+ 7. Click **Create Network Volume**.

You can edit and delete your network volumes using the [Storage page](https://www.console.runpod.io/user/storage).
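
If your volume lives in an S3 Compatible datacenter, you can also move files onto it without attaching it to a Pod, using the [S3-compatible API](/storage/s3-api). A hypothetical `boto3` sketch; the endpoint URL, credentials, and the volume-ID-as-bucket convention shown here are assumptions to verify against the S3 API docs for your datacenter:

```python
import boto3  # AWS SDK for Python: pip install boto3

# All values below are placeholders; see /storage/s3-api for the actual
# endpoint URL format and credential setup for your datacenter.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3api-eu-ro-1.runpod.io",  # assumed datacenter endpoint
    aws_access_key_id="your-s3-access-key",
    aws_secret_access_key="your-s3-secret-key",
    region_name="eu-ro-1",
)

# The network volume ID stands in for the bucket name, and the key becomes
# a path on the volume (an assumption to confirm in the S3 API docs).
s3.upload_file("my-model.bin", "your-network-volume-id", "models/my-model.bin")
```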
