5 files changed: +48 −2 lines changed
  # Toggles whether UI should be run locally using gradio hot-reloading
  # or should be included in the remote Helm install
- run_ui_locally = True
+ run_ui_locally = os.getenv("AZIMUTH_LLM_TILT_LOCAL_UI", True)

  # Allow non-local contexts
  allow_k8s_contexts(k8s_context())
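Note that `os.getenv()` in Tilt's Starlark returns a string whenever the variable is set, so a value such as "false" would still be truthy. A minimal sketch of one way to coerce the flag to a real boolean (the env var and variable names come from the diff above; the parsing itself is illustrative, not part of the change):

```python
# Sketch only: normalise the env var to a boolean, since os.getenv() returns
# a string whenever AZIMUTH_LLM_TILT_LOCAL_UI is set in the environment.
run_ui_locally = str(os.getenv("AZIMUTH_LLM_TILT_LOCAL_UI", "true")).strip().lower() in ("1", "true", "yes")
```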

@@ -10,10 +10,25 @@ controls:
      type: MirrorControl
      path: /huggingface/model
      visuallyHidden: true
+   # Azimuth UI doesn't handle json type ["integer","null"]
+   # properly so we allow any type in JSON schema then
+   # constrain to (optional) integer here.
+   /api/modelMaxContextLength:
+     type: IntegerControl
+     minimum: 100
+     step: 100
+     required: false

  sortOrder:
    - /huggingface/model
    - /huggingface/token
    - /ui/appSettings/hf_model_instruction
    - /ui/appSettings/page_title
+   - /api/image/version
    - /ui/appSettings/llm_temperature
+   - /ui/appSettings/llm_max_tokens
+   - /ui/appSettings/llm_frequency_penalty
+   - /ui/appSettings/llm_presence_penalty
+   - /ui/appSettings/llm_top_p
+   - /ui/appSettings/llm_top_k
+   - /api/modelMaxContextLength

            - --model
            - {{ .Values.huggingface.model }}
            {{- include "azimuth-llm.chatTemplate" . | nindent 10 }}
+           {{- if .Values.api.modelMaxContextLength -}}
+           - --max-model-len
+           - {{ .Values.api.modelMaxContextLength | quote }}
+           {{- end -}}
            {{- if .Values.api.extraArgs -}}
            {{- .Values.api.extraArgs | toYaml | nindent 10 }}
            {{- end -}}
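For illustration, with a hypothetical override of `api.modelMaxContextLength: 8192`, the conditional block above appends the extra flag to the vLLM container's arguments, rendering roughly as follows (the model name is a placeholder):

```yaml
# Hypothetical rendered container args (abbreviated)
args:
  - --model
  - <huggingface model name>
  - --max-model-len
  - "8192"
```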

            "required": ["hf_model_name", "hf_model_instruction"]
          }
        }
+     },
+     "api": {
+       "type": "object",
+       "properties": {
+         "modelMaxContextLength": {
+           "title": "Model Context Length",
+           "description": "An override for the maximum context length to allow, if the model's default is not suitable."
+         },
+         "image": {
+           "type": "object",
+           "properties": {
+             "version": {
+               "type": "string",
+               "title": "Backend vLLM version",
+               "description": "The vLLM version to use as a backend. Must be a version tag from [this list](https://github.com/vllm-project/vllm/tags)",
+               "default": "v0.4.3"
+             }
+           }
+         }
+       }
      }
    }
  }

      iconUrl: https://raw.githubusercontent.com/vllm-project/vllm/v0.2.7/docs/source/assets/logos/vllm-logo-only-light.png
      description: |
        The raw inference API endpoints for the deployed LLM.
+
    # Config for huggingface model cache volume
    # This is mounted at /root/.cache/huggingface in the api deployment
    cacheVolume:
      hostPath:
        path: /tmp/llm/huggingface-cache
+
    # Number of gpus to requests for each api pod instance
    # NOTE: This must be in the range 1 <= value <= N, where
    # 'N' is the number of GPUs available in a single
    # to preform a rolling zero-downtime update
    updateStrategy:
      type: Recreate
+
+   # The value of the vLLM backend's max_model_len argument (if the model's default is not suitable)
+   # https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#command-line-arguments-for-the-server
+   modelMaxContextLength:
+
    # Extra args to supply to the vLLM backend, see
-   # https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/api_server.py
+   # https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html#command-line-arguments-for-the-server
    extraArgs: []

  # Configuration for the frontend web interface
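As a usage sketch, a deployment could then pass the new setting (and any other vLLM server flags) through a Helm values override; the numbers below are placeholders rather than recommended settings:

```yaml
# Example user values file (hypothetical settings)
api:
  # Passed to vLLM as --max-model-len
  modelMaxContextLength: 8192
  # Any additional vLLM server flags, appended verbatim to the container args
  extraArgs:
    - --gpu-memory-utilization
    - "0.9"
```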