Vertex AI - Self Deployed Models

Deploy and use your own models on Vertex AI through Model Garden or custom endpoints.

Model Garden

tip

All OpenAI-compatible models from Vertex Model Garden are supported.

Using Model Garden

Almost all Vertex Model Garden models are OpenAI-compatible.

| Property | Details |
|---|---|
| Provider Route | vertex_ai/openai/{MODEL_ID} |
| Vertex Documentation | Model Garden LiteLLM Inference, Vertex Model Garden |
| Supported Operations | /chat/completions, /embeddings |
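
SDK Usage: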
from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/openai/<your-endpoint-id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
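
The same route also works through the LiteLLM proxy. A minimal config.yaml sketch, assuming a hypothetical model_name (my-model-garden-model) and your own endpoint ID:

model_list:
  - model_name: my-model-garden-model  # hypothetical alias, pick your own
    litellm_params:
      model: vertex_ai/openai/<your-endpoint-id>
      vertex_project: "my-project-id"
      vertex_location: "us-central1"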

Gemma Models (Custom Endpoints)

Deploy Gemma models on custom Vertex AI prediction endpoints using an OpenAI-compatible request format.

| Property | Details |
|---|---|
| Provider Route | vertex_ai/gemma/{MODEL_NAME} |
| Vertex Documentation | Vertex AI Prediction |
| Required Parameter | api_base - full prediction endpoint URL |

Proxy Usage:

1. Add to config.yaml

model_list:
  - model_name: gemma-model
    litellm_params:
      model: vertex_ai/gemma/gemma-3-12b-it-1222199011122
      api_base: https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict
      vertex_project: "my-project-id"
      vertex_location: "us-central1"

2. Start proxy

litellm --config /path/to/config.yaml

3. Test it

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemma-model",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "max_tokens": 100
  }'
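
Since the proxy serves the OpenAI /v1/chat/completions route (as the curl above shows), it should also work with the OpenAI Python client. A minimal sketch reusing the base URL and sk-1234 key from the curl example:

from openai import OpenAI

# Point the OpenAI client at the LiteLLM proxy instead of api.openai.com
client = OpenAI(base_url="http://0.0.0.0:4000/v1", api_key="sk-1234")

response = client.chat.completions.create(
    model="gemma-model",  # the model_name from config.yaml
    messages=[{"role": "user", "content": "What is machine learning?"}],
    max_tokens=100,
)
print(response.choices[0].message.content)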

SDK Usage:

from litellm import completion

response = completion(
    model="vertex_ai/gemma/gemma-3-12b-it-1222199011122",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
)
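
Streaming is expected to work through the same route; a minimal sketch with stream=True, using the same placeholder endpoint values as above:

from litellm import completion

response = completion(
    model="vertex_ai/gemma/gemma-3-12b-it-1222199011122",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
    stream=True,  # yield OpenAI-style chunks instead of one full response
)
for chunk in response:
    # each chunk follows the OpenAI delta format; content may be None
    print(chunk.choices[0].delta.content or "", end="")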

MedGemma Models (Custom Endpoints)

Deploy MedGemma models on custom Vertex AI prediction endpoints using an OpenAI-compatible request format. MedGemma models use the same vertex_ai/gemma/ route.

| Property | Details |
|---|---|
| Provider Route | vertex_ai/gemma/{MODEL_NAME} |
| Vertex Documentation | Vertex AI Prediction |
| Required Parameter | api_base - full prediction endpoint URL |

Proxy Usage:

1. Add to config.yaml

model_list:
  - model_name: medgemma-model
    litellm_params:
      model: vertex_ai/gemma/medgemma-2b-v1
      api_base: https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict
      vertex_project: "my-project-id"
      vertex_location: "us-central1"

2. Start proxy

litellm --config /path/to/config.yaml

3. Test it

curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "medgemma-model",
    "messages": [{"role": "user", "content": "What are the symptoms of hypertension?"}],
    "max_tokens": 100
  }'

SDK Usage:

from litellm import completion

response = completion(
    model="vertex_ai/gemma/medgemma-2b-v1",
    messages=[{"role": "user", "content": "What are the symptoms of hypertension?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
)
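
For async workloads, litellm also provides acompletion; a sketch mirroring the call above, with the same placeholder endpoint values:

import asyncio
from litellm import acompletion

async def main():
    # async variant of the synchronous completion call above
    response = await acompletion(
        model="vertex_ai/gemma/medgemma-2b-v1",
        messages=[{"role": "user", "content": "What are the symptoms of hypertension?"}],
        api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
        vertex_project="my-project-id",
        vertex_location="us-central1",
    )
    print(response.choices[0].message.content)

asyncio.run(main())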