API Reference

pais.vmware.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the pais v1alpha1 API group

APIRuntimeConfig

APIRuntimeConfig defines configuration/tuning parameters for the Private AI Services API

Appears In:
Field Description Default Validation

deployment RuntimeDeploymentConfig

deployment specifies the desired configuration for the Private AI Services API. Modify only if
instructed by Broadcom support to increase concurrent requests.

Auth

Auth describes the authentication backend for Private AI Services

Appears In:
Field Description Default Validation

oidc OIDC

oidc defines the OpenID Connect connection details that the Private AI Services UI will use to authenticate users

AutomaticUpgradeStrategy

AutomaticUpgradeStrategy means the Supervisor Service will manage upgrades

Appears In:

BackendAuth

BackendAuth configures authentication when connecting to a backend

Field Description Default Validation

apiTokenRef LocalObjectReference

apiTokenRef may be used to provide a bearer token when making HTTPS requests to the backend.
It must name a Secret in this namespace with type "pais.vmware.com/api-token-credentials"
that contains a key "api_token".
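As an illustrative sketch, a Secret satisfying these requirements could look like the following (the Secret name and token value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backend-api-token            # placeholder; referenced by apiTokenRef.name
type: pais.vmware.com/api-token-credentials
stringData:
  api_token: "example-bearer-token"  # placeholder bearer token sent to the backend
```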

BackendTLS

BackendTLS configures TLS when connecting to a backend

Field Description Default Validation

verification TLSVerification

verification determines how to validate the HTTPS connection to the backend.

Add certificate authorities to the PAISConfiguration.spec.clientTls.caBundleRefs in this namespace
so that they are trusted.

  • strict does one-way TLS with full strict validation (default)

  • caOnly does one-way TLS and validates the server certificate chain, but allows a mismatch in server name

  • none does one-way TLS but does not validate the server certificate chain at all.
    It is insecure and should not be used in production.

  • mutual should only be used for local models managed by this instance of Private AI Services.
    It does a full mutual-TLS handshake using system-managed certificates.

Enum: [strict caOnly none mutual]
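For example, a sketch of relaxing server-name checking for a backend whose certificate chain is trusted but whose hostname does not match (the surrounding backend fields are omitted):

```yaml
tls:
  verification: caOnly   # validate the certificate chain, tolerate a server-name mismatch
```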

ChildStatus

ChildStatus is the schema for child resources of PAISConfiguration

Field Description Default Validation

apiGroup string

APIGroup is the group for the resource being referenced.
If APIGroup is not specified, the specified Kind must be in the core API group.
For any other third-party types, APIGroup is required.

kind string

Kind is the type of resource being referenced

name string

Name is the name of resource being referenced

observedGeneration integer

observedGeneration describes the generation of this child observed by the PAISConfiguration controller

ClientTLS

ClientTLS configures TLS/SSL clients used by Private AI Services to connect to remote services

Appears In:
Field Description Default Validation

caBundleRefs LocalObjectReference array

caBundleRefs specifies what certificates Private AI Services will trust when connecting to remote servers over TLS.

Elements of this list must name ConfigMaps in the current namespace,
with a key ca.crt that contains a PEM-encoded certificate bundle.

In addition to being provided to PAIS pods, these CA bundles are included in VKS cluster
nodes' osConfiguration, allowing them to be used to pull container images from private registries.

Note that if this list of ConfigMaps is changed, the PAISConfiguration will be reconciled,
and the changes will propagate to the VKS cluster nodes. However, simply modifying the contents
of the ConfigMaps will not cause a reconciliation, and the changes will not propagate.

Be aware that changes to this list with an existing cluster will result in a rollout
of the cluster, which may affect availability of single-replica ModelEndpoints.
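A sketch of a CA bundle ConfigMap matching these requirements (the name and certificate contents are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: private-registry-ca    # placeholder name
data:
  ca.crt: |                    # required key; PEM-encoded bundle
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```

It would then be listed under PAISConfiguration.spec.clientTls.caBundleRefs as an entry with name: private-registry-ca.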

DBPasswordRef

DBPasswordRef describes the database connection secret reference to connect to the database

Appears In:
Field Description Default Validation

name string

name of a Secret in this namespace

MinLength: 1

fieldPath string

fieldPath is the name of the key within the Secret containing the password.

In addition to this key for the password, there should also be
an additional key called "ca.crt" which contains the Certificate Authority
to trust when verifying the TLS connection to the database.

MinLength: 1
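A sketch of a matching Secret (name, key, and values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pg-credentials         # placeholder; referenced by passwordRef.name
type: Opaque
stringData:
  password: "example-password" # key name must match passwordRef.fieldPath
  ca.crt: |                    # CA to trust when verifying the TLS connection to the database
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```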

Database

Appears In:
Field Description Default Validation

host string

host is the network hostname of a PostgreSQL server to use

MinLength: 1

port integer

port is the TCP port to connect to on the database server
If unset, the default Postgres port (5432) will be used

5432

username string

username to use when connecting to the database server

MinLength: 1

passwordRef DBPasswordRef

passwordRef is a reference to a Secret in this namespace containing the password for this database user

dbname string

dbname is the name of the logical database to use within the server

MinLength: 1

sslMode DatabaseSslMode

sslMode configures how to validate the connection with the database server

Enum: [VerifyFull VerifyCA Require Allow]
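Assembled into a PAISConfiguration, the fields above might look like this sketch (hostname, user, and database name are placeholders):

```yaml
spec:
  database:
    host: postgres.example.com
    port: 5432                 # default Postgres port
    username: pais
    dbname: pais
    sslMode: VerifyFull
    passwordRef:
      name: pg-credentials     # Secret in this namespace
      fieldPath: password      # key within that Secret holding the password
```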

DatabaseSslMode

Underlying type: string

DatabaseSslMode describes how the Private AI Services instance validates the SSL connection to the database.

Validation:
  • Enum: [VerifyFull VerifyCA Require Allow]

Appears In:

EnvVar

EnvVar represents an environment variable present in a Container.

NOTE: We do not use corev1.EnvVar, as we cannot implement all sources of data that it supports in EnvVarSource (since we'd have to mount these values into the VKS cluster).

NOTE: Immutability for both properties is already provided by the struct referencing this type. Adding the XValidation rules here as well exceeds the complexity allowed by the k8s API.

Field Description Default Validation

name string

name is the key for an environment variable override to be passed to the inference engine

MaxLength: 128
MinLength: 1

value string

value is the value for an environment variable override to be passed to the inference engine

GPUDriverType

Underlying type: string

GPUDriverType defines types of GPU drivers

Validation:
  • Enum: [NVAIE OSS]

IndexingWorkersRuntimeConfig

IndexingWorkersRuntimeConfig defines configuration/tuning parameters for all Private AI Services indexing workers

Appears In:
Field Description Default Validation

deployment RuntimeDeploymentConfig

deployment specifies the desired configuration for the Private AI Services workers performing
indexing tasks. Modify only if instructed by Broadcom support to increase indexing
throughput.

workerThreads integer

workerThreads specifies the desired number of threads for each Private AI Services worker performing
indexing tasks. Modify only if instructed by Broadcom support to increase indexing
throughput.

10

Maximum: 100
Minimum: 1

workerRateLimit string

workerRateLimit specifies the desired rate-limit at which the Private AI Services workers performing
indexing tasks pick up individual tasks for processing. Modify only if instructed by
Broadcom support to increase indexing throughput.

100/s

MinLength: 1
Pattern: ^[1-9][0-9]*/[smh]
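As a sketch, the indexing worker tuning fields above could be set like this (the values shown are the documented defaults; change them only as instructed by Broadcom support):

```yaml
spec:
  runtimeConfig:
    indexingWorkersRuntimeConfig:
      workerThreads: 10        # 1-100
      workerRateLimit: 100/s   # per second (s), minute (m), or hour (h)
```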

InferenceEngine

Underlying type: string

InferenceEngine describes the valid types of engines for ModelEndpoint.

Validation:
  • Enum: [Infinity vLLM LlamaCPP]

InferenceGatewayRoute

InferenceGatewayRoute describes a routing rule for the Inference Gateway

Field Description Default Validation

apiVersion string

pais.vmware.com/v1alpha1

kind string

InferenceGatewayRoute

metadata ObjectMeta

Refer to Kubernetes API documentation for fields of metadata.

MinProperties: 1

InferenceGatewayRouteBackend

InferenceGatewayRouteBackend describes a model running on an inference server either in this namespace or elsewhere.

Field Description Default Validation

httpBaseUrl string

httpBaseUrl defines the base url of the server hosting the model to use.

For ModelEndpoints managed by the Private AI Services instance within this namespace
the Private AI Services controller will set this to the name of the Kubernetes Service
for that ModelEndpoint (which is itself managed by this instance of Private AI Services.)

For ModelEndpoints managed by a Private AI Services instance in a different namespace,
this field should be set based on that other ingress Service’s address, e.g.
https://pais-ingress-default.other-namespace/api/v1/compatibility/openai
Inspect the PAISConfiguration.status.ingressServiceRef for that other namespace to get the
service name.

To use a model hosted on a remote API (e.g. cloud hosted model), provide the base URL of
that API, e.g. https://api.anthropic.com/ or https://api.openai.com/

Do not include the /v1 suffix: Private AI Services will append that.

Format: uri
MinLength: 1

modelId string

modelId defines the name of the model used inside a request sent to the inference server.

For requests to ModelEndpoints managed by the Private AI Services instance in this namespace,
this can be an arbitrary string as long as the inference server was started with this identifier
(using PAIH_MODEL_ID).

For models managed by a Private AI Services instance in another namespace,
this must be the "routingName" of that remote ModelEndpoint.

For a remote API (e.g. a cloud hosted model) this should be the "modelId" defined by that API.

pais

MinLength: 1

tls BackendTLS

tls configures transport level security for the HTTPS connection to this backend.

{ verification:strict }

auth BackendAuth

auth configures authentication to this backend
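Putting the backend fields together, a sketch of a route to a cloud-hosted OpenAI-compatible model (the resource name, routingName, model ID, and Secret name are placeholders; the field layout is inferred from the types in this reference):

```yaml
apiVersion: pais.vmware.com/v1alpha1
kind: InferenceGatewayRoute
metadata:
  name: gpt-4o-route
spec:
  type: Completions
  engine: OpenAI                           # generic OpenAI-compatible engine
  matches:
    routingName: gpt-4o                    # name this instance routes on
  backend:
    httpBaseUrl: https://api.openai.com/   # do not include the /v1 suffix
    modelId: gpt-4o
    tls:
      verification: strict
    auth:
      apiTokenRef:
        name: openai-api-token             # Secret of type pais.vmware.com/api-token-credentials
```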

InferenceGatewayRouteEngine

Underlying type: string

InferenceGatewayRouteEngine describes the valid types of engines for InferenceGatewayRoute.

Validation:
  • Enum: [Infinity vLLM LlamaCPP OpenAI]

InferenceGatewayRouteMatches

InferenceGatewayRouteMatches describes the matching rules for a route

Field Description Default Validation

routingName RoutingName

routingName is the name that this namespace’s instance of Private AI Services
will use to route requests to this model. It may be different than the backend modelId.

MaxLength: 253
MinLength: 1

InferenceGatewayRouteModelTypeWithEngine

InferenceGatewayRouteModelTypeWithEngine describes a type of model inference and an engine to use for that inference for InferenceGatewayRoute

Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceGatewayRouteEngine

engine describes what inference engine is running this model.

For a remote model where you don’t know the engine, you may set "OpenAI"
to treat it as a generic engine compatible with the OpenAI API.

Enum: [Infinity vLLM LlamaCPP OpenAI]

InferenceGatewayRouteSpec

InferenceGatewayRouteSpec specifies the details of the routing rule

Appears In:
Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceGatewayRouteEngine

engine describes what inference engine is running this model.

For a remote model where you don’t know the engine, you may set "OpenAI"
to treat it as a generic engine compatible with the OpenAI API.

Enum: [Infinity vLLM LlamaCPP OpenAI]

matches InferenceGatewayRouteMatches

matches describes how traffic will get routed to this model by the local instance of Private AI Services

backend InferenceGatewayRouteBackend

backend describes where inference requests should be forwarded to

InferenceGatewayRouteStatus

InferenceGatewayRouteStatus reports the current status of an InferenceGatewayRoute

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

conditions Condition array

conditions update as changes occur in the status.

InferenceServerCustomization

InferenceServerCustomization describes the extra customization that can be provided to the inference server

Appears In:
Field Description Default Validation

cliArgs string array

cliArgs describe additional command-line arguments to append when starting the inference engine

envVars EnvVar array

envVars describe additional environment variables to set when starting the inference engine

MaxItems: 1024

engineImage string

engineImage will override the inference server container image.
This can allow use of an inference engine not included in this release of
Private AI Services. But use this feature at your own risk.
Broadcom cannot support arbitrary customer-provided engine images.
In particular, be aware of potential version mismatches with node and host drivers.
Also note that Private AI Services sets some command-line flags when running
the inference server engine that may not work for you.

MinLength: 1

engineImageCompressedSize Quantity

engineImageCompressedSize should be set to the compressed size of the engineImage, if that field is set.

The compressed size of an image is the sum of the layers, and is typically displayed on the web UI of container registries like Docker Hub.
To find the compressed size of a container image you’ve pulled locally, run:
docker manifest inspect vllm/vllm-openai:v0.9.1 | jq '[.layers[].size] | add' | numfmt --to=iec-i

This field is used when sizing the /var/lib/containerd mount on the VKS worker nodes hosting this ModelEndpoint.
The formula for setting the full mount size may vary in the future.
Currently we configure containerdMountSize = 32Gi + 3 * engineImageCompressedSize
to account for other (non-engine) images, the compression ratio, and the fact that
both compressed and uncompressed data is stored in /var/lib/containerd.

In future versions of Private AI Services, this field may no longer be required and may be deprecated.

15Gi

sharedMemoryMountSize Quantity

sharedMemoryMountSize determines the size of the /dev/shm mount point inside
the inference engine container. If unset, it will use the Kubernetes default of 64Mi.

64Mi

tempMountSize Quantity

tempMountSize determines the size in bytes of the /tmp mount point available to the inference server.
If unset, it will default to 1Gi.

1Gi

Ingress

Ingress defines the desired state for how the Private AI Services runtime is accessible in the cluster

Appears In:
Field Description Default Validation

serviceType ServiceType

serviceType determines how the Private AI Services runtime will be exposed as a Kubernetes Service.
Defaults to LoadBalancer, which creates a Service of type LoadBalancer.
Valid options are ClusterIP and LoadBalancer. See Service.Spec.Type for details.

LoadBalancer

Enum: [ClusterIP LoadBalancer]

LLMTracesConfig

LLMTracesConfig defines the configuration for trace collection.

Appears In:
Field Description Default Validation

endpoint string

endpoint specifies the target URL or address for the OpenTelemetry backend
where traces should be sent.

Format: uri
MinLength: 1

protocol OpenTelemetryTransportProtocol

protocol specifies the OpenTelemetry transport protocol.

Enum: [grpc http/protobuf]

projectName string

projectName specifies the "openinference.project.name" resource attribute in accordance
with the OpenInference specification.

MinLength: 1

headersSecretRef SecretKeySelector

headersSecretRef selects a key field within a Secret in this namespace
that should contain HTTP headers to be sent to the OpenTelemetry backend.

The value should be a semicolon-separated list of HTTP headers
in the format "Header-Name=header-value".

For example:

Authorization=Bearer%20token123; X-Custom-Header=custom-value
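A sketch of the headers Secret described above (the name and header values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: otel-headers
type: Opaque
stringData:
  headers: "Authorization=Bearer%20token123; X-Custom-Header=custom-value"
```

It would then be selected from spec.observability.llmTraces via headersSecretRef with name: otel-headers and key: headers, alongside the endpoint, protocol, and projectName fields.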

ManualUpgradeStrategy

ManualUpgradeStrategy is currently unsupported

Appears In:

ModelEndpoint

ModelEndpoint is a request to serve an AI model using a particular engine on 1 or more VMs

Field Description Default Validation

apiVersion string

pais.vmware.com/v1alpha1

kind string

ModelEndpoint

metadata ObjectMeta

Refer to Kubernetes API documentation for fields of metadata.

MinProperties: 1

ModelEndpointSpec

ModelEndpointSpec defines the desired state of ModelEndpoint

Appears In:
Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceEngine

engine describes what inferencing engine should be used when running a particular model

Enum: [Infinity vLLM LlamaCPP]

model ModelEndpointSpecModel

model describes the model which should be run for inference

replicas integer

replicas describes how many instances of this model should be running

Note that currently, all replicas of a ModelEndpoint will run in a single vSphere Zone
which may be customized by spec.failureDomain.

1

Minimum: 0

routingName RoutingName

routingName defines how this model will appear in the data plane API
For example, the model "id" in /api/v1/compatibility/openai/v1/models
and elsewhere in the data plane API

MaxLength: 253
MinLength: 1

virtualMachineClassName string

virtualMachineClassName specifies the virtual machine class to use for running this model endpoint
This is used to create the virtual machine for a node pool in the VKS cluster.

Note this value may interact with FailureDomain. Ensure your chosen FailureDomain has hardware
sufficient to support this choice.

MinLength: 1

storageClassName string

storageClassName specifies the storage class to use for running this model endpoint
This is used to create the virtual machine for a node pool in the VKS cluster

MinLength: 1

failureDomain string

failureDomain specifies the failure domain (vSphere Zone) to use for running this model endpoint.
This string should be the metadata.name of a topology.tanzu.vmware.com Zone resource in this namespace.
This is used to create the virtual machine(s) for the VKS node pool running this model.

This choice of Zone may limit the available hardware (e.g. GPUs) available for this ModelEndpoint.
See also spec.VirtualMachineClassName.

Note that currently, all replicas of a ModelEndpoint will run in this 1 specified failureDomain.

If this value is unset, the system will fallback to a default value, which may be customized via the special
annotation "pais.vmware.com/default-failure-domain" on the PAISConfiguration resource in this namespace.
That annotation value (if set) should be the metadata.name of a topology.tanzu.vmware.com Zone resource
in this namespace.
If unset, but Zones are present in the namespace, then the PAISConfiguration controller will
initialize that annotation on the PAISConfiguration. A user may modify it.
ModelEndpoint spec.failureDomain will always take precedence over that annotation.

MinLength: 1

inferenceServerCustomization InferenceServerCustomization

inferenceServerCustomization describes additional customization that can be appended
when starting the inference engine

{ }

overrides string

overrides is not yet implemented.

Once implemented, it will enable a user to provide ytt overlays to customize the
inference components, including the underlying node settings.
Values provided here will take precedence over all other fields on this resource.
Users can easily break their system by setting this field, and should be discouraged from using it unless necessary.

ModelEndpointSpecModel

ModelEndpointSpecModel describes a model which should be run for inference

Appears In:
Field Description Default Validation

ociRef string

ociRef is a reference to an OCI artifact containing the model to run for inference.
We expect that this artifact is pushed using the Private AI Services CLI (vcf pais models push ...)

MaxLength: 1024
MinLength: 1

pullSecrets LocalObjectReference array

pullSecrets describe a list of references to Kubernetes secrets to use
when pulling this model from the remote OCI registry (e.g. Harbor).
The same secret is also available for pulling the engine image, e.g.
when setting spec.inferenceServerCustomization.engineImage to use an authenticated registry.
The secrets should have the same format as Pod spec.imagePullSecrets.
See: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#registry-secret-existing-credentials

Note
Currently this list may contain at most 1 item. In the future,
multiple secrets may be supported.

We recommend the engine image be hosted in a different repository than the model
although they may be in the same registry.
If using different registries, ensure the JSON "auths" block lists both.

MaxItems: 1
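A sketch of a complete ModelEndpoint combining the spec fields above (all names, classes, and the OCI reference are placeholders):

```yaml
apiVersion: pais.vmware.com/v1alpha1
kind: ModelEndpoint
metadata:
  name: llama-3-8b
spec:
  type: Completions
  engine: vLLM
  routingName: llama-3-8b        # model "id" in the data plane API
  replicas: 1
  virtualMachineClassName: gpu-class-example       # must suit the chosen failureDomain
  storageClassName: storage-class-example
  model:
    ociRef: registry.example.com/models/llama-3-8b:v1   # pushed via the PAIS CLI
    pullSecrets:
    - name: registry-pull-secret # currently at most 1 item
```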

ModelEndpointStatus

ModelEndpointStatus defines the observed state of ModelEndpoint

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

conditions Condition array

conditions update as changes occur in the status.
They display the current state of the model endpoint

observedGeneration integer

observedGeneration describes the generation observed by the ModelEndpoint controller.

children ChildStatus array

children reports status information about child resources owned by this ModelEndpoint

controllerVersion string

controllerVersion reports a version string for the controller-manager which has most recently reconciled this resource

MinLength: 1

ModelType

Underlying type: string

ModelType describes the valid types of models.

Validation:
  • Enum: [Completions Embeddings]

ModelTypeWithEngine

ModelTypeWithEngine describes a type of model inference and an engine to use for that inference for ModelEndpoint

Appears In:
Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceEngine

engine describes what inferencing engine should be used when running a particular model

Enum: [Infinity vLLM LlamaCPP]

NvidiaGPURuntimeConfig

NvidiaGPURuntimeConfig defines NVIDIA GPU Driver software configuration (e.g. license config, nvcr.io image pull secrets etc.)

Appears In:
Field Description Default Validation

gpuDriverType GPUDriverType

gpuDriverType determines which type of NVIDIA GPU driver to use.

Allowed values are NVAIE (default) and OSS.

NVAIE supports both vGPU and passthrough devices, and requires a license key and pull secret from NVIDIA.
In this mode, the GPU Operator will be configured to use the proprietary vGPU-capable driver.

OSS supports passthrough devices (no vGPU support). In this case the licenseConfigRef must not be set.
In this mode, the GPU Operator will use its default driver configuration,
which pulls the open-source driver from public NVIDIA repositories.
You can still customize the driver configuration using gpuOperatorOverridesRef
to specify driver versions, repositories, etc.

NVAIE

Enum: [NVAIE OSS]

licenseConfigRef LocalObjectReference

licenseConfigRef names a ConfigMap in this namespace containing the NVIDIA license access token and NVIDIA GRID configuration.
ConfigMap data must have keys named client_configuration_token.tok and gridd.conf.
This pull secret is only for the driver image pull, and not for other gpu-operator container images.
This field is required when gpuDriverType is NVAIE or unset.
If gpuDriverType is OSS, then this field must not be set.

imagePullSecretRef LocalObjectReference

imagePullSecretRef names a Secret in this namespace containing an NGC access token ("personal key") to be used as an ImagePullSecret for access to NVIDIA GPU Operator container images (e.g. gpu-operator, vGPU driver). The Secret must be of type kubernetes.io/dockerconfigjson.

gpuOperatorOverridesRef LocalObjectReference

gpuOperatorOverridesRef names a ConfigMap in this namespace containing Helm chart values for Nvidia gpu-operator. ConfigMap data must have a key named values.yaml with content as overrides for gpu-operator Helm chart input parameters.
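A sketch of the nvidiaConfig block for the default NVAIE driver (all referenced names are placeholders for objects you create in this namespace):

```yaml
spec:
  nvidiaConfig:
    gpuDriverType: NVAIE         # default; requires license config and pull secret
    licenseConfigRef:
      name: nvidia-license       # ConfigMap with client_configuration_token.tok and gridd.conf
    imagePullSecretRef:
      name: ngc-pull-secret      # kubernetes.io/dockerconfigjson Secret with the NGC key
    gpuOperatorOverridesRef:
      name: gpu-operator-values  # optional ConfigMap with a values.yaml key
```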

OIDC

OIDC describes the details of the upstream OIDC provider Private AI Services will use

Appears In:
Field Description Default Validation

issuerUrl string

issuerUrl is the URL of an OpenID provider endpoint that publishes the metadata for clients to use to construct a request to an OpenID server.
NOTE: The provider must support the discovery endpoint in the form of issuerUrl + "/.well-known/openid-configuration"

Format: uri
MinLength: 1

scope string array

scope defines what scopes are requested when initiating the auth flow.

clientId string

clientId is the client ID used for communicating with the OIDC provider. The client must
be configured to support authorization flows with PKCE (Proof Key for Code Exchange) for
this Private AI Services instance.

MinLength: 1

extraAudiences string array

extraAudiences allows additional OAuth 2.0 clients (audiences) that Private AI Services will accept when validating the Access Token.

groupsClaim string

groupsClaim is an OIDC claim that the Private AI Services runtime expects to exist in the clientId token.
This claim should map to the groups claim of your upstream OIDC provider.

groups

MinLength: 1

authorizedGroups string array

authorizedGroups is the list of group names that are used to authorize user access
to this Private AI Services instance. If a non-empty list is provided, a successfully
authenticated user must be in at least one of the provided groups to be granted access.
If an empty list is provided, no group membership is required and any authenticated
user is granted access.
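A sketch of the OIDC fields above (the issuer, client ID, scopes, and group names are placeholders for your provider's values):

```yaml
spec:
  auth:
    oidc:
      issuerUrl: https://idp.example.com   # must serve /.well-known/openid-configuration
      clientId: pais-ui                    # client must be configured for PKCE
      scope: [openid, profile, groups]
      groupsClaim: groups                  # default
      authorizedGroups:
      - pais-admins                        # empty list grants any authenticated user
```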

ObservabilitySpec

ObservabilitySpec configures observability features

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

prometheusRuntime PrometheusRuntimeConfig

prometheusRuntime deploys and configures additional components to collect metrics from
the Private AI Services instance in this namespace.
If this value is unset, metrics collection is disabled and those additional components will not be deployed.

llmTraces LLMTracesConfig

llmTraces configures collection of traces from LLM components
and forwarding to an OpenTelemetry backend.
If unset, trace collection is disabled.

OpenTelemetryTransportProtocol

Underlying type: string

OpenTelemetryTransportProtocol defines the transport protocol for OpenTelemetry export. See https://opentelemetry.io/docs/specs/otlp

Validation:
  • Enum: [grpc http/protobuf]

Appears In:

PAISConfiguration

PAISConfiguration is the Schema for the paisconfigurations API

Field Description Default Validation

apiVersion string

pais.vmware.com/v1alpha1

kind string

PAISConfiguration

metadata ObjectMeta

Refer to Kubernetes API documentation for fields of metadata.

MinProperties: 1

PAISConfigurationSpec

PAISConfigurationSpec defines the desired state of the Private AI Services instance within this namespace

Appears In:
Field Description Default Validation

clientTls ClientTLS

clientTls configures TLS/SSL clients used by Private AI Services to connect to remote services

worker WorkerConfig

worker sets configuration for the Private AI Services Data Indexing and Retrieval workers

database Database

database defines the database connection parameters that the Hub API will use

auth Auth

auth defines the authentication configuration for the Private AI Services API and UI

ingress Ingress

ingress defines how the Private AI Services will be accessible
If unset, a Service of type LoadBalancer will be created to expose this Private AI Services instance

{ serviceType:LoadBalancer }

vksControlPlane VKSControlPlaneConfig

vksControlPlane specifies attributes of PAIS-managed vSphere Kubernetes Service cluster control plane

nvidiaConfig NvidiaGPURuntimeConfig

nvidiaConfig defines pointers to configuration needed for Nvidia NVAIE software

runtimeConfig RuntimeConfig

runtimeConfig defines the desired state for the Private AI Services runtime configuration

upgradeStrategy UpgradeStrategy

upgradeStrategy defines how this instance of Private AI Services will be upgraded
when a new version is installed into this Supervisor.

Currently only the "automatic" strategy is supported, which means that this instance will be
automatically upgraded to the latest version installed into this Supervisor.
Future versions of Private AI Services will provide namespace-users more control over this.

{ automatic:map[] }

observability ObservabilitySpec

observability configures metrics and trace collection for this instance of Private AI Services.
If unset, metrics and trace features are disabled for components in this namespace.

MinProperties: 1

defaultStorageClassName string

defaultStorageClassName sets the storage class used by components of this instance of Private AI Services.
New deployments should set this field (it is optional only for backwards compatibility).

MinLength: 1

PAISConfigurationStatus

PAISConfigurationStatus defines the observed state of PAISConfiguration

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

conditions Condition array

conditions update as changes occur to this resource

ingressServiceRef LocalObjectReference

ingressServiceRef references the Service where this instance of Private AI Services is reachable

observedGeneration integer

observedGeneration describes the generation of this resource observed by the PAISConfiguration controller.

children ChildStatus array

children reports status information about child resources owned by this PAISConfiguration

controllerVersion string

controllerVersion reports a version string for the controller-manager which has most recently reconciled this resource

MinLength: 1

PrometheusRuntimeConfig

PrometheusRuntimeConfig defines the desired state for the metrics collection configuration.

Appears In:
Field Description Default Validation

metricsRetention string

metricsRetention defines a limit on how long (in days) to keep observability metrics data.

90d

Pattern: ^[1-9][0-9]*[d]$

storageClassName string

storageClassName can be used to customize the storage class for metrics data.
If unset, spec.defaultStorageClassName will be used instead.

MinLength: 1

prometheusOverridesRef LocalObjectReference

prometheusOverridesRef is not yet supported.
Once supported, it will name a ConfigMap in this namespace with configuration overrides for
Prometheus components deployed to support metrics collection. The ConfigMap data must have a key
named values.yaml with content as overrides for Prometheus input parameters. This field is usually
not expected to be set and should be used only for advanced use cases where the Prometheus
configuration choices made by PAIS are insufficient or undesirable.

RoutingName

Underlying type: string

RoutingName is a user-readable string used to route to a particular model

Validation:
  • MaxLength: 253

  • MinLength: 1

RuntimeConfig

RuntimeConfig defines configuration/tuning parameters for all Private AI Services components

Appears In:
Field Description Default Validation

logVerbosity RuntimeConfigLogVerbosity

logVerbosity specifies the desired log verbosity used for all Private AI Services components.
Modify only if instructed by Broadcom support to collect detailed error logs.

Enum: [Debug Info Warning Error]

apiRuntimeConfig APIRuntimeConfig

apiRuntimeConfig specifies the desired state of the configuration for the Private AI Services API

indexingWorkersRuntimeConfig IndexingWorkersRuntimeConfig

indexingWorkersRuntimeConfig specifies the desired state of the configuration for the
Private AI Services Indexing workers

RuntimeConfigLogVerbosity

Underlying type: string

RuntimeConfigLogVerbosity describes the valid types of log levels of the Private AI Services API.

Validation:
  • Enum: [Debug Info Warning Error]

Appears In:

RuntimeDeploymentConfig

RuntimeDeploymentConfig defines configuration/tuning parameters for a Private AI Services component

Field Description Default Validation

replicas integer

replicas specifies the desired replicas for the Private AI Services component.

1

Minimum: 1

resources describes the desired compute resource requirements for the
Private AI Services component.

TLSVerification

Underlying type: string

TLSVerification describes how to verify TLS connections to the backend

Validation:
  • Enum: [strict caOnly none mutual]

Appears In:

UpgradeStrategy

UpgradeStrategy defines how this instance of Private AI Services will be upgraded

Appears In:
Field Description Default Validation

automatic AutomaticUpgradeStrategy

automatic means the Supervisor Service will manage upgrades of this instance of Private AI Services

manual ManualUpgradeStrategy

manual is currently unsupported

VKSControlPlaneConfig

VKSControlPlaneConfig defines configuration for the control plane of the VKS cluster used by Private AI Services.

Appears In:
Field Description Default Validation

virtualMachineClassName string

virtualMachineClassName specifies the virtual machine class to use for the control plane node of the VKS cluster.
Note this does not affect the virtual machine class used to run ModelEndpoint inference servers.

MinLength: 1

storageClassName string

storageClassName can be used to customize the storage of the control plane node of the
VKS cluster used by Private AI Services.
If unset, spec.defaultStorageClassName will be used instead.
Note this does not affect the storage used for ModelEndpoint inference servers.

MinLength: 1

upgradeStrategy VKSUpgradeStrategy

upgradeStrategy configures how the backing VKS cluster is upgraded.
Currently, only the Manual strategy is available.
In future releases, other upgrade strategies will be available.

If unset, the cluster upgrade strategy will not be under user control.

In the current version of Private AI Services, an unset value is equivalent to a Manual strategy using
a particular VKS version known to Private AI Services at the time it was released.

Future versions of Private AI Services may change the default behavior for this field, e.g. to automatically
upgrade the VKS cluster to the latest available patch release.

VKSManualUpgradeStrategy

VKSManualUpgradeStrategy describes the manual upgrade strategy for upgrading the VKS cluster

Appears In:
Field Description Default Validation

version string

version specifies the Kubernetes Release version to use for the VKS cluster hosting ModelEndpoints.

A valid version string is formatted like "v1.32.0+vmware.6-fips-vkr.2"

To list versions available in your Supervisor cluster, run
kubectl get kubernetesreleases | grep v1.3

Changing this field will cause temporary downtime for ModelEndpoints with only 1 replica.
This is because the VKS Cluster is deployed with maxUnavailable: 1 in order to not require
customers to over-provision scarce GPUs.

Please test changes in a non-production environment, and plan for an outage window.

Only use release versions which have been documented as supported for PAIS.

Warning: Other version strings, such as "v1.31" may be accepted by the API,
but could result in unexpected behavior, including unexpected downtime for ModelEndpoints
at arbitrary times in the future (not just when this field is changed).
This issue will be fixed in a future version.

It is also possible to specify the version by setting configuration on the PAIS
supervisor service when it is installed, via the key "supervisor_service.vks.version"
If both fields are set, then this namespace-scoped custom-resource has precedence over
the supervisor-service configuration.

MinLength: 1
Pattern: ^v([0-9]+)\.([0-9]+)[\.0-9a-zA-Z\-\+]+$
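As a sketch, pinning the VKS version via the manual strategy might look like the following (the version string is the documented example; confirm which releases are actually available in your Supervisor, and note the field layout is inferred from the types in this reference):

```yaml
spec:
  vksControlPlane:
    upgradeStrategy:
      manual:
        version: v1.32.0+vmware.6-fips-vkr.2
```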

VKSUpgradeStrategy

VKSUpgradeStrategy describes the strategy for upgrading the VKS cluster

Appears In:
Field Description Default Validation

manual VKSManualUpgradeStrategy

manual upgrade strategy means that the VKS Kubernetes release version is set by
the version field.

WorkerConfig

WorkerConfig defines the configuration for Data Indexing and Retrieval workers

Appears In:
Field Description Default Validation

storageClassName string

storageClassName is now deprecated.

Deprecated: Set the top-level spec.defaultStorageClassName instead.

As of Private AI Services 2.1, Data Indexing workers use ephemeral storage (emptyDir volumes).
However, the Valkey storage will fall back to this value if spec.defaultStorageClassName is unset.

MinLength: 1