API Reference

pais.vmware.com/v1alpha1

Package v1alpha1 contains API Schema definitions for the pais v1alpha1 API group

APIRuntimeConfig

APIRuntimeConfig defines configuration/tuning parameters for the Private AI Services API

Appears In:
Field Description Default Validation

deployment RuntimeDeploymentConfig

deployment specifies the desired configuration for the Private AI Services API. Modify only if
instructed by Broadcom support to increase concurrent requests.

Auth

Auth describes the authentication backend for Private AI Services

Appears In:
Field Description Default Validation

oidc OIDC

oidc defines the OpenID Connect connection details that the Private AI Services UI will use to authenticate users

AutomaticUpgradeStrategy

AutomaticUpgradeStrategy means the Supervisor Service will manage upgrades

Appears In:

BackendAuth

BackendAuth configures authentication when connecting to a backend

Field Description Default Validation

apiTokenRef LocalObjectReference

apiTokenRef may be used to provide a bearer token when making HTTPS requests to the backend.
It must name a Secret in this namespace with type "pais.vmware.com/api-token-credentials"
that contains a key "api_token".
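As an illustrative sketch, a Secret satisfying these requirements could look like the following (the Secret name and token value are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: backend-api-token            # placeholder; referenced by apiTokenRef.name
type: pais.vmware.com/api-token-credentials
stringData:
  api_token: "example-bearer-token"  # placeholder bearer token sent to the backend
```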

BackendTLS

BackendTLS configures TLS when connecting to a backend

Field Description Default Validation

verification TLSVerification

verification determines how to validate the HTTPS connection to the backend.

Add certificate authorities to the PAISConfiguration.spec.clientTls.caBundleRefs in this namespace
so that they are trusted.

  • strict does one-way TLS with full strict validation (default)

  • caOnly does one-way TLS and validates the server certificate chain, but allows a mismatch in server name

  • none does one-way TLS but does not validate the server certificate chain at all.
    It is insecure and should not be used in production.

  • mutual should only be used for local models managed by this instance of Private AI Services.
    It does a full mutual-TLS handshake using system-managed certificates.

Enum: [strict caOnly none mutual]
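For example, a sketch of relaxing server-name checking for a backend whose certificate chain is trusted but whose hostname does not match (the surrounding backend fields are omitted):

```yaml
tls:
  verification: caOnly   # validate the certificate chain, tolerate a server-name mismatch
```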

ChildStatus

ChildStatus is the schema for child resources of PAISConfiguration

Field Description Default Validation

apiGroup string

APIGroup is the group for the resource being referenced.
If APIGroup is not specified, the specified Kind must be in the core API group.
For any other third-party types, APIGroup is required.

kind string

Kind is the type of resource being referenced

name string

Name is the name of resource being referenced

observedGeneration integer

observedGeneration describes the generation of this child observed by the PAISConfiguration controller

ClientTLS

ClientTLS configures TLS/SSL clients used by Private AI Services to connect to remote services

Appears In:
Field Description Default Validation

caBundleRefs LocalObjectReference array

caBundleRefs specifies what certificates Private AI Services will trust when connecting to remote servers over TLS.

Elements of this list must name ConfigMaps in the current namespace,
with a key ca.crt that contains a PEM-encoded certificate bundle.

In addition to being provided to PAIS pods, these CA bundles are included in VKS cluster
nodes' osConfiguration, allowing them to be used to pull container images from private registries.

Note that if this list of ConfigMaps is changed, the PAISConfiguration will be reconciled,
and the changes will propagate to the VKS cluster nodes. However, simply modifying the contents
of the ConfigMaps will not cause a reconciliation, and the changes will not propagate.

Be aware that changes to this list with an existing cluster will result in a rollout
of the cluster, which may affect availability of single-replica ModelEndpoints.
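A sketch of a CA bundle ConfigMap matching these requirements (the name and certificate contents are placeholders):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: private-registry-ca    # placeholder name
data:
  ca.crt: |                    # required key; PEM-encoded bundle
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```

It would then be listed under PAISConfiguration.spec.clientTls.caBundleRefs as an entry with name: private-registry-ca.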

DBPasswordRef

DBPasswordRef describes the database connection secret reference to connect to the database

Appears In:
Field Description Default Validation

name string

name of a Secret in this namespace

MinLength: 1

fieldPath string

fieldPath is the name of the key within the Secret containing the password.

In addition to this key for the password, there should also be
an additional key called "ca.crt" which contains the Certificate Authority
to trust when verifying the TLS connection to the database.

MinLength: 1
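A sketch of a matching Secret (name, key, and values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: pg-credentials         # placeholder; referenced by passwordRef.name
type: Opaque
stringData:
  password: "example-password" # key name must match passwordRef.fieldPath
  ca.crt: |                    # CA to trust when verifying the TLS connection to the database
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
```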

Database

Appears In:
Field Description Default Validation

host string

host is the network hostname of a PostgreSQL server to use

MinLength: 1

port integer

port is the TCP port to connect to on the database server
If unset, the default Postgres port (5432) will be used

5432

username string

username to use when connecting to the database server

MinLength: 1

passwordRef DBPasswordRef

passwordRef is a reference to a Secret in this namespace containing the password for this database user

dbname string

dbname is the name of the logical database to use within the server

MinLength: 1

sslMode DatabaseSslMode

sslMode configures how to validate the connection with the database server

Enum: [VerifyFull VerifyCA Require Allow]
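Assembled into a PAISConfiguration, the fields above might look like this sketch (hostname, user, and database name are placeholders):

```yaml
spec:
  database:
    host: postgres.example.com
    port: 5432                 # default Postgres port
    username: pais
    dbname: pais
    sslMode: VerifyFull
    passwordRef:
      name: pg-credentials     # Secret in this namespace
      fieldPath: password      # key within that Secret holding the password
```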

DatabaseSslMode

Underlying type: string

DatabaseSslMode describes how the Private AI Services instance validates the SSL connection to the database.

Validation:
  • Enum: [VerifyFull VerifyCA Require Allow]

Appears In:

EnvVar

EnvVar represents an environment variable present in a Container.

NOTE: We do not use corev1.EnvVar, as we cannot implement all sources of data that it supports in EnvVarSource (since we'd have to mount these values into the VKS cluster).

NOTE: Immutability for both properties is already provided by the struct referencing this type. Adding the XValidation rules here as well exceeds the complexity allowed by the k8s API.

Field Description Default Validation

name string

name is the key for an environment variable override to be passed to the inference engine

MaxLength: 128
MinLength: 1

value string

value is the value for an environment variable override to be passed to the inference engine

GPUDriverType

Underlying type: string

GPUDriverType defines types of GPU drivers

Validation:
  • Enum: [NVAIE OSS]

IndexingWorkersRuntimeConfig

IndexingWorkersRuntimeConfig defines configuration/tuning parameters for all Private AI Services indexing workers

Appears In:
Field Description Default Validation

deployment RuntimeDeploymentConfig

deployment specifies the desired configuration for the Private AI Services workers performing
indexing tasks. Modify only if instructed by Broadcom support to increase indexing
throughput.

workerThreads integer

workerThreads specifies the desired number of threads for each Private AI Services worker performing
indexing tasks. Modify only if instructed by Broadcom support to increase indexing
throughput.

10

Maximum: 100
Minimum: 1

workerRateLimit string

workerRateLimit specifies the desired rate-limit at which the Private AI Services workers performing
indexing tasks pick up individual tasks for processing. Modify only if instructed by
Broadcom support to increase indexing throughput.

100/s

MinLength: 1
Pattern: ^[1-9][0-9]*/[smh]
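As a sketch, the indexing worker tuning fields above could be set like this (the values shown are the documented defaults; change them only as instructed by Broadcom support):

```yaml
spec:
  runtimeConfig:
    indexingWorkersRuntimeConfig:
      workerThreads: 10        # 1-100
      workerRateLimit: 100/s   # per second (s), minute (m), or hour (h)
```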

InferenceEngine

Underlying type: string

InferenceEngine describes the valid types of engines for ModelEndpoint.

Validation:
  • Enum: [Infinity vLLM LlamaCPP]

InferenceGatewayRoute

InferenceGatewayRoute describes a routing rule for the Inference Gateway

Field Description Default Validation

apiVersion string

pais.vmware.com/v1alpha1

kind string

InferenceGatewayRoute

metadata ObjectMeta

Refer to Kubernetes API documentation for fields of metadata.

MinProperties: 1

InferenceGatewayRouteBackend

InferenceGatewayRouteBackend describes a model running on an inference server either in this namespace or elsewhere.

Field Description Default Validation

httpBaseUrl string

httpBaseUrl defines the base url of the server hosting the model to use.

For ModelEndpoints managed by the Private AI Services instance within this namespace
the Private AI Services controller will set this to the name of the Kubernetes Service
for that ModelEndpoint (which is itself managed by this instance of Private AI Services.)

For ModelEndpoints managed by a Private AI Services instance in a different namespace,
this field should be set based on that other ingress Service’s address, e.g.
https://pais-ingress-default.other-namespace/api/v1/compatibility/openai
Inspect the PAISConfiguration.status.ingressServiceRef for that other namespace to get the
service name.

To use a model hosted on a remote API (e.g. cloud hosted model), provide the base URL of
that API, e.g. https://api.anthropic.com/ or https://api.openai.com/

Do not include the /v1 suffix: Private AI Services will append that.

Format: uri
MinLength: 1

modelId string

modelId defines the name of the model used inside a request sent to the inference server.

For requests to ModelEndpoints managed by the Private AI Services instance in this namespace,
this can be an arbitrary string as long as the inference server was started with this identifier
(using PAIH_MODEL_ID).

For models managed by a Private AI Services instance in another namespace,
this must be the "routingName" of that remote ModelEndpoint.

For a remote API (e.g. a cloud hosted model) this should be the "modelId" defined by that API.

pais

MinLength: 1

tls BackendTLS

tls configures transport level security for the HTTPS connection to this backend.

{ verification:strict }

auth BackendAuth

auth configures authentication to this backend
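Putting the backend fields together, a sketch of a route to a cloud-hosted OpenAI-compatible model (the resource name, routingName, model ID, and Secret name are placeholders; the field layout is inferred from the types in this reference):

```yaml
apiVersion: pais.vmware.com/v1alpha1
kind: InferenceGatewayRoute
metadata:
  name: gpt-4o-route
spec:
  type: Completions
  engine: OpenAI                           # generic OpenAI-compatible engine
  matches:
    routingName: gpt-4o                    # name this instance routes on
  backend:
    httpBaseUrl: https://api.openai.com/   # do not include the /v1 suffix
    modelId: gpt-4o
    tls:
      verification: strict
    auth:
      apiTokenRef:
        name: openai-api-token             # Secret of type pais.vmware.com/api-token-credentials
```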

InferenceGatewayRouteEngine

Underlying type: string

InferenceGatewayRouteEngine describes the valid types of engines for InferenceGatewayRoute.

Validation:
  • Enum: [Infinity vLLM LlamaCPP OpenAI]

InferenceGatewayRouteMatches

InferenceGatewayRouteMatches describes the matching rules for a route

Field Description Default Validation

routingName RoutingName

routingName is the name that this namespace’s instance of Private AI Services
will use to route requests to this model. It may be different than the backend modelId.

MaxLength: 253
MinLength: 1

InferenceGatewayRouteModelTypeWithEngine

InferenceGatewayRouteModelTypeWithEngine describes a type of model inference and an engine to use for that inference for InferenceGatewayRoute

Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceGatewayRouteEngine

engine describes what inference engine is running this model.

For a remote model where you don’t know the engine, you may set "OpenAI"
to treat it as a generic engine compatible with the OpenAI API.

Enum: [Infinity vLLM LlamaCPP OpenAI]

InferenceGatewayRouteSpec

InferenceGatewayRouteSpec specifies the details of the routing rule

Appears In:
Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceGatewayRouteEngine

engine describes what inference engine is running this model.

For a remote model where you don’t know the engine, you may set "OpenAI"
to treat it as a generic engine compatible with the OpenAI API.

Enum: [Infinity vLLM LlamaCPP OpenAI]

matches InferenceGatewayRouteMatches

matches describes how traffic will get routed to this model by the local instance of Private AI Services

backend InferenceGatewayRouteBackend

backend describes where inference requests should be forwarded to

InferenceGatewayRouteStatus

InferenceGatewayRouteStatus reports the current status of an InferenceGatewayRoute

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

conditions Condition array

conditions update as changes occur in the status.

InferenceServerCustomization

InferenceServerCustomization describes the extra customization that can be provided to the inference server

Appears In:
Field Description Default Validation

cliArgs string array

cliArgs describe additional command-line arguments to append when starting the inference engine

envVars EnvVar array

envVars describe additional environment variables to set when starting the inference engine

MaxItems: 1024

engineImage string

engineImage will override the inference server container image.
This can allow use of an inference engine not included in this release of
Private AI Services. But use this feature at your own risk.
Broadcom cannot support arbitrary customer-provided engine images.
In particular, be aware of potential version mismatches with node and host drivers.
Also note that Private AI Services sets some command-line flags when running
the inference server engine that may not work for you.

MinLength: 1

engineImageCompressedSize Quantity

engineImageCompressedSize should be set to the compressed size of the engineImage, if that field is set.

The compressed size of an image is the sum of the layers, and is typically displayed on the web UI of container registries like Docker Hub.
To find the compressed size of a container image you’ve pulled locally, run:
docker manifest inspect vllm/vllm-openai:v0.9.1 | jq '[.layers[].size] | add' | numfmt --to=iec-i

This field is used when sizing the /var/lib/containerd mount on the VKS worker nodes hosting this ModelEndpoint.
The formula for setting the full mount size may vary in the future.
Currently we configure containerdMountSize = 32Gi + 3 * engineImageCompressedSize
to account for other (non-engine) images, the compression ratio, and the fact that
both compressed and uncompressed data is stored in /var/lib/containerd.

In future versions of Private AI Services, this field may no longer be required and may be deprecated.

15Gi

sharedMemoryMountSize Quantity

sharedMemoryMountSize determines the size of the /dev/shm mount point inside
the inference engine container. If unset, it will use the Kubernetes default of 64Mi.

64Mi

tempMountSize Quantity

tempMountSize determines the size in bytes of the /tmp mount point available to the inference server.
If unset, it will default to 1Gi.

1Gi

Ingress

Ingress defines the desired state for how the Private AI Services runtime is accessible in the cluster

Appears In:
Field Description Default Validation

serviceType ServiceType

serviceType determines how the Private AI Services runtime will be exposed as a Kubernetes Service.
Defaults to LoadBalancer, which creates a Service of type LoadBalancer.
Valid options are ClusterIP and LoadBalancer. See Service.Spec.Type for details.

LoadBalancer

Enum: [ClusterIP LoadBalancer]

LLMTracesConfig

LLMTracesConfig defines the configuration for trace collection.

Appears In:
Field Description Default Validation

endpoint string

endpoint specifies the target URL or address for the OpenTelemetry backend
where traces should be sent.

Format: uri
MinLength: 1

protocol OpenTelemetryTransportProtocol

protocol specifies the OpenTelemetry transport protocol.

Enum: [grpc http/protobuf]

projectName string

projectName specifies the "openinference.project.name" resource attribute in accordance
with the OpenInference specification.

MinLength: 1

headersSecretRef SecretKeySelector

headersSecretRef selects a key field within a Secret in this namespace
that should contain HTTP headers to be sent to the OpenTelemetry backend.

The value should be a semicolon-separated list of HTTP headers
in the format "Header-Name=header-value".

For example:

Authorization=Bearer%20token123; X-Custom-Header=custom-value
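A sketch of the headers Secret described above (the name and header values are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: otel-headers
type: Opaque
stringData:
  headers: "Authorization=Bearer%20token123; X-Custom-Header=custom-value"
```

It would then be selected from spec.observability.llmTraces via headersSecretRef with name: otel-headers and key: headers, alongside the endpoint, protocol, and projectName fields.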

ManualUpgradeStrategy

ManualUpgradeStrategy is currently unsupported

Appears In:

ModelEndpoint

ModelEndpoint is a request to serve an AI model using a particular engine on 1 or more VMs

Field Description Default Validation

apiVersion string

pais.vmware.com/v1alpha1

kind string

ModelEndpoint

metadata ObjectMeta

Refer to Kubernetes API documentation for fields of metadata.

MinProperties: 1

ModelEndpointSpec

ModelEndpointSpec defines the desired state of ModelEndpoint

Appears In:
Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceEngine

engine describes what inferencing engine should be used when running a particular model

Enum: [Infinity vLLM LlamaCPP]

model ModelEndpointSpecModel

model describes the model which should be run for inference

replicas integer

replicas describes how many instances of this model should be running

Note that currently, all replicas of a ModelEndpoint will run in a single vSphere Zone
which may be customized by spec.failureDomain.

1

Minimum: 0

routingName RoutingName

routingName defines how this model will appear in the data plane API
For example, the model "id" in /api/v1/compatibility/openai/v1/models
and elsewhere in the data plane API

MaxLength: 253
MinLength: 1

virtualMachineClassName string

virtualMachineClassName specifies the virtual machine class to use for running this model endpoint
This is used to create the virtual machine for a node pool in the VKS cluster.

Note this value may interact with FailureDomain. Ensure your chosen FailureDomain has hardware
sufficient to support this choice.

MinLength: 1

storageClassName string

storageClassName specifies the storage class to use for running this model endpoint
This is used to create the virtual machine for a node pool in the VKS cluster

MinLength: 1

failureDomain string

failureDomain specifies the failure domain (vSphere Zone) to use for running this model endpoint.
This string should be the metadata.name of a topology.tanzu.vmware.com Zone resource in this namespace.
This is used to create the virtual machine(s) for the VKS node pool running this model.

This choice of Zone may limit the available hardware (e.g. GPUs) available for this ModelEndpoint.
See also spec.VirtualMachineClassName.

Note that currently, all replicas of a ModelEndpoint will run in this 1 specified failureDomain.

If this value is unset, the system will fallback to a default value, which may be customized via the special
annotation "pais.vmware.com/default-failure-domain" on the PAISConfiguration resource in this namespace.
That annotation value (if set) should be the metadata.name of a topology.tanzu.vmware.com Zone resource
in this namespace.
If unset, but Zones are present in the namespace, then the PAISConfiguration controller will
initialize that annotation on the PAISConfiguration. A user may modify it.
ModelEndpoint spec.failureDomain will always take precedence over that annotation.

MinLength: 1

inferenceServerCustomization InferenceServerCustomization

inferenceServerCustomization describes additional customization that can be appended
when starting the inference engine

{ }

overrides string

overrides is not yet implemented.

Once implemented, it will enable a user to provide ytt overlays to customize the
inference components, including the underlying node settings.
Values provided here will take precedence over all other fields on this resource.
Users can easily break their system by setting this field, and should be discouraged from using it unless necessary.

ModelEndpointSpecModel

ModelEndpointSpecModel describes a model which should be run for inference

Appears In:
Field Description Default Validation

ociRef string

ociRef is a reference to an OCI artifact containing the model to run for inference.
We expect that this artifact is pushed using the Private AI Services CLI (vcf pais models push ...)

MaxLength: 1024
MinLength: 1

pullSecrets LocalObjectReference array

pullSecrets describe a list of references to Kubernetes secrets to use
when pulling this model from the remote OCI registry (e.g. Harbor).
The same secret is also available for pulling the engine image, e.g.
when setting spec.inferenceServerCustomization.engineImage to use an authenticated registry.
The secrets should have the same format as Pod spec.imagePullSecrets.
See: https://kubernetes.io/docs/tasks/configure-pod-container/pull-image-private-registry/#registry-secret-existing-credentials

Note
Currently this list may contain at most 1 item. In the future,
multiple secrets may be supported.

We recommend the engine image be hosted in a different repository than the model
although they may be in the same registry.
If using different registries, ensure the JSON "auths" block lists both.

MaxItems: 1
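A sketch of a complete ModelEndpoint combining the spec fields above (all names, classes, and the OCI reference are placeholders):

```yaml
apiVersion: pais.vmware.com/v1alpha1
kind: ModelEndpoint
metadata:
  name: llama-3-8b
spec:
  type: Completions
  engine: vLLM
  routingName: llama-3-8b        # model "id" in the data plane API
  replicas: 1
  virtualMachineClassName: gpu-class-example       # must suit the chosen failureDomain
  storageClassName: storage-class-example
  model:
    ociRef: registry.example.com/models/llama-3-8b:v1   # pushed via the PAIS CLI
    pullSecrets:
    - name: registry-pull-secret # currently at most 1 item
```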

ModelEndpointStatus

ModelEndpointStatus defines the observed state of ModelEndpoint

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

conditions Condition array

conditions update as changes occur in the status.
They display the current state of the model endpoint

observedGeneration integer

observedGeneration describes the generation observed by the ModelEndpoint controller.

children ChildStatus array

children reports status information about child resources owned by this ModelEndpoint

controllerVersion string

controllerVersion reports a version string for the controller-manager which has most recently reconciled this resource

MinLength: 1

ModelType

Underlying type: string

ModelType describes the valid types of models.

Validation:
  • Enum: [Completions Embeddings]

ModelTypeWithEngine

ModelTypeWithEngine describes a type of model inference and an engine to use for that inference for ModelEndpoint

Appears In:
Field Description Default Validation

type ModelType

type defines if the model is designed for completions or embeddings

Enum: [Completions Embeddings]

engine InferenceEngine

engine describes what inferencing engine should be used when running a particular model

Enum: [Infinity vLLM LlamaCPP]

NvidiaGPURuntimeConfig

NvidiaGPURuntimeConfig defines NVIDIA GPU Driver software configuration (e.g. license config, nvcr.io image pull secrets etc.)

Appears In:
Field Description Default Validation

gpuDriverType GPUDriverType

gpuDriverType determines which type of NVIDIA GPU driver to use.

Allowed values are NVAIE (default) and OSS.

NVAIE supports both vGPU and passthrough devices, and requires a license key and pull secret from NVIDIA.
In this mode, the GPU Operator will be configured to use the proprietary vGPU-capable driver.

OSS supports passthrough devices (no vGPU support). In this case the licenseConfigRef must not be set.
In this mode, the GPU Operator will use its default driver configuration,
which pulls the open-source driver from public NVIDIA repositories.
You can still customize the driver configuration using gpuOperatorOverridesRef
to specify driver versions, repositories, etc.

NVAIE

Enum: [NVAIE OSS]

licenseConfigRef LocalObjectReference

licenseConfigRef names a ConfigMap in this namespace containing the NVIDIA license access token and NVIDIA GRID configuration.
ConfigMap data must have keys named client_configuration_token.tok and gridd.conf.
This pull secret is only for the driver image pull, and not for other gpu-operator container images.
This field is required when gpuDriverType is NVAIE or unset.
If gpuDriverType is OSS, then this field must not be set.

imagePullSecretRef LocalObjectReference

imagePullSecretRef names a Secret in this namespace containing an NGC access token ("personal key") to be used as an ImagePullSecret for access to NVIDIA GPU Operator container images (e.g. gpu-operator, vGPU driver). The Secret must be of type kubernetes.io/dockerconfigjson.

gpuOperatorOverridesRef LocalObjectReference

gpuOperatorOverridesRef names a ConfigMap in this namespace containing Helm chart values for Nvidia gpu-operator. ConfigMap data must have a key named values.yaml with content as overrides for gpu-operator Helm chart input parameters.
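A sketch of the nvidiaConfig block for the default NVAIE driver (all referenced names are placeholders for objects you create in this namespace):

```yaml
spec:
  nvidiaConfig:
    gpuDriverType: NVAIE         # default; requires license config and pull secret
    licenseConfigRef:
      name: nvidia-license       # ConfigMap with client_configuration_token.tok and gridd.conf
    imagePullSecretRef:
      name: ngc-pull-secret      # kubernetes.io/dockerconfigjson Secret with the NGC key
    gpuOperatorOverridesRef:
      name: gpu-operator-values  # optional ConfigMap with a values.yaml key
```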

OIDC

OIDC describes the details of the upstream OIDC provider Private AI Services will use

Appears In:
Field Description Default Validation

issuerUrl string

issuerUrl is the URL of an OpenID provider endpoint that publishes the metadata for clients to use to construct a request to an OpenID server.
NOTE: The provider must support the discovery endpoint in the form of issuerUrl + "/.well-known/openid-configuration"

Format: uri
MinLength: 1

scope string array

scope defines what scopes are requested when initiating the auth flow.

clientId string

clientId is the client ID used for communicating with the OIDC provider. The client must
be configured to support authorization flows with PKCE (Proof Key for Code Exchange) for
this Private AI Services instance.

MinLength: 1

extraAudiences string array

extraAudiences allows additional OAuth 2.0 clients (audiences) that Private AI Services will accept when validating the Access Token.

groupsClaim string

groupsClaim is an OIDC claim that the Private AI Services runtime expects to exist in the clientId token.
This claim should map to the groups claim of your upstream OIDC provider.

groups

MinLength: 1

authorizedGroups string array

authorizedGroups is the list of group names that are used to authorize user access
to this Private AI Services instance. If a non-empty list is provided, a successfully
authenticated user must be in at least one of the provided groups to be granted access.
If an empty list is provided, no group membership is required and any authenticated
user is granted access.
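A sketch of the OIDC fields above (the issuer, client ID, scopes, and group names are placeholders for your provider's values):

```yaml
spec:
  auth:
    oidc:
      issuerUrl: https://idp.example.com   # must serve /.well-known/openid-configuration
      clientId: pais-ui                    # client must be configured for PKCE
      scope: [openid, profile, groups]
      groupsClaim: groups                  # default
      authorizedGroups:
      - pais-admins                        # empty list grants any authenticated user
```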

ObservabilitySpec

ObservabilitySpec configures observability features

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

prometheusRuntime PrometheusRuntimeConfig

prometheusRuntime deploys and configures additional components to collect metrics from
the Private AI Services instance in this namespace.
If this value is unset, metrics collection is disabled and those additional components will not be deployed.

llmTraces LLMTracesConfig

llmTraces configures collection of traces from LLM components
and forwarding to an OpenTelemetry backend.
If unset, trace collection is disabled.

OpenTelemetryTransportProtocol

Underlying type: string

OpenTelemetryTransportProtocol defines the transport protocol for OpenTelemetry export. See https://opentelemetry.io/docs/specs/otlp

Validation:
  • Enum: [grpc http/protobuf]

Appears In:

PAISConfiguration

PAISConfiguration is the Schema for the paisconfigurations API

Field Description Default Validation

apiVersion string

pais.vmware.com/v1alpha1

kind string

PAISConfiguration

metadata ObjectMeta

Refer to Kubernetes API documentation for fields of metadata.

MinProperties: 1

PAISConfigurationSpec

PAISConfigurationSpec defines the desired state of the Private AI Services instance within this namespace

Appears In:
Field Description Default Validation

clientTls ClientTLS

clientTls configures TLS/SSL clients used by Private AI Services to connect to remote services

worker WorkerConfig

worker sets configuration for the Private AI Services Data Indexing and Retrieval workers

database Database

database defines the database connection parameters that the Hub API will use

auth Auth

auth defines the authentication configuration for the Private AI Services API and UI

ingress Ingress

ingress defines how the Private AI Services will be accessible
If unset, a Service of type LoadBalancer will be created to expose this Private AI Services instance

{ serviceType:LoadBalancer }

vksControlPlane VKSControlPlaneConfig

vksControlPlane specifies attributes of PAIS-managed vSphere Kubernetes Service cluster control plane

nvidiaConfig NvidiaGPURuntimeConfig

nvidiaConfig defines pointers to configuration needed for Nvidia NVAIE software

runtimeConfig RuntimeConfig

runtimeConfig defines the desired state for the Private AI Services runtime configuration

upgradeStrategy UpgradeStrategy

upgradeStrategy defines how this instance of Private AI Services will be upgraded
when a new version is installed into this Supervisor.

Currently only the "automatic" strategy is supported, which means that this instance will be
automatically upgraded to the latest version installed into this Supervisor.
Future versions of Private AI Services will provide namespace-users more control over this.

{ automatic:map[] }

observability ObservabilitySpec

observability configures metrics and trace collection for this instance of Private AI Services.
If unset, metrics and trace features are disabled for components in this namespace.

MinProperties: 1

defaultStorageClassName string

defaultStorageClassName sets the storage class used by components of this instance of Private AI Services.
New deployments should set this field (it is optional only for backwards compatibility).

MinLength: 1

PAISConfigurationStatus

PAISConfigurationStatus defines the observed state of PAISConfiguration

Validation:
  • MinProperties: 1

Appears In:
Field Description Default Validation

conditions Condition array

conditions update as changes occur to this resource

ingressServiceRef LocalObjectReference

ingressServiceRef references the Service where this instance of Private AI Services is reachable

observedGeneration integer

observedGeneration describes the generation of this resource observed by the PAISConfiguration controller.

children ChildStatus array

children reports status information about child resources owned by this PAISConfiguration

controllerVersion string

controllerVersion reports a version string for the controller-manager which has most recently reconciled this resource

MinLength: 1

PrometheusRuntimeConfig

PrometheusRuntimeConfig defines the desired state for the metrics collection configuration.

Appears In:
Field Description Default Validation

metricsRetention string

metricsRetention defines a limit on how long (in days) to keep observability metrics data.

90d

Pattern: ^[1-9][0-9]*[d]$

storageClassName string

storageClassName can be used to customize the storage class for metrics data.
If unset, spec.defaultStorageClassName will be used instead.

MinLength: 1

prometheusOverridesRef LocalObjectReference

prometheusOverridesRef is not yet supported.
Once supported, it will name a ConfigMap in this namespace with configuration overrides for
Prometheus components deployed to support metrics collection. The ConfigMap data must have a key
named values.yaml with content as overrides for Prometheus input parameters. This field is usually
not expected to be set and should be used only for advanced use cases where the Prometheus
configuration choices made by PAIS are insufficient or undesirable.

RoutingName

Underlying type: string

RoutingName is a user-readable string used to route to a particular model

Validation:
  • MaxLength: 253

  • MinLength: 1

RuntimeConfig

RuntimeConfig defines configuration/tuning parameters for all Private AI Services components

Appears In:
Field Description Default Validation

logVerbosity RuntimeConfigLogVerbosity

logVerbosity specifies the desired log verbosity used for all Private AI Services components.
Modify only if instructed by Broadcom support to collect detailed error logs.

Enum: [Debug Info Warning Error]

apiRuntimeConfig APIRuntimeConfig

apiRuntimeConfig specifies the desired state of the configuration for the Private AI Services API

indexingWorkersRuntimeConfig IndexingWorkersRuntimeConfig

indexingWorkersRuntimeConfig specifies the desired state of the configuration for the
Private AI Services Indexing workers

RuntimeConfigLogVerbosity

Underlying type: string

RuntimeConfigLogVerbosity describes the valid types of log levels of the Private AI Services API.

Validation:
  • Enum: [Debug Info Warning Error]

Appears In:

RuntimeDeploymentConfig

RuntimeDeploymentConfig defines configuration/tuning parameters for a Private AI Services component

Field Description Default Validation

replicas integer

replicas specifies the desired replicas for the Private AI Services component.

1

Minimum: 1

resources describes the desired compute resource requirements for the
Private AI Services component.

TLSVerification

Underlying type: string

TLSVerification describes how to verify TLS connections to the backend

Validation:
  • Enum: [strict caOnly none mutual]

Appears In:

UpgradeStrategy

UpgradeStrategy defines how this instance of Private AI Services will be upgraded

Appears In:
Field Description Default Validation

automatic AutomaticUpgradeStrategy

automatic means the Supervisor Service will manage upgrades of this instance of Private AI Services

manual ManualUpgradeStrategy

manual is currently unsupported

VKSControlPlaneConfig

VKSControlPlaneConfig defines configuration for the control plane of the VKS cluster used by Private AI Services.

Appears In:
Field Description Default Validation

virtualMachineClassName string

virtualMachineClassName specifies the virtual machine class to use for the control plane node of the VKS cluster.
Note this does not affect the virtual machine class used to run ModelEndpoint inference servers.

MinLength: 1

storageClassName string

storageClassName can be used to customize the storage of the control plane node of the
VKS cluster used by Private AI Services.
If unset, spec.defaultStorageClassName will be used instead.
Note this does not affect the storage used for ModelEndpoint inference servers.

MinLength: 1

upgradeStrategy VKSUpgradeStrategy

upgradeStrategy configures how the backing VKS cluster is upgraded.
Currently, only the Manual strategy is available.
In future releases, other upgrade strategies will be available.

If unset, the cluster upgrade strategy will not be under user control.

In the current version of Private AI Services, an unset value is equivalent to a Manual strategy using
a particular VKS version known to Private AI Services at the time it was released.

Future versions of Private AI Services may change the default behavior for this field, e.g. to automatically
upgrade the VKS cluster to the latest available patch release.

VKSManualUpgradeStrategy

VKSManualUpgradeStrategy describes the manual upgrade strategy for upgrading the VKS cluster

Appears In:
Field Description Default Validation

version string

version specifies the Kubernetes Release version to use for the VKS cluster hosting ModelEndpoints.

A valid version string is formatted like "v1.32.0+vmware.6-fips-vkr.2"

To list versions available in your Supervisor cluster, run
kubectl get kubernetesreleases | grep v1.3

Changing this field will cause temporary downtime for ModelEndpoints with only 1 replica.
This is because the VKS Cluster is deployed with maxUnavailable: 1 in order to not require
customers to over-provision scarce GPUs.

Please test changes in a non-production environment, and plan for an outage window.

Only use release versions which have been documented as supported for PAIS.

Warning: Other version strings, such as "v1.31" may be accepted by the API,
but could result in unexpected behavior, including unexpected downtime for ModelEndpoints
at arbitrary times in the future (not just when this field is changed).
This issue will be fixed in a future version.

It is also possible to specify the version by setting configuration on the PAIS
supervisor service when it is installed, via the key "supervisor_service.vks.version"
If both fields are set, then this namespace-scoped custom-resource has precedence over
the supervisor-service configuration.

MinLength: 1
Pattern: ^v([0-9]+)\.([0-9]+)[\.0-9a-zA-Z\-\+]+$
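As a sketch, pinning the VKS version via the manual strategy might look like the following (the version string is the documented example; confirm which releases are actually available in your Supervisor, and note the field layout is inferred from the types in this reference):

```yaml
spec:
  vksControlPlane:
    upgradeStrategy:
      manual:
        version: v1.32.0+vmware.6-fips-vkr.2
```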

VKSUpgradeStrategy

VKSUpgradeStrategy describes the strategy for upgrading the VKS cluster

Appears In:
Field Description Default Validation

manual VKSManualUpgradeStrategy

manual upgrade strategy means that the VKS Kubernetes release version is set by
the version field.

WorkerConfig

WorkerConfig defines the configuration for Data Indexing and Retrieval workers

Appears In:
Field Description Default Validation

storageClassName string

storageClassName is now deprecated.

Deprecated: Set the top-level spec.defaultStorageClassName instead.

As of Private AI Services 2.1, Data Indexing workers use ephemeral storage (emptyDir volumes).
However, the Valkey storage will fall back to this value if spec.defaultStorageClassName is unset.

MinLength: 1