API Reference
pais.vmware.com/v1alpha1
Package v1alpha1 contains API Schema definitions for the pais v1alpha1 API group
APIRuntimeConfig
APIRuntimeConfig defines configuration/tuning parameters for the Private AI Services API
| Field | Description | Default | Validation |
|---|---|---|---|
| `deployment` | deployment specifies the desired configuration for the Private AI Services API. | | |
Auth
Auth describes the authentication backend for Private AI Services
| Field | Description | Default | Validation |
|---|---|---|---|
| `oidc` | oidc defines the OpenID Connect connection details that the Private AI Services UI will use to authenticate users. | | |
AutomaticUpgradeStrategy
AutomaticUpgradeStrategy means the Supervisor Service will manage upgrades
BackendAuth
BackendAuth configures authentication when connecting to a backend
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiTokenRef` | apiTokenRef may be used to provide a bearer token when making HTTPS requests to the backend. | | |
BackendTLS
BackendTLS configures TLS when connecting to a backend
| Field | Description | Default | Validation |
|---|---|---|---|
| `verification` | verification determines how to validate the HTTPS connection to the backend. Add certificate authorities to the PAISConfiguration.spec.clientTls.caBundleRefs in this namespace. | | Enum: [strict caOnly none mutual] |
ChildStatus
ChildStatus is the schema for child resources of PAISConfiguration
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiGroup` | APIGroup is the group for the resource being referenced. | | |
| `kind` | Kind is the type of resource being referenced. | | |
| `name` | Name is the name of the resource being referenced. | | |
| `observedGeneration` | observedGeneration describes the generation of this child observed by the PAISConfiguration controller. | | |
ClientTLS
ClientTLS configures TLS/SSL clients used by Private AI Services to connect to remote services
| Field | Description | Default | Validation |
|---|---|---|---|
| `caBundleRefs` | caBundleRefs specifies what certificates Private AI Services will trust when connecting to remote servers over TLS. Elements of this list must name ConfigMaps in the current namespace. In addition to being provided to PAIS pods, these CA bundles are included in the VKS cluster. Note that if this list of ConfigMaps is changed, the PAISConfiguration will be reconciled. Be aware that changes to this list with an existing cluster will result in a rollout. | | |
DBPasswordRef
DBPasswordRef describes the database connection secret reference to connect to the database
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | name of a Secret in this namespace. | | MinLength: 1 |
| `fieldPath` | fieldPath is the name of the key within the Secret containing the password. | | MinLength: 1 |
Database
Database describes the connection details for Private AI Services to connect to a Postgres database. See https://gitlab-vmw.devops.broadcom.net/moneta/dsm-tsql-provisioner/-/blob/4c2d19b30bbd5da3857aa2cad93270cf336874b2/dsm-apis/api/databases/v1alpha1/database_common.go#L381
| Field | Description | Default | Validation |
|---|---|---|---|
| `host` | host is the network hostname of a PostgreSQL server to use. | | MinLength: 1 |
| `port` | port is the TCP port to connect to on the database server. | 5432 | |
| `username` | username to use when connecting to the database server. | | MinLength: 1 |
| `passwordRef` | passwordRef is a reference to a Secret in this namespace containing the password for this database user. | | |
| `dbname` | dbname is the name of the logical database to use within the server. | | MinLength: 1 |
| `sslMode` | sslMode configures how to validate the connection with the database server. | | Enum: [VerifyFull VerifyCA Require Allow] |
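As a rough illustration, the database fields above might be combined like this (a hedged sketch: field names come from the Database and DBPasswordRef tables, while the hostname, Secret name, and Secret key are placeholders):

```yaml
# Hypothetical sketch: database connection settings inside a
# PAISConfiguration spec. All concrete values are placeholders.
apiVersion: pais.vmware.com/v1alpha1
kind: PAISConfiguration
metadata:
  name: pais
  namespace: my-namespace
spec:
  database:
    host: postgres.example.internal
    port: 5432
    username: pais
    dbname: pais
    sslMode: VerifyFull
    passwordRef:
      name: pais-db-credentials   # Secret in this namespace
      fieldPath: password         # key within the Secret holding the password
```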
DatabaseSslMode
Underlying type: string
DatabaseSslMode describes how the Private AI Services instance validates the SSL connection to the database.
-
Enum: [VerifyFull VerifyCA Require Allow]
EnvVar
EnvVar represents an environment variable present in a Container.
NOTE: We do not use corev1.EnvVar, as we cannot implement all sources of data
that it supports in EnvVarSource (since we’d have to mount these values into
the VKS cluster)
NOTE: Immutability for both properties is already provided by the struct referencing
this type. Adding the XValidation rules here as well exceeds complexity allowed by the
k8s API
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` | name is the key for an environment variable override to be passed to the inference engine. | | MaxLength: 128 |
| `value` | value is the value for an environment variable override to be passed to the inference engine. | | |
GPUDriverType
Underlying type: string
GPUDriverType defines types of GPU drivers
-
Enum: [NVAIE OSS]
IndexingWorkersRuntimeConfig
IndexingWorkersRuntimeConfig defines configuration/tuning parameters for all Private AI Services indexing workers
| Field | Description | Default | Validation |
|---|---|---|---|
| `deployment` | deployment specifies the desired configuration for the Private AI Services workers performing indexing. | | |
| `workerThreads` | workerThreads specifies the desired number of threads for each Private AI Services worker performing indexing. | 10 | Maximum: 100 |
| `workerRateLimit` | workerRateLimit specifies the desired rate limit for the Private AI Services workers performing indexing. | 100/s | MinLength: 1 |
InferenceEngine
Underlying type: string
InferenceEngine describes the valid types of engines for ModelEndpoint.
-
Enum: [Infinity vLLM LlamaCPP]
InferenceGatewayRoute
InferenceGatewayRoute describes a routing rule for the Inference Gateway
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `pais.vmware.com/v1alpha1` | | |
| `kind` | `InferenceGatewayRoute` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | MinProperties: 1 |
InferenceGatewayRouteBackend
InferenceGatewayRouteBackend describes a model running on an inference server either in this namespace or elsewhere.
| Field | Description | Default | Validation |
|---|---|---|---|
| `httpBaseUrl` | httpBaseUrl defines the base URL of the server hosting the model to use. To use a model hosted on a remote API (e.g. a cloud-hosted model), provide the base URL of that API. | | Format: uri |
| `modelId` | modelId defines the name of the model used inside a request sent to the inference server. For a remote API (e.g. a cloud-hosted model) this should be the "modelId" defined by that API. | pais | MinLength: 1 |
| `tls` | tls configures transport-level security for the HTTPS connection to this backend. | { verification:strict } | |
| `auth` | auth configures authentication to this backend. | | |
InferenceGatewayRouteEngine
Underlying type: string
InferenceGatewayRouteEngine describes the valid types of engines for InferenceGatewayRoute.
-
Enum: [Infinity vLLM LlamaCPP OpenAI]
InferenceGatewayRouteMatches
InferenceGatewayRouteMatches describes the matching rules for a route
| Field | Description | Default | Validation |
|---|---|---|---|
| `routingName` | routingName is the name that this namespace's instance of Private AI Services uses to route requests to this model. | | MaxLength: 253 |
InferenceGatewayRouteModelTypeWithEngine
InferenceGatewayRouteModelTypeWithEngine describes a type of model inference and an engine to use for that inference for InferenceGatewayRoute
| Field | Description | Default | Validation |
|---|---|---|---|
| `type` | type defines if the model is designed for completions or embeddings. | | Enum: [Completions Embeddings] |
| `engine` | engine describes what inference engine is running this model. For a remote model where you don't know the engine, you may set "OpenAI". | | Enum: [Infinity vLLM LlamaCPP OpenAI] |
InferenceGatewayRouteSpec
InferenceGatewayRouteSpec specifies the details of the routing rule
| Field | Description | Default | Validation |
|---|---|---|---|
| `type` | type defines if the model is designed for completions or embeddings. | | Enum: [Completions Embeddings] |
| `engine` | engine describes what inference engine is running this model. For a remote model where you don't know the engine, you may set "OpenAI". | | Enum: [Infinity vLLM LlamaCPP OpenAI] |
| `matches` | matches describes how traffic will get routed to this model by the local instance of Private AI Services. | | |
| `backend` | backend describes where inference requests should be forwarded. | | |
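Putting the spec fields together, a route to a remote OpenAI-compatible endpoint might look like this (a hedged sketch: field names come from the tables above, while the URL, model ID, routing name, and the shape of `auth.apiTokenRef` are placeholders/assumptions):

```yaml
# Hypothetical sketch of an InferenceGatewayRoute forwarding completion
# requests to a remote OpenAI-compatible API. Concrete values are placeholders.
apiVersion: pais.vmware.com/v1alpha1
kind: InferenceGatewayRoute
metadata:
  name: remote-gpt
  namespace: my-namespace
spec:
  type: Completions
  engine: OpenAI              # use OpenAI when the remote engine is unknown
  matches:
    routingName: remote-gpt
  backend:
    httpBaseUrl: https://api.example.com/v1
    modelId: gpt-4o-mini      # the "modelId" defined by the remote API
    tls:
      verification: strict
    auth:
      apiTokenRef:            # exact reference shape is an assumption
        name: remote-api-token
```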
InferenceGatewayRouteStatus
InferenceGatewayRouteStatus reports the current status of an InferenceGatewayRoute
-
MinProperties: 1
| Field | Description | Default | Validation |
|---|---|---|---|
| `conditions` | conditions update as changes occur in the status. | | |
InferenceServerCustomization
InferenceServerCustomization describes the extra customization that can be provided to the inference server
| Field | Description | Default | Validation |
|---|---|---|---|
| `cliArgs` | cliArgs describe additional command-line arguments to append when starting the inference engine. | | |
| `envVars` | envVars describe additional environment variables to set when starting the inference engine. | | MaxItems: 1024 |
| `engineImage` | engineImage will override the inference server container image. | | MinLength: 1 |
| `engineImageCompressedSize` | engineImageCompressedSize should be set to the compressed size of the engineImage, if that field is set. The compressed size of an image is the sum of the layers, and is typically displayed on the web UI of container registries like Docker Hub. This field is used when sizing the /var/lib/containerd mount on the VKS worker nodes hosting this ModelEndpoint. In future versions of Private AI Services, this field may no longer be required and may be deprecated. | 15Gi | |
| `sharedMemoryMountSize` | sharedMemoryMountSize determines the size of the /dev/shm mount point inside the inference server container. | 64Mi | |
| `tempMountSize` | tempMountSize determines the size in bytes of the /tmp mount point available to the inference server. | 1Gi | |
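These fields sit under a ModelEndpoint's `spec.inferenceServerCustomization` (per the ModelEndpointSpec table below). A hedged sketch, where the image name, arguments, and sizes are placeholders:

```yaml
# Hypothetical sketch: inferenceServerCustomization fragment for a
# ModelEndpoint spec. Field names come from the table above.
inferenceServerCustomization:
  cliArgs:
    - --max-model-len=8192          # placeholder engine argument
  envVars:
    - name: VLLM_LOGGING_LEVEL      # placeholder variable
      value: DEBUG
  engineImage: registry.example.com/engines/vllm:custom
  engineImageCompressedSize: 12Gi   # compressed size shown by the registry
  sharedMemoryMountSize: 2Gi
  tempMountSize: 4Gi
```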
Ingress
Ingress defines the desired state for how the Private AI Services runtime is accessible in the cluster
| Field | Description | Default | Validation |
|---|---|---|---|
| `serviceType` | serviceType determines how the Private AI Services runtime will be exposed as a Kubernetes Service. | LoadBalancer | Enum: [ClusterIP LoadBalancer] |
LLMTracesConfig
LLMTracesConfig defines the configuration for trace collection.
| Field | Description | Default | Validation |
|---|---|---|---|
| `endpoint` | endpoint specifies the target URL or address for the OpenTelemetry backend. | | Format: uri |
| `protocol` | protocol specifies the OpenTelemetry transport protocol. | | Enum: [grpc http/protobuf] |
| `projectName` | projectName specifies the "openinference.project.name" resource attribute in accordance with the OpenInference conventions. | | MinLength: 1 |
| `headersSecretRef` | headersSecretRef selects a key field within a Secret in this namespace. The value should be a semicolon-separated list of HTTP headers, for example: Authorization=Bearer%20token123; X-Custom-Header=custom-value | | |
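A hedged sketch of trace collection settings, nested under `spec.observability.llmTraces` per the ObservabilitySpec table below (the collector endpoint, project name, and Secret reference shape are placeholders/assumptions):

```yaml
# Hypothetical sketch: LLM trace export to an OpenTelemetry collector.
observability:
  llmTraces:
    endpoint: https://otel-collector.example.com:4317
    protocol: grpc
    projectName: my-rag-app
    headersSecretRef:         # exact key-selection shape is an assumption
      name: otel-headers
      # Secret value format: Authorization=Bearer%20token123; X-Custom-Header=custom-value
```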
ManualUpgradeStrategy
ManualUpgradeStrategy is currently unsupported
ModelEndpoint
ModelEndpoint is a request to serve an AI model using a particular engine on 1 or more VMs
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `pais.vmware.com/v1alpha1` | | |
| `kind` | `ModelEndpoint` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | MinProperties: 1 |
ModelEndpointSpec
ModelEndpointSpec defines the desired state of ModelEndpoint
| Field | Description | Default | Validation |
|---|---|---|---|
| `type` | type defines if the model is designed for completions or embeddings. | | Enum: [Completions Embeddings] |
| `engine` | engine describes what inferencing engine should be used when running a particular model. | | Enum: [Infinity vLLM LlamaCPP] |
| `model` | model describes the model which should be run for inference. | | |
| `replicas` | replicas describes how many instances of this model should be running. Note that currently, all replicas of a ModelEndpoint will run in a single vSphere Zone. | 1 | Minimum: 0 |
| `routingName` | routingName defines how this model will appear in the data plane API. | | MaxLength: 253 |
| `virtualMachineClassName` | virtualMachineClassName specifies the virtual machine class to use for running this model endpoint. Note this value may interact with failureDomain; ensure your chosen failureDomain has hardware compatible with this class. | | MinLength: 1 |
| `storageClassName` | storageClassName specifies the storage class to use for running this model endpoint. | | MinLength: 1 |
| `failureDomain` | failureDomain specifies the failure domain (vSphere Zone) to use for running this model endpoint. This choice of Zone may limit the hardware (e.g. GPUs) available for this ModelEndpoint. Note that currently, all replicas of a ModelEndpoint will run in this one specified failureDomain. If this value is unset, the system will fall back to a default value. | | MinLength: 1 |
| `inferenceServerCustomization` | inferenceServerCustomization describes additional customization applied to the inference server. | { } | |
| `overrides` | overrides is not yet implemented. Once implemented, it will enable a user to provide ytt overlays for customization. | | |
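A hedged sketch of a complete ModelEndpoint built from the fields above (the model reference, class names, and routing name are placeholders):

```yaml
# Hypothetical sketch of a ModelEndpoint serving a completions model with
# vLLM. Concrete names and the OCI reference are placeholders.
apiVersion: pais.vmware.com/v1alpha1
kind: ModelEndpoint
metadata:
  name: llama-8b
  namespace: my-namespace
spec:
  type: Completions
  engine: vLLM
  model:
    ociRef: registry.example.com/models/llama-3.1-8b:v1
  replicas: 1
  routingName: llama-8b
  virtualMachineClassName: vm-class-gpu-large   # placeholder VM class
  storageClassName: fast-storage                # placeholder storage class
```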
ModelEndpointSpecModel
ModelEndpointSpecModel describes a model which should be run for inference
| Field | Description | Default | Validation |
|---|---|---|---|
| `ociRef` | ociRef is a reference to an OCI artifact containing the model to run for inference. | | MaxLength: 1024 |
| `pullSecrets` | pullSecrets describe a list of references to Kubernetes secrets to use when pulling this artifact. We recommend the engine image be hosted in a different repository than the model. | | MaxItems: 1 |
ModelEndpointStatus
ModelEndpointStatus defines the observed state of ModelEndpoint
-
MinProperties: 1
| Field | Description | Default | Validation |
|---|---|---|---|
| `conditions` | conditions update as changes occur in the status. | | |
| `observedGeneration` | observedGeneration describes the generation observed by the ModelEndpoint controller. | | |
| `children` | children reports status information about child resources owned by this ModelEndpoint. | | |
| `controllerVersion` | controllerVersion reports a version string for the controller-manager which has most recently reconciled this resource. | | MinLength: 1 |
ModelType
Underlying type: string
ModelType describes the valid types of models.
-
Enum: [Completions Embeddings]
ModelTypeWithEngine
ModelTypeWithEngine describes a type of model inference and an engine to use for that inference for ModelEndpoint
| Field | Description | Default | Validation |
|---|---|---|---|
| `type` | type defines if the model is designed for completions or embeddings. | | Enum: [Completions Embeddings] |
| `engine` | engine describes what inferencing engine should be used when running a particular model. | | Enum: [Infinity vLLM LlamaCPP] |
NvidiaGPURuntimeConfig
NvidiaGPURuntimeConfig defines NVIDIA GPU Driver software configuration (e.g. license config, nvcr.io image pull secrets etc.)
| Field | Description | Default | Validation |
|---|---|---|---|
| `gpuDriverType` | gpuDriverType determines which type of NVIDIA GPU driver to use. Allowed values are NVAIE (default) and OSS. NVAIE supports both vGPU and passthrough devices, and requires a license key and pull secret from NVIDIA. OSS supports passthrough devices (no vGPU support); in this case the licenseConfigRef must not be set. | NVAIE | Enum: [NVAIE OSS] |
| `licenseConfigRef` | licenseConfigRef names a ConfigMap in this namespace containing the Nvidia license access token and Nvidia GRID configuration. | | |
| `imagePullSecretRef` | imagePullSecretRef names a Secret in this namespace containing an NGC access token ("personal key") to be used as an ImagePullSecret for access to Nvidia GPU operator container images (e.g. gpu-operator, vGPU driver). The Secret must be of type `kubernetes.io/dockerconfigjson`. | | |
| `gpuOperatorOverridesRef` | gpuOperatorOverridesRef names a ConfigMap in this namespace containing Helm chart values for the Nvidia gpu-operator. | | |
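A hedged sketch of the NVIDIA configuration, nested under `spec.nvidiaConfig` per the PAISConfigurationSpec table below (ConfigMap and Secret names are placeholders):

```yaml
# Hypothetical sketch: NVIDIA GPU driver configuration fragment.
nvidiaConfig:
  gpuDriverType: NVAIE            # NVAIE (vGPU + passthrough) or OSS (passthrough only)
  licenseConfigRef:
    name: nvidia-license-config   # ConfigMap with license token and GRID config
  imagePullSecretRef:
    name: ngc-pull-secret         # dockerconfigjson Secret with NGC personal key
```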
OIDC
OIDC describes the details of the upstream OIDC provider Private AI Services will use
| Field | Description | Default | Validation |
|---|---|---|---|
| `issuerUrl` | issuerUrl is the URL of an OpenID provider endpoint that publishes the metadata for clients to use to construct a request to an OpenID server. | | Format: uri |
| `scope` | scope defines what scopes are requested when initiating the auth flow. | | |
| `clientId` | clientId is the client ID used for communicating with the OIDC provider. | | MinLength: 1 |
| `extraAudiences` | extraAudiences allows additional OAuth 2.0 clients that Private AI Services will accept when validating the access token. | | |
| `groupsClaim` | groupsClaim is an OIDC claim that the Private AI Services runtime will expect to exist in the ID token. | groups | MinLength: 1 |
| `authorizedGroups` | authorizedGroups is the list of group names that are used to authorize user access. | | |
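A hedged sketch of the authentication settings, nested under `spec.auth.oidc` per the Auth table above (the issuer URL, client ID, and group names are placeholders; the scalar-vs-list shape of `scope` is an assumption):

```yaml
# Hypothetical sketch: OIDC configuration fragment.
auth:
  oidc:
    issuerUrl: https://idp.example.com/realms/main
    clientId: pais-ui
    scope: openid profile groups   # string form is an assumption
    groupsClaim: groups
    authorizedGroups:
      - ai-platform-users
```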
ObservabilitySpec
ObservabilitySpec configures observability features
-
MinProperties: 1
| Field | Description | Default | Validation |
|---|---|---|---|
| `prometheusRuntime` | prometheusRuntime deploys and configures additional components to collect metrics from Private AI Services. | | |
| `llmTraces` | llmTraces configures collection of traces from LLM components. | | |
OpenTelemetryTransportProtocol
Underlying type: string
OpenTelemetryTransportProtocol defines the transport protocol for OpenTelemetry export. See https://opentelemetry.io/docs/specs/otlp
-
Enum: [grpc http/protobuf]
PAISConfiguration
PAISConfiguration is the Schema for the paisconfigurations API
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` | `pais.vmware.com/v1alpha1` | | |
| `kind` | `PAISConfiguration` | | |
| `metadata` | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` | | | |
| `status` | | | MinProperties: 1 |
PAISConfigurationSpec
PAISConfigurationSpec defines the desired state of the Private AI Services instance within this namespace
| Field | Description | Default | Validation |
|---|---|---|---|
| `clientTls` | clientTls configures TLS/SSL clients used by Private AI Services to connect to remote services. | | |
| `worker` | worker sets configuration for the Private AI Services Data Indexing and Retrieval workers. | | |
| `database` | database defines the database connection parameters that the Hub API will use. | | |
| `auth` | auth defines the authentication configuration for the Private AI Services API and UI. | | |
| `ingress` | ingress defines how the Private AI Services will be accessible. | { serviceType:LoadBalancer } | |
| `vksControlPlane` | vksControlPlane specifies attributes of the PAIS-managed vSphere Kubernetes Service cluster control plane. | | |
| `nvidiaConfig` | nvidiaConfig defines pointers to configuration needed for Nvidia NVAIE software. | | |
| `runtimeConfig` | runtimeConfig defines the desired state for the Private AI Services runtime configuration. | | |
| `upgradeStrategy` | upgradeStrategy defines how this instance of Private AI Services will be upgraded. Currently only the "automatic" strategy is supported, which means that upgrades of this instance are managed by the Supervisor Service. | { automatic:map[] } | |
| `observability` | observability configures metrics and trace collection for this instance of Private AI Services. | | MinProperties: 1 |
| `defaultStorageClassName` | defaultStorageClassName sets the storage class used by components of this instance of Private AI Services. | | MinLength: 1 |
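A hedged sketch of a minimal PAISConfiguration using the fields above (names and classes are placeholders, and required fields not shown here may exist):

```yaml
# Hypothetical sketch of a minimal PAISConfiguration.
apiVersion: pais.vmware.com/v1alpha1
kind: PAISConfiguration
metadata:
  name: pais
  namespace: my-namespace
spec:
  defaultStorageClassName: fast-storage   # placeholder storage class
  ingress:
    serviceType: LoadBalancer
  upgradeStrategy:
    automatic: {}                         # only supported strategy today
  runtimeConfig:
    logVerbosity: Info
```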
PAISConfigurationStatus
PAISConfigurationStatus defines the observed state of PAISConfiguration
-
MinProperties: 1
| Field | Description | Default | Validation |
|---|---|---|---|
| `conditions` | conditions update as changes occur to this resource. | | |
| `ingressServiceRef` | ingressServiceRef references the Service where this instance of Private AI Services is reachable. | | |
| `observedGeneration` | observedGeneration describes the generation of this resource observed by the PAISConfiguration controller. | | |
| `children` | children reports status information about child resources owned by this PAISConfiguration. | | |
| `controllerVersion` | controllerVersion reports a version string for the controller-manager which has most recently reconciled this resource. | | MinLength: 1 |
PrometheusRuntimeConfig
PrometheusRuntimeConfig defines the desired state for the metrics collection configuration.
| Field | Description | Default | Validation |
|---|---|---|---|
| `metricsRetention` | metricsRetention defines a limit on how long (in number of days) to keep observability metrics data. | 90d | Pattern: |
| `storageClassName` | storageClassName can be used to customize the storage class for metrics data. | | MinLength: 1 |
| `prometheusOverridesRef` | prometheusOverridesRef is not yet supported. | | |
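A hedged sketch of the metrics settings, nested under `spec.observability.prometheusRuntime` per the ObservabilitySpec table above (the storage class is a placeholder):

```yaml
# Hypothetical sketch: metrics collection configuration fragment.
observability:
  prometheusRuntime:
    metricsRetention: 90d         # keep metrics for 90 days
    storageClassName: fast-storage
```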
RoutingName
Underlying type: string
RoutingName is a user-readable string used to route to a particular model
-
MaxLength: 253
-
MinLength: 1
RuntimeConfig
RuntimeConfig defines configuration/tuning parameters for all Private AI Services components
| Field | Description | Default | Validation |
|---|---|---|---|
| `logVerbosity` | logVerbosity specifies the desired log verbosity used for all Private AI Services components. | | Enum: [Debug Info Warning Error] |
| `apiRuntimeConfig` | apiRuntimeConfig specifies the desired state of the configuration for the Private AI Services API. | | |
| `indexingWorkersRuntimeConfig` | indexingWorkersRuntimeConfig specifies the desired state of the configuration for the Private AI Services indexing workers. | | |
RuntimeConfigLogVerbosity
Underlying type: string
RuntimeConfigLogVerbosity describes the valid types of log levels of the Private AI Services API.
-
Enum: [Debug Info Warning Error]
RuntimeDeploymentConfig
RuntimeDeploymentConfig defines configuration/tuning parameters for a Private AI Services component
| Field | Description | Default | Validation |
|---|---|---|---|
| `replicas` | replicas specifies the desired replicas for the Private AI Services component. | 1 | Minimum: 1 |
| `resources` | resources describes the desired compute resource requirements for the Private AI Services component. | | |
TLSVerification
Underlying type: string
TLSVerification describes how to verify TLS connections to the backend
-
Enum: [strict caOnly none mutual]
UpgradeStrategy
UpgradeStrategy defines how this instance of Private AI Services will be upgraded
| Field | Description | Default | Validation |
|---|---|---|---|
| `automatic` | automatic means the Supervisor Service will manage upgrades of this instance of Private AI Services. | | |
| `manual` | manual is currently unsupported. | | |
VKSControlPlaneConfig
VKSControlPlaneConfig defines configuration for the control plane of the VKS cluster used by Private AI Services.
| Field | Description | Default | Validation |
|---|---|---|---|
| `virtualMachineClassName` | virtualMachineClassName specifies the virtual machine class to use for the control plane node of the VKS cluster. | | MinLength: 1 |
| `storageClassName` | storageClassName can be used to customize the storage of the control plane node of the VKS cluster. | | MinLength: 1 |
| `upgradeStrategy` | upgradeStrategy configures how the backing VKS cluster is upgraded. If unset, the cluster upgrade strategy will not be under user control; in the current version of Private AI Services, an unset value is equivalent to a Manual strategy. Future versions of Private AI Services may change the default behavior for this field, e.g. to upgrade automatically. | | |
VKSManualUpgradeStrategy
VKSManualUpgradeStrategy describes the manual upgrade strategy for upgrading the VKS cluster
| Field | Description | Default | Validation |
|---|---|---|---|
| `version` | version specifies the Kubernetes release version to use for the VKS cluster hosting ModelEndpoints. A valid version string is formatted like "v1.32.0+vmware.6-fips-vkr.2". Changing this field will cause temporary downtime for ModelEndpoints with only 1 replica; test changes in a non-production environment and plan for an outage window. Only use release versions which have been documented as supported for PAIS. Warning: other version strings, such as "v1.31", may be accepted by the API. | | MinLength: 1 |
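Following the nesting implied by the tables above (PAISConfigurationSpec → vksControlPlane → upgradeStrategy → manual), pinning the VKS release might look like this (a hedged sketch; the version string is the example from the description and must be verified as supported for PAIS):

```yaml
# Hypothetical sketch: manual VKS cluster upgrade strategy fragment.
vksControlPlane:
  upgradeStrategy:
    manual:
      version: v1.32.0+vmware.6-fips-vkr.2   # example release string
```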
VKSUpgradeStrategy
VKSUpgradeStrategy describes the strategy for upgrading the VKS cluster
| Field | Description | Default | Validation |
|---|---|---|---|
| `manual` | manual upgrade strategy means that the VKS Kubernetes release version is set by the user. | | |
WorkerConfig
WorkerConfig defines the configuration for Data Indexing and Retrieval workers
| Field | Description | Default | Validation |
|---|---|---|---|
| `storageClassName` | storageClassName is deprecated: set the top-level spec.defaultStorageClassName instead. As of Private AI Services 2.1, Data Indexing workers use ephemeral storage (emptyDir volumes). | | MinLength: 1 |