---
title: llmaz core API
content_type: tool-reference
package: llmaz.io/v1alpha1
auto_generated: true
description: Generated API reference documentation for llmaz.io/v1alpha1.
---

## Resource Types

- [OpenModel](#llmaz-io-v1alpha1-OpenModel)

## OpenModel {#llmaz-io-v1alpha1-OpenModel}

OpenModel is the Schema for the open models API.

- `apiVersion` (`string`): `llmaz.io/v1alpha1`
- `kind` (`string`): `OpenModel`
- `spec` [Required] ([`ModelSpec`](#llmaz-io-v1alpha1-ModelSpec)): No description provided.
- `status` [Required] ([`ModelStatus`](#llmaz-io-v1alpha1-ModelStatus)): No description provided.
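To orient readers, here is a minimal OpenModel manifest sketched from the fields on this page; the metadata name is an illustrative placeholder, and the modelID is the example given under ModelHub below:

```yaml
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama3-8b                         # illustrative name, not mandated by the API
spec:
  familyName: llama3                      # auto-injected as the llmaz.io/model-family-name label
  source:
    modelHub:
      name: huggingface                   # model registry, see ModelHub below
      modelID: meta-llama/Meta-Llama-3-8B # example model ID from this page
```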

## Flavor {#llmaz-io-v1alpha1-Flavor}

Appears in:

- [InferenceConfig](#llmaz-io-v1alpha1-InferenceConfig)

Flavor defines the accelerator requirements for a model and the necessary parameters in autoscaling. Right now, it will be used in two places:

  • Pod scheduling with node selectors specified.
  • Cluster autoscaling with essential parameters provided.
- `name` [Required] ([`FlavorName`](#llmaz-io-v1alpha1-FlavorName)): Name represents the flavor name, which will be used in the model claim.
- `limits` (`k8s.io/api/core/v1.ResourceList`): Limits defines the accelerators required to serve the model for each replica, like `nvidia.com/gpu: 8`. For multi-host cases, the limits here indicate the resource requirements per replica, usually equal to the TP size. Setting the CPU and memory usage here is not recommended:
  - if using a playground, you can define the CPU/memory usage at `backendConfig`;
  - if using an inference service, you can define the CPU/memory at the container resources. However, if you also define the same accelerator resources at the playground/service level, they will be overwritten by the flavor limits here.
- `nodeSelector` (`map[string]string`): NodeSelector represents the node candidates for Pod placement. If a node doesn't match the nodeSelector, it will be filtered out by the resourceFungibility scheduler plugin. If nodeSelector is empty, every node is a candidate.
- `params` (`map[string]string`): Params stores other useful parameters that will be consumed by cluster-autoscaler / Karpenter for autoscaling, or defined as model parallelism parameters like the TP or PP size. E.g. with autoscaling, when scaling up nodes with 8x NVIDIA A100, the parameter can be injected as `INSTANCE-TYPE: p4d.24xlarge` for AWS. Preset parameters: `TP`, `PP`, `INSTANCE-TYPE`.
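As a sketch of how these fields compose (the flavor name, node label, and instance type are illustrative assumptions, not values mandated by the API), a single entry under `inferenceConfig.flavors` might look like:

```yaml
flavors:
  - name: a100                      # FlavorName referenced in the model claim (illustrative)
    limits:
      nvidia.com/gpu: 8             # accelerators per replica, usually the TP size
    nodeSelector:
      nvidia.com/gpu.product: A100  # assumed node label; empty means every node is a candidate
    params:
      TP: "8"                       # preset model-parallelism parameter
      INSTANCE-TYPE: p4d.24xlarge   # consumed by cluster-autoscaler / Karpenter on AWS
```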

## FlavorName {#llmaz-io-v1alpha1-FlavorName}

(Alias of string)

Appears in:

- [Flavor](#llmaz-io-v1alpha1-Flavor)

## InferenceConfig {#llmaz-io-v1alpha1-InferenceConfig}

Appears in:

- [ModelSpec](#llmaz-io-v1alpha1-ModelSpec)

InferenceConfig represents the inference configurations for the model.

- `flavors` ([`[]Flavor`](#llmaz-io-v1alpha1-Flavor)): Flavors represents the accelerator requirements to serve the model. Flavors are fungible, following the priority represented by the slice order.
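Because fungibility follows slice order, listing two flavors expresses a preference with a fallback; both flavor names here are illustrative:

```yaml
inferenceConfig:
  flavors:
    - name: a100          # highest priority, tried first
      limits:
        nvidia.com/gpu: 1
    - name: a10g          # fallback when the first flavor can't be satisfied
      limits:
        nvidia.com/gpu: 1
```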

## ModelHub {#llmaz-io-v1alpha1-ModelHub}

Appears in:

- [ModelSource](#llmaz-io-v1alpha1-ModelSource)

ModelHub represents the model registry for model downloads.

- `name` (`string`): Name refers to the model registry, such as huggingface.
- `modelID` [Required] (`string`): ModelID refers to the model identifier on the model hub, such as meta-llama/Meta-Llama-3-8B.
- `filename` [Required] (`string`): Filename refers to a specific model file rather than the whole repo. This is helpful for downloading a specific GGUF model rather than the whole repo, which includes all kinds of quantized models. TODO: this is only supported with Huggingface; add support for ModelScope in the near future. Note: once filename is set, allowPatterns and ignorePatterns should be left unset.
- `revision` (`string`): Revision refers to a Git revision ID, which can be a branch name, a tag, or a commit hash.
- `allowPatterns` (`[]string`): AllowPatterns means only files matching at least one of the patterns will be downloaded.
- `ignorePatterns` (`[]string`): IgnorePatterns means files matching any of the patterns will not be downloaded.
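A sketch of a modelHub source that pulls a single GGUF file; the repo and filename are illustrative, and allowPatterns/ignorePatterns stay unset because filename is set, per the note above:

```yaml
source:
  modelHub:
    name: huggingface                           # currently the only hub supporting filename
    modelID: Qwen/Qwen2-0.5B-Instruct-GGUF      # illustrative repo with multiple quantizations
    filename: qwen2-0_5b-instruct-q5_k_m.gguf   # download just this file, not the whole repo
    revision: main                              # branch name, tag, or commit hash
```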

## ModelName {#llmaz-io-v1alpha1-ModelName}

(Alias of string)

Appears in:

- [ModelRef](#llmaz-io-v1alpha1-ModelRef)
- [ModelSpec](#llmaz-io-v1alpha1-ModelSpec)

## ModelRef {#llmaz-io-v1alpha1-ModelRef}

Appears in:

ModelRef refers to a created Model with its role.

- `name` [Required] ([`ModelName`](#llmaz-io-v1alpha1-ModelName)): Name represents the model name.
- `role` ([`ModelRole`](#llmaz-io-v1alpha1-ModelRole)): Role represents the model's role once more than one model is required, such as the draft role, which means running with speculative decoding; the default backend arguments will then be looked up in the backendRuntime named speculative-decoding.
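ModelRef is consumed by the inference-side APIs rather than by OpenModel itself; as a hedged sketch (the surrounding `models` field name and the `main` role value are assumptions, not documented on this page), a speculative-decoding pair might be claimed like:

```yaml
models:                      # assumed field name in the claiming spec
  - name: llama3-8b          # ModelName of the target model
    role: main               # assumed role value for the primary model
  - name: llama3-1b-draft    # draft model: runs with speculative decoding, so the
    role: draft              # backendRuntime named speculative-decoding supplies defaults
```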

## ModelRole {#llmaz-io-v1alpha1-ModelRole}

(Alias of string)

Appears in:

- [ModelRef](#llmaz-io-v1alpha1-ModelRef)

## ModelSource {#llmaz-io-v1alpha1-ModelSource}

Appears in:

- [ModelSpec](#llmaz-io-v1alpha1-ModelSpec)

ModelSource represents the source of the model. Only one model source will be used.

- `modelHub` ([`ModelHub`](#llmaz-io-v1alpha1-ModelHub)): ModelHub represents the model registry for model downloads.
- `uri` ([`URIProtocol`](#llmaz-io-v1alpha1-URIProtocol)): URI represents various kinds of model sources following the URI protocol, `protocol://`, e.g.:
  - `oss://<bucket>.<endpoint>/<path-to-your-model>`
  - `ollama://llama3.3`
  - `host://<path-to-your-model>`
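Two URI-source sketches built from the protocol list above; the host path is a placeholder:

```yaml
# Exactly one model source is used per model.
source:
  uri: ollama://llama3.3            # pull llama3.3 via the ollama protocol
# or load from a node-local path (placeholder path):
# source:
#   uri: host:///models/llama3-8b
```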

## ModelSpec {#llmaz-io-v1alpha1-ModelSpec}

Appears in:

- [OpenModel](#llmaz-io-v1alpha1-OpenModel)

ModelSpec defines the desired state of the Model.

- `familyName` [Required] ([`ModelName`](#llmaz-io-v1alpha1-ModelName)): FamilyName represents the model type, like llama2, and will be auto-injected into the labels under the key `llmaz.io/model-family-name`.
- `source` [Required] ([`ModelSource`](#llmaz-io-v1alpha1-ModelSource)): Source represents the source of the model; there are several ways to load the model, such as from huggingface, an OCI registry, s3, a host path, and so on.
- `inferenceConfig` [Required] ([`InferenceConfig`](#llmaz-io-v1alpha1-InferenceConfig)): InferenceConfig represents the inference configurations for the model.
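Putting the three spec fields together, a fuller but still illustrative spec might read:

```yaml
spec:
  familyName: llama3                        # injected as the llmaz.io/model-family-name label
  source:
    modelHub:
      name: huggingface
      modelID: meta-llama/Meta-Llama-3-8B
  inferenceConfig:
    flavors:
      - name: a100                          # illustrative flavor name
        limits:
          nvidia.com/gpu: 1
```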

## ModelStatus {#llmaz-io-v1alpha1-ModelStatus}

Appears in:

- [OpenModel](#llmaz-io-v1alpha1-OpenModel)

ModelStatus defines the observed state of the Model.

- `conditions` [Required] (`[]k8s.io/apimachinery/pkg/apis/meta/v1.Condition`): Conditions represents the inference conditions.
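A hedged sketch of the observed status; the condition type and reason below are assumptions for illustration, not values guaranteed by this API, though the field layout is the standard meta/v1.Condition:

```yaml
status:
  conditions:
    - type: Ready                              # assumed condition type
      status: "True"
      reason: ModelDownloaded                  # assumed reason
      message: model is ready for inference    # assumed message
      lastTransitionTime: "2025-03-13T00:00:00Z"
```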

## URIProtocol {#llmaz-io-v1alpha1-URIProtocol}

(Alias of string)

Appears in:

- [ModelSource](#llmaz-io-v1alpha1-ModelSource)

URIProtocol represents the protocol of the URI.