Configuration
The plugin maintains its configuration in the conf/base/kubeflow.yaml file. A sample configuration can be generated using kedro kubeflow init (an example invocation follows the sample below):
# Base url of the Kubeflow Pipelines, should include the schema (http/https)
host: https://kubeflow.example.com/pipelines
# Configuration used to run the pipeline
run_config:
  # Name of the image to run as the pipeline steps
  image: kubeflow-plugin-demo
  # Pull policy to be used for the steps. Use Always if you push the images
  # on the same tag, or Never if you use only local images
  image_pull_policy: IfNotPresent
  # Location of the Vertex AI GCS root, required only for the Vertex AI
  # pipelines configuration
  root: bucket_name/gcs_suffix
  # Name of the kubeflow experiment to be created
  experiment_name: Kubeflow Plugin Demo
  # Name of the run for run-once
  run_name: Kubeflow Plugin Demo Run
  # Optional pipeline description
  description: Very Important Pipeline
  # Flag indicating if run-once should wait for the pipeline to finish
  wait_for_completion: False
  # How long to keep the underlying Argo workflow (together with pods and
  # data volume) after the pipeline finishes [in seconds]. Default: 1 week
  ttl: 604800
  # Optional volume specification
  volume:
    # Storage class - use null (or no value) to use the default storage
    # class deployed on the Kubernetes cluster
    storageclass: # default
    # The size of the volume that is created. Applicable for some storage
    # classes
    size: 1Gi
    # Access mode of the volume used to exchange data. ReadWriteMany is
    # preferred, but it is not supported on some environments (like GKE)
    # Default value: ReadWriteOnce
    #access_modes: [ReadWriteMany]
    # Flag indicating if the data-volume-init step (copying raw data to the
    # fresh volume) should be skipped
    skip_init: False
    # Allows specifying the user executing pipelines within containers
    # Default: root user (to avoid issues with volumes in GKE)
    owner: 0
    # Flag indicating if the volume for inter-node data exchange should be
    # kept after the pipeline is deleted
    keep: False
  # Optional section allowing adjustment of the resource
  # reservations and limits for the nodes
  resources:
    # For nodes that require more RAM you can increase the "memory"
    data_import_step:
      memory: 2Gi
    # Training nodes can utilize more than one CPU if the algorithm
    # supports it
    model_training:
      cpu: 8
      memory: 1Gi
    # GPU-capable nodes can request 1 GPU slot
    tensorflow_step:
      nvidia.com/gpu: 1
    # Default settings for the nodes
    __default__:
      cpu: 200m
      memory: 64Mi
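The sample above can be bootstrapped with the init command by pointing it at your Kubeflow Pipelines instance. A minimal sketch; the URL is illustrative and should be replaced with your own deployment:

kedro kubeflow init https://kubeflow.example.com/pipelines

The command writes conf/base/kubeflow.yaml, which you can then adjust by hand to match your environment.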
Dynamic configuration support
kedro-kubeflow contains a hook that enables TemplatedConfigLoader, which allows passing environment variables into configuration files. It reads all environment variables that follow the KEDRO_CONFIG_<NAME> pattern and lets you inject them into configuration files using the ${name} syntax. There are two special variables, KEDRO_CONFIG_COMMIT_ID and KEDRO_CONFIG_BRANCH_NAME, which support specifying a default for when the variable is not set, e.g. ${commit_id|dirty}.
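For example, assuming a CI job exports the commit id before deployment (export KEDRO_CONFIG_COMMIT_ID="$(git rev-parse --short HEAD)"), the value can be used to tag the step image in conf/base/kubeflow.yaml. A hypothetical fragment; the image name is illustrative:

run_config:
  # Resolves to e.g. kubeflow-plugin-demo:abc1234, or to
  # kubeflow-plugin-demo:dirty when the variable is not set
  image: kubeflow-plugin-demo:${commit_id|dirty}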