Info alert:Important Notice

Please note that more information about the previous v2 releases can be found here. You can use "Find a release" search bar to search for a particular release.

Working with accelerators

Use accelerators, such as NVIDIA GPUs, AMD GPUs, and Intel Gaudi AI accelerators, to optimize the performance of your end-to-end data science workflows.

Overview of accelerators

If you work with large data sets, you can use accelerators to optimize the performance of your data science models in Open Data Hub. With accelerators, you can scale your work, reduce latency, and increase productivity. You can use accelerators in Open Data Hub to assist your data scientists in the following tasks:

Natural language processing (NLP)
Inference
Training deep neural networks
Data cleansing and data processing

You can use the following accelerators with Open Data Hub:

NVIDIA graphics processing units (GPUs)
- To use compute-heavy workloads in your models, you can enable NVIDIA graphics processing units (GPUs) in Open Data Hub.
- To enable NVIDIA GPUs on OpenShift, you must install the NVIDIA GPU Operator.
AMD graphics processing units (GPUs)
- Use the AMD GPU Operator to enable AMD GPUs for workloads such as AI/ML training and inference.
- To enable AMD GPUs on OpenShift, you must do the following tasks:
  - Install the AMD GPU Operator.
  - Follow the instructions for full deployment and driver configuration in the AMD GPU Operator documentation.
- Once installed, the AMD GPU Operator allows you to use the ROCm workbench images to streamline AI/ML workflows on AMD GPUs.
Intel Gaudi AI accelerators
- Intel provides hardware accelerators intended for deep learning workloads.
- Before you can enable Intel Gaudi AI accelerators in Open Data Hub, you must install the necessary dependencies. Also, the version of the Intel Gaudi AI Operator that you install must match the version of the corresponding workbench image in your deployment.
- A workbench image for Intel Gaudi accelerators is not included in Open Data Hub by default. Instead, you must create and configure a custom workbench to enable Intel Gaudi AI support.
- You can enable Intel Gaudi AI accelerators on-premises or with AWS DL1 compute nodes on an AWS instance.

Before you can use an accelerator in Open Data Hub, you must enable GPU support in Open Data Hub. This includes installing the Node Feature Discovery and NVIDIA GPU Operators. For more information, see NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation. In addition, your OpenShift instance must contain an associated hardware profile or accelerator profile. For accelerators that are new to your deployment, you must configure a hardware profile or accelerator profile for the accelerator in context. You can create a hardware profile from the Settings → Hardware profiles page on the Open Data Hub dashboard. If your deployment contains existing accelerators that had associated profiles already configured, the profiles are automatically created after you upgrade to the latest version of Open Data Hub.

Important

By default, hardware profiles are hidden in the dashboard navigation menu and user interface, while accelerator profiles remain visible. In addition, user interface components associated with the deprecated accelerator profiles functionality are still displayed. To show the Settings → Hardware profiles option in the dashboard navigation menu, and the user interface components associated with hardware profiles, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift Container Platform. For more information about setting dashboard configuration options, see Customizing the dashboard.

Additional resources

Enabling accelerators

Before you can use an accelerator in Open Data Hub, you must install the relevant software components. The installation process varies based on the accelerator type.

Prerequisites

You have logged in to your OpenShift Container Platform cluster.
You have the cluster-admin role in your OpenShift Container Platform cluster.
You have installed an accelerator and confirmed that it is detected in your environment.

Procedure

Follow the appropriate documentation to enable your accelerator:
- NVIDIA GPUs: See Enabling NVIDIA GPUs.
- Intel Gaudi AI accelerators: See Intel Gaudi AI Accelerator integration.
- AMD GPUs: See AMD GPU Integration.

After installing your accelerator, create an accelerator profile as described in: Working with accelerator profiles.

Important

Verification

From the Administrator perspective, go to the Operators → Installed Operators page. Confirm that the following Operators appear:
- The Operator for your accelerator
- Node Feature Discovery (NFD)
- Kernel Module Management (KMM)

The accelerator is correctly detected a few minutes after full installation of the Node Feature Discovery (NFD) and the relevant accelerator Operator. The OpenShift Container Platform command line interface (CLI) displays the appropriate output for the GPU worker node. For example, here is output confirming that an NVIDIA GPU is detected:

# Expected output when the accelerator is detected correctly
oc describe node <node name>
...
Capacity:
  cpu:                4
  ephemeral-storage:  313981932Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16076568Ki
  nvidia.com/gpu:     1
  pods:               250
Allocatable:
  cpu:                3920m
  ephemeral-storage:  288292006229
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             12828440Ki
  nvidia.com/gpu:     1
  pods:               250

Enabling NVIDIA GPUs

Before you can use NVIDIA GPUs in Open Data Hub, you must install the NVIDIA GPU Operator.

Prerequisites

You have logged in to your OpenShift Container Platform cluster.
You have the cluster-admin role in your OpenShift Container Platform cluster.
You have installed an NVIDIA GPU and confirmed that it is detected in your environment.

Procedure

To enable GPU support on an OpenShift cluster, follow the instructions here: NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.

Important

After you install the Node Feature Discovery (NFD) Operator, you must create an instance of NodeFeatureDiscovery. In addition, after you install the NVIDIA GPU Operator, you must create a ClusterPolicy and populate it with default values.

Delete the migration-gpu-status ConfigMap.
1. In the OpenShift Container Platform web console, switch to the Administrator perspective.
2. Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate ConfigMap.
3. Search for the migration-gpu-status ConfigMap.
4. Click the action menu (⋮) and select Delete ConfigMap from the list.
  
  The Delete ConfigMap dialog appears.
5. Inspect the dialog and confirm that you are deleting the correct ConfigMap.
6. Click Delete.
Restart the dashboard replicaset.
1. In the OpenShift Container Platform web console, switch to the Administrator perspective.
2. Click Workloads → Deployments.
3. Set the Project to All Projects or redhat-ods-applications to ensure you can see the appropriate deployment.
4. Search for the rhods-dashboard deployment.
5. Click the action menu (⋮) and select Restart Rollout from the list.
6. Wait until the Status column indicates that all pods in the rollout have fully restarted.

Verification

The reset migration-gpu-status instance is present on the Instances tab on the AcceleratorProfile custom resource definition (CRD) details page.
From the Administrator perspective, go to the Operators → Installed Operators page. Confirm that the following Operators appear:
- NVIDIA GPU
- Node Feature Discovery (NFD)
- Kernel Module Management (KMM)

The GPU is correctly detected a few minutes after full installation of the Node Feature Discovery (NFD) and NVIDIA GPU Operators. The OpenShift Container Platform command line interface (CLI) displays the appropriate output for the GPU worker node. For example:

# Expected output when the GPU is detected properly
oc describe node <node name>
...
Capacity:
  cpu:                4
  ephemeral-storage:  313981932Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16076568Ki
  nvidia.com/gpu:     1
  pods:               250
Allocatable:
  cpu:                3920m
  ephemeral-storage:  288292006229
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             12828440Ki
  nvidia.com/gpu:     1
  pods:               250

After installing the NVIDIA GPU Operator, create a hardware profile as described in Working with accelerators.

Important

Intel Gaudi AI Accelerator integration

To accelerate your high-performance deep learning models, you can integrate Intel Gaudi AI accelerators into Open Data Hub. This integration enables your data scientists to use Gaudi libraries and software associated with Intel Gaudi AI accelerators through custom-configured workbench instances.

Intel Gaudi AI accelerators offer optimized performance for deep learning workloads, with the latest Gaudi 3 devices providing significant improvements in training speed and energy efficiency. These accelerators are suitable for enterprises running machine learning and AI applications on Open Data Hub.

Before you can enable Intel Gaudi AI accelerators in Open Data Hub, you must complete the following steps:

Install the latest version of the Intel Gaudi AI Accelerator Operator from OperatorHub.
Create and configure a custom workbench image for Intel Gaudi AI accelerators. A prebuilt workbench image for Gaudi accelerators is not included in Open Data Hub.

Manually define and configure an accelerator profile or a hardware profile for each Intel Gaudi AI device in your environment.

Important

Red Hat supports Intel Gaudi devices up to Intel Gaudi 3. The Intel Gaudi 3 accelerators, in particular, offer the following benefits:

Improved training throughput: Reduce the time required to train large models by using advanced tensor processing cores and increased memory bandwidth.
Energy efficiency: Lower power consumption while maintaining high performance, reducing operational costs for large-scale deployments.
Scalable architecture: Scale across multiple nodes for distributed training configurations.

Your OpenShift platform must support EC2 DL1 instances to use Intel Gaudi AI accelerators in an Amazon EC2 DL1 instance. You can use Intel Gaudi AI accelerators in workbench instances or model serving after you enable the accelerators, create a custom workbench image, and configure the accelerator profile or the hardware profile.

To identify the Intel Gaudi AI accelerators present in your deployment, use the lspci utility. For more information, see lspci(8) - Linux man page.

Important

The presence of Intel Gaudi AI accelerators in your deployment, as indicated by the lspci utility, does not guarantee that the devices are ready to use. You must ensure that all installation and configuration steps are completed successfully.

Additional resources

AMD GPU Integration

You can use AMD GPUs with Open Data Hub to accelerate AI and machine learning (ML) workloads. AMD GPUs provide high-performance compute capabilities, allowing users to process large data sets, train deep neural networks, and perform complex inference tasks more efficiently.

Integrating AMD GPUs with Open Data Hub involves the following components:

ROCm workbench images: Use the ROCm workbench images to streamline AI/ML workflows on AMD GPUs. These images include libraries and frameworks optimized with the AMD ROCm platform, enabling high-performance workloads for PyTorch and TensorFlow. The pre-configured images reduce setup time and provide an optimized environment for GPU-accelerated development and experimentation.
AMD GPU Operator: The AMD GPU Operator simplifies GPU integration by automating driver installation, device plugin setup, and node labeling for GPU resource management. It ensures compatibility between OpenShift and AMD hardware while enabling scaling of GPU-enabled workloads.

Verifying AMD GPU availability on your cluster

Before you proceed with the AMD GPU Operator installation process, you can verify the presence of an AMD GPU device on a node within your OpenShift Container Platform cluster. You can use commands such as lspci or oc to confirm hardware and resource availability.

Prerequisites

You have administrative access to the OpenShift Container Platform cluster.
You have a running OpenShift Container Platform cluster with a node equipped with an AMD GPU.
You have access to the OpenShift CLI (oc) and terminal access to the node.

Procedure

Use the OpenShift CLI to verify if GPU resources are allocatable:
1. List all nodes in the cluster to identify the node with an AMD GPU:
  oc get nodes
2. Note the name of the node where you expect the AMD GPU to be present.
3. Describe the node to check its resource allocation:
  oc describe node <node_name>
4. In the output, locate the Capacity and Allocatable sections and confirm that amd.com/gpu is listed. For example:
  Capacity: amd.com/gpu: 1 Allocatable: amd.com/gpu: 1
Check for the AMD GPU device using the lspci command:
1. Log in to the node:
  oc debug node/<node_name> chroot /host
2. Run the lspci command and search for the supported AMD device in your deployment. For example:
  lspci | grep -E "MI210|MI250|MI300"
3. Verify that the output includes one of the AMD GPU models. For example:
  03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD] Instinct MI210
Optional: Use the rocminfo command if the ROCm stack is installed on the node:
```
rocminfo
```
1. Confirm that the ROCm tool outputs details about the AMD GPU, such as compute units, memory, and driver status.

Verification

The oc describe node <node_name> command lists amd.com/gpu under Capacity and Allocatable.
The lspci command output identifies an AMD GPU as a PCI device matching one of the specified models (for example, MI210, MI250, MI300).
Optional: The rocminfo tool provides detailed GPU information, confirming driver and hardware configuration.

Additional resources

AMD GPU Operator GitHub Repository

Enabling AMD GPUs

Before you can use AMD GPUs in Open Data Hub, you must install the required dependencies, deploy the AMD GPU Operator, and configure the environment.

Prerequisites

You have logged in to OpenShift Container Platform.
You have the cluster-admin role in OpenShift Container Platform.
You have installed your AMD GPU and confirmed that it is detected in your environment.
Your OpenShift Container Platform environment supports EC2 DL1 instances if you are running on Amazon Web Services (AWS).

Procedure

Install the latest version of the AMD GPU Operator, as described in Install AMD GPU Operator on OpenShift.
After installing the AMD GPU Operator, configure the AMD drivers required by the Operator as described in the documentation: Configure AMD drivers for the GPU Operator.

Note	Alternatively, you can install the AMD GPU Operator from the Red Hat Catalog. For more information, see Install AMD GPU Operator from Red Hat Catalog.

After installing the AMD GPU Operator, create an accelerator profile, as described in Working with accelerator profiles.

Important

Verification

From the Administrator perspective, go to the Operators → Installed Operators page. Confirm that the following Operators appear:

AMD GPU Operator
Node Feature Discovery (NFD)
Kernel Module Management (KMM)

Note	Ensure that you follow all the steps for proper driver installation and configuration. Incorrect installation or configuration may prevent the AMD GPUs from being recognized or functioning properly.

Working with accelerator profiles

Important

Accelerator profiles are now deprecated. To target specific worker nodes for workbenches or model serving workloads, use hardware profiles. By default, the accelerator profiles feature still appears in the dashboard navigation menu, while the hardware profiles feature is hidden.

To disable accelerator profiles in the dashboard navigation menu, set the disableAcceleratorProfiles value to true in the OdhDashboardConfig Custom Resource (CR) in OpenShift Container Platform. To enable hardware profiles, set the disableHardwareProfiles value to false in the same CR.

Note: The spec.dashboardConfig.disableAcceleratorProfiles option is superseded by the spec.dashboardConfig.disableHardwareProfiles option. If both options are set to false, the disableHardwareProfiles option overrides the disableAcceleratorProfiles option, and the Settings → Hardware profiles menu item is shown in the dashboard navigation menu.

For more information about setting dashboard configuration options, see Customizing the dashboard.

Red Hat recommends that you migrate any existing accelerator profiles to hardware profiles as soon as possible to ensure continued support and compatibility.

To configure accelerators for your data scientists to use in Open Data Hub, you must create an associated accelerator profile. An accelerator profile is a custom resource definition (CRD) on OpenShift that has an AcceleratorProfile resource, and defines the specification of the accelerator. You can create and manage accelerator profiles by selecting Settings → Accelerator profiles on the Open Data Hub dashboard.

For accelerators that are new to your deployment, you must manually configure an accelerator profile for each accelerator. If your deployment contains an accelerator before you upgrade, the associated accelerator profile remains after the upgrade. You can manage the accelerators that appear to your data scientists by assigning specific accelerator profiles to your custom workbench images. This example shows the code for a Habana Gaudi 1 accelerator profile:

---
apiVersion: dashboard.opendatahub.io/v1alpha
kind: AcceleratorProfile
metadata:
  name: hpu-profile-first-gen-gaudi
spec:
  displayName: Habana HPU - 1st Gen Gaudi
  description: First Generation Habana Gaudi device
  enabled: true
  identifier: habana.ai/gaudi
  tolerations:
    - effect: NoSchedule
      key: habana.ai/gaudi
      operator: Exists
---

The accelerator profile code appears on the Instances tab on the details page for the AcceleratorProfile custom resource definition (CRD). For more information about accelerator profile attributes, see the following table:

Table 1. Accelerator profile attributes
Attribute	Type	Required	Description
displayName	String	Required	The display name of the accelerator profile.
description	String	Optional	Descriptive text defining the accelerator profile.
identifier	String	Required	A unique identifier defining the accelerator resource.
enabled	Boolean	Required	Determines if the accelerator is visible in Open Data Hub.
tolerations	Array	Optional	The tolerations that can apply to workbenches and serving runtimes that use the accelerator. For more information about toleration attributes in Open Data Hub, see Toleration v1 core.

Additional resources

Viewing accelerator profiles

If you have defined accelerator profiles for Open Data Hub, you can view, enable, and disable them from the Accelerator profiles page.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
Your deployment contains existing accelerator profiles.

Procedure

From the Open Data Hub dashboard, click Settings → Accelerator profiles.

The Accelerator profiles page appears, displaying existing accelerator profiles.
Inspect the list of accelerator profiles. To enable or disable an accelerator profile, on the row containing the accelerator profile, click the toggle in the Enable column.

Verification

The Accelerator profiles page appears appears, displaying existing accelerator profiles.

Creating an accelerator profile

To configure accelerators for your data scientists to use in Open Data Hub, you must create an associated accelerator profile.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.

Procedure

From the Open Data Hub dashboard, click Settings → Accelerator profiles.

The Accelerator profiles page appears, displaying existing accelerator profiles. To enable or disable an existing accelerator profile, on the row containing the relevant accelerator profile, click the toggle in the Enable column.
Click Create accelerator profile.

The Create accelerator profile dialog appears.
In the Name field, enter a name for the accelerator profile.
In the Identifier field, enter a unique string that identifies the hardware accelerator associated with the accelerator profile.
Optional: In the Description field, enter a description for the accelerator profile.
To enable or disable the accelerator profile immediately after creation, click the toggle in the Enable column.
Optional: Add a toleration to schedule pods with matching taints.
1. Click Add toleration.
  
  The Add toleration dialog opens.
2. From the Operator list, select one of the following options:
  - Equal - The key/value/effect parameters must match. This is the default.
  - Exists - The key/effect parameters must match. You must leave a blank value parameter, which matches any.
3. From the Effect list, select one of the following options:
  - None
  - NoSchedule - New pods that do not match the taint are not scheduled onto that node. Existing pods on the node remain.
  - PreferNoSchedule - New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to. Existing pods on the node remain.
  - NoExecute - New pods that do not match the taint cannot be scheduled onto that node. Existing pods on the node that do not have a matching toleration are removed.
4. In the Key field, enter a toleration key. The key is any string, up to 253 characters. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
5. In the Value field, enter a toleration value. The value is any string, up to 63 characters. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
6. In the Toleration Seconds section, select one of the following options to specify how long a pod stays bound to a node that has a node condition.
  - Forever - Pods stays permanently bound to a node.
  - Custom value - Enter a value, in seconds, to define how long pods stay bound to a node that has a node condition.
7. Click Add.
Click Create accelerator profile.

Verification

The accelerator profile appears on the Accelerator profiles page.
The Accelerator list appears on the Start a basic workbench page. After you select an accelerator, the Number of accelerators field appears, which you can use to choose the number of accelerators for your workbench.
The accelerator profile appears on the Instances tab on the details page for the AcceleratorProfile custom resource definition (CRD).

Additional resources

Updating an accelerator profile

You can update the existing accelerator profiles in your deployment. You might want to change important identifying information, such as the display name, the identifier, or the description.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
The accelerator profile exists in your deployment.

Procedure

From the Open Data Hub dashboard, click Settings → Workbench images.

The Workbench images page appears. Previously imported workbench images are displayed. To enable or disable a previously imported workbench image, on the row containing the relevant workbench image, click the toggle in the Enable column.
Click the action menu (⋮) and select Edit from the list.

The Edit accelerator profile dialog opens.
In the Name field, update the accelerator profile name.
In the Identifier field, update the unique string that identifies the hardware accelerator associated with the accelerator profile, if applicable.
Optional: In the Description field, update the accelerator profile.
To enable or disable the accelerator profile immediately after creation, click the toggle in the Enable column.
Optional: Add a toleration to schedule pods with matching taints.
1. Click Add toleration.
  
  The Add toleration dialog opens.
2. From the Operator list, select one of the following options:
  - Equal - The key/value/effect parameters must match. This is the default.
  - Exists - The key/effect parameters must match. You must leave a blank value parameter, which matches any.
3. From the Effect list, select one of the following options:
  - None
  - NoSchedule - New pods that do not match the taint are not scheduled onto that node. Existing pods on the node remain.
  - PreferNoSchedule - New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to. Existing pods on the node remain.
  - NoExecute - New pods that do not match the taint cannot be scheduled onto that node. Existing pods on the node that do not have a matching toleration are removed.
4. In the Key field, enter a toleration key. The key is any string, up to 253 characters. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
5. In the Value field, enter a toleration value. The value is any string, up to 63 characters. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
6. In the Toleration Seconds section, select one of the following options to specify how long a pod stays bound to a node that has a node condition.
  - Forever - Pods stays permanently bound to a node.
  - Custom value - Enter a value, in seconds, to define how long pods stay bound to a node that has a node condition.
7. Click Add.
If your accelerator profile contains existing tolerations, you can edit them.
1. Click the action menu (⋮) on the row containing the toleration that you want to edit and select Edit from the list.
2. Complete the applicable fields to update the details of the toleration.
3. Click Update.
Click Update accelerator profile.

Verification

If your accelerator profile has new identifying information, this information appears in the Accelerator list on the Start a basic workbench page.

Additional resources

Deleting an accelerator profile

To discard accelerator profiles that you no longer require, you can delete them so that they do not appear on the dashboard.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
The accelerator profile that you want to delete exists in your deployment.

Procedure

From the Open Data Hub dashboard, click Settings → Accelerator profiles.

The Accelerator profiles page appears, displaying existing accelerator profiles.
Click the action menu (⋮) beside the accelerator profile that you want to delete and click Delete.

The Delete accelerator profile dialog opens.
Enter the name of the accelerator profile in the text field to confirm that you intend to delete it.
Click Delete.

Verification

The accelerator profile no longer appears on the Accelerator profiles page.

Additional resources

Configuring a recommended accelerator for workbench images

To help you indicate the most suitable accelerators to your data scientists, you can configure a recommended tag to appear on the dashboard.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
You have existing workbench images in your deployment.
You have enabled GPU support. This includes installing the Node Feature Discovery and NVIDIA GPU Operators. For more information, see NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.

Procedure

From the Open Data Hub dashboard, click Settings → Workbench images.

The Workbench images page appears. Previously imported workbench images are displayed.
Click the action menu (⋮) and select Edit from the list.

The Update workbench image dialog opens.
From the Accelerator identifier list, select an identifier to set its accelerator as recommended with the workbench image. If the workbench image contains only one accelerator identifier, the identifier name displays by default.

Click Update.

Note

If you have already configured an accelerator identifier for a workbench image, you can specify a recommended accelerator for the workbench image by creating an associated accelerator profile or hardware profile. To do this, click Create profile on the row containing the workbench image and complete the relevant fields. If the workbench image does not contain an accelerator identifier, you must manually configure one before creating an associated accelerator profile or hardware profile.

Important

By default, hardware profiles are hidden in the dashboard navigation menu and user interface, while accelerator profiles remain visible. In addition, user interface components associated with the deprecated accelerator profiles functionality are still displayed. If you enable hardware profiles, the Hardware profiles list appears instead of the Accelerator profiles list. To show the Settings → Hardware profiles option in the dashboard navigation menu, and the user interface components associated with hardware profiles, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift Container Platform. For more information about setting dashboard configuration options, see Customizing the dashboard.

Verification

When your data scientists select an accelerator with a specific workbench image, a tag appears next to the corresponding accelerator indicating its compatibility.

Configuring a recommended accelerator for serving runtimes

To help you indicate the most suitable accelerators to your data scientists, you can configure a recommended accelerator tag for your serving runtimes.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
You have enabled GPU support. This includes installing the Node Feature Discovery and NVIDIA GPU Operators. For more information, see NVIDIA GPU Operator on Red Hat OpenShift Container Platform in the NVIDIA documentation.

Procedure

From the Open Data Hub dashboard, click Settings → Serving runtimes.

The Serving runtimes page opens and shows the model-serving runtimes that are already installed and enabled in your Open Data Hub deployment. By default, the OpenVINO Model Server runtime is pre-installed and enabled in Open Data Hub.

Edit your custom runtime that you want to add the recommended accelerator tag to, click the action menu (⋮) and select Edit.

A page with an embedded YAML editor opens.

Note

You cannot directly edit the OpenVINO Model Server runtime that is included in Open Data Hub by default. However, you can clone this runtime and edit the cloned version. You can then add the edited clone as a new, custom runtime. To do this, click the action menu beside the OpenVINO Model Server and select Duplicate.

In the editor, enter the YAML code to apply the annotation opendatahub.io/recommended-accelerators. The excerpt in this example shows the annotation to set a recommended tag for an NVIDIA GPU accelerator:
```
metadata:
	annotations:
		opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
```
Click Update.

Verification

When your data scientists select an accelerator with a specific serving runtime, a tag appears next to the corresponding accelerator indicating its compatibility.

Working with hardware profiles

In Open Data Hub, you can schedule user workloads on worker nodes that have specific hardware configurations, such as hardware accelerators, CPU-only nodes, or specialized memory allocations. With hardware profiles, you can define these hardware resources explicitly, enabling precise targeting of workloads to specific nodes and improving resource management efficiency.

Important

By default, this feature is hidden from appearing in the dashboard navigation menu. To show the Settings → Hardware profiles option in the dashboard navigation menu, and other user interface components associated with hardware profiles, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift Container Platform. For more information about setting dashboard configuration options, see Customizing the dashboard.

You can use hardware profiles to create profiles with hardware identifiers, explicit resource allocation limits (CPU, memory, and accelerators), tolerations, and node selectors. These capabilities are particularly beneficial in environments with heterogeneous hardware, including multiple GPU types, CPU-only configurations, memory-intensive workloads, or even single-node deployments. This targeted scheduling significantly enhances resource utilization, reduces overhead, and optimizes costs, especially in complex environments, such as clusters with diverse hardware.

To get started, contact your cluster administrator to identify hardware resources available in your cluster.

To configure specific hardware configurations for your data scientists to use in Open Data Hub, you must create an associated hardware profile. A hardware profile is a custom resource definition (CRD) on OpenShift that has a HardwareProfile resource, and defines the hardware specification. You can create and manage hardware profiles by selecting Settings → Hardware profiles on the Open Data Hub dashboard.

After you create and enable a hardware profile, users can select the hardware profile in the user interface when deploying workbenches, model-serving workloads, and pipelines, where applicable.

Additional resources

Creating a hardware profile

To configure specific hardware configurations for your data scientists to use in Open Data Hub, you must create an associated hardware profile.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
The relevant hardware is installed and you have confirmed that it is detected in your environment.

Procedure

From the Open Data Hub dashboard, click Settings → Hardware profiles.

The Hardware profiles page appears, displaying existing hardware profiles. To enable or disable an existing hardware profile, on the row containing the relevant hardware profile, click the toggle in the Enabled column.
Click Create hardware profile.

The Create hardware profile page appears.
In the Name field, enter a name for the hardware profile.
Optional: To change the default name of your Kubernetes resource, click Edit resource name and enter a name in the Resource name field. The resource name cannot be edited after creation.
Optional: In the Description field, enter a description for the hardware profile.
In the Visiblity section, set the hardware profile visibility level:
1. To access the hardware profile in all areas of Open Data Hub, leave the Visible everywhere radio button selected.
2. Click the Limited visibility radio button to limit the areas of Open Data Hub where your data scientists can use the hardware profile.
Optional: Configure node resource request limits:
1. Click Add resource.
  
  The Add resource dialog opens.
2. In the Resource label field, enter a unique resource label.
3. In the Resource identifier field, enter a unique resource identifier.
4. From the Resource type field, select a resource type from the list.
5. In the Default field, enter the default resource request limit. This value must be equal to or between the minimum and maximum limits.
6. In the Minimum allowed field, enter the minimum number of resources that users can request.
7. In the Maximum allowed field, enter the maximum number of resources that users can request:
  1. To set a specific maximum request limit, click the Set maximum limit radio button and enter a value.
  2. To set no maximum request limit, click the No maximum limit radio button.
8. Click Add.
Optional: Add a node selector to schedule pods on nodes with matching labels.
1. Click Add node selector.
  
  The Add node selector dialog opens.
2. In the Key field, enter a node selection key. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
3. In the Value field, enter a node selection value. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
4. Click Add.
Optional: Add a toleration to schedule pods with matching taints.
1. Click Add toleration.
  
  The Add toleration dialog opens.
2. From the Operator list, select one of the following options:
  - Equal - The key/value/effect parameters must match. This is the default.
  - Exists - The key/effect parameters must match. You must leave a blank value parameter, which matches any.
3. From the Effect list, select one of the following options:
  - None
  - NoSchedule - New pods that do not match the taint are not scheduled onto that node. Existing pods on the node remain.
  - PreferNoSchedule - New pods that do not match the taint might be scheduled onto that node, but the scheduler tries not to. Existing pods on the node remain.
  - NoExecute - New pods that do not match the taint cannot be scheduled onto that node. Existing pods on the node that do not have a matching toleration are removed.
4. In the Key field, enter a toleration key. The key must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
5. In the Value field, enter a toleration value. The value must begin with a letter or number, and may contain letters, numbers, hyphens, dots, and underscores.
6. In the Toleration Seconds section, select one of the following options to specify how long a pod stays bound to a node that has a node condition:
  - Forever - Pods stays permanently bound to a node.
  - Custom value - Enter a value, in seconds, to define how long pods stay bound to a node that has a node condition.
7. Click Add.
Click Create hardware profile.

Verification

The hardware profile appears on the Hardware profiles page.
The hardware profile appears in the Hardware profiles list on the Create workbench page.
The hardware profile appears on the Instances tab on the details page for the HardwareProfile custom resource definition (CRD).

Additional resources

Updating a hardware profile

You can update the existing hardware profiles in your deployment. You can change important identifying information, such as the display name, the identifier, or the description.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
The hardware profile exists in your deployment.

Procedure

From the Open Data Hub dashboard, click Settings → Hardware profiles.

The Hardware profiles page appears. Existing hardware profiles are displayed. To enable or disable a hardware profile, on the row containing the relevant hardware profile, click the toggle in the Enabled column.
Click the action menu (⋮) and select Edit from the list.

The Edit hardware profile dialog opens.
Make your changes.
Click Update hardware profile.

Verification

If your hardware profile has new identifying information, this information appears in the Hardware profile list on the Create workbench page.

Additional resources

Deleting a hardware profile

To discard hardware profiles that you no longer require, you can delete them so that they do not appear on the dashboard.

Prerequisites

You have logged in to Open Data Hub as a user with Open Data Hub administrator privileges.
The hardware profile that you want to delete exists in your deployment.

Procedure

From the Open Data Hub dashboard, click Settings → Hardware profiles.

The Hardware profiles page appears, displaying existing hardware profiles.
Click the action menu (⋮) beside the hardware profile that you want to delete and click Delete.

The Delete hardware profile dialog opens.
Enter the name of the hardware profile in the text field to confirm that you intend to delete it.
Click Delete.

Verification

The hardware profile no longer appears on the Hardware profiles page.

Additional resources

About GPU time slicing

GPU time slicing enables multiple workloads to share a single physical GPU by dividing processing time in short, alternating time slots. This method improves resource utilization, reduces idle GPU time, and allows multiple users to run AI/ML workloads concurrently in Open Data Hub. The NVIDIA GPU Operator manages this scheduling based on a time-slicing-config ConfigMap that defines the number of GPU slices for each physical GPU.

Time-slicing differs from Multi-Instance GPU (MIG) partitioning. While MIG provides memory and fault isolation, time-slicing shares the same GPU memory across workloads without strict isolation. Time-slicing is ideal for lightweight inference tasks, data preprocessing, and other scenarios where full GPU isolation is unnecessary.

Consider the following points when using GPU time slicing:

Memory sharing: All workloads share GPU memory. High memory usage by one workload can impact others.
Performance trade-offs: While time slicing allows multiple workloads to share a GPU, it does not provide strict resource isolation like MIG.
GPU compatibility: Time slicing is supported on specific NVIDIA GPUs.

Additional resources

Enabling GPU time slicing

To enable GPU time slicing in Open Data Hub, you must configure the NVIDIA GPU Operator to allow multiple workloads to share a single GPU.

Prerequisites

You have logged in to OpenShift Container Platform.
You have the cluster-admin role in OpenShift Container Platform.
You have installed and configured the NVIDIA GPU Operator.
The relevant nodes in your deployment contain NVIDIA GPUs.
The GPU in your deployment supports time slicing.
You installed the OpenShift command line interface (oc) as described in Installing the OpenShift CLI.

Procedure

Create a config map named time-slicing-config in the namespace that is used by the GPU operator. For NVIDIA GPUs, this is the nvidia-gpu-operator namespace.

Log in to the OpenShift Container Platform web console as a cluster administrator.
In the Administrator perspective, navigate to Workloads → ConfigMaps.
On the ConfigMap details page, click the Create Config Map button.
On the Create Config Map page, for Configure via, select YAML view.

In the Data field, enter the YAML code for the relevant GPU. Here is an example of a time-slicing-config config map for an NVIDIA T4 GPU:

Note	You can change the number of replicas to control the number of GPU slices available for each physical GPU. Increasing replicas might increase the risk of Out of Memory (OOM) errors if workloads exceed available GPU memory.

apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  tesla-t4: |-
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4

Click Create.

Update the gpu-cluster-policy cluster policy to reference the time-slicing-config config map:
1. In the Administrator perspective, navigate to Operators → Installed Operators.
2. Search for the NVIDIA GPU Operator, and then click the Operator name to open the Operator details page.
3. Click the ClusterPolicy tab.
4. Select the gpu-cluster-policy resource from the list to open the ClusterPolicy details page.
5. Click the YAML tab and update the spec.devicePlugin section to reference the time-slicing-config config map. Here is an example of a gpu-cluster-policy cluster policy for an NVIDIA T4 GPU:
  apiVersion: nvidia.com/v1 kind: ClusterPolicy metadata: name: gpu-cluster-policy spec: devicePlugin: config: default: tesla-t4 name: time-slicing-config
6. Click Save.
Label the relevant machine set to apply time slicing:
1. In the Administrator perspective, navigate to Compute → Machine Sets.
2. Select the machine set for GPU time slicing from the list.
3. On the MachineSet details page, click the YAML tab and update the spec.template.spec.metadata.labels section to label the relevant machine set. Here is an example of a machine set with the appropriate machine label for an NVIDIA T4 GPU:
  spec: template: spec: metadata: labels: nvidia.com/device-plugin.config: tesla-t4
4. Click Save.

Verification

Verify that you have applied the config map correctly:

oc get configmap time-slicing-config -n nvidia-gpu-operator -o yaml

Check that the cluster policy includes the time-slicing configuration:
```
oc get clusterpolicy gpu-cluster-policy -o yaml
```

Ensure that the label is applied to nodes:

oc get nodes --show-labels | grep nvidia.com/device-plugin.config

Note	If workloads do not appear to be sharing the GPU, verify that the NVIDIA device plugin is running and that the correct labels are applied.

QUICK LINKS

STAY IN TOUCH