Info alert:Important Notice

Please note that more information about the previous v2 releases can be found here. You can use "Find a release" search bar to search for a particular release.

Working on data science projects

Table of Contents

Using data science projects
Using project workbenches
Using connections
Configuring cluster storage
Managing access to data science projects
Creating project-scoped resources for your project

As a data scientist, you can organize your data science work into a single project. A data science project in Open Data Hub can consist of the following components:

Workbenches: Creating a workbench allows you to work with models in your preferred IDE, such as JupyterLab.
Cluster storage: For data science projects that require data retention, you can add cluster storage to the project.
Connections: Adding a connection to your project allows you to connect data sources and sinks to your workbenches.
Pipelines: Standardize and automate machine learning workflows to enable you to further enhance and deploy your data science models.
Models and model servers: Deploy a trained data science model to serve intelligent applications. Your model is deployed with an endpoint that allows applications to send requests to the model.
Bias metrics for models: Creating bias metrics allows you to monitor your machine learning models for bias.

Using data science projects

Creating a data science project

To implement a data science workflow, you must create a project. In OpenShift, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers and data scientists in your organization.

Within a project, you can add the following functionality:

Connections so that you can access data without having to hardcode information like endpoints or credentials.
Workbenches for working with and processing data, and for developing models.
Deployed models so that you can test them and then integrate them into intelligent applications. Deploying a model makes it available as a service that you can access by using an API.
Pipelines for automating your ML workflow.

Prerequisites

You have logged in to Open Data Hub.
You have the appropriate roles and permissions to create projects.

Procedure

From the Open Data Hub dashboard, select Data science projects.

The Data science projects page shows a list of projects that you can access. For each user-requested project in the list, the Name column shows the project display name, the user who requested the project, and the project description.
Click Create project.
In the Create project dialog, update the Name field to enter a unique display name for your project.
Optional: If you want to change the default resource name for your project, click Edit resource name.

The resource name is what your resource is labeled in OpenShift. Valid characters include lowercase letters, numbers, and hyphens (-). The resource name cannot exceed 30 characters, and it must start with a letter and end with a letter or number.

Note: You cannot change the resource name after the project is created. You can edit only the display name and the description.
Optional: In the Description field, provide a project description.
Click Create.

Verification

A project details page opens. From this page, you can add connections, create workbenches, configure pipelines, and deploy models.

Updating a data science project

You can update the project details by changing the project name and description.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the action menu (⋮) beside the project whose details you want to update and click Edit project.

The Edit project dialog opens.
Optional: Edit the Name field to change the display name for your project.
Optional: Edit the Description field to change the description of your project.
Click Update.

Verification

You can see the updated project details on the Data science projects page.

Deleting a data science project

You can delete data science projects so that they do not appear on the Open Data Hub Data science projects page when you no longer want to use them.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the action menu (⋮) beside the project that you want to delete and then click Delete project.

The Delete project dialog opens.
Enter the project name in the text field to confirm that you intend to delete it.
Click Delete project.

Verification

The data science project that you deleted is no longer displayed on the Data science projects page.
Deleting a data science project deletes any associated workbenches, data science pipelines, cluster storage, and connections. This data is permanently deleted and is not recoverable.

Using project workbenches

Creating a workbench and selecting an IDE

A workbench is an isolated area where you can examine and work with ML models. You can also work with data and run programs, for example to prepare and clean data. While a workbench is not required if, for example, you only want to service an existing model, one is needed for most data science workflow tasks, such as writing code to process data or training a model.

When you create a workbench, you specify an image (an IDE, packages, and other dependencies). IDEs include JupyterLab and code-server.

The IDEs are based on a server-client architecture. Each IDE provides a server that runs in a container on the OpenShift cluster, while the user interface (the client) is displayed in your web browser. For example, the Jupyter workbench runs in a container on the Red Hat OpenShift cluster. The client is the JupyterLab interface that opens in your web browser on your local computer. All of the commands that you enter in JupyterLab are executed by the workbench. Similarly, other IDEs like code-server or RStudio Server provide a server that runs in a container on the OpenShift cluster, while the user interface is displayed in your web browser. This architecture allows you to interact through your local computer in a browser environment, while all processing occurs on the cluster. The cluster provides the benefits of larger available resources and security because the data being processed never leaves the cluster.

In a workbench, you can also configure connections (to access external data for training models and to save models so that you can deploy them) and cluster storage (for persisting data). Workbenches within the same project can share models and data through object storage with the data science pipelines and model servers.

For data science projects that require data retention, you can add container storage to the workbench you are creating.

Within a project, you can create multiple workbenches. When to create a new workbench depends on considerations, such as the following:

The workbench configuration (for example, CPU, RAM, or IDE). If you want to avoid editing the configuration of an existing workbench’s configuration to accommodate a new task, you can create a new workbench instead.
Separation of tasks or activities. For example, you might want to use one workbench for your Large Language Models (LLM) experimentation activities, another workbench dedicated to a demo, and another workbench for testing.

About workbench images

A workbench image is preinstalled with tools and libraries for model development. You can use the provided images, or an Open Data Hub administrator can create custom images tailored to your needs.

To provide a consistent, stable platform for your model development, many provided workbench images contain the same version of Python. Most workbench images available on Open Data Hub are pre-built and ready for you to use immediately after Open Data Hub is installed or upgraded.

The following table lists the workbench images that are installed with Open Data Hub by default.

If the preinstalled packages that are provided in these images are not sufficient for your use case, you have the following options:

Install additional libraries after launching a default image. This option is good if you want to add libraries on an ad hoc basis as you develop models. However, it can be challenging to manage the dependencies of installed libraries and your changes are not saved when the workbench restarts.
Create a custom image that includes the additional libraries or packages. For more information, see Creating custom workbench images.

Table 1. Default workbench images
Image name	Description
CUDA	If you are working with compute-intensive data science models that require GPU support, use the Compute Unified Device Architecture (CUDA) workbench image to gain access to the NVIDIA CUDA Toolkit. Using this toolkit, you can optimize your work by using GPU-accelerated libraries and optimization tools.
Standard Data Science	Use the Standard Data Science workbench image for models that do not require TensorFlow or PyTorch. This image contains commonly-used libraries to assist you in developing your machine learning models.
TensorFlow	TensorFlow is an open source platform for machine learning. With TensorFlow, you can build, train and deploy your machine learning models. TensorFlow contains advanced data visualization features, such as computational graph visualizations. It also allows you to easily monitor and track the progress of your models.
PyTorch	PyTorch is an open source machine learning library optimized for deep learning. If you are working with computer vision or natural language processing models, use the Pytorch workbench image.
Minimal Python	If you do not require advanced machine learning features, or additional resources for compute-intensive data science work, you can use the Minimal Python image to develop your models.
TrustyAI	Use the TrustyAI workbench image to leverage your data science work with model explainability, tracing, and accountability, and runtime monitoring. See the TrustyAI Explainability repository for some example Jupyter notebooks.
code-server	With the code-server workbench image, you can customize your workbench environment to meet your needs using a variety of extensions to add new languages, themes, debuggers, and connect to additional services. Enhance the efficiency of your data science work with syntax highlighting, auto-indentation, and bracket matching, as well as an automatic task runner for seamless automation. For more information, see code-server in GitHub. NOTE: Elyra-based pipelines are not available with the code-server workbench image.
RStudio Server	Use the RStudio Server workbench image to access the RStudio IDE, an integrated development environment for R, a programming language for statistical computing and graphics. For more information, see the RStudio Server site.
CUDA - RStudio Server	Use the CUDA - RStudio Server workbench image to access the RStudio IDE and NVIDIA CUDA Toolkit. RStudio is an integrated development environment for R, a programming language for statistical computing and graphics. With the NVIDIA CUDA toolkit, you can optimize your work using GPU-accelerated libraries and optimization tools. For more information, see the RStudio Server site.
ROCm	Use the ROCm workbench image to run AI and machine learning workloads on AMD GPUs in Open Data Hub. It includes ROCm libraries and tools optimized for high-performance GPU acceleration, supporting custom AI workflows and data processing tasks. Use this image integrating additional frameworks or dependencies tailored to your specific AI development needs.
ROCm-PyTorch	Use the ROCm-PyTorch workbench image to run PyTorch workloads on AMD GPUs in Open Data Hub. It includes ROCm-accelerated PyTorch libraries, enabling efficient deep learning training, inference, and experimentation. This image is designed for data scientists working with PyTorch-based workflows, offering integration with GPU scheduling.
ROCm-TensorFlow	Use the ROCm-TensorFlow workbench image to run TensorFlow workloads on AMD GPUs in Open Data Hub. It includes ROCm-accelerated TensorFlow libraries to support high-performance deep learning model training and inference. This image simplifies TensorFlow development on AMD GPUs and integrates with Open Data Hub for resource scaling and management.

Creating a workbench

When you create a workbench, you specify an image (an IDE, packages, and other dependencies). You can also configure connections, cluster storage, and add container storage.

Prerequisites

You have logged in to Open Data Hub.
You have created a project.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to add the workbench to.

A project details page opens.
Click the Workbenches tab.
Click Create workbench.

The Create workbench page opens.
In the Name field, enter a unique name for your workbench.
Optional: If you want to change the default resource name for your workbench, click Edit resource name.

The resource name is what your resource is labeled in OpenShift. Valid characters include lowercase letters, numbers, and hyphens (-). The resource name cannot exceed 30 characters, and it must start with a letter and end with a letter or number.

Note: You cannot change the resource name after the workbench is created. You can edit only the display name and the description.
Optional: In the Description field, enter a description for your workbench.
In the Workbench image section, complete the fields to specify the workbench image to use with your workbench.

From the Image selection list, select a workbench image that suits your use case. A workbench image includes an IDE and Python packages (reusable code). If project-scoped images exist, the Image selection list includes subheadings to distinguish between global images and project-scoped images.

Optionally, click View package information to view a list of packages that are included in the image that you selected.

If the workbench image has multiple versions available, select the workbench image version to use from the Version selection list. To use the latest package versions, Red Hat recommends that you use the most recently added image.

Note
You can change the workbench image after you create the workbench.

In the Deployment size section, select one of the following options, depending on whether the hardware profiles feature is enabled.

If the hardware profiles feature is not enabled:
1. From the Container size list, select the appropriate size for the size of the model that you want to train or tune.
  
  For example, to run the example fine-tuning job described in Fine-tuning a model by using Kubeflow Training, select Medium.
2. From the Accelerator list, select a suitable accelerator profile for your workbench.
  
  If project-scoped accelerator profiles exist, the Accelerator list includes subheadings to distinguish between global accelerator profiles and project-scoped accelerator profiles.

If the hardware profiles feature is enabled:

From the Hardware profile list, select a suitable hardware profile for your workbench.

If project-scoped hardware profiles exist, the Hardware profile list includes subheadings to distinguish between global hardware profiles and project-scoped hardware profiles.

The hardware profile specifies the number of CPUs and the amount of memory allocated to the container, setting the guaranteed minimum (request) and maximum (limit) for both.

If you want to change the default values, click Customize resource requests and limit and enter new minimum (request) and maximum (limit) values.

Important

By default, the hardware profiles feature is not enabled: hardware profiles are not shown in the dashboard navigation menu or elsewhere in the user interface. In addition, user interface components associated with the deprecated accelerator profiles functionality are still displayed. To show the Settings → Hardware profiles option in the dashboard navigation menu, and the user interface components associated with hardware profiles, set the disableHardwareProfiles value to false in the OdhDashboardConfig custom resource (CR) in OpenShift Container Platform. For more information about setting dashboard configuration options, see Customizing the dashboard.

Optional: In the Environment variables section, select and specify values for any environment variables.

Setting environment variables during the workbench configuration helps you save time later because you do not need to define them in the body of your workbenches, or with the IDE command line interface.

If you are using S3-compatible storage, add these recommended environment variables:
- AWS_ACCESS_KEY_ID specifies your Access Key ID for Amazon Web Services.
- AWS_SECRET_ACCESS_KEY specifies your Secret access key for the account specified in AWS_ACCESS_KEY_ID.
Open Data Hub stores the credentials as Kubernetes secrets in a protected namespace if you select Secret when you add the variable.
In the Cluster storage section, configure the storage for your workbench. Select one of the following options:
- Create new persistent storage to create storage that is retained after you shut down your workbench. Complete the relevant fields to define the storage:
  1. Enter a name for the cluster storage.
  2. Enter a description for the cluster storage.
  3. Select a storage class for the cluster storage.
    
    Note
    You cannot change the storage class after you add the cluster storage to the workbench.
  4. For storage classes that support multiple access modes, select an Access mode to define how the volume can be accessed. For more information, see About persistent storage.
    
    Only the access modes that have been enabled for the storage class by your cluster and Open Data Hub administrators are visible.
  5. Under Persistent storage size, enter a new size in gibibytes or mebibytes.
- Use existing persistent storage to reuse existing storage and select the storage from the Persistent storage list.
Optional: You can add a connection to your workbench. A connection is a resource that contains the configuration parameters needed to connect to a data source or sink, such as an object storage bucket. You can use storage buckets for storing data, models, and pipeline artifacts. You can also use a connection to specify the location of a model that you want to deploy.

In the Connections section, use an existing connection or create a new connection:
- Use an existing connection as follows:
  
  Click Attach existing connections.
  
  From the Connection list, select a connection that you previously defined.
- Create a new connection as follows:
  
  Click Create connection. The Add connection dialog opens.
  
  From the Connection type drop-down list, select the type of connection. The Connection details section is displayed.
  
  If you selected S3 compatible object storage in the preceding step, configure the connection details:
  
  In the Connection name field, enter a unique name for the connection.
  
  Optional: In the Description field, enter a description for the connection.
  
  In the Access key field, enter the access key ID for the S3-compatible object storage provider.
  
  In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.
  
  In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.
  
  In the Region field, enter the default region of your S3-compatible object storage account.
  
  In the Bucket field, enter the name of your S3-compatible object storage bucket.
  
  Click Create.
  
  If you selected URI in the preceding step, configure the connection details:
  
  In the Connection name field, enter a unique name for the connection.
  
  Optional: In the Description field, enter a description for the connection.
  
  In the URI field, enter the Uniform Resource Identifier (URI).
  
  Click Create.
Click Create workbench.

Verification

The workbench that you created is visible on the Workbenches tab for the project.
Any cluster storage that you associated with the workbench during the creation process is displayed on the Cluster storage tab for the project.
The Status column on the Workbenches tab displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.
Optional: Click the open icon () to open the IDE in a new window.

Starting a workbench

You can manually start a data science project’s workbench from the Workbenches tab on the project details page. By default, workbenches start immediately after you create them.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project that contains a workbench.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project whose workbench you want to start.

A project details page opens.
Click the Workbenches tab.
In the Status column for the workbench that you want to start, click Start.

The Status column changes from Stopped to Starting when the workbench server is starting, and then to Running when the workbench has successfully started. * Optional: Click the open icon () to open the IDE in a new window.

Verification

The workbench that you started is displayed on the Workbenches tab for the project, with the status of Running.

Updating a project workbench

If your data science work requires you to change your workbench image, container size, or identifying information, you can update the properties of your project’s workbench. If you require extra power for use with large datasets, you can assign accelerators to your workbench to optimize performance.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project that has a workbench.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project whose workbench you want to update.

A project details page opens.
Click the Workbenches tab.
Click the action menu (⋮) beside the workbench that you want to update and then click Edit workbench.

The Edit workbench page opens.
Update any of the workbench properties and then click Update workbench.

Verification

The workbench that you updated is displayed on the Workbenches tab for the project.

Deleting a workbench from a data science project

You can delete workbenches from your data science projects to help you remove Jupyter notebooks that are no longer relevant to your work.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project with a workbench.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to delete the workbench from.

A project details page opens.
Click the Workbenches tab.
Click the action menu (⋮) beside the workbench that you want to delete and then click Delete workbench.

The Delete workbench dialog opens.
Enter the name of the workbench in the text field to confirm that you intend to delete it.
Click Delete workbench.

Verification

The workbench that you deleted is no longer displayed on the Workbenches tab for the project.
The custom resource (CR) associated with the workbench’s Jupyter notebook is deleted.

Using connections

Adding a connection to your data science project

You can enhance your data science project by adding a connection that contains the configuration parameters needed to connect to a data source or sink.

When you want to work with a very large data sets, you can store your data in an Open Container Initiative (OCI)-compliant registry, S3-compatible object storage bucket, or a URI-based repository, so that you do not fill up your local storage. You also have the option of associating the connection with an existing workbench that does not already have a connection.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project that you can add a connection to.
You have access to S3-compatible object storage, URI-based repository, or OCI-compliant registry.
If you intend to add the connection to an existing workbench, you have saved any data in the workbench to avoid losing work.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to add a connection to.

A project details page opens.
Click the Connections tab.
Click Add connection.
In the Add connection modal, select a Connection type. The OCI-compliant registry, S3 compatible object storage, and URI options are pre-installed connection types. Additional options might be available if your Open Data Hub administrator added them.

The Add connection form opens with fields specific to the connection type that you selected.
Enter a unique name for the connection.

A resource name is generated based on the name of the connection. A resource name is the label for the underlying resource in OpenShift.
Optional: Edit the default resource name. Note that you cannot change the resource name after you create the connection.
Optional: Provide a description of the connection.

Complete the form depending on the connection type that you selected. For example:

If you selected S3 compatible object storage as the connection type, configure the connection details:

In the Access key field, enter the access key ID for the S3-compatible object storage provider.
In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.

In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.

Note	Make sure to use the appropriate endpoint format. Improper formatting might cause connection errors or restrict access to storage resources. For more information about how to format object storage endpoints, see Overview of object storage endpoints.

In the Region field, enter the default region of your S3-compatible object storage account.
In the Bucket field, enter the name of your S3-compatible object storage bucket.
Click Create.

If you selected URI in the preceding step, in the URI field, enter the Uniform Resource Identifier (URI).
If you selected OCI-compliant registry in the preceding step, in the OCI storage location field, enter the URI.

Click Add connection.

Verification

The connection that you added is displayed on the Connections tab for the project.

Updating a connection

You can edit the configuration of an existing connection as described in this procedure.

Note	Any changes that you make to a connection are not applied to dependent resources (for example, a workbench) until those resources are restarted, redeployed, or otherwise regenerated.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project, created a workbench, and you have defined a connection.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that contains the connection that you want to change.

A project details page opens.
Click the Connections tab.
Click the action menu (⋮) beside the connection that you want to change and then click Edit.

The Edit connection form opens.
Make your changes.
Click Save.

Verification

The updated connection is displayed on the Connections tab for the project.

Deleting a connection

You can delete connections that are no longer relevant to your data science project.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project with a connection.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to delete the connection from.

A project details page opens.
Click the Connections tab.
Click the action menu (⋮) beside the connection that you want to delete and then click Delete connection.

The Delete connection dialog opens.
Enter the name of the connection in the text field to confirm that you intend to delete it.
Click Delete connection.

Verification

The connection that you deleted is no longer displayed on the Connections page for the project.

Configuring cluster storage

About persistent storage

Open Data Hub uses persistent storage to support workbenches, project data, and model training.

Persistent storage is provisioned through OpenShift Container Platform storage classes and persistent volumes. Volume provisioning and data access are determined by access modes.

Understanding storage classes and access modes can help you choose the right storage for your use case and avoid potential risks when sharing data across multiple workbenches.

Storage classes in Open Data Hub

Storage classes in Open Data Hub are available from the underlying OpenShift cluster. A storage class defines how persistent volumes are provisioned, including which storage backend is used and what access modes the provisioned volumes can support. For more information, see Dynamic provisioning in the OpenShift documentation.

Cluster administrators create and configure storage classes in the OpenShift cluster. These storage classes provision persistent volumes that support one or more access modes, depending on the capabilities of the storage backend. Open Data Hub administrators then enable specific storage classes and access modes for use in Open Data Hub.

When adding cluster storage to your project or workbench, you can choose from any enabled storage classes and access modes.

Access modes

Storage classes create persistent volumes that can support different access modes, depending on the storage backend. Access modes control how a volume can be mounted and used by one or more workbenches. If a storage class allows more than one access mode, you can select the one that best fits your needs when you request storage. All persistent volumes support ReadWriteOnce (RWO) by default.

Access mode Description

Access mode	Description
`ReadWriteOnce (RWO)` (Default)	The storage can be attached to a single workbench or pod at a time and is ideal for most individual workloads. `RWO` is always enabled by default and cannot be disabled by the administrator.
`ReadWriteMany (RWX)`	The storage can be attached to many workbenches simultaneously. `RWX` enables shared data access, but can introduce data risks.
`ReadOnlyMany (ROX)`	The storage can be attached to many workbenches as read-only. `ROX` is useful for sharing reference data without allowing changes.
`ReadWriteOncePod (RWOP)`	The storage can be attached to a single pod on a single node with read-write permissions. `RWOP` is similar to `RWO` but includes additional node-level restrictions.

ReadWriteOnce (RWO) (Default)

The storage can be attached to a single workbench or pod at a time and is ideal for most individual workloads. RWO is always enabled by default and cannot be disabled by the administrator.

ReadWriteMany (RWX)

The storage can be attached to many workbenches simultaneously. RWX enables shared data access, but can introduce data risks.

ReadOnlyMany (ROX)

The storage can be attached to many workbenches as read-only. ROX is useful for sharing reference data without allowing changes.

ReadWriteOncePod (RWOP)

The storage can be attached to a single pod on a single node with read-write permissions. RWOP is similar to RWO but includes additional node-level restrictions.

Note	You can enable access modes that are required. A warning is displayed if you request an access mode with unknown support, but you can continue to select Save to create the storage class with the selected access mode.

Using shared storage (RWX)

The ReadWriteMany (RWX) access mode allows multiple workbenches to access and write to the same storage volume at the same time. Use RWX access mode for collaborative work where multiple users need to access shared datasets or project files.

However, shared storage introduces several risks:

Data corruption or data loss: If multiple workbenches modify the same part of a file simultaneously, the data can become corrupted or lost. Ensure your applications or workflows are designed to safely handle shared access, for example, by using file locking or database transactions.
Security and privacy: If a workbench with access to shared storage is compromised, all data on that volume might be at risk. Only share sensitive data with trusted workbenches and users.

To use shared storage safely:

Ensure that your tools or workflows are designed to work with shared storage and can manage simultaneous writes. For example, use databases or distributed data processing frameworks.
Be cautious with changes. Deleting or editing files affects everyone who shares the volume.
Back up your data regularly, which can help prevent data loss due to mistakes or misconfigurations.
Limit access to RWX volumes to trusted users and secure workbenches.
Use ReadWriteMany (RWX) only when collaboration on a shared volume is required. For most individual tasks, ReadWriteOnce (RWO) is ideal because only one workbench can write to the volume at a time.

Adding cluster storage to your data science project

For data science projects that require data to be retained, you can add cluster storage to the project. Additionally, you can also connect cluster storage to a specific project’s workbench.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project that you can add cluster storage to.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to add the cluster storage to.

A project details page opens.
Click the Cluster storage tab.
Click Add cluster storage.

The Add cluster storage dialog opens.
In the Name field, enter a unique name for the cluster storage.
Optional: In the Description field, enter a description for the cluster storage.
From the Storage class list, select the type of cluster storage.

Note
You cannot change the storage class after you add the cluster storage to the project.
For storage classes that support multiple access modes, select an Access mode to define how the volume can be accessed. For more information, see About persistent storage.

Only the access modes that have been enabled for the storage class by your cluster and Open Data Hub administrators are visible.
In the Persistent storage size section, specify a new size in gibibytes or mebibytes.
Optional: If you want to connect the cluster storage to an existing workbench:
1. In the Workbench connections section, click Add workbench.
2. In the Name field, select an existing workbench from the list.
3. In the Path format field, select Standard if your storage directory begins with /opt/app-root/src, otherwise select Custom.
4. In the Mount path field, enter the path to a model or directory within a container where a volume is mounted and accessible. The path must consist of lowercase alphanumeric characters or -. Use / to indicate subdirectories.
Click Add storage.

Verification

The cluster storage that you added is displayed on the Cluster storage tab for the project.
A new persistent volume claim (PVC) is created with the storage size that you defined.
The persistent volume claim (PVC) is visible as an attached storage on the Workbenches tab for the project.

Updating cluster storage

If your data science work requires you to change the identifying information of a project’s cluster storage or the workbench that the storage is connected to, you can update your project’s cluster storage to change these properties.

Note

You cannot directly change the storage class for cluster storage that is already configured for a workbench or project. To switch to a different storage class, you need to migrate your data to a new cluster storage instance that uses the required storage class. For more information, see Changing the storage class for an existing cluster storage instance.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project that contains cluster storage.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project whose storage you want to update.

A project details page opens.
Click the Cluster storage tab.
Click the action menu (⋮) beside the storage that you want to update and then click Edit storage.

The Update cluster storage page opens.
Optional: Edit the Name field to change the display name for your storage.
Optional: Edit the Description field to change the description of your storage.
Optional: In the Persistent storage size section, specify a new size in gibibytes or mebibytes.

Note that you can only increase the storage size. Updating the storage size restarts the workbench and makes it unavailable for a period of time that is usually proportional to the size change.
Optional: If you want to connect the cluster storage to a different workbench:
1. In the Workbench connections section, click Add workbench.
2. In the Name field, select an existing workbench from the list.
3. In the Path format field, select Standard if your storage directory begins with /opt/app-root/src, otherwise select Custom.
4. In the Mount path field, enter the path to a model or directory within a container where a volume is mounted and accessible. The path must consist of lowercase alphanumeric characters or -. Use / to indicate subdirectories.
Click Update storage.

If you increased the storage size, the workbench restarts and is unavailable for a period of time that is usually proportional to the size change.

Verification

The storage that you updated is displayed on the Cluster storage tab for the project.

Changing the storage class for an existing cluster storage instance

When you create a workbench with cluster storage, the cluster storage is tied to a specific storage class. Later, if your data science work requires a different storage class, or if the current storage class has been deprecated, you cannot directly change the storage class on the existing cluster storage instance. Instead, you must migrate your data to a new cluster storage instance that uses the storage class that you want to use.

Prerequisites

You have logged in to Open Data Hub.
You have created a workbench or data science project that contains cluster storage.

Procedure

Stop the workbench with the storage class that you want to change.
1. From the Open Data Hub dashboard, click Data science projects.
  
  The Data science projects page opens.
2. Click the name of the project with the cluster storage instance that uses the storage class you want to change.
  
  The project details page opens.
3. Click the Workbenches tab.
4. In the Status column for the relevant workbench, click Stop.
  
  Wait until the Status column for the relevant workbench changes from Running to Stopped.
Add a new cluster storage instance that uses the needed storage class.
1. Click the Cluster storage tab.
2. Click Add cluster storage.
  
  The Add cluster storage dialog opens.
3. Enter a name for the cluster storage.
4. Optional: Enter a description for the cluster storage.
5. Select the needed storage class for the cluster storage.
6. For storage classes that support multiple access modes, select an Access mode to define how the volume can be accessed. For more information, see About persistent storage.
  
  Only the access modes that have been enabled for the storage class by your cluster and Open Data Hub administrators are visible.
7. Under Persistent storage size, enter a size in gibibytes or mebibytes.
8. In the Workbench connections section, click Add workbench.
9. In the Name field, select an existing workbench from the list.
10. In the Path format field, select Standard if your storage directory begins with /opt/app-root/src, otherwise select Custom.
11. In the Mount path field, enter the path to a model or directory within a container where a volume is mounted and accessible. For example, backup.
12. Click Add storage.
Copy the data from the existing cluster storage instance to the new cluster storage instance.
1. Click the Workbenches tab.
2. In the Status column for the relevant workbench, click Start.
3. When the workbench status is Running, click Open to open the workbench.
4. In JupyterLab, click File → New → Terminal.
5. Copy the data to the new storage directory. Replace <mount_folder_name> with the storage directory of your new cluster storage instance.
  rsync -avO --exclude='/opt/app-root/src/__<mount_folder_name>__' /opt/app-root/src/ /opt/app-root/src/__<mount_folder_name>__/
  For example:
  rsync -avO --exclude='/opt/app-root/src/backup' /opt/app-root/src/ /opt/app-root/src/backup/
6. After the data has finished copying, log out of JupyterLab.
Stop the workbench.
1. Click the Workbenches tab.
2. In the Status column for the relevant workbench, click Stop.
  
  Wait until the Status column for the relevant workbench changes from Running to Stopped.
Remove the original cluster storage instance from the workbench.
1. Click the Cluster storage tab.
2. Click the action menu (⋮) beside the existing cluster storage instance, and then click Edit storage.
3. Under Existing connected workbenches, remove the workbench.
4. Click Update.
Update the mount folder of the new cluster storage instance by removing it and re-adding it to the workbench.
1. On the Cluster storage tab, click the action menu (⋮) beside the new cluster storage instance, and then click Edit storage.
2. Under Existing connected workbenches, remove the workbench.
3. Click Update.
4. Click the Workbenches tab.
5. Click the action menu (⋮) beside the workbench and then click Edit workbench.
6. In the Cluster storage section, under Use existing persistent storage, select the new cluster storage instance.
7. Click Update workbench.
Restart the workbench.
1. Click the Workbenches tab.
2. In the Status column for the relevant workbench, click Start.
Optional: The initial cluster storage that uses the previous storage class is still visible on the Cluster storage tab. If you no longer need this cluster storage (for example, if the storage class is deprecated), you can delete it.
Optional: You can delete the mount folder of your new cluster storage instance (for example, the backup folder).

Verification

On the Cluster storage tab for the project, the new cluster storage instance is displayed with the needed storage class in the Storage class column and the relevant workbench in the Connected workbenches column.
On the Workbenches tab for the project, the new cluster storage instance is displayed for the workbench in the Cluster storage section and has the mount path: /opt/app-root/src.

Deleting cluster storage from a data science project

You can delete cluster storage from your data science projects to help you free up resources and delete unwanted storage space.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project with cluster storage.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to delete the storage from.

A project details page opens.
Click the Cluster storage tab.
Click the action menu (⋮) beside the storage that you want to delete and then click Delete storage.

The Delete storage dialog opens.
Enter the name of the storage in the text field to confirm that you intend to delete it.
Click Delete storage.

Verification

The storage that you deleted is no longer displayed on the Cluster storage tab for the project.
The persistent volume (PV) and persistent volume claim (PVC) associated with the cluster storage are both permanently deleted. This data is not recoverable.

Managing access to data science projects

Granting access to a data science project

To enable your organization to work collaboratively, you can grant access to your data science project to other users and groups.

Prerequisites

You have logged in to Open Data Hub.
You have created a data science project.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
From the list of data science projects, click the name of the data science project that you want to grant access to.

A project details page opens.
Click the Permissions tab.

The Permissions page for the project opens.
Provide one or more users with access to the project.
1. In the Users section, click Add user.
2. In the Name field, enter the user name of the user whom you want to provide access to the project.
3. From the Permissions list, select one of the following access permission levels:
  - Admin: Users with this access level can edit project details and manage access to the project.
  - Contributor: Users with this access level can view and edit project components, such as its workbenches, connections, and storage.
4. To confirm your entry, click Confirm ().
5. Optional: To add an additional user, click Add user and repeat the process.
Provide one or more OpenShift groups with access to the project.
1. In the Groups section, click Add group.
2. From the Name list, select a group to provide access to the project.
  
  Note
  
  If you do not have cluster-admin permissions, the Name list is not visible. Instead, an input field is displayed enabling you to configure group permissions.
3. From the Permissions list, select one of the following access permission levels:
  - Admin: Groups with this access level can edit project details and manage access to the project.
  - Contributor: Groups with this access level can view and edit project components, such as its workbenches, connections, and storage.
4. To confirm your entry, click Confirm ().
5. Optional: To add an additional group, click Add group and repeat the process.

Verification

Users to whom you provided access to the project can perform only the actions permitted by their access permission level.
The Users and Groups sections on the Permissions tab show the respective users and groups that you granted access to.

Updating access to a data science project

To change the level of collaboration on your data science project, you can update the access permissions of users and groups who have access to your project.

Prerequisites

You have logged in to Open Data Hub.
You have Open Data Hub administrator privileges or you are the project owner.
You have created a data science project.
You have previously shared access to your project with other users or groups.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to change the access permissions of.

A project details page opens.
Click the Permissions tab.

The Permissions page for the project opens.
Update the user access permissions to the project.
1. Click the action menu (⋮) beside the user whose access permissions you want to update and click Edit.
2. In the Name field, update the user name of the user whom you want to provide access to the project.
3. From the Permissions list, update the user access permissions by selecting one of the following:
  - Admin: Users with this access level can edit project details and manage access to the project.
  - Contributor: Users with this access level can view and edit project components, such as its workbenches, connections, and storage.
4. To confirm the update to the entry, click Confirm ().
Update the OpenShift groups access permissions to the project.
1. Click the action menu (⋮) beside the group whose access permissions you want to update and click Edit.
2. From the Name list, update the group that has access to the project by selecting another group from the list.
  
  Note
  
  If you do not have cluster-admin permissions, the Name list is not visible. Instead, you can configure group permissions in the input field that is displayed.
3. From the Permissions list, update the group access permissions by selecting one of the following:
  - Admin: Groups with this access level can edit project details and manage access to the project.
  - Contributor: Groups with this access level can view and edit project components, such as its workbenches, connections, and storage.
4. To confirm the update to the entry, click Confirm ().

Verification

The Users and Groups sections on the Permissions tab show the respective users and groups whose project access permissions you changed.

Removing access to a data science project

If you no longer want to work collaboratively on your data science project, you can restrict access to your project by removing users and groups that you previously provided access to your project.

Prerequisites

You have logged in to Open Data Hub.
You have Open Data Hub administrator privileges or you are the project owner.
You have created a data science project.
You have previously shared access to your project with other users or groups.

Procedure

From the Open Data Hub dashboard, click Data science projects.

The Data science projects page opens.
Click the name of the project that you want to change the access permissions of.

A project details page opens.
Click the Permissions tab.

The Permissions page for the project opens.
Click the action menu (⋮) beside the user or group whose access permissions you want to revoke and click Delete.

Verification

Users whose access you have revoked can no longer perform the actions that were permitted by their access permission level.

Creating project-scoped resources for your project

As an Open Data Hub user, you can access global resources in all Open Data Hub projects, but you can access project-scoped resources within the specified project only.

As a user with access permissions to a project, you can create the following types of project-scoped resources for your Open Data Hub project:

Workbench images
Model-serving runtimes for KServe

All resource names must be unique within a project.

Note	A user with access permissions to a project can create project-scoped resources for that project, as described in Creating project-scoped resources.

Prerequisites

You can access the OpenShift Container Platform console.
An Open Data Hub administrator has set the disableProjectScoped dashboard configuration option to false, as described in Customizing the dashboard.
You can access a project in the Open Data Hub console.
You have example YAML code for the type of resource that you want to create.

You can get the YAML code from a trusted source, such as an existing project-scoped resource, a Git repository, or documentation. Alternatively, you can contact your cluster administrator to request the relevant YAML code.

Procedure

Log in to the OpenShift Container Platform console.
From a trusted source, copy the YAML code that you want to use for your project resource.

For example, if you can access an existing project-scoped resource in one of your projects, you can copy the YAML code as follows:
1. In the Administrator perspective, click Home → Search.
2. From the Projects list, select the appropriate project.
3. In the Resources list, search for the relevant resource type, as follows:
  - For workbench images, search for ImageStream.
  - For serving runtimes, search for Template. From the resulting list, find the templates that have the objects.kind specification set to ServingRuntime.
4. Select a resource, and then click the YAML tab.
5. Copy the YAML content, and then click Cancel.
From the Project list, select your project name.
From the toolbar, click the + icon to open the Import YAML page.
Paste the example YAML content into the code area.
Edit the metadata.namespace value to specify the name of your project.
If necessary, edit the metadata.name value to ensure that the resource name is unique within the specified project.
Optional: Edit the resource name that is displayed in the Open Data Hub console, as follows:
- For workbench images, edit the metadata.annotations.opendatahub.io/notebook-image-name value.
- For serving runtimes, edit the objects.metadata.annotations.openshift.io/display-name value.
Click Create.

Verification

Log in to the Open Data Hub console.
Verify that the project-scoped resource is shown in the specified project:
- For workbench images, when you create a workbench in the project, as described in Creating a workbench, the workbench image that you added is available in the Image selection list.
- For model-serving runtimes, see Deploying models on the single-model serving platform.

QUICK LINKS

STAY IN TOUCH