

Working on data science projects

As a data scientist, you can organize your data science work into a single project. A data science project in Open Data Hub can consist of the following components:

Workbenches

Creating a workbench allows you to work with models in your preferred IDE, such as JupyterLab.

Cluster storage

For data science projects that require data retention, you can add cluster storage to the project.

Connections

Adding a connection to your project allows you to connect data sources and sinks to your workbenches.

Pipelines

Pipelines standardize and automate machine learning workflows, enabling you to further develop and deploy your data science models.

Models and model servers

Deploy a trained data science model to serve intelligent applications. Your model is deployed with an endpoint that allows applications to send requests to the model.
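For example, a deployed model's endpoint typically accepts JSON inference requests over HTTP. The following sketch builds a KServe v2-style payload; the payload structure, input tensor name, and endpoint URL shown are assumptions that depend on your serving runtime, so check your model server's API for the exact schema:

```python
import json

def build_inference_request(rows):
    """Build a KServe v2-style inference payload (structure is an
    assumption; check your model server's API for the exact schema)."""
    return {
        "inputs": [
            {
                "name": "input-0",  # hypothetical input tensor name
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                "data": rows,
            }
        ]
    }

payload = build_inference_request([[0.1, 0.2, 0.3]])
body = json.dumps(payload).encode("utf-8")
# POST `body` to an endpoint such as:
#   https://<model-endpoint>/v2/models/<model-name>/infer
```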

Bias metrics for models

Creating bias metrics allows you to monitor your machine learning models for bias.

Using data science projects

Creating a data science project

To implement a data science workflow, you must create a project. In OpenShift, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers and data scientists in your organization.
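Under the hood, each project is an OpenShift project, that is, a Kubernetes namespace. A sketch of the underlying manifest might look like the following; the `opendatahub.io/dashboard` label and the names shown are assumptions based on how the dashboard typically marks its projects:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: my-data-science-project        # illustrative resource name
  labels:
    opendatahub.io/dashboard: "true"   # assumed label used by the dashboard
  annotations:
    openshift.io/display-name: "My data science project"
    openshift.io/description: "Example project description"
```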

Within a project, you can add the following functionality:

  • Connections so that you can access data without having to hardcode information like endpoints or credentials.

  • Workbenches for working with and processing data, and for developing models.

  • Deployed models so that you can test them and then integrate them into intelligent applications. Deploying a model makes it available as a service that you can access by using an API.

  • Pipelines for automating your ML workflow.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have the appropriate roles and permissions to create projects.

Procedure
  1. From the Open Data Hub dashboard, select Data Science Projects.

    The Data Science Projects page shows a list of projects that you can access. For each user-requested project in the list, the Name column shows the project display name, the user who requested the project, and the project description.

  2. Click Create project.

  3. In the Create project dialog, enter a unique display name for your project in the Name field.

  4. Optional: If you want to change the default resource name for your project, click Edit resource name.

    The resource name is how your resource is labeled in OpenShift. Valid characters include lowercase letters, numbers, and hyphens (-). The resource name cannot exceed 30 characters, and it must start with a letter and end with a letter or number.

    Note: You cannot change the resource name after the project is created. You can edit only the display name and the description.
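As a quick illustration, these resource-name rules can be captured in a regular expression. The following is a hedged sketch, not the dashboard's actual validation code:

```python
import re

# Rules from the dialog: lowercase letters, numbers, and hyphens; at most
# 30 characters; must start with a letter and end with a letter or number.
RESOURCE_NAME_RE = re.compile(r"^[a-z][a-z0-9-]{0,28}[a-z0-9]$|^[a-z]$")

def is_valid_resource_name(name):
    """Return True if `name` satisfies the stated resource-name rules."""
    return bool(RESOURCE_NAME_RE.match(name))
```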

  5. Optional: In the Description field, provide a project description.

  6. Click Create.

Verification
  • A project details page opens. From this page, you can add connections, create workbenches, configure pipelines, and deploy models.

Updating a data science project

You can update the project details by changing the project name and description.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the action menu (⋮) beside the project whose details you want to update and click Edit project.

    The Edit project dialog opens.

  3. Optional: Edit the Name field to change the display name for your project.

  4. Optional: Edit the Description field to change the description of your project.

  5. Click Update.

Verification
  • You can see the updated project details on the Data Science Projects page.

Deleting a data science project

When you no longer want to use a data science project, you can delete it so that it no longer appears on the Open Data Hub Data Science Projects page.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users) in OpenShift.

  • You have created a data science project.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the action menu (⋮) beside the project that you want to delete and then click Delete project.

    The Delete project dialog opens.

  3. Enter the project name in the text field to confirm that you intend to delete it.

  4. Click Delete project.

Verification
  • The data science project that you deleted is no longer displayed on the Data Science Projects page.

  • Deleting a data science project deletes any associated workbenches, data science pipelines, cluster storage, and connections. This data is permanently deleted and is not recoverable.

Using project workbenches

Creating a workbench and selecting an IDE

A workbench is an isolated area where you can examine and work with ML models. You can also work with data and run programs, for example to prepare and clean data. While a workbench is not required if, for example, you only want to serve an existing model, one is needed for most data science workflow tasks, such as writing code to process data or training a model.

When you create a workbench, you specify an image (an IDE, packages, and other dependencies). IDEs include JupyterLab and code-server.

The IDEs are based on a server-client architecture. Each IDE provides a server that runs in a container on the OpenShift cluster, while the user interface (the client) is displayed in your web browser. For example, the Jupyter notebook server runs in a container on the OpenShift cluster, and the client is the JupyterLab interface that opens in your web browser on your local computer. All of the commands that you enter in JupyterLab are executed by the notebook server. Other IDEs, such as code-server and RStudio Server, follow the same pattern. This architecture allows you to work through a browser on your local computer while all processing occurs on the cluster. The cluster provides the benefits of larger available resources and of security, because the data being processed never leaves it.

In a workbench, you can also configure connections (to access external data for training models and to save models so that you can deploy them) and cluster storage (for persisting data). Workbenches within the same project can share models and data through object storage with the data science pipelines and model servers.

For data science projects that require data retention, you can add cluster storage to the workbench that you are creating.

Within a project, you can create multiple workbenches. When to create a new workbench depends on considerations, such as the following:

  • The workbench configuration (for example, CPU, RAM, or IDE). If you want to avoid editing an existing workbench’s configuration to accommodate a new task, you can create a new workbench instead.

  • Separation of tasks or activities. For example, you might want to use one workbench for your large language model (LLM) experimentation, another workbench dedicated to a demo, and another workbench for testing.

About workbench images

A workbench image (sometimes referred to as a notebook image) is optimized with the tools and libraries that you need for model development. You can use the provided workbench images or an Open Data Hub administrator can create custom workbench images adapted to your needs.

To provide a consistent, stable platform for your model development, many provided workbench images contain the same version of Python. Most workbench images available on Open Data Hub are pre-built and ready for you to use immediately after Open Data Hub is installed or upgraded.

The following table lists the workbench images that are installed with Open Data Hub by default.

If the preinstalled packages that are provided in these images are not sufficient for your use case, you have the following options:

  • Install additional libraries after launching a default image. This option is good if you want to add libraries on an ad hoc basis as you develop models. However, it can be challenging to manage the dependencies of installed libraries and your changes are not saved when the workbench restarts.

  • Create a custom image that includes the additional libraries or packages. For more information, see Creating custom workbench images.

Table 1. Default workbench images

CUDA

If you are working with compute-intensive data science models that require GPU support, use the Compute Unified Device Architecture (CUDA) workbench image to gain access to the NVIDIA CUDA Toolkit. Using this toolkit, you can optimize your work by using GPU-accelerated libraries and optimization tools.

Standard Data Science

Use the Standard Data Science workbench image for models that do not require TensorFlow or PyTorch. This image contains commonly-used libraries to assist you in developing your machine learning models.

TensorFlow

TensorFlow is an open source platform for machine learning. With TensorFlow, you can build, train, and deploy your machine learning models. TensorFlow contains advanced data visualization features, such as computational graph visualizations. It also allows you to easily monitor and track the progress of your models.

PyTorch

PyTorch is an open source machine learning library optimized for deep learning. If you are working with computer vision or natural language processing models, use the PyTorch workbench image.

Minimal Python

If you do not require advanced machine learning features, or additional resources for compute-intensive data science work, you can use the Minimal Python image to develop your models.

TrustyAI

Use the TrustyAI workbench image to extend your data science work with model explainability, tracing and accountability, and runtime monitoring. See the TrustyAI Explainability repository for some example Jupyter notebooks.

code-server

With the code-server workbench image, you can customize your workbench environment by using a variety of extensions to add new languages, themes, and debuggers, and to connect to additional services. Enhance the efficiency of your data science work with syntax highlighting, auto-indentation, and bracket matching, as well as an automatic task runner for seamless automation. For more information, see code-server in GitHub.

NOTE: Elyra-based pipelines are not available with the code-server workbench image.

RStudio Server

Use the RStudio Server workbench image to access the RStudio IDE, an integrated development environment for R, a programming language for statistical computing and graphics. For more information, see the RStudio Server site.

CUDA - RStudio Server

Use the CUDA - RStudio Server workbench image to access the RStudio IDE and NVIDIA CUDA Toolkit. RStudio is an integrated development environment for R, a programming language for statistical computing and graphics. With the NVIDIA CUDA toolkit, you can optimize your work using GPU-accelerated libraries and optimization tools. For more information, see the RStudio Server site.

Creating a workbench

When you create a workbench, you specify an image (an IDE, packages, and other dependencies). You can also configure connections and cluster storage.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you use Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You created a project.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to add the workbench to.

    A project details page opens.

  3. Click the Workbenches tab.

  4. Click Create workbench.

    The Create workbench page opens.

  5. In the Name field, enter a unique name for your workbench.

  6. Optional: If you want to change the default resource name for your workbench, click Edit resource name.

    The resource name is how your resource is labeled in OpenShift. Valid characters include lowercase letters, numbers, and hyphens (-). The resource name cannot exceed 30 characters, and it must start with a letter and end with a letter or number.

    Note: You cannot change the resource name after the workbench is created. You can edit only the display name and the description.

  7. Optional: In the Description field, enter a description for your workbench.

  8. In the Notebook image section, complete the fields to specify the workbench image to use with your workbench.

    From the Image selection list, select a workbench image that suits your use case. A workbench image includes an IDE and Python packages (reusable code). Optionally, click View package information to view a list of packages that are included in the image that you selected.

    If the workbench image has multiple versions available, select the workbench image version to use from the Version selection list. To use the latest package versions, Red Hat recommends that you use the most recently added image.

    Note
    You can change the workbench image after you create the workbench.
  9. In the Deployment size section, from the Container size list, select a container size for your server. The container size specifies the number of CPUs and the amount of memory allocated to the container, setting the guaranteed minimum (request) and maximum (limit) for both.
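Conceptually, a container size maps to Kubernetes resource requests and limits on the workbench container, along these lines (the values shown are illustrative, not the actual sizes that the dashboard offers):

```yaml
resources:
  requests:       # guaranteed minimum
    cpu: "1"
    memory: 8Gi
  limits:         # maximum
    cpu: "2"
    memory: 8Gi
```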

  10. Optional: In the Environment variables section, select and specify values for any environment variables.

    Setting environment variables during the workbench configuration helps you save time later because you do not need to define them in the body of your notebooks, or with the IDE command line interface.

    If you are using S3-compatible storage, add these recommended environment variables:

    • AWS_ACCESS_KEY_ID specifies your Access Key ID for Amazon Web Services.

    • AWS_SECRET_ACCESS_KEY specifies your Secret access key for the account specified in AWS_ACCESS_KEY_ID.

    Open Data Hub stores the credentials as Kubernetes secrets in a protected namespace if you select Secret when you add the variable.
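Inside the workbench, your notebook code can then read these variables instead of hardcoding credentials. A minimal sketch:

```python
import os

def s3_credentials():
    """Read the S3 credentials injected into the workbench environment.
    Returns None for any variable that is not set."""
    return {
        "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
    }

# For example, you might pass these to an S3 client library:
#   boto3.client("s3", endpoint_url="https://...", **s3_credentials())
```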

  11. In the Cluster storage section, configure the storage for your workbench. Select one of the following options:

    • Create new persistent storage to create storage that is retained after you shut down your workbench. Complete the relevant fields to define the storage:

      1. Enter a name for the cluster storage.

      2. Enter a description for the cluster storage.

      3. Select a storage class for the cluster storage.

        Note
        You cannot change the storage class after you add the cluster storage to the workbench.
      4. Under Persistent storage size, enter a size in gibibytes or mebibytes.

    • Use existing persistent storage to reuse existing storage and select the storage from the Persistent storage list.

  12. Optional: You can add a connection to your workbench. A connection is a resource that contains the configuration parameters needed to connect to a data source or sink, such as an object storage bucket. You can use storage buckets for storing data, models, and pipeline artifacts. You can also use a connection to specify the location of a model that you want to deploy.

    In the Connections section, use an existing connection or create a new connection:

    • Use an existing connection as follows:

      1. Click Attach existing connections.

      2. From the Connection list, select a connection that you previously defined.

    • Create a new connection as follows:

      1. Click Create connection. The Add connection dialog appears.

      2. From the Connection type drop-down list, select the type of connection. The Connection details section appears.

      3. If you selected S3 compatible object storage in the preceding step, configure the connection details:

        1. In the Connection name field, enter a unique name for the connection.

        2. Optional: In the Description field, enter a description for the connection.

        3. In the Access key field, enter the access key ID for the S3-compatible object storage provider.

        4. In the Secret key field, enter the secret access key for the S3-compatible object storage account that you specified.

        5. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.

        6. In the Region field, enter the default region of your S3-compatible object storage account.

        7. In the Bucket field, enter the name of your S3-compatible object storage bucket.

        8. Click Create.

      4. If you selected URI in the preceding step, configure the connection details:

        1. In the Connection name field, enter a unique name for the connection.

        2. Optional: In the Description field, enter a description for the connection.

        3. In the URI field, enter the Uniform Resource Identifier (URI).

        4. Click Create.

  13. Click Create workbench.

Verification
  • The workbench that you created appears on the Workbenches tab for the project.

  • Any cluster storage that you associated with the workbench during the creation process appears on the Cluster storage tab for the project.

  • The Status column on the Workbenches tab displays a status of Starting when the workbench server is starting, and Running when the workbench has successfully started.

  • Optional: Click the Open link to open the IDE in a new window.

Starting a workbench

You can manually start a data science project’s workbench from the Workbenches tab on the project details page. By default, workbenches start immediately after you create them.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project that contains a workbench.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose workbench you want to start.

    A project details page opens.

  3. Click the Workbenches tab.

  4. In the Status column for the workbench that you want to start, click Start.

    The Status column changes from Stopped to Starting when the workbench server is starting, and then to Running when the workbench has successfully started.

  5. Optional: Click the Open link to open the IDE in a new window.

Verification
  • The workbench that you started appears on the Workbenches tab for the project, with the status of Running.

Updating a project workbench

If your data science work requires you to change your workbench’s notebook image, container size, or identifying information, you can update the properties of your project’s workbench. If you require extra power for use with large datasets, you can assign accelerators to your workbench to optimize performance.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you use Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project that has a workbench.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose workbench you want to update.

    A project details page opens.

  3. Click the Workbenches tab.

  4. Click the action menu (⋮) beside the workbench that you want to update and then click Edit workbench.

    The Edit workbench page opens.

  5. Update any of the workbench properties and then click Update workbench.

Verification
  • The workbench that you updated appears on the Workbenches tab for the project.

Deleting a workbench from a data science project

You can delete workbenches from your data science projects to help you remove Jupyter notebooks that are no longer relevant to your work.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project with a workbench.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to delete the workbench from.

    A project details page opens.

  3. Click the Workbenches tab.

  4. Click the action menu (⋮) beside the workbench that you want to delete and then click Delete workbench.

    The Delete workbench dialog opens.

  5. Enter the name of the workbench in the text field to confirm that you intend to delete it.

  6. Click Delete workbench.

Verification
  • The workbench that you deleted is no longer displayed on the Workbenches tab for the project.

  • The custom resource (CR) associated with the workbench’s Jupyter notebook is deleted.

Using connections

Adding a connection to your data science project

You can enhance your data science project by adding a connection to a data source. When you want to work with very large data sets, you can store your data in an S3-compatible object storage bucket so that you do not fill up your local storage. You also have the option of associating the connection with an existing workbench that does not already have a connection.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project that you can add a connection to.

  • You have access to S3-compatible object storage.

  • If you intend to add the connection to an existing workbench, you have saved any data in the workbench to avoid losing work.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to add a connection to.

    A project details page opens.

  3. Click the Connections tab.

  4. Click Add connection.

    The Add connection dialog opens.

  5. In the Name field, enter a unique name for the connection.

  6. In the Access key field, enter the access key ID for your S3-compatible object storage provider.

  7. In the Secret key field, enter the secret access key for the S3-compatible object storage account you specified.

  8. In the Endpoint field, enter the endpoint of your S3-compatible object storage bucket.

    Note
    Use the appropriate endpoint format for your object storage type. Improper formatting might cause connection errors or restrict access to storage resources. For more information about how to format object storage endpoints, see Overview of object storage endpoints.
  9. In the Region field, enter the default region of your S3-compatible object storage account.

  10. In the Bucket field, enter the name of your S3-compatible object storage bucket.

  11. Optional: From the Connected workbench list, select a workbench to connect.

  12. Click Add connection.

Verification
  • The connection that you added appears on the Connections tab for the project.

  • If you selected a workbench, the workbench is visible in the Connected workbenches column on the Connections tab for the project.
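Because a badly formatted endpoint is a common cause of connection errors, it can help to sanity-check the URL before you paste it into the dialog. The following is an illustrative sketch; your storage provider's exact requirements may differ:

```python
from urllib.parse import urlparse

def check_s3_endpoint(endpoint):
    """Basic structural check for an S3-compatible endpoint URL.
    Illustrative only; it does not verify that the endpoint is reachable."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        return False, "endpoint must start with http:// or https://"
    if not parsed.netloc:
        return False, "endpoint must include a host name"
    return True, "ok"
```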

Deleting a connection

You can delete connections that are no longer relevant to your data science project.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project with a connection.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to delete the connection from.

    A project details page opens.

  3. Click the Connections tab.

  4. Click the action menu (⋮) beside the connection that you want to delete and then click Delete connection.

    The Delete connection dialog opens.

  5. Enter the name of the connection in the text field to confirm that you intend to delete it.

  6. Click Delete connection.

Verification
  • The connection that you deleted is no longer displayed on the Connections tab for the project.

Updating a connected data source

To use an existing data source with a different workbench, you can change the data source that is connected to your project’s workbench.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project, created a workbench, and you have defined a connection.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose data source you want to change.

    A project details page opens.

  3. Click the Connections tab.

  4. Click the action menu (⋮) beside the data source that you want to change and then click Edit connection.

    The Edit connection dialog opens.

  5. In the Connected workbench section, select an existing workbench from the list.

  6. Click Update connection.

Verification
  • The updated connection is displayed on the Connections tab for the project.

  • You can access your S3 data source using environment variables in the connected workbench.

Configuring cluster storage

Adding cluster storage to your data science project

For data science projects that require data to be retained, you can add cluster storage to the project. You can also connect cluster storage to a specific workbench in the project.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project that you can add cluster storage to.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to add the cluster storage to.

    A project details page opens.

  3. Click the Cluster storage tab.

  4. Click Add cluster storage.

    The Add cluster storage dialog opens.

  5. In the Name field, enter a unique name for the cluster storage.

  6. Optional: In the Description field, enter a description for the cluster storage.

  7. Optional: From the Storage class list, select the type of cluster storage.

    Note
    You cannot change the storage class after you add the cluster storage to the project.
  8. In the Persistent storage size section, specify a new size in gibibytes or mebibytes.

  9. Optional: If you want to connect the cluster storage to an existing workbench:

    1. From the Connected workbench list, select a workbench.

    2. In the Mount folder name field, enter the name of the storage directory.

  10. Click Add storage.

Verification
  • The cluster storage that you added appears on the Cluster storage tab for the project.

  • A new persistent volume claim (PVC) is created with the storage size that you defined.

  • The persistent volume claim (PVC) is visible as an attached storage on the Workbenches tab for the project.
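Behind the scenes, the dashboard creates a standard Kubernetes PersistentVolumeClaim in the project namespace. A sketch of such a claim follows; the names, size, and storage class shown are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-cluster-storage             # illustrative name
  namespace: my-data-science-project   # your project's namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                    # the size you specified in the dialog
  storageClassName: gp3-csi            # the storage class you selected (example)
```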

Updating cluster storage

If your data science work requires you to change the identifying information of a project’s cluster storage or the workbench that the storage is connected to, you can update your project’s cluster storage to change these properties.

Note
You cannot directly change the storage class for cluster storage that is already configured for a workbench or project. To switch to a different storage class, you need to migrate your data to a new cluster storage instance that uses the required storage class. For more information, see Changing the storage class for an existing cluster storage instance.
Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project that contains cluster storage.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project whose storage you want to update.

    A project details page opens.

  3. Click the Cluster storage tab.

  4. Click the action menu (⋮) beside the storage that you want to update and then click Edit storage.

    The Update cluster storage page opens.

  5. Optional: Edit the Name field to change the display name for your storage.

  6. Optional: Edit the Description field to change the description of your storage.

  7. Optional: In the Persistent storage size section, specify a new size in gibibytes or mebibytes.

    Note that you can only increase the storage size. Updating the storage size restarts the workbench and makes it unavailable for a period of time that is usually proportional to the size change.

  8. Optional: If you want to connect the cluster storage to a different workbench:

    1. From the Connected workbench list, select the workbench.

    2. In the Mount folder name field, enter the name of the storage directory.

  9. Click Update.

If you increased the storage size, the workbench restarts and is unavailable for a period of time that is usually proportional to the size change.

Verification
  • The storage that you updated appears on the Cluster storage tab for the project.

Changing the storage class for an existing cluster storage instance

When you create a workbench with cluster storage, the cluster storage is tied to a specific storage class. Later, if your data science work requires a different storage class, or if the current storage class has been deprecated, you cannot directly change the storage class on the existing cluster storage instance. Instead, you must migrate your data to a new cluster storage instance that uses the storage class that you want to use.

Prerequisites
  • You have logged in to Open Data Hub.

  • You have created a workbench or data science project that contains cluster storage.

Procedure
  1. Stop the workbench with the storage class that you want to change.

    1. From the Open Data Hub dashboard, click Data Science Projects.

      The Data Science Projects page opens.

    2. Click the name of the project with the cluster storage instance that uses the storage class you want to change.

      The project details page opens.

    3. Click the Workbenches tab.

    4. In the Status column for the relevant workbench, click Stop.

      Wait until the Status column for the relevant workbench changes from Running to Stopped.

  2. Add a new cluster storage instance that uses the needed storage class.

    1. Click the Cluster storage tab.

    2. Click Add cluster storage.

      The Add cluster storage dialog opens.

    3. Enter a name for the cluster storage.

    4. Optional: Enter a description for the cluster storage.

    5. Select the needed storage class for the cluster storage.

    6. Under Persistent storage size, enter a size in gibibytes or mebibytes.

    7. Under Connected workbench, select the workbench with the storage class that you want to change.

    8. Under Mount folder name, enter a new storage directory for the cluster storage to mount to. For example, backup.

    9. Click Add storage.

  3. Copy the data from the existing cluster storage instance to the new cluster storage instance.

    1. Click the Workbenches tab.

    2. In the Status column for the relevant workbench, click Start.

    3. When the workbench status is Running, click Open to open the workbench.

    4. In JupyterLab, click File → New → Terminal.

    5. Copy the data to the new storage directory. Replace <mount_folder_name> with the storage directory of your new cluster storage instance. The --exclude option prevents the new storage directory from being copied into itself; because rsync patterns that start with a slash are anchored to the root of the transfer, the pattern contains only the directory name, not the full path.

      rsync -avO --exclude='/<mount_folder_name>' /opt/app-root/src/ /opt/app-root/src/<mount_folder_name>/

      For example:

      rsync -avO --exclude='/backup' /opt/app-root/src/ /opt/app-root/src/backup/
    6. After the data has finished copying, log out of JupyterLab.

  4. Stop the workbench.

    1. Click the Workbenches tab.

    2. In the Status column for the relevant workbench, click Stop.

      Wait until the Status column for the relevant workbench changes from Running to Stopped.

  5. Remove the original cluster storage instance from the workbench.

    1. Click the Cluster storage tab.

    2. Click the action menu (⋮) beside the existing cluster storage instance, and then click Edit storage.

    3. Under Existing connected workbenches, remove the workbench.

    4. Click Update.

  6. Update the mount folder of the new cluster storage instance by removing it and re-adding it to the workbench.

    1. On the Cluster storage tab, click the action menu (⋮) beside the new cluster storage instance, and then click Edit storage.

    2. Under Existing connected workbenches, remove the workbench.

    3. Click Update.

    4. Click the Workbenches tab.

    5. Click the action menu (⋮) beside the workbench and then click Edit workbench.

    6. In the Cluster storage section, under Use existing persistent storage, select the new cluster storage instance.

    7. Click Update workbench.

  7. Restart the workbench.

    1. Click the Workbenches tab.

    2. In the Status column for the relevant workbench, click Start.

  8. Optional: The original cluster storage instance that uses the previous storage class still appears on the Cluster storage tab. If you no longer need this cluster storage (for example, if the storage class is deprecated), you can delete it.

  9. Optional: You can delete the mount folder of your new cluster storage instance (for example, the backup folder).

Verification
  • On the Cluster storage tab for the project, the new cluster storage instance appears with the needed storage class in the Storage class column and the relevant workbench in the Connected workbenches column.

  • On the Workbenches tab for the project, the new cluster storage instance appears for the workbench in the Cluster storage section and has the mount path: /opt/app-root/src.

Deleting cluster storage from a data science project

You can delete cluster storage from your data science projects to free up resources and remove unwanted storage space.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project with cluster storage.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to delete the storage from.

    A project details page opens.

  3. Click the Cluster storage tab.

  4. Click the action menu (⋮) beside the storage that you want to delete and then click Delete storage.

    The Delete storage dialog opens.

  5. Enter the name of the storage in the text field to confirm that you intend to delete it.

  6. Click Delete storage.

Verification
  • The storage that you deleted is no longer displayed on the Cluster storage tab for the project.

  • The persistent volume (PV) and persistent volume claim (PVC) associated with the cluster storage are both permanently deleted. This data is not recoverable.

Managing access to data science projects

Configuring access to a data science project

To enable you to work collaboratively on your data science projects with other users, you can share access to your project. After creating your project, you can then set the appropriate access permissions from the Open Data Hub user interface.

You can assign the following access permission levels to your data science projects:

  • Admin - Users can modify all areas of a project, including its details (project name and description), components, and access permissions.

  • Contributor - Users can modify a project’s components, such as its workbench, but they cannot edit a project’s access permissions or its details (project name and description).
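Under the hood, these permission levels are typically implemented as OpenShift role bindings in the project namespace: Admin corresponds to the `admin` cluster role and Contributor to the `edit` cluster role. The following sketch shows what a Contributor grant might look like; the metadata names and user name are hypothetical, and the exact labels the dashboard applies may differ:

```yaml
# Illustrative RoleBinding for a Contributor grant (hypothetical names).
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dashboard-permissions-jdoe
  namespace: my-data-science-project
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit            # Contributor; an Admin grant would reference "admin"
subjects:
  - kind: User
    apiGroup: rbac.authorization.k8s.io
    name: jdoe
```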

Sharing access to a data science project

To enable your organization to work collaboratively, you can share access to your data science project with other users and groups.

Prerequisites
  • You have logged in to Open Data Hub.

  • If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • You have created a data science project.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. From the list of data science projects, click the name of the data science project that you want to share access to.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Provide one or more users with access to the project.

    1. In the Users section, click Add user.

    2. In the Name field, enter the user name of the user that you want to provide with access to the project.

    3. From the Permissions list, select one of the following access permission levels:

      • Admin: Users with this access level can edit project details and manage access to the project.

      • Contributor: Users with this access level can view and edit project components, such as its workbenches, connections, and storage.

    4. To confirm your entry, click Confirm (✓).

    5. Optional: To add an additional user, click Add user and repeat the process.

  5. Provide one or more OpenShift groups with access to the project.

    1. In the Groups section, click Add group.

    2. From the Name list, select a group to provide access to the project.

      Note

      If you do not have cluster-admin permissions, the Name list is not visible. Instead, an input field is displayed enabling you to configure group permissions.

    3. From the Permissions list, select one of the following access permission levels:

      • Admin: Groups with this access permission level can edit project details and manage access to the project.

      • Edit: Groups with this access permission level can view and edit project components, such as its workbenches, connections, and storage.

    4. To confirm your entry, click Confirm (✓).

    5. Optional: To add an additional group, click Add group and repeat the process.

Verification
  • Users to whom you provided access to the project can perform only the actions permitted by their access permission level.

  • The Users and Groups sections on the Permissions tab show the respective users and groups that you provided with access to the project.

Updating access to a data science project

To change the level of collaboration on your data science project, you can update the access permissions of users and groups who have access to your project.

Prerequisites
  • You have logged in to Open Data Hub.

  • You have Open Data Hub administrator privileges or you are the project owner.

  • You have created a data science project.

  • You have previously shared access to your project with other users or groups.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to change the access permissions of.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Update the user access permissions to the project.

    1. Click the action menu (⋮) beside the user whose access permissions you want to update and click Edit.

    2. In the Name field, update the user name of the user that you want to provide with access to the project.

    3. From the Permissions list, update the user access permissions by selecting one of the following:

      • Admin: Users with this access level can edit project details and manage access to the project.

      • Contributor: Users with this access level can view and edit project components, such as its workbenches, connections, and storage.

    4. To confirm the update to the entry, click Confirm (✓).

  5. Update the OpenShift groups access permissions to the project.

    1. Click the action menu (⋮) beside the group whose access permissions you want to update and click Edit.

    2. From the Name list, update the group that has access to the project by selecting another group from the list.

      Note

      If you do not have cluster-admin permissions, the Name list is not visible. Instead, you can configure group permissions in the input field that appears.

    3. From the Permissions list, update the group access permissions by selecting one of the following:

      • Admin: Groups with this access permission level can edit project details and manage access to the project.

      • Edit: Groups with this access permission level can view and edit project components, such as its workbenches, connections, and storage.

    4. To confirm the update to the entry, click Confirm (✓).

Verification
  • The Users and Groups sections on the Permissions tab show the respective users and groups whose project access permissions you changed.

Removing access to a data science project

If you no longer want to work collaboratively on your data science project, you can restrict access to your project by removing the users and groups to which you previously provided access.

Prerequisites
  • You have logged in to Open Data Hub.

  • You have Open Data Hub administrator privileges or you are the project owner.

  • You have created a data science project.

  • You have previously shared access to your project with other users or groups.

Procedure
  1. From the Open Data Hub dashboard, click Data Science Projects.

    The Data Science Projects page opens.

  2. Click the name of the project that you want to change the access permissions of.

    A project details page opens.

  3. Click the Permissions tab.

    The Permissions page for the project opens.

  4. Click the action menu (⋮) beside the user or group whose access permissions you want to revoke and click Delete.

Verification
  • Users whose access you have revoked can no longer perform the actions that were permitted by their access permission level.