Quick Installation


This tutorial is based on an old version of Open Data Hub that is no longer developed. A newer version of this document is available.

The steps are also covered in a tutorial video on the OpenShift YouTube channel.


Installing ODH requires OpenShift 3.11 or 4.x; refer to the OpenShift documentation for details on the platform itself. All screenshots and instructions are from OpenShift 4.2. For this quick start, we used a cluster provisioned through try.openshift.com on AWS. The tutorials have also been tested on CodeReady Containers (CRC) with 16 GB of RAM.

We will not be installing optional components such as Argo, Seldon, AI Library, and Kafka. These components have additional prerequisites, detailed in the advanced installation. If you intend to install any of them, their prerequisites must be installed before the Open Data Hub Operator.

Installing the Open Data Hub Operator

The Open Data Hub operator is available in the OpenShift 4.x Community Operators section. You can install it from the OpenShift web console by following the steps below:

  1. Log in to the OpenShift console as a user with cluster-admin privileges. For a developer installation from try.openshift.com, including AWS and CRC, the kubeadmin user will work.
  2. Create a new namespace for your installation of Open Data Hub.
  3. Find Open Data Hub in the OperatorHub catalog.
    1. Select the new namespace if not already selected.
    2. Under Operators, select OperatorHub for a list of community operators.
    3. Filter for Open Data Hub or look under Big Data for the icon for Open Data Hub.
  4. Click the Install button and follow the installation instructions to install the Open Data Hub operator.
  5. To view the status of the Open Data Hub operator installation, find the Open Data Hub Operator under Operators -> Installed Operators (inside the namespace you created earlier). Once the STATUS field displays InstallSucceeded, you can proceed to create a new Open Data Hub deployment.
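
For readers who prefer the command line, the console steps above correspond roughly to three manifests: a namespace, an OperatorGroup, and a Subscription. The following is a minimal sketch and is not taken from the original tutorial. The namespace name `odh`, the OperatorGroup name, the channel, and the package and catalog names are all assumptions; check the values shown on the operator's OperatorHub page in your own cluster before applying anything.

```yaml
# Sketch only: every value marked ASSUMPTION is illustrative, not from the tutorial.
apiVersion: v1
kind: Namespace
metadata:
  name: odh                        # ASSUMPTION: hypothetical namespace name
---
# An OperatorGroup scopes the operator to the namespace created above.
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: odh-operator-group         # ASSUMPTION: hypothetical name
  namespace: odh
spec:
  targetNamespaces:
    - odh
---
# A Subscription asks the Operator Lifecycle Manager to install the operator
# from a catalog source and keep it updated on the chosen channel.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: opendatahub-operator
  namespace: odh
spec:
  name: opendatahub-operator       # ASSUMPTION: package name as listed in the catalog
  channel: beta                    # ASSUMPTION: use the channel OperatorHub shows
  source: community-operators      # ASSUMPTION: the community catalog source
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
```

Saved to a file, manifests like these could be applied with `oc apply -f <file>`, after which the operator should appear under Installed Operators as described in step 5.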

Create a New Open Data Hub Deployment

The Open Data Hub operator creates new Open Data Hub deployments and manages their components. Let’s create a new Open Data Hub deployment.

  1. Find the Open Data Hub Operator under Installed Operators (inside the namespace you created earlier).

  2. Click on the Open Data Hub Operator to bring up its details.

  3. Click Create Instance to create a new deployment.

  4. Here you’ll be presented with a YAML file to customize your deployment. Most components are disabled; for this tutorial we’ll leave them that way and adjust only a few parameters so that JupyterHub and Spark fit within our cluster’s resource constraints. Take note of some parameters:
    • the name of your deployment, example-opendatahub:
       name: example-opendatahub
    • the deployed components designated by odh_deploy:
        odh_deploy: true
        # Set the Jupyter notebook pod to 1 CPU and 1Gi of memory
        notebook_cpu: 1
        notebook_memory: 1Gi
          image: "quay.io/opendatahub/spark-cluster-image:spark22python36"
            instances: 1
            # Reduce the master node to 1 CPU and 1Gi of memory
                memory: 1Gi
                cpu: 1
                memory: 512Mi
                cpu: 500m
            # Disable creation of the spark worker node in the cluster
            instances: 0
                memory: 1Gi
                cpu: 1
                memory: 256Mi
                cpu: 200m
        odh_deploy: true
       # Reduce the memory requirements
      odh_deploy: false
  5. Leave the rest of the YAML intact and click Create. If you accepted the default name, this will trigger an Open Data Hub deployment named example-opendatahub with JupyterHub and Spark.

  6. Verify the installation by viewing the Open Data Hub tab within the operator details. You should see example-opendatahub listed.

  7. Verify the installation by viewing the project workload. JupyterHub, Spark, and Prometheus should all be running.
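
The parameters highlighted in step 4 can be pictured as part of a single custom resource. The sketch below is a reconstruction under assumptions: the values come from the listing in step 4, but the `apiVersion`, the `kind`, and every key marked ASSUMPTION are inferred from the listing's indentation and from common Kubernetes conventions, and are not confirmed by this tutorial. Treat the YAML shown in your own console as authoritative.

```yaml
# Sketch only: keys marked ASSUMPTION were not preserved in the tutorial text.
apiVersion: opendatahub.io/v1alpha1   # ASSUMPTION
kind: OpenDataHub                     # ASSUMPTION
metadata:
  name: example-opendatahub
spec:
  aicoe-jupyterhub:                   # ASSUMPTION: key for the JupyterHub component
    odh_deploy: true
    notebook_cpu: 1
    notebook_memory: 1Gi
    spark:                            # ASSUMPTION
      image: "quay.io/opendatahub/spark-cluster-image:spark22python36"
      master:                         # ASSUMPTION
        instances: 1
        resources:                    # ASSUMPTION
          limits:                     # ASSUMPTION: the larger pair is the limit
            memory: 1Gi
            cpu: 1
          requests:                   # ASSUMPTION: the smaller pair is the request
            memory: 512Mi
            cpu: 500m
      worker:                         # ASSUMPTION
        instances: 0                  # no Spark worker node is created
        resources:                    # ASSUMPTION
          limits:                     # ASSUMPTION
            memory: 1Gi
            cpu: 1
          requests:                   # ASSUMPTION
            memory: 256Mi
            cpu: 200m
```

The two remaining `odh_deploy` toggles in the listing belong to other components whose keys are not shown here. Reading the larger memory and CPU pair as limits and the smaller pair as requests follows the usual Kubernetes convention that a request may not exceed its limit.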