Monitoring data science models

The TrustyAI service provides data scientists with value-added capabilities, such as explainability (enriching model execution information through XAI algorithms) and model bias detection.

Note

Monitoring models for bias (a TrustyAI feature) is supported only on a multi-model serving platform that is based on the ModelMesh component. It is not supported on a single-model serving platform that is based on the KServe component.

Enabling the TrustyAI Service for a data science project

When you install Open Data Hub, the TrustyAI operator is included with the other components in the Open Data Hub namespace. The TrustyAI operator is responsible for deploying and managing TrustyAI services, as well as all other resources that TrustyAI requires.

These management tasks include configuring storage, creating the service monitors, and configuring the serving runtimes and routes.

The TrustyAI operator manages all enabled TrustyAI services across any number of projects on your cluster.
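
To confirm that the operator is running, you can check for its pod from the command line. This is a minimal sketch; it assumes the default opendatahub namespace and the standard operator pod naming, both of which can differ in your installation:

# Look for the TrustyAI operator pod in the Open Data Hub namespace (names are assumptions)
oc get pods -n opendatahub | grep trustyai-service-operator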

For each data science project (namespace) that contains models for which the data scientists on your team want to monitor bias metrics, enable an instance of the TrustyAI service. You can enable an instance of the TrustyAI service by using the Open Data Hub dashboard or by using the OpenShift command-line interface (CLI).

Note

You should enable only one instance of the TrustyAI service in a project. The presence of multiple instances in the same project can result in unexpected behavior.
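
To check whether a project already contains a TrustyAI service instance before you enable one, you can list the TrustyAIService custom resources in that project. A minimal sketch, assuming your project is named <project-name>:

# List TrustyAIService instances in the project; expect at most one
oc get trustyaiservice -n <project-name>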

Enabling the TrustyAI Service by using the dashboard

Enable an instance of the TrustyAI service on each data science project that contains models that the data scientist wants to monitor. The TrustyAI service instance provides the URL that the data scientist uses to monitor and analyze any number of models deployed into a data science project.

The following procedure describes how to use the Open Data Hub dashboard to enable an instance of the TrustyAI service.

Prerequisites
Note

Monitoring models for bias (a TrustyAI feature) is supported only on a multi-model serving platform that is based on the ModelMesh component. It is not supported on a single-model serving platform that is based on the KServe component.

  • You have installed Open Data Hub as described in Quick Installation (v2).

  • The trustyai component is set to Managed for the Open Data Hub Operator.

    To verify this setting, navigate to Operators → Installed Operators → Open Data Hub Operator → Data Science Cluster. Select the default instance and then click YAML. Scroll down to view the spec.components setting:

     trustyai:
       devFlags: {}
       managementState: Managed

    NOTE: If the trustyai component is set to Removed, edit the YAML file to set it to Managed.
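
    Alternatively, you can apply the same change from the command line. This is a minimal sketch; it assumes the DataScienceCluster instance is named default, as in a standard installation:

     # Set the trustyai component to Managed on the default DataScienceCluster (name is an assumption)
     oc patch datasciencecluster default --type merge \
       -p '{"spec": {"components": {"trustyai": {"managementState": "Managed"}}}}'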

  • You have logged in to Open Data Hub.

  • If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • The data scientist has created a data science project, as described in Creating a data science project, that contains (or will contain) the models that the data scientist wants to monitor.

Procedure
  1. Log in to Open Data Hub.

  2. From the Open Data Hub dashboard, click Data Science Projects.

  3. In the Data Science Projects page, click the name of the project that contains (or will contain) the models that the data scientist wants to monitor.

  4. In the project details page, click Settings.

  5. Select the Enable model bias monitoring checkbox.

Verification
  1. In the OpenShift Container Platform web console, click Workloads → Pods.

  2. From the project list, select the project namespace in which you enabled TrustyAI.

  3. Confirm that there is a running pod for the TrustyAI service. The pod has a naming pattern similar to the following example:

    trustyai-service-5d45b5884f-96h5z
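
    Alternatively, a minimal command-line check, assuming your project is named <project-name>:

    # Confirm the TrustyAI service pod is running in the project
    oc get pods -n <project-name> | grep trustyai-service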

Enabling the TrustyAI Service by using the CLI

Enable an instance of the TrustyAI service on each data science project that contains models that the data scientist wants to monitor. The TrustyAI service instance provides the URL that a data scientist uses to monitor and analyze any number of models deployed into a data science project.

The following procedure describes how to use the OpenShift command-line interface (CLI) to enable an instance of the TrustyAI service.

Prerequisites
Note

Monitoring models for bias (a TrustyAI feature) is supported only on a multi-model serving platform that is based on the ModelMesh component. It is not supported on a single-model serving platform that is based on the KServe component.

  • You have installed Open Data Hub as described in Quick Installation (v2).

  • The trustyai component is set to Managed for the Open Data Hub Operator.

    To verify this setting, navigate to Operators → Installed Operators → Open Data Hub Operator → Data Science Cluster. Select the default instance and then click YAML. Scroll down to view the spec.components setting:

     trustyai:
       devFlags: {}
       managementState: Managed

    NOTE: If the trustyai component is set to Removed, edit the YAML file to set it to Managed.

  • You have logged in to Open Data Hub.

  • If you are using specialized Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.

  • The data scientist has created a data science project, as described in Creating a data science project, that contains (or will contain) the models that the data scientist wants to monitor.

Procedure
  1. Log in to your cluster.

     oc login
  2. Navigate to the data science project that contains (or will contain) the models that the data scientist wants to monitor.

    oc project <project-name>

    For example:

    oc project mydsproject
  3. Create a TrustyAIService custom resource (CR) file, for example trustyai_crd.yaml:

    apiVersion: trustyai.opendatahub.io/v1alpha1
    kind: TrustyAIService
    metadata:
      name: trustyai-service
    spec:
      storage:
        format: "PVC"
        folder: "/inputs"
        size: "1Gi"
      data:
        filename: "data.csv"
        format: "CSV"
      metrics:
        schedule: "5s"
        batchSize: 5000 # Optional, default is 5000

    Here is a description of the fields:

    metadata.name

    The name of the TrustyAI service instance.

    spec.storage.format

    The storage format for the data. Currently, only persistent volume claim (PVC) format is supported.

    spec.storage.folder

    The location within the PVC where you want to store the data.

    spec.storage.size

    The size of the PVC to request.

    spec.data.filename

    The suffix for the stored data files.

    spec.data.format

    The format of the data. Currently, only comma-separated value (CSV) format is supported.

    spec.metrics.schedule

    The interval at which to calculate the metrics. The default is 5s. The duration is specified with the ISO-8601 format. For example, 5s for 5 seconds, 5m for 5 minutes, and 5h for 5 hours.

    spec.metrics.batchSize

    The observation’s historical window size to use for metrics calculation. The default is 5000 (that is, the metrics are calculated by using the latest 5000 inferences).

  4. Add the TrustyAI service’s CR to your project:

    oc apply -f trustyai_crd.yaml

    This command returns output similar to the following:

    trustyai-service created
Verification

Use this command to verify that you enabled the TrustyAI Service:

oc get pods | grep trustyai

You should see a response similar to the following:

trustyai-service-5d45b5884f-96h5z             1/1     Running

Authenticating the TrustyAI service

You can access TrustyAI service external endpoints only if you provide OAuth proxy (oauth-proxy) authentication. You must obtain a user token, or a token from a service account with sufficient privileges, and then pass the token to the TrustyAI service when using curl commands.
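
If you prefer to authenticate with a service account instead of a user token, you can create one and request a token for it. The following is a minimal sketch: the service account name trustyai-sa and the view role binding are illustrative assumptions, and the privileges that your cluster requires might differ.

# Create a service account in the project where TrustyAI is enabled (hypothetical name)
oc create serviceaccount trustyai-sa -n <project-name>

# Grant it read access within the project (illustrative; adjust to your access policy)
oc adm policy add-role-to-user view -z trustyai-sa -n <project-name>

# Request a short-lived token for the service account (requires a recent oc version)
TOKEN=$(oc create token trustyai-sa -n <project-name>)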

Prerequisites
  • You installed the OpenShift command line interface (oc) as described in Get Started with the CLI.

  • Your OpenShift cluster administrator has installed Open Data Hub, added you as a user to the OpenShift cluster, and enabled the TrustyAI service for the data science project where the models are deployed.

Procedure
  1. In a terminal window, log in to OpenShift Container Platform where Open Data Hub is deployed.

    oc login
  2. Enter the following command to get the user token on OpenShift Container Platform.

    TOKEN=$(oc whoami -t)
Verification
  • Enter the following command to check the user token variable:

    echo $TOKEN
Next step

When running curl commands, pass the token to the TrustyAI service using the Authorization header. For example:

curl -H "Authorization: Bearer $TOKEN" $TRUSTY_ROUTE

Sending training data to a model

You must send training data through your model so that TrustyAI can compute baseline fairness values.

Prerequisites
  • Your OpenShift cluster administrator has installed Open Data Hub and enabled the TrustyAI service for the data science project where the model is deployed.

  • You have logged in to Open Data Hub.

Procedure
  1. Get the inference endpoints for the deployed model, as described in Viewing a deployed model.

  2. Send data to this endpoint, as shown in the example following this procedure. For more information, see the KServe v2 Inference Protocol documentation.
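
    For example, a model that accepts four numerical features might receive training data through the KServe v2 REST protocol as follows. This is a minimal sketch: the route, model name, and input schema (the input-0 name, shape, and datatype) are assumptions that you must replace with your model's actual values.

    # Hypothetical inference route and model name; replace with your own
    MODEL_ROUTE=https://<model-route>/v2/models/<model-name>/infer

    curl -sk -H "Authorization: Bearer $TOKEN" \
         -H "Content-Type: application/json" \
         -X POST $MODEL_ROUTE \
         -d '{
               "inputs": [
                 {
                   "name": "input-0",
                   "shape": [1, 4],
                   "datatype": "FP64",
                   "data": [1.0, 2.0, 3.0, 4.0]
                 }
               ]
             }'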

Verification

Follow these steps to view cluster metrics and verify that TrustyAI is receiving data.

  1. Log in to the OpenShift Container Platform web console.

  2. Switch to the Administrator perspective.

  3. In the navigation menu, click Observe → Metrics.

  4. On the Metrics page, in the Expression field, enter trustyai_model_observations_total and then click Run queries. Your model should be listed and reporting observed inferences.

  5. Optional: Select a refresh interval from the list in the upper-right corner. For example, select 15 seconds.

  6. Optional: Select a time range from the list above the graph. For example, select 5m.

Configuring bias metrics for a model

As a data scientist, you might want to monitor your machine learning models for bias. This means monitoring for algorithmic deficiencies that might skew the outcomes or decisions that the model produces. Importantly, this type of monitoring helps you to ensure that the model is not biased against particular protected groups or features.

Open Data Hub provides a set of metrics that help you to monitor your models for bias. You can use the Open Data Hub interface to choose an available metric and then configure model-specific details such as a protected attribute, the privileged and unprivileged groups, the outcome you want to monitor, and a threshold for bias. You then see a chart of the calculated values for a specified number of model inferences.

The sections that follow describe how to configure your models for bias monitoring.

Creating a bias metric

To monitor a deployed model for bias, you must first create bias metrics. When you create a bias metric, you specify details relevant to your model such as a protected attribute, privileged and unprivileged groups, a model outcome and a value that you want to monitor, and the acceptable threshold for bias.

You can create a bias metric for a model by using the Open Data Hub dashboard or by using the OpenShift command-line interface (CLI).

Creating a bias metric by using the dashboard

The following procedure describes how to use the Open Data Hub dashboard to create a bias metric for a model.

Prerequisites
  • You are familiar with the bias metrics that Open Data Hub supports and how to interpret them.

  • You are familiar with the specific data set schema and understand the names and meanings of the inputs and outputs.

  • Your OpenShift cluster administrator has installed Open Data Hub and enabled the TrustyAI service for the data science project where the models are deployed.

  • You have logged in to Open Data Hub.

  • You have a user token for authentication as described in Authenticating the TrustyAI service.

  • You have sent training data to your models as described in Sending training data to a model.

Procedure
  1. Optional: To set the TRUSTY_ROUTE variable, follow these steps.

    1. In a terminal window, log in to the OpenShift cluster where Open Data Hub is deployed.

      oc login
    2. Set the TRUSTY_ROUTE variable to the external route for the TrustyAI service pod.

      TRUSTY_ROUTE=https://$(oc get route/trustyai-service --template={{.spec.host}})
  2. In the left menu of the Open Data Hub dashboard, click Model Serving.

  3. On the Deployed models page, select your project from the drop-down list.

  4. Click the name of the model that you want to configure bias metrics for.

  5. On the metrics page for the model, click the Model bias tab.

  6. Click Configure.

  7. In the Configure bias metrics dialog, complete the following steps to configure bias metrics:

    1. In the Metric name field, type a unique name for your bias metric. Note that you cannot change the name of this metric later.

    2. From the Metric type list, select one of the metrics types that are available in Open Data Hub.

    3. In the Protected attribute field, type the name of an attribute in your model that you want to monitor for bias.

      Tip
      You can use a curl command to query the metadata endpoint and view input attribute names and values. For example: curl -H "Authorization: Bearer $TOKEN" $TRUSTY_ROUTE/info | jq ".[0].data.inputSchema"
    4. In the Privileged value field, type the name of a privileged group for the protected attribute that you specified.

    5. In the Unprivileged value field, type the name of an unprivileged group for the protected attribute that you specified.

    6. In the Output field, type the name of the model outcome that you want to monitor for bias.

      Tip
      You can use a curl command to query the metadata endpoint and view output attribute names and values. For example: curl -H "Authorization: Bearer $TOKEN" $TRUSTY_ROUTE/info | jq ".[0].data.outputSchema"
    7. In the Output value field, type the value of the outcome that you want to monitor for bias.

    8. In the Violation threshold field, type the bias threshold for your selected metric type. This threshold value defines how far the specified metric can be from the fairness value for your metric, before the model is considered biased.

    9. In the Metric batch size field, type the number of model inferences that Open Data Hub includes each time it calculates the metric.

  8. Ensure that the values you entered are correct.

    Note

    You cannot edit a model bias metric configuration after you create it. Instead, you can duplicate a metric and then edit (configure) it; however, the history of the original metric is not applied to the copy.

  9. Click Configure.

Verification
  • The Bias metric configuration page shows the bias metrics that you configured for your model.

Next step

To view metrics, on the Bias metric configuration page, click View metrics in the upper-right corner.

Creating a bias metric by using the CLI

The following procedure describes how to use the OpenShift command-line interface (CLI) to create a bias metric for a model.

Prerequisites
  • You are familiar with the bias metrics that Open Data Hub supports and how to interpret them.

  • You are familiar with the specific data set schema and understand the names and meanings of the inputs and outputs.

  • Your OpenShift cluster administrator has installed Open Data Hub and enabled the TrustyAI service for the data science project where the models are deployed.

  • You have a user token for authentication as described in Authenticating the TrustyAI service.

  • You have sent training data to your models as described in Sending training data to a model.
Procedure
  1. In a terminal window, log in to the OpenShift cluster where Open Data Hub is deployed.

    oc login
  2. Set the TRUSTY_ROUTE variable to the external route for the TrustyAI service pod.

    TRUSTY_ROUTE=https://$(oc get route/trustyai-service --template={{.spec.host}})
  3. Optionally, get the full list of TrustyAI service endpoints and payloads.

    curl -H "Authorization: Bearer $TOKEN" --location $TRUSTY_ROUTE/q/openapi
  4. Configure bias metrics by running a command that uses the following syntax and payload structure:

    Syntax:

    curl -sk -H "Authorization: Bearer $TOKEN" -X POST --location $TRUSTY_ROUTE/metrics/spd/request \
     --header 'Content-Type: application/json' \
     --data <payload>

    Payload structure:

    modelId

    The name of the model to query.

    protectedAttribute

    The name of the feature that distinguishes the groups that you are checking for fairness.

    privilegedAttribute

    The suspected favored (positively biased) class.

    unprivilegedAttribute

    The suspected unfavored (negatively biased) class.

    outcomeName

    The name of the model output that you are examining for fairness.

    favorableOutcome

    The value of the outcomeName output that describes the favorable or desired model prediction.

    batchSize

    The number of previous inferences to include in the calculation.

For example:

curl -sk -H "Authorization: Bearer $TOKEN" -X POST --location $TRUSTY_ROUTE/metrics/group/fairness/spd/ \
     --header 'Content-Type: application/json' \
     --data "{
                 \"modelId\": \"demo-loan-nn-onnx-alpha\",
                 \"protectedAttribute\": \"Is Male-Identifying?\",
                 \"privilegedAttribute\": 1.0,
                 \"unprivilegedAttribute\": 0.0,
                 \"outcomeName\": \"Will Default?\",
                 \"favorableOutcome\": 0,
                 \"batchSize\": 5000
             }"
Verification

The bias metrics request should return output similar to the following:

{
   "timestamp":"2023-10-24T12:06:04.586+00:00",
   "type":"metric",
   "value":-0.0029676404469311524,
   "namedValues":null,
   "specificDefinition":"The SPD of -0.002968 indicates that the likelihood of Group:Is Male-Identifying?=1.0 receiving Outcome:Will Default?=0 was -0.296764 percentage points lower than that of Group:Is Male-Identifying?=0.0.",
   "name":"SPD",
   "id":"d2707d5b-cae9-41aa-bcd3-d950176cbbaf",
   "thresholds":{"lowerBound":-0.1,"upperBound":0.1,"outsideBounds":false}
}

The specificDefinition field helps you understand the real-world interpretation of these metric values. In this example, the model is fair over the Is Male-Identifying? field, with the rate of the positive outcome differing by only about 0.3 percentage points between groups.

Duplicating a bias metric

If you want to edit an existing metric, you can duplicate (copy) it in the Open Data Hub interface and then edit the values in the copy. However, note that the history of the original metric is not applied to the copy.

Prerequisites
  • You are familiar with the bias metrics that Open Data Hub supports and how to interpret them.

  • You are familiar with the specific data set schema and understand the names and meanings of the inputs and outputs.

  • You have logged in to Open Data Hub.

  • There is an existing bias metric that you want to duplicate.

Procedure
  1. In the left menu of the Open Data Hub dashboard, click Model Serving.

  2. On the Deployed models page, click the name of the model with the bias metric that you want to duplicate.

  3. On the metrics page for the model, click the Model bias tab.

  4. Click Configure.

  5. On the Bias metric configuration page, click the action menu (⋮) next to the metric that you want to copy and then click Duplicate.

  6. In the Configure bias metric dialog, follow these steps:

    1. In the Metric name field, type a unique name for your bias metric. Note that you cannot change the name of this metric later.

    2. Change the values of the fields as needed. For a description of these fields, see Creating a bias metric by using the dashboard.

  7. Ensure that the values you entered are correct, and then click Configure.

Verification
  • The Bias metric configuration page shows the bias metrics that you configured for your model.

Next step

To view metrics, on the Bias metric configuration page, click View metrics in the upper-right corner.

Deleting a bias metric

You can delete a bias metric for a model by using the Open Data Hub dashboard or by using the OpenShift command-line interface (CLI).

Deleting a bias metric by using the dashboard

The following procedure describes how to use the Open Data Hub dashboard to delete a bias metric for a model.

Prerequisites
  • You have logged in to Open Data Hub.

  • There is an existing bias metric that you want to delete.

Procedure
  1. In the left menu of the Open Data Hub dashboard, click Model Serving.

  2. On the Deployed models page, click the name of the model with the bias metric that you want to delete.

  3. On the metrics page for the model, click the Model bias tab.

  4. Click Configure.

  5. Click the action menu (⋮) next to the metric that you want to delete and then click Delete.

  6. In the Delete bias metric dialog, type the metric name to confirm the deletion.

    Note

    You cannot undo deleting a bias metric.

  7. Click Delete bias metric.

Verification
  • The Bias metric configuration page does not show the bias metric that you deleted.

Deleting a bias metric by using the CLI

The following procedure describes how to use the OpenShift command-line interface (CLI) to delete a bias metric for a model.

Prerequisites
  • You have logged in to Open Data Hub.

  • There is an existing bias metric that you want to delete.

  • You have a user token for authentication as described in Authenticating the TrustyAI service.

Procedure
  1. In a terminal window, log in to the OpenShift cluster where Open Data Hub is deployed.

    oc login
  2. Optional: To list all currently active requests for a metric, use GET /metrics/{{metric}}/requests. For example, to list all currently scheduled SPD metrics, type:

    curl -H "Authorization: Bearer $TOKEN" -X GET --location "$TRUSTY_ROUTE/metrics/spd/requests"
  3. To delete a metric, send an HTTP DELETE request to the /metrics/$METRIC/request endpoint to stop the periodic calculation, including the ID of the periodic task that you want to cancel in the payload. For example:

    curl -H "Authorization: Bearer $TOKEN" -X DELETE --location "$TRUSTY_ROUTE/metrics/spd/request" \
        -H "Content-Type: application/json" \
        -d "{
              \"requestId\": \"3281c891-e2a5-4eb3-b05d-7f3831acbb56\"
            }"
Verification

Use GET /metrics/{{metric}}/requests to list all currently active requests for the metric and verify the metric that you deleted is not shown. For example:

curl -H "Authorization: Bearer $TOKEN" -X GET --location "$TRUSTY_ROUTE/metrics/spd/requests"

Viewing bias metrics for a model

After you configure your model for bias monitoring, you can use the Open Data Hub dashboard to view and update the metrics that you configured.

Prerequisites
  • You have logged in to Open Data Hub.

  • You have configured bias metrics for your model, as described in Creating a bias metric.
Procedure
  1. In the Open Data Hub dashboard, click Model Serving.

  2. On the Deployed models page, click the name of a model that you want to view bias metrics for.

  3. To update the metrics shown on the page, follow these steps:

    1. In the Metrics to display section, use the Select a metric list to select a metric to show on the page.

      Note
      Each time you select a metric to show on the page, an additional Select a metric list appears. This enables you to show multiple metrics on the page.
    2. From the Time range list in the upper-right corner, select a value.

    3. From the Refresh interval list in the upper-right corner, select a value.

      The metrics page shows the metrics that you selected.

  4. Optional: To remove one or more metrics from the page, in the Metrics to display section, perform one of the following actions:

    • To remove an individual metric, click the cancel icon (✖) next to the metric name.

    • To remove all metrics, click the cancel icon (✖) in the Select a metric list.

  5. Optional: To return to configuring bias metrics for the model, on the metrics page, click Configure in the upper-right corner.

Verification
  • The metrics page shows the metrics selections that you made.

Supported bias metrics

Open Data Hub supports the following bias metrics.

Statistical Parity Difference

Statistical Parity Difference (SPD) is the difference in the probability of a favorable outcome prediction between unprivileged and privileged groups. The formal definition of SPD is the following:

SPD = P(ŷ = 1 | Dᵤ) − P(ŷ = 1 | Dₚ), where:
  • ŷ = 1 is the favorable outcome.

  • Dᵤ and Dₚ are the unprivileged and privileged group data.

You can interpret SPD values as follows:

  • A value of 0 means that the model is behaving fairly for a selected attribute (for example, race, gender).

  • A value in the range -0.1 to 0.1 means that the model is reasonably fair for a selected attribute; you can attribute the difference in probability to other factors, such as the sample size.

  • A value outside the range -0.1 to 0.1 indicates that the model is unfair for a selected attribute.

  • A negative value indicates that the model has bias against the unprivileged group.

  • A positive value indicates that the model has bias against the privileged group.
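
For example, if the unprivileged group receives the favorable outcome 35% of the time and the privileged group receives it 40% of the time, then SPD = 0.35 − 0.40 = −0.05. This value falls within the -0.1 to 0.1 range, so the model is reasonably fair for that attribute.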

Disparate Impact Ratio

Disparate Impact Ratio (DIR) is the ratio of the probability of a favorable outcome prediction for unprivileged groups to that of privileged groups. The formal definition of DIR is the following:

DIR = P(ŷ = 1 | Dᵤ) / P(ŷ = 1 | Dₚ), where:
  • ŷ = 1 is the favorable outcome.

  • Dᵤ and Dₚ are the unprivileged and privileged group data.

The threshold to identify bias depends on your own criteria and specific use case.

For example, if your threshold for identifying bias is represented by a DIR value below 0.8 or above 1.2, you can interpret the DIR values as follows:

  • A value of 1 means that the model is fair for a selected attribute.

  • A value between 0.8 and 1.2 means that the model is reasonably fair for a selected attribute.

  • A value below 0.8 or above 1.2 indicates bias.
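
For example, if the unprivileged group receives the favorable outcome 30% of the time and the privileged group receives it 40% of the time, then DIR = 0.30 / 0.40 = 0.75. This value is below the 0.8 threshold and indicates bias against the unprivileged group.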