Experimenting with models in the gen AI playground
Use the generative AI (gen AI) playground feature in Open Data Hub to evaluate, test, and interact with foundation and custom models in your project. You can test prompt engineering with Retrieval-Augmented Generation (RAG) and validate model behavior before using the model in an application.
Playground overview
The generative AI (gen AI) playground is an interactive environment within the Open Data Hub dashboard where you can prototype and evaluate foundation models, custom models, and Model Context Protocol (MCP) servers before you use them in an application.
You can test different configurations, including retrieval augmented generation (RAG), to determine the right assets for your use case. After you find an effective configuration, you can retrieve a Python template that serves as a starting point for building and iterating in a local development environment.
Important: The playground is a stateless environment. If you refresh your browser or end your session, all chat history and parameter settings will be lost.
Core capabilities
The playground includes a core set of features:

- Interact with models: You can chat with both foundation and custom-deployed models.
- Test with RAG: You can test prompt engineering with document-based retrieval augmented generation (RAG) by uploading your own documents for the model to use as context.
- Integrate MCP servers: You can authorize and interact with approved Model Context Protocol (MCP) servers and their tools.
- Export configurations: You can export prompts and parameter configurations as a code template to iterate with in your local IDE.
Playground prerequisites
Before you can configure and use the gen AI playground feature, you must meet prerequisites at both the cluster and user levels.
Cluster administrator prerequisites
Before a user can configure a playground instance, a cluster administrator must complete the following setup tasks:

- Ensure that Open Data Hub is installed on an OpenShift Container Platform cluster running version 4.19 or later.
- Set the value of the `spec.dashboardConfig.genAiStudio` dashboard configuration option to `true`. For more information, see Dashboard configuration options.
- If using Open Data Hub groups, add users to the `odh-users` and `odh-admins` OpenShift groups.
- Ensure that the Llama Stack Operator is enabled on the OpenShift Container Platform cluster by setting its `managementState` field to `Managed` in the `DataScienceCluster` custom resource (CR) of the Open Data Hub Operator. For more information, see Activating the Llama Stack Operator.
- Configure Model Context Protocol (MCP) servers to test models with external tools. For more information, see Configuring Model Context Protocol (MCP) servers.
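The exact field placement for these settings depends on your Open Data Hub release. The following is a minimal sketch of the two cluster-level settings above, assuming the Llama Stack Operator is exposed as the `llamastackoperator` component of the `DataScienceCluster` CR and that the `genAiStudio` flag lives in the `OdhDashboardConfig` resource; verify both against the linked documentation before applying.

```yaml
# Sketch only: component and flag placement can differ between releases.
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    llamastackoperator:        # assumed component key for the Llama Stack Operator
      managementState: Managed
---
apiVersion: opendatahub.io/v1alpha
kind: OdhDashboardConfig
metadata:
  name: odh-dashboard-config
  namespace: redhat-ods-applications   # the namespace where your dashboard runs
spec:
  dashboardConfig:
    genAiStudio: true          # enables the gen AI studio (playground) UI
```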
User prerequisites
After the cluster administrator completes the setup, you must complete the following tasks before you can configure your playground instance:

- Log in to Open Data Hub.
- If you are using Open Data Hub groups, ensure that you are a member of the appropriate user or admin group.
- Create a project. The playground instance is tied to a project context. For more information, see Creating a project.
- Add a connection to your project. For more information about creating connections, see Adding a connection to your project.
- Deploy a model in your project and make it available as an AI asset endpoint. For more information, see Deploying models on the model serving platform.

After you complete these tasks, the project is ready for you to configure your playground instance.
Model and runtime requirements for the playground
To successfully use the retrieval augmented generation (RAG) and Model Context Protocol (MCP) features in the playground, the model you deploy must meet specific requirements. Not all models offer the same capabilities.
Key model selection factors
- Tool calling capabilities: The model must support tool calling to interact with the playground's RAG and MCP features. You must check the model card (for example, on Hugging Face) to verify this capability. For more information, see Tool calling in the vLLM documentation.
- Context length: Models with larger context windows are recommended for RAG applications. A larger context window allows the model to process more retrieved documents and maintain longer conversation histories.
- vLLM version and configuration: Tool calling functionality depends heavily on the version of vLLM used in your model serving runtime.
  - Version: Use the latest vLLM version included in Open Data Hub for optimal compatibility.
  - Runtime arguments: You must configure specific runtime arguments in the model serving runtime to enable tool calling. Common arguments include (not exhaustive):
    - `--enable-auto-tool-choice`
    - `--tool-call-parser`

    For more information, see Tool calling in the vLLM documentation.
Important: If these requirements are not met, the model might fail to search documents or execute tools without returning a clear error message.
Example model configuration
The following table describes an example configuration for the `Qwen/Qwen3-14B-AWQ` model for use in the playground. You can use this as a reference when configuring your own model runtime arguments.

| Field | Configuration Details |
|---|---|
| Model | `Qwen/Qwen3-14B-AWQ` |
| vLLM Runtime | vLLM NVIDIA GPU ServingRuntime for KServe |
| Hardware Profile | NVIDIA A10G (24 GB VRAM) |
| Custom Runtime Arguments | `--enable-auto-tool-choice`, `--tool-call-parser` (described above; the parser value depends on the model family) |
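These arguments are passed to the vLLM server through the serving runtime's container arguments. The following is a minimal sketch of where they go in a KServe `ServingRuntime`, assuming the `hermes` value for `--tool-call-parser` (the parser that the vLLM documentation suggests for Qwen-family models); the image reference and the right parser for your model are placeholders to verify.

```yaml
# Sketch only: shows argument placement, not a complete runtime definition.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-tool-calling-runtime
spec:
  containers:
    - name: kserve-container
      image: <vllm-runtime-image>     # use the vLLM image shipped with Open Data Hub
      args:
        - --enable-auto-tool-choice   # let vLLM decide when to emit tool calls
        - --tool-call-parser
        - hermes                      # assumed value for Qwen-family models; verify in the vLLM docs
```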
Configuring Model Context Protocol (MCP) servers
A cluster administrator must configure and enable Model Context Protocol (MCP) servers at the platform level before users can interact with external tools in the gen AI playground. This configuration is done by creating a `ConfigMap` in the `redhat-ods-applications` namespace, which holds the necessary information for each MCP server.

- You have cluster admin privileges for your OpenShift cluster.
- You have installed the OpenShift CLI (`oc`). For more information, see Installing the OpenShift CLI for OpenShift Container Platform.
- Create a file named `gen-ai-aa-mcp-servers.yaml` with the following YAML content. You can add multiple server entries under the `data:` field.

  ```yaml
  kind: ConfigMap
  apiVersion: v1
  metadata:
    name: gen-ai-aa-mcp-servers
    namespace: redhat-ods-applications
  data:
    GitHub-MCP-Server: |
      {
        "url": "https://api.githubcopilot.com/mcp/x/repos/readonly",
        "description": "The GitHub MCP server enables exploration and interaction with repositories, code, and developer resources on GitHub. It provides programmatic access to repositories, issues, pull requests, and related project data, allowing automation and integration within development workflows. With this service, developers can query repositories, discover project metadata, and streamline code-related tasks through MCP-compatible tools."
      }
  ```

  Important: The `ConfigMap` key (`GitHub-MCP-Server`) is case-sensitive and must be unique. The content provided under this key must be valid JSON.

- Apply the `ConfigMap` to the cluster by running the following command:

  ```
  oc apply -f gen-ai-aa-mcp-servers.yaml
  ```

- Confirm that the `ConfigMap` was successfully applied by running the following command:

  ```
  oc get configmap gen-ai-aa-mcp-servers -n redhat-ods-applications -o yaml | grep GitHub-MCP-Server
  ```

  The output should contain the key name, confirming its successful creation:

  ```
  GitHub-MCP-Server: |
  ```
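Each key under `data:` defines one MCP server, so making another server available is a matter of adding another key to the same `ConfigMap`. A sketch with a second, hypothetical entry (the `Example-Weather-MCP-Server` name, URL, and description are placeholders, not a real endpoint):

```yaml
data:
  GitHub-MCP-Server: |
    {
      "url": "https://api.githubcopilot.com/mcp/x/repos/readonly",
      "description": "GitHub MCP server (read-only repositories endpoint)."
    }
  Example-Weather-MCP-Server: |
    {
      "url": "https://mcp.example.com/weather",
      "description": "Hypothetical weather tools server, shown only to illustrate a second entry."
    }
```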
About the AI assets endpoint page
The AI asset endpoints page is a central dashboard for managing the generative AI assets available for you to use within your project.
The page organizes assets into two categories:
- Models: Lists all generative AI models deployed in your project that have been designated as available assets. For a model to be available, you must select the Add as AI asset endpoint check box when deploying it. For more information, see Deploying models on the model serving platform.
- MCP servers: Lists all available MCP servers associated with your project.

The primary purpose of this page is to provide a starting point for using these assets. From here, you can perform actions such as adding a model to a playground instance for testing.
Important: The assets listed on the AI asset endpoints page are scoped to your currently selected project. You will only see models and servers that are deployed and available within that specific project.
Configuring a playground for your project
Configure a generative AI (gen AI) playground for your project, so that you can interact with your deployed generative AI models and connect to backend servers, such as Model Context Protocol (MCP) servers.
- You have created a project.
- You have deployed a model in your project and added your model as an AI asset endpoint.
- If your cluster administrator has configured Model Context Protocol (MCP) servers, they are accessible within your OpenShift environment.
Follow these steps to configure the playground:

- Perform one of the following actions:
  - From the playground page:
    - From the Open Data Hub dashboard side navigation menu, click Gen AI studio → Playground.
    - Select the project containing your model deployment from the Project drop-down list.
    - Click Create playground. The Configure playground dialog opens.
  - From the AI asset endpoints page:
    - From the Open Data Hub dashboard side navigation menu, click Gen AI studio → AI asset endpoints.
    - Select the project containing your model deployment from the Project drop-down list.
    - Click the Models tab.
    - Locate the model that you want to create a playground for, and then click Add to playground. The Configure playground dialog opens.
- Select the check box next to the model deployment that you want to interact with in this playground instance.
- Click Create. Wait for the playground interface to finish loading.
- Optional: Expand the MCP servers section and select the check box for the MCP server instance that you want to connect to.
- The playground interface loads successfully.
- The Model details section displays information about the selected model deployment.
- The selected MCP server shows a successful connection status within the MCP servers section.
Testing baseline model responses
Use the playground to test and evaluate your model’s baseline responses.
- You have created a playground for your deployed model.
To test your model, follow these steps:

- From the Open Data Hub dashboard, click Gen AI studio → Playground.
- From the model list, select the model that you want to test.
- Adjust the following model parameters as needed:
  - Temperature: Controls the randomness of the model's output. Use values between 0 and 2. The temperature value directly influences creativity:
    - Values near 0 (0-0.3): Produce deterministic and factual responses, and are recommended for objective or factual tasks.
    - Values around 0.7: A common default for balanced output.
    - Values near 1 (0.7-1): Increase creativity and randomness, and are recommended for generative or creative tasks.
    - Values above 1 (such as 2): Typically produce incoherent output.
  - Streaming: Shows the LLM's response as it is being generated. This is helpful for testing model latency and seeing the model's progress in real time. When streaming is off, the full response does not render until it is complete.
  - System instructions: Review or edit the text to define the context, persona, or instructions for the model. The playground provides a default prompt.
- In the chat input field, type a query.
- Click Send.
- Observe the model's response.

The model provides a response based on its general knowledge or pre-trained data.
Testing your model with retrieval augmented generation (RAG)
You can enhance your model’s responses by providing it with contextual information from your own documents using retrieval augmented generation (RAG). You can upload documents to the vector database associated with the playground to provide context for your model’s responses.
Important: The RAG feature of the gen AI playground is currently configured to work only with an inline vector database. There is currently no mechanism to configure the playground to connect this RAG feature to an external or remote vector database.
- You have configured a playground for your project.
- You have the document files ready to upload. The supported file formats are PDF, DOC, or CSV. You can upload up to 10 files, with a maximum size of 10 MB per file.
- From the Open Data Hub dashboard, click Gen AI studio → Playground.
- In the playground interface, click the toggle in the RAG section and then expand the section.
- Click Upload. The Upload files dialog opens.
- Drag and drop your file, or click to browse and select a file from your local system.
- Optional: Adjust the Maximum chunk length, Chunk overlap, and Delimiter values as needed for your document type. For more information about these settings, see Understanding RAG settings.
- Click Upload. Wait for the file to finish processing. A Source uploaded notification appears, and the file is listed under Uploaded files.
- Repeat these steps to upload additional files if needed.
- In the System instructions field, review or edit the text to define the context, persona, or instructions for the model. The playground provides a default prompt.
- In the chat input field, ask a question related to your documents that the model would not know otherwise.
- Observe the model's response.
Tip: If a model is reluctant to use the RAG feature (its knowledge search tool), you can modify the prompt in the System instructions field to explicitly guide its behavior. You can refine the system prompt by including directives such as:

- To force use: "You MUST use the `knowledge_search` tool to obtain updated information."
- To specify context: "Always search the knowledge base before answering questions about company policies, recent events, or specific documentation."

This ensures that the model actively uses the available RAG documents rather than relying solely on its pre-trained data.

- The model retrieves information from the uploaded documents to answer the question.
Understanding RAG settings
When you upload a document for retrieval augmented generation (RAG), you can configure the following settings to optimize how the text is processed.
| Setting | Description |
|---|---|
| Maximum chunk length | The maximum word count for each text section ("chunk") created from your uploaded files. |
| Chunk overlap | The number of words from the end of one chunk that are repeated at the start of the next one. This overlap helps maintain continuous context across chunks, improving model responses. |
| Delimiter | A character or string that specifies where a text chunk should end. This helps define text boundaries alongside maximum chunk length and overlap, ensuring sentences or paragraphs remain intact. |

For example, with Maximum chunk length = 4, the sentence "Chunk overlap can improve the quality of model responses." is chunked differently depending on the chunk overlap:

- Chunk overlap = 1: "Chunk overlap can improve" / "improve the quality of" / "of model responses."
- Chunk overlap = 0: "Chunk overlap can improve" / "the quality of model" / "responses."
Testing with Model Context Protocol (MCP) servers
Authorize and interact with connected MCP servers to use their integrated tools directly from the playground chat.
- You have deployed a model with tool-calling capabilities enabled in your project.
- You have configured a playground instance for your project.
- A cluster administrator has configured an MCP server, and the server is listed and available in the MCP servers section of your playground.
- In the MCP servers section, select the check box for the server that you want to use.
- Click the Auth icon next to the server name. The Authorize MCP server dialog opens.
- If the server requires a token, enter the token in the Access token field and click Authorize. A Connection successful message appears.

  Note: Authorization tokens for MCP servers are stored only for the current browser session. If you close your browser, you must re-authorize the server.

- Click Close.
- Click the View tools (wrench) icon for the same MCP server. A modal appears, listing all available tools for that server. You can copy a tool name to use in the chat.
- In the chat input field, type a query that uses one of the available tools.
- Click the Send button or press Enter.
- The AI bot responds, indicating that it is using the tool.
- The bot provides the output from the tool.
Exporting your playground configuration
Export your gen AI playground configuration as a Python code template so that you can use it in your local development environment, such as a notebook or IDE.
Important: This code is a template and is not a runnable script. It provides a starting point that shows your configuration, including the model, MCP tools, and RAG files used.
- You have configured your playground instance with the settings that you want to capture in your code template. This includes:
  - Selecting a model.
  - Setting model parameters, such as model temperature, to your desired values.
  - Optional: Uploading files and enabling the RAG function.
  - Optional: Authorizing and enabling any MCP servers that you intend to use.
- In the playground, configure your desired settings.
- Click the View code button. A dialog opens, displaying a Python code template.
- Click Copy code.
- Paste the code into your local development environment.
- Review the pasted code in your local environment.
- Confirm that the template includes the correct model, MCP tools, and RAG files from your playground configuration.
Updating your playground configuration
You can update the configuration of your playground instance to add new models, re-register models that were stopped, or change the existing configuration.
Warning: Updating the playground configuration will permanently delete the inline vector database for all users in your project.
- You have configured a playground for your project.
- From the Open Data Hub dashboard, click Gen AI studio → Playground.
- Select the project containing your model deployment from the Project drop-down list.
- In the upper-right corner of your playground, click the action menu (⋮) and select Update configuration.
- On the configuration screen, select or clear the check boxes for the models that you want to make available.
- Click Update.

The playground configuration is updated with a new selection of models.
Deleting a playground from your project
You can delete a playground instance from a project. This removes the instance for all users who have access to that project.
- You have configured a playground for your project.
- From the Open Data Hub dashboard, click Gen AI studio → Playground.
- Select the project containing your model deployment from the Project drop-down list.
- In the upper-right corner of your playground, click the action menu (⋮) and select Delete playground.

  Note: This action deletes the playground for every user in the project.

- Confirm the deletion.
- Confirm that the playground is deleted from the project.
- On the Gen AI studio → AI asset endpoints page, models no longer show the Try in playground button and instead show the Add to playground button.
Next steps
You have successfully deployed and tested a model using the playground with RAG and MCP tools. For more information on the next steps, see the following resources:
- Developing in an IDE: Learn how to access your workbench IDE (JupyterLab, code-server, or RStudio Server) to develop models.
Troubleshooting playground issues
If you encounter issues while using the playground, refer to the following scenarios and solutions.
The chatbot thinks indefinitely
Problem: After sending a query, the chatbot shows a thinking indicator but never returns a response.

Cause: This issue often occurs when the query or the accumulated context exceeds the maximum context length (sequence length) configured for the model.

Solution:
- In the OpenShift AI dashboard, click the Applications menu and select OpenShift Console.
- Navigate to your project's namespace.
- Check the logs for the following pods:
  - The playground pod: `lsd-genai-playground-<id>`
  - The model serving pod: `<model-name>-predictor-<id>`
- Look for errors related to context length limits or memory (OOM) constraints.
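You can also pull the same logs from a terminal with the OpenShift CLI. A minimal sketch, assuming your project namespace and the pod name patterns listed above (substitute the real names from `oc get pods`):

```
oc get pods -n <project-namespace>                        # find the exact pod names
oc logs lsd-genai-playground-<id> -n <project-namespace>  # playground logs
oc logs <model-name>-predictor-<id> -n <project-namespace> # model serving logs
```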
The model does not use RAG data
Problem: The model answers questions using its training data instead of searching the uploaded RAG documents.

Solution:

Update the System instructions in the playground to explicitly force the use of the search tool.

- Example: "You MUST use the `knowledge_search` tool to obtain updated information."
- Example: "Always search the knowledge base before answering questions about company policies."
MCP servers are missing from the UI
Problem: The MCP servers section is empty or not visible in the playground configuration.

Cause: MCP servers must be configured at the cluster level by an administrator.

Solution:

Contact your OpenShift AI administrator to configure the required MCP servers. Administrators can find a list of available servers in the Red Hat OpenShift AI documentation.
The model fails to call MCP tools
Problem: The model attempts to use a tool but fails, or outputs raw XML tags (for example, `<tool_call>`).

Cause:
- The model does not support tool calling.
- The vLLM runtime arguments are missing or incorrect.
- Known issue: Some models (for example, `Qwen3-4B-Instruct`) might output raw tags if the correct reasoning parser is not available in the current vLLM version.
Solution:

- Verify that the model supports tool calling on its Hugging Face model card.
- In the model's deployment settings, ensure that the following custom runtime arguments are present:
  - `--enable-auto-tool-choice`
  - `--tool-call-parser`
- If the model outputs `<think>` tags, you can hide them by adding `/no_think` to your prompt.