Working with data in an S3-compatible object store
- Prerequisites
- Creating an S3 client
- Listing available buckets in your object store
- Creating a bucket in your object store
- Listing files in your bucket
- Downloading files from your bucket
- Uploading files to your bucket
- Copying files between buckets
- Deleting files from your bucket
- Deleting a bucket from your object store
- Additional resources
If you have data stored in an S3-compatible object store such as Ceph, MinIO, or IBM Cloud Object Storage, you can access the data from your workbench.
Prerequisites
- You have created a workbench in Open Data Hub. For more information, see Creating a workbench and selecting an IDE.
- You have access to an S3-compatible object store.
- You have the credentials for your S3-compatible object storage account.
- You have files to work with in your object store.
- You have configured a data connection for your workbench based on the credentials of your S3-compatible storage account. For more information, see Using data connections.
Creating an S3 client
To interact with data stored in an S3-compatible object store from your workbench, you must create a local client to handle requests to the AWS S3 service by using an AWS SDK such as Boto3.
Boto3 is an AWS SDK for Python that provides an API for creating and managing AWS services, such as AWS S3 or S3-compatible object storage.
After you have configured a Boto3 client for the S3 service from your workbench, you can connect and work with data in your S3-compatible object store.
- You have access to an S3-compatible object store.
- You have stored files in a bucket on your object store.
- You have logged in to Open Data Hub.
- If you are using Open Data Hub groups, you are part of the user group or admin group (for example, odh-users or odh-admins) in OpenShift.
- You have created a data science project.
- You have added a workbench to the project using a Jupyter notebook image.
- You have configured a data connection for your workbench based on the credentials of your S3-compatible storage account.
- From the Open Data Hub dashboard, click Data Science Projects.
- Click the name of the project that contains the workbench that you want to start.
- If the workbench that you want to use is not already running, click the action menu (⋮) beside the workbench, and click Start.
  The Status column changes from Stopped to Starting when the workbench server is starting, and then to Running when the workbench has successfully started.
- After the workbench has started, click the Open link next to your workbench. Your Jupyter environment window opens.
- On the toolbar, click the Git Clone icon and then select Clone a Repository.
- In the Clone a repo dialog, enter the following URL, and then click Clone: https://github.com/opendatahub-io/odh-doc-examples.git
- In the file browser, select the newly created odh-doc-examples folder.
- Double-click the newly created storage folder. You see a Jupyter notebook named s3client_examples.ipynb.
- Double-click the s3client_examples.ipynb file to launch the notebook. The notebook opens. You see code examples for the following tasks:
  - Installing Boto3 and required Boto3 libraries
  - Creating an S3 client session
  - Creating an S3 client connection
  - Listing files
  - Creating a bucket
  - Uploading a file to a bucket
  - Downloading a file from a bucket
  - Copying files between buckets
  - Deleting an object from a bucket
  - Deleting a bucket
- In the notebook, locate the following instructions to install Boto3 and its required libraries, and run the code cell:
    #Upgrade pip to the latest version
    !pip3 install --upgrade pip
    #Install Boto3
    !pip3 install boto3
    #Import the Boto3 libraries
    import os
    import boto3
    from botocore.client import Config
    from boto3 import session
    #Check the Boto3 version
    !pip3 show boto3
  The instructions in the code cell update the Python package manager (pip) to the latest version, install Boto3 and its required libraries, and display the installed Boto3 version.
- Locate the following instructions to create an S3 client and session, and run the code cell:
    #Creating an S3 client
    #Define credentials
    key_id = os.environ.get('AWS_ACCESS_KEY_ID')
    secret_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
    endpoint = os.environ.get('AWS_S3_ENDPOINT')
    region = os.environ.get('AWS_DEFAULT_REGION')
    #Define client session
    session = boto3.session.Session(aws_access_key_id=key_id,
                                    aws_secret_access_key=secret_key)
    #Define client connection
    s3_client = boto3.client('s3',
                             aws_access_key_id=key_id,
                             aws_secret_access_key=secret_key,
                             aws_session_token=None,
                             config=Config(signature_version='s3v4'),
                             endpoint_url=endpoint,
                             region_name=region)
The instructions in the code cell configure an S3 client and establish a session to your S3-compatible object store.
- To use the S3 client to connect to your object store and list the available buckets, locate the following instructions to list buckets, and run the code cell:
    s3_client.list_buckets()
  A successful response includes an HTTPStatusCode of 200 and a list of buckets, similar to the following output:
    'HTTPStatusCode': 200,
    'Buckets': [{'Name': 'aqs086-image-registry',
    'CreationDate': datetime.datetime(2024, 1, 16, 20, 21, 36, 244000, tzinfo=tzlocal())}]
  For a programmatic check of this status code, see the sketch after this procedure.
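Rather than reading the status code from the raw response text, you can check it programmatically. The following is a minimal sketch, assuming the s3_client object created in the previous step; the printed messages are illustrative only:

    #Check the HTTP status code returned by list_buckets()
    response = s3_client.list_buckets()
    status = response['ResponseMetadata']['HTTPStatusCode']
    if status == 200:
        print('Successfully connected to the object store')
    else:
        print(f'Unexpected HTTP status code: {status}')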
Listing available buckets in your object store
To list the buckets that are available in your object store, use the list_buckets() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured the S3 client.
- In the notebook, locate the following instructions that list the available buckets, and then run the code cell:
    #List available buckets
    s3_client.list_buckets()
  A successful response includes an HTTP status code of 200 and a list of buckets, similar to the following output:
    'HTTPStatusCode': 200,
    'Buckets': [{'Name': 'aqs086-image-registry',
    'CreationDate': datetime.datetime(2024, 1, 16, 20, 21, 36, 244000, tzinfo=tzlocal())},
- Locate the instructions that print only the names of the available buckets, and run the code cell:
    #Print only names of available buckets
    for bucket in s3_client.list_buckets()['Buckets']:
        print(bucket['Name'])
  The output displays the names of the buckets, similar to the following example:
    aqs086-image-registry
    aqs087-image-registry
    aqs135-image-registry
    aqs246-image-registry
  To also display each bucket's creation date, see the sketch after this procedure.
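The entries in the Buckets list also include a CreationDate field, so you can print when each bucket was created alongside its name. A minimal sketch, assuming the same s3_client:

    #Print each bucket name with its creation date
    for bucket in s3_client.list_buckets()['Buckets']:
        print(f"{bucket['Name']} (created {bucket['CreationDate']})")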
Creating a bucket in your object store
To create a bucket in your object store from your workbench, use the create_bucket() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- In the notebook, locate the following instructions to create a bucket:
    #Create bucket
    s3_client.create_bucket(Bucket='<bucket_name>')
- Replace <bucket_name> with the name of the bucket that you want to create, as shown in the example, and then run the code cell:
    #Create bucket
    s3_client.create_bucket(Bucket='aqs43-image-registry')
  The output displays an HTTP response status code of 200, indicating a successful request.
- Locate the instructions to list buckets, and run the code cell:
    for bucket in s3_client.list_buckets()['Buckets']:
        print(bucket['Name'])
  The bucket that you created appears in the output. If your object store requires a location constraint, see the sketch after this procedure.
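Some object stores reject bucket creation outside their default region unless you pass an explicit location constraint. The following sketch is a hedged example rather than a required step: whether your store needs CreateBucketConfiguration depends on its configuration, and the bucket name reuses the example above.

    #Create a bucket with an explicit location constraint (only if your object store requires one)
    region = os.environ.get('AWS_DEFAULT_REGION')
    s3_client.create_bucket(
        Bucket='aqs43-image-registry',
        CreateBucketConfiguration={'LocationConstraint': region}
    )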
Listing files in your bucket
To list the files in a specific bucket, use the list_objects_v2() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- In the notebook, locate the following code for listing files:
    #List files
    #Replace <bucket_name> with the name of the bucket.
    bucket_name = '<bucket_name>'
    s3_client.list_objects_v2(Bucket=bucket_name)
- Replace <bucket_name> with the name of your own bucket, as shown in the example, and then run the code cell:
    #List files
    bucket_name = 'aqs27-registry'
    s3_client.list_objects_v2(Bucket=bucket_name)
  The output displays information about the files that are available in the specified bucket.
- Locate the code cell that lists only the names of the files:
    #Print only names of files
    bucket_name = '<bucket_name>'
    for key in s3_client.list_objects_v2(Bucket=bucket_name)['Contents']:
        print(key['Key'])
- Replace <bucket_name> with the name of your bucket, as shown in the example, and run the code cell:
    #Print only names of files
    bucket_name = 'aqs27-registry'
    for key in s3_client.list_objects_v2(Bucket=bucket_name)['Contents']:
        print(key['Key'])
  The output displays a list of the file names that are available in the specified bucket.
- Refine the previous query to specify a file path by locating the following code cell:
    bucket_name = '<bucket_name>'
    for key in s3_client.list_objects_v2(Bucket=bucket_name, Prefix='<start_of_file_path>')['Contents']:
        print(key['Key'])
- Replace <bucket_name> and <start_of_file_path> with your own values, and run the code cell. If a bucket contains more than 1,000 files, see the pagination sketch after this procedure.
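A single list_objects_v2() call returns at most 1,000 keys. For larger buckets, you can page through the results with a Boto3 paginator. This is a minimal sketch, assuming the s3_client configured earlier; replace the placeholder bucket name and prefix with your own values:

    #List every file in a large bucket by paginating through the results
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket='<bucket_name>', Prefix='<start_of_file_path>'):
        for obj in page.get('Contents', []):
            print(obj['Key'])

Using page.get('Contents', []) avoids a KeyError when a page, or the whole bucket, contains no matching objects.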
Downloading files from your bucket
To download a file from your bucket to your workbench, use the download_file() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- In the notebook, locate the following instructions to download files from a bucket:
    #Download file from bucket
    #Replace the following values with your own:
    #<bucket_name>: The name of the bucket.
    #<object_name>: The name of the file to download. Must include the full path to the file on the bucket.
    #<file_name>: The name of the file when downloaded.
    s3_client.download_file('<bucket_name>', '<object_name>', '<file_name>')
- Modify the code sample:
  - Replace <bucket_name> with the name of the bucket that the file is located in.
  - Replace <object_name> with the name of the file that you want to download.
  - Replace <file_name> with the name and path that you want the file to be downloaded to, as shown in the example:
    s3_client.download_file('aqs086-image-registry', 'series35-image36-086.csv', '/tmp/series35-image36-086.csv_old')
- Run the code cell.
  The file that you downloaded appears in the path that you specified on your workbench. To handle a missing object key gracefully, see the sketch after this procedure.
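If the object key does not exist, download_file() raises a botocore ClientError. The following is a minimal error-handling sketch, assuming the example bucket and file names used above; the error-code check reflects the usual 404 response, but your object store may report missing keys differently:

    #Download a file and handle a missing object key
    from botocore.exceptions import ClientError

    try:
        s3_client.download_file('aqs086-image-registry',
                                'series35-image36-086.csv',
                                '/tmp/series35-image36-086.csv_old')
    except ClientError as err:
        if err.response['Error']['Code'] == '404':
            print('The requested object does not exist in the bucket')
        else:
            raise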
Uploading files to your bucket
To upload files to your bucket from your workbench, use the upload_file() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- You have imported the files that you want to upload to your object store to your workbench.
- In the notebook, locate the instructions to upload files to a bucket:
    #Upload file to bucket
    #Replace <file_name>, <bucket_name>, and <object_name> with your values.
    #<file_name>: Name of the file to upload. This must include the full local path to the file in your workbench.
    #<bucket_name>: The name of the bucket to upload the file to.
    #<object_name>: The full key to use to save the file to the bucket.
    s3_client.upload_file('<file_name>', '<bucket_name>', '<object_name>')
- Replace <file_name>, <bucket_name>, and <object_name> with your own values, as shown in the example, and then run the code cell:
    s3_client.upload_file('image-973-series123.csv', 'aqs973-image-registry', '/tmp/image-973-series124.csv')
- Locate the following instructions to list the files in a bucket:
    #Upload Verification
    for key in s3_client.list_objects_v2(Bucket='<bucket_name>')['Contents']:
        print(key['Key'])
- Replace <bucket_name> with the name of your bucket, as shown in the example, and then run the code cell:
    #Upload Verification
    for key in s3_client.list_objects_v2(Bucket='aqs973-image-registry')['Contents']:
        print(key['Key'])
  The file that you uploaded is displayed in the output. To set metadata, such as a content type, during an upload, see the sketch after this procedure.
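upload_file() accepts an optional ExtraArgs dictionary for object metadata. The following sketch sets a content type on the example upload; the metadata values are illustrative assumptions, not required by the notebook:

    #Upload a file and set its content type
    s3_client.upload_file('image-973-series123.csv',
                          'aqs973-image-registry',
                          '/tmp/image-973-series124.csv',
                          ExtraArgs={'ContentType': 'text/csv'})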
Copying files between buckets
To copy files between buckets in your object store from your workbench, use the copy() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- You know the key of the source file that you want to copy, and the bucket that the file is stored in.
- In the notebook, locate the following instructions to copy files between buckets:
    #Copying files between buckets
    #Replace the placeholder values with your own.
    copy_source = {
        'Bucket': '<bucket_name>',
        'Key': '<key>'
    }
    s3_client.copy(copy_source, '<destination_bucket>', '<destination_key>')
- Within the copy_source block, replace <bucket_name> with the name of the source bucket and <key> with the key of the source file, as shown in the example:
    copy_source = {
        'Bucket': 'aqs086-image-registry',
        'Key': 'series43-image12-086.csv'
    }
- Replace <destination_bucket> with the name of the bucket to copy to, and <destination_key> with the name of the key to copy to, as shown in the example, and then run the code cell:
    s3_client.copy(copy_source, 'aqs971-image-registry', '/tmp/series43-image12-086.csv')
- Locate the following instructions to list objects in a bucket:
    #Copy Verification
    bucket_name = '<bucket_name>'
    for key in s3_client.list_objects_v2(Bucket=bucket_name)['Contents']:
        print(key['Key'])
- Replace <bucket_name> with the name of the destination bucket, as shown in the example, and run the code cell:
    #Copy Verification
    bucket_name = 'aqs971-image-registry'
    for key in s3_client.list_objects_v2(Bucket=bucket_name)['Contents']:
        print(key['Key'])
  The file that you copied is displayed in the output. For a lighter-weight check with head_object(), see the sketch after this procedure.
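Instead of listing every object in the destination bucket, you can confirm that a single copied key exists with the head_object() method, which returns the object's metadata without downloading it. A minimal sketch, assuming the destination values from the example above:

    #Confirm that the copied object exists in the destination bucket
    response = s3_client.head_object(Bucket='aqs971-image-registry',
                                     Key='/tmp/series43-image12-086.csv')
    print(response['ContentLength'], response['LastModified'])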
Deleting files from your bucket
To delete files from a bucket in your object store from your workbench, use the delete_object() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- You know the key of the file that you want to delete and the bucket that the file is located in.
- In the notebook, locate the following instructions to delete files from a bucket:
    #Delete object from bucket
    s3_client.delete_object(Bucket='<bucket_name>', Key='<object_key>')
- Replace <bucket_name> with the name of your bucket and <object_key> with the key of the file that you want to delete, as shown in the example, and then run the code cell:
    #Delete object from bucket
    s3_client.delete_object(Bucket='aqs971-image-registry', Key='/tmp/series43-image12-086.csv')
  The output displays an HTTP response status code of 204, which indicates that the request was successful.
- Locate the following instructions to list files in a bucket:
    #Delete Object Verification
    bucket_name = '<bucket_name>'
    for key in s3_client.list_objects_v2(Bucket=bucket_name)['Contents']:
        print(key['Key'])
- Replace <bucket_name> with the name of your bucket, as shown in the example, and run the code cell:
    #Delete Object Verification
    bucket_name = 'aqs971-image-registry'
    for key in s3_client.list_objects_v2(Bucket=bucket_name)['Contents']:
        print(key['Key'])
  The deleted file does not appear in the output. To delete several files in a single request, see the sketch after this procedure.
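To remove several files at once, you can use the delete_objects() method, which accepts up to 1,000 keys per request. This is a minimal sketch; the bucket name and keys are placeholders for your own values:

    #Delete multiple objects from a bucket in a single request
    s3_client.delete_objects(
        Bucket='<bucket_name>',
        Delete={'Objects': [{'Key': '<object_key_1>'},
                            {'Key': '<object_key_2>'}]}
    )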
Deleting a bucket from your object store
To delete a bucket from your object store from your workbench, use the delete_bucket() method.
- You have cloned the odh-doc-examples repository to your workbench.
- You have opened the s3client_examples.ipynb file in your workbench.
- You have installed Boto3 and configured an S3 client.
- You have ensured that the bucket that you want to delete is empty.
- In the notebook, locate the following instructions to delete a bucket:
    #Delete bucket
    s3_client.delete_bucket(Bucket='<bucket_name>')
- Replace <bucket_name> with the name of the bucket that you want to delete, as shown in the example, and run the code cell:
    #Delete bucket
    s3_client.delete_bucket(Bucket='aqs971-image-registry')
  The output displays an HTTP response status code of 204, which indicates that the request was successful.
- Locate the instructions to list buckets, and run the code cell:
    for bucket in s3_client.list_buckets()['Buckets']:
        print(bucket['Name'])
  The bucket that you deleted does not appear in the output. To empty a bucket before deleting it, see the sketch that follows.
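Because delete_bucket() fails while the bucket still contains objects, you may want to empty the bucket programmatically first. The following sketch combines the paginator and delete_objects() calls shown earlier; the bucket name is a placeholder, and the deletion is irreversible, so review it carefully before running it:

    #Empty a bucket and then delete it
    bucket_name = '<bucket_name>'
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        contents = page.get('Contents', [])
        if contents:
            s3_client.delete_objects(
                Bucket=bucket_name,
                Delete={'Objects': [{'Key': obj['Key']} for obj in contents]}
            )
    s3_client.delete_bucket(Bucket=bucket_name)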