Discovering NLU in Cloud Pak for Data

H Singh
4 min read · Jun 15, 2021

I have been working with my client on an AI project for two years. Due to compliance and regulatory requirements, the client decided to move from the IBM Cloud Watson APIs to IBM Cloud Pak for Data (CP4D) on-premises. We mapped out the APIs during our initial discussions and confirmed that CP4D supports both Watson Knowledge Studio (WKS) and Natural Language Understanding (NLU). The initial investigation looked good: both cloud API products are supported in CP4D. WKS maps directly to the same product, and NLU support is provided through the CP4D Discovery API.

I started the migration journey to CP4D with my client. Deploying OpenShift and the Cloud Paks took roughly two weeks using Ansible automation scripts (if you need the scripts, see the fctoibm GitHub repositories). After deploying and setting up the system, the team started to migrate from the cloud deployment to the on-prem system. Two issues were discovered during the migration. First, a complete export of WKS data from a cloud workspace to an on-prem workspace is not supported; only partial data can be exported, such as the model's entity/relationship schema. This would be a prime candidate for a product enhancement: a plugin that copies the data from one workspace to another. I could not fully work around the limitation because of the restricted data and database access on the shared WKS cloud service. Documentation was available for exporting some items, such as the AI models, the entity/relationship schema, and the training documents, so our team ended up doing a manual migration of the WKS workspace.

The second issue surfaced soon after we exported the AI models to WKS on CP4D, when the client wanted to test the NLU feature provided by Discovery. Two things jumped out right away: both the request and the response of the CP4D NLU feature differ from how the Cloud NLU API works. Any customer who has integrated with the Cloud NLU API will have to write some integration code when migrating to Discovery NLU. I will get into the details soon, but to help the client, earn their confidence, and keep the product as a viable solution, I jumped into action and wrote integration code to solve the issue. As I have learned from previous mistakes, do not try to boil the ocean; solve the problem in small components and integrate more features incrementally. My initial thought was to create an integration API similar to the Cloud NLU service by wrapping the Discovery NLU API and deploying it on OCP. However, I took the path of least resistance and created a standalone Java class to test the waters first.

Let’s get to the heart of the problem. Instead of describing the differences in a long and winding paragraph, I created the table below for easier comparison.
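
To make the gap concrete, here is a hedged sketch of the two request shapes. The first call follows the documented Cloud NLU /v1/analyze pattern with a custom WKS model; the second assumes the Discovery v2 analyze endpoint that installed (CP4D) deployments expose, addressed to a project and collection rather than to a model. Treat the exact parameters and version dates as illustrative; {apikey}, {nlu-url}, {wks_model_id}, and sample.json are placeholders I introduced for this sketch.

# IBM Cloud NLU: stateless call, API-key auth, JSON body carrying the text and
# the custom WKS model requested per feature.
curl -u "apikey:{apikey}" \
  --request POST '{nlu-url}/v1/analyze?version=2020-08-01' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "IBM acquired Red Hat in 2019.",
    "features": {
      "entities":  { "model": "{wks_model_id}" },
      "relations": { "model": "{wks_model_id}" }
    }
  }'

# CP4D Discovery v2: bearer-token auth, and the document is analyzed against a
# project/collection that carries the enrichments, so the request is multipart
# and addressed to the collection rather than to a model.
curl --request POST \
  '{discovery-instance-URL}/api/v2/projects/{project_id}/collections/{collection_id}/analyze?version=2019-11-29' \
  --header 'Authorization: Bearer {bearer token}' \
  --form 'file=@sample.json;type=application/json'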

To address these differences, you can download the Java project I created from https://github.com/fctoibm/discovery_nlu.git, which contains all the necessary JAR files and classes. Setting some of the variables needed to run the Java class requires an understanding of how the Discovery API is used.
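
If you want to try the project, the rough flow is to clone the repository and run the class with the bundled JARs on the classpath. The class and directory names below are hypothetical placeholders; check the repository for the actual ones.

git clone https://github.com/fctoibm/discovery_nlu.git
cd discovery_nlu
# DiscoveryNLUClient and the lib/ layout are placeholders; substitute the real
# main class and JAR directory from the repository.
javac -cp "lib/*" DiscoveryNLUClient.java
java -cp "lib/*:." DiscoveryNLUClient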

Steps to implement

1. One or more machine learning WKS models deployed in CP4D.

2. A provisioned Discovery instance in CP4D.

3. Log in to the CP4D instance with your username/password and obtain the {bearer token} for the provisioned Discovery service instance (one command-line option is sketched below). The instances page is at https://{cpd_cluster_host}{:port}/zen/#/myInstances
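
One command-line way to obtain the bearer token for step 3 is the CP4D platform authorization endpoint. This is a hedged sketch that assumes the /icp4d-api/v1/authorize route available on recent CP4D releases; verify the exact path and payload against your cluster's version of the documentation.

# Exchange a CP4D username/password for a bearer token (returned in the "token"
# field of the JSON response); -k skips TLS verification for self-signed clusters.
curl -k --request POST 'https://{cpd_cluster_host}{:port}/icp4d-api/v1/authorize' \
  --header 'Content-Type: application/json' \
  --data '{"username": "{username}", "password": "{password}"}'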

4. Copy the {discovery-instance-URL} along with the instance ID. (Hint: the instance ID is part of the URL.)

5. Get the project_id by using the following curl request:

curl --location --request GET '{discovery-instance-URL}/api/v2/projects?version=2019-11-29' --header 'Authorization: Bearer {bearer token}'
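
Assuming the list-projects response follows the usual Discovery v2 shape (a top-level projects array whose entries carry project_id and name), you can pull the ID out directly with jq; this is a convenience addition, not one of the original steps.

# List project names and IDs; pick the project that backs your migrated WKS enrichments.
curl --location --request GET '{discovery-instance-URL}/api/v2/projects?version=2019-11-29' \
  --header 'Authorization: Bearer {bearer token}' | jq -r '.projects[] | "\(.name)  \(.project_id)"'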

6. Get the collection_id by using the following curl request:

curl --location --request GET '{discovery-instance-URL}/api/v2/projects/{project_id}/collections?version=2019-11-29' --header 'Authorization: Bearer {bearer token}'

7. Update the parameters in the Java class, then run the main Java program. You should see output similar to the following.

I hope you found the above information helpful. My next goal is to wrap the Java program in a Docker container and provide the REST API deployed on OpenShift.
