Databricks
This guide provides detailed instructions for integrating Databricks with DinMo
Databricks is a widely used platform for data engineering, data science, and analytics, built around Apache Spark and SQL. This guide covers setup, required permissions, and troubleshooting to help you establish a secure and efficient connection between Databricks and DinMo, similar to our Redshift integration.
Prerequisites
Before starting the integration, ensure you have:
Access to a Databricks workspace (AWS, Azure, or GCP).
Permissions to create users, schemas, and assign privileges in Databricks.
A DinMo workspace with access to add new data sources.
Use DinMo’s fixed IPs when network security policies in your organization require specific IP allowlisting, such as firewall or VPC configurations. See our Networking section for more guidance and for the IPs DinMo uses to connect to your systems.
Connecting Databricks to DinMo
Step 1: Create Connection Credentials for DinMo
To enable DinMo to connect securely to your Databricks workspace, you'll need to generate a Personal Access Token and gather specific connection details. Follow these steps to set up the necessary credentials:
Access Your Databricks Workspace
Log in to your Databricks Account Console.
Navigate to the Workspaces page.
Select the workspace that DinMo will connect to.
Click Open workspace to enter the selected workspace.
Generate a Personal Access Token
In the workspace, click on your profile icon in the top-right corner and select Settings.
From the left-hand menu, choose Developer (or Access Tokens).
Click on Generate New Token.
Provide a name for the token, such as "DinMo Integration".
For the Lifetime, you can leave it blank to prevent the token from expiring.
Click Generate and securely save the token value; you'll need it later for DinMo's configuration.
Step 2: Create DinMo Schemas in Databricks
Before proceeding with DinMo's configuration, you need to set up specific schemas and grant the necessary permissions in your Databricks workspace. This setup allows DinMo to store technical data and access the tables and views required for its operations.
Permissions Required
DinMo requires two types of permissions:
Data Tables and Views: For the tables and views you want to use in DinMo (which can reside in any catalog), grant the following permissions to the DinMo user:
SELECT
MODIFY
DinMo Technical Schemas: For DinMo's dedicated schemas (dinmo_segments, dinmo_stats, dinmo_delta_storage, and dinmo_predictions), grant ALL PRIVILEGES to the DinMo user.
Creating Schemas and Granting Permissions
Optional – Specify a Catalog: You may choose to include a Databricks catalog, which is the top level in Unity Catalog's three-tier namespace (catalog.schema.table). If you specify a catalog, DinMo's technical schemas will be created within that catalog. If left unspecified, the schemas will be created in the default catalog. You can still access data from other catalogs for model creation by referencing the full path in DinMo's SQL interface.
Steps:
Create DinMo Technical Schemas
Run the following SQL commands in your Databricks workspace to create the necessary schemas:
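As a minimal sketch of what these commands could look like, the Python helper below prints one CREATE SCHEMA IF NOT EXISTS statement per DinMo technical schema, ready to paste into the Databricks SQL editor. It assumes standard Databricks SQL syntax; the function names and the my_catalog value are illustrative and are not part of DinMo or Databricks.

```python
from typing import List, Optional

# The four technical schemas named in this guide.
DINMO_SCHEMAS = [
    "dinmo_segments",
    "dinmo_stats",
    "dinmo_delta_storage",
    "dinmo_predictions",
]


def qualified(schema: str, catalog: Optional[str] = None) -> str:
    """Return `catalog`.`schema`, or just `schema` when no catalog is given."""
    parts = [catalog, schema] if catalog else [schema]
    return ".".join(f"`{part}`" for part in parts)


def create_schema_statements(catalog: Optional[str] = None) -> List[str]:
    """Build one CREATE SCHEMA statement per DinMo technical schema."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {qualified(schema, catalog)};"
        for schema in DINMO_SCHEMAS
    ]


if __name__ == "__main__":
    # "my_catalog" is a placeholder; pass None to target the default catalog.
    for statement in create_schema_statements("my_catalog"):
        print(statement)
```

Calling create_schema_statements() with no argument omits the catalog prefix, which corresponds to creating the schemas in the default catalog.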
Note: Replace [catalog_name] with your actual catalog name if you're using a specific catalog. Omit [catalog_name] if using the default catalog.
Grant Permissions to DinMo User
Grant the necessary privileges to the DinMo user by executing the following commands:
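A hedged sketch of the grants described under Permissions Required, again as a small Python helper that only prints SQL. It assumes Unity Catalog GRANT syntax with the user identified by email; the email address and the table name main.sales.customers are placeholders, and grant_statements is a hypothetical helper name.

```python
from typing import List, Optional

# The four technical schemas named in this guide.
DINMO_SCHEMAS = [
    "dinmo_segments",
    "dinmo_stats",
    "dinmo_delta_storage",
    "dinmo_predictions",
]


def grant_statements(
    user_email: str,
    catalog: Optional[str] = None,
    data_tables: Optional[List[str]] = None,
) -> List[str]:
    """Build GRANT statements for the DinMo user.

    data_tables holds fully qualified table or view names that DinMo
    should be able to read and modify.
    """
    principal = f"`{user_email}`"  # backticks because of '@' and '.'
    prefix = f"`{catalog}`." if catalog else ""
    statements = [
        f"GRANT ALL PRIVILEGES ON SCHEMA {prefix}`{schema}` TO {principal};"
        for schema in DINMO_SCHEMAS
    ]
    for table in data_tables or []:
        statements.append(
            f"GRANT SELECT, MODIFY ON TABLE {table} TO {principal};"
        )
    return statements


if __name__ == "__main__":
    # All three arguments below are placeholders.
    for statement in grant_statements(
        "dinmo-user@example.com", "my_catalog", ["main.sales.customers"]
    ):
        print(statement)
```

In Unity Catalog, reading a table generally also requires USE CATALOG and USE SCHEMA on its parent catalog and schema; those grants are not generated here and may need to be added for your data tables.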
Why Create Dedicated Schemas?
DinMo uses dedicated schemas to manage its technical data, including segment queries, activation snapshots, and statistical information. Organizing this data in separate schemas enhances:
Organization: Keeps technical data isolated from your business data.
Security: Allows for precise access control and auditing.
Performance Management: Improves query performance by optimizing how data is stored and accessed.
Step 3: Add Databricks as a Source in DinMo
Navigate to Workspace Settings:
In your DinMo workspace, go to Workspace Settings.
Add New Source:
Click on the Sources tab.
Select Add New Source.
Choose Databricks:
From the list of available sources, select Databricks.
Collect Connection Details within your Databricks account
Within the workspace, go to the Compute section from the left-hand menu.
Decide whether DinMo will connect to a SQL Warehouse or an All-Purpose Cluster:
For SQL Warehouses:
Select the desired SQL Warehouse.
Navigate to the Connection Details tab.
For All-Purpose Clusters:
Select the cluster you intend to use.
Go to the Configuration tab.
Scroll down and expand Advanced Options.
Click on the JDBC/ODBC tab.
Gather the following details:
Server Hostname: The hostname of your Databricks workspace.
Example: adb-1234567890123456.7.azuredatabricks.net
Found in the workspace URL or under Compute > SQL Warehouses > Connection Details.
HTTP Path: The HTTP Path to your SQL Warehouse or cluster.
Found under Compute > SQL Warehouses or Clusters > Connection Details.
Port: Typically 443 for HTTPS connections.
Access Token: The Personal Access Token created in Step 1.
Schema: The default schema that DinMo will use when connecting to your Databricks workspace.
(Optional) Catalog: If DinMo technical schemas in Step 2 were created in a specific catalog, this field should match that catalog. If left empty, it defaults to the workspace’s default catalog.
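Before pasting these values into DinMo, it can help to sanity-check them. The sketch below strips an accidental https:// prefix from the hostname and guesses whether an HTTP Path points to a SQL Warehouse or an All-Purpose Cluster. The path prefixes it looks for (sql/1.0/warehouses/ and sql/protocolv1/) are the forms commonly shown in the Connection Details and JDBC/ODBC tabs and are treated here as an assumption; the function names are illustrative.

```python
def normalize_hostname(value: str) -> str:
    """Reduce a pasted workspace URL to the bare hostname."""
    host = value.strip()
    for scheme in ("https://", "http://"):
        if host.lower().startswith(scheme):
            host = host[len(scheme):]
    # Drop any trailing slash or path component.
    return host.split("/", 1)[0]


def compute_kind(http_path: str) -> str:
    """Guess the compute type from the HTTP Path (assumed prefixes)."""
    path = http_path.strip().lstrip("/")
    if path.startswith("sql/1.0/warehouses/"):
        return "sql_warehouse"
    if path.startswith("sql/protocolv1/"):
        return "all_purpose_cluster"
    return "unknown"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(normalize_hostname("https://adb-1234567890123456.7.azuredatabricks.net/"))
    print(compute_kind("/sql/1.0/warehouses/abc123"))
```

A result of "unknown" does not necessarily mean the path is wrong; it only means it does not match the two prefixes assumed above.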
Finish Databricks configuration within DinMo
Fill in the connection form with the collected credentials.
Test and Save:
Click Save & Test to verify the connection.
If successful, the Databricks source is now connected to DinMo.
Before allowing you to create a new source, DinMo checks:
Network connectivity
Databricks credentials
Permission to list schemas and tables (at least in the initial schema)
Permission to write to DinMo's dedicated schemas: dinmo_segments, dinmo_stats, dinmo_delta_storage, and dinmo_predictions
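You can approximate part of these checks yourself before clicking Save & Test. The sketch below is not DinMo's own validation; it uses the open-source databricks-sql-connector package (an assumption about your tooling) to connect with the same credentials, list schemas, and report which technical schemas are missing. It verifies connectivity, credentials, and schema visibility only, not write permission.

```python
from typing import Iterable, List, Optional

# The four technical schemas named in this guide.
DINMO_SCHEMAS = (
    "dinmo_segments",
    "dinmo_stats",
    "dinmo_delta_storage",
    "dinmo_predictions",
)


def missing_schemas(found: Iterable[str]) -> List[str]:
    """Return the DinMo schemas absent from a list of schema names."""
    present = {name.lower() for name in found}
    return [schema for schema in DINMO_SCHEMAS if schema not in present]


def list_schemas(
    server_hostname: str,
    http_path: str,
    access_token: str,
    catalog: Optional[str] = None,
) -> List[str]:
    """List schemas visible to the token (needs network access)."""
    # Imported lazily: pip install databricks-sql-connector
    from databricks import sql

    query = f"SHOW SCHEMAS IN `{catalog}`" if catalog else "SHOW SCHEMAS"
    with sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token,
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            # The first column of each row holds the schema name.
            return [row[0] for row in cursor.fetchall()]
```

A typical use is missing_schemas(list_schemas(hostname, http_path, token, catalog)); an empty list means all four schemas are visible to the token in that catalog.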
Additional Recommendations
Dedicated User Account: Consider creating a specific user account for DinMo. This practice enhances security by allowing better auditing and access control.
Permissions: Ensure that the account or token used has the necessary permissions to access the compute resources and data required by DinMo.
Security Note: Always store your access tokens and credentials securely. Do not share them publicly or commit them to version control systems.
Once you complete these steps, DinMo can connect to your Databricks workspace securely and run its data operations.
Troubleshooting Common Issues
When integrating DinMo with Databricks, you might encounter some common issues. Below are some of the main problems and their solutions.
Mismatch of Catalog Specification
Problem: The catalog specified in DinMo's configuration doesn't match the catalog used when creating the technical schemas in Step 2.
Solution: Ensure that the catalog you enter during DinMo's configuration is the same one where you created the DinMo technical schemas in Step 2. If you left the catalog unspecified (i.e., you used the default catalog) when creating the schemas, then leave the catalog field blank in DinMo's configuration. This alignment is crucial for DinMo to locate and access the necessary schemas.
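If you are unsure which catalog holds the technical schemas, one way to find out is to query Unity Catalog's information schema and group the results. The sketch below assumes a Unity Catalog workspace where system.information_schema.schemata is available; the query only returns schemas the current user is allowed to see, and catalogs_with_all_schemas is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# The four technical schemas named in this guide.
DINMO_SCHEMAS = frozenset(
    {"dinmo_segments", "dinmo_stats", "dinmo_delta_storage", "dinmo_predictions"}
)

# Run this in the Databricks SQL editor and feed the rows to the helper.
LOOKUP_SQL = (
    "SELECT catalog_name, schema_name "
    "FROM system.information_schema.schemata "
    "WHERE schema_name IN ('dinmo_segments', 'dinmo_stats', "
    "'dinmo_delta_storage', 'dinmo_predictions')"
)


def catalogs_with_all_schemas(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Return catalogs that contain all four DinMo technical schemas."""
    by_catalog: Dict[str, Set[str]] = defaultdict(set)
    for catalog, schema in rows:
        by_catalog[catalog].add(schema)
    return sorted(
        catalog
        for catalog, schemas in by_catalog.items()
        if DINMO_SCHEMAS <= schemas
    )


if __name__ == "__main__":
    print(LOOKUP_SQL)
```

The catalog returned by the helper is the value to enter in DinMo's Catalog field. If no catalog is returned, the schemas are incomplete or not visible to the user running the query.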
Insufficient Permissions for DinMo User
Problem: DinMo cannot access required tables or schemas due to inadequate permissions granted to the DinMo user.
Solution: Verify that the DinMo user has the necessary permissions:
For Data Tables and Views: Ensure SELECT permissions are granted on all tables and views you want to use in DinMo.
For DinMo Technical Schemas: Ensure ALL PRIVILEGES are granted on the schemas created in Step 2. Also ensure that the email in the SQL script is the one linked to the Databricks account that created the Personal Access Token.
Action: Re-run the permission-granting SQL commands if necessary, replacing placeholders with your actual catalog name and DinMo user email.
Invalid or Expired Personal Access Token
Problem: DinMo fails to connect to Databricks due to an invalid or expired Personal Access Token.
Solution:
Generate a New Token:
Go to your Databricks workspace and click on your profile icon in the top-right corner.
Select Settings and navigate to the Access Tokens section under Developer (or directly under User Settings).
Click on Generate New Token.
Provide a meaningful name (e.g., "DinMo Integration") and, if possible, set the token to never expire by leaving the lifetime field blank.
Click Generate and securely copy the new token.
Update DinMo Configuration:
Go to DinMo's configuration settings for the Databricks connection.
Replace the old token with the new one you just generated.
Save the configuration and attempt to reconnect.
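To catch expiry before it breaks the connection, you can periodically list your tokens and flag the ones close to expiring. The sketch below assumes the Databricks REST endpoint /api/2.0/token/list, whose response contains token_infos entries with a comment and an expiry_time in epoch milliseconds (-1 when the token does not expire). It must be called with a token that is still valid, and the helper names are illustrative.

```python
import json
import time
import urllib.request
from typing import Dict, List, Optional


def fetch_token_infos(server_hostname: str, access_token: str) -> List[Dict]:
    """Fetch token metadata for the calling user (needs network access)."""
    request = urllib.request.Request(
        f"https://{server_hostname}/api/2.0/token/list",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("token_infos", [])


def expiring_tokens(
    token_infos: List[Dict],
    within_days: int = 30,
    now_ms: Optional[int] = None,
) -> List[str]:
    """Return comments of tokens expired or expiring within the window."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff = now_ms + within_days * 86_400_000  # milliseconds per day
    return [
        info.get("comment", "")
        for info in token_infos
        # A negative expiry_time means the token has no expiry.
        if 0 <= info.get("expiry_time", -1) <= cutoff
    ]
```

If "DinMo Integration" (or whatever name you gave the token) appears in the result, generate a replacement and update DinMo's configuration as described above.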
If you encounter an error or have a question not covered above and need assistance, don't hesitate to reach out to your account manager. We're here to help.