Databricks

This guide provides detailed instructions for integrating Databricks with DinMo

Databricks is a widely used platform for data engineering, data science, and analytics, built around Apache Spark and SQL. This guide covers setup, required permissions, and troubleshooting, so that you can establish a secure and efficient connection between Databricks and DinMo, similar to our Redshift integration.

Prerequisites

Before starting the integration, ensure you have:

  • Access to a Databricks workspace (AWS, Azure, or GCP).

  • Permissions to create users, schemas, and assign privileges in Databricks.

  • A DinMo workspace with access to add new data sources.

Use DinMo’s fixed IPs when network security policies in your organization require IP allowlisting, for example in firewall or VPC configurations. See our Networking section for guidance and for the list of IPs DinMo uses to connect to your systems.

Connecting Databricks to DinMo

Step 1: Create Connection Credentials for DinMo

To enable DinMo to connect securely to your Databricks workspace, you'll need to generate a Personal Access Token and gather specific connection details. Follow these steps to set up the necessary credentials:

  1. Access Your Databricks Workspace

    • Log in to your Databricks Account Console.

    • Navigate to the Workspaces page.

    • Select the workspace that DinMo will connect to.

    • Click Open workspace to enter the selected workspace.

  2. Generate a Personal Access Token

    • In the workspace, click on your profile icon in the top-right corner and select Settings.

    • From the left-hand menu, choose Developer (or Access Tokens).

    • Click on Generate New Token.

    • Provide a name for the token, such as "DinMo Integration".

    • Leave the Lifetime field blank if you do not want the token to expire.

    • Click Generate and securely save the token value; you'll need it later for DinMo's configuration.

Step 2: Create DinMo Schemas in Databricks

Before proceeding with DinMo's configuration, you need to set up specific schemas and grant the necessary permissions in your Databricks workspace. This setup allows DinMo to store technical data and access the tables and views required for its operations.

Permissions Required

DinMo requires two types of permissions:

  1. Data Tables and Views: For the tables and views you want to use in DinMo (which can reside in any catalog), grant the following permissions to the DinMo user:

    • SELECT

    • MODIFY

  2. DinMo Technical Schemas: For DinMo's dedicated schemas (dinmo_segments, dinmo_stats, dinmo_delta_storage, and dinmo_predictions), grant ALL PRIVILEGES to the DinMo user.
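As a rough sketch of the first permission type, the statements below grant read and modify access on a single table. The names my_catalog, my_schema, and customers are placeholders for illustration, not objects DinMo expects. The USE CATALOG and USE SCHEMA grants assume your workspace uses Unity Catalog, where a principal needs them on the parent catalog and schema before table-level privileges take effect.

```sql
-- Assumption: Unity Catalog is enabled; my_catalog, my_schema, and customers are placeholder names.

-- Allow the DinMo user to reach objects inside the catalog and schema
GRANT USE CATALOG ON CATALOG my_catalog TO `dinmo_user@yourdomain.com`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `dinmo_user@yourdomain.com`;

-- Grant the two table-level privileges listed above
GRANT SELECT, MODIFY ON TABLE my_catalog.my_schema.customers TO `dinmo_user@yourdomain.com`;
```

Repeat the table-level grant for each table or view you plan to use in DinMo.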

Creating Schemas and Granting Permissions

Optional – Specify a Catalog: You may choose to include a Databricks catalog, which is the top level in Unity Catalog's three-tier namespace (catalog.schema.table). If you specify a catalog, DinMo's technical schemas will be created within that catalog. If left unspecified, the schemas will be created in the default catalog. You can still access data from other catalogs for model creation by referencing the full path in DinMo's SQL interface.
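To illustrate the three-tier namespace, the queries below show one way to check which catalog the current session uses when none is specified, and how to reference a table in another catalog by its full path. The names other_catalog, sales, and orders are hypothetical.

```sql
-- Show the catalog the current session uses when no catalog is specified
SELECT current_catalog();

-- Reference a table outside that catalog by its full catalog.schema.table path
-- (other_catalog, sales, and orders are placeholder names)
SELECT *
FROM other_catalog.sales.orders
LIMIT 10;
```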

Steps:

  1. Create DinMo Technical Schemas

    Run the following SQL commands in your Databricks workspace to create the necessary schemas:

    -- Create schemas for DinMo's technical data
    CREATE SCHEMA IF NOT EXISTS [catalog_name].dinmo_delta_storage;
    CREATE SCHEMA IF NOT EXISTS [catalog_name].dinmo_segments;
    CREATE SCHEMA IF NOT EXISTS [catalog_name].dinmo_stats;
    
    -- If you're using DinMo's predictions module
    CREATE SCHEMA IF NOT EXISTS [catalog_name].dinmo_predictions;

    Note: Replace [catalog_name] with your actual catalog name if you're using a specific catalog. To use the default catalog, omit the [catalog_name]. prefix entirely.

  2. Grant Permissions to DinMo User

    Grant the necessary privileges to the DinMo user by executing the following commands:

    -- Grant full privileges on the technical schemas to the DinMo user
    -- Replace `dinmo_user@yourdomain.com` with the email of the Databricks account that generated the Personal Access Token in Step 1.
    GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_delta_storage TO `dinmo_user@yourdomain.com`;
    GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_segments TO `dinmo_user@yourdomain.com`;
    GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_stats TO `dinmo_user@yourdomain.com`;
    
    -- If using DinMo's predictions module
    GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_predictions TO `dinmo_user@yourdomain.com`;
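After running the commands above, you may want to confirm that the schemas exist and that the grants were applied. The following is a minimal sketch of such a check, assuming the schemas were created in a catalog named my_catalog (a placeholder); drop the catalog qualifier if you used the default catalog.

```sql
-- List the DinMo technical schemas in the catalog (my_catalog is a placeholder)
SHOW SCHEMAS IN my_catalog LIKE 'dinmo*';

-- Show the privileges the DinMo user holds on one of the schemas
SHOW GRANTS `dinmo_user@yourdomain.com` ON SCHEMA my_catalog.dinmo_segments;
```

Run the SHOW GRANTS statement once per technical schema to review each of them.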

Why Create Dedicated Schemas?

DinMo uses dedicated schemas to manage its technical data, including segment queries, activation snapshots, and statistical information. Organizing this data in separate schemas enhances:

  • Organization: Keeps technical data isolated from your business data.

  • Security: Allows for precise access control and auditing.

  • Performance Management: Improves query performance by optimizing how data is stored and accessed.

Step 3: Add Databricks as a Source in DinMo

  1. Navigate to Workspace Settings:

    • In your DinMo workspace, go to Workspace Settings.

  2. Add New Source:

    • Click on the Sources tab.

    • Select Add New Source.

  3. Choose Databricks:

    • From the list of available sources, select Databricks.

  4. Collect Connection Details within your Databricks account

    • Within the workspace, go to the Compute section from the left-hand menu.

    • Decide whether DinMo will connect to a SQL Warehouse or an All-Purpose Cluster:

      • For SQL Warehouses:

        • Select the desired SQL Warehouse.

        • Navigate to the Connection Details tab.

      • For All-Purpose Clusters:

        • Select the cluster you intend to use.

        • Go to the Configuration tab.

        • Scroll down and expand Advanced Options.

        • Click on the JDBC/ODBC tab.

    • Gather the following details:

      • Server Hostname: The hostname of your Databricks workspace, without the https:// prefix.

        • Example: adb-1234567890123456.7.azuredatabricks.net

        • Found in the workspace URL or under Compute > SQL Warehouses > Connection Details.

      • HTTP Path: The HTTP path of your SQL Warehouse or cluster.

        • Found under Compute > SQL Warehouses > Connection Details, or on the cluster's JDBC/ODBC tab.

      • Port: Typically 443 for HTTPS connections.

      • Access Token: The Personal Access Token created in Step 1.

      • Schema: The default schema that DinMo will use when connecting to your Databricks workspace.

      • (Optional) Catalog: If DinMo technical schemas in Step 2 were created in a specific catalog, this field should match that catalog. If left empty, it defaults to the workspace’s default catalog.

  5. Finish Databricks configuration within DinMo

    • Fill in the connection form with the collected credentials.

    • Test and Save:

      • Click Save & Test to verify the connection.

      • If successful, the Databricks source is now connected to DinMo.

Before creating the new source, DinMo checks:

  • Network connectivity

  • Databricks credentials

  • Permission to list schemas and tables (at least in the initial schema)

  • Permission to write to DinMo's dedicated schemas dinmo_segments, dinmo_stats, dinmo_delta_storage and dinmo_predictions
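If Save & Test fails, it can help to reproduce similar checks by hand. The statements below are only a sketch of that idea and not DinMo's actual validation logic. They assume you run them as the DinMo user on the same SQL Warehouse or cluster, and that my_catalog, my_schema, and dinmo_write_check are placeholder names.

```sql
-- Confirm which identity the session is running as
SELECT current_user();

-- Check permission to list schemas and tables
SHOW SCHEMAS IN my_catalog;
SHOW TABLES IN my_catalog.my_schema;

-- Check write access to a technical schema with a throwaway table
-- (dinmo_write_check is a hypothetical name used only for this test)
CREATE TABLE IF NOT EXISTS my_catalog.dinmo_segments.dinmo_write_check (id INT);
INSERT INTO my_catalog.dinmo_segments.dinmo_write_check VALUES (1);
DROP TABLE IF EXISTS my_catalog.dinmo_segments.dinmo_write_check;
```

A failure on any statement points to the matching item in the list above: credentials, listing permissions, or write permissions.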

Additional Recommendations

  • Dedicated User Account: Consider creating a specific user account for DinMo. This practice enhances security by allowing better auditing and access control.

  • Permissions: Ensure that the account or token used has the necessary permissions to access the compute resources and data required by DinMo.

  • Security Note: Always store your access tokens and credentials securely. Do not share them publicly or commit them to version control systems.

By completing these steps, DinMo will be able to connect to your Databricks workspace securely, allowing for seamless integration and data operations.

Troubleshooting Common Issues

When integrating DinMo with Databricks, you might encounter some common issues. Below are some of the main problems and their solutions.

  1. Mismatch of Catalog Specification

    • Problem: The catalog specified in DinMo's configuration doesn't match the catalog used when creating the technical schemas in Step 2.

    • Solution: Ensure that the catalog you enter during DinMo's configuration is the same one where you created the DinMo technical schemas in Step 2. If you left the catalog unspecified (i.e., you used the default catalog) when creating the schemas, then leave the catalog field blank in DinMo's configuration. This alignment is crucial for DinMo to locate and access the necessary schemas.

  2. Insufficient Permissions for DinMo User

    • Problem: DinMo cannot access required tables or schemas due to inadequate permissions granted to the DinMo user.

    • Solution: Verify that the DinMo user has the necessary permissions:

      • For Data Tables and Views: Ensure SELECT permissions are granted on all tables and views you want to use in DinMo.

        GRANT SELECT ON TABLE [catalog_name].schema_name.table_name TO `dinmo_user@yourdomain.com`;

      • For DinMo Technical Schemas: Ensure ALL PRIVILEGES are granted on the schemas created in Step 2. Also ensure that the email in the SQL script belongs to the Databricks account that generated the Personal Access Token.

        GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_delta_storage TO `dinmo_user@yourdomain.com`;
        GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_segments TO `dinmo_user@yourdomain.com`;
        GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_stats TO `dinmo_user@yourdomain.com`;
        
        -- If using DinMo's predictions module
        GRANT ALL PRIVILEGES ON SCHEMA [catalog_name].dinmo_predictions TO `dinmo_user@yourdomain.com`;

      • Action: Re-run the permission-granting SQL commands if necessary, replacing placeholders with your actual catalog name and DinMo user email.

  3. Invalid or Expired Personal Access Token

    • Problem: DinMo fails to connect to Databricks due to an invalid or expired Personal Access Token.

    • Solution:

      • Generate a New Token:

        • Go to your Databricks workspace and click on your profile icon in the top-right corner.

        • Select Settings and navigate to the Access Tokens section under Developer (or directly under User Settings).

        • Click on Generate New Token.

        • Provide a meaningful name (e.g., "DinMo Integration") and, if possible, set the token to never expire by leaving the lifetime field blank.

        • Click Generate and securely copy the new token.

      • Update DinMo Configuration:

        • Go to DinMo's configuration settings for the Databricks connection.

        • Replace the old token with the new one you just generated.

        • Save the configuration and attempt to reconnect.
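For the catalog mismatch and permission issues above, a couple of diagnostic queries can narrow down the cause. This is a minimal sketch, assuming Unity Catalog is enabled so that system.information_schema is available; my_catalog, my_schema, and customers are placeholder names.

```sql
-- Find which catalog(s) contain the DinMo technical schemas
-- (only schemas visible to the current user are returned)
SELECT catalog_name, schema_name
FROM system.information_schema.schemata
WHERE schema_name LIKE 'dinmo%';

-- Review the privileges the DinMo user holds on a specific table
SHOW GRANTS `dinmo_user@yourdomain.com` ON TABLE my_catalog.my_schema.customers;
```

The catalog_name returned by the first query is the value to compare against the Catalog field in DinMo's configuration.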


If you encounter an error or have a question not covered above, don't hesitate to reach out to your account manager. We're here to help.
