Overview

The Identity Resolution module is only available on Business and Enterprise plans. Reach out to your dedicated account manager to enable it in your workspace.

The Identity Resolution module in DinMo enables organizations to consolidate and manage customer identities across multiple data sources. This ensures a unified and accurate view of each customer by resolving identities based on configurable rules and identifiers. By consolidating duplicate records, you enhance data quality, improve analytics, and deliver more personalized user experiences.

Identity Resolution Concepts

Identifiers


Identifiers are key attributes within each model that represent distinguishing features of a customer or entity, playing a central role in accurately matching records across data sources. When configuring a new identity project, users should decide on the identifiers they want to include. DinMo offers a selection of standard identifier types (e.g., email, phone, user ID) or the flexibility to create custom identifiers specific to business needs (e.g., company registration number, Adobe ID).

For each identifier, users must specify the fields in each model that correspond to the selected identifiers. This process, called identifier mapping, informs DinMo how to match profiles across different models, enabling accurate cross-model identity resolution.

Identifier Standardization: To improve matching accuracy, users can select cleaning and preprocessing techniques for each identifier before merging. DinMo offers the following prebuilt cleaning options:

  • Case Insensitive: Converts text to lowercase so that case differences are ignored.

  • Trim: Removes leading and trailing whitespace.

  • Only Numeric: Removes all non-numeric characters.

Additionally, DinMo provides specific standardization techniques for common identifiers like emails, phones, and addresses to further enhance matching consistency.
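As a rough illustration, the three prebuilt cleaning options can be approximated in Python as follows. The function names and the `standardize` helper are hypothetical and not part of DinMo. The sketch only shows the effect each option has on a raw value before matching.

```python
import re

def case_insensitive(value: str) -> str:
    """Lowercase the value so that 'A@B.com' and 'a@b.com' compare equal."""
    return value.lower()

def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()

def only_numeric(value: str) -> str:
    """Drop every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", value)

def standardize(value, steps):
    """Apply the selected cleaning steps in order; nulls pass through unchanged."""
    if value is None:
        return None
    for step in steps:
        value = step(value)
    return value

# Hypothetical raw values from two source models.
email = standardize("  Jane.Doe@Example.COM ", [trim, case_insensitive])
phone = standardize("+33 (0)6 12-34-56-78", [only_numeric])
```

With these steps, differently formatted copies of the same email or phone number reduce to one comparable string.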

Conflict Limit: Users can also set a limit on how many unique values are allowed per identifier within a single profile. If the limit is exceeded, affected rows or profiles are excluded from merging in the golden record table, ensuring data consistency.
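The conflict limit check can be pictured with the minimal sketch below. It assumes a profile is a list of record dictionaries and that null or empty values do not count toward the limit. Both the function name and these assumptions are illustrative and do not describe DinMo's internal implementation.

```python
def exceeds_conflict_limit(records, identifier, limit):
    """Return True when a profile holds more distinct values than allowed."""
    distinct_values = {
        record.get(identifier)
        for record in records
        if record.get(identifier) not in (None, "")
    }
    return len(distinct_values) > limit

# Hypothetical profile made of three matched records.
profile = [
    {"email": "a@example.com", "phone": "111"},
    {"email": "b@example.com", "phone": "111"},
    {"email": "c@example.com", "phone": None},
]

too_many_emails = exceeds_conflict_limit(profile, "email", limit=2)
too_many_phones = exceeds_conflict_limit(profile, "phone", limit=2)
```

Here the profile carries three distinct emails against a limit of two, so it would be a candidate for exclusion from merging, while its single phone value stays within the limit.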

Order Fields

The Order Field is a model field of type datetime or date that determines the sequence in which model records are processed. For an event model, this field typically corresponds to the event’s timestamp, while for a user model, it indicates the last update time for the user record. This is a crucial configuration parameter that significantly impacts the identity resolution (IDR) process in DinMo.

DinMo uses the Order Field to:

  • Detect Profile Updates: Profiles with changes to identifiers (e.g., a new value replacing a null) are flagged for inclusion in the next incremental Identity Resolution run, ensuring the system reflects the most current information.

  • Resolve Conflicts: When an identifier has multiple values across matched profiles, DinMo resolves conflicts by retaining the most recent non-null, non-empty value based on the Order Field.
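The conflict resolution described above can be sketched as picking, for one identifier, the value from the record with the latest Order Field among those that have a usable value. The `latest_value` function and the record layout are assumptions made for illustration only.

```python
from datetime import datetime

def latest_value(records, identifier, order_field="order_field"):
    """Return the most recent non-null, non-empty value of an identifier."""
    candidates = [r for r in records if r.get(identifier) not in (None, "")]
    if not candidates:
        return None
    newest = max(candidates, key=lambda r: r[order_field])
    return newest[identifier]

# Hypothetical matched records for one profile.
matched = [
    {"email": "old@example.com", "phone": "111", "order_field": datetime(2024, 1, 1)},
    {"email": "new@example.com", "phone": "", "order_field": datetime(2024, 3, 1)},
    {"email": None, "phone": None, "order_field": datetime(2024, 6, 1)},
]

resolved_email = latest_value(matched, "email")
resolved_phone = latest_value(matched, "phone")
```

The newest record has no email or phone, so the sketch falls back to the most recent record that does: the March record for email and the January record for phone.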

Rule-Based Matching

DinMo employs a combination of deterministic and fuzzy rule-based techniques to identify duplicate profiles or entities across your source records, based on configurable identifier matching rules. When two or more records meet the criteria of a defined matching rule, they are linked with a shared dinmo_id and will be merged into a single profile if additional merging criteria are satisfied.

Rule Sets

Rule Sets are collections of matching rules that define how records are evaluated to determine whether they represent the same entity. Each rule set can be configured to handle specific matching scenarios, such as distinguishing between individual and corporate profiles or assigning high- and low-priority rules to optimize matching accuracy.

To refine matching precision, you can apply filters to limit a rule set to a subset of source records only, such as records from a particular country, profile category, or other relevant segments.

Rules

A Rule is a defined combination of identifiers that must have the same standardized values for two profiles to be considered a match. Rules are only applied if all required identifier values are non-null in both records being compared; otherwise, the rule will not consider the records for matching. This ensures accurate matching by preventing incomplete data from affecting results.
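A minimal sketch of how such a rule could be evaluated is shown below, assuming a rule is simply a list of identifier names and records are dictionaries of already standardized values. The `rule_matches` helper is hypothetical. It only illustrates the "all identifiers non-null and equal" condition.

```python
def rule_matches(record_a, record_b, rule):
    """Return True only if every identifier in the rule is present and equal."""
    for identifier in rule:
        value_a = record_a.get(identifier)
        value_b = record_b.get(identifier)
        # A null on either side means the rule does not apply to this pair.
        if value_a is None or value_b is None:
            return False
        if value_a != value_b:
            return False
    return True

# Hypothetical rule: same last name AND same phone number.
rule = ["last_name", "phone"]

crm = {"last_name": "doe", "phone": "330612345678", "email": "jane@example.com"}
shop = {"last_name": "doe", "phone": "330612345678", "email": None}
events = {"last_name": "doe", "phone": None, "email": "jane@example.com"}

crm_shop = rule_matches(crm, shop, rule)
crm_events = rule_matches(crm, events, rule)
```

The second comparison fails because the phone is missing on one side, even though the last names agree.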

Merging Rules

DinMo merges duplicate profiles into the golden record output table when multiple profiles match at least one of the defined rules in the identity project, provided that no conflict limit rule prevents the merge.

DinMo uses the Order Field to determine which non-null values should be retained, prioritizing the most recent values based on each profile’s ranking in the Order Field.

Note: The dinmo_id for a profile may change if its identifiers are updated. If the updated identifiers satisfy at least one matching rule and align with another existing profile, the profiles will be merged, and the dinmo_id will be adjusted accordingly.
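One common way to turn pairwise matches into shared identifiers is a union-find (connected components) pass, sketched below. This is an illustrative technique, not DinMo's documented algorithm. It assumes matches are treated transitively, ignores conflict limits, and uses the smallest record ID in each group as a stand-in for the dinmo_id.

```python
def assign_group_ids(record_ids, match_pairs):
    """Group records linked by match pairs; return {record_id: group_id}."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Follow parent links to the group root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the root so group IDs are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {rid: find(rid) for rid in record_ids}

# Records 1 and 2 match on one rule, 2 and 3 on another; 4 matches nothing.
groups = assign_group_ids([1, 2, 3, 4], [(1, 2), (2, 3)])
```

In this sketch, a record whose updated identifiers create a new link to another group would end up under that group's ID, which mirrors how a profile's dinmo_id can change after a merge.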

Identity Resolution Process

The DinMo Identity Resolution engine processes records from multiple models to create unified customer profiles. It follows a structured, rule-based approach to identify and merge duplicate profiles across sources, producing several key output tables that facilitate both operational use and data quality audits.

Process Overview

  1. Union of Source Models

    • Multiple source models are imported, each containing customer or entity records with identifiers.

    • Source data is standardized in a central Lookup Table, where all records are aligned to common identifiers and preprocessed (e.g., trimming, case normalization) to ensure consistency.

  2. Identity Resolution Matching Engine

    • The core of the identity resolution process occurs in the matching engine, which applies defined rule sets to identify duplicates across data sources.

    • The matching engine produces three main outputs:

      • Match Table: Assigns a unique dinmo_id to each set of matched records, effectively grouping duplicate profiles from different sources under a single identifier.

      • Unresolved Records Table: Stores records that could not be processed due to data issues, such as exceeding the configured identifier limit or conflicting primary keys.

      • Matched IDs Table: Lists all unique identifiers associated with each resolved profile (dinmo_id), capturing every identifier linked to a unified profile.

  3. Golden Record Creation

    • From the previous outputs, DinMo generates a Golden Record Table that contains the deduplicated set of resolved customer profiles. This table consolidates information from all matched records, selecting the most recent and complete values based on the configured Order Field.

    • Each record in the Golden Record Table represents a “golden” profile—a single view of the customer or entity that integrates information from all matching sources.

  4. Identity Report Table Generation

    • DinMo also creates an Identity Report Table that provides metrics and data quality insights on the identity resolution process. This table includes key statistics such as the total records processed, the number of duplicates identified, and the quality of the unified profiles.

    • This output enables ongoing assessment of the matching process’s effectiveness and provides visibility into data quality issues.
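To make the relationship between the Match Table and the Matched IDs Table more concrete, the sketch below unpivots matched records into one row per identifier value. It is a simplified illustration: the `matched_ids` helper is hypothetical, and only the profile ID, identifier name, and value are kept.

```python
def matched_ids(match_rows, identifiers):
    """List each distinct (dinmo_id, identifier_name, value) combination."""
    seen = set()
    for row in match_rows:
        for name in identifiers:
            value = row.get(name)
            if value in (None, ""):
                continue  # nothing to record for a missing identifier
            seen.add((row["dinmo_id"], name, value))
    return sorted(seen)

# Hypothetical Match Table rows: two source records share dinmo_id 1.
match_rows = [
    {"dinmo_id": 1, "email": "jane@example.com", "phone": "111"},
    {"dinmo_id": 1, "email": "j.doe@example.com", "phone": "111"},
    {"dinmo_id": 2, "email": "sam@example.com", "phone": None},
]

rows = matched_ids(match_rows, ["email", "phone"])
```

Profile 1 ends up with two emails and one phone, while profile 2 has a single email.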

Output Tables and Schemas

The identity resolution feature generates several output tables with distinct purposes. All of these tables are stored within the connected source in a specific dataset (or schema) called dinmo_identity.

1. identity_lookup_[project_id]

Description: The lookup table that contains standardized source data ready for matching. This table serves as the foundational dataset for identity resolution.

Schema:

| Column | Type | Description |
| --- | --- | --- |
| pk | STRING | Primary key of the source record |
| order_field | TIMESTAMP | Timestamp used for ordering records |
| [Identifiers] | [original identifier type] | All identifiers used for matching (e.g., first_name, last_name, email) |
| _source_model_name | STRING | Name of the source model |
| _source_model_pk_name | STRING | Name of the primary key field in the source model |
| _source_model_order_field_name | STRING | Name of the order field in the source model |
| _record_id | INTEGER | Unique record ID within DinMo |
| _project_created_at | TIMESTAMP | Creation timestamp of the project |
| _project_updated_at | TIMESTAMP | Last update timestamp of the project |

2. match_pairs_[project_id]

Description: Stores pairs of records that match based on the applied rules, enabling an audit of which records were linked together.

Schema:

| Column | Type | Description |
| --- | --- | --- |
| record_id_a | INTEGER | Record ID of the first record |
| record_id_b | INTEGER | Record ID of the second record |
| rule_applied | STRING | Rule applied to match the records |

3. identity_match_[project_id]

Description: Contains all matched records with their assigned dinmo_id, representing unified customer identities.

Schema: Inherits all columns from identity_lookup_[project_id], plus:

| Column | Type | Description |
| --- | --- | --- |
| dinmo_id | INTEGER | Unique identifier assigned to the entity |

4. identity_matched_ids_[project_id]

Description: Lists all unique identifiers linked to each dinmo_id, enabling cross-reference and identifier tracking.

Schema:

| Column | Type | Description |
| --- | --- | --- |
| dinmo_id | INTEGER | Unique identifier assigned to the entity |
| source_pk | STRING | Primary key from the source record |
| identifier_name | STRING | Name of the identifier |
| value | STRING | Value of the identifier |
| _source_model_name | STRING | Source model name |
| _first_seen_ts | TIMESTAMP | First seen timestamp for the identifier |
| _last_seen_ts | TIMESTAMP | Last seen timestamp for the identifier |

5. identity_unresolved_records_[project_id]

Description: Contains records that were excluded from the matching process due to data inconsistencies, such as duplicate primary keys or exceeding identifier limits.

Schema: Inherits all columns from identity_lookup_[project_id], plus:

| Column | Type | Description |
| --- | --- | --- |
| unresolved_reason | STRING | Explanation for the unresolved status |
| unresolved_reason_category | STRING | Category of the unresolved reason |
| dinmo_id | INTEGER | dinmo_id, if available |
| run_date | DATE | Date of the identity resolution run |

6. identity_golden_[project_id]

Description: Consolidated golden record table representing unified customer profiles, with the most recent data selected for each attribute.

Schema:

| Column | Type | Description |
| --- | --- | --- |
| dinmo_id | INTEGER | Unique identifier assigned to the entity |
| pk | STRING | Representative primary key |
| [Identifiers] | STRING | Consolidated identifiers for the entity |
| order_field | TIMESTAMP | Most recent timestamp for the record |

7. identity_golden_report_[project_id]

Description: Provides detailed metrics and insights into the identity resolution process.

Schema:

| Column | Type | Description |
| --- | --- | --- |
| project_id | STRING | Project identifier |
| run_date | DATE | Date of the identity resolution run |
| metric_category | STRING | Category of the metric (e.g., "Matches per rule", "Data Quality") |
| metric | STRING | Description of the metric |
| value | INTEGER | Value of the metric |


Using Outputs for Analysis and Audit

1. Analyzing Matched Records

  • identity_match_[project_id]: This table provides a list of all matched records, allowing users to review each matched group of records. Using dinmo_id, users can trace back all records associated with a unique entity.

2. Investigating Unresolved Records

  • identity_unresolved_records_[project_id]: This table captures records that could not be resolved. It includes reasons for unresolved status, helping users identify data quality issues such as duplicate primary keys or exceeding identifier limits.

3. Auditing Matching Rules

  • match_pairs_[project_id]: By examining this table, users can audit which rules successfully linked records together. The table records the rule that led to each match, enabling a review of rule effectiveness and a detailed audit trail.

4. Monitoring Metrics

  • identity_golden_report_[project_id]: This report provides comprehensive metrics on the identity resolution process, including the number of duplicates identified, unresolved records, and statistics on matching rules. It serves as a valuable resource for evaluating data quality, monitoring the performance of matching rules, and adjusting configurations.

5. Utilizing Golden Records for a Unified Customer View

  • identity_golden_[project_id]: This table is the single source of truth for customer data after identity resolution. It provides a deduplicated, unified view of each entity, which can be used for analytics, customer service, and integration with CRM or marketing platforms. It is automatically linked to a model, so you can use it to build audiences and update destinations.
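The audit steps above usually come down to simple aggregate queries on the output tables. The sketch below runs two such queries against an in-memory SQLite database so that it is self-contained. In practice the tables live in your connected warehouse, whose SQL dialect may differ. The project ID `demo`, the sample rows, and the category values are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tiny stand-ins for two output tables, using only the columns queried below.
conn.executescript("""
CREATE TABLE match_pairs_demo (
    record_id_a INTEGER, record_id_b INTEGER, rule_applied TEXT
);
INSERT INTO match_pairs_demo VALUES
    (1, 2, 'email'), (2, 3, 'email'), (4, 5, 'phone+last_name');

CREATE TABLE identity_unresolved_records_demo (
    pk TEXT, unresolved_reason_category TEXT
);
INSERT INTO identity_unresolved_records_demo VALUES
    ('a', 'duplicate_pk'), ('b', 'identifier_limit'), ('c', 'identifier_limit');
""")

# How many pairs did each rule link?
matches_per_rule = conn.execute(
    "SELECT rule_applied, COUNT(*) FROM match_pairs_demo "
    "GROUP BY rule_applied ORDER BY rule_applied"
).fetchall()

# Why were records left unresolved?
unresolved_per_category = conn.execute(
    "SELECT unresolved_reason_category, COUNT(*) "
    "FROM identity_unresolved_records_demo "
    "GROUP BY unresolved_reason_category ORDER BY unresolved_reason_category"
).fetchall()

conn.close()
```

A rule that links very few pairs, or a category that dominates the unresolved records, is a reasonable starting point when adjusting rule sets or conflict limits.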


Scheduled Identity Resolution Process

Full Run

  • Objective: To process the entire dataset comprehensively and create a unified baseline.

  • Characteristics: Processes all records and applies all rules, generating dinmo_ids for all matched entities.

  • Use Cases: Initial setup, after major data migrations, or when substantial rule changes are applied.

Incremental Run

  • Objective: To process only new or updated records, maintaining up-to-date identity information with minimal resource use.

  • Characteristics: Focuses on records that have changed since the last run, applying rules only to these records.

  • Use Cases: Routine scheduling (e.g., daily or weekly), post-data imports, or after data updates.

Process Flow for Incremental Runs

  1. Identify New and Updated Records: Detect records added or modified since the last run, based on timestamps.

  2. Apply Matching Rules to Incremental Data: Apply the rule sets to newly identified records.

  3. Update dinmo_ids and Outputs:

    • Assign dinmo_ids to new matches or update existing entities as necessary.

  4. Metrics and Audits: Refresh identity_golden_report_[project_id] to reflect any updates.
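Step 1 of this flow can be pictured as a timestamp filter, as in the sketch below. It assumes that a record counts as new or updated when its Order Field is later than the previous run's timestamp. The function name and the way the last run time is stored are hypothetical.

```python
from datetime import datetime

def records_to_process(records, last_run_at, order_field="order_field"):
    """Keep only records whose order field is later than the previous run."""
    return [r for r in records if r[order_field] > last_run_at]

# Hypothetical lookup rows and the timestamp of the previous run.
lookup_rows = [
    {"pk": "u1", "order_field": datetime(2024, 1, 1, 8, 0)},
    {"pk": "u2", "order_field": datetime(2024, 1, 3, 9, 30)},
    {"pk": "u3", "order_field": datetime(2024, 1, 2, 0, 0)},
]
last_run_at = datetime(2024, 1, 2, 0, 0)

incremental_batch = records_to_process(lookup_rows, last_run_at)
changed_pks = [r["pk"] for r in incremental_batch]
```

Only `u2` is picked up in this example. A record stamped exactly at the last run time is treated as already processed, which is a design choice of the sketch rather than documented behavior.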

Considerations for Incremental Runs

  • Consistency: Incremental runs are designed to preserve existing dinmo_ids wherever possible, ensuring stability across runs.

  • Efficiency: Incremental processing reduces compute resources by focusing only on modified records.

  • Regular Audits: Scheduled audits of metrics help monitor data quality and adjust configurations if needed.

