Overview
The Identity Resolution module is only available on Business and Enterprise plans. Reach out to your dedicated account manager to enable it in your workspace.
The Identity Resolution module in DinMo enables organizations to consolidate and manage customer identities across multiple data sources. This ensures a unified and accurate view of each customer by resolving identities based on configurable rules and identifiers. By consolidating duplicate records, you enhance data quality, improve analytics, and deliver more personalized user experiences.
Identity Resolution Concepts
Identifiers
Identifiers are key attributes within each model that represent distinguishing features of a customer or entity, playing a central role in accurately matching records across data sources. When configuring a new identity project, users should decide on the identifiers they want to include. DinMo offers a selection of standard identifier types (e.g., email, phone, user ID) or the flexibility to create custom identifiers specific to business needs (e.g., company registration number, Adobe ID).
For each identifier, users must specify the fields in each model that correspond to the selected identifiers. This process, called identifier mapping, informs DinMo how to match profiles across different models, enabling accurate cross-model identity resolution.
Identifier Standardization: To improve matching accuracy, users can select cleaning and preprocessing techniques for each identifier before merging. DinMo offers the following prebuilt cleaning options:
Case Insensitive: Converts text to lowercase, ignoring case differences.
Trim: Removes leading and trailing whitespace.
Only Numeric: Removes all non-numeric characters.
Additionally, DinMo provides specific standardization techniques for common identifiers like emails, phones, and addresses to further enhance matching consistency.
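As an illustration, the three prebuilt cleaning options could be sketched in Python as follows. This is a minimal sketch, not DinMo's implementation: the function names and the `standardize` helper are hypothetical, and the order in which DinMo applies the options is an assumption.

```python
import re


def case_insensitive(value: str) -> str:
    """Lowercase the value so case differences are ignored."""
    return value.lower()


def trim(value: str) -> str:
    """Remove leading and trailing whitespace."""
    return value.strip()


def only_numeric(value: str) -> str:
    """Drop every character that is not a digit 0-9."""
    return re.sub(r"[^0-9]", "", value)


def standardize(value, steps):
    """Apply the cleaning steps in order; None passes through unchanged."""
    if value is None:
        return None
    for step in steps:
        value = step(value)
    return value


# Hypothetical sample values
email = standardize("  Jane.Doe@Example.COM ", [trim, case_insensitive])
phone = standardize("+33 (0)6 12-34-56-78", [only_numeric])
print(email)  # jane.doe@example.com
print(phone)  # 330612345678
```

Two raw values that differ only in case, padding, or punctuation end up identical after cleaning, which is what allows them to match.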
Conflict Limit: Users can also set a limit on how many unique values are allowed per identifier within a single profile. If the limit is exceeded, affected rows or profiles are excluded from merging in the golden record table, ensuring data consistency.
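The conflict limit check can be pictured with a short Python sketch. It assumes that records are already grouped by a hypothetical `profile_id` key and that a whole profile is flagged once the limit is exceeded; how DinMo chooses between excluding rows and excluding profiles is not described here.

```python
from collections import defaultdict


def profiles_over_limit(rows, identifier, limit):
    """Return the profile ids holding more than `limit` distinct non-empty values."""
    values = defaultdict(set)
    for row in rows:
        value = row.get(identifier)
        if value:  # skip None and empty strings
            values[row["profile_id"]].add(value)
    return {pid for pid, seen in values.items() if len(seen) > limit}


# Hypothetical sample data: p1 carries three different emails
rows = [
    {"profile_id": "p1", "email": "a@example.com"},
    {"profile_id": "p1", "email": "b@example.com"},
    {"profile_id": "p1", "email": "c@example.com"},
    {"profile_id": "p2", "email": "d@example.com"},
    {"profile_id": "p2", "email": None},
]
excluded = profiles_over_limit(rows, "email", limit=2)
print(excluded)  # {'p1'}
```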
Order Fields
The Order Field is a model field of type datetime or date that determines the sequence in which model records are processed. For an event model, this field typically corresponds to the event’s timestamp, while for a user model, it indicates the last update time for the user record. This is a crucial configuration parameter that significantly impacts the identity resolution (IDR) process in DinMo.
DinMo uses the Order Field to:
Detect Profile Updates: Profiles with changes to identifiers (e.g., a new value replacing a null) are flagged for inclusion in the next incremental Identity Resolution run, ensuring the system reflects the most current information.
Resolve Conflicts: When an identifier has multiple values across matched profiles, DinMo resolves conflicts by retaining the most recent non-null, non-empty value based on the Order Field.
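The conflict resolution behavior described above can be sketched as follows. This is an illustrative approximation: the `latest_value` function and the record layout (dictionaries with an `order_field` key) are assumptions made for the example.

```python
from datetime import datetime


def latest_value(records, identifier, order_field="order_field"):
    """Return the newest non-null, non-empty value of `identifier`, or None."""
    candidates = [r for r in records if r.get(identifier) not in (None, "")]
    if not candidates:
        return None
    # The record with the greatest order field value wins
    return max(candidates, key=lambda r: r[order_field])[identifier]


# Hypothetical matched profiles: the newest record has an empty email
matched = [
    {"email": "old@example.com", "order_field": datetime(2024, 1, 5)},
    {"email": "new@example.com", "order_field": datetime(2024, 6, 1)},
    {"email": "", "order_field": datetime(2024, 9, 1)},
]
print(latest_value(matched, "email"))  # new@example.com
```

The most recent record is skipped because its value is empty, so the next most recent non-empty value is retained.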
Rule-Based Matching
DinMo employs a combination of deterministic and fuzzy rule-based techniques to identify duplicate profiles or entities across your source records, based on configurable identifier matching rules. When two or more records meet the criteria of a defined matching rule, they are linked with a shared dinmo_id and will be merged into a single profile if additional merging criteria are satisfied.
Rule Sets
Rule Sets are collections of matching rules that define how records are evaluated to determine whether they represent the same entity. Each rule set can be configured to handle specific matching scenarios, such as distinguishing between individual and corporate profiles or assigning high and low-priority rules to optimize matching accuracy.
To refine matching precision, you can apply filters to limit a rule set to a subset of source records only, such as records from a particular country, profile category, or other relevant segments.
Rules
A Rule is a defined combination of identifiers that must have the same standardized values for two profiles to be considered a match. Rules are only applied if all required identifier values are non-null in both records being compared; otherwise, the rule will not consider the records for matching. This ensures accurate matching by preventing incomplete data from affecting results.
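A deterministic rule of this kind can be expressed in a few lines of Python. The sketch below assumes that values are already standardized and that a rule is simply a tuple of identifier names; `rule_matches` is a hypothetical helper, not part of DinMo.

```python
def rule_matches(a, b, rule):
    """True when every identifier in `rule` is non-null and equal in both records."""
    for identifier in rule:
        left, right = a.get(identifier), b.get(identifier)
        if left is None or right is None:
            return False  # the rule is skipped when a required value is missing
        if left != right:
            return False
    return True


# Hypothetical rule and records
rule = ("email", "last_name")
r1 = {"email": "jane@example.com", "last_name": "doe", "phone": None}
r2 = {"email": "jane@example.com", "last_name": "doe", "phone": "33612345678"}
r3 = {"email": "jane@example.com", "last_name": None}

print(rule_matches(r1, r2, rule))  # True
print(rule_matches(r1, r3, rule))  # False: last_name is null in r3
```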
Merging Rules
DinMo merges duplicate profiles into the golden record output table when multiple profiles match at least one of the defined rules in the identity project, provided that no conflict limit rule prevents the merge.
DinMo uses the Order Field to determine which non-null values should be retained, prioritizing the most recent values based on each profile’s ranking in the Order Field.
Note: The dinmo_id for a profile may change if its identifiers are updated. If the updated identifiers satisfy at least one matching rule and align with another existing profile, the profiles will be merged, and the dinmo_id will be adjusted accordingly.
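One common way to turn pairwise matches into shared group identifiers is a union-find (disjoint-set) structure. The sketch below illustrates that general technique; it is not a description of DinMo's internals. It assumes that links are transitive (if A matches B and B matches C, all three share an id) and uses the smallest record id in a group as a stand-in for the dinmo_id.

```python
def assign_group_ids(record_ids, match_pairs):
    """Group records linked by match pairs; return {record_id: group_id}."""
    parent = {rid: rid for rid in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in match_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so the result is deterministic
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {rid: find(rid) for rid in record_ids}


groups = assign_group_ids([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)])
print(groups)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}

# A new link between records 3 and 4 merges both groups,
# so records 4 and 5 end up with a different group id.
merged = assign_group_ids([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5), (3, 4)])
print(merged)  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
```

The second call shows why an identifier can change between runs: once an update links two previously separate groups, one of them has to adopt the other's id.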
Identity Resolution Process
The DinMo Identity Resolution engine processes records from multiple models to create unified customer profiles. It follows a structured, rule-based approach to identify and merge duplicate profiles across sources, producing several key output tables that facilitate both operational use and data quality audits.
Process Overview
Union of Source Models
Multiple source models are imported, each containing customer or entity records with identifiers.
Source data is standardized in a central Lookup Table, where all records are aligned to common identifiers and preprocessed (e.g., trimming, case normalization) to ensure consistency.
Identity Resolution Matching Engine
The core of the identity resolution process occurs in the matching engine, which applies defined rule sets to identify duplicates across data sources.
The matching engine produces three main outputs:
Match Table: Assigns a unique dinmo_id to each set of matched records, effectively grouping duplicate profiles from different sources under a single identifier.
Unresolved Records Table: Stores records that could not be processed due to data issues, such as exceeding the configured identifier limit or conflicting primary keys.
Matched IDs Table: Lists all unique identifiers associated with each resolved profile (dinmo_id), capturing every identifier linked to a unified profile.
Golden Record Creation
From the previous outputs, DinMo generates a Golden Record Table that contains the deduplicated set of resolved customer profiles. This table consolidates information from all matched records, selecting the most recent and complete values based on the configured Order Field.
Each record in the Golden Record Table represents a “golden” profile—a single view of the customer or entity that integrates information from all matching sources.
Identity Report Table Generation
DinMo also creates an Identity Report Table that provides metrics and data quality insights on the identity resolution process. This table includes key statistics such as the total records processed, the number of duplicates identified, and the quality of the unified profiles.
This output enables ongoing assessment of the matching process’s effectiveness and provides visibility into data quality issues.
Output Tables and Schemas
The identity resolution feature generates several output tables with distinct purposes. All of these tables are stored within the connected source in a dedicated dataset (or schema) called dinmo_identity.
1. identity_lookup_[project_id]
Description: The lookup table that contains standardized source data ready for matching. This table serves as the foundational dataset for identity resolution.
Schema:
pk (STRING): Primary key of the source record
order_field (TIMESTAMP): Timestamp used for ordering records
[Identifiers] ([original identifier type]): All identifiers used for matching (e.g., first_name, last_name, email)
_source_model_name (STRING): Name of the source model
_source_model_pk_name (STRING): Name of the primary key field in the source model
_source_model_order_field_name (STRING): Name of the order field in the source model
_record_id (INTEGER): Unique record ID within DinMo
_project_created_at (TIMESTAMP): Creation timestamp of the project
_project_updated_at (TIMESTAMP): Last update timestamp of the project
2. match_pairs_[project_id]
Description: Stores pairs of records that match based on the applied rules, enabling an audit of which records were linked together.
Schema:
record_id_a (INTEGER): Record ID of the first record
record_id_b (INTEGER): Record ID of the second record
rule_applied (STRING): Rule applied to match the records
3. identity_match_[project_id]
Description: Contains all matched records with their assigned dinmo_id, representing unified customer identities.
Schema: Inherits all columns from identity_lookup_[project_id], plus:
dinmo_id (INTEGER): Unique identifier assigned to the entity
4. identity_matched_ids_[project_id]
Description: Lists all unique identifiers linked to each dinmo_id, enabling cross-reference and identifier tracking.
Schema:
dinmo_id (INTEGER): Unique identifier assigned to the entity
source_pk (STRING): Primary key from the source record
identifier_name (STRING): Name of the identifier
value (STRING): Value of the identifier
_source_model_name (STRING): Source model name
_first_seen_ts (TIMESTAMP): First seen timestamp for the identifier
_last_seen_ts (TIMESTAMP): Last seen timestamp for the identifier
5. identity_unresolved_records_[project_id]
Description: Contains records that were excluded from the matching process due to data inconsistencies, such as duplicate primary keys or exceeding identifier limits.
Schema: Inherits all columns from identity_lookup_[project_id], plus:
unresolved_reason (STRING): Explanation for the unresolved status
unresolved_reason_category (STRING): Category of the unresolved reason
dinmo_id (INTEGER): Assigned dinmo_id, if available
run_date (DATE): Date of the identity resolution run
6. identity_golden_[project_id]
Description: Consolidated golden record table representing unified customer profiles, with the most recent data selected for each attribute.
Schema:
dinmo_id (INTEGER): Unique identifier assigned to the entity
pk (STRING): Representative primary key
[Identifiers] (STRING): Consolidated identifiers for the entity
order_field (TIMESTAMP): Most recent timestamp for the record
7. identity_golden_report_[project_id]
Description: Provides detailed metrics and insights into the identity resolution process.
Schema:
project_id (STRING): Project identifier
run_date (DATE): Date of the identity resolution run
metric_category (STRING): Category of the metric (e.g., "Matches per rule", "Data Quality")
metric (STRING): Description of the metric
value (INTEGER): Value of the metric
Using Outputs for Analysis and Audit
1. Analyzing Matched Records
identity_match_[project_id]: This table provides a list of all matched records, allowing users to review each matched group of records. Using dinmo_id, users can trace back all records associated with a unique entity.
2. Investigating Unresolved Records
identity_unresolved_records_[project_id]: This table captures records that could not be resolved. It includes reasons for unresolved status, helping users identify data quality issues such as duplicate primary keys or exceeding identifier limits.
3. Auditing Matching Rules
match_pairs_[project_id]: By examining this table, users can audit which rules successfully linked records together. The table records the rule that led to each match, enabling a review of rule effectiveness and a detailed audit trail.
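For example, a rule audit could start by counting matched pairs per rule. The sketch below uses an in-memory SQLite database as a stand-in for your warehouse, so that the query can be run anywhere. The table name match_pairs_demo, the sample rows, and the rule labels are hypothetical; the exact format of rule_applied values in your project may differ.

```python
import sqlite3

# In-memory stand-in for the match pairs table (columns follow the schema above)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match_pairs_demo "
    "(record_id_a INTEGER, record_id_b INTEGER, rule_applied TEXT)"
)
conn.executemany(
    "INSERT INTO match_pairs_demo VALUES (?, ?, ?)",
    [(1, 2, "email"), (2, 3, "email"), (4, 5, "phone + last_name")],
)

# Count how many record pairs each rule linked
rule_counts = conn.execute(
    """
    SELECT rule_applied, COUNT(*) AS pair_count
    FROM match_pairs_demo
    GROUP BY rule_applied
    ORDER BY pair_count DESC, rule_applied
    """
).fetchall()
conn.close()

print(rule_counts)  # [('email', 2), ('phone + last_name', 1)]
```

A rule that links far more pairs than expected may be too permissive, while a rule that never appears may be redundant or blocked by missing values.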
4. Monitoring Metrics
identity_golden_report_[project_id]: This report provides comprehensive metrics on the identity resolution process, including the number of duplicates identified, unresolved records, and statistics on matching rules. It serves as a valuable resource for evaluating data quality, monitoring the performance of matching rules, and adjusting configurations.
5. Utilizing Golden Records for a Unified Customer View
identity_golden_[project_id]: This table is the single source of truth for customer data after identity resolution. It provides a deduplicated, unified view of each entity, which can be used for analytics, customer service, and integration with CRM or marketing platforms. It is automatically linked to a model, so you can use it to build audiences and update destinations.
Scheduled Identity Resolution Process
Full Run
Objective: To process the entire dataset comprehensively and create a unified baseline.
Characteristics: Processes all records and applies all rules, generating dinmo_ids for all matched entities.
Use Cases: Initial setup, after major data migrations, or when substantial rule changes are applied.
Incremental Run
Objective: To process only new or updated records, maintaining up-to-date identity information with minimal resource use.
Characteristics: Focuses on records that have changed since the last run, applying rules only to these records.
Use Cases: Routine scheduling (e.g., daily or weekly), post-data imports, or after data updates.
Process Flow for Incremental Runs
Identify New and Updated Records: Detect records added or modified since the last run, based on timestamps.
Apply Matching Rules to Incremental Data: Apply the rule sets to newly identified records.
Update dinmo_ids and Outputs: Assign dinmo_ids to new matches or update existing entities as necessary.
Metrics and Audits: Refresh identity_golden_report_[project_id] to reflect any updates.
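The first step of this flow, selecting the records to reprocess, can be sketched as a simple timestamp filter. The example assumes that the Order Field is the timestamp being compared and that the comparison is strict; `records_to_process` and `last_run_at` are hypothetical names.

```python
from datetime import datetime


def records_to_process(records, last_run_at, order_field="order_field"):
    """Return records whose order field is strictly later than the previous run."""
    return [r for r in records if r[order_field] > last_run_at]


# Hypothetical previous run time and source records
last_run_at = datetime(2024, 6, 1)
records = [
    {"pk": "u1", "order_field": datetime(2024, 5, 20)},
    {"pk": "u2", "order_field": datetime(2024, 6, 3)},
    {"pk": "u3", "order_field": datetime(2024, 6, 10)},
]
changed = records_to_process(records, last_run_at)
print([r["pk"] for r in changed])  # ['u2', 'u3']
```

Only the selected records are then passed to the matching rules, which is what keeps incremental runs cheaper than full runs.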
Considerations for Incremental Runs
Consistency: Incremental runs are designed to preserve existing dinmo_ids wherever possible, ensuring stability across runs.
Efficiency: Incremental processing reduces compute resources by focusing only on modified records.
Regular Audits: Scheduled audits of metrics help monitor data quality and adjust configurations if needed.