Prepare event data

Before creating an Event Stitching project, make sure your event models contain the fields DinMo needs to process events safely and to make the resulting event profile graph explainable.

Event Stitching works best when each input model is an event table: one row represents one event that happened at a specific time.

What you need

  • Event models: tables or models where each row is an event, such as page views, sessions, app events, purchases, or conversions.

  • Model primary key: a stable identifier for one event row. DinMo uses the model primary key for idempotency and audit.

  • Timestamp field: the standard timestamp field on the model. DinMo uses it to order observations and evaluate stitching lifetime.

  • Event partition column: a date or timestamp column used to select complete event windows efficiently.

  • Event identifiers: columns that carry identity evidence, such as user ID, email, anonymous ID, cookie ID, device ID, session ID, click ID, or IP address.

  • Output permissions: permission to create or replace Event Stitching output tables in the configured output dataset or schema.

Good input models

Good Event Stitching inputs are behavioral tables:

  • web events

  • app events

  • product usage events

  • sessions

  • purchases

  • conversions

  • support interactions

  • campaign interactions

Avoid profile-like tables such as contacts, accounts, users, leads, subscribers, and customers. Event Stitching expects event-grain models.

Model primary key

The model primary key must identify one event row inside a source model.

Good primary keys are:

  • stable across reruns

  • non-null

  • unique inside the event model

  • not derived from mutable fields

If two selected models produce the same primary key value, DinMo still keeps those rows separate by source model, so a key only needs to be unique within its own model.

Use this query to check duplicate primary keys in one model:
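The sketch below is a minimal version of this check. It runs against an in-memory SQLite table so that it is self-contained. The `web_events` table and `event_id` key column are hypothetical. The `GROUP BY ... HAVING` pattern should carry over to most warehouse SQL dialects with minor changes. The sketch also counts null keys, because a good primary key must be non-null:

```python
import sqlite3

# Hypothetical sample table "web_events" with primary key column "event_id".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id TEXT, event_ts TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [
        ("evt-1", "2024-05-01 10:00:00"),
        ("evt-2", "2024-05-01 10:05:00"),
        ("evt-2", "2024-05-01 10:05:00"),  # duplicate key
        (None, "2024-05-01 10:07:00"),     # null key
    ],
)

# Keys that appear more than once violate "unique inside the event model".
DUPLICATE_KEYS_SQL = """
SELECT event_id, COUNT(*) AS row_count
FROM web_events
WHERE event_id IS NOT NULL
GROUP BY event_id
HAVING COUNT(*) > 1
ORDER BY row_count DESC
"""

# Null keys violate the "non-null" requirement.
NULL_KEYS_SQL = "SELECT COUNT(*) FROM web_events WHERE event_id IS NULL"

duplicates = conn.execute(DUPLICATE_KEYS_SQL).fetchall()
null_keys = conn.execute(NULL_KEYS_SQL).fetchone()[0]
print(duplicates)
print(null_keys)
```

An empty `duplicates` list and a `null_keys` count of zero are what you want to see before selecting the model.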

Timestamp field

The model timestamp field should represent when the event happened, not when the row was loaded into the warehouse.

Check for:

  • null timestamps

  • future timestamps

  • timestamps far outside the expected backfill range

  • inconsistent timezone handling
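The following is a minimal sketch of these checks in plain Python, assuming timestamps arrive as ISO-8601 strings. `BACKFILL_START` and `NOW` are hypothetical values pinned so the output is reproducible. In practice you would use `datetime.now(timezone.utc)` and your real backfill start. Values without a UTC offset are counted as `naive` because their timezone cannot be determined. More than one distinct offset hints that sources format timestamps differently:

```python
from datetime import datetime, timezone

# Hypothetical bounds, pinned for reproducibility; adjust to your project.
BACKFILL_START = datetime(2024, 1, 1, tzinfo=timezone.utc)
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)


def audit_timestamps(values):
    """Count timestamp problems in a list of ISO-8601 strings (or None)."""
    issues = {"null": 0, "future": 0, "out_of_range": 0, "naive": 0}
    offsets = set()
    for raw in values:
        if raw is None or raw.strip() == "":
            issues["null"] += 1
            continue
        ts = datetime.fromisoformat(raw)  # raises ValueError on malformed input
        if ts.tzinfo is None:
            # No offset: the timezone is ambiguous, so skip range comparisons.
            issues["naive"] += 1
            continue
        offsets.add(ts.utcoffset())
        if ts > NOW:
            issues["future"] += 1
        elif ts < BACKFILL_START:
            issues["out_of_range"] += 1
    issues["distinct_offsets"] = len(offsets)
    return issues


sample = [
    "2024-05-01T10:00:00+00:00",
    "2024-05-01T12:00:00+02:00",  # same instant as above, different offset
    None,
    "2031-01-01T00:00:00+00:00",  # future
    "1970-01-01T00:00:00+00:00",  # epoch default, likely a bad value
    "2024-05-02T09:00:00",        # no offset
]
report = audit_timestamps(sample)
print(report)
```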

Event partition column

The event partition column is the field DinMo uses to select bounded windows.

Use a column that:

  • is present on every selected event model

  • is a date or timestamp field

  • follows the same calendar as the event timestamp

  • lets DinMo select complete windows, usually days

  • matches the physical partitioning or clustering strategy of the source table when possible

The best default is usually the event timestamp itself, or a derived event date that is physically partitioned in the warehouse.
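If you use a derived date column, every row's partition date should equal the calendar date of its event timestamp. The sketch below checks this with Python's sqlite3 module. The table and column names (`web_events`, `event_ts`, `event_date`) are hypothetical. The sketch also assumes timestamps are stored in UTC. If your partitions follow a local calendar, convert the timestamp before comparing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id TEXT, event_ts TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?, ?)",
    [
        ("evt-1", "2024-05-01 23:30:00", "2024-05-01"),
        ("evt-2", "2024-05-02 00:15:00", "2024-05-02"),
        ("evt-3", "2024-05-02 00:45:00", "2024-05-01"),  # looks like a load date
        ("evt-4", "2024-05-03 08:00:00", None),          # missing partition value
    ],
)

# Rows whose partition date is missing or disagrees with the event timestamp.
MISMATCH_SQL = """
SELECT event_id, event_ts, event_date
FROM web_events
WHERE event_date IS NULL OR event_date <> DATE(event_ts)
ORDER BY event_id
"""
mismatches = conn.execute(MISMATCH_SQL).fetchall()
print(mismatches)
```

Rows like `evt-3` would be selected in the wrong window, which is why the partition column should follow the same calendar as the event timestamp.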

Check daily volume before creating the project:
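The sketch below shows a daily volume check. It assumes a hypothetical `web_events` table with an `event_date` column stored as ISO `YYYY-MM-DD` text. `GROUP BY` only returns days that have events, so the helper fills missing days with zero to make gaps visible:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id TEXT, event_date TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?)",
    [("e1", "2024-05-01"), ("e2", "2024-05-01"), ("e3", "2024-05-02"), ("e4", "2024-05-04")],
)

DAILY_SQL = """
SELECT event_date, COUNT(*) AS events
FROM web_events
WHERE event_date BETWEEN ? AND ?
GROUP BY event_date
"""


def daily_volume(start: date, end: date) -> dict:
    """Return {ISO day: event count} for every day in [start, end], zeros included."""
    counts = dict(conn.execute(DAILY_SQL, (start.isoformat(), end.isoformat())).fetchall())
    out = {}
    day = start
    while day <= end:
        out[day.isoformat()] = counts.get(day.isoformat(), 0)
        day += timedelta(days=1)
    return out


volume = daily_volume(date(2024, 5, 1), date(2024, 5, 4))
zero_days = [d for d, n in volume.items() if n == 0]
print(volume)
print(zero_days)
```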

If many days have zero events, the project can still run, but the schedule and backfill window should match the real event cadence.

Identifier fields

Identifiers are the values DinMo can use to connect events.

Common identifiers and their typical strength:

  • User ID, customer ID, account ID (strong): good anchor candidates when they are authenticated and stable.

  • Email, email hash, phone (strong to medium): good when standardized and governed.

  • Anonymous ID, cookie ID, device ID, client ID (medium to weak): useful for pre-login behavior; protect with a stitching lifetime and Max profiles per value limits.

  • Session ID (weak): use for short windows only.

  • Click IDs (weak): useful for attribution windows; avoid long lifetimes.

  • IP address (weakest): use only with a strict policy or for audit.

Do not map campaign fields, page URLs, product names, country, channel, utm_* fields, or free-text values as identifiers unless they are intentionally used as identity evidence.

Check identifier coverage before setup:
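The sketch below measures coverage for each candidate identifier column. Coverage is the share of rows where the value is present after trimming whitespace, reported together with the number of distinct values. The table `web_events` and the `IDENTIFIER_COLUMNS` list are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id TEXT, user_id TEXT, anonymous_id TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?, ?)",
    [
        ("e1", "u-1", "a-1"),
        ("e2", None, "a-1"),
        ("e3", "", "a-2"),     # empty string counts as missing
        ("e4", "u-2", None),
    ],
)

IDENTIFIER_COLUMNS = ["user_id", "anonymous_id"]  # hypothetical mapping


def coverage(table, columns):
    """Return {column: {"coverage": share of filled rows, "distinct": distinct values}}."""
    # Table and column names cannot be bound as parameters, so they are
    # interpolated; only pass trusted names from your own configuration.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    result = {}
    for col in columns:
        filled, distinct = conn.execute(
            f"SELECT COUNT(*), COUNT(DISTINCT {col}) FROM {table} "
            f"WHERE {col} IS NOT NULL AND TRIM({col}) <> ''"
        ).fetchone()
        result[col] = {"coverage": filled / total if total else 0.0, "distinct": distinct}
    return result


report = coverage("web_events", IDENTIFIER_COLUMNS)
print(report)
```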

Placeholder and polluted values

Bad values should be blocked before the first production run.

Common examples:

  • unknown

  • undefined

  • null

  • none

  • test

  • 00000000-0000-0000-0000-000000000000

  • empty strings after standardization

Find common low-quality values:
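The sketch below lists the most frequent values of one identifier after trimming and lowercasing, so variants such as `Unknown ` and `UNKNOWN` collapse into one row. The `web_events` table and `anonymous_id` column are hypothetical. Placeholders tend to rise to the top because many unrelated events share them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id TEXT, anonymous_id TEXT)")
rows = [("e%d" % i, "a-%d" % i) for i in range(5)]  # normal, unique values
rows += [
    ("x1", "unknown"), ("x2", "Unknown "), ("x3", "UNKNOWN"),
    ("x4", "00000000-0000-0000-0000-000000000000"),
    ("x5", "00000000-0000-0000-0000-000000000000"),
    ("x6", "   "), ("x7", ""), ("x8", "test"),
]
conn.executemany("INSERT INTO web_events VALUES (?, ?)", rows)

# Standardize before grouping so case and whitespace variants are counted together.
TOP_VALUES_SQL = """
SELECT LOWER(TRIM(anonymous_id)) AS value, COUNT(*) AS n
FROM web_events
WHERE anonymous_id IS NOT NULL
GROUP BY LOWER(TRIM(anonymous_id))
ORDER BY n DESC, value
LIMIT ?
"""
top = conn.execute(TOP_VALUES_SQL, (3,)).fetchall()
print(top)
```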

Add placeholder values to blocked values in the Identifier policy.
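The sketch below shows how a blocklist might be applied in your own validation or preprocessing code. It is not DinMo's implementation, and it does not describe how DinMo standardizes values before applying the Identifier policy. `BLOCKED_VALUES` and `clean_identifier` are hypothetical names. The comparison is case-insensitive, but the returned value keeps its original case, because some identifiers (cookie IDs, for example) may be case-sensitive:

```python
# Hypothetical blocklist built from the placeholder values found above.
BLOCKED_VALUES = {
    "unknown", "undefined", "null", "none", "test",
    "00000000-0000-0000-0000-000000000000",
}


def clean_identifier(raw):
    """Return the trimmed identifier, or None if it is missing or blocked."""
    if raw is None:
        return None
    value = raw.strip()
    # Empty after standardization, or a known placeholder: do not use it.
    if value == "" or value.lower() in BLOCKED_VALUES:
        return None
    return value


values = ["User@Example.com ", "  ", "NULL", "undefined", "a-42", None]
cleaned = [clean_identifier(v) for v in values]
print(cleaned)
```

Whether to also lowercase the returned value depends on the identifier. Lowercasing usually suits emails but can corrupt case-sensitive IDs.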

Shared and corrupted weak identifiers

Weak identifiers can create unsafe connections when one value appears across many strong identities.
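The sketch below illustrates why with a generic connected-components approach (union-find). It uses hypothetical events, each linked to every identifier value it carries. It is not a description of DinMo's stitching algorithm. It only shows what naive linking does when two users share one device ID:

```python
from collections import defaultdict

# Hypothetical observations: (event key, logical identifier, value).
observations = [
    ("e1", "User ID", "u-1"), ("e1", "Device ID", "d-shared"),
    ("e2", "User ID", "u-2"), ("e2", "Device ID", "d-shared"),
    ("e3", "User ID", "u-3"), ("e3", "Device ID", "d-3"),
]

parent = {}


def find(node):
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]  # path halving keeps trees shallow
        node = parent[node]
    return node


def union(a, b):
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[root_b] = root_a


# Naive linking: every identifier value joins every event that carries it.
for event, logical, value in observations:
    union(("event", event), ("id", logical, value))

# Collect the User ID values that end up in each connected component.
users_per_component = defaultdict(set)
for event, logical, value in observations:
    if logical == "User ID":
        users_per_component[find(("event", event))].add(value)

merged = sorted(sorted(users) for users in users_per_component.values())
print(merged)
```

The single shared `d-shared` value pulls `u-1` and `u-2` into one component, even though nothing else connects them. A stitching lifetime and a Max profiles per value limit exist to stop this kind of chain.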

Before trusting a weak identifier, check whether it behaves like a shared or corrupted value:
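The sketch below counts how many distinct strong identities each weak value co-occurs with. The `web_events` table, the `user_id` (strong) and `device_id` (weak) columns, and the threshold of 1 are hypothetical. Choose a threshold that fits your data, since some legitimate sharing (a family tablet, for example) is expected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_events (event_id TEXT, user_id TEXT, device_id TEXT)")
conn.executemany(
    "INSERT INTO web_events VALUES (?, ?, ?)",
    [
        ("e1", "u-1", "d-1"),
        ("e2", "u-1", "d-1"),
        ("e3", "u-2", "d-2"),
        ("e4", "u-3", "d-shared"),
        ("e5", "u-4", "d-shared"),
        ("e6", "u-5", "d-shared"),
        ("e7", None, "d-shared"),  # no strong identity, excluded from the count
    ],
)

# Weak values seen with more strong identities than the threshold.
SHARED_SQL = """
SELECT device_id, COUNT(DISTINCT user_id) AS strong_ids, COUNT(*) AS events
FROM web_events
WHERE device_id IS NOT NULL AND user_id IS NOT NULL
GROUP BY device_id
HAVING COUNT(DISTINCT user_id) > ?
ORDER BY strong_ids DESC
"""
suspicious = conn.execute(SHARED_SQL, (1,)).fetchall()
print(suspicious)
```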

If this query returns many high-count values, configure a lower Max profiles per value, shorten the stitching lifetime, or block known bad values.

Multiple event models

An Event Stitching project can process several event models from the same source.

For each selected model:

  • choose an event partition column

  • map available physical fields to logical identifiers

  • use the same logical identifier when two models carry the same type of value

  • leave unrelated fields unmapped

Example:

  • web_events.user_id maps to User ID

  • web_events.anonymous_id maps to Anonymous ID

  • conversions.customer_id maps to User ID

  • conversions.email_hash maps to Email hash

This lets DinMo evaluate events from several models as one event profile graph.
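The sketch below illustrates the mapping step, assuming the example mapping is expressed as a plain dictionary. `IDENTIFIER_MAPPING` and `to_observations` are hypothetical names, not DinMo APIs. Each event row is flattened into `(model, primary key, logical identifier, value)` tuples. This makes `customer_id` from conversions and `user_id` from web events comparable as User ID values. Unmapped fields such as `page_url` are ignored, and the source model stays part of each key:

```python
# Hypothetical mapping mirroring the example: model -> {physical field: logical identifier}.
IDENTIFIER_MAPPING = {
    "web_events": {"user_id": "User ID", "anonymous_id": "Anonymous ID"},
    "conversions": {"customer_id": "User ID", "email_hash": "Email hash"},
}


def to_observations(model, primary_key_field, rows):
    """Flatten event rows into (model, primary key, logical identifier, value) tuples."""
    mapping = IDENTIFIER_MAPPING[model]
    out = []
    for row in rows:
        for field, logical in mapping.items():
            value = row.get(field)
            if value:  # skip missing (None) or empty-string values
                out.append((model, row[primary_key_field], logical, value))
    return out


web = [{"event_id": "w1", "user_id": None, "anonymous_id": "a-1", "page_url": "/pricing"}]
conv = [{"conversion_id": "c1", "customer_id": "u-7", "email_hash": "h-9"}]

observations = to_observations("web_events", "event_id", web) + to_observations(
    "conversions", "conversion_id", conv
)
print(observations)
```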

Readiness checklist

Before creating the project, confirm:

  • selected models are event-grain tables

  • each model has a stable primary key

  • model timestamp fields are populated and plausible

  • each model has a date or timestamp event partition column

  • the main identifiers have meaningful coverage

  • weak identifiers do not obviously create large shared clusters

  • placeholder values are known and can be blocked

  • source tables are partitioned or clustered in a way that supports efficient window scans

  • the output dataset or schema can be written by DinMo

Then continue with Create an Event Stitching project.
