Surrogate keys

What is a surrogate key?

The Kimball Group has described surrogate keys in detail here. The id and durable_id fields are both incremental integer surrogate keys. Depending on the needs of the operator these columns can be seen as natural keys as well. For example, if an operator uses the bframe data model for their production application that would make durable_id a natural key and a surrogate key. Another way to implement bframe is by performing an ELT into target tables in a format that aligns with bframe. In this case, the natural key would be generated from the extracted data and id and durable_id would be treated soley as surrogate keys with their specified use cases (partial primary key and duable key respectively).

A simple example of a surrogate key can be seen on the customers table within bframe.

Customers

id (surrogate_key)

name

created_at

1001

SchlurcherBot

2023-11-01 01:00:00

1002

Gzen92Bot

2023-11-01 01:00:01

1003

TCDSJ

2023-11-01 01:00:12

1004

4throck

2023-11-01 01:00:40

1005

Moreau1

2023-11-01 01:01:31

Why are surrogate keys important?

Surrogate keys are important because of the need for row identification in a RDBMS. Primary keys, foreign keys, durable keys and the like often use surrogate keys for representation. Without a way to generate them it would be challenging to support the numerous use cases of metadata in a more sophisticated data application.

Are all identifiers surrogate keys?

No. Often unique identifiers are easier to create artificially, hence why surrogate keys are important and ubiquitous. If a field on a model can naturally be considered unique that can also be used a primary key. This can simplify business logic as well since a row can be identified without prior lookup. One such example of this is when email is used as a primary key within customer relationship management systems (CRMs).

How to model surrogate keys in bframe?

bframe has a large need for unique identifiers throughout it’s schema. The actual process of generating is based on the underlying system and the operators specific needs. For some systems surrogate keys can be simply created with a SEQUENCE. An incrementing integer can be created through this method and used as a surrogate key within the id field.

CREATE SEQUENCE serial START 101;
CREATE TABLE customers (id INTEGER, name TEXT);
INSERT INTO customers (id, name) VALUES (nextval('serial'), 'Johnny bagga donuts');
SELECT * FROM customers;
┌───────┬─────────────────────┐
│  id           name         │
│ int32        varchar       │
├───────┼─────────────────────┤
│   101  Johnny bagga donuts │
└───────┴─────────────────────┘

It is not always desirable to use incrementing integers especially when the identifiers are external facing. In these cases UUIDs can be used instead to create an obfuscated surrogate key.

CREATE TABLE customers (id UUID, name TEXT);
INSERT INTO customers (id, name) VALUES (gen_random_uuid(), 'Johnny bagga donuts');
SELECT * FROM customers;
┌──────────────────────────────────────┬─────────────────────┐
│                  id                          name         │
│                 uuid                        varchar       │
├──────────────────────────────────────┼─────────────────────┤
│ e223a3ed-4449-4a28-92f6-d044351d63af  Johnny bagga donuts │
└──────────────────────────────────────┴─────────────────────┘