Natural keys¶
What is a natural key?¶
A natural key is an identifier that has been created in the source system. This key is often used as a durable key or simply as a reference to the source data models. An example of a natural key in bframe could be the durable_key
. In the case that the underlying production application’s customer identifier is set as the durable_key
it would be considered a natural key. In the Wikipedia example the customer’s name is used as a natural key and durable key.
id |
durable_key (natural key and durable key) |
name |
created_at |
---|---|---|---|
1001 |
SchlurcherBot |
SchlurcherBot |
2023-11-01 01:00:00 |
1002 |
Gzen92Bot |
Gzen92Bot |
2023-11-01 01:00:01 |
1003 |
TCDSJ |
TCDSJ |
2023-11-01 01:00:12 |
1004 |
4throck |
4throck |
2023-11-01 01:00:40 |
1005 |
Moreau1 |
Moreau1 |
2023-11-01 01:01:31 |
As seen in the example above, a natural key can also be a durable key. Although a natural key can not be a surrogate key. A natural key is based on context. In the source system the field that is referenced by the natural key could be a surrogate key, but within bframe it would be considered a natural key.
Why is a natural key important?¶
Natural keys represent an important link to the source dataset. One important task that can only be accomplished using this key is reconciliation. Checking whether or not the correct number of X models are being created in the end system is crucial for data integrity. Without some form of natural key linkage this can be challenging.
Another use case for natural keys are as a durable key. This can be considered a nice to have and removes some overhead from querying and transforming data. Choosing a durable key is a serious design decision and should not be chosen flippantly. For example, if there is a chance that the natural key has any chance of being updated this option should be discarded.
How are natural keys modeled in bframe?¶
A natural key is either the durable_key
or a custom field. In either case the application itself is straightforward since it’s just creating a new column or inserting values into an existing one.