Surrogate keys
**************

What is a surrogate key?
========================
The Kimball Group has described surrogate keys in detail `here <https://www.kimballgroup.com/1998/05/surrogate-keys/>`_. The ``id`` and ``durable_id`` fields are both incremental integer surrogate keys. Depending on the needs of the operator these columns can be seen as natural keys as well. For example, if an operator uses the bframe data model for their production application that would make ``durable_id`` a natural key and a surrogate key. Another way to implement bframe is by performing an ELT into target tables in a format that aligns with bframe. In this case, the natural key would be generated from the extracted data and ``id`` and ``durable_id`` would be treated soley as surrogate keys with their specified use cases (partial primary key and duable key respectively). 

A simple example of a surrogate key can be seen on the customers table within bframe.

.. csv-table:: Customers
   :file: /_static/ex_tables/ex_m2_1.csv
   :class: bmodel-table
   :header-rows: 1
 
Why are surrogate keys important?
=================================
Surrogate keys are important because of the need for row identification in a RDBMS. Primary keys, foreign keys, durable keys and the like often use surrogate keys for representation. Without a way to generate them it would be challenging to support the numerous use cases of metadata in a more sophisticated data application.

Are all identifiers surrogate keys?
===================================
No. Often unique identifiers are easier to create artificially, hence why surrogate keys are important and ubiquitous. If a field on a model can naturally be considered unique that can also be used a primary key. This can simplify business logic as well since a row can be identified without prior lookup. One such example of this is when email is used as a primary key within customer relationship management systems (CRMs).

How to model surrogate keys in bframe?
======================================
bframe has a large need for unique identifiers throughout it's schema. The actual process of generating is based on the underlying system and the operators specific needs. For some systems surrogate keys can be simply created with a ``SEQUENCE``. An incrementing integer can be created through this method and used as a surrogate key within the ``id`` field.

.. code-block:: bash

    CREATE SEQUENCE serial START 101;
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO customers (id, name) VALUES (nextval('serial'), 'Johnny bagga donuts');
    SELECT * FROM customers;
    ┌───────┬─────────────────────┐
    │  id   │        name         │
    │ int32 │       varchar       │
    ├───────┼─────────────────────┤
    │   101 │ Johnny bagga donuts │
    └───────┴─────────────────────┘


It is not always desirable to use incrementing integers especially when the identifiers are external facing. In these cases UUIDs can be used instead to create an obfuscated surrogate key.

.. code-block:: bash

    CREATE TABLE customers (id UUID, name TEXT);
    INSERT INTO customers (id, name) VALUES (gen_random_uuid(), 'Johnny bagga donuts');
    SELECT * FROM customers;
    ┌──────────────────────────────────────┬─────────────────────┐
    │                  id                  │        name         │
    │                 uuid                 │       varchar       │
    ├──────────────────────────────────────┼─────────────────────┤
    │ e223a3ed-4449-4a28-92f6-d044351d63af │ Johnny bagga donuts │
    └──────────────────────────────────────┴─────────────────────┘