Primary keys
************

What is a primary key?
======================
The `primary key <https://en.wikipedia.org/wiki/Primary_key>`_, also known as a PK, is a designated attribue that can reliably identify and distinguish each record in a table. With bframe, core data models have PKs that are a combination of the ``id``, ``version``, ``org_id`` and ``env_id`` fields. A single ID can appear multiple times as long as it has a different version, if the version is the same it will be rejected. The ``id`` is a `surrogate key <surrogate_keys.html>`_ and the ``version`` is an incrementing count that specifies the iteration of the row. The ``org_id`` and ``env_id`` are references that refer to which tenant the data belongs to, more information can be found about organizations and environments here.

.. TODO(need to link to organization and environment sections here) and link surrogate key

Why is a primary key important?
===============================
In general, a primary keys importance is the ability for it to be a unique identifier for a specific row of data. Having a unique identifier allows for an application to refer to it and ensures that other rows aren't confused with it. Within bframe, the PK is used to maintain the integrity of the data. For example, a row with a new version was submitted twice by accident. If the database allowed for both rows to be inserted it could cause unintended consequences. Subsequent reads of the data could pull two identical objects instead of a single unique one. Depending on the contents of the table, this could lead to inaccurate rating, metering or nonsensical provisioning. Data quality is a never ending challenge and PKs are a great tool for keeping data hygiene intact.

How are primary keys modeled in bframe?
=======================================
Based on the underlying data store the implementation strategy will differ. For transactional RDBMSs, like Postgres or MySQL, it is straightforward to add a PK constraint to the specified table. 

.. code-block:: sql

    CREATE TABLE IF NOT EXISTS customers (
        id BIGINT,
        org_id INTEGER NOT NULL,
        env_id INTEGER NOT NULL,
        ...
        version INTEGER DEFAULT 0,
        PRIMARY KEY (id, version, org_id, env_id)
    );

If this option is available it should be taken. Otherwise the primary key will be diffcult to fully enforce. In this scenario the best we can do is use data validation pre and post insertion. Additionally, regular data quality checks are essential if the data store in use does not support uniqueness constraints. A simple check could look something like this:

.. code-block:: sql

    SELECT count(*)
    FROM (
        SELECT raw.*, ROW_NUMBER() OVER(
            PARTITION BY raw.id, raw.version
            ORDER BY raw.version DESC
        ) AS rank
        FROM src.customers AS raw
    )
    WHERE rank > 1;


With this said, bframe does perform deduplication on the raw data to enable editing and branching. So even if the same ``id`` and ``version`` combo are submitted multiple times only one will be returned. This can allow for bframe to offer uniqueness for core models, albeit with some risk, without the guarantees of a transactional RDBMS.