Connecting to a source
**********************

What is a source?
=================

A source is a form of data storage, such as a file or a database. A source in bframe is similar but has a more specific purpose. Since the library does not store data directly, any designated source is expected to already contain the data. There are three different sources that can be set on the bframe client:

1. Core source
2. Events source
3. Branch source

If no source is set, bframe defaults to creating an in-memory source within the existing duckdb connection. Below is a high-level diagram of how sources could be set for a bframe client.

|image0|

Core source
-----------

The "core source" is the data store that is accessed for core data models. Core data models include ``customers``, ``products``, ``pricebooks``, ``list_prices``, ``contracts`` and ``contract_prices``. The core source is a better fit for a transactional database like Postgres.

Events source
-------------

The "events source" is the data store that is accessed for ``events``. The events source is a better fit for analytical databases or highly scalable data formats (e.g. parquet or iceberg). Specifying the events source is optional. If it is omitted, the client uses the core source instead.

Branch source
-------------

The "branch source" is typically a temporary data store that is used for local exploration. Although a branch can be persisted directly in the core source, it is often inconvenient to do so (e.g. because of production data access controls). Using bframe's branching functionality with a local branch source enables a smooth local development environment with full write access. Depending on the use case, local memory or a ``*.duckdb`` file can be a lightweight solution. If no branch source is specified, the library defaults to the core source.

How to connect to a source
==========================

There are two steps to connecting a source in bframe:

1. Get access to a compatible source
2. Set the source in the client

Every bframe client needs its requisite sources connected when it is instantiated. The sections below go through each step.

Access to a compatible source
-----------------------------

A billing system is typically chosen after a primary database has been selected and is in use. With this in mind, the library can be connected to an existing database. The first requirement for the store is that it is accessible from duckdb. Fortunately, duckdb supports numerous extensions for databases and data formats.

Creating a compatible source is also an option. DuckDB has robust file system functionality that allows a database to be created entirely from files. For example, a developer who only has access to CSV or JSON data, and not the production database, can still build a source by adding these files to a duckdb connection and linking them as tables. If the files have a schema similar to the bframe core data models, the result is a compatible ``src`` database. An illustrative example is shown below:

.. code-block:: python

    from bframelib import Client, Source

    config = {
        "org_id": 1,
        "env_id": 1,
        "branch_id": 1,
        "rating_range": ['2025-01-01', '2026-01-01'],
    }

    # Attach an in-memory database as ``src`` and expose each CSV file as a view.
    # File paths must be quoted string literals inside read_csv().
    core_source_connect = """
        ATTACH ':memory:' AS src;
        CREATE VIEW IF NOT EXISTS src.customers AS (
            SELECT * FROM read_csv('customers.csv')
        );
        CREATE VIEW IF NOT EXISTS src.products AS (
            SELECT * FROM read_csv('products.csv')
        );
        CREATE VIEW IF NOT EXISTS src.pricebooks AS (
            SELECT * FROM read_csv('pricebooks.csv')
        );
        CREATE VIEW IF NOT EXISTS src.list_prices AS (
            SELECT * FROM read_csv('list_prices.csv')
        );
        CREATE VIEW IF NOT EXISTS src.contracts AS (
            SELECT * FROM read_csv('contracts.csv')
        );
        CREATE VIEW IF NOT EXISTS src.contract_prices AS (
            SELECT * FROM read_csv('contract_prices.csv')
        );
        CREATE VIEW IF NOT EXISTS src.events AS (
            SELECT * FROM read_csv('events.csv')
        );
    """

    client = Client(config, [Source('core', core_source_connect, False)])

Setting the source
------------------

bframe can be initialized with a connection that already has the relevant sources present (``src``, ``brch``, ``evts``), or the library can instantiate them directly. Regardless of the chosen path, the connection must use duckdb functionality. This entails using the ``ATTACH DATABASE`` statement, which is described in the duckdb documentation and in the documentation for its extensions (postgres, s3, etc.).

Passing through a connection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On instantiation of the client, the ``con`` parameter can receive a ``DuckDBPyConnection``. If a ``Source`` (`link <./interface_api/python_client.html#source>`_) is passed through, it will be run on the received connection.

.. code-block:: python

    from bframelib import Client, DEFAULT_SOURCES
    import duckdb

    config = {
        "org_id": 1,
        "env_id": 1,
        "branch_id": 1,
        "rating_range": ['2025-01-01', '2026-01-01'],
    }

    # Create the connection up front and attach the core database before handing it to bframe.
    con = duckdb.connect()
    con.execute("ATTACH ':memory:' AS src;")

    client = Client(config, [DEFAULT_SOURCES], con)

Set a source
~~~~~~~~~~~~

If no connection is passed through, the client will generate one. The source connect scripts will be executed on the newly generated connection.

.. code-block:: python

    from bframelib import Client, Source

    config = {
        "org_id": 1,
        "env_id": 1,
        "branch_id": 1,
        "rating_range": ['2025-01-01', '2026-01-01'],
    }

    core_source_connect = "ATTACH 'postgres://first:password@localhost:5433/dev_db' AS src (TYPE POSTGRES);"

    # No connection is passed, so the client creates one and runs the connect script on it.
    client = Client(config, [Source('core', core_source_connect, False)])

Accessing a source
==================

A source can be accessed directly from the bframe client by querying the respective database name.

1. Core source -> ``src.TABLE_NAME_HERE``
2. Events source -> ``evt.TABLE_NAME_HERE``
3. Branch source -> ``brch.TABLE_NAME_HERE``

An example of querying the core source directly:

.. code-block:: python

    from bframelib import Client

    config = {
        "org_id": 1,
        "env_id": 1,
        "branch_id": 1,
        "rating_range": ['2025-01-01', '2026-01-01']
    }

    bf = Client(config)

    # Query the core source by its database name, bypassing the bframe data models.
    bf.execute("SELECT * FROM src.customers LIMIT 10;")

This can be useful for a number of reasons. One example is that duckdb is capable of inserting data directly into a source database. Another is that a source can contain additional tables that bframe doesn't reference. On the other hand, querying the source directly does not retain the benefits of the bframe configuration (e.g. deduplication, fixed date ranges, tenancy management).

.. |image0| image:: /_static/images/api/high_level_sources.png
.. |image1| image:: /_static/images/api/csv_source.png
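
The CSV-backed core source described under "Access to a compatible source" repeats one ``CREATE VIEW`` statement per table. As a minimal sketch, the connect script could instead be generated from a list of table names. The helper ``build_csv_source_script``, the ``CORE_TABLES`` list and the ``data`` directory are hypothetical names used only for illustration and are not part of bframe. The code only builds a string, so it does not call bframe or duckdb, and it assumes one ``<table>.csv`` file per table.

.. code-block:: python

    from pathlib import PurePosixPath

    # Hypothetical list: the core data models named in this document, plus ``events``.
    CORE_TABLES = [
        "customers",
        "products",
        "pricebooks",
        "list_prices",
        "contracts",
        "contract_prices",
        "events",
    ]


    def build_csv_source_script(tables, directory=".", database="src"):
        """Return a duckdb script that attaches an in-memory database
        and creates one CSV-backed view per table (illustrative helper)."""
        statements = [f"ATTACH ':memory:' AS {database};"]
        for table in tables:
            # Assumption: each table lives in ``<directory>/<table>.csv``.
            csv_path = PurePosixPath(directory) / f"{table}.csv"
            # Double any single quotes so the path stays a valid SQL string literal.
            literal = str(csv_path).replace("'", "''")
            statements.append(
                f"CREATE VIEW IF NOT EXISTS {database}.{table} AS "
                f"(SELECT * FROM read_csv('{literal}'));"
            )
        return "\n".join(statements)


    core_source_connect = build_csv_source_script(CORE_TABLES, directory="data")
    print(core_source_connect)

The resulting string could then be used as the connect script in the same way as the handwritten one, for example ``Source('core', core_source_connect, False)``.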