Connecting to a source

What is a source?

A source is a form of data storage, such as a file or a database. A source in bframe is similar but has a more specific purpose. Since the library does not store data itself, any source it is pointed at is expected to already contain the data. There are three different sources that can be set on the bframe client:

  1. Core source

  2. Events source

  3. Branch source

If no source is set, bframe defaults to creating an in-memory source within the existing duckdb connection. Below is a high-level diagram of how sources can be set for a bframe client.

[Figure: high-level diagram of how sources can be set for a bframe client]

Core source

The “core source” is the data store that is accessed for core data models. Core data models include customers, products, pricebooks, list_prices, contracts, and contract_prices. The core source is best suited to a transactional database such as Postgres.

Events source

The “events source” is the data store that is accessed for events. It is better suited to analytical databases or highly scalable data formats (e.g. Parquet or Iceberg). Specifying the events source is optional; if it is omitted, the client uses the core source instead.

Branch source

The “branch source” is typically a temporary data store used for local exploration. Although a branch can be persisted directly in the core source, doing so is often inconvenient (e.g. because of production data access controls). Using bframe’s branching functionality with a local branch source enables a smooth local development environment with full write access. Depending on the use case, local memory or a *.duckdb file can be a lightweight solution. If no branch source is specified, the library defaults to the core source.
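The fallback rules described in this section can be summarized in a short sketch. It only illustrates the documented defaults and is not bframe's implementation: `resolve_sources` is a hypothetical helper, and the string `'in-memory'` is an assumed stand-in for the in-memory source bframe creates when no source is set.

```python
from typing import Optional

# Hypothetical label standing in for the default in-memory source.
IN_MEMORY = "in-memory"


def resolve_sources(
    core: Optional[str] = None,
    events: Optional[str] = None,
    branch: Optional[str] = None,
) -> dict:
    """Hypothetical helper applying the documented source defaults."""
    # No core source: fall back to an in-memory source in the duckdb connection.
    resolved_core = core if core is not None else IN_MEMORY
    return {
        "core": resolved_core,
        # Events and branch sources are optional and fall back to the core source.
        "events": events if events is not None else resolved_core,
        "branch": branch if branch is not None else resolved_core,
    }


print(resolve_sources())
# {'core': 'in-memory', 'events': 'in-memory', 'branch': 'in-memory'}
print(resolve_sources(core="postgres", events="parquet"))
# {'core': 'postgres', 'events': 'parquet', 'branch': 'postgres'}
```

Keeping the events and branch fallbacks pointed at the resolved core source (rather than at the raw argument) means they also inherit the in-memory default when nothing at all is configured.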

How to connect to a source

There are two steps to connecting a source in bframe:

  1. Get access to a compatible source

  2. Set the source in the client

Whenever a bframe client is instantiated, connecting to the requisite sources is an important part of the process. Below we go through each step to demonstrate how it is done.

Access to a compatible source

A billing system is typically chosen after a primary database has been selected and is in use. With this in mind, the library can be connected to an existing database. The first requirement for the store is accessibility from duckdb. Fortunately, duckdb supports numerous extensions for databases and data formats.

Creating a compatible source is also an option. DuckDB has robust file system functionality that allows a database to be created entirely from files. For example, if a developer only had access to CSV or JSON files instead of the production database, they could still build a source by reading those files in a duckdb connection and exposing them as tables (views). If the files have a schema similar to the bframe core data models, the result is a compatible source database. An illustrative example looks something like this:

from bframelib import Client, Source

config = {
    "org_id": 1,
    "env_id": 1,
    "branch_id": 1,
    "rating_range": ['2025-01-01', '2026-01-01'],
}

# Attach an empty in-memory database named src and expose each CSV file as a view.
# File names must be quoted string literals; relative paths resolve from the working directory.
core_source_connect = """
    ATTACH ':memory:' AS src;
    CREATE VIEW IF NOT EXISTS src.customers AS (
        SELECT * FROM read_csv('customers.csv')
    );
    CREATE VIEW IF NOT EXISTS src.products AS (
        SELECT * FROM read_csv('products.csv')
    );
    CREATE VIEW IF NOT EXISTS src.pricebooks AS (
        SELECT * FROM read_csv('pricebooks.csv')
    );
    CREATE VIEW IF NOT EXISTS src.list_prices AS (
        SELECT * FROM read_csv('list_prices.csv')
    );
    CREATE VIEW IF NOT EXISTS src.contracts AS (
        SELECT * FROM read_csv('contracts.csv')
    );
    CREATE VIEW IF NOT EXISTS src.contract_prices AS (
        SELECT * FROM read_csv('contract_prices.csv')
    );
    CREATE VIEW IF NOT EXISTS src.events AS (
        SELECT * FROM read_csv('events.csv')
    );
"""

client = Client(config, [Source('core', core_source_connect, False)])

Setting the source

bframe can be initialized with a connection that already has the relevant sources attached (src, brch, evts), or the library can attach them itself. Regardless of the chosen path, sources are connected through duckdb functionality: the ATTACH statement described in the DuckDB documentation, and the extensions that build on it (postgres, s3, etc.).

Passing through a connection

When the client is instantiated, the con parameter can receive a DuckDBPyConnection. If a Source is passed in as well, its connect script is run on the received connection.

from bframelib import Client, DEFAULT_SOURCES
import duckdb

config = {
    "org_id": 1,
    "env_id": 1,
    "branch_id": 1,
    "rating_range": ['2025-01-01', '2026-01-01'],
}

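# Create the connection yourself and attach an in-memory database as src
con = duckdb.connect()
con.execute("ATTACH ':memory:' AS src;")
# Hand the prepared connection to the client through the con parameter
client = Client(config, [DEFAULT_SOURCES], con)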

Set a source

If no connection is passed in, the client generates one, and the source connect scripts are executed on the newly generated connection.

from bframelib import Client, Source

config = {
    "org_id": 1,
    "env_id": 1,
    "branch_id": 1,
    "rating_range": ['2025-01-01', '2026-01-01'],
}

# Attach a Postgres database as the core source (handled by DuckDB's postgres extension)
core_source_connect = "ATTACH 'postgres://first:password@localhost:5433/dev_db' AS src (TYPE POSTGRES);"

# No con argument: the client creates its own connection and runs the connect script on it
client = Client(config, [Source('core', core_source_connect, False)])

Accessing a source

A source can be accessed directly from the bframe client by querying tables under the respective database name:

  1. Core source -> src.TABLE_NAME_HERE

  2. Events source -> evt.TABLE_NAME_HERE

  3. Branch source -> brch.TABLE_NAME_HERE

An example of querying the core source directly:

from bframelib import Client

config = {
    "org_id": 1,
    "env_id": 1,
    "branch_id": 1,
    "rating_range": ['2025-01-01', '2026-01-01']
}

bf = Client(config)
bf.execute("SELECT * FROM src.customers LIMIT 10;")

This can be useful for a number of reasons. For example, duckdb can insert data directly into a source database, and a source can contain additional tables that bframe doesn’t reference.

On the other hand, querying a source directly does not retain the benefits of the bframe configuration (e.g. deduplication, fixed date ranges, tenancy management).