Coalesce 2023: The Event for dbt Practitioners
Coalesce is an annual conference hosted by dbt Labs for the purpose of advancing the discipline of analytics engineering and the building of the modern data stack.
If the previous sentence was somewhat incomprehensible to you, let’s take a look at what it means:
dbt is an open-source framework for applying the principles of software engineering to the transformation of data: portability, modularity, version control, continuous integration and continuous deployment (CI/CD), testing, and documentation. dbt has quickly become the standard tool for data transformation and is currently used by over 30,000 organizations to generate millions of data models.
Analytics engineering is the practice of using dbt and other tools to transform the data in a data warehouse. In other words, it means taking raw data, then cleaning, combining, manipulating, and documenting it to create models with useful data for end users.
At Brooklyn Data Co. (BDC), we’ve observed that dbt is not only being adopted by more data teams across every industry, but is now also being used as an integral component of large enterprise data architectures. The inherent complexity of these architectures has made dbt more challenging to implement successfully at scale.
It was therefore no surprise that a main focus of this year’s Coalesce conference was “dbt at scale.” Many of dbt Labs’ product announcements included features that enable better collaboration, speed up development, and unlock deployment on large-scale architectures.
- Multi-project architecture, also known as dbt Mesh
- A project in dbt is a top-level structure which defines the file paths, database connection, project name, product version, and other configurations. dbt Mesh is an architectural approach which provides features to enable the integration of multiple dbt projects within an organization. For example, if an organization has separate data sources or workflows, these can be grouped into different dbt projects that can reference each other. Further, dbt now has governance features which enable access controls for different projects as well as data contracts so that model changes don’t break downstream processes.
- dbt Explorer
- The dbt docs interface has been revamped as dbt Explorer, which offers a more performant interface with better search. dbt Explorer will also reflect the improvements made in dbt Mesh by displaying global, cross-project model lineage.
- A revamped, re-released semantic layer
- Having acquired the start-up Transform earlier this year, dbt Labs has replaced dbt’s earlier semantic layer with one powered by MetricFlow. This makes it possible to define consistent, reliable metrics within a dbt project and then use those metrics in Tableau, Google Sheets, and a number of other platforms through the provided integrations. The change also enables more complex metrics, as well as joins across tables in the metric definitions.
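To make the governance ideas behind dbt Mesh a little more concrete, here is a minimal conceptual sketch in plain Python. It is not dbt's API or implementation: the `Model` class, the `can_reference` and `contract_violations` helpers, and the example project and column names are all hypothetical, and the rules are deliberately simplified. It only illustrates the two ideas described above: a project can restrict which of its models other projects may reference, and a data contract can flag a model change before it breaks downstream consumers.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical stand-in for a dbt model; not dbt's real API."""
    project: str
    name: str
    access: str = "protected"  # assumed levels: "private", "protected", "public"
    contract: dict[str, str] = field(default_factory=dict)  # column name -> data type


def can_reference(model: Model, from_project: str) -> bool:
    """Simplified access rule: other projects may reference only public models."""
    return model.project == from_project or model.access == "public"


def contract_violations(model: Model, actual_columns: dict[str, str]) -> list[str]:
    """Compare the columns a model actually produces with its declared contract."""
    problems = []
    for column, expected_type in model.contract.items():
        if column not in actual_columns:
            problems.append(f"missing column: {column}")
        elif actual_columns[column] != expected_type:
            problems.append(
                f"type mismatch on {column}: "
                f"expected {expected_type}, got {actual_columns[column]}"
            )
    return problems


# Example: a public model in a hypothetical "finance" project,
# referenced from a hypothetical "marketing" project.
customers = Model(
    project="finance",
    name="dim_customers",
    access="public",
    contract={"customer_id": "int", "name": "text"},
)

print(can_reference(customers, from_project="marketing"))
print(contract_violations(customers, {"customer_id": "text"}))
```

In this sketch, a change that drops `name` or alters the type of `customer_id` is reported as a violation before any downstream project consumes the model, which is the kind of protection a data contract is meant to provide.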
In addition to product announcements and workshops, the conference included sessions presented by data practitioners who have implemented dbt at their workplaces and by vendors who have built dbt integrations into their data platforms. These presentations ranged from bootstrapping a data stack for a small organization to refactoring a data stack that processes 6 TB of data each day. For the latter, we were proud to have the Brooklyn Data Co. team credited for some of the project’s success!
As an Analytics Engineer and a first-time Coalesce attendee, I found the conference to be extremely valuable for professional development. With each presentation I found myself thinking of Brooklyn Data Co. clients who will benefit from new features such as dbt Mesh, as well as new methods such as optimizing dbt configurations to reduce database costs. Of course, it was also great to meet my colleagues and to spend time socializing and getting to know one another!