Time and again I see the same questions asked: “Why should I use dbt?” or “I don’t understand what value dbt offers”. So I thought I’d put together an article that touches on some of the benefits and walks through setting up a new project (using DuckDB as the database), complete with an associated GitHub repo for you to take a look at.

Having used dbt since early 2018, and with my partner being a dbt trainer, I hope that this article is useful for some of you. The link is paywall bypassed.

  • @[email protected]
    5 months ago

    I stepped into a cluster of a situation: an analyst-driven product consisting of around 100 manually triggered, sequential SQL steps, each with a human in the loop to QA, loop back, edit, and rerun, all completed every month. This was after a bunch of data science Spark routines ran (all off one of the DS’ laptops, and it took a day! The machine was unavailable for his other use while this happened! Lol).

    Then the product of all that work was loaded into an Excel file, where further QA occurred. The Excel file was shipped to clients and represented the deliverable. Folks were paying a LOT for this analysis, every month.

    The Excel file took about 30 minutes to load, and god help you if you tried to move anything or conduct your own analysis.

    The eng team built a proper ingest pipe and computation/model platform, and by luck of the draw I got the task of unraveling the pile of analyst SQL into a dbt workflow. Then we pivoted the deliverable to Looker, where the only SQL was specific to the display and final organization of the data.

    If you find yourself in a similar situation, and your stakeholder is a squad of highly intelligent analysts with deep domain knowledge but shallow eng knowledge, dbt can be a godsend.

    In the right space, I can’t recommend it enough.