Background
There’s not much out there on the subject of how to unit test Databricks notebooks when it comes to using them to create a Kimball data warehouse or a data lake house. This is because of a couple reasons. One: most people who have historically worked in Business Intelligence aren’t in the habit of writing automated unit testing for their code, or two: unit testing of notebooks seems to be an impossible task to accomplish if you keep your ETL code entirely inside of Databricks notebooks. Some might argue that notebooks aren’t enterprise grade ETL in terms of SDLC and CI/CD capabilities. I agree. If you’ve got a team of software developers who are well versed in OOP then you should be building Python wheels that are deployed and ran on your Databricks clusters. But what happens when you’ve got a team who is only familiar with tools such as SSIS or Informatica, and the extent of their development experience begins and ends with the Script Component and Script Task in SSIS?
Continue reading