
In many of my Snowflake engagements, clients look to side-load data from Excel files. These files are either maintained by internal business users or sourced from external data providers. In most cases, the Excel files are parsed, converted to CSV or JSON, and then hosted in Snowflake as external tables.
Recently, Snowflake previewed the Java UDF feature, so I started exploring its capabilities. In this post, I’ll share my findings and a prototype implementation that leverages this new feature to parse Excel files (or any other files) directly into Snowflake.
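To make the conventional workflow concrete: before Java UDFs, the Excel-to-CSV conversion step happened outside Snowflake. A minimal sketch of that step in Python, using the openpyxl library (file and sheet names here are illustrative):

```python
import csv

from openpyxl import load_workbook


def excel_sheet_to_csv(xlsx_path, csv_path, sheet=None):
    """Convert one worksheet of an .xlsx file to CSV; return the row count."""
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    ws = wb[sheet] if sheet else wb.active
    rows = 0
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)
            rows += 1
    wb.close()
    return rows
```

The resulting CSV would then be staged (e.g., via PUT to a stage) and exposed as an external table or loaded with COPY INTO, as described above.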

Previously, I wrote a blog post on the workflow of deploying Snowflake external functions in AWS. A CloudFormation script can aid the deployment process, and Snowflake provides CloudFormation templates in this repo: sfguide-external-functions-examples. These templates offer a good starting point; if you want to adopt them, I recommend understanding them and making appropriate changes.
The recommended approach for developing serverless functionality on AWS is the AWS Serverless Application Model (SAM). The templated examples provided by Snowflake, however, do not fit well if your team uses AWS SAM.
In order to provide the Snowflake community with…

Part 1 of this series provided a brief overview of a “continuous” data quality and data monitoring approach, as well as various aspects to consider when evaluating tools or products to fulfill this need for Snowflake.
Data quality checks should not be deferred to some indeterminate point in the future. A tactical solution to this problem is SnowDQ, a simple yet smart approach to performing data quality checks on data hosted in Snowflake. The code is fully open source and can be found on GitLab.
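To make the idea of tactical data quality checks concrete, such a check usually boils down to generating and running a small SQL probe per rule. A hedged sketch of that pattern (the rule names, table, and thresholds are illustrative, not taken from SnowDQ itself):

```python
def build_dq_checks(table, not_null_cols, min_rows=1):
    """Generate simple SQL probes: a row-count check and per-column
    null checks. Each probe returns zero rows when the check passes."""
    checks = {
        "row_count": (
            f"SELECT COUNT(*) AS n FROM {table} "
            f"HAVING COUNT(*) < {min_rows}"
        )
    }
    for col in not_null_cols:
        checks[f"not_null_{col}"] = (
            f"SELECT COUNT(*) AS n FROM {table} "
            f"WHERE {col} IS NULL HAVING COUNT(*) > 0"
        )
    return checks
```

Each generated statement would be executed against Snowflake on a schedule; any probe that returns a row flags a failed check.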

The adoption of Snowflake has accelerated greatly over the last few years, and even more so after its initial public offering. Honestly, this is because the product is just that good. Before Snowflake, the major clouds (AWS, Azure, and Google Cloud Platform) dominated the space.
Snowflake has differentiated itself in the market by providing a fully managed SaaS service that handles all of the backend infrastructure work that comes with the major cloud providers.
The ability to easily deploy Snowflake on any of the major cloud providers, scale up or down as needed, spin up new warehouses, additional multiple…

I am continuing to see expanded use (and tremendous customer success) with the Snowflake Data Cloud across new workloads and applications due to the standard-setting scale, elasticity, and performance wrapped up in a consumption-based SaaS offering.
Among the many activities within a Snowflake environment, performing a union operation across tables is common in data pipelines. Also, I think you’d agree that most source systems evolve over time, with variations in their schemas and tables. One key challenge is that performing a union across these evolved table versions can get complex.
I’ll focus on this union operation…
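To illustrate the complexity: when two versions of a table have drifted apart, a union needs every column from both sides, with missing columns padded with NULL so each SELECT produces the same column list. A hedged sketch that generates such a statement (table and column names are illustrative):

```python
def union_evolved(tables):
    """Build a UNION ALL over table versions whose schemas have drifted.

    `tables` maps table name -> list of its columns. Columns missing
    from a given version are padded with NULL."""
    # Preserve first-seen order of columns across every version.
    all_cols = []
    for cols in tables.values():
        for c in cols:
            if c not in all_cols:
                all_cols.append(c)
    selects = []
    for name, cols in tables.items():
        exprs = [c if c in cols else f"NULL AS {c}" for c in all_cols]
        selects.append(f"SELECT {', '.join(exprs)} FROM {name}")
    return "\nUNION ALL\n".join(selects)
```

This is essentially what you end up hand-writing without tooling; the point of the post is to avoid maintaining such statements manually as schemas keep evolving.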

Your organization has successfully implemented Snowflake, and various departments have embraced the migration and started the adoption. You have migrated from a traditional data warehouse, appliance, or big data platform (Netezza, Teradata, Hadoop, Exadata, Greenplum). You might also be ingesting data from third-party vendor APIs. Your Snowflake account might have 10 databases, each with five or more schemas and probably thousands of tables, columns, and views. There are multiple users across various departments and groups running queries, executing jobs, and serving various business needs.
At this point, you need to assess the data inventory in Snowflake and determine:
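Inventory questions of this kind are typically answered from Snowflake's INFORMATION_SCHEMA (or the ACCOUNT_USAGE share, for account-wide history). A hedged sketch of a per-database inventory query; the database name and the exact shape of the summary are illustrative:

```python
def inventory_query(database):
    """Summarize object and column counts per schema and object type
    from a database's INFORMATION_SCHEMA."""
    return f"""
SELECT t.table_schema,
       t.table_type,
       COUNT(DISTINCT t.table_name) AS object_count,
       COUNT(c.column_name)         AS column_count
FROM {database}.INFORMATION_SCHEMA.TABLES t
LEFT JOIN {database}.INFORMATION_SCHEMA.COLUMNS c
  ON c.table_schema = t.table_schema
 AND c.table_name   = t.table_name
WHERE t.table_schema <> 'INFORMATION_SCHEMA'
GROUP BY 1, 2
ORDER BY 1, 2
""".strip()
```

Running the generated statement per database (and unioning the results) gives a first-pass inventory of schemas, tables, views, and columns.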

Azure Data Factory (ADF) is a popular solution among Azure customers for data acquisition, ingestion, and processing pipelines. When it comes to triggering a dbt process hosted in an Azure Container Group (ACG), a typical solution is to use Azure Logic Apps or Azure Functions.
It would be much easier if dbt could be integrated with ADF so that the entire code base for ingestion, processing, and so on lives in one place. For this reason, I’ll demonstrate how this can be achieved using ADF.
This article is a follow-up to my previous blog post, Deploying and Running dbt on Azure Container…
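At its core, the integration rests on the fact that starting a container group is a single Azure Resource Manager REST call, which an ADF Web activity can issue directly with a managed-identity token. A hedged sketch of constructing that call (the subscription, resource group, and container group names are illustrative, and the api-version should be checked against current Azure documentation):

```python
def acg_start_request(subscription_id, resource_group, container_group,
                      api_version="2023-05-01"):
    """Build the ARM REST request that starts an Azure Container Group."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.ContainerInstance"
        f"/containerGroups/{container_group}/start"
        f"?api-version={api_version}"
    )
    # An ADF Web activity would POST this URL, authenticating with a
    # bearer token acquired via the factory's managed identity.
    return {"method": "POST", "url": url}
```

Configuring an ADF Web activity with this URL, the POST method, and MSI authentication against the `https://management.azure.com/` resource is the essence of the approach.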

dbt continues to be a widely adopted tool for data transformation pipelines, especially when moving to an ELT-based approach. As a dbt implementation services partner, we have recommended and deployed dbt on many of our Snowflake client engagements.
While dbt itself is a Python-based framework, you run it through dbt CLI commands, which means it requires an execution environment, preferably a Python virtual environment running in a VM.
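As a concrete baseline, that execution environment can be bootstrapped with Python's own venv module; a minimal sketch (the environment path and the dbt adapter package are illustrative):

```python
import sys
import venv
from pathlib import Path


def create_dbt_env(env_dir, with_pip=True):
    """Create a virtual environment suitable for running the dbt CLI;
    return the path to its Python interpreter."""
    venv.EnvBuilder(with_pip=with_pip, clear=True).create(env_dir)
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    return Path(env_dir) / bin_dir / "python"


# Once created, dbt would be installed and invoked inside it, e.g.:
#   <env>/bin/pip install dbt-snowflake
#   <env>/bin/dbt run --profiles-dir .
```

The deployment approaches below differ mainly in where this environment lives (a VM, a container image, a managed service), not in the setup itself.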
There have been several approaches discussed to achieve this as per the links below: