Slowly Changing Dimensions (SCD) is a commonly used dimensional modelling technique used in data warehousing to capture the changing data within the dimension (Image 1) over time. The three most commonly used SCD Types are 0, 1, 2.
The majority of DW/BI projects have type 2 dimensions where a change to an attribute causes the current dimension record to be end dated and a new record created allowing for a complete history of the data changes. See example below
An Upsert is an RDBMS feature that allows a DML statement’s author to automatically either insert a row, or if the row already exists, UPDATE that existing row instead.
From my experience building multiple Azure Data Platforms I have been able to develop reusable ELT functions that I can use from project to project, one being an Azure SQL upsert function.
Today I’m going to share with you have to how to create an Azure SQL Upsert function using PySpark. It can be reused across Databricks workflows with minimal effort and flexibility.
Basic Upsert Logic
Azure Cognitive Services Text Analytics is a great tool you can use to quickly evaluate a text data set for positive or negative sentiment. For example, a service provider can quickly and easily evaluate reviews as positive or negative and rank them based on the sentiment score detected.
As more and more businesses rely on electronic communications with their clients, understanding the overall sentiment attached to your product, service or image has never been more important. Sentiment analysis allow companies to automatically detect sentiment in any text (reviews, insurance claims, triaging etc) in a fast and highly scalable way.
Azure Data Architect Building Modern Data Platforms — Databricks, Azure