If you’ve started your dbt journey and successfully connected dbt Core with Databricks, congratulations! 🎉 The next feature you should learn is dbt Seeds. Seeds are one of the simplest yet most powerful features in dbt, especially when working with reference data, lookup tables, and demo datasets. In this blog, we’ll explore what dbt Seeds […]
🧩 Part 1: What is a CTE?
“CTE stands for Common Table Expression. It’s like creating a temporary table that exists only while your query runs. You define it using the WITH keyword.”
Example:
WITH my_cte AS (
    SELECT * FROM employees
)
SELECT * FROM my_cte;
“This is a normal CTE, not […]
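To try the CTE excerpt above without a warehouse, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and the ORDER BY is an addition so the output order is predictable. It also checks the claim that the CTE exists only while its query runs.

```python
import sqlite3

# In-memory database with a tiny, made-up employees table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])

# Same shape as the excerpt's query; ORDER BY added for a stable result order.
cte_query = """
WITH my_cte AS (
    SELECT * FROM employees
)
SELECT * FROM my_cte ORDER BY id;
"""
rows = conn.execute(cte_query).fetchall()
print(rows)

# The CTE is scoped to the statement that defines it, so a later query cannot see it.
try:
    conn.execute("SELECT * FROM my_cte")
    cte_persisted = True
except sqlite3.OperationalError:
    cte_persisted = False
print("my_cte still exists after the query:", cte_persisted)
```

SQLite is only a stand-in here; the WITH syntax shown is standard SQL and should behave the same way on other engines.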
Unlock the Power of AI in Your SQL Queries with Databricks ai_query()
1. Introduction: The Data Analyst’s Dilemma (and Solution)
Key Points:
The Problem: Data analysts constantly deal with unstructured data (text) like customer reviews, support tickets, and open-ended survey responses. Analyzing this data often requires complex, multi-step processes: 1) Export data from the Lakehouse, […]
In this blog post, we’ll explore how to trigger an Amazon SageMaker Jupyter notebook file from an AWS Lambda function using WebSockets. This method allows you to automate your machine learning workflows and run Jupyter notebooks on demand, providing a powerful tool for data scientists and engineers.
Prerequisites
Before we dive in, ensure you have […]
Hello everyone, AWS Lambda is phasing out support for Python 3.7 now that Python 3.7 reached its end of life on June 27, 2023. To keep your functions running smoothly, AWS strongly advises upgrading your Python 3.7 functions to Python 3.10 or Python 3.11 before November 27, 2023. AWS follows a two-stage process for ending support […]
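A first step in such an upgrade is finding which functions still use the old runtime. The sketch below is a hedged illustration: `find_functions_on_runtime` is a hypothetical helper, and the sample list is made up. It assumes each entry has the `FunctionName` and `Runtime` fields that boto3's Lambda `list_functions` call returns under `Functions`.

```python
def find_functions_on_runtime(functions, runtime="python3.7"):
    """Return the sorted names of functions whose Runtime matches `runtime`.

    `functions` is a list of dicts shaped like the entries under "Functions"
    in a boto3 Lambda list_functions response (assumption: only the
    "FunctionName" and "Runtime" keys are needed here).
    """
    return sorted(
        f["FunctionName"] for f in functions if f.get("Runtime") == runtime
    )


# Made-up sample data; in a real account this list would come from paging
# through boto3.client("lambda").get_paginator("list_functions").
sample_functions = [
    {"FunctionName": "nightly-report", "Runtime": "python3.7"},
    {"FunctionName": "image-resizer", "Runtime": "python3.11"},
    {"FunctionName": "csv-combiner", "Runtime": "python3.7"},
    {"FunctionName": "container-job"},  # image-based functions have no Runtime
]

outdated = find_functions_on_runtime(sample_functions)
print(outdated)
```

Keeping the filter separate from the AWS call makes it easy to check locally before pointing it at a real account.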
Hey, in this video we are going to learn the AWS Step Functions service through a real-time scenario. We will trigger a Step Functions state machine from Lambda as soon as a file is dropped into an S3 bucket. We have configured two Lambda functions in our state machine: the first will truncate the Redshift table and the second will copy […]
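The triggering Lambda in a setup like this usually reads the bucket and object key from the S3 event and passes them on as the state machine input. The sketch below shows only that step. `build_execution_input` and the `bucket`/`key` field names are hypothetical, not taken from the video, and the sample event is a trimmed-down, made-up S3 notification.

```python
import json
from urllib.parse import unquote_plus


def build_execution_input(event):
    """Build a JSON input string for a state machine from an S3 event.

    Object keys arrive URL-encoded in S3 notifications (spaces become '+'),
    so the key is decoded before being passed along.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = unquote_plus(record["s3"]["object"]["key"])
    # "bucket" and "key" are illustrative field names for the downstream steps.
    return json.dumps({"bucket": bucket, "key": key})


# Made-up, minimal S3 put event for illustration.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "incoming/sales+report.csv"},
            }
        }
    ]
}

execution_input = build_execution_input(sample_event)
print(execution_input)
# In the Lambda itself, this string would be passed as the `input` argument of
# boto3.client("stepfunctions").start_execution(stateMachineArn=..., input=...).
```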
import json
import boto3
import pandas as pd
from datetime import datetime
import s3fs
from urllib.parse import unquote

def lambda_handler(event, context):
    # TODO implement
    now = datetime.now()
    date_time = now.strftime("%Y-%m-%d_%H-%M-%S")
    s3 = boto3.client('s3')
    bucket = 'deltafrog-training-dev'
    src_Key = 'real_time_data/'
    dest_file = "s3://deltafrog-training-dev/combined_data/anual_combined_"
    archive_Key = 'archive/'
    # List every object under the source prefix
    res = s3.list_objects(Bucket=bucket, Prefix=src_Key)
    print(res)
    fname_list = []
    final_df = pd.DataFrame()
    if "Contents" in res:
        for i in res["Contents"]:
            print(i)
            # Keep only CSV objects and record the bare file name
            if "csv" in i['Key']:
                filename = i['Key'].split('/')[-1]
                print(filename)
                fname_list.append(filename)
                […]
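The listing loop above keeps any key that contains "csv", which would also match names such as csv_notes.txt. Here is a minimal sketch of a stricter filter, assuming the response has the same "Contents"/"Key" shape that `list_objects` returns. `csv_filenames` is a hypothetical helper and the sample response is made up.

```python
def csv_filenames(list_response, suffix=".csv"):
    """Return bare file names of objects whose key ends with `suffix`.

    `list_response` is a dict shaped like an S3 list_objects response.
    A missing "Contents" key (empty prefix) yields an empty list.
    """
    names = []
    for obj in list_response.get("Contents", []):
        key = obj["Key"]
        # Case-insensitive suffix match instead of a substring match.
        if key.lower().endswith(suffix):
            names.append(key.split("/")[-1])
    return names


# Made-up response for illustration; the first entry mimics the "folder" placeholder key.
sample_response = {
    "Contents": [
        {"Key": "real_time_data/"},
        {"Key": "real_time_data/jan.csv"},
        {"Key": "real_time_data/csv_notes.txt"},
        {"Key": "real_time_data/feb.CSV"},
    ]
}

found = csv_filenames(sample_response)
print(found)
```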