Author Archives: Sumit Kumar

Getting Started with dbt Seeds in Databricks: A Beginner-Friendly Guide

Posted by Sumit Kumar

If you’ve started your dbt journey and successfully connected dbt Core with Databricks, congratulations! 🎉 The next feature to learn is dbt Seeds: CSV files in your project that the dbt seed command loads into your warehouse as tables. Seeds are one of the simplest yet most powerful features in dbt, especially for reference data, lookup tables, and demo datasets. In this blog, we’ll explore what dbt Seeds […]
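As a minimal sketch, a seed is simply a CSV file placed in the project's seeds/ directory (dbt's default seed path). The helper below writes one with Python's csv module; the project directory, the seed name country_codes and its rows are hypothetical examples, not taken from the original post.

```python
import csv
from pathlib import Path


def write_seed(project_dir, seed_name, header, rows):
    """Write <project_dir>/seeds/<seed_name>.csv, dbt's default seed location.

    The seed name and data are illustrative; any CSV placed in the
    seeds/ folder is picked up by `dbt seed`.
    """
    seeds_dir = Path(project_dir) / "seeds"
    seeds_dir.mkdir(parents=True, exist_ok=True)
    path = seeds_dir / f"{seed_name}.csv"
    # newline="" lets the csv module control line endings
    with path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)  # the header row becomes the table's column names
        writer.writerows(rows)
    return path
```

For example, write_seed("my_dbt_project", "country_codes", ["country_code", "country_name"], [["IN", "India"], ["US", "United States"]]) creates the file; running dbt seed --select country_codes from the project directory then loads it as a table, and models can refer to it with {{ ref('country_codes') }}.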

Databricks ai_query

Posted by Sumit Kumar

Unlock the Power of AI in Your SQL Queries with Databricks ai_query()
1. Introduction: The Data Analyst’s Dilemma (and Solution)
Key Points:
The Problem: Data analysts constantly deal with unstructured data (text) like customer reviews, support tickets, and open-ended survey responses. Analyzing this data often requires complex, multi-step processes: 1) Export data from the Lakehouse, […]
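ai_query() takes the name of a model serving endpoint and a prompt, so the text analysis happens inside the SQL statement instead of after an export. Here is a minimal sketch that builds such a statement from Python, assuming the basic two-argument form ai_query(endpoint, request); the table, column and endpoint names are hypothetical, table and column names are treated as trusted identifiers, and string literals use backslash escaping (assumed default Databricks SQL parser behavior).

```python
def sql_string_literal(value):
    """Quote a Python string as a Databricks SQL literal using backslash escapes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_ai_query_sql(table, text_column, endpoint, instruction, limit=None):
    """Build a SELECT that sends instruction + each row's text to ai_query().

    `table` and `text_column` are inserted verbatim, so they must be trusted
    identifiers; only the endpoint name and the instruction are quoted.
    """
    sql = (
        f"SELECT {text_column}, "
        f"ai_query({sql_string_literal(endpoint)}, "
        f"CONCAT({sql_string_literal(instruction)}, {text_column})) AS ai_result "
        f"FROM {table}"
    )
    if limit is not None:
        sql += f" LIMIT {int(limit)}"  # cap rows while experimenting, since each row is a model call
    return sql
```

In a Databricks notebook the resulting string could be run with spark.sql(build_ai_query_sql("main.default.reviews", "review_text", "my-endpoint", "Classify the sentiment: ", limit=10)), assuming such a table and endpoint exist in your workspace.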

AWS Lambda Layer Essentials: Python Libraries to Optimize Your Serverless Stack

Posted by Sumit Kumar

Hello everyone, AWS Lambda is phasing out support for Python 3.7 now that Python 3.7 reached its end of life on June 27, 2023. To keep your functions running smoothly, AWS strongly recommends upgrading your Python 3.7 functions to Python 3.10 or Python 3.11 before November 27, 2023. AWS follows a two-stage process for ending support […]
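Before upgrading, it helps to know which functions still use the old runtime. A minimal sketch, assuming you feed it the pages returned by boto3's list_functions paginator; the function only reads each function's Runtime field, so it can be tried on sample data without an AWS account.

```python
# Runtime identifiers as they appear in a Lambda function's configuration
DEPRECATED_RUNTIMES = frozenset({"python3.7"})


def functions_on_runtimes(pages, runtimes=DEPRECATED_RUNTIMES):
    """Return the names of functions whose Runtime is in `runtimes`.

    `pages` is an iterable of list_functions responses, for example
    boto3.client("lambda").get_paginator("list_functions").paginate().
    """
    matches = []
    for page in pages:
        for fn in page.get("Functions", []):
            # Container-image functions have no "Runtime" key, hence .get()
            if fn.get("Runtime") in runtimes:
                matches.append(fn["FunctionName"])
    return matches
```

Once a flagged function has been tested on the newer version, its runtime can be switched with the Lambda client's update_function_configuration(FunctionName=name, Runtime="python3.11").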

Upload File into S3 and Send Notification Using SNS Python boto3

Posted by Sumit Kumar

import os
import pandas as pd
import boto3
from io import StringIO

data = [['Billu', 31], ['amit', 30], ['Mayank', 14], ['prabhat', 30]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

# Read credentials from the environment instead of hardcoding them in the script
ACCESS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRET_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]

def upload_s3(df):
    i = "test.csv"
    s3 = boto3.client("s3", aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY)
    # Serialize the DataFrame to an in-memory CSV buffer
    csv_buf = StringIO()
    df.to_csv(csv_buf, header=True, index=False)
    csv_buf.seek(0)
    # Write the CSV to s3://test-deltafrog-bucket/2021/test.csv
    s3.put_object(Bucket="test-deltafrog-bucket", Body=csv_buf.getvalue(), Key='2021/' + i)

######################SNS###############
ses = boto3.client('sns', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name='us-east-1')
sns_topicname_arn = "arn:aws:sns:us-east-1:222161883511:s3_upload_notification"
#Publish the message […]
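The upload-then-notify flow can be condensed into one function. This is a sketch, not the author's elided publish code: it takes the S3 and SNS clients as parameters (so it can be exercised with stand-in objects), uses the standard csv module instead of pandas, and the subject and message wording are hypothetical.

```python
import csv
from io import StringIO


def upload_and_notify(s3_client, sns_client, header, rows, bucket, key, topic_arn):
    """Upload rows as a CSV object to S3, then publish an SNS message about it.

    s3_client / sns_client are boto3 clients, e.g. boto3.client("s3") and
    boto3.client("sns", region_name="us-east-1"); credentials come from the
    default provider chain (environment, config files, or an IAM role).
    """
    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    s3_client.put_object(Bucket=bucket, Key=key, Body=buf.getvalue())

    # Notify the topic's subscribers (e.g. an email subscription)
    response = sns_client.publish(
        TopicArn=topic_arn,
        Subject="S3 upload complete",
        Message=f"Uploaded s3://{bucket}/{key} ({len(rows)} rows)",
    )
    return response["MessageId"]
```

Passing the clients in, rather than creating them inside, keeps credentials out of the function and makes the S3 and SNS calls easy to check in isolation.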

Send mail using SES AWS Python

Posted by Sumit Kumar

import os
import pandas as pd
from pretty_html_table import build_table
import boto3
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Read credentials from the environment instead of hardcoding them in the script
ACCESS_KEY = os.environ["AWS_ACCESS_KEY_ID"]
SECRET_KEY = os.environ["AWS_SECRET_ACCESS_KEY"]
ses = boto3.client('ses', aws_access_key_id=ACCESS_KEY, aws_secret_access_key=SECRET_KEY, region_name='us-east-1')

def send_mail(df):
    # Render the DataFrame as a styled HTML table
    body = build_table(df, 'blue_light')
    sender = "sumit8147085086@gmail.com"
    to = 'sumit8147085086@gmail.com'
    cc = 'sumit8147085086@gmail.com'
    # Turn the comma-separated address strings into one recipient list
    rcpt = cc.split(",") + to.split(",")
    #rcpt = to.split(",")
    message = MIMEMultipart()
    […]
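The post is cut off right after message = MIMEMultipart(). One common way to finish such a function is to attach the HTML table as a text/html part and send it with SES send_raw_email; the sketch below shows that approach, which may differ from the author's. The SES client is passed in, and the subject and addresses used with it are placeholders.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_html_message(sender, to, cc, subject, html_body):
    """Build a multipart message whose body is an HTML part (e.g. build_table output)."""
    message = MIMEMultipart()
    message["Subject"] = subject
    message["From"] = sender
    message["To"] = to
    message["Cc"] = cc
    message.attach(MIMEText(html_body, "html"))
    return message


def send_html_mail(ses_client, sender, to, cc, subject, html_body):
    """Send via SES send_raw_email; `to` and `cc` are comma-separated strings."""
    message = build_html_message(sender, to, cc, subject, html_body)
    addresses = [a.strip() for a in f"{to},{cc}".split(",") if a.strip()]
    recipients = list(dict.fromkeys(addresses))  # drop duplicates, keep order
    response = ses_client.send_raw_email(
        Source=sender,
        Destinations=recipients,
        RawMessage={"Data": message.as_string()},
    )
    return response["MessageId"]
```

With SES in sandbox mode, both the sender and every recipient address must be verified before send_raw_email succeeds.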

Kinesis_firehose_example

Posted by Sumit Kumar

sudo yum install -y aws-kinesis-agent
cd /etc/aws-kinesis/
sudo vi agent.json
sudo service aws-kinesis-agent start
sudo chkconfig aws-kinesis-agent on
python3 LogGenerator.py 1000
cd /var/log/aws-kinesis-agent/
tail -f aws-kinesis-agent.log

[ec2-user@ip-172-31-24-247 aws-kinesis]$ cat agent.json
{
  "cloudwatch.emitMetrics": true,
  "kinesis.endpoint": "",
  "firehose.endpoint": "",
  "flows": [
    {
      "filePattern": "/home/ec2-user/*.log*",
      "deliveryStream": "kinesis_log_s3"
    }
  ]
}

#########
import names
import random
import time
import […]
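The walkthrough runs python3 LogGenerator.py 1000 to produce files for the agent to tail, but the script itself is truncated. As a stand-in, here is a minimal sketch of a log generator using only the standard library (rather than the names package the original imports); the user names, actions and comma-separated line layout are hypothetical.

```python
import random
import time
from datetime import datetime, timezone

# Hypothetical sample values; the original LogGenerator.py is truncated
USERS = ["Billu", "amit", "Mayank", "prabhat"]
ACTIONS = ["login", "logout", "view_page", "purchase"]


def generate_log_lines(count, seed=None):
    """Return `count` lines of the form timestamp,user,action,amount."""
    rng = random.Random(seed)  # a seed makes the user/action/amount choices repeatable
    lines = []
    for _ in range(count):
        timestamp = datetime.now(timezone.utc).isoformat()
        lines.append(f"{timestamp},{rng.choice(USERS)},{rng.choice(ACTIONS)},{rng.randint(1, 500)}")
    return lines


def append_log_lines(path, count, delay_seconds=0.0, seed=None):
    """Append lines to `path`; /home/ec2-user/app.log would match the agent's filePattern."""
    with open(path, "a", encoding="utf-8") as fh:
        for line in generate_log_lines(count, seed):
            fh.write(line + "\n")
            fh.flush()  # make each line visible to the tailing agent immediately
            if delay_seconds:
                time.sleep(delay_seconds)
```

Calling append_log_lines("/home/ec2-user/app.log", 1000) mirrors the 1000 argument in the walkthrough; tailing aws-kinesis-agent.log should then show the agent picking up the new records for the kinesis_log_s3 delivery stream.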

Combining multiple files from a bucket and moving them into another folder

Posted by Sumit Kumar

import json
import boto3
import pandas as pd
from datetime import datetime
import s3fs
from urllib.parse import unquote

def lambda_handler(event, context):
    # TODO implement
    now = datetime.now()
    date_time = now.strftime("%Y-%m-%d_%H-%M-%S")
    s3 = boto3.client('s3')
    bucket = 'deltafrog-training-dev'
    src_Key = 'real_time_data/'
    dest_file = "s3://deltafrog-training-dev/combined_data/anual_combined_"
    archive_Key = 'archive/'
    # List the objects under the source prefix
    res = s3.list_objects(Bucket=bucket, Prefix=src_Key)
    print(res)
    fname_list = []
    final_df = pd.DataFrame()
    if "Contents" in res:
        for i in res["Contents"]:
            print(i)
            # Collect only the CSV files
            if "csv" in i['Key']:
                filename = i['Key'].split('/')[-1]
                print(filename)
                fname_list.append(filename)
                […]
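Because the handler is truncated, here is a self-contained sketch of the same combine-and-archive idea, not the author's code. It assumes every CSV shares one header row, pages through list_objects_v2 (a single list_objects call returns at most 1,000 keys), matches keys ending in .csv rather than any key containing "csv", uses the csv module instead of pandas and s3fs, and performs the move as a copy followed by a delete. The function names are hypothetical.

```python
import csv
import io


def select_csv_keys(keys, src_prefix):
    """Keep keys under src_prefix whose names end in .csv (case-insensitive)."""
    return [k for k in keys if k.startswith(src_prefix) and k.lower().endswith(".csv")]


def archive_key_for(key, src_prefix, archive_prefix):
    """Map e.g. real_time_data/a.csv to archive/a.csv."""
    return archive_prefix + key[len(src_prefix):]


def combine_csv_texts(texts):
    """Concatenate CSV documents that share a header, writing the header once."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header = None
    for text in texts:
        rows = list(csv.reader(io.StringIO(text)))
        if not rows:
            continue  # skip empty files
        if header is None:
            header = rows[0]
            writer.writerow(header)
        elif rows[0] != header:
            raise ValueError(f"header mismatch: {rows[0]} != {header}")
        writer.writerows(rows[1:])
    return out.getvalue()


def combine_and_archive(s3, bucket, src_prefix, archive_prefix, dest_key):
    """Combine every CSV under src_prefix into dest_key, then move the sources to archive_prefix."""
    keys = []
    # Page through the listing so prefixes with more than 1,000 objects are fully covered
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=src_prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    csv_keys = select_csv_keys(keys, src_prefix)
    if not csv_keys:
        return None

    texts = [s3.get_object(Bucket=bucket, Key=k)["Body"].read().decode("utf-8") for k in csv_keys]
    s3.put_object(Bucket=bucket, Key=dest_key, Body=combine_csv_texts(texts).encode("utf-8"))

    # Move each source file: copy it under the archive prefix, then delete the original
    for k in csv_keys:
        s3.copy_object(
            Bucket=bucket,
            Key=archive_key_for(k, src_prefix, archive_prefix),
            CopySource={"Bucket": bucket, "Key": k},
        )
        s3.delete_object(Bucket=bucket, Key=k)
    return dest_key
```

Inside the Lambda handler this could be called as combine_and_archive(boto3.client("s3"), bucket, src_Key, archive_Key, dest_key), with dest_key built from the combined_data/anual_combined_ prefix and the date_time string so that each run writes a new file.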