Databricks ai_query
Posted on by Sumit KumarUnlock the Power of AI in Your SQL Queries with Databricks ai_query()
1. Introduction: The Data Analyst’s Dilemma (and Solution)
Key Points:
- The Problem: Data analysts constantly deal with unstructured data (text) like customer reviews, support tickets, and open-ended survey responses. Analyzing this data often requires complex, multi-step processes: 1) Export data from the Lakehouse, 2) Write Python/Spark code, 3) Run a separate ML/LLM job, 4) Re-ingest the results. This is slow and complex.
- The Solution: Databricks SQL with the
ai_query()
function. This function allows you to call a large language model (LLM) directly within your SQL query, treating the LLM as a native, powerful SQL function. - The Power: You can now perform text summarization, sentiment analysis, entity extraction, and classification without ever leaving your SQL environment.
2. What is ai_query()
?
Content:
The ai_query()
function is a scalar SQL function that simplifies access to large language models served on the Databricks Platform, including open-source models (like DBRX, or custom-trained models) and powerful proprietary models.
The Syntax:
The function takes two main arguments:
Argument | Description | Example |
model_name |
The registered model endpoint (serving the LLM) in the Databricks Model Catalog. | "databricks-gpt-oss-20b" |
prompt_string |
The text prompt containing your instructions and the data you want to analyze. | "Summarize this review: 'The product is great...'" |
3. Practical Example: Real-Time Review Analysis
Let’s dive into a realistic use case: quickly analyzing customer feedback to understand sentiment and extract key issues.
Step 1: The Sample Data (Creating the Table)
We need a table of unstructured text data to work with.
-- Create a table to store product reviews
CREATE TABLE product_reviews (
review_id INT,
product_name STRING,
review_text STRING
);
-- Insert sample data
INSERT INTO product_reviews VALUES
(101, 'Wireless Mouse X1', 'The mouse is great, very ergonomic and the battery lasts a long time. Highly recommend!'),
(102, '4K Monitor Pro', 'Disappointed with the color accuracy. The reds are muted. Shipping was fast though.'),
(103, 'Noise-Cancelling Headphones', 'Best purchase this year! The noise cancellation is superb for my commute, and the sound quality is crisp.'),
(104, 'Wireless Mouse X1', 'It randomly disconnects, which is frustrating. Customer support hasn\'t been helpful.');
Step 2: The Magic Query (Sentiment and Summarization)
Here is where we use ai_query()
to iterate over every row, analyze the review_text
, and produce two new, structured fields: sentiment
and summary
.
SELECT
review_id,
product_name,
review_text,
-- 1. Use ai_query to extract the sentiment (We enforce a structured response)
ai_query(
"databricks-gpt-oss-20b",
CONCAT(
"Analyze the following customer review and respond with only one word: 'Positive', 'Negative', or 'Neutral'. Review: '",
review_text,
"'"
)
) AS sentiment,
-- 2. Use ai_query to summarize the key point
ai_query(
"databricks-gpt-oss-20b",
CONCAT(
"Summarize the main point of this review in a single, short sentence. Review: '",
review_text,
"'"
)
) AS summary
FROM
product_reviews;
Step 3: The Structured Result (What it Achieved)
The result is a new table-like output, instantly adding high-value, machine-generated analysis to your data:
review_id | product_name | review_text | sentiment | summary |
101 | Wireless Mouse X1 | …ergonomic and the battery lasts a long time. | Positive | The mouse is highly recommended for its ergonomics and battery life. |
102 | 4K Monitor Pro | Disappointed with the color accuracy… | Negative | The customer is disappointed with the color accuracy, specifically muted red tones. |
104 | Wireless Mouse X1 | It randomly disconnects… | Negative | The mouse randomly disconnects, and customer support was unhelpful. |
4. Other Powerful Use Cases for ai_query()
The possibilities extend far beyond simple sentiment analysis:
Use Case | Prompt Idea | Data Source Example |
Topic Tagging | “Identify the key topic (e.g., ‘Shipping’, ‘Pricing’, ‘Feature Request’) from this text.” | Customer Support Tickets |
Named Entity Recognition (NER) | “Extract all company names mentioned in this news article.” | News Feeds/Documents |
Data Cleaning/Normalization | “Normalize the following address format into ‘Street, City, State, Zip’.” | Unstructured Address Fields |
Translation | “Translate the following text into Spanish.” | Multi-lingual Customer Feedback |
5. Conclusion: Bringing AI to the Data Layer
The ai_query()
function fundamentally changes the interaction between data professionals and advanced AI.
- Speed: No more data movement or complex orchestration. Insights are generated in real-time.
- Simplicity: Leverage the power of cutting-edge LLMs using the language you already know: SQL.
- Scalability: Since this runs natively in the Databricks Lakehouse, it scales effortlessly from a few rows to petabytes of data.
Start experimenting with ai_query()
today and transform how you extract value from your unstructured data!
Leave a Reply