Databricks ai_query

Posted on by Sumit Kumar

Unlock the Power of AI in Your SQL Queries with Databricks ai_query()

1. Introduction: The Data Analyst’s Dilemma (and Solution)

Key Points:

  • The Problem: Data analysts constantly deal with unstructured data (text) like customer reviews, support tickets, and open-ended survey responses. Analyzing this data often requires complex, multi-step processes: 1) Export data from the Lakehouse, 2) Write Python/Spark code, 3) Run a separate ML/LLM job, 4) Re-ingest the results. This is slow and complex.
  • The Solution: Databricks SQL with the ai_query() function. This function allows you to call a large language model (LLM) directly within your SQL query, treating the LLM as a native, powerful SQL function.
  • The Power: You can now perform text summarization, sentiment analysis, entity extraction, and classification without ever leaving your SQL environment.

2. What is ai_query()?

Content:

The ai_query() function is a scalar SQL function that simplifies access to large language models served on the Databricks Platform, including open-source models (like DBRX, or custom-trained models) and powerful proprietary models.

The Syntax:

The function takes two main arguments:

Argument Description Example
model_name The registered model endpoint (serving the LLM) in the Databricks Model Catalog. "databricks-gpt-oss-20b"
prompt_string The text prompt containing your instructions and the data you want to analyze. "Summarize this review: 'The product is great...'"

 

3. Practical Example: Real-Time Review Analysis

 

Let’s dive into a realistic use case: quickly analyzing customer feedback to understand sentiment and extract key issues.

 

Step 1: The Sample Data (Creating the Table)

 

We need a table of unstructured text data to work with.

SQL

-- Create a table to store product reviews
CREATE TABLE product_reviews (
  review_id INT,
  product_name STRING,
  review_text STRING
);

-- Insert sample data
INSERT INTO product_reviews VALUES
(101, 'Wireless Mouse X1', 'The mouse is great, very ergonomic and the battery lasts a long time. Highly recommend!'),
(102, '4K Monitor Pro', 'Disappointed with the color accuracy. The reds are muted. Shipping was fast though.'),
(103, 'Noise-Cancelling Headphones', 'Best purchase this year! The noise cancellation is superb for my commute, and the sound quality is crisp.'),
(104, 'Wireless Mouse X1', 'It randomly disconnects, which is frustrating. Customer support hasn\'t been helpful.');

 

Step 2: The Magic Query (Sentiment and Summarization)

 

Here is where we use ai_query() to iterate over every row, analyze the review_text, and produce two new, structured fields: sentiment and summary.

SQL

SELECT
  review_id,
  product_name,
  review_text,
  -- 1. Use ai_query to extract the sentiment (We enforce a structured response)
  ai_query(
    "databricks-gpt-oss-20b",
    CONCAT(
      "Analyze the following customer review and respond with only one word: 'Positive', 'Negative', or 'Neutral'. Review: '",
      review_text,
      "'"
    )
  ) AS sentiment,
  -- 2. Use ai_query to summarize the key point
  ai_query(
    "databricks-gpt-oss-20b",
    CONCAT(
      "Summarize the main point of this review in a single, short sentence. Review: '",
      review_text,
      "'"
    )
  ) AS summary
FROM
  product_reviews;

 

Step 3: The Structured Result (What it Achieved)

 

The result is a new table-like output, instantly adding high-value, machine-generated analysis to your data:

review_id product_name review_text sentiment summary
101 Wireless Mouse X1 …ergonomic and the battery lasts a long time. Positive The mouse is highly recommended for its ergonomics and battery life.
102 4K Monitor Pro Disappointed with the color accuracy… Negative The customer is disappointed with the color accuracy, specifically muted red tones.
104 Wireless Mouse X1 It randomly disconnects… Negative The mouse randomly disconnects, and customer support was unhelpful.

 

4. Other Powerful Use Cases for ai_query()

 

The possibilities extend far beyond simple sentiment analysis:

Use Case Prompt Idea Data Source Example
Topic Tagging “Identify the key topic (e.g., ‘Shipping’, ‘Pricing’, ‘Feature Request’) from this text.” Customer Support Tickets
Named Entity Recognition (NER) “Extract all company names mentioned in this news article.” News Feeds/Documents
Data Cleaning/Normalization “Normalize the following address format into ‘Street, City, State, Zip’.” Unstructured Address Fields
Translation “Translate the following text into Spanish.” Multi-lingual Customer Feedback

 

5. Conclusion: Bringing AI to the Data Layer

 

The ai_query() function fundamentally changes the interaction between data professionals and advanced AI.

  • Speed: No more data movement or complex orchestration. Insights are generated in real-time.
  • Simplicity: Leverage the power of cutting-edge LLMs using the language you already know: SQL.
  • Scalability: Since this runs natively in the Databricks Lakehouse, it scales effortlessly from a few rows to petabytes of data.

Start experimenting with ai_query() today and transform how you extract value from your unstructured data!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

*