Merge option in Hive with Example

Posted on by Sumit Kumar

Wow! Merge in Hive ? Yes , after the successful release of hive 2.2.X merge is also possible in hive now. Today I will walk you through one simple example that will clear merge concept in hive.

Merge option in Hive with Example

What is Merge option in hive:-

With Merge option we can perform record level insert,update and delete in hive table.

For example suppose we have one table(final table) and incremental data is keep coming from source table using sqoop . Now our requirement is , if we get new record, we have to insert into final table or if we get record that is already available in final table , we have to update the final table record with updated value that is coming from source table.

Above example is called upserts .It is combination of update and insert.

To support Merge option in hive we have to create bucketed table in ORC format and need to enable ACID property as below:–

SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
set hive.auto.convert.join=false;

create table customer_name
(id int, name string, state string)
clustered by (id) into 2 buckets stored as orc
tblproperties(“transactional”=”true”);

Lets insert some records in customer_name table:–

INSERT INTO customer_name VALUES (1, 'sumit', 'bihar'),(2, 'amit', 'bihar');

Create incremental table(all_updates) and insert one records.

INSERT INTO customer_name VALUES (1, 'prakash', 'panjab');

Below Merge query will update customer_name record with ID 1 with the value of table all_updates record.

Option 1:-

merge into customer_name
using ( select * from all_updates) sub
on sub.id = customer_name.id
when matched then update set name = sub.name, state = sub.state
when not matched then insert values (sub.id, sub.name, sub.state);

Option 2:-

merge into customer_name
using all_updates on customer_name.id = all_updates.id
when matched then update set
name=all_updates.name,
state=all_updates.state
when not matched then insert
values(all_updates.id, all_updates.name,
all_updates.state);

After executing merge query , final table (customer_name) will have updated value.

Hope this simple example will help you to understand Merge in hive.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

*