====== Merge in Hive and Impala ====== Suppose you want to load data into an already populated Hive table and you want to apply data as update of existing records and insert of new records. Neither Hive nor Impala query language includes ''MERGE'' command (or ''UPSERT'' known from Teradata). Also it is a good practice to ovewrite a whole table partition at once. Here is the command which performs what you would achieve in RDBMS using ''MERGE'' or ''UPSERT'' command. Assumptions: - MY_SOURCE_TABLE has the same primary key as MY_TARGET_TABLE - MY_SOURCE_TABLE may be a subquery. In this case resulting data must not violate target table's primary key. -- on Hive set hive.exec.dynamic.partition.mode=true set hive.exec.dynamic.partition.mode=nonstrict; -- on Hive INSERT OVERWRITE TABLE MY_TARGET_TABLE PARTITION (PARTITION_COLUMN) SELECT COALESCE(SRC.COLUMN_1, TGT.COLUMN_1) AS COLUMN_1, COALESCE(SRC.COLUMN_2, TGT.COLUMN_2) AS COLUMN_2, COALESCE(SRC.COLUMN_3, TGT.COLUMN_3) AS COLUMN_3, COALESCE(SRC.PARTITION_COLUMN, TGT.PARTITION_COLUMN) AS PARTITION_COLUMN FROM MY_SOURCE_TABLE SRC FULL OUTER JOIN MY_TARGET_TABLE TGT ON TGT.KEY_COLUMN_1 = SRC.KEY_COLUMN_1 AND TGT.KEY_COLUMN_2 = SRC.KEY_COLUMN_2 WHERE coalesce(SRC.COLUMN_3, TGT.COLUMN_3) = 'something' /* filter condition */ ; On Hive you must set properties:\\ ''set hive.exec.dynamic.partition.mode=true''\\ ''set hive.exec.dynamic.partition.mode=nonstrict''\\