Friday, November 14, 2014

Sqoop has inconsistent behavior with creating a Hive table if it doesn’t already exist

If you want to import data into HDFS via Sqoop and have it auto-create the Hive table schema, currently (1.4.5) saved Sqoop jobs will fail if the Hive table already exists, while running the same “sqoop import” command directly will complete successfully.
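Roughly what I mean, with the connect string, table, and job names here just as placeholders:

# running the import directly works, even when the Hive table is already there:
sqoop import --connect jdbc:mysql://dbhost/sales --username scott -P \
    --table orders --hive-import

# but save the equivalent import as a job and execute it, and it dies once
# the Hive table exists:
sqoop job --create orders-import -- import \
    --connect jdbc:mysql://dbhost/sales --username scott -P \
    --table orders --hive-import
sqoop job --exec orders-import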


The reason is that when a Sqoop job is created, a row with the “PROPNAME” of hive.fail.table.exists is saved with a value of true in the Sqoop metastore DB (the HSQLDB instance that backs saved jobs). If you update that row via SQLTool so it’s set to false, the jobs will run fine.
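You can see it for yourself in the SQOOP_SESSIONS table that backs saved jobs (the column names below are what I found poking around the metastore; they may differ in other versions):

-- show what each saved job has recorded for this property
select job_name, propname, propval
  from sqoop_sessions
 where propname = 'hive.fail.table.exists';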


I can’t find anything in the docs that indicates how you can control this behavior. I’ve had to either run “update” statements manually against the HSQLDB instance, or, just before creating the Sqoop job, run a one-off “sqoop import” with --create-hive-table and a query along the lines of “where 1=0” so the table gets created empty, and then leave --create-hive-table out of the saved job’s import config.
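In other words, something along these lines (again, connection details and names are placeholders, and the free-form query route needs --hive-table, --target-dir, and the literal $CONDITIONS token that Sqoop insists on):

# one-off import that creates the Hive table schema but moves no rows
sqoop import --connect jdbc:mysql://dbhost/sales --username scott -P \
    --query 'select * from orders where 1=0 and $CONDITIONS' \
    --target-dir /tmp/orders_schema_only -m 1 \
    --hive-import --hive-table orders --create-hive-table

# then save the recurring job without --create-hive-table
sqoop job --create orders-import -- import \
    --connect jdbc:mysql://dbhost/sales --username scott -P \
    --table orders --hive-import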


I think we just need to add another configurable parameter that lets you specify whether the import should fail when the Hive table already exists.


Here’s the SQL statement that flips the flag so the jobs won’t fail if the table already exists:

update sqoop_sessions set propval = 'false' where propname = 'hive.fail.table.exists';


I submitted my very first Apache Jira bug for this :) http://ift.tt/1174byG




