Wednesday, November 26, 2014

Why you need to upload a hive-site.xml file for each Oozie workflow Sqoop action

At the bottom of every action config window there is a field that says “Job XML”. These sorts of things have always scared me. Well, here’s what it means in Hadoop-land.


If you leave this field empty and set up a Sqoop task, that task can run along just fine, happy as can be… until it needs to do something involving Hive. At that point it has no idea what to do, because it doesn’t know anything about where Hive is. And that’s why the hive-site.xml file has to be specified there. You click the dot-dot-dot and upload a file: the hive-site.xml file you get from here:


Cloudera Manager > Cluster > Hive > Actions drop-down on the top right > “Download Client Configuration” > in that zip file will be hive-site.xml. That’s the file you upload. That’s the file that defines where anything Hive-related will be.
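
For the curious, here is roughly what that field turns into in the workflow definition. This is only a sketch, assuming the "Job XML" field maps to the `<job-xml>` element of the Sqoop action and that hive-site.xml was uploaded into the workflow's own directory. The action name, the JDBC URL, the table name, and the `end`/`kill` transition targets are all made up for illustration.

```xml
<action name="sqoop-to-hive">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- The "Job XML" field: here a path relative to the workflow directory -->
        <job-xml>hive-site.xml</job-xml>
        <!-- Hypothetical import; the Hive step is the part that needs hive-site.xml -->
        <command>import --connect jdbc:mysql://db.example.com/shop --table orders --hive-import</command>
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

Without the `<job-xml>` line, the import itself can still run; it is the Hive step at the end that has nothing to go on.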


You could get fancy and store your hive-site.xml (and all the other *-site.xml files) in some HDFS folder that you just point to, but that’s fancy and I’m not ready for that ;)
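
If I ever do get fancy, I'd expect the only change to be the path in the Sqoop action's `<job-xml>` element: a full HDFS path instead of a file sitting next to the workflow. A sketch, assuming a shared folder that I invented for the example (`/user/shared/conf` is hypothetical):

```xml
<!-- Hypothetical shared location; every workflow would point at the same file -->
<job-xml>${nameNode}/user/shared/conf/hive-site.xml</job-xml>
```

The appeal is that a changed Hive config would only need to be re-uploaded once, not once per workflow.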




