Figured I’d consolidate what I did to get data into a Hadoop cluster with SSIS/SSDT. I used SSDT 2013 – the BI version install. In this case, it was a Hortonworks Hadoop cluster.
1. Install both the 32 and 64 bit versions of the Hortonworks Hive ODBC driver: http://ift.tt/1wsXYrp
– edit the DSNs the installer should have already created so they point at whichever node is running the HiveServer2 service (you can find it in Ambari)
– you have to edit the 32 bit one via the 32 bit ODBC Data Source Administrator and the 64 bit DSN via the 64 bit ODBC Data Source Administrator
– In “Authentication”, I could only get it to work with “username” or “username and password”. “No Authentication” resulted in the connection test just hanging for a long time. I used the “hive” account creds as defined in Ambari (if your test hangs too, there’s a quick beeline check below to confirm HiveServer2 is reachable at all)
2. At this point you must already have a Hive table defined. Just map columns from the data flow to the columns in the Hive table and you should be good to go.
– Now, to be honest, I keep getting a permissions issue when writing data to the table. I haven’t solved it yet; I’ll try to update later with what works. In the meantime, the HDFS permission checks below are where I’d start looking.
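On that permissions error: I haven’t confirmed the fix yet, so treat this as a guess, but the usual suspect is ownership of the table’s directory in HDFS. On HDP the warehouse normally sits under /apps/hive/warehouse (check hive.metastore.warehouse.dir in Ambari if yours differs), and “mytable” below is just a stand-in for whatever your table is called:
– hdfs dfs -ls /apps/hive/warehouse
– hdfs dfs -chown -R hive:hdfs /apps/hive/warehouse/mytable
– hdfs dfs -chmod -R 775 /apps/hive/warehouse/mytable
Run those as the hdfs superuser (su hdfs first). If the table directory already belongs to the account you put in the DSN, the problem is probably elsewhere.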
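Also, going back to step 1: when the DSN test just hangs, it’s worth confirming that HiveServer2 is reachable at all before blaming the driver. Here’s a quick check with beeline from one of the cluster nodes, assuming HiveServer2 is listening on its default port of 10000 (swap in your own hostname and credentials):
– beeline -u jdbc:hive2://yourhiveservernode:10000 -n hive
– show databases;
– !quit
If beeline can’t connect either, the DSN was never going to work; sort out the HiveServer2 host/port first.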
How do you create a table in Hive? You can use Hue or HCat or whatever web-based tool may be installed on your Hadoop cluster. But here is how you can do it via the command-line:
1. SSH into one of the nodes on your cluster that’s running Hive
2. switch to a user that has permissions in Hive:
– su hive
3. start hive
– hive
4. Once you’re in the Hive shell, it’s a lot like being logged in to a mysql client; generally the same commands are available:
– show databases;
– use default;
– create table mytable (Year int, Month int);
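That create table line is the bare minimum, and “mytable”, Year, and Month are just illustrative names. A slightly fuller sketch, assuming you’ll be loading comma-delimited text, would look like this:
– create table mytable (Year int, Month int) row format delimited fields terminated by ',' stored as textfile;
– describe mytable;
– show tables;
Once the table exists, it’s what you map the SSIS data flow columns to in step 2 above.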