Monday, October 13, 2014

Importing data from SQL Server to Cloudera Hadoop via Sqoop2

Critical points that may or may not be obvious:


1. If you’re using the Sqoop stuff in the web-based interface, you’re actually using Sqoop2


2. You have to download and install the JDBC driver for SQL Server yourself

– curl -L ‘http://ift.tt/1CfDcMU; | tar xz

– sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop2/

– while you’re at it, you may as well put it in the sqoop directory too: sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/


3. Sqoop2 home directory is /var/lib/sqoop2/ (maybe not…….)


4. restart Sqoop2 service after copying in the JDBC driver file:

– sudo service sqoop2-server restart


5. connection string in “Manage Connections” is like this: jdbc:sqlserver://192.168.1.102


6. for an action, leave schema and table name fields blank and just paste in your TSQL query, then append this to the end of it: +and+${CONDITIONS}. Don’t mess with the boundary query stuff until some other time (Sqoop2 will automatically query for the min/max of the Partition column name you provide).


7. if you mess with the connection you create in Hue/Sqoop2, note you have to type in the password


8. if you get errors, don’t fight it – you have to log in via SSH and look at /var/log/sqoop2/sqoop2.log





No comments:

Post a Comment