MMO Worlds: Importing data from SQL Server to Cloudera Hadoop via Sqoop2

Monday, October 13, 2014

Importing data from SQL Server to Cloudera Hadoop via Sqoop2

Critical points that may or may not be obvious:

1. If you’re using the Sqoop stuff in the web-based interface, you’re actually using Sqoop2

2. You have to download and install the JDBC driver for SQL Server yourself

– curl -L ‘http://ift.tt/1CfDcMU; | tar xz

– sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop2/

– while you’re at it, you may as well put it in the sqoop directory too: sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/

3. Sqoop2 home directory is /var/lib/sqoop2/ (maybe not…….)

4. restart Sqoop2 service after copying in the JDBC driver file:

– sudo service sqoop2-server restart

5. connection string in “Manage Connections” is like this: jdbc:sqlserver://192.168.1.102

6. for an action, leave schema and table name fields blank and just paste in your TSQL query, then append this to the end of it: +and+${CONDITIONS}. Don’t mess with the boundary query stuff until some other time (Sqoop2 will automatically query for the min/max of the Partition column name you provide).

7. if you mess with the connection you create in Hue/Sqoop2, note you have to type in the password

8. if you get errors, don’t fight it – you have to log in via SSH and look at /var/log/sqoop2/sqoop2.log

MMO Worlds

Monday, October 13, 2014

Importing data from SQL Server to Cloudera Hadoop via Sqoop2

No comments:

Post a Comment