Friday, October 31, 2014

Installing MySQL as the external database for Hive, Impala, etc. in Cloudera Hadoop distribution

I intend to have 2 separate namenode servers, and on each one I will be running a MySQL instance. Hive metastore data, Impala metadata, namenode info, Cloudera Manager, and other roles will all use these DBs. Not sure how it'll all go yet, but my first attempt last week didn't go so well.


Read through this, then at each step, read ahead a few steps, as the info is a bit spread out: http://ift.tt/1wPj4BF


Do yourself a favor and disable selinux before installing. Otherwise you’ll likely see this error when mysql fails to start:

Fatal error: Can't open and lock privilege tables: Table 'mysql.host' doesn't exist

(it's due to permissions on the /var/lib/mysql folder)

– nano /etc/selinux/config

– change "SELINUX=enforcing" to "SELINUX=disabled"

– reboot machine

– yum remove mysql-server

– rm -rf /var/lib/mysql

– yum install mysql-server


1. yum install mysql mysql-server

2. in /etc/my.cnf (maybe make a copy of your original, e.g. cp my.cnf my.cnf.orig)

– paste in the sample my.cnf file contents from here (the paths should match the default install paths, but check just in case): http://ift.tt/1wPj4BF

– uncomment the binlog lines and set binlog_format to "mixed" (otherwise it'll error out when your cluster starts up) – see the snippet just below
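
For reference, after that edit the relevant lines in my /etc/my.cnf looked roughly like this (the log_bin path comes from Cloudera's sample file and is an assumption – point it wherever you keep your binlogs):

log_bin=/var/lib/mysql/mysql_binary_log
binlog_format=mixed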

3. service mysqld start

4. chkconfig mysqld on

5. /usr/bin/mysql_secure_installation

6. Install Extra Packages for Enterprise Linux (EPEL repo)

– wget http://ift.tt/1htTSIk

– sudo rpm -Uvh epel-release-6*.rpm

7. Copy in the mysql connector (do this on both machines)

– mkdir /usr/share/java

– cp mysql-connector-java-5.1.33/mysql-connector-java-5.1.33-bin.jar /usr/share/java/mysql-connector-java.jar

– note that the filename changes when you copy it (if it isn't exactly mysql-connector-java.jar you will get errors when testing DB connections)
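
A quick sanity check on each machine – just a sketch, but the point is that the jar has to exist at exactly that path and name:

ls -l /usr/share/java/mysql-connector-java.jar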

8. At this point I installed Cloudera Manager on machine1

– from here: http://ift.tt/1wPj4BJ

– wget http://ift.tt/1eEtSMH

– chmod u+x cloudera-manager-installer.bin

– ./cloudera-manager-installer.bin

– the installer downloads everything you need

9. I’m having it install Java

– later I’ll install the mysql connector

– I had one node fail during installation – just click "Retry Failed Hosts" a few times; it was likely a transient connectivity issue

10. Now for the Cluster Role Assignments step

– All services that require a DB will be on the hosts that have mysql installed and running

– NameNode – first server

– Secondary NameNode – second server

– Cloudera Management Server Activity Monitor – second server

11. Database setup

– “Use Custom Database”

– Database Type: MySQL

– create databases and users for each service listed at the bottom of this page (a consolidated sketch of these statements appears at the end of this step): http://ift.tt/1wPj4BF

– this page provides another description of this process: http://ift.tt/1qb7xXE

– from inside a mysql client on the host designated as storing the activity monitor data:

– create database amon DEFAULT CHARACTER SET utf8;

– grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'amoncrazypassword';

– from inside a mysql client on the host mentioned as storing the hive metadata:

– create database hive DEFAULT CHARACTER SET utf8;

– grant all on hive.* TO 'hive'@'%' IDENTIFIED BY 'hivecrazypassword';

– In the “Database Host Name:” field, mine looked like this:

– ":3306" since that's the port mysql was running on

– I’d recommend not explicitly specifying a hostname because there are a lot of configs the installer auto-configures.
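
For reference, here is the general pattern I followed for each service database, wrapped up as a shell heredoc. The database names, users and passwords below are placeholders – use the ones listed on the Cloudera docs page linked above:

mysql -u root -p <<'SQL'
-- repeat a pair like this for each service (amon, hive, etc.)
CREATE DATABASE amon DEFAULT CHARACTER SET utf8;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amoncrazypassword';
CREATE DATABASE hive DEFAULT CHARACTER SET utf8;
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hivecrazypassword';
FLUSH PRIVILEGES;
SQL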

12. You should get some nice green check marks indicating successful connections. If not:

– service iptables status

– check the port mysql is configured to listen on (default is 3306)

– service mysqld status

– make sure the right database and user are created on the appropriate host
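
A quick way to test connectivity by hand – the hostname, user and password here are placeholders:

# from the host running the service, confirm you can reach the right database remotely
mysql -h mysql-host1 -P 3306 -u amon -p -e 'show databases;'
# on the mysql host itself, confirm mysqld is actually listening on 3306
netstat -tlnp | grep mysqld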

13. If the install finishes smoothly, make sure to set up backups.

– I've seen it fail on the "Creating Hive Metastore Database Tables" step with an error about there needing to be a hostname configured. Just open up a new tab pointed at your Cloudera Manager and look for the red configuration alert next to the Hive service in your cluster. It should take you directly to the offending config. In my case, I clicked the button to revert it to the default, then went back to the Cluster Setup tab and clicked Retry. Things were happy again.

14. I’d suggest the first thing you do is restart your cluster

– there are usually some errors that a restart will clear up

15. Hue creates its own embedded database. If you want to have it run on your mysql instances:

– for Hue: http://ift.tt/1qb7vPy

– from a mysql client on the host running hue:

– create database hue DEFAULT CHARACTER SET utf8;

– grant all on hue.* TO 'hue'@'%' IDENTIFIED BY 'huecrazypassword';

– in Cloudera Manager click on the Hue Service > Configuration tab at the top > look for “Database” configs on the left

– update appropriately

16. Honestly, expect the startup of a new cluster to feel a bit like whack-a-mole with errors on services. Just click on them, look at the logs, and restart them a few times. The HDFS and HBase services, if not working right, can cause other services to error out, especially the health checks. In other words, focus on getting HDFS and HBase healthy first.





Hive Metastore Canary – fails to create a database during test

Thursday, October 30, 2014

Mounting NFS share on Windows 8.1

MapR has a fantastic feature where it exposes the Hadoop file system directly via NFS. So you can read and write to the Hadoop cluster directly from client machines, rather than having to learn and use other tools. Now, to be fair, Hortonworks and Cloudera have NFS share features too. The difference is that they buffer the incoming data before actually writing it to the cluster; with MapR, the data is written as it's received. MapR should have better performance with NFS shares too.


The use cases for DWH are pretty obvious: if you can export csv files and have them simply write directly to the Hadoop cluster, then you’ve cut out an entire step in the data workflow. No need to use special ODBC drivers in SSIS to write data to Hadoop, no need to move very large files around over your network. You can even mount the NFS share on your web servers and have them log directly to that mount point. As the data is laid down, it is queryable too :)


It's easy enough to mount an NFS share in Linux and OS X, but for Windows you either have to install some third-party NFS client, or you can do this:


Enable the NFS client features in Windows – search for Add Features


regedit.exe

HKEY_LOCAL_MACHINE > SOFTWARE > Microsoft > ClientForNFS > Default > add 2 DWORDS:

AnonymousGid

AnonymousUid

Both values need to be 0, which is the default when you create a new DWORD.

from a command line:

"nfsadmin client restart" (or reboot)


Now when you browse to your MapR machine's IP via Explorer, you will see the shared folders. You can copy stuff in or out, delete, etc. Note, I've only ever seen 30 MB/s at the fastest.


MapR docs on the subject:


http://ift.tt/1yKUuBn


Notes on security:

I haven't looked into it yet, but it'd probably be a good idea to restrict which hosts can hit which folders on the Hadoop cluster. I haven't messed with how MapR exports the NFS shares, but I suppose IP-based restrictions should be a reasonable start.





Wednesday, October 29, 2014

Mapr Install redux

Dynamic Hive table schema based on HBase column family

Goal:

Let data get firehosed into HBase, then auto-generate and maintain a Hive external table schema based on the actual key-value pairs in the HBase column families. Nobody seems to be doing this much, but it looks like it could be a general solution for data warehousing (as long as you can get all the data into JSON format in an HBase table).


How to generate dynamic Hive tables based on JSON:


http://ift.tt/1n5r1d0


The good JSON serde:


http://ift.tt/1eVPKm1


http://ift.tt/1p3WW62


Generate a Hive schema based on a “curated” representative JSON doc:


http://ift.tt/104aylj


http://ift.tt/1u2Mlp4


In a comment on his own post, the author shows how to create an external table pointed at an HBase table that returns everything as JSON. It may be useful as the source of the "curated" JSON doc (a rough sketch of that kind of DDL follows the link below):


http://ift.tt/104aylk
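
For my own notes, the kind of DDL he's describing looks roughly like this – an external Hive table where a whole HBase column family comes back as a map of key-value pairs (table and column family names here are made up):

CREATE EXTERNAL TABLE hbase_raw (rowkey STRING, cf MAP<STRING,STRING>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:")
TBLPROPERTIES ("hbase.table.name" = "mytable");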


I’ll update on progress…





Tuesday, October 28, 2014

MapR – Architectural Overview

I found this video by the CTO of MapR to explain very well how MapR does things.


In short, there are no NameNodes to die in your Hadoop cluster, their MapR-DB seems even better than HBase, their NFS allows random read-write, and everything on the cluster is HA (still not sure exactly why there are a limited number of control nodes though – more to learn).


https://www.mapr.com/resources/videos/architectural-overview-mapr





Getting data into Hadoop from Windows via SSIS (SSDT) and a DSN

Figured I’d consolidate what I did to get data into a Hadoop cluster with SSIS/SSDT. I used SSDT 2013 – the BI version install. In this case, it was a Hortonworks Hadoop cluster.


1. Install both the 32 and 64 bit versions of the Hortonworks Hive ODBC driver: http://ift.tt/1wsXYrp

– edit the DSNs that should already be in there from the installer to point at whichever node is running a HiveServer2 service (find it in Ambari)

– you have to edit the 32 bit one via the 32 bit ODBC Data Source Administrator and the 64 bit DSN via the 64 bit ODBC Data Source Administrator

– In “Authentication”, I could only get it to work with “username” or “username and password”. “No Authentication” resulted in a test just hanging for a long time. I used the “hive” account creds as defined in Ambari


2. At this point you must have a Hive table schema already existing. Just map columns from the data flow to columns in the Hive table schema and you should be good to go.

– Now, to be honest, I keep getting a permissions issue with writing data to the table. I haven’t solved it yet, but it may be problematic. I’ll try to update later with what works.


How do you create a table in Hive? You can use Hue or HCat or whatever web-based tool may be installed on your Hadoop cluster. But here is how you can do it via the command-line:


1. SSH into one of the nodes on your cluster that’s running Hive

2. switch to a user that has permissions in Hive:

– su hive

3. start hive

– hive

4. Once you're in hive, it's just like being logged in to a mysql client – generally the same commands are available

– show databases;

– use default;

– create table mytable (year int, month int); (the table and column names are just placeholders – a fuller example follows)
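
A slightly fuller sketch of what that create statement might look like for the SSIS use case – the table and column names are invented, and the delimited-text format is just one reasonable choice:

create table flight_data (
  year int,
  month int,
  carrier string,
  passengers int
)
row format delimited
fields terminated by ','
stored as textfile;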





Monday, October 27, 2014

Installing Hortonworks Hadoop Ambari Server

Installing their manager was simple enough:


Install steps here: http://ift.tt/1Dm56HJ


You can use any VM – you don't have to follow the Vagrant steps described there.


1. install the Hortonworks repo:

– wget http://ift.tt/1fESf8t

– cp ambari.repo /etc/yum.repos.d

2. make sure you set up your hosts file at /etc/hosts

3. install the ambari-server

– yum install ambari-server

– ambari-server setup

– just go with defaults

4. start ambari-server

– ambari-server start

– it’ll fail if your hosts file doesn’t include the hostname of the machine you’re running it on
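
A quick sanity check before starting it – the FQDN that comes back should be resolvable via your hosts file:

hostname -f
cat /etc/hosts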


5. in order to add machines to a cluster you need to have the ambari agent installed on each machine

http://ift.tt/1Dm56HL

– the Ambari wizard will do it automatically ONLY if you have ssh keys installed

– if you don’t have ssh keys installed you will have to manually install the ambari agent on each machine

– wget http://ift.tt/1fESf8t

– cp ambari.repo /etc/yum.repos.d

– yum install ambari-agent

– point the agent at the Ambari server

– nano /etc/ambari-agent/conf/ambari-agent.ini
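
The relevant bit of ambari-agent.ini looks roughly like this (the server hostname below is a placeholder):

[server]
hostname=ambari-server.example.com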


Notes:


1. The step where you register and check the servers: http://ift.tt/1tA1cuT

– Hosts will fail if you don’t have the /etc/hosts files synced up to reflect all the hosts

2. during install, you can tail -f /var/log/yum.log to see everything Ambari is installing

3. During the "Install, Start, and Test" step, I got a failure on all 3 nodes with this message: "Puppet has been killed due to timeout". I see yum still installing stuff on each node, so I presume it's just a case where my internet connection was too slow and it really did time out. I'm going to let yum continue and then simply click the "Retry" button in Ambari… yeah, after waiting clear through until it installed the mysql-connector packages, I clicked the "Retry" button in Ambari – it's progressing now.

4. This is kinda cool – it installs mysql as the backend DB for the manager and Hive.

5. It will install Nagios server and agents – again, kinda cool

6. It seems to use Puppet for deployment

7. Note the mysql install could be more secure. Run this on the mysql host at some point: mysql_secure_installation (accept the defaults, except do set a mysql root password)


Problems I encountered:

1. Not all services started right up after the install. I had to manually start a few

2. In the initial setup wizard I changed the oozie configs to use a mysql database (the one that was configured to be used for hive). However, the oozie username does not get created and neither does the database. I had to create the mysql users oozie@'%' and oozie@'horton2' (the name of the machine oozie was running on – note it was also the same mysql host), plus the database itself – see the sketch below. The oozie@'%' wasn't strictly necessary, but I added it for future machines I might run oozie on.
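
What I ran was roughly this – the password is a placeholder, and the database name has to match what you put into the oozie config:

mysql -u root -p <<'SQL'
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'ooziecrazypassword';
GRANT ALL ON oozie.* TO 'oozie'@'horton2' IDENTIFIED BY 'ooziecrazypassword';
FLUSH PRIVILEGES;
SQL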






Friday, October 24, 2014

Installing MapR for the first time

I’m going to give a step-by-step as I’m doing it. I’m running on a fresh CentOS 6.5 minimal install:


1. Download installer from here: http://ift.tt/1yw9kLV

– I had to do yum install wget

mv mapr-setup mapr-setup.sh

chmod u+x mapr-setup.sh

./mapr-setup.sh

– the output says: Run "/opt/mapr-installer/bin/install" as super user, to begin install process

/opt/mapr-installer/bin/install

2. “Unable to install package sshpass”

– looking at logs in /opt/mapr-installer/var/mapr-installer.log I see it tried doing yum install sshpass

– well, I guess it wasn’t found in the default repos I have

– I’m doing this: http://ift.tt/12u8GUO (installing maprTech and EPEL repos)

3. Some questions about the install come up

– I typed in the hostname when it asked for the hostname for the control nodes (I guess a consistent /etc/hosts file would be a good idea)

– I go with default answers (I hit enter a bunch)

4. "Disks not set for node: c1" (c1 is the hostname of my control node, which is the machine I'm running this installer on)

– I select to modify the options

– “d”

– “Enter the full path of disks for hosts separated by spaces or commas []:”

– I open a new SSH window and do lsblk to see possible drives. I’m going to add a new disk to this machine really quick

– echo "- - -" > /sys/class/scsi_host/host0/scan

– lsblk (don't see any new disks)

– echo "- - -" > /sys/class/scsi_host/host1/scan

– lsblk (still no new disks)

– echo "- - -" > /sys/class/scsi_host/host2/scan

– lsblk (ahh… there's a new disk at sdb)

5. /dev/sdb is the full path to the disks I want to use

– does this mean all hosts need to have the same device paths?

– not sure, but I’m going to find out how things go with adding data nodes later on

6. I put in the SSH creds

7. This is a decent wait

– while poking around in /opt/mapr-installer I noticed an ansible directory – cool way to install stuff – just use an ansible playbook :)

– Cloudera requires the Oracle JDK. MapR uses OpenJDK, which seemed to download and install faster than the Cloudera install.

– I noticed MapR has some good instructions on how to set up a local repo: http://ift.tt/12u8GUO

8. Well, that install took about 20 minutes (on the safe side)

9. I can log in. Now I’ll see how things look, create a few data node VMs, create a new cluster (or is the cluster technically created and I just need to add more nodes?)





Cloning a CentOS 6.5 VM and getting networking to work

Clone a working CentOS 6.5 machine and start it up:


update the hostname here:

nano /etc/sysconfig/network


Reset the NICs:

rm -rf /etc/udev/rules.d/70-persistent-net.rules


reset the UUID and MAC address:

nano /etc/sysconfig/network-scripts/ifcfg-eth0

– remove the value after "HWADDR=" (just on that line)

– remove the value after "UUID=" (just on that line)

– make sure you see ONBOOT=yes
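
After those edits the file ends up looking something like this – just a sketch of a DHCP config, so BOOTPROTO and the rest will vary with your setup:

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=dhcp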


reboot


After this, you may not need to mess with the MAC address, as mentioned below, but I figured I'd include the ways I've found to do it.


After reboot, do one of the below to set the MAC address:

1. Go to that 70-persistent-net.rules file above (it gets recreated), get the MAC address, and then paste it into the "HWADDR=" line

-OR-

2. run system-config-network-tui and confirm the network settings are as desired, then save it. This will put in the MAC address





Thursday, October 23, 2014

OAuth on Tableau Server 8.2

Situation:

Running Tableau behind a NAT

Tableau Server is behind a proxy server

No DNS from public internet

Wanting to extract Google Analytics data automatically via Tableau Server


Note:

The OAuth setup will not work without the proxy settings configured for the Run As account. Here’s what the error looked like after clicking “Accept” on the IE popup confirming I wanted to allow Tableau Server to connect to my Google Account via OAuth (after a 20 second pause):


Tableau Server encountered an internal error.

Request ID: VElOGwoxNiEAADC0qPEAAAHu

Tableau Server version: Version 8.2.3

(8200.14.0925.1717) 64-bit


Here are the steps I took to set up Tableau Server OAuth. They clarify a few things not super-explicitly mentioned here: http://ift.tt/1tmnlLZ:


1. Set up Google OAuth

– create new project here: http://ift.tt/1fljiJ4

– set up a new Client ID under "Credentials"

– REDIRECT URI needs to be set to the hostname/domain name you use in your browser to connect to Tableau Server when you set up your Google Analytics key. Note, this cannot be an IP address. “localhost” works if you want to set up OAuth while logged in to the Tableau Server. Really, this whole thing is complicated by the fact that there is no direct, consistent public IP/domain name for accessing Tableau.

– JAVASCRIPT ORIGINS – just blank it out


2. Enable the “Analytics API” in the APIs page of your project


3. log in to Tableau Server (Web UI) with the hostname/domain name you put in the above REDIRECT URI field.


4. The Tableau Server admin needs to go to Admin > Maintenance and check the boxes in the Settings section to allow users to save their own OAuth tokens


5. On the top-right click on your name > User Preferences > click on "Add" next to Google Analytics


Notes on REDIRECT URI:

1. IP addresses result in Google saying you need to supply a device_id and something else – doesn’t work

2. You could make up a domain name, add it to your hosts file (and all your end users' hosts files… yikes), and tell IE to not use a proxy server for addresses beginning with that made-up domain name





Google Analytics auto extractions on Tableau Server: GetOAuthUsername caught exception processing specs Response code: 500

The Google Analytics connector will throw odd 500 errors if you don't configure the Run As account to use the proxy server. If you happened to have Tableau Desktop installed on the Tableau Server machine, everything would have worked just fine… because you had likely set your own proxy server settings. You also need to set the proxy server configs for your Tableau Server's Run As account… there, I said it twice.


You have to log in/RDP to the Tableau Server as the Run As account to do this. Running IE as the Run As account and configuring the proxy server didn't "stick" (after setting it, I immediately opened up the configs again and they were reset to default). You have to RDP in. If you don't, here's the error you see when publishing a workbook to the server:


An error occurred when publishing the data source.

GetOAuthUsername caught exception processing specs Response code: 500





Error in Cluster – Error connecting to db with user ‘\”hive’\” and jdbcUrl ‘\”jdbc:postgresql://localhost:7432/hive’\”’

The Cluster Setup step (step 5) where it creates the Hive metastore tables kept failing with the below error (seen in the logs when you click on Details > stderr). I was installing the embedded Postgres DB.


Error connecting to db with user ‘\”hive’\” and jdbcUrl ‘\”jdbc:postgresql://localhost:7432/hive’\”’


It seems I got it to get past this by manually clicking to start the Hive metastore service:

Cloudera Manager > in the cluster section on the left click on Hive > click on the Instances tab > select the Hive Metastore Server and from Actions select "Start". For whatever reason, after doing that and then going back and clicking "Retry" on the Cluster Setup page, it didn't throw the error and the cluster seemed to start up alright.





Wednesday, October 22, 2014

Cluster Installation step of installing Hadoop via Cloudera Manager

The “Cluster Installation” phase of installing the Hadoop cluster didn’t have CDH 5.2 – it only showed 4.7. It had also auto-selected some of the other parcel options (Accumulo, etc.), when every other time I’ve done this installation it had by default selected “None”. The following step where it downloads and distributes the parcels seemed to hang after downloading; so when I clicked “Back”, the cluster installation restarted and I had to reselect the nodes I wanted to include in the cluster. After that, the usual default selections of CDH 5.2 and none of the other parcels showed up; and this time it seems the download, distribution and activation worked fine.


My URL looked like this, in case you’ve encountered the same thing:

/cmf/express-wizard/wizard#step=parcelInstallStep


I don't think anything is particularly broken here, but I figured I'd post this as an edge-case help.





Failed Cluster Installation on Initial Hadoop Cluster Installation via Cloudera Manager

At this point I think you may be able to start a new cluster installation from a fresh Cloudera Manager install: http://192.168.1.201:7180/cmf/express-wizard/. I’m not sure about that, but I’ll update later if it is indeed the case.


BTW, the "Abort Installation" button on the Cluster Installation page (step 3) applies to hosts whose installation has NOT completed successfully. It won't roll back all your hosts in the case where just one host is unhappy for some reason. I've seen a fairly regular occurrence of just one host having some intermittent network problem, for example, that results in the express-wizard getting hung up waiting on that machine, while that machine doesn't know to tell the wizard anything – so you could be sitting there forever. In that case, just click that "Abort Installation" button. You should then see it replaced by buttons for rolling back the failed hosts and retrying the install on them.





Tuesday, October 21, 2014

Installing Cloudera stack

Cloudera Manager will set all the Hadoop stuff up for you. However, by default it will do it with an embedded PostgreSQL DB, which they say will not scale well as you grow your cluster. It may also be harder to do backups and such. It seems that's the only significant blocker to simply saying "just run the Cloudera Manager auto-installer".


I figured I'd install MySQL, which is one of their supported DB platforms. But we don't really have to do that until the Cloudera Manager cluster setup is at the point where you're setting up the Hive metastore (step 3 of "Cluster Setup", well into the whole shebang). The installation instructions seem to make a big deal about having the databases and such figured out before you ever run the Cloudera Manager installer… you don't – you can just install them when Cloudera Manager gets to the database setup step.


The URL I’m at has “step=showDbTestConnStep” in it. I see “Database Setup”, with options for “Use Custom Database” and “Use Embedded Database”. At this point I went to the node that has the hive role and installed mysql this way:

yum install mysql-server -y

chkconfig mysqld on

/usr/bin/mysql_secure_installation


I did a CentOS minimal install, so I got some errors about Cloudera not being able to find the JDBC driver to connect to the mysql database. I ran this to install it, and things were happy:

sudo yum install mysql-connector-java


After all this I got an error saying Cloudera couldn’t find the specified database, so I did this to simply create it:

mysql

create database hive;





Installing Hadoop via Cloudera Manager

After installing things on a few nodes I saw this message when Cloudera Manager was “checking things for correctness”:


Cloudera recommends setting /proc/sys/vm/swappiness to 0. Current setting is 60. Use the sysctl command to change this setting at runtime and edit /etc/sysctl.conf for this setting to be saved after a reboot. You may continue with installation, but you may run into issues with Cloudera Manager reporting that your hosts are unhealthy because they are swapping. The following hosts are affected:


From http://ift.tt/1dXvcEy


Run this on each node: sudo sysctl -w vm.swappiness=0
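
And to make it stick across reboots, as the message suggests, append it to /etc/sysctl.conf as well:

echo 'vm.swappiness = 0' >> /etc/sysctl.conf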





Monday, October 20, 2014

Notes on Installing Cloudera Manager on CentOS 6.5

I did a minimal install, so here were the necessary things:


1. Disable SELINUX

– nano /etc/selinux/config

– set it to “disabled”

2. install python

– yum install python

3. install wget (to get cloudera-manager installer)

– yum install wget

4. install nano (because I like it)

– yum install nano

5. allow root to log in remotely (Cloudera Manager requires this)

6. disable iptables

– only because I’m running all machines within a locked-down subnet

– yeah, scary with allowing root to log in remotely


Likely convenient things:

1. set a common hosts file for all machines:

– list out all the IPs and hostnames for each machine in this file

– this is where you can specify the fully qualified domain name as well as host names

– Cloudera Manager will work more seamlessly in various ways

– You can have Cloudera Manager search for nodes to add by these host names
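
A sketch of what that hosts file might look like – the IPs and names are made up:

192.168.1.201  hadoop1.example.com  hadoop1
192.168.1.202  hadoop2.example.com  hadoop2
192.168.1.203  hadoop3.example.com  hadoop3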

2. set up SSH keys so Cloudera Manager can use those instead of SSH passwords

– though it can use SSH passwords as well
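
One way to set that up from the Cloudera Manager host – the hostnames are placeholders:

ssh-keygen -t rsa
ssh-copy-id root@hadoop2.example.com
ssh-copy-id root@hadoop3.example.com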


Note that when Cloudera Manager sets up a host, it will download 500+ MB of install packages on each machine. In my case, the YUM install logs showed this:


Transaction Summary

================================================================================

Install 117 Package(s)


Total download size: 513 M

Installed size: 894 M

Downloading Packages:





Sunday, October 19, 2014

VMWare Workstation 10 – first try

If you want to use "bridged" connections, where the VMs appear like any other machine directly attached to your router (and DHCP server), you may need to go to Edit > Virtual Network Editor > select the "Bridged" Type row, then below in the "Bridged to:" drop-down, select the NIC that is your main internet connection. I've tried loads of different virtualization and VPN solutions, so I have about 12 different network connections.


Once I did the above, my VMs set to “bridged” NICs got an IP from my dhcp server just fine.





Monday, October 13, 2014

Importing data from SQL Server to Cloudera Hadoop via Sqoop2

Critical points that may or may not be obvious:


1. If you’re using the Sqoop stuff in the web-based interface, you’re actually using Sqoop2


2. You have to download and install the JDBC driver for SQL Server yourself

– curl -L 'http://ift.tt/1CfDcMU' | tar xz

– sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop2/

– while you’re at it, you may as well put it in the sqoop directory too: sudo cp sqljdbc_4.0/enu/sqljdbc4.jar /var/lib/sqoop/


3. Sqoop2's home directory is /var/lib/sqoop2/ (maybe not…)


4. restart Sqoop2 service after copying in the JDBC driver file:

– sudo service sqoop2-server restart


5. connection string in “Manage Connections” is like this: jdbc:sqlserver://192.168.1.102


6. for an action, leave schema and table name fields blank and just paste in your TSQL query, then append this to the end of it: +and+${CONDITIONS}. Don’t mess with the boundary query stuff until some other time (Sqoop2 will automatically query for the min/max of the Partition column name you provide).


7. if you mess with the connection you create in Hue/Sqoop2, note you have to type in the password


8. if you get errors, don’t fight it – you have to log in via SSH and look at /var/log/sqoop2/sqoop2.log





Wednesday, October 8, 2014

Hyper-V and VirtualBox on the same computer

You can only run one or the other; not both at the same time. When I tried running a VM on VirtualBox on my Windows 8.1 machine, I got this error:


“vt-x is not available”


This post in the VirtualBox forums was my first indication that you can’t have 2 hypervisors running at the same time: http://ift.tt/1vRwgFG


This post is a solution that has worked for me:


http://ift.tt/1dQvWhx


In short, you create a new boot profile that disables Hyper-V in Windows 8.1 (covered in the Hanselman blog post). It’s actually kinda slick. When you click “restart”, hold down the Shift key and the menu will come up where you can select a different boot profile.