Friday, October 31, 2014

Installing MySQL as the external database for Hive, Impala, etc. in Cloudera Hadoop distribution

I intend on having 2 separate namenode servers, and on each one I will be running a MySQL instance. Hive metastore data, Impala metadata, namenode info, Cloudera Manager, and other roles will all use these DBs. Not sure how it’ll all go yet, but my first attempt last week didn’t go so well.


Read through this, then at each step, read ahead a few steps, as the info is a bit spread out: http://ift.tt/1wPj4BF


Do yourself a favor and disable selinux before installing. Otherwise you’ll likely see this error when mysql fails to start:

Fatal error: Can’t open and lock privilege tables: Table ‘mysql.host’ doesn’t exist

(it’s due to permissions on the /var/lib/mysql folder

– nano /etc/selinux/config

– “enforcing” change to “disabled”

– reboot machine

– yum remove mysql-server

– rm -rf /var/lib/mysql

– yum install mysql-server


1. yum install mysql mysql-server

2. in /etc/my.cnf (maybe make a copy of your original, e.g. cp my.cnf my.cnf.orig)

– paste in the sample my.cnf file contents from here (the paths should match the default install paths, but check just in case): http://ift.tt/1wPj4BF

– uncomment the binlog lines and set it to “mixed” (otherwise it’ll error out when your cluster starts up)

3. service mysqld start

4. chkconfig mysqld on

5. /usr/bin/mysql_secure_installation

6. Install Extra Packages for Enterprise Linux (EPEL repo)

– wget http://ift.tt/1htTSIk

– sudo rpm -Uvh epel-release-6*.rpm

7. Copy in the mysql connector (do this on both machines)

– mkdir /usr/share/java

– cp mysql-connector-java-5.1.33/mysql-connector-java-5.1.33-bin.jar /usr/share/java/mysql-connector-java.jar

– note the filename changes when you copy it (if it’s not exactly that you will get errors when testing DB connections)

8. At this point I installed Cloudera Manager on machine1

– from here: http://ift.tt/1wPj4BJ

– wget http://ift.tt/1eEtSMH

– chmod u+x cloudera-manager-installer.bin

– ./cloudera-manager-installer.bin

– the installer downloads everything you need

9. I’m having it install Java

– later I’ll install the mysql connector

– I had one node fail during installation – just “retry failed hosts” a few times. Likely it was some connectivity thing

10. Now for the Cluster Role Assignments step

– All services that require a DB will be on the hosts that have mysql installed and running

– NameNode – first server

– Secondary NameNode – second server

– Cloudera Management Server Activity Monitor – second server

11. Database setup

– “Use Custom Database”

– Database Type: MySQL

– create databases and users for each service listed at the bottom of this page: http://ift.tt/1wPj4BF

– this page provides another description of this process: http://ift.tt/1qb7xXE

– from inside a mysql client on the host mentioned has storing the activity monitor:

– create database amon DEFAULT CHARACTER SET utf8;

– grant all on amon.* TO ‘amon’@’%’ IDENTIFIED BY ‘amoncrazypassword';

– from inside a mysql client on the host mentioned as storing the hive metadata:

– create database hive DEFAULT CHARACTER SET utf8;

– grant all on hive.* TO ‘hive’@’%’ IDENTIFIED BY ‘hivecrazypassword';

– In the “Database Host Name:” field, mine looked like this:

– “:3306″ since that’s the port mysql was running on

– I’d recommend not explicitly specifying a hostname because there are a lot of configs the installer auto-configures.

12. You should get some nice green check marks indicating successful connections. If not:

– service iptables status

– check the port mysql is configured to listen on (default is 3306)

– service mysqld status

– make sure the right database and user are created on the appropriate host

13. If the install finishes smoothly, make sure to set up backups.

– I’ve seen it fail on the “Creating Hive Metastore Databse Tables” step with an error about there needing to be a hostname configured. Just open up a new tab pointed at your Cloudera manager and then look for the red configuration alert next to the Hive service in your cluster. It should take you directly to the offending config. In my case, I clicked on the button to revert it to default. Then go back to the Cluster Setup tab and click Retry. Things were happy again.

14. I’d suggest the first thing you do is restart your cluster

– there are usually some errors that a restart will clear up

15. Hue creates its own embedded database. If you want to have it run on your mysql instances:

– for Hue: http://ift.tt/1qb7vPy

– from a mysql client on the host running hue:

– create database hue DEFAULT CHARACTER SET utf8;

– grant all on hue.* TO ‘hue’@’%’ IDENTIFIED BY ‘huecrazypassword';

– in Cloudera Manager click on the Hue Service > Configuration tab at the top > look for “Database” configs on the left

– update appropriately

16. Honestly, expect the startup of a new cluster to feel a bit like whack-a-mole with the errors on services. Just click on them, look at the logs, and then restart them a few times. The HDFS and HBase services, if not working right, can cause other services to error out, especially the health checks. All that means is focus on HBase and HDFS first.





No comments:

Post a Comment