Tuesday, October 21, 2014

Installing Cloudera stack

Cloudera Manager will set up all the Hadoop components for you. By default, however, it uses an embedded PostgreSQL database, which Cloudera says will not scale well as the cluster grows and which may also be harder to back up. That seems to be the only significant blocker to simply saying “just run the Cloudera Manager auto-installer”.


I decided to install MySQL, which is one of the supported database platforms. You don’t actually have to do that until the Cloudera Manager cluster setup reaches the point where you configure the Hive metastore (step 3 of “Cluster Setup”, well into the whole shebang). The installation instructions make a big deal about having the databases figured out before you ever run the Cloudera Manager installer, but you don’t need to: you can install them when Cloudera Manager gets to the database setup step.


The URL I’m at has “step=showDbTestConnStep” in it. The page shows “Database Setup”, with options for “Use Custom Database” and “Use Embedded Database”. At this point I went to the node that has the Hive role and installed MySQL this way:

yum install mysql-server -y

chkconfig mysqld on

service mysqld start

/usr/bin/mysql_secure_installation


I did a CentOS minimal install, so I got some errors about Cloudera not being able to find the JDBC driver to connect to the MySQL database. Installing the driver fixed that, and things were happy:

sudo yum install mysql-connector-java
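If the driver error persists, a quick check that the jar actually landed where it is expected can help. This is only a sketch: the path /usr/share/java/mysql-connector-java.jar is my assumption about where the yum package puts the jar, and check_jdbc_jar is a hypothetical helper name, not anything Cloudera ships.

```shell
# Assumed default location of the connector jar from the yum package;
# override JDBC_JAR if your distribution installs it elsewhere.
JDBC_JAR="${JDBC_JAR:-/usr/share/java/mysql-connector-java.jar}"

# Hypothetical helper: report whether the given jar file exists.
check_jdbc_jar() {
  if [ -f "$1" ]; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Report the status without aborting the shell if the jar is absent.
check_jdbc_jar "$JDBC_JAR" || true
```

If it reports the jar as missing, `rpm -ql mysql-connector-java` lists the files the package actually installed.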


After all this I got an error saying Cloudera couldn’t find the specified database, so I simply created it:

mysql

create database hive;
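The bare create above was enough for me. If you would rather not have Hive connect as root, something along these lines might work. It is a rough sketch and not what I actually ran: hive_db_sql is a made-up helper, the user name and password are placeholders, and the GRANT ... IDENTIFIED BY form assumes an older MySQL such as the 5.x that CentOS ships (newer MySQL releases want a separate CREATE USER).

```shell
# Hypothetical helper: print the SQL that creates a metastore database
# and a dedicated user. Arguments: database name, user name, password.
hive_db_sql() {
  db="$1"
  user="$2"
  pass="$3"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db} DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%' IDENTIFIED BY '${pass}';
FLUSH PRIVILEGES;
EOF
}

# Print the statements for review; 'changeme' is a placeholder password.
hive_db_sql hive hive 'changeme'
```

Once the output looks right, it can be piped into the client, for example `hive_db_sql hive hive 'changeme' | mysql -u root -p`, and the same user name and password then go into the “Use Custom Database” form.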




