Saturday, November 15, 2014

Working with XML or JSON in Hive (and Impala)

You need 2 SerDe jar files, and you need to configure the Hive Auxiliary Jars path.


1. Pick the directory where you will always put all your globally-accessible additional SerDe jars:

– these will be usable by everyone who uses Hive, so keep that in mind when deciding what goes in there

– I’m going with /var/lib/hive/aux_jars

– mkdir /var/lib/hive/aux_jars

– do this on each node that is running HiveServer2 or HiveServer.
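If you have more than a couple of nodes, a small re-runnable helper saves some typing. This is only a sketch: the function name make_aux_dir is my own invention, and the chmod 755 is an assumption about what permissions you want, so adjust to taste.

```shell
# make_aux_dir DIR
# Hypothetical helper: create the aux-jars directory if it is missing.
# mkdir -p makes it safe to run more than once on the same node.
make_aux_dir() {
    local dir="$1"
    mkdir -p "$dir" || return 1
    # Assumption: world-readable so the hive user can list and read the jars.
    chmod 755 "$dir"
}

# Example (run as root on each HiveServer2/HiveServer node):
#   make_aux_dir /var/lib/hive/aux_jars
```

Run it on every HiveServer2/HiveServer node by whatever means you normally use to run commands across the cluster.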


2. Get the SerDe jars from the following 2 projects and copy them, by whatever means you prefer, into the /var/lib/hive/aux_jars folder on every node running HiveServer2 and/or HiveServer:

http://ift.tt/1xm1hEC

http://ift.tt/1eVPKm1

– make sure to do a chown -R hive:hive /var/lib/hive/aux_jars
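Step 2 can be sketched as one function, assuming you have already downloaded or built the jars into some local folder. The function name install_serde_jars and the ~/serde_downloads folder in the example are placeholders of mine, not anything the SerDe projects define.

```shell
# install_serde_jars SRC_DIR DEST_DIR [OWNER]
# Hypothetical helper: copy every .jar from SRC_DIR into DEST_DIR and,
# if OWNER is given (e.g. hive:hive), chown the destination recursively.
install_serde_jars() {
    local src="$1" dest="$2" owner="${3:-}"
    local jar found=0
    mkdir -p "$dest" || return 1
    for jar in "$src"/*.jar; do
        [ -e "$jar" ] || continue        # glob matched nothing
        cp "$jar" "$dest"/ || return 1
        found=1
    done
    if [ "$found" -ne 1 ]; then
        echo "no .jar files found in $src" >&2
        return 1
    fi
    if [ -n "$owner" ]; then
        chown -R "$owner" "$dest" || return 1
    fi
}

# Example (as root, on each HiveServer2/HiveServer node):
#   install_serde_jars ~/serde_downloads /var/lib/hive/aux_jars hive:hive
```

It fails loudly when the source folder has no jars in it, which catches the "I copied from the wrong folder" mistake early.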


3. In Cloudera Manager, click on the Hive service and go to the configuration tab:

– type “aux” to filter the configs down to the Hive Auxiliary JARs Directory config, then enter /var/lib/hive/aux_jars

– this is my own path, not something official or some magic number

– it’s just telling Hadoop-land which directory on the HiveServer2 nodes to look in for additional SerDe jars.

– redeploy and restart – just do whatever Cloudera Manager tells you to do in order to deploy the config changes


4. Now you can use those SerDes as documented on the project pages linked in step 2. If they don’t work, double-triple check your path spelling. A misspelled path is the only reason I’ve ever had this fail.
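Since a bad path is the usual culprit, here is a quick sanity check you can run on each node. It is a rough sketch, check_aux_dir is a name I made up, and it only confirms that the directory exists and holds at least one jar; it says nothing about whether Hive actually loaded it.

```shell
# check_aux_dir DIR
# Hypothetical helper: succeed only if DIR exists and contains a .jar file.
check_aux_dir() {
    local dir="$1" jar
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    for jar in "$dir"/*.jar; do
        if [ -e "$jar" ]; then
            ls -l "$dir"                 # show owners so a missed chown stands out
            return 0
        fi
    done
    echo "no .jar files in $dir" >&2
    return 1
}

# Example:
#   check_aux_dir /var/lib/hive/aux_jars
```

Pass it the exact string you typed into Cloudera Manager, so a typo in either place shows up.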




