Wednesday, October 29, 2014

Dynamic Hive table schema based on HBase column family

Goal:

Let data be firehosed into HBase, then automatically generate and maintain a Hive external table schema from the key-value pairs actually present in the HBase column families. Few people seem to be doing this, yet it looks like a general solution for data warehousing, provided all the data can be stored as JSON in an HBase table.
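Here is a minimal Python sketch of the "auto-generate" step. It assumes the distinct column qualifiers in one family have already been collected, for example by scanning a sample of rows (the scan itself is not shown). It then emits a Hive DDL statement that uses Hive's HBase storage handler and maps each qualifier to a `string` column through `hbase.columns.mapping`. The function names, the table and family names in the usage line, and the column-name sanitizing rule are hypothetical choices for illustration. Qualifiers are assumed to be plain names without commas, colons or `#`, since those characters have meaning in the mapping string.

```python
import re


def sanitize(name: str) -> str:
    """Turn an HBase qualifier into a safe Hive column name (hypothetical rule)."""
    cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", name).lower()
    # Hive identifiers should not start with a digit, so prefix one if needed.
    if cleaned and cleaned[0].isdigit():
        cleaned = "c_" + cleaned
    return cleaned or "c_empty"


def build_hbase_ddl(hive_table: str, hbase_table: str, family: str, qualifiers) -> str:
    """Build CREATE EXTERNAL TABLE DDL mapping every observed qualifier to a string column."""
    columns = ["rowkey string"]   # the HBase row key becomes the first Hive column
    mapping = [":key"]            # ":key" is the storage handler's token for the row key
    used = {"rowkey"}
    for qualifier in sorted(set(qualifiers)):
        col = sanitize(qualifier)
        base, n = col, 2
        # Two qualifiers may sanitize to the same name; add a numeric suffix.
        while col in used:
            col = f"{base}_{n}"
            n += 1
        used.add(col)
        columns.append(f"`{col}` string")
        mapping.append(f"{family}:{qualifier}")
    mapping_str = ",".join(mapping)
    return (
        f"CREATE EXTERNAL TABLE {hive_table} (\n  "
        + ",\n  ".join(columns)
        + "\n)\n"
        + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\n"
        + f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping_str}")\n'
        + f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


print(build_hbase_ddl("events_hive", "events", "d", ["user-id", "ts", "payload"]))
```

For the "maintain" half, the same function can simply be rerun when new qualifiers show up. Because the table is declared EXTERNAL, dropping and re-creating it should only replace Hive's metadata and leave the underlying HBase table untouched.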


How to generate dynamic Hive tables based on JSON:


http://ift.tt/1n5r1d0


The good JSON SerDe:


http://ift.tt/1eVPKm1


http://ift.tt/1p3WW62


Generate a Hive schema based on a “curated” representative JSON doc:


http://ift.tt/104aylj
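The core idea can be sketched in a few lines of Python, assuming a single representative JSON object: parse it, walk each value, and map it to a Hive type, recursing into nested objects (`struct`) and arrays (`array`). This is not necessarily how the linked tools work. The type mapping (integers to `bigint`, nulls to `string`, empty arrays to `array<string>`, empty objects to `map<string,string>`) and the function names are assumptions made for this sketch.

```python
import json


def hive_type(value) -> str:
    """Map a JSON value to a Hive type string (a simplified, assumed mapping)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str) or value is None:
        # A null carries no type information, so fall back to string.
        return "string"
    if isinstance(value, list):
        # Assume arrays are homogeneous and use the first element as the sample.
        return f"array<{hive_type(value[0]) if value else 'string'}>"
    if isinstance(value, dict):
        if not value:
            # Hive has no empty struct, so treat an empty object as a generic map.
            return "map<string,string>"
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    raise TypeError(f"unsupported JSON value: {value!r}")


def columns_from_doc(doc_text: str) -> str:
    """Build a Hive column list from a representative top-level JSON object."""
    doc = json.loads(doc_text)
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the top level")
    return ",\n".join(f"  `{name}` {hive_type(v)}" for name, v in doc.items())


sample = '{"id": 7, "active": true, "score": 1.5, "tags": ["a"], "geo": {"lat": 1.0, "lon": 2.0}, "note": null}'
print(columns_from_doc(sample))
```

The output is just the column list; it would still need to be wrapped in a CREATE EXTERNAL TABLE statement with whichever JSON SerDe and storage location is in use.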


http://ift.tt/1u2Mlp4


In a comment on his own original post, the poster explains how to create an external table pointed at an HBase table that returns everything as JSON. That table may be useful as the source of the “curated” JSON doc:


http://ift.tt/104aylk
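If each HBase row can be read back as one JSON document (how the linked comment achieves that is not reproduced here), a "curated" doc could be produced by folding many sampled rows together, so that every key seen in any row shows up once. The following is a minimal Python sketch under that assumption. The merge policy (recurse into nested objects, keep the first non-null, non-empty value) and the function names are hypothetical.

```python
import json


def merge_docs(base: dict, extra: dict) -> dict:
    """Recursively union two JSON objects, keeping the first informative value seen."""
    merged = dict(base)
    for key, value in extra.items():
        current = merged.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            merged[key] = merge_docs(current, value)
        elif current is None or current == []:
            # Key is new, or so far only null/empty: take this sample instead.
            merged[key] = value
    return merged


def curate(json_rows) -> dict:
    """Fold many JSON documents (e.g. one per HBase row) into one representative doc."""
    curated = {}
    for text in json_rows:
        curated = merge_docs(curated, json.loads(text))
    return curated


rows = [
    '{"id": 1, "geo": {"lat": 1.0}}',
    '{"id": 2, "geo": {"lon": 2.0}, "tags": ["x"]}',
    '{"note": null}',
]
print(json.dumps(curate(rows)))
```

The curated doc keeps only one sample value per key, which is all a schema generator needs. The trade-off is that a key whose type varies between rows is typed from whichever row was seen first.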


I’ll post updates on my progress…




