Goal:
Let data get firehosed into HBase, then auto-generate and maintain a Hive external table schema based on the actual key-value pairs in the HBase column families. As far as I can tell, not many people are doing this. Still, it looks like a general solution for data warehousing, as long as you can get all your data into JSON format in an HBase table.
How to generate dynamic Hive tables based on JSON:
The good JSON serde:
Generate a Hive schema based on a “curated” representative JSON doc:
In a comment on his own original post, the author shows how to create an external Hive table pointed at an HBase table that returns everything as JSON. It may be useful as the source of the “curated” JSON doc:
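To make the "schema from a curated JSON doc" idea concrete, here is a rough sketch of my own, not taken from any of the tools linked above. It walks one representative JSON document and emits Hive column definitions. The function names (`hive_type`, `hive_columns`) and the type choices (BIGINT for integers, DOUBLE for floats, STRING as the fallback for nulls and empty containers) are my assumptions, and it only looks at the first element of each array.

```python
import json


def hive_type(value):
    """Map one parsed JSON value to a Hive type name (assumed mapping)."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, list):
        # Only the first element is inspected; empty arrays fall back to STRING.
        elem = hive_type(value[0]) if value else "STRING"
        return "ARRAY<%s>" % elem
    if isinstance(value, dict):
        if not value:
            # An empty object gives no field names to work with; fall back to STRING.
            return "STRING"
        fields = ",".join("%s:%s" % (k, hive_type(v)) for k, v in value.items())
        return "STRUCT<%s>" % fields
    # Strings and nulls both end up here.
    return "STRING"


def hive_columns(json_text):
    """Return Hive column definitions for the top-level keys of one JSON doc."""
    doc = json.loads(json_text)
    if not isinstance(doc, dict):
        raise ValueError("the curated document must be a JSON object")
    return ",\n".join("`%s` %s" % (k, hive_type(v)) for k, v in doc.items())


if __name__ == "__main__":
    # Made-up sample document, for illustration only.
    sample = '{"id": 7, "name": "a", "tags": ["x"], "geo": {"lat": 1.5, "ok": true}}'
    print(hive_columns(sample))
```

The output is only the column list. It still has to be wrapped in a CREATE EXTERNAL TABLE statement with whatever SerDe or storage handler clause you end up using, and key names that are not valid Hive identifiers would need extra handling.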
I’ll update on progress…