My thoughts after trying them all (on local VMs):
1. You need more than 18GB of RAM on your machine to test effectively. Just get it.
2. Cloudera is the easiest to install, AND it sets up Hue for you. Hortonworks and MapR require a LOT of manual edits to arcane config files (they seem arcane when you’re new). Hortonworks is the next-easiest. MapR was the hardest.
3. MapR, to me, has the most promise, given it’s closer to the metal. The promise of random writes directly to the cluster filesystem (MapR-FS, MapR's HDFS-compatible replacement for HDFS) just seems really, really good.
4. MapR is the hardest to install. It just takes more command-line work.
5. Adding nodes to a cluster is straightforward with Hortonworks and Cloudera. With MapR, you have to do more command-line prep than a noob will like.
6. MapR seemed to have some of the best documentation on how Hadoop works. Hortonworks was up there too. Cloudera’s seemed a little less than screamingly clear… but that may be because theirs were the first docs I read.
7. Hortonworks installs a MySQL instance for the Hive metastore. Cloudera and MapR use an embedded PostgreSQL database, which they repeatedly warn against using for anything beyond a proof-of-concept cluster.
8. Cloudera gives proactive notifications about configuration best practices. However, I’m not sure why something like the Java heap size settings would differ from what the installer chose. I suppose the installer may set them to some percentage of available RAM.