Here’s the error I saw (from “Recent Log Entries” in Cloudera Manager, after clicking Details for the failed YARN startup step while restarting the cluster):
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: 13 missing files; e.g.: /tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state/000005.sst
A bug already filed in Jira for YARN appears to cover this error, and it also seems to include a workaround:
Fix:
In short, rename or delete the CURRENT file in each of the two directories below, then restart YARN. (I think you could even just reboot each affected node, since the /tmp folder may be cleared out on reboot.)
/tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
/tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state
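If you have several nodes to fix, the rename step can be scripted. Here is a minimal Python sketch of that step, not an official tool. The function name `rename_current_files` and the `bak-<timestamp>` suffix are my own choices, and I am assuming the NodeManager on the node is already stopped and that the script runs as a user allowed to modify these directories. It renames rather than deletes, so the original CURRENT file can be put back if needed.

```python
import time
from pathlib import Path

# The two directories from this post; adjust them if your NodeManager
# recovery directory is configured somewhere else.
STATE_DIRS = [
    "/tmp/hadoop-yarn/yarn-nm-recovery/yarn-nm-state",
    "/tmp/hadoop-yarn/yarn-nm-recovery/nm-aux-services/mapreduce_shuffle/mapreduce_shuffle_state",
]


def rename_current_files(state_dirs, suffix=None):
    """Rename the CURRENT file in each directory and return the new paths.

    Directories that do not exist or have no CURRENT file are skipped.
    The suffix is a hypothetical naming choice, not something YARN requires.
    """
    suffix = suffix or time.strftime("bak-%Y%m%d%H%M%S")
    renamed = []
    for state_dir in state_dirs:
        current = Path(state_dir) / "CURRENT"
        if current.is_file():
            target = current.with_name(f"CURRENT.{suffix}")
            current.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    for path in rename_current_files(STATE_DIRS):
        print(f"renamed to {path}")
```

Run it on each affected node, then restart YARN from Cloudera Manager as described above.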