In this article, we will see how to recover unhealthy nodes in a Hadoop cluster.

Following are the contents:

  •          Why nodes become unhealthy
  •          How to recover them
              1.       Using Hue
              2.       Using CLI
  •          Which method is better
Let us first understand what unhealthy means and why nodes enter this state.

An unhealthy node is still reachable, but YARN cannot use it to schedule task execution. The most common cause is insufficient disk space on the node: if log files keep accumulating and are never cleaned up, the disk fills up. To recover such nodes, we usually have to free space by deleting the log files.
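The unhealthy marking itself comes from YARN's NodeManager disk health checker: when disk utilization on any local directory crosses a configurable threshold (90% by default), the node is reported unhealthy to the ResourceManager. A sketch of the relevant yarn-site.xml property:

```xml
<!-- yarn-site.xml: a node is marked unhealthy when a local dir's disk
     utilization exceeds this percentage (default 90.0) -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```

Once enough logs are deleted to bring utilization back under this threshold, the node returns to the healthy state on its own.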

We can delete the log files manually in two ways:

  •          Using HUE
  •          Using Command Line


Using HUE:

As shown in the figure, we must first navigate to /var/log in the Hue file browser.

There we see the spark and hadoop-yarn folders; we must delete the logs in both.

Open spark and drill down until the log files appear, select them all, and click the delete option.

Repeat the same steps for hadoop-yarn.

If we then check the ResourceManager, the nodes are back in the healthy state.
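Node health can also be confirmed from the command line instead of the ResourceManager UI. A minimal sketch using the standard yarn CLI; it falls back to echoing the command when no cluster is available, so it can be tried safely off-cluster:

```shell
#!/bin/sh
# List every NodeManager along with its Node-State (RUNNING, UNHEALTHY, ...).
YARN=yarn
# Fall back to a dry run when the yarn CLI is not installed.
command -v "$YARN" >/dev/null 2>&1 || YARN=echo
$YARN node -list -all || true   # non-fatal when run off-cluster
```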

Using Command Line:

Use the following command to see the total disk space consumed:

hadoop fs -du -s -h <path>

The path for Spark logs would be like /var/log/spark/apps

The path for hadoop-yarn logs would be like /var/log/hadoop-yarn/apps/hadoop/logs

Use the following command to see disk consumption in detail, file by file:

hadoop fs -du -h <path>

Use the same paths as mentioned above.

To remove all the logs, use the following command:

hadoop fs -rm -r <path>

The advantage of using commands is speed: when there are a huge number of logs, deleting them through Hue takes a lot of time, whereas the command line removes all of them with a single command.
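As a sketch, the whole command-line cleanup can be wrapped in one small script. The two paths are the ones mentioned above; the script falls back to a dry run (echoing the commands) when the hadoop binary is not on the PATH, so it can be tried safely off-cluster:

```shell
#!/bin/sh
# Clean up the log directories that commonly fill the disk (paths from above).
HADOOP=hadoop
# Fall back to a dry run (just print the commands) when hadoop is not installed.
command -v "$HADOOP" >/dev/null 2>&1 || HADOOP=echo

log_dirs="/var/log/spark/apps /var/log/hadoop-yarn/apps/hadoop/logs"
for dir in $log_dirs; do
  $HADOOP fs -du -s -h "$dir"   # show how much space the logs occupy
  $HADOOP fs -rm -r "$dir"      # then remove them
done
```

After the script runs, give the NodeManager health checker a minute or two to re-evaluate disk utilization before checking the ResourceManager again.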

Connect with me on LinkedIn.

Happy learning!