Srinath's Blog :My views of the World: Debugging Hadoop Task tracker, Job tracker, Data Node or Name Node

Friday, May 25, 2012

Debugging Hadoop Task tracker, Job tracker, Data Node or Name Node

Hadoop's conf/hadoop-env.sh defines the following environment variables:

HADOOP_NAMENODE_OPTS
HADOOP_SECONDARYNAMENODE_OPTS
HADOOP_DATANODE_OPTS
HADOOP_BALANCER_OPTS
HADOOP_JOBTRACKER_OPTS
HADOOP_TASKTRACKER_OPTS

Each variable holds the JVM options for the corresponding daemon, so you can use it to enable the remote debugger and then connect to and debug any of the above servers. Unfortunately, you cannot use this method to debug your map or reduce functions: the task tracker launches Hadoop tasks in separate JVMs, so a debugger attached to the task tracker never sees that code.
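For instance, several daemons can be made debuggable at once, as long as each one on a given host listens on its own port. The following is a minimal sketch for conf/hadoop-env.sh, not an official Hadoop recipe: the helper name jdwp_opts and the ports 5001 to 5003 are assumptions chosen for illustration, and the JDWP flags are the ones used in this post.

```shell
# Hypothetical helper (not part of Hadoop): build a JDWP option string
# for the given port, with server=y and suspend=n so the daemon starts
# without waiting for a debugger.
jdwp_opts() {
  echo "-Xdebug -Xrunjdwp:transport=dt_socket,address=$1,server=y,suspend=n"
}

# The ports are arbitrary examples; each daemon on a host needs its own.
# Any options already set for a daemon are kept after the debug flags.
export HADOOP_NAMENODE_OPTS="$(jdwp_opts 5001) ${HADOOP_NAMENODE_OPTS:-}"
export HADOOP_DATANODE_OPTS="$(jdwp_opts 5002) ${HADOOP_DATANODE_OPTS:-}"
export HADOOP_JOBTRACKER_OPTS="$(jdwp_opts 5003) ${HADOOP_JOBTRACKER_OPTS:-}"
```

Appending the previous value keeps whatever options were already configured for that daemon.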

To debug the task tracker, follow these steps.

1. Edit conf/hadoop-env.sh to add the following:

export HADOOP_TASKTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"

2. Start Hadoop (bin/start-dfs.sh and bin/start-mapred.sh)

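3. Because the options above use suspend=n, the task tracker starts without waiting and listens on port 5000 for a debugger. To make it block until a debugger connects, use suspend=y instead.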

4. Connect to the server from Eclipse: create a "Remote Java Application" entry under Debug Configurations, point it at the task tracker's host and port 5000, and set your breakpoints

5. Run a MapReduce job
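
If Eclipse is not at hand, the attach step can also be done from a terminal with jdb, the command-line debugger that ships with the JDK. This is only a sketch under assumptions: the helper jdb_attach_cmd is hypothetical, and the defaults assume the task tracker runs on localhost with the port 5000 configured in step 1.

```shell
# Hypothetical helper: print the jdb command that attaches to a JDWP
# socket on the given host and port. Defaults assume a local task
# tracker listening on port 5000.
jdb_attach_cmd() {
  host="${1:-localhost}"
  port="${2:-5000}"
  echo "jdb -attach ${host}:${port}"
}

# Print the command; run it by hand once the task tracker is up.
jdb_attach_cmd localhost 5000
```

Once attached, breakpoints are set at the jdb prompt with "stop in" or "stop at" before you submit the job.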
