Hadoop's conf/hadoop-env.sh includes the following environment variables:
- HADOOP_NAMENODE_OPTS
- HADOOP_SECONDARYNAMENODE_OPTS
- HADOOP_DATANODE_OPTS
- HADOOP_BALANCER_OPTS
- HADOOP_JOBTRACKER_OPTS
- HADOOP_TASKTRACKER_OPTS
You can use these to start a remote debug server inside each daemon so that you can connect to and debug any of the above servers. Unfortunately, Hadoop tasks are launched in separate child JVMs by the TaskTracker, so you cannot use this method to debug your map or reduce functions, which run in those child JVMs.
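For example, to debug the NameNode in the same way, you could add a similar line for HADOOP_NAMENODE_OPTS (the port 8000 here is just an example; pick any free port):

export HADOOP_NAMENODE_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=n"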
To debug the TaskTracker itself, follow these steps.
1. Edit conf/hadoop-env.sh to include the following:
export HADOOP_TASKTRACKER_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,address=5000,server=y,suspend=n"
2. Start Hadoop (bin/start-dfs.sh and bin/start-mapred.sh)
3. With suspend=n, the TaskTracker starts normally and listens for a debugger connection on port 5000; if you want the JVM to block until a debugger attaches, use suspend=y instead
4. Connect to the debug port from Eclipse using a "Remote Java Application" debug configuration and set your breakpoints (a command-line alternative is shown after these steps)
5. Run a MapReduce job (an example command is shown below)
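If you prefer the command line to Eclipse for step 4, you can attach jdb to the same port (a sketch, assuming the TaskTracker runs on localhost and uses port 5000 from step 1):

jdb -connect com.sun.jdi.SocketAttach:hostname=localhost,port=5000

For step 5, any job will do; for example, the bundled word count sample (the exact jar name depends on your Hadoop version, and the input directory must already exist in HDFS):

bin/hadoop jar hadoop-*-examples.jar wordcount input output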