Saving H2O Models From R/Python API in Hadoop Environment - DZone Big Data

When you are using H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() tries to write the model can be different from the machine where your R client is running. That is why you see the error “No such file or directory.” If you give a path that exists on the cluster node, e.g. /tmp, and log in to the machine that the R client's H2O connection points to, you will see the model stored there.

Here is a good example to understand it better.

1. Start Hadoop Driver in EC2 Environment 

[ec2-user@ip-10-0-104-179 ~]$ hadoop jar h2o-3.10.4.8-hdp2.6/h2odriver.jar -nodes 2 -mapperXmx 2g -output /usr/ec2-user/005
....
....
....
Open H2O Flow in your web browser: http://10.0.65.248:54323  <=== H2O is started.

2. Connect R Client With H2O

> h2o.init(ip = "10.0.65.248", port = 54323, strict_version_check = FALSE)

Note: I have used the IP address as shown above to connect with the existing H2O cluster. However, the machine where I am running the R client is different, as its IP address is 34.208.200.16.
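
If you use the Python API instead of R, the same connection could look roughly like the sketch below. The helper names parse_flow_url and connect_to_cluster are my own and are not part of H2O; h2o.init with ip, port, and strict_version_check is the Python counterpart of the R call above. The h2o import sits inside the function so that the URL helper can be tried without a cluster.

```python
from urllib.parse import urlsplit


def parse_flow_url(flow_url):
    """Split the Flow URL printed by the Hadoop driver into (ip, port)."""
    parts = urlsplit(flow_url)
    return parts.hostname, parts.port


def connect_to_cluster(flow_url):
    """Attach the Python client to an H2O cluster that is already running."""
    import h2o  # imported here so parse_flow_url works without h2o installed

    ip, port = parse_flow_url(flow_url)
    # strict_version_check=False mirrors the R call in this step.
    h2o.init(ip=ip, port=port, strict_version_check=False)


if __name__ == "__main__":
    print(parse_flow_url("http://10.0.65.248:54323"))
```

Calling connect_to_cluster("http://10.0.65.248:54323") from the client machine would then attach to the cluster started in step 1, assuming the client can reach that address.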

3. Saving H2O Model

h2o.saveModel(my.glm, path = "/tmp", force = TRUE)

The model is saved at 10.0.65.248 even though the R client is running at 34.208.200.16.

[ec2-user@ip-10-0-65-248 ~]$ ll /tmp/GLM*
-rw-r--r-- 1 yarn hadoop 90391 Jun 2 20:02 /tmp/GLM_model_R_1496447892009_1
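
The rule behind this behavior can be written down as a small Python sketch. The function describe_save_target is hypothetical (it is not an H2O call) and only considers the two cases discussed in this post: a plain path is resolved on the H2O node, never on the client, while an hdfs:// path goes to HDFS.

```python
from urllib.parse import urlsplit


def describe_save_target(path, h2o_node_ip):
    """Say where a path handed to saveModel is written (illustration only)."""
    if urlsplit(path).scheme == "hdfs":
        return "HDFS"
    # Plain path: the H2O node, not the R/Python client, resolves it.
    return "local disk of H2O node " + h2o_node_ip


if __name__ == "__main__":
    print(describe_save_target("/tmp", "10.0.65.248"))
```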

You need to make sure you have access to a folder on the machine where the H2O service is running, or you can save the model to HDFS, as in the example below:

h2o.saveModel(my.glm, path = "hdfs://ip-10-0-104-179.us-west-2.compute.internal/user/achauhan", force = TRUE)
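
With the Python API, a comparable call might look like the following. h2o.save_model is the Python counterpart of h2o.saveModel; hdfs_uri and save_model_to_hdfs are helper names I made up for this sketch, and the model argument is assumed to be an already trained H2O model on the connected cluster.

```python
def hdfs_uri(namenode, directory):
    """Build an hdfs:// URI from a NameNode host and an absolute HDFS directory."""
    return "hdfs://" + namenode + "/" + directory.strip("/")


def save_model_to_hdfs(model, namenode, directory):
    """Ask the H2O cluster to write the model to HDFS; returns the saved path."""
    import h2o  # imported here so hdfs_uri works without h2o installed

    # force=True overwrites an existing file, as in the R call above.
    return h2o.save_model(model=model, path=hdfs_uri(namenode, directory), force=True)


if __name__ == "__main__":
    print(hdfs_uri("ip-10-0-104-179.us-west-2.compute.internal", "/user/achauhan"))
```

The user that runs the H2O nodes (yarn in the listing above) needs write permission on the target HDFS directory for this to succeed.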
