Saving H2O Models From R/Python API in Hadoop Environment

Saving H2O Models From R/Python API in Hadoop Environment - DZone Big Data

June 14, 2017

When you use H2O in a clustered environment such as Hadoop, the machine where h2o.saveModel() writes the model can be different from the machine running your R client. That's why you see the error "No such file or directory." If you give a plain local path such as /tmp and then log in to the machine whose IP address you used to start the H2O connection from R, you will find the model stored there.
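To make the rule concrete, here is a minimal Python sketch of where a given save path ends up. Python is the other client API named in the title. The function `describe_save_target` and the set of "shared" storage schemes are hypothetical and exist only for illustration. The sketch assumes the behavior described above: a plain path like /tmp is resolved on the H2O node the client connected to, and a URI with a storage scheme such as hdfs:// is resolved by that storage system.

```python
from urllib.parse import urlparse

# Schemes treated as shared storage in this sketch; the exact set is an assumption.
SHARED_SCHEMES = {"hdfs", "s3", "s3a", "s3n", "gs"}


def describe_save_target(path: str, h2o_node_ip: str) -> str:
    """Hypothetical helper: describe where a model saved to `path` would land.

    Assumes a path with no scheme (or file://) is written to the local
    filesystem of the H2O node the client is connected to, not to the
    machine running the R/Python client.
    """
    scheme = urlparse(path).scheme.lower()
    if scheme in ("", "file"):
        return f"local filesystem of H2O node {h2o_node_ip}"
    if scheme in SHARED_SCHEMES:
        return f"shared storage ({scheme})"
    raise ValueError(f"unrecognized storage scheme: {scheme!r}")
```

For example, `describe_save_target("/tmp", "10.0.65.248")` points at the H2O node rather than the client. An hdfs:// URI is reported as shared storage.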

Here is an example that shows this more clearly.

1. Start Hadoop Driver in EC2 Environment

[ec2-user@ip-10-0-104-179 ~]$ hadoop jar h2o-3.10.4.8-hdp2.6/h2odriver.jar -nodes 2 -mapperXmx 2g -output /usr/ec2-user/005
....
....
....
Open H2O Flow in your web browser: http://10.0.65.248:54323  <=== H2O is started.
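The IP address and port in that last line are what the client needs to connect. The following minimal Python sketch pulls them out of the driver's output. `parse_flow_address` is a hypothetical helper, and its regular expression assumes the line keeps exactly the format shown above, with a dotted IPv4 address.

```python
import re
from typing import Optional, Tuple

# Matches the "cluster is up" line printed by h2odriver (format taken from the log above).
FLOW_LINE = re.compile(r"Open H2O Flow in your web browser: https?://([\d.]+):(\d+)")


def parse_flow_address(driver_output: str) -> Optional[Tuple[str, int]]:
    """Return (ip, port) of the H2O node from h2odriver output, or None if absent."""
    match = FLOW_LINE.search(driver_output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

The returned values correspond to the ip and port arguments passed to h2o.init() in the next step.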

2. Connect R Client With H2O

> library(h2o)
> h2o.init(ip = "10.0.65.248", port = 54323, strict_version_check = FALSE)

Note: I used the IP address shown above to connect to the existing H2O cluster. The R client itself runs on a different machine, whose IP address is 34.208.200.16.

3. Save the H2O Model

# my.glm: an H2O GLM model built earlier in the session (not shown here)
h2o.saveModel(my.glm, path = "/tmp", force = TRUE)

The model is saved on 10.0.65.248, even though the R client is running on 34.208.200.16.

[ec2-user@ip-10-0-65-248 ~]$ ll /tmp/GLM*
-rw-r--r-- 1 yarn hadoop 90391 Jun 2 20:02 /tmp/GLM_model_R_1496447892009_1

You need to make sure you have access to the folder on the machine where the H2O service is running. Alternatively, you can save the model to HDFS, as in the following example:

h2o.saveModel(my.glm, path = "hdfs://ip-10-0-104-179.us-west-2.compute.internal/user/achauhan", force = TRUE)
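With an HDFS URI, the model is stored in shared storage that any machine with HDFS access can read, rather than on one node's local disk. The Python sketch below shows how such a URI can be assembled and where the saved file is then expected. The helper names are hypothetical. The `<directory>/<model_id>` layout is an assumption based on the /tmp listing in step 3.

```python
import posixpath


def hdfs_model_dir(namenode_host: str, user: str) -> str:
    """Hypothetical helper: build an HDFS directory URI like the one used above.

    As in the example, no NameNode port is included in the URI.
    """
    return f"hdfs://{namenode_host}/user/{user}"


def saved_model_path(directory: str, model_id: str) -> str:
    """Assumed layout: the model file is written as <directory>/<model_id>."""
    # posixpath.join inserts exactly one "/" and also works on hdfs:// URIs.
    return posixpath.join(directory, model_id)
```

For example, `saved_model_path("/tmp", "GLM_model_R_1496447892009_1")` gives the same path as the `ll` listing in step 3.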
