Interactive Data Science with R in Apache Zeppelin Notebook

r-zeppelin

Introduction

The objective of this blog post is to help you get started with the Apache Zeppelin notebook for your R data science work. Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive, and collaborative documents with Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Shell, and more.

0-z-homepage1

0-z-homepage2

However, the latest official release, version 0.5.0, does not yet support the R programming language. Fortunately, NFLabs, the company driving this open source project, pointed me to a pull request that provides an R interpreter. An interpreter is a plug-in that lets Zeppelin users work with a specific language or data-processing backend. For example, to use Scala code in Zeppelin, you need a Spark interpreter. So, if you are as impatient as I am for R integration in Zeppelin, this tutorial will show you how to set up Zeppelin for use with R by building it from source.

Prerequisites

  • We will launch Zeppelin from a Bash shell on Linux. If you are using Windows, I recommend that you install and use the Cygwin terminal (it provides functionality similar to a Linux distribution on Windows).
  • Make sure Java 1.7 and Maven 3.2.x are installed on your host machine and their environment variables are set.
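Before starting the build, it can help to confirm that both tools are actually on your PATH. The following is only a minimal sketch: the helper name `missing_tools` is made up for illustration, and it checks that the `java` and `mvn` commands can be found, not which versions they are.

```shell
# Hypothetical helper: print each named command that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools java mvn)
if [ -n "$missing" ]; then
  echo "Not found on PATH: $(echo $missing)"
else
  echo "java and mvn found"
fi
```

If both are found, `java -version` and `mvn -version` print the installed versions, which you can compare against Java 1.7 and Maven 3.2.x.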

Build Zeppelin from Source

Step 1: Download Zeppelin Source Code

Go to the rinterpreter branch on GitHub and download the source code, or copy and paste this link into your web browser: https://github.com/elbamos/incubator-zeppelin/tree/rinterpreter

2-z-download-branch

In my case, I downloaded and unzipped the folder onto my Desktop.

3-z-unzip-desktop

Step 2: Build Zeppelin

Run the following commands in your terminal to build Zeppelin on your host machine in local mode. If you are installing on a cluster, add the cluster build options described in the Zeppelin documentation.

$ cd Desktop/Apache/incubator-zeppelin-rinterpreter

$ mvn clean package -DskipTests

4-z-build-source

Building Zeppelin, Spark, and all the interpreters (including R, Markdown, Hive, Shell, and others) takes around 16 minutes, as shown in the image below.

15mins
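Two suggestions from the comment thread below can be folded into the build step. The author of the R interpreter recommends naming the Spark version explicitly at build time, and points out that when the build fails, the real error appears earlier in the output than Maven’s final summary. The sketch below reflects both under some assumptions: `spark-1.5` is only an example profile (use the one matching the Spark you will run against), `build.log` is an arbitrary file name, and `lines_before_summary` is a hypothetical helper, not a Maven feature.

```shell
# Assumption: spark-1.5 is an example; pick the profile that matches your Spark.
SPARK_PROFILE="${SPARK_PROFILE:-spark-1.5}"
BUILD_CMD="mvn clean package -P${SPARK_PROFILE} -DskipTests"

# Only print the command here; to run it and keep a log, use:
#   $BUILD_CMD 2>&1 | tee build.log
echo "$BUILD_CMD"

# Hypothetical helper: show the N lines (default 40) that come just before
# Maven's "Reactor Summary", where the underlying error is often found.
lines_before_summary() {
  log="$1"
  n="${2:-40}"
  end=$(grep -n -F 'Reactor Summary' "$log" | head -n 1 | cut -d: -f1)
  if [ -z "$end" ]; then
    tail -n "$n" "$log"   # no summary in the log: fall back to its last lines
    return
  fi
  head -n "$((end - 1))" "$log" | tail -n "$n"
}
```

After a failed build, something like `lines_before_summary build.log 60` prints the 60 lines that precede the summary, which is usually a better starting point than the final `[ERROR]` lines.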

Step 3: Start Zeppelin

Run the following command to start Zeppelin.

$ ./bin/zeppelin-daemon.sh start

start

Open http://localhost:8080 in your web browser (Zeppelin listens on port 8080 by default). At this point you are ready to start creating interactive notebooks with code and graphs in Zeppelin.

6-z

Interactive Data Science

Step 1: Create a Notebook

Click the dropdown arrow next to the “Notebook” page and click “Create new note”.

6-z-create-note

Give your notebook a name, or use the default name assigned to it. I named mine “Base R in Apache Zeppelin”.

6-z-my-notebook

Step 2: Start your Analysis

To use R, use the “%spark.r” or “%spark.knitr” tags as shown in the images below. First let’s use markdown to write some instruction text.

7-z-md

Now let’s install some packages that we may need for our analysis.

8-z-install-packages

Now let’s read in our data set. We shall use the “flights” dataset which shows flights departing New York in 2013.

9-z-read-df

Now let’s do some data manipulation using dplyr (with the pipe operator).

10-z-data-manipulation

You can also use bar graphs and pie charts to visualize some descriptive statistics from your data.

10-z-data-man2

Now let’s do some data exploration with ggplot2.

11-ggplot2

Now let’s do some statistical machine learning using the caret package.

13-ml

13-lm-2

How about creating some maps?

14-maps

Final Remarks

Zeppelin allows you to create interactive documents with beautiful graphs using multiple programming languages. The objective of this post was to help you set up Zeppelin for use with the R programming language. Hopefully the Project Management Committee (PMC) of this wonderful open source project can release the next version with an R interpreter. That would make it much quicker to get started with Zeppelin, with no need to build from source.

Also it’s worth mentioning that there is another R interpreter for Zeppelin produced by the folks at Data Layer. You can find instructions on how to use it here: https://github.com/datalayer/zeppelin-R.

Try out both interpreters and share your experiences in the comments section below.

Moving Ahead

As a follow-up to this post, we shall see how to use Apache Spark (especially SparkR) within Zeppelin in the next blog post.

Updates

[11/24/2015]: Since several people were running into errors while building Zeppelin, I tried it out again on a clean machine. I ran into the same errors, and that’s when I discovered that you need to install the following R packages before building Zeppelin-RInterpreter.

install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)

After installing these packages, the build process worked without any errors. Also make sure the launch scripts are executable, for example by running chmod u+x on them.
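Note that the two `install.packages` lines are R code, so they belong in an R session rather than at the Bash prompt. If you would rather stay in the terminal, one option is `Rscript -e`. This is only a sketch under a few assumptions: the `run` helper is hypothetical and just prints each command unless `DRY_RUN=0` is set, the CRAN mirror URL is an arbitrary choice, and the `chmod` line assumes you are in the Zeppelin source directory.

```shell
# Hypothetical dry-run helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Assumption: a CRAN mirror is named explicitly, since non-interactive R
# may not have one configured.
CRAN="${CRAN:-https://cloud.r-project.org}"

for pkg in evaluate base64enc; do
  run Rscript -e "install.packages('${pkg}', dependencies = TRUE, repos = '${CRAN}')"
done

# Make the launch scripts executable for the current user.
run chmod u+x bin/*.sh
```

Run it once as-is to review the commands it would execute, then again with `DRY_RUN=0` to actually execute them.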

I also noticed that you need the “repr” package to be able to view R output. Install it from GitHub using the “devtools” package.

install.packages("devtools", dependencies = TRUE)
devtools::install_github('IRkernel/repr')

[11/27/2015]: The “flights” dataset can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/datasets

[12/06/2015]: The notebook used in this demo and other notebooks can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/zeppelin-notebooks

 

42 thoughts on “Interactive Data Science with R in Apache Zeppelin Notebook”

    1. Thanks Alex. I actually have a Zeppelin Hub (beta) account. Let me push the notebooks online on github and I will add an “Updates” section in the blog above with the link.

  1. Hi,

    getting the below error. Can you please help:

    mvn clean package -DskipTests
    Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
    [INFO] Scanning for projects…
    [INFO] ————————————————————————
    [ERROR] FATAL ERROR
    [INFO] ————————————————————————
    [INFO] Error building POM (may not be this project’s POM).

    Project ID: unknown
    POM Location: /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/spark-dependencies/pom.xml

    Reason: Parse error reading POM. Reason: unexpected character in markup < (position: END_TAG seen …\n<<… @479:3) for project unknown at /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/spark-dependencies/pom.xml

  2. Wade

    Great post. Have you been able to successfully build on a Windows box using cygwin as hinted in your post? I have tried numerous times (from the main Zeppelin page) and it fails on Windows. I’ve seen posts from Madhuka that it’s possible, but haven’t seen the secret recipe yet.

    When I try @elbamos’s rinterpreter branch (nov 17 2015), it throws non-parseable POM errors, so doesn’t even get to building.

    Thoughts would be welcome.

    1. Hi Wade,
      I have been meaning to try it out on Windows OS but have not yet. I will do so later this evening and let you know & then I will include an “Updates” section at the end of the blog.

  3. Amos Elb

    Hi all. I’m the author of the rinterpreter for zeppelin.

    The build problem you’re having is because of a mistake I made when rebasing last night. It should be fixed now. Your best bet for building is to specify the Spark version also, like this:
    mvn package install -Pspark-1.5 -DskipTests

    The reason is that if there’s a different spark version between build and runtime, you’ll get very odd errors about spark dependency libraries like akka, but you won’t see them until you try to actually execute SparkR commands. And this happens often because the Zeppelin build process downloads new jars for whatever spark version you specify at build, even if you already have a working spark installed.

    Please let me know if the build doesn’t work or you have any other problems. You can contact me directly, and I’ll try to monitor the comments here.

    I just attempted to make this comment and it didn’t appear… hope this doesn’t turn into a double-comment…

    1. Wade

      Hi Amos,

      Thanks for updating. Just tried again, seeing error below during build of R Interpreter. Here’s what I tried (Windows box using cygwin/babun).

      { software } » git clone -b rinterpreter https://github.com/elbamos/incubator-zeppelin.git
      { software } » cd incubator-zeppelin
      { incubator-zeppelin } rinterpreter » mvn package install -Pspark-1.5 -DskipTests

      [INFO] Zeppelin ……………………………………. SUCCESS [02:28 min]
      [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 13.420 s]
      [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 8.096 s]
      [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 55.363 s]
      [INFO] Zeppelin: Spark ……………………………… SUCCESS [ 29.736 s]
      [INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 16.191 s]
      [INFO] Zeppelin: Markdown interpreter ………………… SKIPPED
      [INFO] Zeppelin: Angular interpreter …………………. SKIPPED
      [INFO] Zeppelin: Shell interpreter …………………… SKIPPED
      [INFO] Zeppelin: Hive interpreter ……………………. SKIPPED
      [INFO] Zeppelin: Apache Phoenix Interpreter …………… SKIPPED
      [INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
      [INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
      [INFO] Zeppelin: Flink ……………………………… SKIPPED
      [INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
      [INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
      [INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
      [INFO] Zeppelin: Cassandra ………………………….. SKIPPED
      [INFO] Zeppelin: web Application …………………….. SKIPPED
      [INFO] Zeppelin: Server …………………………….. SKIPPED
      [INFO] Zeppelin: Packaging distribution ………………. SKIPPED
      [INFO] ————————————————————————
      [INFO] BUILD FAILURE
      [INFO] ————————————————————————
      [INFO] Total time: 04:31 min
      [INFO] Finished at: 2015-11-18T12:24:00-05:00
      [INFO] Final Memory: 103M/1072M
      [INFO] ————————————————————————
      [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Cannot run program “R\install-dev.sh” (in directory “C:\xxxx\software\incubator-zeppelin\r”): CreateProcess error=2, The system cannot find the file specified -> [Help 1]

      1. Amos Elb

        Wade, it’s likely that this relates to Windows, which I can’t test since I don’t have a Windows machine. Is it possible for us to communicate directly? I’d like to track this down and resolve any other Windows issues permanently.

      1. Amos

        Yes. The only way to run Zeppelin on Windows that I’m aware of is docker. Since Zeppelin will not run on Windows, the R-interpreter will not run on Windows. This is a Zeppelin limitation.

  4. Still getting error:

    mvn clean package -DskipTests
    Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
    [INFO] Scanning for projects…
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-spark:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 416, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-zrinterpreter:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘dependencies.dependency.(groupId:artifactId:type:classifier)’ must be unique: org.apache.spark:spark-core_${scala.binary.version}:jar -> duplicate declaration of version ${spark.version} @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 137, column 21
    [WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 314, column 21
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-flink:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘build.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-dependency-plugin @ line 349, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-cassandra:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 174, column 21
    [WARNING]
    [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
    [WARNING]
    [WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
    [WARNING]
    [INFO] ————————————————————————
    [INFO] Reactor Build Order:
    [INFO]
    [INFO] Zeppelin
    [INFO] Zeppelin: Interpreter
    [INFO] Zeppelin: Zengine
    [INFO] Zeppelin: Spark dependencies
    [INFO] Zeppelin: Spark
    [INFO] Zeppelin: R Interpreter
    [INFO] Zeppelin: Markdown interpreter
    [INFO] Zeppelin: Angular interpreter
    [INFO] Zeppelin: Shell interpreter
    [INFO] Zeppelin: Hive interpreter
    [INFO] Zeppelin: Apache Phoenix Interpreter
    [INFO] Zeppelin: PostgreSQL interpreter
    [INFO] Zeppelin: Tajo interpreter
    [INFO] Zeppelin: Flink
    [INFO] Zeppelin: Apache Ignite interpreter
    [INFO] Zeppelin: Kylin interpreter
    [INFO] Zeppelin: Lens interpreter
    [INFO] Zeppelin: Cassandra
    [INFO] Zeppelin: web Application
    [INFO] Zeppelin: Server
    [INFO] Zeppelin: Packaging distribution
    [INFO]
    [INFO] ————————————————————————
    [INFO] Building Zeppelin 0.6.0-incubating-SNAPSHOT
    [INFO] ————————————————————————
    [INFO]
    [INFO] — maven-clean-plugin:2.6.1:clean (default-clean) @ zeppelin —
    [INFO] Deleting /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/target
    [INFO]
    [INFO] — maven-checkstyle-plugin:2.13:check (checkstyle-fail-build) @ zeppelin —
    [INFO]
    [INFO]
    [INFO] — maven-resources-plugin:2.7:copy-resources (copy-resources) @ zeppelin —
    [INFO] Using ‘UTF-8’ encoding to copy filtered resources.
    [INFO] skip non existing resourceDirectory /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/../_tools/site
    [INFO]
    [INFO] — maven-enforcer-plugin:1.3.1:enforce (enforce) @ zeppelin —
    [INFO]
    [INFO] — maven-remote-resources-plugin:1.4:process (default) @ zeppelin —
    [INFO]
    [INFO] — maven-dependency-plugin:2.8:copy-dependencies (copy-dependencies) @ zeppelin —
    [INFO]
    [INFO] — maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ zeppelin —
    [INFO]
    [INFO] ————————————————————————
    [INFO] Building Zeppelin: Interpreter 0.6.0-incubating-SNAPSHOT
    [INFO] ————————————————————————
    Downloading: https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar

    1. Amos Elb

      Ganesh – Sorry about that – it should be fixed now.

      If no-one noticed, this push was a little rushed…

      Anyway, should be fixed now. Would appreciate if you’d confirm.

  5. The issue still persists with the latest changes. Pasting the error below again:

    mvn clean package -DskipTests
    Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
    [INFO] Scanning for projects…
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-spark:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 416, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-zrinterpreter:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘dependencies.dependency.(groupId:artifactId:type:classifier)’ must be unique: org.apache.spark:spark-core_${scala.binary.version}:jar -> duplicate declaration of version ${spark.version} @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 137, column 21
    [WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 314, column 21
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-flink:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘build.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-dependency-plugin @ line 349, column 15
    [WARNING]
    [WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-cassandra:jar:0.6.0-incubating-SNAPSHOT
    [WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 174, column 21
    [WARNING]
    [WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
    [WARNING]
    [WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
    [WARNING]
    [INFO] ————————————————————————

    1. Amos Elb

      Ganesh, that doesn’t look like an error; it looks like warnings, and the build should have gone on to complete. Can you email me through the GitHub repository, or provide an e-mail address so I can contact you directly?

  6. Got the error finally after waiting for 30 minutes:

    [INFO] ————————————————————————
    [INFO] BUILD FAILURE
    [INFO] ————————————————————————
    [INFO] Total time: 30:26 min
    [INFO] Finished at: 2015-11-23T09:57:46+05:30
    [INFO] Final Memory: 31M/247M
    [INFO] ————————————————————————
    [ERROR] Failed to execute goal on project zeppelin-interpreter: Could not resolve dependencies for project org.apache.zeppelin:zeppelin-interpreter:jar:0.6.0-incubating-SNAPSHOT: Could not transfer artifact org.apache.thrift:libthrift:jar:0.9.2 from/to central (https://repo.maven.apache.org/maven2): Timeout while waiting for concurrent download of /home/ganesh/.m2/repository/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar.part to progress -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :zeppelin-interpreter

    1. Amos

      Ganesh, that isn’t a problem with the R interpreter. That’s core vanilla Zeppelin failing to find its dependencies.

      Can you contact me directly and I’ll try to help?

      1. Since several people were running into errors while building Zeppelin, I tried it out again on a clean machine. I ran into the same errors, and that’s when I discovered that you need to install the following before building Zeppelin-RInterpreter. In the terminal, install the system libraries:
        sudo apt-get install libcurl4-openssl-dev
        sudo apt-get install libxml2-dev
        Then, in an R session, install the packages:
        install.packages("evaluate", dependencies = TRUE)
        install.packages("base64enc", dependencies = TRUE)
        install.packages("devtools", dependencies = TRUE)
        devtools::install_github('IRkernel/repr')

        After installing these packages, the build process worked without any errors. Also make sure the launch scripts are executable, for example by running chmod u+x on them.

      1. Laurent

        Hi Amos,

        I’ve been trying for a while, following all of the comments (including specifying the Spark version), but I keep running into a failure in the R interpreter part.
        I’ve moved it to the end of the process so the rest would actually work. Here’s what I get:

        [INFO] Reactor Summary:
        [INFO]
        [INFO] Zeppelin ……………………………………. SUCCESS [02:18 min]
        [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 26.050 s]
        [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 14.347 s]
        [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [02:14 min]
        [INFO] Zeppelin: Spark ……………………………… SUCCESS [ 31.050 s]
        [INFO] Zeppelin: web Application …………………….. SUCCESS [03:21 min]
        [INFO] Zeppelin: Server …………………………….. SUCCESS [ 33.539 s]
        [INFO] Zeppelin: Packaging distribution ………………. SUCCESS [ 2.694 s]
        [INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 3.287 s]
        [INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 3.002 s]
        [INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 3.132 s]
        [INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 8.717 s]
        [INFO] Zeppelin: Apache Phoenix Interpreter …………… SUCCESS [ 17.422 s]
        [INFO] Zeppelin: PostgreSQL interpreter ………………. SUCCESS [ 6.395 s]
        [INFO] Zeppelin: Tajo interpreter ……………………. SUCCESS [ 5.077 s]
        [INFO] Zeppelin: Flink ……………………………… SUCCESS [ 15.353 s]
        [INFO] Zeppelin: Apache Ignite interpreter ……………. SUCCESS [ 4.958 s]
        [INFO] Zeppelin: Kylin interpreter …………………… SUCCESS [ 2.934 s]
        [INFO] Zeppelin: Lens interpreter ……………………. SUCCESS [ 9.202 s]
        [INFO] Zeppelin: Cassandra ………………………….. SUCCESS [ 23.548 s]
        [INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 2.472 s]
        [INFO] ————————————————————————
        [INFO] BUILD FAILURE
        [INFO] ————————————————————————
        [INFO] Total time: 11:29 min
        [INFO] Finished at: 2015-11-27T14:30:14+01:00
        [INFO] Final Memory: 89M/325M
        [INFO] ————————————————————————
        [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
        [ERROR]
        [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
        [ERROR] Re-run Maven using the -X switch to enable full debug logging.
        [ERROR]
        [ERROR] For more information about the errors and possible solutions, please read the following articles:
        [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
        [ERROR]
        [ERROR] After correcting the problems, you can resume the build with the command
        [ERROR] mvn -rf :zeppelin-zrinterpreter

        I don’t understand where to type Emaasit’s commands. I receive “syntax error near unexpected token `"evaluate"” in the terminal.

      2. Amos

        That part of the error log just says there was an error; it doesn’t say what the error was. The actual error appears earlier in the log.

        Anyway you shouldn’t need to change the order when things install.

        Please try it from a fresh clone of the git and contact me directly by email with the whole install log if it fails.

  7. Ramulu

    I’m using a CentOS Apache Hadoop and Spark cluster, with Maven 3.3.3 and npm installed.
    The installation fails at the R interpreter. I tried all the above steps.

    1. Since several people were running into errors while building Zeppelin, I tried it out again on a clean machine. I ran into the same errors, and that’s when I discovered that you need to install the following before building Zeppelin-RInterpreter. In the terminal, install the system libraries:
      sudo apt-get install libcurl4-openssl-dev
      sudo apt-get install libxml2-dev
      Then, in an R session, install the packages:
      install.packages("evaluate", dependencies = TRUE)
      install.packages("base64enc", dependencies = TRUE)
      install.packages("devtools", dependencies = TRUE)
      devtools::install_github('IRkernel/repr')

      After installing these packages, the build process worked without any errors. Also make sure the launch scripts are executable, for example by running chmod u+x on them.

      1. Amos

        Thanks!

        The only one of those that should be required is evaluate. I’ll look into the others. I’m also going to put up some documentation this week.

      2. It worked fine until it started to download HBase. Below is the error. How did you find out the dependencies? If you can suggest how to identify dependencies, I can find them and post an update here.

        Downloading: https://repo.maven.apache.org/maven2/org/apache/phoenix/phoenix/4.4.0-HBase-1.0/phoenix-4.4.0-HBase-1.0.pom
        [INFO] ————————————————————————
        [INFO] Reactor Summary:
        [INFO]
        [INFO] Zeppelin ……………………………………. SUCCESS [ 36.633 s]
        [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 46.716 s]
        [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 13.824 s]
        [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [01:50 min]
        [INFO] Zeppelin: Spark ……………………………… SUCCESS [03:15 min]
        [INFO] Zeppelin: R Interpreter ………………………. SUCCESS [10:28 min]
        [INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 1.335 s]
        [INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 1.075 s]
        [INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 1.195 s]
        [INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 9.754 s]
        [INFO] Zeppelin: Apache Phoenix Interpreter …………… FAILURE [32:31 min]
        [INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
        [INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
        [INFO] Zeppelin: Flink ……………………………… SKIPPED
        [INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
        [INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
        [INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
        [INFO] Zeppelin: Cassandra ………………………….. SKIPPED
        [INFO] Zeppelin: web Application …………………….. SKIPPED
        [INFO] Zeppelin: Server …………………………….. SKIPPED
        [INFO] Zeppelin: Packaging distribution ………………. SKIPPED
        [INFO] ————————————————————————
        [INFO] BUILD FAILURE
        [INFO] ————————————————————————
        [INFO] Total time: 50:00 min
        [INFO] Finished at: 2015-11-25T20:54:22+05:30
        [INFO] Final Memory: 81M/679M
        [INFO] ————————————————————————

        [ERROR] Failed to read artifact descriptor for org.apache.phoenix:phoenix-core:jar:4.4.0-HBase-1.0: Could not transfer artifact org.apache.phoenix:phoenix:pom:4.4.0-HBase-1.0 from/to central (https://repo.maven.apache.org/maven2):: Read timed out -> [Help 1]
        [ERROR]
        [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
        [ERROR] Re-run Maven using the -X switch to enable full debug logging.
        [ERROR]
        [ERROR] For more information about the errors and possible solutions, please read the following articles:
        [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
        [ERROR]
        [ERROR] After correcting the problems, you can resume the build with the command
        [ERROR] mvn -rf :zeppelin-phoenix

  8. Robert

    This is super cool. I was trying to replicate the steps, but when I mvn clean package -DskipTests, I got:

    workspace/zeppelin/Zeppelin-With-R/r/src/main/scala/org/apache/zeppelin/rinterpreter/RContext.scala:157: error: value isSparkRSupported is not a member of org.apache.zeppelin.spark.SparkVersion
    [INFO] if (!intp.getSparkVersion().isSparkRSupported) throw new RuntimeException("SparkR requires Spark 1.4 or later")
    [INFO] ^
    [ERROR] one error found

    It seems like there is some issue with getting SparkVersion, but I am not sure how to fix it.

    1. Amos

      If you look here: https://github.com/elbamos/Zeppelin-With-R/blob/rinterpreter-0.5.5/spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java

      You’ll see that `isSparkRSupported` is *definitely* part of the code.

      The most likely suspect is that you’re trying to install this on top of an existing Zeppelin source directory, or otherwise not starting from a clean slate.

      Can you try again, cloning into an empty directory? Thanks

      (Also, the git is definitely the place for posting issues, so we don’t divert Daniel’s blog.)

  9. m azahari

    Hi,

    I need some help here. The build succeeded on both of my attempts, but the R interpreter was missing each time. I have installed all the required R packages as indicated, but R is still missing.

    [INFO] Zeppelin ……………………………………. SUCCESS [ 9.399 s]
    [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 25.896 s]
    [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 10.436 s]
    [INFO] Zeppelin: Display system apis …………………. SUCCESS [ 21.377 s]
    [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 51.776 s]
    [INFO] Zeppelin: Spark ……………………………… SUCCESS [ 46.444 s]
    [INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 1.403 s]
    [INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 1.105 s]
    [INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 0.625 s]
    [INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 4.718 s]
    [INFO] Zeppelin: HBase interpreter …………………… SUCCESS [ 8.663 s]
    [INFO] Zeppelin: Apache Phoenix Interpreter …………… SUCCESS [ 9.917 s]
    [INFO] Zeppelin: PostgreSQL interpreter ………………. SUCCESS [ 1.199 s]
    [INFO] Zeppelin: JDBC interpreter ……………………. SUCCESS [ 0.732 s]
    [INFO] Zeppelin: Tajo interpreter ……………………. SUCCESS [ 2.253 s]
    [INFO] Zeppelin: Flink ……………………………… SUCCESS [ 23.066 s]
    [INFO] Zeppelin: Apache Ignite interpreter ……………. SUCCESS [ 2.542 s]
    [INFO] Zeppelin: Kylin interpreter …………………… SUCCESS [ 2.075 s]
    [INFO] Zeppelin: Lens interpreter ……………………. SUCCESS [ 8.047 s]
    [INFO] Zeppelin: Cassandra ………………………….. SUCCESS [01:41 min]
    [INFO] Zeppelin: Elasticsearch interpreter ……………. SUCCESS [ 7.917 s]
    [INFO] Zeppelin: Tachyon interpreter …………………. SUCCESS [ 3.348 s]
    [INFO] Zeppelin: web Application …………………….. SUCCESS [01:53 min]
    [INFO] Zeppelin: Server …………………………….. SUCCESS [ 19.168 s]
    [INFO] Zeppelin: Packaging distribution ………………. SUCCESS [ 1.549 s]
    [INFO] ————————————————————————
    [INFO] BUILD SUCCESS

  10. m azahari

    I re-downloaded the source from GitHub and tried again. This time I’m getting this error:

    [INFO] Zeppelin ……………………………………. SUCCESS [ 4.451 s]
    [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 8.033 s]
    [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 4.681 s]
    [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 42.967 s]
    [INFO] Zeppelin: Spark ……………………………… SUCCESS [ 37.819 s]
    [INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 35.534 s]
    [INFO] Zeppelin: Markdown interpreter ………………… SKIPPED
    [INFO] Zeppelin: Angular interpreter …………………. SKIPPED
    [INFO] Zeppelin: Shell interpreter …………………… SKIPPED
    [INFO] Zeppelin: Hive interpreter ……………………. SKIPPED
    [INFO] Zeppelin: Apache Phoenix Interpreter …………… SKIPPED
    [INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
    [INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
    [INFO] Zeppelin: Flink ……………………………… SKIPPED
    [INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
    [INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
    [INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
    [INFO] Zeppelin: Cassandra ………………………….. SKIPPED
    [INFO] Zeppelin: Elasticsearch interpreter ……………. SKIPPED
    [INFO] Zeppelin: web Application …………………….. SKIPPED
    [INFO] Zeppelin: Server …………………………….. SKIPPED
    [INFO] Zeppelin: Packaging distribution ………………. SKIPPED
    [INFO] ————————————————————————
    [INFO] BUILD FAILURE
    [INFO] ————————————————————————
    [INFO] Total time: 02:14 min
    [INFO] Finished at: 2016-03-10T00:51:50-08:00
    [INFO] Final Memory: 107M/301M
    [INFO] ————————————————————————
    [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :zeppelin-zrinterpreter

    What is the problem?

  11. azeltov

    I ran into this issue: com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name ‘id’ (in class org.apache.spark.rdd.RDDOperationScope)?

    I got everything built successfully, but when I try the R sample notebook I get this error. I tried some of my regular PySpark code that I know works in regular Zeppelin and got the same error. I built Zeppelin using this command:

    mvn package install -Pspark-1.6 -Phadoop-2.4 -DskipTests

    Appreciate your assistance!

    data(faithful)

    faith <- createDataFrame(sqlContext, faithful)

    <simpleError in invokeJava(isStatic = TRUE, className, methodName, …): com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
    at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
    at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)

    Thanks!

    Alex

    1. Amos

      It looks like you’re trying to use a version of spark other than the one Zeppelin was compiled with.

      Please open an issue on my git if you can’t fix this by recompiling.
