Interactive Data Science with R in Apache Zeppelin Notebook

Introduction

The objective of this blog post is to help you get started with the Apache Zeppelin notebook for your R data science requirements. Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Shell and more.

However, the latest official release, version 0.5.0, does not yet support the R programming language. Fortunately NFLabs, the company driving this open source project, pointed me to this pull request that provides an R Interpreter. An Interpreter is a plug-in that enables Zeppelin users to use a specific language or data-processing backend. For example, to use Scala code in Zeppelin, you need a Spark interpreter. So, if you are as impatient as I am for R integration in Zeppelin, this tutorial will show you how to set up Zeppelin for use with R by building from source.

Prerequisites

We will launch Zeppelin from a Bash shell on Linux. If you are using Windows, I recommend that you install and use the Cygwin terminal (it provides functionality similar to a Linux distribution on Windows).
Make sure Java 1.7 and Maven 3.2.x are installed on your host machine and that their environment variables are set.
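
Before building, it can help to confirm that both tools are actually visible from your shell. The snippet below is a minimal sketch, not part of the Zeppelin project: the function names require_tool and check_prerequisites are made up for this post, and the check only looks for the commands on your PATH (it does not verify version numbers).

```shell
# Hypothetical helper: report whether a command is on PATH.
# Returns non-zero when the command cannot be found.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1 ($(command -v "$1"))"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Hypothetical helper: check every build prerequisite named in this post.
check_prerequisites() {
  status=0
  for tool in java mvn; do
    require_tool "$tool" || status=1
  done
  return "$status"
}

check_prerequisites || echo "Install the missing tools before building."
```

If both tools are found, running java -version and mvn -version afterwards will show which versions you have.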

Build Zeppelin from Source

Step 1: Download Zeppelin Source Code

Go to this GitHub branch and download the source code. Alternatively, copy and paste this link into your web browser: https://github.com/elbamos/incubator-zeppelin/tree/rinterpreter

In my case, I downloaded and unzipped the folder onto my Desktop.

Step 2: Build Zeppelin

Run the following commands in your terminal to build Zeppelin on your host machine in local mode. If you are installing on a cluster, add these options found in the Zeppelin documentation.

$ cd Desktop/Apache/incubator-zeppelin-rinterpreter

$ mvn clean package -DskipTests

The build takes around 16 minutes and covers Zeppelin, Spark and all interpreters, including R, Markdown, Hive, Shell, and others (as shown in the image below).
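
In the comments at the end of this post, the author of the R interpreter recommends also naming the Spark version at build time with a Maven profile such as -Pspark-1.5, so that the Spark used for the build matches the one used at runtime. The sketch below derives that flag from a version string. The function name spark_profile is hypothetical, and the rule that every major.minor version has a matching profile is an assumption; only -Pspark-1.5 is confirmed in the comments.

```shell
# Hypothetical helper: turn a Spark version such as "1.5.2" into a
# Maven profile flag such as "-Pspark-1.5".
# Assumes the version has at least a major and a minor part.
spark_profile() {
  version="$1"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # text before the next dot, if any
  echo "-Pspark-${major}.${minor}"
}

# Print the build command for a Spark 1.5.x installation.
echo "mvn clean package $(spark_profile 1.5.2) -DskipTests"
# prints: mvn clean package -Pspark-1.5 -DskipTests
```

The snippet only prints the command, so you can check it before running the real build.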

Step 3: Start Zeppelin

Run the following command to start Zeppelin.

$ ./bin/zeppelin-daemon.sh start

Zeppelin listens on port 8080 by default, so open http://localhost:8080 in your web browser. At this point you are ready to start creating interactive notebooks with code and graphs in Zeppelin.
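
The server can take a little while to come up after the daemon starts. If you prefer to check from the terminal before opening the browser, something like the following sketch can poll the address for you. It assumes curl is installed; the function name wait_for_zeppelin and the default of 30 attempts are choices made for this example, not anything Zeppelin provides.

```shell
# Hypothetical helper: poll a URL until it answers or the attempts run out.
# $1 = URL to check (defaults to the address used in this post)
# $2 = maximum number of attempts, one per second (defaults to 30)
wait_for_zeppelin() {
  url="${1:-http://localhost:8080}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # --fail makes curl return non-zero on HTTP errors as well.
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "Zeppelin is answering at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

After starting the daemon, call wait_for_zeppelin with no arguments to wait for http://localhost:8080.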

Interactive Data Science

Step 1: Create a Notebook

Click the dropdown arrow next to the “Notebook” page and click “Create new note”.

Give your notebook a name or you can use the assigned default name. I named mine “Base R in Apache Zeppelin”.

Step 2: Start your Analysis

To use R, use the “%spark.r” or “%spark.knitr” tags as shown in the images below. First let’s use markdown to write some instruction text.

Now let’s install some packages that we may need for our analysis.

Now let’s read in our data set. We shall use the “flights” dataset which shows flights departing New York in 2013.

Now let’s do some data manipulation using dplyr (with the pipe operator).

You can also use bar graphs and pie charts to visualize some descriptive statistics from your data.

Now let’s do some data exploration with ggplot2.

Now let’s do some statistical machine learning using the caret package.

How about creating some maps?

Final Remarks

Zeppelin allows you to create interactive documents with beautiful graphs using multiple programming languages. The objective of this post was to help you set up Zeppelin for use with the R programming language. Hopefully the Project Management Committee (PMC) of this wonderful open source project can release the next version with an R interpreter. That would make it much quicker to get started with Zeppelin, without having to build from source.

Also it’s worth mentioning that there is another R interpreter for Zeppelin produced by the folks at Data Layer. You can find instructions on how to use it here: https://github.com/datalayer/zeppelin-R.

Try out both interpreters and share your experiences in the comments section below.

Moving Ahead

As a follow-up to this post, we shall see how to use Apache Spark (especially SparkR) within Zeppelin in the next blog post.

Updates

[11/24/2015]: Since several people were running into errors while building Zeppelin, I tried it out again on a clean machine. I ran into the same errors, and that’s when I discovered that you need to install the following R packages before building Zeppelin-RInterpreter.

install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)

After installing these packages, the build process worked without any errors. Also make sure the launch scripts are executable, using the chmod u+x command.
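
As a rough sketch of the chmod step, the helper below adds the execute bit for your user on every .sh file in a directory. The function name make_scripts_executable is made up for this post, and the default directory bin is assumed from the start command used earlier (./bin/zeppelin-daemon.sh).

```shell
# Hypothetical helper: give the current user execute permission on
# every .sh file in a directory (defaults to "bin").
make_scripts_executable() {
  dir="${1:-bin}"
  for script in "$dir"/*.sh; do
    # Skip the unexpanded pattern when the directory has no .sh files.
    [ -e "$script" ] || continue
    chmod u+x "$script"
  done
}
```

Run make_scripts_executable bin from the top of the Zeppelin source folder.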

Also, I noticed that you need the “repr” package to be able to view R output. Install it from GitHub using the “devtools” package.

install.packages("devtools", dependencies = TRUE)
devtools::install_github("IRkernel/repr")
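
Note that the install.packages and install_github lines are R commands, so typing them straight into Bash produces a syntax error. They belong at an R prompt, or you can pass them to Rscript -e from the terminal. The sketch below takes the second route. The function names are hypothetical, and the repos argument pointing at https://cloud.r-project.org is an addition of mine, since a non-interactive R session needs a CRAN mirror to be named explicitly.

```shell
# Hypothetical helper: build the R expression that installs one CRAN package.
# $1 = package name, $2 = CRAN mirror (assumed default shown below)
r_install_expr() {
  printf 'install.packages("%s", dependencies = TRUE, repos = "%s")\n' \
    "$1" "${2:-https://cloud.r-project.org}"
}

# Hypothetical helper: install the R packages listed in this update.
# Requires R (and therefore Rscript) to be installed already.
install_r_deps() {
  for pkg in evaluate base64enc devtools; do
    Rscript -e "$(r_install_expr "$pkg")" || return 1
  done
  # repr is installed from GitHub through devtools, as described above.
  Rscript -e 'devtools::install_github("IRkernel/repr")'
}
```

Call install_r_deps once before building. On Debian or Ubuntu, the comments below also mention installing libcurl4-openssl-dev and libxml2-dev with apt-get first.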

[11/27/2015]: The “flights” dataset can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/datasets

[12/06/2015]: The notebook used in this demo and other notebooks can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/zeppelin-notebooks

42 thoughts on “Interactive Data Science with R in Apache Zeppelin Notebook”

Pingback: Distilled News | Data Analytics & R
Pingback: Interactive Data Science with R in Apache Zeppelin Notebook | Mubashir Qasim
alex

November 17, 2015 at 2:29 am

Awesome post, thank you for sharing!
Would you be willing to post the notebook with R example i.e on github and including a link in the article?
Also, you might use http://zeppelinhub.com/viewer for a preview

1. emaasit
  
  November 17, 2015 at 11:07 am
  
  Thanks Alex. I actually have a Zeppelin Hub (beta) account. Let me push the notebooks online on github and I will add an “Updates” section in the blog above with the link.
  
ganeshbhat3

November 17, 2015 at 2:19 am

Hi,

getting the below error. Can you please help:

mvn clean package -DskipTests
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[INFO] Scanning for projects…
[INFO] ————————————————————————
[ERROR] FATAL ERROR
[INFO] ————————————————————————
[INFO] Error building POM (may not be this project’s POM).

Project ID: unknown
POM Location: /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/spark-dependencies/pom.xml

Reason: Parse error reading POM. Reason: unexpected character in markup < (position: END_TAG seen …\n<<… @479:3) for project unknown at /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/spark-dependencies/pom.xml

Wade

November 17, 2015 at 6:25 pm

Great post. Have you been able to successfully build on a Windows box using cygwin as hinted in your post? I have tried numerous times (from the main Zeppelin page) and it fails on Windows. I’ve seen posts from Madhuka that it’s possible, but haven’t seen the secret recipe yet.

When I try @elbamos’s rinterpreter branch (nov 17 2015), it throws non-parseable POM errors, so doesn’t even get to building.

Thoughts would be welcome.

1. emaasit
  
  November 17, 2015 at 10:32 am
  
  Hi Wade,
  I have been meaning to try it out on Windows OS but have not yet. I will do so later this evening and let you know & then I will include an “Updates” section at the end of the blog.
  
Amos Elb

November 17, 2015 at 11:07 am

Hi all. I’m the author of the rinterpreter for zeppelin.

The build problem you’re having is because of a mistake I made when rebasing last night. It should be fixed now. Your best bet for building is to specify the Spark version also, like this:
mvn package install -Pspark-1.5 -DskipTests

The reason is that if there’s a different spark version between build and runtime, you’ll get very odd errors about spark dependency libraries like akka, but you won’t see them until you try to actually execute SparkR commands. And this happens often because the Zeppelin build process downloads new jars for whatever spark version you specify at build, even if you already have a working spark installed.

Please let me know if the build doesn’t work or you have any other problems. You can contact me directly, and I’ll try to monitor the comments here.

I just attempted to make this comment and it didn’t appear… hope this doesn’t turn into a double-comment…

1. Wade
  
  November 18, 2015 at 9:33 am
  
  Hi Amos,
  
  Thanks for updating. Just tried again, seeing error below during build of R Interpreter. Here’s what I tried (Windows box using cygwin/babun).
  
  { software } » git clone -b rinterpreter https://github.com/elbamos/incubator-zeppelin.git
  { software } » cd incubator-zeppelin
  { incubator-zeppelin } rinterpreter » mvn package install -Pspark-1.5 -DskipTests
  
  [INFO] Zeppelin ……………………………………. SUCCESS [02:28 min]
  [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 13.420 s]
  [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 8.096 s]
  [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 55.363 s]
  [INFO] Zeppelin: Spark ……………………………… SUCCESS [ 29.736 s]
  [INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 16.191 s]
  [INFO] Zeppelin: Markdown interpreter ………………… SKIPPED
  [INFO] Zeppelin: Angular interpreter …………………. SKIPPED
  [INFO] Zeppelin: Shell interpreter …………………… SKIPPED
  [INFO] Zeppelin: Hive interpreter ……………………. SKIPPED
  [INFO] Zeppelin: Apache Phoenix Interpreter …………… SKIPPED
  [INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
  [INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
  [INFO] Zeppelin: Flink ……………………………… SKIPPED
  [INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
  [INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
  [INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
  [INFO] Zeppelin: Cassandra ………………………….. SKIPPED
  [INFO] Zeppelin: web Application …………………….. SKIPPED
  [INFO] Zeppelin: Server …………………………….. SKIPPED
  [INFO] Zeppelin: Packaging distribution ………………. SKIPPED
  [INFO] ————————————————————————
  [INFO] BUILD FAILURE
  [INFO] ————————————————————————
  [INFO] Total time: 04:31 min
  [INFO] Finished at: 2015-11-18T12:24:00-05:00
  [INFO] Final Memory: 103M/1072M
  [INFO] ————————————————————————
  [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Cannot run program “R\install-dev.sh” (in directory “C:\xxxx\software\incubator-zeppelin\r”): CreateProcess error=2, The system cannot find the file specified -> [Help 1]
  
  1. Amos Elb
    
    November 18, 2015 at 1:17 pm
    
    Wade, it’s likely that this relates to Windows, which I can’t test since I don’t have a Windows machine. Is it possible for us to communicate directly? I’d like to track this down and resolve any other Windows issues permanently.
    
2. Wade
  
  November 18, 2015 at 5:44 pm
  
  Hi Amos, I can assist with Windows issues. Will ping you via email.
  
  1. Fernando
    
    January 14, 2016 at 10:03 pm
    
    Hi, Wade and Amos.
    
    I’m getting the same error as Wade. Have you guys solved this issue?
    
    Thanks!
    
  2. Amos
    
    January 14, 2016 at 10:17 pm
    
    Yes. The only way to run Zeppelin on Windows that I’m aware of is docker. Since Zeppelin will not run on Windows, the R-interpreter will not run on Windows. This is a Zeppelin limitation.
    
Amos Elb

November 17, 2015 at 2:13 pm

Ganesh: I pushed code earlier today that should solve that issue. Please reclone the git and try again. Thanks.

ganeshbhat3

November 18, 2015 at 7:02 am

Still getting error:

mvn clean package -DskipTests
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[INFO] Scanning for projects…
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-spark:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 416, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-zrinterpreter:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘dependencies.dependency.(groupId:artifactId:type:classifier)’ must be unique: org.apache.spark:spark-core_${scala.binary.version}:jar -> duplicate declaration of version ${spark.version} @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 137, column 21
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 314, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-flink:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-dependency-plugin @ line 349, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-cassandra:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 174, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ————————————————————————
[INFO] Reactor Build Order:
[INFO]
[INFO] Zeppelin
[INFO] Zeppelin: Interpreter
[INFO] Zeppelin: Zengine
[INFO] Zeppelin: Spark dependencies
[INFO] Zeppelin: Spark
[INFO] Zeppelin: R Interpreter
[INFO] Zeppelin: Markdown interpreter
[INFO] Zeppelin: Angular interpreter
[INFO] Zeppelin: Shell interpreter
[INFO] Zeppelin: Hive interpreter
[INFO] Zeppelin: Apache Phoenix Interpreter
[INFO] Zeppelin: PostgreSQL interpreter
[INFO] Zeppelin: Tajo interpreter
[INFO] Zeppelin: Flink
[INFO] Zeppelin: Apache Ignite interpreter
[INFO] Zeppelin: Kylin interpreter
[INFO] Zeppelin: Lens interpreter
[INFO] Zeppelin: Cassandra
[INFO] Zeppelin: web Application
[INFO] Zeppelin: Server
[INFO] Zeppelin: Packaging distribution
[INFO]
[INFO] ————————————————————————
[INFO] Building Zeppelin 0.6.0-incubating-SNAPSHOT
[INFO] ————————————————————————
[INFO]
[INFO] — maven-clean-plugin:2.6.1:clean (default-clean) @ zeppelin —
[INFO] Deleting /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/target
[INFO]
[INFO] — maven-checkstyle-plugin:2.13:check (checkstyle-fail-build) @ zeppelin —
[INFO]
[INFO]
[INFO] — maven-resources-plugin:2.7:copy-resources (copy-resources) @ zeppelin —
[INFO] Using ‘UTF-8’ encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/../_tools/site
[INFO]
[INFO] — maven-enforcer-plugin:1.3.1:enforce (enforce) @ zeppelin —
[INFO]
[INFO] — maven-remote-resources-plugin:1.4:process (default) @ zeppelin —
[INFO]
[INFO] — maven-dependency-plugin:2.8:copy-dependencies (copy-dependencies) @ zeppelin —
[INFO]
[INFO] — maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ zeppelin —
[INFO]
[INFO] ————————————————————————
[INFO] Building Zeppelin: Interpreter 0.6.0-incubating-SNAPSHOT
[INFO] ————————————————————————
Downloading: https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar

1. Amos Elb
  
  November 18, 2015 at 1:38 pm
  
  Ganesh – Sorry about that – it should be fixed now.
  
  If no-one noticed, this push was a little rushed…
  
  Anyway, should be fixed now. Would appreciate if you’d confirm.
  
ganeshbhat3

November 18, 2015 at 9:00 pm

The issue still persists with the latest changes. Pasting the output below again:

mvn clean package -DskipTests
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[INFO] Scanning for projects…
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-spark:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 416, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-zrinterpreter:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘dependencies.dependency.(groupId:artifactId:type:classifier)’ must be unique: org.apache.spark:spark-core_${scala.binary.version}:jar -> duplicate declaration of version ${spark.version} @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 137, column 21
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 314, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-flink:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-dependency-plugin @ line 349, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-cassandra:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 174, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ————————————————————————

1. Amos Elb
  
  November 18, 2015 at 9:34 pm
  
  Ganesh that doesn’t look like an error. It looks like warnings. It looks like the build should have gone on to complete. Can you email me through the git or provide an e-mail address and I’ll try to contact you directly?
  
  1. ganeshbhat3
    
    November 22, 2015 at 7:51 pm
    
    Agreed. But the download does not go through. It is trying to download but does not download anything.
    Downloading: https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar
    
    There is no internet issue as I am able to surf the internet and download stuff as well.
    
    My email address: ganeshbhat.3@gmail.com
    
ganeshbhat3

November 22, 2015 at 8:40 pm

Got the error finally after waiting for 30 minutes:

[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 30:26 min
[INFO] Finished at: 2015-11-23T09:57:46+05:30
[INFO] Final Memory: 31M/247M
[INFO] ————————————————————————
[ERROR] Failed to execute goal on project zeppelin-interpreter: Could not resolve dependencies for project org.apache.zeppelin:zeppelin-interpreter:jar:0.6.0-incubating-SNAPSHOT: Could not transfer artifact org.apache.thrift:libthrift:jar:0.9.2 from/to central (https://repo.maven.apache.org/maven2): Timeout while waiting for concurrent download of /home/ganesh/.m2/repository/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar.part to progress -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :zeppelin-interpreter

1. Amos
  
  November 24, 2015 at 9:59 am
  
  Ganesh that isn’t a problem with the R interpreter. That’s core vanilla Zeppelin failing to find its dependencies.
  
  Can you contact me directly and I’ll try to help?
  
  1. emaasit
    
    November 24, 2015 at 10:29 am
    
    Since several people were running into errors while building Zeppelin, I tried it out again on a new clean machine. I ran into the same errors and that’s when I discovered that you need to install the following packages first before building Zeppelin-RInterpreter.
    install.packages("evaluate", dependencies = TRUE)
    install.packages("base64enc", dependencies = TRUE)
    sudo apt-get install libcurl4-openssl-dev
    sudo apt-get install libxml2-dev
    install.packages("devtools", dependencies = TRUE)
    devtools::install_github("IRkernel/repr")
    
    After installing these packages, the build process worked without any errors. Also make sure you have access to the launch scripts using chmod u+x command.
    
Ramulu

November 24, 2015 at 7:17 am

It fails at R interpreter

1. Amos
  
  November 24, 2015 at 9:58 am
  
  Can you contact me directly and/or post an error log? It should compile fine on your system.
  
  1. Laurent
    
    November 27, 2015 at 5:35 am
    
    Hi Amos,
    
    I’ve been trying for a while, including all of the comments (including specifying spark version) in the process, but keep running to failure for the R interpreter part.
    I’ve moved it to the end of the process so the rest would actually work. Here’s what I get:
    
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Zeppelin ……………………………………. SUCCESS [02:18 min]
    [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 26.050 s]
    [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 14.347 s]
    [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [02:14 min]
    [INFO] Zeppelin: Spark ……………………………… SUCCESS [ 31.050 s]
    [INFO] Zeppelin: web Application …………………….. SUCCESS [03:21 min]
    [INFO] Zeppelin: Server …………………………….. SUCCESS [ 33.539 s]
    [INFO] Zeppelin: Packaging distribution ………………. SUCCESS [ 2.694 s]
    [INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 3.287 s]
    [INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 3.002 s]
    [INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 3.132 s]
    [INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 8.717 s]
    [INFO] Zeppelin: Apache Phoenix Interpreter …………… SUCCESS [ 17.422 s]
    [INFO] Zeppelin: PostgreSQL interpreter ………………. SUCCESS [ 6.395 s]
    [INFO] Zeppelin: Tajo interpreter ……………………. SUCCESS [ 5.077 s]
    [INFO] Zeppelin: Flink ……………………………… SUCCESS [ 15.353 s]
    [INFO] Zeppelin: Apache Ignite interpreter ……………. SUCCESS [ 4.958 s]
    [INFO] Zeppelin: Kylin interpreter …………………… SUCCESS [ 2.934 s]
    [INFO] Zeppelin: Lens interpreter ……………………. SUCCESS [ 9.202 s]
    [INFO] Zeppelin: Cassandra ………………………….. SUCCESS [ 23.548 s]
    [INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 2.472 s]
    [INFO] ————————————————————————
    [INFO] BUILD FAILURE
    [INFO] ————————————————————————
    [INFO] Total time: 11:29 min
    [INFO] Finished at: 2015-11-27T14:30:14+01:00
    [INFO] Final Memory: 89M/325M
    [INFO] ————————————————————————
    [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :zeppelin-zrinterpreter
    
    I don’t understand where to type Emaasit’s commands, I receive ” syntax error near unexpected token `”evaluate” ” in the terminal.
    
  2. Amos
    
    November 27, 2015 at 9:20 am
    
    That part of the error log just says there was an error it doesn’t say what the error was. That’s earlier.
    
    Anyway you shouldn’t need to change the order when things install.
    
    Please try it from a fresh clone of the git and contact me directly by email with the whole install log if it fails.
    
Ramulu

November 24, 2015 at 7:18 am

Am using CentOS Apache hadoop and spark cluster, Maven 3.3.3, npm installed.
The installation fails at R interpreter, I tried all the above steps.

1. emaasit
  
  November 24, 2015 at 10:26 am
  
  Since several people were running into errors while building Zeppelin, I tried it out again on a new clean machine. I ran into the same errors and that’s when I discovered that you need to install the following packages first before building Zeppelin-RInterpreter.
  install.packages("evaluate", dependencies = TRUE)
  install.packages("base64enc", dependencies = TRUE)
  sudo apt-get install libcurl4-openssl-dev
  sudo apt-get install libxml2-dev
  install.packages("devtools", dependencies = TRUE)
  devtools::install_github("IRkernel/repr")
  
  After installing these packages, the build process worked without any errors. Also make sure you have access to the launch scripts using chmod u+x command.
  
  1. Amos
    
    November 24, 2015 at 2:15 pm
    
    Thanks!
    
    The only one of those that should be required is evaluate. I’ll look into the others. I’m also going to put up some documentation this week.
    
  2. ganeshbhat3
    
    November 25, 2015 at 8:33 am
    
    It worked fine until it started to download HBase. Below is the error. How did you find out the dependencies? If you can suggest how to identify dependencies, I can find them and update them accordingly here.
    
    Downloading: https://repo.maven.apache.org/maven2/org/apache/phoenix/phoenix/4.4.0-HBase-1.0/phoenix-4.4.0-HBase-1.0.pom
    [INFO] ————————————————————————
    [INFO] Reactor Summary:
    [INFO]
    [INFO] Zeppelin ……………………………………. SUCCESS [ 36.633 s]
    [INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 46.716 s]
    [INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 13.824 s]
    [INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [01:50 min]
    [INFO] Zeppelin: Spark ……………………………… SUCCESS [03:15 min]
    [INFO] Zeppelin: R Interpreter ………………………. SUCCESS [10:28 min]
    [INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 1.335 s]
    [INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 1.075 s]
    [INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 1.195 s]
    [INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 9.754 s]
    [INFO] Zeppelin: Apache Phoenix Interpreter …………… FAILURE [32:31 min]
    [INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
    [INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
    [INFO] Zeppelin: Flink ……………………………… SKIPPED
    [INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
    [INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
    [INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
    [INFO] Zeppelin: Cassandra ………………………….. SKIPPED
    [INFO] Zeppelin: web Application …………………….. SKIPPED
    [INFO] Zeppelin: Server …………………………….. SKIPPED
    [INFO] Zeppelin: Packaging distribution ………………. SKIPPED
    [INFO] ————————————————————————
    [INFO] BUILD FAILURE
    [INFO] ————————————————————————
    [INFO] Total time: 50:00 min
    [INFO] Finished at: 2015-11-25T20:54:22+05:30
    [INFO] Final Memory: 81M/679M
    [INFO] ————————————————————————
    
    [ERROR] Failed to read artifact descriptor for org.apache.phoenix:phoenix-core:jar:4.4.0-HBase-1.0: Could not transfer artifact org.apache.phoenix:phoenix:pom:4.4.0-HBase-1.0 from/to central (https://repo.maven.apache.org/maven2):: Read timed out -> [Help 1]
    [ERROR]
    [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
    [ERROR] Re-run Maven using the -X switch to enable full debug logging.
    [ERROR]
    [ERROR] For more information about the errors and possible solutions, please read the following articles:
    [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
    [ERROR]
    [ERROR] After correcting the problems, you can resume the build with the command
    [ERROR] mvn -rf :zeppelin-phoenix
    
  3. ganeshbhat3
    
    November 25, 2015 at 8:31 pm
    
    Ran it again and it was set up.
    
    Need to get the flights.csv for initial analysis? Can you please upload this file?
    
    Thanks
    
    Ganesh Bhat
    
Pingback: Links /11/2015: Webconverger 33.1, Netrunner 17 Released | Techrights
emaasit

November 27, 2015 at 12:38 pm

Hi Ganesh,
The dataset can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/datasets

Pingback: dataaspirant-Nov2015-newsLetter | dataaspirant
Robert

December 15, 2015 at 2:15 pm

This is super cool. I was trying to replicate the steps, but when I mvn clean package -DskipTests, I got:

workspace/zeppelin/Zeppelin-With-R/r/src/main/scala/org/apache/zeppelin/rinterpreter/RContext.scala:157: error: value isSparkRSupported is not a member of org.apache.zeppelin.spark.SparkVersion
[INFO] if (!intp.getSparkVersion().isSparkRSupported) throw new RuntimeException(“SparkR requires Spark 1.4 or later”)
[INFO] ^
[ERROR] one error found

It seems like there is some issue with getting SparkVersion, but I am not sure how to fix it.

1. Amos
  
  December 15, 2015 at 4:48 pm
  
  If you look here: https://github.com/elbamos/Zeppelin-With-R/blob/rinterpreter-0.5.5/spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java
  
  You’ll see that `isSparkRSupported` is *definitely* part of the code.
  
  The most likely suspect, is that you’re trying to install this on top of an existing Zeppelin source directory, or otherwise not starting from a clean slate.
  
  Can you try again, cloning into an empty directory? Thanks
  
  (Also, the git is definitely the place for posting issues, so we don’t divert Daniel’s blog.)
  
Pingback: Various notebooks / Mihai Lupu
1. Daniel Emaasit
  
  March 2, 2016 at 11:07 am
  
  Amos Elberg (https://github.com/elbamos) is actually the creator of the R interpreter, and not SparkIQ Labs as you indicated in your blog post.
  
m azahari

March 9, 2016 at 11:18 pm

Hi,

I need some help here. I tried twice successfully but R was missing in both attempts. I have installed all required R packages as indicated but R is missing

[INFO] Zeppelin ……………………………………. SUCCESS [ 9.399 s]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 25.896 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 10.436 s]
[INFO] Zeppelin: Display system apis …………………. SUCCESS [ 21.377 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 51.776 s]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [ 46.444 s]
[INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 1.403 s]
[INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 1.105 s]
[INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 0.625 s]
[INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 4.718 s]
[INFO] Zeppelin: HBase interpreter …………………… SUCCESS [ 8.663 s]
[INFO] Zeppelin: Apache Phoenix Interpreter …………… SUCCESS [ 9.917 s]
[INFO] Zeppelin: PostgreSQL interpreter ………………. SUCCESS [ 1.199 s]
[INFO] Zeppelin: JDBC interpreter ……………………. SUCCESS [ 0.732 s]
[INFO] Zeppelin: Tajo interpreter ……………………. SUCCESS [ 2.253 s]
[INFO] Zeppelin: Flink ……………………………… SUCCESS [ 23.066 s]
[INFO] Zeppelin: Apache Ignite interpreter ……………. SUCCESS [ 2.542 s]
[INFO] Zeppelin: Kylin interpreter …………………… SUCCESS [ 2.075 s]
[INFO] Zeppelin: Lens interpreter ……………………. SUCCESS [ 8.047 s]
[INFO] Zeppelin: Cassandra ………………………….. SUCCESS [01:41 min]
[INFO] Zeppelin: Elasticsearch interpreter ……………. SUCCESS [ 7.917 s]
[INFO] Zeppelin: Tachyon interpreter …………………. SUCCESS [ 3.348 s]
[INFO] Zeppelin: web Application …………………….. SUCCESS [01:53 min]
[INFO] Zeppelin: Server …………………………….. SUCCESS [ 19.168 s]
[INFO] Zeppelin: Packaging distribution ………………. SUCCESS [ 1.549 s]
[INFO] ————————————————————————
[INFO] BUILD SUCCESS

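One quick way to confirm whether the R module took part in a build at all is to search the saved Maven output for its reactor line. A minimal sketch, assuming the output was saved to a file (the name `build.log` is hypothetical) and that the module is labelled `Zeppelin: R Interpreter` in the reactor summary; the function name is also hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the saved Maven log "$1" lists the
# R interpreter module in its reactor summary (whatever its build status).
r_module_in_reactor() {
  grep -q "Zeppelin: R Interpreter" "$1"
}

# Example usage (runs a full build, so it is left commented out):
#   mvn clean package -DskipTests 2>&1 | tee build.log
#   r_module_in_reactor build.log || echo "R interpreter module was not part of this build"
```

If the line is absent, that would suggest the source tree being built does not include the R module (for instance, a different branch or repository), though other causes are possible.
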
m azahari

March 10, 2016 at 1:00 am

I re-downloaded the source from GitHub and tried again. This time I’m getting this error:

[INFO] Zeppelin ……………………………………. SUCCESS [ 4.451 s]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 8.033 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 4.681 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 42.967 s]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [ 37.819 s]
[INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 35.534 s]
[INFO] Zeppelin: Markdown interpreter ………………… SKIPPED
[INFO] Zeppelin: Angular interpreter …………………. SKIPPED
[INFO] Zeppelin: Shell interpreter …………………… SKIPPED
[INFO] Zeppelin: Hive interpreter ……………………. SKIPPED
[INFO] Zeppelin: Apache Phoenix Interpreter …………… SKIPPED
[INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
[INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
[INFO] Zeppelin: Flink ……………………………… SKIPPED
[INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
[INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
[INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
[INFO] Zeppelin: Cassandra ………………………….. SKIPPED
[INFO] Zeppelin: Elasticsearch interpreter ……………. SKIPPED
[INFO] Zeppelin: web Application …………………….. SKIPPED
[INFO] Zeppelin: Server …………………………….. SKIPPED
[INFO] Zeppelin: Packaging distribution ………………. SKIPPED
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 02:14 min
[INFO] Finished at: 2016-03-10T00:51:50-08:00
[INFO] Final Memory: 107M/301M
[INFO] ————————————————————————
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :zeppelin-zrinterpreter

What is the problem?

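The Maven output above already names the next diagnostic steps: re-run with `-e` to see the stack trace, and resume from the failed module with `-rf`. As a sketch, the helper below pulls the module name out of a saved log and prints a resume command. The function name `resume_command` and the log file name are hypothetical, and `package -DskipTests` is assumed from the build command used in the post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a saved Maven log "$1" and print a command that
# resumes the build at the failed module with stack traces enabled (-e).
resume_command() {
  # Maven ends a failed build with a hint that contains "-rf :<module>".
  module=$(sed -n 's/.*-rf :\([A-Za-z0-9._-]*\).*/\1/p' "$1" | tail -n 1)
  # No hint in the log means there is nothing to resume from.
  [ -n "$module" ] || return 1
  echo "mvn -e package -DskipTests -rf :$module"
}

# Example usage:
#   resume_command build.log
```

The stack trace printed with `-e` should show which command the exec-maven-plugin ran for the R module and why it exited with status 1.
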
azeltov

March 21, 2016 at 1:20 pm

I ran into this issue: com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name ‘id’ (in class org.apache.spark.rdd.RDDOperationScope)

Everything built successfully, but I get this error when I try the R sample notebook. I also tried some of my regular PySpark code that I know works in regular Zeppelin, and I get the same error. I built Zeppelin using this command:

mvn package install -Pspark-1.6 -Phadoop-2.4 -DskipTests

Appreciate your assistance!

data(faithful)

faith <- createDataFrame(sqlContext, faithful)

<simpleError in invokeJava(isStatic = TRUE, className, methodName, …): com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)

Thanks!

Alex

1. Amos
  
  March 21, 2016 at 1:49 pm
  
  It looks like you’re trying to use a version of Spark other than the one Zeppelin was compiled with.
  
  Please open an issue on my GitHub repo if you can’t fix this by recompiling.
  
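A mismatch of the kind Amos describes can be checked by comparing the Spark version chosen at build time (here the `-Pspark-1.6` profile) with the version the running Spark reports, for example from `spark-submit --version` or `sc.version` in a notebook paragraph. The sketch below is an assumption about how one might automate that comparison: the function name `same_spark_line` is hypothetical, and it compares only the major.minor part of the two version strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when two Spark version strings share the same
# major.minor line, e.g. "1.6.0" and "1.6.1".
same_spark_line() {
  built=$(echo "$1" | cut -d. -f1,2)
  running=$(echo "$2" | cut -d. -f1,2)
  [ "$built" = "$running" ]
}

# Example usage: first argument is the version Zeppelin was built for,
# second is the version reported by the Spark installation in use.
#   same_spark_line "1.6.0" "1.5.2" || echo "Spark versions differ; rebuild Zeppelin for the Spark you run"
```

If the versions differ, recompiling with the profile that matches the installed Spark is the fix Amos suggests trying first.
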