Introduction
The objective of this blog post is to help you get started with the Apache Zeppelin notebook for your R data science requirements. Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Shell and more.
However, the latest official release, version 0.5.0, does not yet support the R programming language. Fortunately NFLabs, the company driving this open source project, pointed me to this pull request that provides an R Interpreter. An Interpreter is a plug-in that enables Zeppelin users to use a specific language or data-processing backend. For example, to use Scala code in Zeppelin, you need a Spark interpreter. So, if you are as impatient as I am for R integration in Zeppelin, this tutorial will show you how to set up Zeppelin for use with R by building from source.
Prerequisites
- We will launch Zeppelin through the Bash shell on Linux. If you are using Windows, I recommend that you install and use the Cygwin terminal (it provides functionality similar to a Linux distribution on Windows).
- Make sure Java 1.7 and Maven 3.2.x are installed on your host machine and their environment variables are set.
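Before building, it can help to confirm that both tools are actually visible from your shell. The following is a minimal sketch and not part of Zeppelin: the helper name require_tool is made up for illustration, and the script only reports what it finds rather than enforcing the versions listed above.

```shell
# Sketch: check that Java and Maven are on the PATH before building.
# `require_tool` is a hypothetical helper, not something Zeppelin ships.

require_tool() {
    # Prints "ok: <tool>" if the command is found, "missing: <tool>" otherwise.
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

for tool in java mvn; do
    require_tool "$tool" || echo "Install $tool and add it to your PATH." >&2
done

# Show the versions found (this post used Java 1.7 and Maven 3.2.x).
# `java -version` writes to stderr, hence the redirect.
if command -v java >/dev/null 2>&1; then java -version 2>&1 | head -n 1; fi
if command -v mvn >/dev/null 2>&1; then mvn --version | head -n 1; fi
echo "JAVA_HOME=${JAVA_HOME:-<not set>}"
```

If either tool is reported as missing, fix that first; the Maven build in the next section will not get far without them.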
Build Zeppelin from Source
Step 1: Download Zeppelin Source Code
Go to this GitHub branch and download the source code. Alternatively, copy and paste this link into your web browser: https://github.com/elbamos/incubator-zeppelin/tree/rinterpreter
In my case, I downloaded and unzipped the folder onto my Desktop.
Step 2: Build Zeppelin
Run the following commands in your terminal to build Zeppelin on your host machine in local mode. If you are installing on a cluster, add the options described in the Zeppelin documentation.
$ cd Desktop/Apache/incubator-zeppelin-rinterpreter
$ mvn clean package -DskipTests
This takes around 16 minutes and builds Zeppelin, Spark, and all interpreters, including R, Markdown, Hive, Shell, and others (as shown in the image below).
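A comment from the interpreter's author further down this page recommends specifying the Spark version at build time (he gives mvn package install -Pspark-1.5 -DskipTests) so that the build and runtime Spark versions match. Here is a rough sketch of deriving that profile flag from a version string. The helper name spark_profile and the SPARK_VERSION variable are hypothetical, and I am assuming the branch defines a profile named spark-<major>.<minor> for your version, as the -Pspark-1.5 and -Pspark-1.6 examples in the comments suggest.

```shell
# Hypothetical helper: turn a Spark version such as "1.5.2" into the
# Maven profile flag "-Pspark-1.5" (major.minor only).
spark_profile() {
    echo "-Pspark-$(echo "$1" | cut -d. -f1,2)"
}

# Assumption: set SPARK_VERSION to the Spark version you run against.
SPARK_VERSION="${SPARK_VERSION:-1.5.2}"

# Combine the build command from this post with the profile flag.
BUILD_CMD="mvn clean package $(spark_profile "$SPARK_VERSION") -DskipTests"

# Print the command rather than running it, so you can review it first.
echo "$BUILD_CMD"
```

Printing the command first is deliberate: the build is long, so it is worth checking the profile name against the project's pom.xml before starting.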
Step 3: Start Zeppelin
Run the following command to start Zeppelin.
$ ./bin/zeppelin-daemon.sh start
Open http://localhost:8080 in your web browser (Zeppelin listens on port 8080 by default). At this point you are ready to start creating interactive notebooks with code and graphs in Zeppelin.
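The daemon can take a little while to come up after the start command returns. If the page does not load immediately, a small polling loop like the one below may help. This is only a sketch: zeppelin_url and wait_for_zeppelin are hypothetical helper names, it assumes curl is installed, and the retry count and two-second delay are arbitrary choices.

```shell
# Hypothetical helper: build the URL Zeppelin is expected to serve on.
# Defaults to localhost and port 8080, as used in this post.
zeppelin_url() {
    echo "http://${1:-localhost}:${2:-8080}"
}

# Hypothetical helper: poll the URL until it answers or we run out of tries.
# Returns 0 once the server responds, 1 otherwise.
wait_for_zeppelin() {
    url="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        if curl --silent --fail --output /dev/null "$url"; then
            echo "Zeppelin is up at $url"
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    echo "No response from $url" >&2
    return 1
}

# Usage, after ./bin/zeppelin-daemon.sh start:
#   wait_for_zeppelin "$(zeppelin_url)" 30
```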
Interactive Data Science
Step 1: Create a Notebook
Click the dropdown arrow next to the “Notebook” page and click “Create new note”.
Give your notebook a name or you can use the assigned default name. I named mine “Base R in Apache Zeppelin”.
Step 2: Start your Analysis
To use R, use the “%spark.r” or “%spark.knitr” tags as shown in the images below. First let’s use markdown to write some instruction text.
Now let’s install some packages that we may need for our analysis.
Now let’s read in our data set. We shall use the “flights” dataset which shows flights departing New York in 2013.
Now let’s do some data manipulation using dplyr (with the pipe operator).
You can also use bar graphs and pie charts to visualize some descriptive statistics from your data.
Now let’s do some data exploration with ggplot2.
Now let’s do some statistical machine learning using the caret package.
How about creating some maps?
Final Remarks
Zeppelin allows you to create interactive documents with beautiful graphs using multiple programming languages. The objective of this post was to help you set up Zeppelin for use with the R programming language. Hopefully the Project Management Committee (PMC) of this wonderful open source project can release the next version with an R interpreter. That would make it quicker to get started with Zeppelin, without having to build from source.
Also it’s worth mentioning that there is another R interpreter for Zeppelin produced by the folks at Data Layer. You can find instructions on how to use it here: https://github.com/datalayer/zeppelin-R.
Try out both interpreters and share your experiences in the comments section below.
Moving Ahead
As a follow-up to this post, we shall see how to use Apache Spark (especially SparkR) within Zeppelin in the next blog post.
Updates
[11/24/2015]: Since several people were running into errors while building Zeppelin, I tried it out again on a clean machine. I ran into the same errors, and that’s when I discovered that you need to install the following R packages before building Zeppelin-RInterpreter.
install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)
After installing these packages, the build process worked without any errors. Also make sure the launch scripts are executable, using the chmod u+x command.
Also, I noticed that you need the “repr” package to be able to view R output. Install it from GitHub using the “devtools” package.
install.packages("devtools", dependencies = TRUE)
devtools::install_github("IRkernel/repr")
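Note that the install lines above are R code: they need to run inside an R session (or through Rscript), and pasting them straight into Bash produces a syntax error, as one reader reports in the comments. If you prefer to stay in the terminal, something like the following sketch should work. The helper names r_install_expr and install_r_prereqs are hypothetical, and the repos URL is my assumption (a non-interactive Rscript needs some CRAN mirror to be set).

```shell
# Hypothetical helper: build the R expression that installs one CRAN package.
# The repos URL is an assumption; any CRAN mirror should do.
r_install_expr() {
    printf 'install.packages("%s", dependencies = TRUE, repos = "https://cloud.r-project.org")\n' "$1"
}

# Hypothetical helper: install the packages named in this update,
# then repr from GitHub via devtools. Stops at the first failure.
install_r_prereqs() {
    for pkg in evaluate base64enc devtools; do
        Rscript -e "$(r_install_expr "$pkg")" || return 1
    done
    Rscript -e 'devtools::install_github("IRkernel/repr")'
}

# Show the expression that would be passed to Rscript.
# Call install_r_prereqs yourself to actually install.
r_install_expr evaluate

# From the Zeppelin source directory, make the launch scripts executable:
#   chmod u+x bin/*.sh
```

The script only prints an example expression by default, so nothing is installed until you call install_r_prereqs.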
[11/27/2015]: The “flights” dataset can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/datasets
[12/06/2015]: The notebook used in this demo and other notebooks can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/zeppelin-notebooks
Awesome post, thank you for sharing!
Would you be willing to post the notebook with R example i.e on github and including a link in the article?
Also, you might use http://zeppelinhub.com/viewer for a preview
Thanks Alex. I actually have a Zeppelin Hub (beta) account. Let me push the notebooks online on github and I will add an “Updates” section in the blog above with the link.
Hi,
getting the below error. Can you please help:
mvn clean package -DskipTests
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[INFO] Scanning for projects…
[INFO] ————————————————————————
[ERROR] FATAL ERROR
[INFO] ————————————————————————
[INFO] Error building POM (may not be this project’s POM).
Project ID: unknown
POM Location: /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/spark-dependencies/pom.xml
Reason: Parse error reading POM. Reason: unexpected character in markup < (position: END_TAG seen …\n<<… @479:3) for project unknown at /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/spark-dependencies/pom.xml
Great post. Have you been able to successfully build on a Windows box using cygwin as hinted in your post? I have tried numerous times (from the main Zeppelin page) and it fails on Windows. I’ve seen posts from Madhuka that it’s possible, but haven’t seen the secret recipe yet.
When I try @elbamos’s rinterpreter branch (nov 17 2015), it throws non-parseable POM errors, so doesn’t even get to building.
Thoughts would be welcome.
Hi Wade,
I have been meaning to try it out on Windows OS but have not yet. I will do so later this evening and let you know & then I will include an “Updates” section at the end of the blog.
Hi all. I’m the author of the rinterpreter for zeppelin.
The build problem you’re having is because of a mistake I made when rebasing last night. It should be fixed now. Your best bet for building is to specify the Spark version also, like this:
mvn package install -Pspark-1.5 -DskipTests
The reason is that if there’s a different spark version between build and runtime, you’ll get very odd errors about spark dependency libraries like akka, but you won’t see them until you try to actually execute SparkR commands. And this happens often because the Zeppelin build process downloads new jars for whatever spark version you specify at build, even if you already have a working spark installed.
Please let me know if the build doesn’t work or you have any other problems. You can contact me directly, and I’ll try to monitor the comments here.
I just attempted to make this comment and it didn’t appear… hope this doesn’t turn into a double-comment…
Hi Amos,
Thanks for updating. Just tried again, seeing error below during build of R Interpreter. Here’s what I tried (Windows box using cygwin/babun).
{ software } » git clone -b rinterpreter https://github.com/elbamos/incubator-zeppelin.git
{ software } » cd incubator-zeppelin
{ incubator-zeppelin } rinterpreter » mvn package install -Pspark-1.5 -DskipTests
[INFO] Zeppelin ……………………………………. SUCCESS [02:28 min]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 13.420 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 8.096 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 55.363 s]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [ 29.736 s]
[INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 16.191 s]
[INFO] Zeppelin: Markdown interpreter ………………… SKIPPED
[INFO] Zeppelin: Angular interpreter …………………. SKIPPED
[INFO] Zeppelin: Shell interpreter …………………… SKIPPED
[INFO] Zeppelin: Hive interpreter ……………………. SKIPPED
[INFO] Zeppelin: Apache Phoenix Interpreter …………… SKIPPED
[INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
[INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
[INFO] Zeppelin: Flink ……………………………… SKIPPED
[INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
[INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
[INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
[INFO] Zeppelin: Cassandra ………………………….. SKIPPED
[INFO] Zeppelin: web Application …………………….. SKIPPED
[INFO] Zeppelin: Server …………………………….. SKIPPED
[INFO] Zeppelin: Packaging distribution ………………. SKIPPED
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 04:31 min
[INFO] Finished at: 2015-11-18T12:24:00-05:00
[INFO] Final Memory: 103M/1072M
[INFO] ————————————————————————
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Cannot run program “R\install-dev.sh” (in directory “C:\xxxx\software\incubator-zeppelin\r”): CreateProcess error=2, The system cannot find the file specified -> [Help 1]
Wade, it’s likely that this relates to Windows, which I can’t test since I don’t have a Windows machine. Is it possible for us to communicate directly? I’d like to track this down and resolve any other Windows issues permanently.
Hi Amos, I can assist with Windows issues. Will ping you via email.
Hi, Wade and Amos.
I’m getting the same error as Wade. Have you guys solved this issue?
Thanks!
Yes. The only way to run Zeppelin on Windows that I’m aware of is docker. Since Zeppelin will not run on Windows, the R-interpreter will not run on Windows. This is a Zeppelin limitation.
Ganesh: I pushed code earlier today that should solve that issue. Please reclone the git and try again. Thanks.
Still getting error:
mvn clean package -DskipTests
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[INFO] Scanning for projects…
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-spark:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 416, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-zrinterpreter:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘dependencies.dependency.(groupId:artifactId:type:classifier)’ must be unique: org.apache.spark:spark-core_${scala.binary.version}:jar -> duplicate declaration of version ${spark.version} @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 137, column 21
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 314, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-flink:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-dependency-plugin @ line 349, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-cassandra:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 174, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ————————————————————————
[INFO] Reactor Build Order:
[INFO]
[INFO] Zeppelin
[INFO] Zeppelin: Interpreter
[INFO] Zeppelin: Zengine
[INFO] Zeppelin: Spark dependencies
[INFO] Zeppelin: Spark
[INFO] Zeppelin: R Interpreter
[INFO] Zeppelin: Markdown interpreter
[INFO] Zeppelin: Angular interpreter
[INFO] Zeppelin: Shell interpreter
[INFO] Zeppelin: Hive interpreter
[INFO] Zeppelin: Apache Phoenix Interpreter
[INFO] Zeppelin: PostgreSQL interpreter
[INFO] Zeppelin: Tajo interpreter
[INFO] Zeppelin: Flink
[INFO] Zeppelin: Apache Ignite interpreter
[INFO] Zeppelin: Kylin interpreter
[INFO] Zeppelin: Lens interpreter
[INFO] Zeppelin: Cassandra
[INFO] Zeppelin: web Application
[INFO] Zeppelin: Server
[INFO] Zeppelin: Packaging distribution
[INFO]
[INFO] ————————————————————————
[INFO] Building Zeppelin 0.6.0-incubating-SNAPSHOT
[INFO] ————————————————————————
[INFO]
[INFO] — maven-clean-plugin:2.6.1:clean (default-clean) @ zeppelin —
[INFO] Deleting /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/target
[INFO]
[INFO] — maven-checkstyle-plugin:2.13:check (checkstyle-fail-build) @ zeppelin —
[INFO]
[INFO]
[INFO] — maven-resources-plugin:2.7:copy-resources (copy-resources) @ zeppelin —
[INFO] Using ‘UTF-8’ encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/../_tools/site
[INFO]
[INFO] — maven-enforcer-plugin:1.3.1:enforce (enforce) @ zeppelin —
[INFO]
[INFO] — maven-remote-resources-plugin:1.4:process (default) @ zeppelin —
[INFO]
[INFO] — maven-dependency-plugin:2.8:copy-dependencies (copy-dependencies) @ zeppelin —
[INFO]
[INFO] — maven-site-plugin:3.4:attach-descriptor (attach-descriptor) @ zeppelin —
[INFO]
[INFO] ————————————————————————
[INFO] Building Zeppelin: Interpreter 0.6.0-incubating-SNAPSHOT
[INFO] ————————————————————————
Downloading: https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar
Ganesh – Sorry about that – it should be fixed now.
If no-one noticed, this push was a little rushed…
Anyway, should be fixed now. Would appreciate if you’d confirm.
The issue still persists with the latest changes. Pasting the error below again:
mvn clean package -DskipTests
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
[INFO] Scanning for projects…
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-spark:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 416, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-zrinterpreter:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘dependencies.dependency.(groupId:artifactId:type:classifier)’ must be unique: org.apache.spark:spark-core_${scala.binary.version}:jar -> duplicate declaration of version ${spark.version} @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 137, column 21
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ org.apache.zeppelin:zeppelin-zrinterpreter:[unknown-version], /home/ganesh/Desktop/Apache/incubator-zeppelin-rinterpreter/r/pom.xml, line 314, column 21
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-flink:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.(groupId:artifactId)’ must be unique but found duplicate declaration of plugin org.apache.maven.plugins:maven-dependency-plugin @ line 349, column 15
[WARNING]
[WARNING] Some problems were encountered while building the effective model for org.apache.zeppelin:zeppelin-cassandra:jar:0.6.0-incubating-SNAPSHOT
[WARNING] ‘build.plugins.plugin.version’ for org.scala-tools:maven-scala-plugin is missing. @ line 174, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO] ————————————————————————
Ganesh that doesn’t look like an error. It looks like warnings. It looks like the build should have gone on to complete. Can you email me through the git or provide an e-mail address and I’ll try to contact you directly?
Agreed. But the download does not go through. It is trying to download but does not download anything.
Downloading: https://repo.maven.apache.org/maven2/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar
There is no internet issue as I am able to surf the internet and download stuff as well.
My email address: ganeshbhat.3@gmail.com
Got the error finally after waiting for 30 minutes:
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 30:26 min
[INFO] Finished at: 2015-11-23T09:57:46+05:30
[INFO] Final Memory: 31M/247M
[INFO] ————————————————————————
[ERROR] Failed to execute goal on project zeppelin-interpreter: Could not resolve dependencies for project org.apache.zeppelin:zeppelin-interpreter:jar:0.6.0-incubating-SNAPSHOT: Could not transfer artifact org.apache.thrift:libthrift:jar:0.9.2 from/to central (https://repo.maven.apache.org/maven2): Timeout while waiting for concurrent download of /home/ganesh/.m2/repository/org/apache/thrift/libthrift/0.9.2/libthrift-0.9.2.jar.part to progress -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :zeppelin-interpreter
Ganesh that isn’t a problem with the R interpreter. That’s core vanilla Zeppelin failing to find its dependencies.
Can you contact me directly and I’ll try to help?
Since several people were running into errors while building Zeppelin, I tried it out again on a new clean machine. I ran into the same errors and that’s when I discovered that you need to install the following packages first before building Zeppelin-RInterpreter.
install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev
install.packages("devtools", dependencies = TRUE)
devtools::install_github("IRkernel/repr")
After installing these packages, the build process worked without any errors. Also make sure you have access to the launch scripts using chmod u+x command.
It fails at R interpreter
Can you contact me directly and/or post an error log? It should compile fine on your system.
Hi Amos,
I’ve been trying for a while, following all of the comments (including specifying the Spark version), but I keep running into a failure at the R interpreter step.
I’ve moved it to the end of the process so the rest would actually work. Here’s what I get:
[INFO] Reactor Summary:
[INFO]
[INFO] Zeppelin ……………………………………. SUCCESS [02:18 min]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 26.050 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 14.347 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [02:14 min]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [ 31.050 s]
[INFO] Zeppelin: web Application …………………….. SUCCESS [03:21 min]
[INFO] Zeppelin: Server …………………………….. SUCCESS [ 33.539 s]
[INFO] Zeppelin: Packaging distribution ………………. SUCCESS [ 2.694 s]
[INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 3.287 s]
[INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 3.002 s]
[INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 3.132 s]
[INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 8.717 s]
[INFO] Zeppelin: Apache Phoenix Interpreter …………… SUCCESS [ 17.422 s]
[INFO] Zeppelin: PostgreSQL interpreter ………………. SUCCESS [ 6.395 s]
[INFO] Zeppelin: Tajo interpreter ……………………. SUCCESS [ 5.077 s]
[INFO] Zeppelin: Flink ……………………………… SUCCESS [ 15.353 s]
[INFO] Zeppelin: Apache Ignite interpreter ……………. SUCCESS [ 4.958 s]
[INFO] Zeppelin: Kylin interpreter …………………… SUCCESS [ 2.934 s]
[INFO] Zeppelin: Lens interpreter ……………………. SUCCESS [ 9.202 s]
[INFO] Zeppelin: Cassandra ………………………….. SUCCESS [ 23.548 s]
[INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 2.472 s]
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 11:29 min
[INFO] Finished at: 2015-11-27T14:30:14+01:00
[INFO] Final Memory: 89M/325M
[INFO] ————————————————————————
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :zeppelin-zrinterpreter
I don’t understand where to type Emaasit’s commands; I receive “syntax error near unexpected token `"evaluate"” in the terminal.
That part of the error log just says there was an error; it doesn’t say what the error was. That appears earlier in the log.
Anyway you shouldn’t need to change the order when things install.
Please try it from a fresh clone of the git and contact me directly by email with the whole install log if it fails.
I am using a CentOS Apache Hadoop and Spark cluster, with Maven 3.3.3 and npm installed.
The installation fails at R interpreter, I tried all the above steps.
Since several people were running into errors while building Zeppelin, I tried it out again on a new clean machine. I ran into the same errors and that’s when I discovered that you need to install the following packages first before building Zeppelin-RInterpreter.
install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libxml2-dev
install.packages("devtools", dependencies = TRUE)
devtools::install_github("IRkernel/repr")
After installing these packages, the build process worked without any errors. Also make sure you have access to the launch scripts using chmod u+x command.
Thanks!
The only one of those that should be required is evaluate. I’ll look into the others. I’m also going to put up some documentation this week.
It worked fine until it started to download HBase. Below is the error. How did you find out the dependencies? If you can suggest how to identify dependencies, I can find them and post an update here.
Downloading: https://repo.maven.apache.org/maven2/org/apache/phoenix/phoenix/4.4.0-HBase-1.0/phoenix-4.4.0-HBase-1.0.pom
[INFO] ————————————————————————
[INFO] Reactor Summary:
[INFO]
[INFO] Zeppelin ……………………………………. SUCCESS [ 36.633 s]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 46.716 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 13.824 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [01:50 min]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [03:15 min]
[INFO] Zeppelin: R Interpreter ………………………. SUCCESS [10:28 min]
[INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 1.335 s]
[INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 1.075 s]
[INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 1.195 s]
[INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 9.754 s]
[INFO] Zeppelin: Apache Phoenix Interpreter …………… FAILURE [32:31 min]
[INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
[INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
[INFO] Zeppelin: Flink ……………………………… SKIPPED
[INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
[INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
[INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
[INFO] Zeppelin: Cassandra ………………………….. SKIPPED
[INFO] Zeppelin: web Application …………………….. SKIPPED
[INFO] Zeppelin: Server …………………………….. SKIPPED
[INFO] Zeppelin: Packaging distribution ………………. SKIPPED
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 50:00 min
[INFO] Finished at: 2015-11-25T20:54:22+05:30
[INFO] Final Memory: 81M/679M
[INFO] ————————————————————————
[ERROR] Failed to read artifact descriptor for org.apache.phoenix:phoenix-core:jar:4.4.0-HBase-1.0: Could not transfer artifact org.apache.phoenix:phoenix:pom:4.4.0-HBase-1.0 from/to central (https://repo.maven.apache.org/maven2):: Read timed out -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :zeppelin-phoenix
Ran it again and it was set up.
I need the flights.csv file for the initial analysis. Can you please upload this file?
Thanks
Ganesh Bhat
Hi Ganesh,
The dataset can be found here: https://github.com/SparkIQ-Labs/Demos/tree/master/datasets
This is super cool. I was trying to replicate the steps, but when I ran mvn clean package -DskipTests, I got:
workspace/zeppelin/Zeppelin-With-R/r/src/main/scala/org/apache/zeppelin/rinterpreter/RContext.scala:157: error: value isSparkRSupported is not a member of org.apache.zeppelin.spark.SparkVersion
[INFO] if (!intp.getSparkVersion().isSparkRSupported) throw new RuntimeException("SparkR requires Spark 1.4 or later")
[INFO] ^
[ERROR] one error found
It seems like there is some issue with getting SparkVersion, but I am not sure how to fix it.
If you look here: https://github.com/elbamos/Zeppelin-With-R/blob/rinterpreter-0.5.5/spark/src/main/java/org/apache/zeppelin/spark/SparkVersion.java
You’ll see that `isSparkRSupported` is *definitely* part of the code.
The most likely cause is that you’re trying to install this on top of an existing Zeppelin source directory, or otherwise not starting from a clean slate.
Can you try again, cloning into an empty directory? Thanks
(Also, the git is definitely the place for posting issues, so we don’t divert Daniel’s blog.)
Amos Elberg (https://github.com/elbamos) is actually the creator of the R interpreter, and not SparkIQ Labs as you indicated in your blog post.
Hi,
I need some help here. I tried twice successfully but R was missing in both attempts. I have installed all required R packages as indicated but R is missing
[INFO] Zeppelin ……………………………………. SUCCESS [ 9.399 s]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 25.896 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 10.436 s]
[INFO] Zeppelin: Display system apis …………………. SUCCESS [ 21.377 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 51.776 s]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [ 46.444 s]
[INFO] Zeppelin: Markdown interpreter ………………… SUCCESS [ 1.403 s]
[INFO] Zeppelin: Angular interpreter …………………. SUCCESS [ 1.105 s]
[INFO] Zeppelin: Shell interpreter …………………… SUCCESS [ 0.625 s]
[INFO] Zeppelin: Hive interpreter ……………………. SUCCESS [ 4.718 s]
[INFO] Zeppelin: HBase interpreter …………………… SUCCESS [ 8.663 s]
[INFO] Zeppelin: Apache Phoenix Interpreter …………… SUCCESS [ 9.917 s]
[INFO] Zeppelin: PostgreSQL interpreter ………………. SUCCESS [ 1.199 s]
[INFO] Zeppelin: JDBC interpreter ……………………. SUCCESS [ 0.732 s]
[INFO] Zeppelin: Tajo interpreter ……………………. SUCCESS [ 2.253 s]
[INFO] Zeppelin: Flink ……………………………… SUCCESS [ 23.066 s]
[INFO] Zeppelin: Apache Ignite interpreter ……………. SUCCESS [ 2.542 s]
[INFO] Zeppelin: Kylin interpreter …………………… SUCCESS [ 2.075 s]
[INFO] Zeppelin: Lens interpreter ……………………. SUCCESS [ 8.047 s]
[INFO] Zeppelin: Cassandra ………………………….. SUCCESS [01:41 min]
[INFO] Zeppelin: Elasticsearch interpreter ……………. SUCCESS [ 7.917 s]
[INFO] Zeppelin: Tachyon interpreter …………………. SUCCESS [ 3.348 s]
[INFO] Zeppelin: web Application …………………….. SUCCESS [01:53 min]
[INFO] Zeppelin: Server …………………………….. SUCCESS [ 19.168 s]
[INFO] Zeppelin: Packaging distribution ………………. SUCCESS [ 1.549 s]
[INFO] ————————————————————————
[INFO] BUILD SUCCESS
I re-downloaded from GitHub and tried again. This time I’m getting this error:
[INFO] Zeppelin ……………………………………. SUCCESS [ 4.451 s]
[INFO] Zeppelin: Interpreter ………………………… SUCCESS [ 8.033 s]
[INFO] Zeppelin: Zengine ……………………………. SUCCESS [ 4.681 s]
[INFO] Zeppelin: Spark dependencies ………………….. SUCCESS [ 42.967 s]
[INFO] Zeppelin: Spark ……………………………… SUCCESS [ 37.819 s]
[INFO] Zeppelin: R Interpreter ………………………. FAILURE [ 35.534 s]
[INFO] Zeppelin: Markdown interpreter ………………… SKIPPED
[INFO] Zeppelin: Angular interpreter …………………. SKIPPED
[INFO] Zeppelin: Shell interpreter …………………… SKIPPED
[INFO] Zeppelin: Hive interpreter ……………………. SKIPPED
[INFO] Zeppelin: Apache Phoenix Interpreter …………… SKIPPED
[INFO] Zeppelin: PostgreSQL interpreter ………………. SKIPPED
[INFO] Zeppelin: Tajo interpreter ……………………. SKIPPED
[INFO] Zeppelin: Flink ……………………………… SKIPPED
[INFO] Zeppelin: Apache Ignite interpreter ……………. SKIPPED
[INFO] Zeppelin: Kylin interpreter …………………… SKIPPED
[INFO] Zeppelin: Lens interpreter ……………………. SKIPPED
[INFO] Zeppelin: Cassandra ………………………….. SKIPPED
[INFO] Zeppelin: Elasticsearch interpreter ……………. SKIPPED
[INFO] Zeppelin: web Application …………………….. SKIPPED
[INFO] Zeppelin: Server …………………………….. SKIPPED
[INFO] Zeppelin: Packaging distribution ………………. SKIPPED
[INFO] ————————————————————————
[INFO] BUILD FAILURE
[INFO] ————————————————————————
[INFO] Total time: 02:14 min
[INFO] Finished at: 2016-03-10T00:51:50-08:00
[INFO] Final Memory: 107M/301M
[INFO] ————————————————————————
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:exec (default) on project zeppelin-zrinterpreter: Command execution failed. Process exited with an error: 1 (Exit value: 1) -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :zeppelin-zrinterpreter
what is the problem?
I ran into this issue: com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
I got everything built successfully, but when I try the R sample notebook I get this error. I tried it on some of my regular PySpark code that I know works in regular Zeppelin and got the same error. I built Zeppelin using this command:
mvn package install -Pspark-1.6 -Phadoop-2.4 -DskipTests
Appreciate your assistance!
data(faithful)
faith <- createDataFrame(sqlContext, faithful)
<simpleError in invokeJava(isStatic = TRUE, className, methodName, …): com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: {"id":"0","name":"parallelize"}; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
Thanks!
Alex
It looks like you’re trying to use a version of spark other than the one Zeppelin was compiled with.
Please open an issue on my git if you can’t fix this by recompiling.