The objective of this blog post is demonstrate how to use Apache SparkR to power Shiny applications. I have been curious about what the use cases for a “Shiny-SparkR” application would be and how to develop and deploy such an app.
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. SparkR provides a distributed data frame implementation that supports operations like selection, filtering, aggregation etc. (similar to R data frames, dplyr) but on large datasets. SparkR also supports distributed machine learning using MLlib.
The objective of this blog post is to help you get started with Apache Zeppelin notebook for your R data science requirements. Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with Scala(with Apache Spark), Python(with Apache Spark), SparkSQL, Hive, Markdown, Shell and more.
However, the latest official release, version 0.5.0, does not yet support the R programming language. Fortunately NFLabs, the company driving this open source project, pointed me to this pull request that provides an R Interpreter. An Interpreter is a plug-in which enables zeppelin users to use a specific language/data-processing-backend. For example to use scala code in Zeppelin, you need a
spark interpreter. So, if you are impatient like I am for R-integration into Zeppelin, this tutorial will show you how to setup Zeppelin for use with R by building from source.