Webinar: Getting Started with Spatial Data Analysis with R


I am glad to announce that I will be presenting a live webinar with Domino Data Labs on February 24, 2016, from 11:00–11:30 AM PST: Getting Started with Spatial Data Analysis with R. If you are interested, or know someone who would be interested in learning how to manipulate spatial and spatio-temporal data with R, please pass this along. Here is the abstract:

Tracking ggplot2 Extensions for Data Visualization



The purpose of this blog post is to inform R users of a website I created to track and list ggplot2 extensions, available at http://ggplot2-exts.github.io. The site helps R users easily find ggplot2 extensions, which are coming in "fast and furious" from the R community.

If you have developed, or intend to develop, ggplot2 extensions, please submit them so that other R users can find them easily. To do so, simply open an issue or a pull request on this GitHub repository.

In case you are wondering what ggplot2 extensions are: they are R packages that extend the functionality of the ggplot2 package by Dr. Hadley Wickham. This extensibility mechanism became available in ggplot2 version 2.0.0, released on December 17th, 2015.
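To give a flavor of the mechanism, an extension defines a ggproto object and a corresponding layer function. The following is a sketch following the pattern that ggplot2 2.0.0 introduced; the names GeomSimplePoint and geom_simple_point are illustrative, not from any particular extension package:

```r
library(ggplot2)
library(grid)

# A minimal geom: a ggproto object that says which aesthetics it needs
# and how to draw a panel as a grid graphical object.
GeomSimplePoint <- ggproto("GeomSimplePoint", Geom,
  required_aes = c("x", "y"),
  default_aes = aes(shape = 19, colour = "black"),
  draw_key = draw_key_point,
  draw_panel = function(data, panel_scales, coord) {
    coords <- coord$transform(data, panel_scales)
    grid::pointsGrob(coords$x, coords$y, pch = coords$shape,
                     gp = grid::gpar(col = coords$colour))
  }
)

# The user-facing layer function wraps the ggproto object.
geom_simple_point <- function(mapping = NULL, data = NULL,
                              stat = "identity", position = "identity",
                              na.rm = FALSE, show.legend = NA,
                              inherit.aes = TRUE, ...) {
  layer(geom = GeomSimplePoint, mapping = mapping, data = data,
        stat = stat, position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}

# Usage: behaves like the built-in geom_point()
ggplot(mtcars, aes(mpg, wt)) + geom_simple_point()
```

A package exporting such a layer function is, in essence, a ggplot2 extension.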

Using Apache SparkR to Power Shiny Applications: Part I



The objective of this blog post is to demonstrate how to use Apache SparkR to power Shiny applications. I have been curious about what the use cases for a "Shiny-SparkR" application would be, and how to develop and deploy such an app.

SparkR is an R package that provides a lightweight frontend to use Apache Spark from R. SparkR provides a distributed data frame implementation that supports operations like selection, filtering, and aggregation (similar to R data frames and dplyr) but on large datasets. SparkR also supports distributed machine learning using MLlib.
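As a sketch of what those dplyr-like operations look like under the SparkR 1.5.x API of the time (function names changed in later Spark releases; SparkR ships inside the Spark distribution, not on CRAN):

```r
library(SparkR)

# Initialize a local Spark context and an SQL context (SparkR 1.5.x API)
sc <- sparkR.init(master = "local[2]", appName = "sparkr-demo")
sqlContext <- sparkRSQL.init(sc)

# Convert an ordinary R data frame into a distributed SparkR DataFrame
df <- createDataFrame(sqlContext, faithful)

# Selection, filtering, and aggregation: dplyr-like, but distributed
head(select(df, df$eruptions))
head(filter(df, df$waiting < 50))
head(agg(groupBy(df, df$waiting), count = n(df$waiting)))

sparkR.stop()
```

The same operations scale from the local `faithful` toy dataset to large datasets partitioned across a cluster.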

Shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R. Shiny helps you turn your analyses into interactive web applications without requiring HTML, CSS, or JavaScript knowledge.
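To make that concrete, here is a minimal, self-contained Shiny app (an illustration, not code from the post itself); note that no HTML, CSS, or JavaScript is written by hand:

```r
library(shiny)

# UI: one slider input and one plot output, laid out with fluidPage()
ui <- fluidPage(
  titlePanel("Old Faithful eruptions"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("hist")
)

# Server: re-renders the histogram whenever the slider changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(faithful$eruptions, breaks = input$bins,
         col = "steelblue", main = NULL, xlab = "Eruption time (min)")
  })
}

shinyApp(ui = ui, server = server)  # runs directly from RStudio or the R console
```

In a "Shiny-SparkR" app, the server function would query a SparkR DataFrame instead of a local one.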

Interactive Data Science with R in Apache Zeppelin Notebook



The objective of this blog post is to help you get started with Apache Zeppelin notebook for your R data science requirements. Zeppelin is a web-based notebook that enables interactive data analytics. You can make beautiful, data-driven, interactive, and collaborative documents with Scala (with Apache Spark), Python (with Apache Spark), SparkSQL, Hive, Markdown, Shell, and more.



However, the latest official release, version 0.5.0, does not yet support the R programming language. Fortunately, NFLabs, the company driving this open source project, pointed me to this pull request that provides an R interpreter. An interpreter is a plug-in that enables Zeppelin users to use a specific language or data-processing backend; for example, to run Scala code in Zeppelin, you need a Spark interpreter. So, if you are as impatient as I am for R integration into Zeppelin, this tutorial will show you how to set up Zeppelin for use with R by building it from source.
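Once Zeppelin is built with that R interpreter, a notebook paragraph is bound to it by its first line and the body is ordinary R. A sketch of such a paragraph (the exact binding name, shown here as %r, depends on the interpreter build):

```r
# First line of the Zeppelin paragraph selects the interpreter, e.g.:
# %r
# The rest is plain R, rendered inline in the notebook:
summary(cars)  # built-in dataset: speed and stopping distance
plot(cars$speed, cars$dist,
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
```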

Launch Apache Spark on AWS EC2 and Initialize SparkR Using RStudio



In this blog post, we shall learn how to launch a standalone Spark cluster on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) for analysis of Big Data. This is a continuation of our previous blog post, which showed how to download Apache Spark and start SparkR locally on Windows OS and RStudio.

We shall use Spark 1.5.1 (released on October 02, 2015), which includes a spark-ec2 script for installing standalone Spark on AWS EC2. A nice feature of this spark-ec2 script is that it installs RStudio Server as well, so you don't need to install RStudio Server separately and can start working with your data immediately after Spark is installed.

Installing and Starting SparkR Locally on Windows OS and RStudio


With the release of Apache Spark 1.4.1 on July 15th, 2015, I wanted to write a step-by-step guide to help new users get up and running with SparkR locally on a Windows machine using the command shell and RStudio. SparkR provides an R frontend to Apache Spark; by using Spark's distributed computation engine, it allows R users to run large-scale data analysis from the R shell. The steps listed here will also be documented in my upcoming online book, "Getting Started with SparkR for Big Data Analysis", which can be accessed at: http://www.danielemaasit.com/getting-started-with-sparkr/. These steps should get you up and running in less than 5 minutes.
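The core of the RStudio side of that setup can be sketched as follows. The install path C:/spark-1.4.1 is an assumption; point SPARK_HOME at wherever you extracted the Spark distribution:

```r
# Tell R where Spark lives and put SparkR's bundled library on the path
# (SparkR ships inside the Spark distribution, not on CRAN).
Sys.setenv(SPARK_HOME = "C:/spark-1.4.1")  # hypothetical install location
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

library(SparkR)

# Start a local Spark context using all available cores
sc <- sparkR.init(master = "local[*]", appName = "SparkR-local-demo")
sqlContext <- sparkRSQL.init(sc)

# Sanity check: distribute a built-in R data frame and inspect it
df <- createDataFrame(sqlContext, faithful)
head(df)

sparkR.stop()
```

If `head(df)` prints the first rows of the faithful dataset, SparkR is wired up correctly.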