Launch Apache Spark on AWS EC2 and Initialize SparkR Using RStudio

sparkr-ec2

Introduction

In this blog post, we shall learn how to launch a Spark stand alone cluster on Amazon Web Services (AWS) Elastic Compute Cloud (EC2) for analysis of Big Data. This is a continuation from our previous blog, which showed us how to download Apache Spark and start SparkR locally on windows OS and RStudio.

We shall use Spark 1.5.1 (released on October 02, 2015) which has a spark-ec2 script that is used to install stand alone Spark on AWS EC2.  A nice feature about this spark-ec2 script is that it installs RStudio server as well. This means that you don’t need to install RStudio server separately. Thus you can start working with your data immediately after Spark is installed.