The R project consists of a programming language, an interactive development environment, and a set of statistical computing libraries. R is an interpreted language and supports common constructs such as conditional execution (if) and loops (for, while, repeat). R also includes extensive support for numerical computing, with data types for vectors, matrices, and arrays, and libraries for performing numerical operations.

Data frames in R: In addition to numerical computing, R provides support for structured data processing through data frames. Data frames are tabular data structures in which each column consists of elements of a particular type (e.g., numerical or categorical). Data frames provide a concise syntax for filtering and summarizing data, and packages like dplyr [31] have further simplified expressing complex data manipulation tasks on data frames. Specifically, dplyr provides a small number of verbs for data manipulation, including relational operations such as selection, projection, aggregation, and joins. Given its popularity among users, the concept of data frames has been adopted in other languages, for example by Pandas [21] for Python.

Next, we look at some of the common workflows of data scientists who use R as their primary programming language and motivate our design for SparkR based on these workflows.
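
To make the dplyr verbs described above concrete, the following is a minimal sketch of selection, projection, aggregation, and a join on ordinary in-memory R data frames. The tables `flights` and `airlines`, their columns, and their values are hypothetical and chosen only for illustration; they do not come from the paper. The sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical example data (illustrative only, not from the paper).
flights <- data.frame(
  carrier = c("AA", "AA", "UA", "UA", "DL"),
  dest    = c("SFO", "JFK", "SFO", "ORD", "JFK"),
  delay   = c(10, 25, 5, 40, 15),
  stringsAsFactors = FALSE
)
airlines <- data.frame(
  carrier = c("AA", "UA", "DL"),
  name    = c("American", "United", "Delta"),
  stringsAsFactors = FALSE
)

# Selection: keep only the rows that satisfy a predicate.
delayed <- filter(flights, delay > 10)

# Projection: keep only a subset of the columns.
delayed_cols <- select(delayed, carrier, delay)

# Aggregation: compute the mean delay for each carrier.
avg_delay <- summarise(group_by(flights, carrier), mean_delay = mean(delay))

# Join: attach the airline name using the shared key column.
result <- inner_join(avg_delay, airlines, by = "carrier")

print(result)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is why the steps compose without modifying `flights` or `airlines` in place.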