An Introduction to ‘R’ Data Analytics Framework and Programming Language

For anyone looking to pursue a career in data visualization and analytics, ‘R’ is a fantastic platform for getting started with data science. As a quick recap, ‘R’ is both a programming language and an interactive environment for statistics.

‘R’ is text-based: commands are either saved in a script or typed at a command line (the console). Using the console is simple: type a command in the R language and press ‘Enter’ on the keyboard to execute it.
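As a minimal illustration of the console workflow described above, each line below is a command that could be typed at the prompt and run with ‘Enter’. The variable name heights and its values are made up for this sketch.

```r
# Simple arithmetic: R evaluates the expression and prints the result
1 + 2

# Store a few made-up numbers in a variable (hypothetical example data)
heights <- c(150, 160, 170)

# Call a built-in function on the variable
mean(heights)
```

The first command prints 3 and the last prints 160, the average of the three values.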

In addition to the console, RStudio (an IDE for R) provides panels containing:

• A text editor, where R commands can be recorded for future reference.

• A history of commands that have been typed on the console.

• An “environment” pane with a list of variables, which contain values that R has been told to save from previous commands.

• A file manager.

• Help on the functions available in R.

• A panel to show plots (graphs).
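To make a few of these panels concrete, here is a small sketch of commands whose effects would show up in them. The variable names x and greeting are hypothetical and chosen only for illustration.

```r
# Saving values creates variables, which appear in the "environment" pane
x <- 42                # hypothetical variable
greeting <- "hello"    # hypothetical variable

# ls() returns the names of the variables currently saved
saved_names <- ls()
print(saved_names)

# Drawing a plot sends it to the plots panel when run inside RStudio
plot(1:10)
```

Typing ?mean at the console would similarly bring up the documentation for the mean function in the help panel.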

To get started with ‘R’, first download and install the ‘R’ programming language, then install an IDE to write your code in, namely RStudio.

You can download R from CRAN (the Comprehensive R Archive Network) by selecting any CRAN mirror of your choice. R is available for Microsoft Windows, Linux, and macOS.

Screenshot of the R programming language download page

Secondly, download the IDE, RStudio, from its website.

Once RStudio is installed, open it; the layout should look similar to the image shown below.


Overview of RStudio

More books on the R programming language and RStudio:

  1. “R for Data Science” by Garrett Grolemund and Hadley Wickham is a good modern introduction to R, and can be read online. It covers the use of a collection of packages called the Tidyverse. The dplyr package is of particular importance.
  2. Hadley Wickham also has several excellent books covering specific topics, available online. See “The R Book” by Michael J. Crawley for general reference.
  3. “Modern Applied Statistics with S” by W.N. Venables and B.D. Ripley is a well-respected reference covering R and its predecessor S.
  4. “Linear Models with R” and “Extending the Linear Model with R” by Julian J. Faraway cover linear models, with many practical examples. Linear models, and the linear model formula syntax ~, are core to much of what R has to offer statistically. Many statistical techniques take linear models as their starting point, including limma for differential gene expression, glm for logistic regression (etc.), survival analysis with coxph, and mixed models to characterize variation within populations.
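As a rough sketch of the formula syntax mentioned in the last item, the example below fits two models to mtcars, a data set that ships with R. The choice of data set and variables is an assumption made for illustration and is not taken from any of the books above.

```r
# Linear model: fuel efficiency (mpg) as a function of car weight (wt).
# The formula reads "mpg is modelled by wt".
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))       # intercept and slope
print(summary(fit))    # fuller report: standard errors, R-squared, etc.

# The same formula syntax carries over to glm, here for logistic regression:
# transmission type (am: 0 = automatic, 1 = manual) as a function of weight.
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(logit_fit))
```

The left-hand side of ~ names the response and the right-hand side names the predictors; further predictors can be added with +, for example mpg ~ wt + hp.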

Cheat sheets

  • RStudio’s collection of cheat sheets covers newer packages in R.
  • An old-school cheat sheet for dinosaurs and people wishing to go deeper.
  • Bioconductor cheat sheet

More packages

• CRAN has thousands of contributed packages, which can be installed with install.packages.
• Bioconductor is another huge collection of packages with a biological focus.
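A minimal sketch of installing a CRAN package follows. The helper ensure_package is hypothetical (it is not part of R); it only wraps the real functions requireNamespace and install.packages so that a package is downloaded only when it is missing.

```r
# Hypothetical helper: install a CRAN package only if it is not already available
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
ensure_package("stats")

# For a contributed package the call would look like this (needs internet access):
# ensure_package("dplyr")
# library(dplyr)
```

Bioconductor packages are installed through a separate mechanism, the BiocManager package, rather than directly with install.packages.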

Life outside R

Not all data analysis is done in R. The Software Carpentry workshops give a broader introduction to computing in science.
• Software Carpentry

Q&A sites

Stack Overflow-style sites are great for getting help:
• support.bioconductor.org for Bioconductor-related questions.
• biostars.org for general bioinformatics questions.
• stats.stackexchange.com for statistics questions.
• stackoverflow.com for general programming questions.






