Getting Started with R
This section gives a brief, practical introduction to R, enough to get anyone started and up to speed. R is essentially statistical software. Wikipedia describes R more fully as follows:
R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. Polls, data mining surveys and studies of scholarly literature databases, show substantial increases in popularity in recent years. As of August 2018, R ranks 18th in the TIOBE index, a measure of popularity of programming languages.
Advantages and Disadvantages of R
Advantages of R Programming
- R is one of the most comprehensive packages for statistics and data analysis. It is also a leading tool for machine learning.
- R is open-source software. It is free to install, so we can use it without purchasing a license. Anyone can contribute bug fixes, code enhancements, and new packages.
- R is platform independent, so we can use it on all major operating systems.
- R is good for business because it is open source, and it is great for visualization.
- R has far more capabilities than earlier tools. For data-driven businesses, R is a good tool for data science analysis work.
- R has built-in statistical features, for example:
  - Basic statistics: mean, variance, median.
  - Static graphics: basic plots, graphic maps.
  - Probability distributions: Beta, Binomial.
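The statistical features listed above can be tried in a few lines. The following is only a minimal sketch: the sample x is simulated, and the names x, stats, p_binom, p_beta and plot_file are chosen here for illustration.

```r
set.seed(1)                              # make the simulated data reproducible
x <- rnorm(100, mean = 50, sd = 10)      # hypothetical sample of 100 values

# Basic statistics: mean, variance, median
stats <- c(mean = mean(x), variance = var(x), median = median(x))
print(stats)

# Probability distributions: Binomial and Beta
p_binom <- dbinom(3, size = 10, prob = 0.5)    # P(X = 3) for Binomial(10, 0.5)
p_beta  <- pbeta(0.5, shape1 = 2, shape2 = 2)  # P(X <= 0.5) for Beta(2, 2)
print(c(binomial = p_binom, beta = p_beta))

# Static graphics: write a histogram to a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
hist(x, main = "Histogram of x")
invisible(dev.off())
```

In an interactive session, calling hist(x) on its own draws the plot on screen; the PDF device is used here only so that the sketch also runs as a script.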
Disadvantages of R Programming
- The learning curve for R is steep, although some R community members call this a silly excuse. Nevertheless, it takes a lot of effort to learn and remember R code compared to other programming languages.
- Some people say that the quality of some R packages is less than perfect.
- R is developed by many people who devote their own time to it. Hence there is no one to complain to if something doesn't work.
- R commands give little thought to memory management, so R can consume all available memory.
- Capabilities such as security are not built into the R language. Hence, R cannot be embedded in a web browser. One online commenter says, "It was basically impossible to use R as back-end server to do calculations because of its lack of security over the Web. The security issue, however, has been lessened by developments such as the use of virtual containers on the Amazon Web Services cloud platform."
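On the memory point above: R keeps objects in memory, so it can help to check how large an object is and to release it when it is no longer needed. This is a minimal sketch using base R functions; the object name big is hypothetical.

```r
big <- numeric(1e6)                   # one million doubles, 8 bytes each
big_size <- object.size(big)          # size of the object in bytes
print(big_size, units = "MB")

rm(big)                               # remove the object from the workspace
invisible(gc())                       # ask R to run garbage collection now
```

This only inspects and frees a single object; it does not put a cap on how much memory an R session may use.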
R Console
There are two main ways of interacting with R: using the console or using script files (plain text files that contain R code). The console window (in RStudio, the bottom left panel) is where R waits for you to tell it what to do, and where it shows the results of a command.
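The two ways of working are closely related: a script file is simply a set of console commands saved to disk. As a rough illustration, the sketch below writes a tiny script to a temporary file and runs it with source(). The script content and the names script_path, x and x_mean are made up for this example.

```r
# Write a tiny script to a temporary file (hypothetical content)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)"),
           script_path)

# Run the whole script, as if each line had been typed at the console
source(script_path)
print(x_mean)
```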
Sample code to get and set the working directory in the R console:
> getwd()
[1] "C:/Users/vrowm/Documents"
> setwd("~/R")
> getwd()
[1] "C:/Users/vrowm/Documents/R"
>
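The paths in the console session above are specific to one machine. A more portable sketch, assuming only that a temporary directory is available, saves the current working directory, switches to another one, and then restores the original. The names old_wd and new_wd are illustrative.

```r
old_wd <- getwd()        # remember the current working directory
new_wd <- tempdir()      # a directory that exists in every R session

setwd(new_wd)            # switch to the new directory
print(getwd())

setwd(old_wd)            # restore the original working directory
print(getwd())
```

Restoring the directory at the end keeps later relative file paths pointing where they did before.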
Some basic but useful demos in R's standard packages and their descriptions
Demos in package ‘base’:
  error.catching   More examples on catching and handling errors
  is.things        Explore some properties of R objects and is.FOO() functions. Not for newbies!
  recursion        Using recursion for adaptive integration
  scoping          An illustration of lexical scoping.
Demos in package ‘graphics’:
  Hershey          Tables of the characters in the Hershey vector fonts
  Japanese         Tables of the Japanese characters in the Hershey vector fonts
  graphics         A show of some of R's graphics capabilities
  image            The image-like graphics builtins of R
  persp            Extended persp() examples
  plotmath         Examples of the use of mathematics annotation
Demos in package ‘grDevices’:
  colors           A show of R's predefined colors()
  hclColors        Exploration of hcl() space
Demos in package ‘stats’:
  glm.vr           Some glm() examples from V&R with several predictors
  lm.glm           Some linear and generalized linear modelling examples from `An Introduction to Statistical Modelling' by Annette Dobson
  nlm              Nonlinear least-squares using nlm()
  smooth           `Visualize' steps in Tukey's smoothers
Use ‘demo(package = .packages(all.available = TRUE))’ to list the demos in all *available* packages.
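A listing like the one above comes from the demo() function. The sketch below shows one way to get the demo names for a single package and to run one of them. The variable names demo_info and demo_items are illustrative, and the choice of the scoping demo is arbitrary.

```r
# List the demos that ship with the 'base' package
demo_info  <- demo(package = "base")
demo_items <- demo_info$results[, "Item"]   # names of the available demos
print(demo_items)

# Run one demo without pausing between steps
demo("scoping", package = "base", ask = FALSE)
```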
RStudio
RStudio is an integrated development environment (IDE) for R, the language for statistical computing and graphics.
The R program file (.r file)
I have created and tested the following .r file to demonstrate:
- Working with data files, such as .csv files
- Importing data from a data file: read, attach, etc.
- Summarising and analysing data: summary, sort, table, cor
- Statistical analysis such as t-tests, ANOVA, and regressions
The sample data from the CSV file is read into R as "mydata" in the code below.
# Working in R
# Set working directory to the location where csv file is present
setwd("C:/Users/vrowm/Documents/R/MyRData")
getwd()
# Read the data
mydata <- read.csv("~/R/MyRData/intro_auto.csv")
# Attach the data frame so that its columns (mpg, price, ...) can be used by name
attach(mydata)
# List the columns in the csv file
names(mydata)
# Show first few lines of data from the csv file
head(mydata)
mydata[1:10,]
mydata[1:20,]
# Descriptive statistics on the data in csv file
summary(mpg)
sd(mpg)
length(mpg)
summary(price)
sd(price)
mean(price)
min(price)
max(price)
# Sort the data
sort(make)
# Frequency tables
table(make)
table(make, foreign)
# Correlation among variables
cor(price, mpg)
# T-test for mean of one group
t.test(mpg, mu=20)
# ANOVA for equality of means for two groups
anova(lm(mpg ~ factor(foreign)))
# OLS (Ordinary Least Squares) regression:
# mpg (dependent variable) and weight, length and foreign (independent variables)
olsreg <- lm(mpg ~ weight + length + foreign)
summary(olsreg)
# The same model, fitted and summarised in one step
summary(lm(mpg ~ weight + length + foreign))
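# Plotting data
# Simple regression of mpg on weight (assumed definition of olsreg1), used to draw the fitted line
olsreg1 <- lm(mpg ~ weight)
plot(mpg ~ weight)
abline(olsreg1)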
# Redefining variables
Y <- cbind(mpg)
X <- cbind(weight, length, foreign)
summary(Y)
summary(X)
olsreg <- lm(Y ~ X)
summary(olsreg)
# Install and use packages
install.packages("plm")
library(plm)
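The script above depends on intro_auto.csv, which is not included here. To try the same workflow without that file, the sketch below uses R's built-in mtcars data as a stand-in. This is an assumption made for illustration: the data set, the temporary CSV file, and the names csv_path, auto, t_res and fit are not from the original script, and the column am (transmission type) only plays a role similar to foreign. The columns are passed through the data frame rather than with attach().

```r
# Stand-in data: write the built-in mtcars data to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the data and look at it
auto <- read.csv(csv_path)
print(names(auto))
print(head(auto))

# Descriptive statistics
print(summary(auto$mpg))
print(sd(auto$mpg))

# Frequency table and correlation
print(table(auto$cyl, auto$am))
print(cor(auto$mpg, auto$wt))

# One-sample t-test: is the mean of mpg equal to 20?
t_res <- t.test(auto$mpg, mu = 20)
print(t_res)

# ANOVA for equality of mean mpg across the two transmission groups
print(anova(lm(mpg ~ factor(am), data = auto)))

# OLS regression of mpg on weight, horsepower and transmission type
fit <- lm(mpg ~ wt + hp + am, data = auto)
print(summary(fit))
```

Using the data argument of lm() keeps each variable tied to its data frame, which avoids name clashes when several data sets are loaded at once.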