Introduction to R software

R is a free, open-source programming language widely used by statisticians and data analysts for statistical computing, data analysis and the graphical representation of data.

  • Setting up the R environment

On Windows, we first download R (this article uses R-3.6.1, available in 32-bit and 64-bit builds) from the official R site: https://cran.r-project.org. For convenience we also install RStudio, a more user-friendly interface for working with R, from the official RStudio site: https://www.rstudio.com.
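Once R is installed, it can be useful to confirm which version is running. A minimal sketch using base R's R.version.string, getRversion() and sessionInfo(); the comparison with "3.6.1" simply reuses the version mentioned above:

```r
# Show which version of R is running
print(R.version.string)

# getRversion() returns a version object that can be compared with a string,
# e.g. to check that at least R 3.6.1 is installed
getRversion() >= "3.6.1"

# sessionInfo() summarises the platform, the R version and the attached packages
sessionInfo()
```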

  • Installing Packages in R:

A few packages are installed with R by default, but others have to be installed by the user. Most additional packages are available on CRAN (the Comprehensive R Archive Network), from where we can install a required package with:

install.packages("package name")

For example, to install the package "vioplot" (used to draw violin plots, which combine a box plot and a density plot), we use:

install.packages("vioplot")

In RStudio, packages can also be installed from the menu Tools > Install Packages…


To check whether a package is installed, we can use:

any(grepl("package name",installed.packages()))
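The grepl() check above searches every cell of the installed.packages() matrix, including the dependency columns, so it can also report TRUE for a partial name (for example "stat" matches "stats") or for a package that is merely listed as a dependency. A minimal sketch of an exact-name check; the helper name is_installed is hypothetical:

```r
# Hypothetical helper: TRUE only if a package with exactly this name is installed.
# rownames(installed.packages()) is the vector of installed package names.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

is_installed("stats")   # TRUE: stats is installed with every copy of R
is_installed("stat")    # FALSE on a standard installation: no exact match
```

Base R's requireNamespace("package name", quietly = TRUE) is another common way to get a TRUE/FALSE answer, though it also loads the package's namespace.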
					
  • Loading a package in R:

After a package is installed on our computer, it has to be loaded in every new R session before we can use it. This is done with the library() function:

library("package name")
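Installing and loading are often combined so that a script works on a fresh machine. A minimal sketch, assuming an internet connection is available when a package is missing; the helper name use_package is hypothetical:

```r
# Hypothetical helper: install a package from CRAN only if it is missing,
# then load it. 'pkg' is the package name as a character string.
use_package <- function(pkg) {
  if (!pkg %in% rownames(installed.packages())) {
    install.packages(pkg)             # needs an internet connection
  }
  # character.only = TRUE makes library() use the value stored in 'pkg'
  library(pkg, character.only = TRUE)
}

use_package("stats")            # already installed with R, so it is only loaded
"package:stats" %in% search()   # TRUE: the package is attached
```

With this helper, use_package("vioplot") would install vioplot the first time and simply load it afterwards.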

  • Naming variables in R:

A variable name in R must start with a letter or a dot (.); it cannot start with a number or an underscore. If a name starts with a dot, the dot must not be followed immediately by a number, so ".3A2" is an invalid name.

Valid examples: A2_i, K3.S

Apart from letters and digits, only the dot (.) and the underscore (_) are allowed. Other special characters such as $ or % are not.
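Base R's make.names() turns any string into a syntactically valid name, so a name is valid exactly when make.names() leaves it unchanged. A minimal sketch of a check based on that idea; the helper name is_valid_name is hypothetical:

```r
# Hypothetical helper: TRUE when a string is already a valid R name
is_valid_name <- function(x) make.names(x) == x

is_valid_name(c("A2_i", "K3.S", ".3A2", "2x", "a$b"))
# TRUE TRUE FALSE FALSE FALSE

make.names(".3A2")   # "X.3A2": an "X" is prepended to make the name valid
```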

  • Operators in R:

    An operator is a symbol that tells the R interpreter to perform a specific operation, such as an arithmetic calculation or a comparison.


Assignment operators:

These operators are used to assign values to variables.

Operator      Description
<-, <<-, =    Leftwards assignment
->, ->>       Rightwards assignment

Example:

x <- 50
y = 60
10 -> z
print(x)
[1] 50
print(y)
[1] 60
print(z)
[1] 10
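The assignment table lists <<- (and ->>) without showing how it differs from <-. Inside a function, <- creates a local variable, while <<- assigns to a variable in an enclosing environment, here the global workspace. A minimal sketch; the names counter, increment and inner_value are illustrative:

```r
counter <- 0

increment <- function() {
  inner_value <- 1                   # '<-' creates a variable local to the function
  counter <<- counter + inner_value  # '<<-' updates 'counter' outside the function
}

increment()
increment()
print(counter)         # [1] 2
exists("inner_value")  # FALSE: the local variable disappeared when the function ended
```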

Mathematical operators:

Operator   Description
+          Addition
-          Subtraction
*          Multiplication
/          Division
^          Exponent
%%         Modulus (remainder from division)
%/%        Integer division (quotient of the division)
Example:
x=5
y=10
x+y
[1] 15
x*y
[1] 50
x/y
[1] 0.5
x-y
[1] -5
y%%x
[1] 0
y%/%x
[1] 2
y^x
[1] 1e+05
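In the example above y %% x is 0 because 10 is a multiple of 5, which hides how %% and %/% relate. For whole numbers a and b (b not zero), a equals b * (a %/% b) + a %% b; with a negative a, %/% rounds down and %% takes the sign of b. A short check with illustrative values:

```r
a <- 17
b <- 5
a %/% b                      # [1] 3  (quotient)
a %% b                       # [1] 2  (remainder)
b * (a %/% b) + a %% b == a  # [1] TRUE

-17 %/% 5   # [1] -4  (-3.4 rounded down)
-17 %% 5    # [1] 3   (because 5 * -4 + 3 == -17)
```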

Logical operator:

Operator   Description
!          Logical NOT. Takes each element of a vector and returns the opposite logical (Boolean) value.
&          Element-wise logical AND. Combines each element of the first vector with the corresponding element of the second vector and returns TRUE if both elements are TRUE.
|          Element-wise logical OR. Combines each element of the first vector with the corresponding element of the second vector and returns TRUE if at least one of the elements is TRUE.
&&         Logical AND for single values. Returns TRUE only if both values are TRUE. In R 3.x it silently uses only the first element of each vector; in recent versions of R (4.3.0 and later), giving it a vector longer than one is an error.
||         Logical OR for single values. Returns TRUE if at least one value is TRUE, with the same single-value rule as &&.

Example: zero is treated as FALSE and any non-zero number as TRUE (inside a numeric vector, TRUE is stored as 1 and FALSE as 0). To understand how the vectors are built, please refer to the vector part of this article.

x=c(1,TRUE,0,5)
y=c(5,FALSE,0,5)
!x
[1] FALSE FALSE TRUE FALSE
x&y
[1] TRUE FALSE FALSE TRUE
x|y
[1] TRUE TRUE FALSE TRUE
x[1]&&y[1]
[1] TRUE
x[1]||y[1]
[1] TRUE
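&& and || are meant for single TRUE/FALSE conditions: they stop as soon as the answer is known ("short-circuit" evaluation), so the right-hand side may never be evaluated. To reduce a whole logical vector to one value, base R provides all() and any(). A minimal sketch; v is an illustrative variable:

```r
# The left side is FALSE, so 'v > 0' is never evaluated
v <- NULL
!is.null(v) && v > 0    # FALSE

# Reducing the element-wise result to a single TRUE/FALSE
x <- c(1, TRUE, 0, 5)
y <- c(5, FALSE, 0, 5)
all(x & y)   # FALSE: not every pair is TRUE
any(x & y)   # TRUE: at least one pair is TRUE
```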

Relational operators:

Relational operators compare two values and return the result as a logical (Boolean) value. The commonly used relational operators are:

Operator   Description
<          Checks if the 1st value is less than the 2nd value.
>          Checks if the 1st value is greater than the 2nd value.
<=         Checks if the 1st value is less than or equal to the 2nd value.
>=         Checks if the 1st value is greater than or equal to the 2nd value.
==         Checks if the 1st value is equal to the 2nd value.
!=         Checks if the 1st value is not equal to the 2nd value.

Example:

x=15
y=5
x<y
[1] FALSE
x>y
[1] TRUE
x==y
[1] FALSE
x>=y
[1] TRUE
x<=y
[1] FALSE
x!=y
[1] TRUE
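Two details are worth knowing when comparing values: relational operators work element-wise on vectors, and == compares decimal numbers exactly, which can fail because of binary rounding. A minimal sketch using base R's all.equal(), which compares with a small tolerance:

```r
# Element-wise comparison: one answer per element
c(3, 8, 15) > 5                     # FALSE TRUE TRUE

# Exact comparison of decimals can be surprising
0.1 + 0.2 == 0.3                    # FALSE (rounding error in binary arithmetic)
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE (comparison with a tolerance)
```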

Miscellaneous operators:

:   Colon operator. Creates a sequence of numbers in steps of 1, for example to build a vector.

x=1:8
print(x)
[1] 1 2 3 4 5 6 7 8

%in%   Checks whether an element belongs to a vector.

a=5
b=10
x=1:8
print(b %in% x)
[1] FALSE
print(a %in% x)
[1] TRUE
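%in% is vectorised: with several values on the left it returns one answer per value. Base R's match() goes one step further and returns where each value occurs. A short sketch:

```r
x <- 1:8
c(5, 10, 3) %in% x     # TRUE FALSE TRUE: one answer per element on the left
match(c(5, 10, 3), x)  # 5 NA 3: position of each value in x, NA if absent
!(10 %in% x)           # TRUE: "not in" is written by negating %in%
```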

%*%   Matrix multiplication.

A= matrix(c(1,5,8,9), nrow=2, ncol=2, byrow=TRUE)
B= matrix(1:4,nrow=2, ncol=2, byrow= TRUE)
A%*%B
     [,1] [,2]
[1,]   16   22
[2,]   35   52
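A common mistake is to use * on matrices when matrix multiplication is meant: * multiplies element by element, while %*% combines rows of the first matrix with columns of the second. A short sketch with the same A and B:

```r
A <- matrix(c(1, 5, 8, 9), nrow = 2, ncol = 2, byrow = TRUE)
B <- matrix(1:4, nrow = 2, ncol = 2, byrow = TRUE)

A * B     # element-wise product: rows (1, 10) and (24, 36)
A %*% B   # matrix product:       rows (16, 22) and (35, 52)

# Entry [1, 2] of A %*% B is row 1 of A times column 2 of B
sum(A[1, ] * B[, 2])   # 1*2 + 5*4 = 22
```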

  • Functions in R:

A function, in a programming environment, is a set of instructions designed to accomplish a particular task. An R programmer can pass control to a function, along with any arguments it needs to perform its actions. The function then performs its task and returns control to the interpreter, together with any result, which can be stored in other objects.


Below are some commonly used pre-defined functions (package: base):

print() or cat(): getting output or printing objects.
Syntax: print(x, ...)
Example:
a=2.86
print(a)
[1] 2.86

paste(): concatenation of vectors (after converting them to character strings).
Syntax: paste(..., sep = " ", collapse = NULL)
Example:
a='Good'
b="Morning"
print(paste(a,b))
[1] "Good Morning"

c(): combines values into a vector.
Syntax: c(..., recursive = FALSE, use.names = TRUE)
Example:
y=c(1,2,5,6)
print(y)
[1] 1 2 5 6

length(): returns the length of a vector.
Syntax: length(x)
Example:
length(y)
[1] 4

seq(): creates a sequence of numbers.
Syntax: seq(...)
Example:
print(seq(1,8))
[1] 1 2 3 4 5 6 7 8

mean(): calculates the mean (average).
Syntax: mean(x, ...)
Example:
y=c(1,2,5,6)
mean(y)
[1] 3.5

max() and min(): find the maximum and the minimum of the values.
Syntax: max(..., na.rm = FALSE) and min(..., na.rm = FALSE)
Example:
y=c(1,2,5,6)
max(y)
[1] 6
min(y)
[1] 1

sum(): finds the sum of the values.
Syntax: sum(..., na.rm = FALSE)
Example:
y=c(1,2,5,6)
sum(y)
[1] 14
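Two arguments from the syntax lines above are easy to overlook: na.rm, which tells summary functions to skip missing values (NA), and the sep and collapse arguments of paste(). A minimal sketch with illustrative values:

```r
y <- c(1, 2, 5, NA)
mean(y)                  # NA: a missing value makes the result missing
mean(y, na.rm = TRUE)    # 2.666667: the mean of 1, 2 and 5
sum(y, na.rm = TRUE)     # 8

paste("Good", "Morning", sep = "-")       # "Good-Morning": custom separator
paste(c("a", "b", "c"), collapse = "+")   # "a+b+c": joins a vector into one string
paste0("x", 1:3)                          # "x1" "x2" "x3": paste() with sep = ""

# cat() prints without the [1] index and without quotes
cat("Mean:", mean(y, na.rm = TRUE), "\n")
```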

There are many other pre-defined functions, which we will meet as we go along.

User defined functions:

An R function is created with the keyword function. The basic syntax of an R function definition is:

function_name = function(arg_1, arg_2, ...) {
  function body
}

Examples:

F=function(a=2,b=3,c=5)
{
  result=a*b+c
  print(result)
}
F(2,6,8)
[1] 20
F()
[1] 11

Calling F() with no arguments uses the default values a=2, b=3 and c=5, so the result is 2*3+5 = 11.
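F prints its result as a side effect. A variant that only returns the value lets the caller decide what to do with it; in R the last evaluated expression is the return value. A minimal sketch; the name F2 is hypothetical:

```r
# Hypothetical variant of F that returns a*b+c without printing it
F2 <- function(a = 2, b = 3, c = 5) {
  a * b + c      # last expression: this value is returned
}

r <- F2(2, 6, 8)
r * 2          # [1] 40: the returned value can be reused
F2(c = 1)      # [1] 7: named argument, a and b keep their defaults
```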

  • Strings:

Anything written within a pair of single quotes or double quotes in R is a string. R always displays strings within double quotes, even if they were entered with single quotes. For example:

string = "Hi"
print(string)
[1] "Hi"                 #it is the output

A string cannot be opened with one kind of quote and closed with the other: "Hi' is not a valid string.
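A few base R string functions show these rules in practice. A short sketch with illustrative strings:

```r
s <- 'Hi'                     # entered with single quotes
print(s)                      # [1] "Hi": displayed with double quotes
q <- "It's fine"              # a single quote inside double quotes needs no escaping
nchar(q)                      # [1] 9: number of characters, including the space
toupper(s)                    # [1] "HI"
substr("Mathematics", 1, 4)   # [1] "Math": characters 1 to 4
```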

  • Types of Data in R:

The most commonly used types of data are:

  • Vectors
  • Lists
  • Matrices
  • Arrays
  1. Vectors:

There are several types of vector commonly used in R:

Data type             Example
Logical               TRUE, FALSE
Numeric               15.7, 7, 68
Integer               0L, 15L, 8L
Complex               5 + 6i
Character (strings)   'k', "big", "TRUE", '25.8'

Creating Vector in R:

x=c(5,6,7)
print(x)
[1] 5 6 7
print(class(x))
[1] "numeric"
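The class() function reports the type of each example from the table. A vector can hold only one type, so mixing types converts everything to the most general one; this is also why TRUE became 1 in the logical-operator example. A short sketch:

```r
class(TRUE)     # "logical"
class(15.7)     # "numeric"
class(15L)      # "integer"
class(5 + 6i)   # "complex"
class("TRUE")   # "character": the quotes make it a string, not a logical

# Mixing types coerces to the most general type
c(1, TRUE, 0)   # 1 1 0     (logical becomes numeric)
c(1, "a")       # "1" "a"   (numeric becomes character)
```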
  2. Lists:

A list is an R object that can contain elements of different types, such as vectors, functions and even other lists.

Creating List in R:

L1 = list(75.5,c(3,5,4),cos)        #a list containing a number, a vector and a function
print(L1)
[[1]]
[1] 75.5

[[2]]
[1] 3 5 4

[[3]]
function (x)  .Primitive("cos")

print(class(L1))
[1] "list"
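Elements of a list are extracted with double brackets [[ ]], while single brackets [ ] return a smaller list. Named elements can also be reached with $. A short sketch reusing L1; the list L2 and its names are illustrative:

```r
L1 <- list(75.5, c(3, 5, 4), cos)
L1[[2]]          # [1] 3 5 4: the element itself
class(L1[2])     # "list": single brackets return a sub-list
L1[[3]](0)       # [1] 1: the third element is the cos function

# Named elements can be accessed with $
L2 <- list(value = 75.5, v = c(3, 5, 4))
L2$v[2]          # [1] 5
```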
  3. Matrices:

A matrix is a two-dimensional rectangular data set.

Creating matrix in R:

M = matrix( c(1,2,5,89,8,6), nrow = 2, ncol = 3, byrow = TRUE)
print(M)        #a matrix of order 2 x 3
     [,1] [,2] [,3]
[1,]    1    2    5
[2,]   89    8    6

General syntax (with its default values):

matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)

  • nrow: the desired number of rows.
  • ncol: the desired number of columns.
  • byrow: if FALSE (the default), the matrix is filled by columns; otherwise it is filled by rows.
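The effect of byrow is easiest to see by filling the same numbers both ways. The sketch also shows indexing with [row, column], dim() and the transpose t(), all base R:

```r
matrix(1:6, nrow = 2)                 # byrow = FALSE: filled column by column
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

matrix(1:6, nrow = 2, byrow = TRUE)   # filled row by row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

M <- matrix(c(1, 2, 5, 89, 8, 6), nrow = 2, ncol = 3, byrow = TRUE)
M[2, 1]   # [1] 89: element in row 2, column 1
dim(M)    # [1] 2 3
t(M)      # transpose: a 3 x 2 matrix
```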

Other frequently used data types include arrays, factors and data frames.
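A brief sketch of these three types, using base R's data.frame(), factor() and array(); all values and names are illustrative. Data frames matter later in this article, because read.csv() returns one:

```r
# Data frame: a table whose columns may have different types
df <- data.frame(X = c(1, 2, 3), group = c("a", "b", "a"))
nrow(df)    # [1] 3
df$X        # [1] 1 2 3

# Factor: a categorical variable with a fixed set of levels
f <- factor(c("low", "high", "low", "medium"), levels = c("low", "medium", "high"))
levels(f)   # "low" "medium" "high"
table(f)    # counts per level: low 2, medium 1, high 1

# Array: like a matrix, but with any number of dimensions
arr <- array(1:24, dim = c(2, 3, 4))
arr[1, 2, 3]   # [1] 15
```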

Loops and conditional statements in R:

  • For loop:

Syntax:

for (value in sequence) {
  statement
}

Examples:

(i)

colors= c('red','blue','green','yellow')
for(i in colors) {
  print(i)
}
[1] "red"
[1] "blue"
[1] "green"
[1] "yellow"

(ii)

v=LETTERS[5:10]
for (i in v) {
  print(i)
}
[1] "E"
[1] "F"
[1] "G"
[1] "H"
[1] "I"
[1] "J"
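A for loop is often used to accumulate a result, and next and break control which iterations run. Many such loops also have a shorter vectorised equivalent. A short sketch with illustrative values:

```r
# Sum of the squares of 1 to 5 with a for loop
total <- 0
for (k in 1:5) {
  total <- total + k^2
}
total            # [1] 55
sum((1:5)^2)     # [1] 55: same result without an explicit loop

# next skips the rest of an iteration, break stops the loop
for (k in 1:10) {
  if (k %% 2 == 0) next   # skip even numbers
  if (k > 7) break        # stop once k exceeds 7
  print(k)                # prints 1, 3, 5 and 7
}
```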
  • If statement:

Syntax:

if (boolean_expression) {
  # statement(s) executed if the boolean expression is TRUE
}

Examples:

x=8
if(x>0) {
  print('x is a positive value')
}
[1] "x is a positive value"
  • If...else statement:

Syntax:

if (test_expression) {
  statement1
} else {
  statement2
}

Example:

x= -8
if(x>0){print("x is a positive value")} else{print('negative value')}
[1] "negative value"
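if () expects a single TRUE/FALSE condition. To apply the same test to every element of a vector, base R provides the vectorised ifelse(). A short sketch with illustrative values:

```r
x <- c(-8, 0, 5)
ifelse(x > 0, "positive value", "not positive")
# "not positive" "not positive" "positive value"
```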

If...else if...else statement:

Syntax:

if (test_expression1) {
  statement1
} else if (test_expression2) {
  statement2
} else if (test_expression3) {
  statement3
} else {
  statement4
}

Example

x=0
if(x>0){print('x is positive')}else if(x<0){print('x is negative')}else {print('x is zero')}
[1] "x is zero"

While loop:

Syntax

while (test_expression) {
  statement
}

Example:

i=1
while (i<8) {
  print(i)
  i=i+1
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
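A while loop is the natural choice when the number of iterations is not known in advance. A short sketch with illustrative values: count how many doublings of 1 are needed to exceed 1000.

```r
n <- 1
steps <- 0
while (n <= 1000) {
  n <- n * 2          # double n
  steps <- steps + 1  # count the doublings
}
steps   # [1] 10
n       # [1] 1024: the first power of 2 above 1000
```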

Importing Data into R:

First, save the data in .csv format (a text file or an Excel file can be saved as a .csv file) in the current working directory, so that R can find it by its file name.

To find the current working directory of the R workspace, use the function getwd(); to change the working directory, use the function setwd("path name"):

getwd()
[1] "C:/Users/admin/OneDrive/Documents"
setwd("C:/Users/admin/OneDrive/Documents/R")
getwd()
[1] "C:/Users/admin/OneDrive/Documents/R"

Now we can import the data file into the R workspace with the function read.csv("file name"). For example, to import a file named "Book1.csv", the R code is:

data= read.csv("Book1.csv")
print("data")    # in quotes, "data" is only a character string
[1] "data"
print(data)      # without quotes, the data frame itself is printed
   X  Y
1  1  7
2  2  7
3  3  7
4  4  7
5  5  7
6  6  7
7 21 42

read.csv() returns the data as a data frame. We can find the maximum and minimum value of a column with the functions max() and min(), selecting the column with the $ operator:

 max(data$Y)

[1] 42

 min(data$X)

[1] 1

The subset() function returns only the rows of a data frame that satisfy a condition. Some examples:

subset(data, Y==max(Y))
   X  Y
7 21 42

subset(data, X>=3)
   X  Y
3  3  7
4  4  7
5  5  7
6  6  7
7 21 42

subset(data, X>=3 & Y==42)
   X  Y
7 21 42
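Because Book1.csv is not available to every reader, the sketch below rebuilds the same seven rows (values copied from the output shown for Book1.csv), writes them to a temporary .csv file with write.csv() and reads them back with read.csv(). It also shows bracket indexing as an alternative to subset(); the names path and data2 are illustrative:

```r
# Rebuild the Book1.csv example data
data <- data.frame(X = c(1:6, 21), Y = c(rep(7, 6), 42))

# Write it to a temporary .csv file and read it back
path <- tempfile(fileext = ".csv")
write.csv(data, path, row.names = FALSE)
data2 <- read.csv(path)

str(data2)                              # 7 obs. of 2 variables: X and Y
subset(data2, X >= 3 & Y == 42)         # the row with X = 21, Y = 42
data2[data2$X >= 3 & data2$Y == 42, ]   # the same selection with [ , ]
which.max(data2$Y)                      # [1] 7: row number of the largest Y
colMeans(data2)                         # X = 6, Y = 12
```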

For an Excel file, we can use the read_excel() function from the readxl package:

library(readxl)                      # install.packages("readxl") first if needed
Book1 <- read_excel("Book1.xlsx")
View(Book1)                          # opens the data in a spreadsheet-style viewer
data.frame(Book1)                    # converts the data into a data frame
   X  Y
1  1  7
2  2  7
3  3  7
4  4  7
5  5  7
6  6  7
7 21 42

Mathematica-City

Mathematica-city is an online Education forum for Science students run by Kounteyo, Shreyansh and Souvik. We aim to provide articles related to Actuarial Science, Data Science, Statistics, Mathematics and their applications using different Statistical Software. We also provide freelancing services on the aforementioned topics. Feel free to reach out to us for any kind of discussion on any of the related topics.
