Basics of R software
In this article we will go through some basic applications of R with the help of examples, and then look at some commonly used R functions.
Contents of the article:
- Identifying, extracting and removing duplicates
- Sorting
- Basic functions such as sum, range, date and time in R
Identifying and removing duplicates in R software
A duplicate is an element that appears more than once in a data set. We can easily identify duplicates in R, remove them, and so obtain the unique elements of our dataset.
Example 1:
Suppose we have a vector of names and we want to identify and remove duplicates from it.
Code-
names <- c("a","b","c","d","a","a","b") ## creating a vector of names
duplicated(names)
## this code returns a logical (Boolean) value for each element of the vector "names": TRUE if the element has already appeared earlier in the vector, and FALSE otherwise.
Output
[1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE
## the first four names are each seen for the first time, so the code returns FALSE for them. The 5th, 6th and 7th elements repeat earlier values, so the code returns TRUE for them, indicating that they are duplicates.
This code will also work on a vector of numeric type.
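As a quick illustration of the numeric case, here is a minimal sketch. The vector marks and its values are made up for this example and are not part of the original data.

```r
marks <- c(10, 20, 10, 30, 20, 10)  # hypothetical numeric vector
duplicated(marks)                   # TRUE for every repeat after the first occurrence
## [1] FALSE FALSE  TRUE FALSE  TRUE  TRUE
marks[duplicated(marks)]            # the repeated occurrences
## [1] 10 20 10
marks[!duplicated(marks)]           # the first occurrence of each value
## [1] 10 20 30
```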
Suppose we want to extract the duplicate elements.
Code & Output
names[duplicated(names)]
[1] "a" "a" "b"
## we can see that all the duplicated values have been extracted.
Suppose we want to extract all the unique names from our vector.
Code & Output
names[!duplicated(names)]
[1] "a" "b" "c" "d"
## we can see that the code has extracted all the unique values from our vector names.
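Base R also provides unique(), which returns the same result in a single call. The sketch below is an aside rather than part of the original example; it rebuilds the vector from Example 1 and checks that both approaches agree.

```r
names <- c("a", "b", "c", "d", "a", "a", "b")  # same vector as in Example 1
unique(names)                                  # first occurrence of each value
## [1] "a" "b" "c" "d"
identical(unique(names), names[!duplicated(names)])
## [1] TRUE
```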
Similarly, we can remove duplicates from a data frame with multiple columns, based on the values in one of its columns.
Example 2:
Suppose you have a dataset of names, ages and genders and you want to identify, extract and remove all the duplicates based on the column gender of the data frame.
Code-
name <- c("manvir","yash","sandra","rohit","raj")
age <- c(22,21,22,61,18)
gender <- c("m","f","f","m","m")
b <- data.frame(name,age,gender)
## this code creates a data frame with three columns and stores it in the variable b.
Identifying the duplicates in the column "gender" of the data frame
Note that the function is called duplicated(). Applied to a whole data frame it compares entire rows, so here we pass it the single column b$gender.
Code and Output-
duplicated(b$gender)
[1] FALSE FALSE  TRUE  TRUE  TRUE
Extracting the duplicates from the data frame on the basis of gender
Code & Output-
b[duplicated(b$gender),]
    name age gender
3 sandra  22      f
4  rohit  61      m
5    raj  18      m
## we can see that every row whose gender has already appeared in an earlier row has been extracted
Removing the duplicates from the data frame on the basis of gender
Code & Output-
b[!duplicated(b$gender),]
    name age gender
1 manvir  22      m
2   yash  21      f
## we can see that the duplicates have been removed and only the first row for each gender has been kept
Using similar techniques, one can easily identify, extract and remove duplicate values from their data set.
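The same idea applies to whole rows. The sketch below is an illustration only: it rebuilds the data frame from Example 2 under the assumed name b2 and adds one exact repeat of the first row, which is not in the original data, so that a full-row duplicate exists.

```r
# Data from Example 2 plus one hypothetical repeated row (the last one)
b2 <- data.frame(
  name   = c("manvir", "yash", "sandra", "rohit", "raj", "manvir"),
  age    = c(22, 21, 22, 61, 18, 22),
  gender = c("m", "f", "f", "m", "m", "m")
)

duplicated(b2)              # compares entire rows; only row 6 repeats row 1
## [1] FALSE FALSE FALSE FALSE FALSE  TRUE
clean <- b2[!duplicated(b2), ]  # keep the first copy of each distinct row
nrow(clean)
## [1] 5
```

Rows 3 to 5 are kept here even though their gender repeats, because the other columns differ.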
Sorting in R software
Sorting means arranging data in ascending or descending order. In R this is done with the function sort(), which works on both numeric and character vectors. Below are a few examples.
Suppose we want to sort a vector in ascending order
Code & Output
age<-c(22,21,32,61,18) ## creating a numeric vector which we are going to sort in ascending order.
sort(age, decreasing = FALSE)
## since we want to sort in ascending order, we set the argument decreasing to FALSE (this is also the default).
[1] 18 21 22 32 61
## we can see that the vector has been arranged in ascending order
Suppose we want to sort a vector in descending order
Code & Output
name <- c("a","y","s","r","j") ## creating a vector of character elements
sort(name, decreasing = TRUE) ## sorting the data in descending order, that is, from z to a
[1] "y" "s" "r" "j" "a"
## the output of the code is, as we can see, in descending alphabetical order
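sort() works on a single vector. To reorder a whole data frame by one of its columns, a common approach is order(), which returns the row positions in sorted order. A minimal sketch, reusing the name and age values from Example 2; the data frame name people is an assumption made for this illustration.

```r
people <- data.frame(
  name = c("manvir", "yash", "sandra", "rohit", "raj"),
  age  = c(22, 21, 22, 61, 18)
)

order(people$age)                      # row positions from youngest to oldest
## [1] 5 2 1 3 4
by_age <- people[order(people$age), ]  # reorder all columns together
by_age$name                            # raj, yash, manvir, sandra, rohit
```

Indexing the rows with the result of order() keeps each name attached to its age, which sorting the age column on its own would not do.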
Some other basic functions in R
age<-c(22,21,32,61,18) ## creating a vector
sum(age) ## this code will give us the sum of all the elements of the vector age
max(age) ## this code will give us the maximum element of the vector age
min(age) ## this code will give us the minimum element of the vector age
mean(age) ## this code will give us the mean of the vector age
median(age) ## this code will give us the median of the vector age
mode(age) ## this code will give us the storage mode of the vector age ("numeric"), not the statistical mode
cumsum(age) ## this code will give you the running (cumulative) sum of the elements of age
var(age) ## this code will give us the variance.
sd(age) ##this code will give us the standard deviation
Sys.Date() ## this code will return today's date
Sys.time() ## this code will return today's date and the current time
names(dataset) ## this code will return the names/headers of all the columns of a data frame called dataset
range(age) ## this code will give you the minimum and maximum values of the numeric vector age
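Base R has no built-in function for the statistical mode (the most frequent value); mode() only reports how an object is stored. One possible way to compute it is to count the values with table(). The helper name stat_mode and the vector ages are hypothetical, and ages includes a repeated value so that a mode exists.

```r
# Hypothetical helper: returns the most frequent value(s) of a vector
stat_mode <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # value(s) with the highest count
  if (is.numeric(x)) as.numeric(top) else top  # table() stores values as text
}

ages <- c(22, 21, 22, 61, 18)
mode(ages)       # "numeric": the storage mode, not a statistic
stat_mode(ages)  # 22, the only value that occurs twice
```

If several values tie for the highest count, this sketch returns all of them.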