Learn with Sphere: Light Intro to R Programming Language

My name is Denis Shvedchenko, Data Engineer and Back End Developer at Sphere Partners. Most of my duties involve extract-and-transform type jobs. Recently, I started working with the R language, and it seemed like a good time to learn it in depth. As a result, this article is a short explanation of the small caveats you are likely to face when you start learning R. Let’s start with some basics.

R Variables and Constants

Valid identifiers:

Total, Sum, .fine.with.dot,

This_is_acceptable, Number5

Prefer “.” or camelCase over “_” in names.

Built-in Constants:

LETTERS : "A" "B" … "X" "Y" "Z"

letters : "a" "b" … "x" "y" "z"

pi : 3.141593

month.abb: "Jan" ... "Nov" "Dec"

month.name: "January" ... "November" "December"

R is an interpreted language used by statisticians and data scientists for data applications. Its popularity has declined somewhat with the rise of machine learning and the extensive data science libraries available for Python; however, R enjoys a large and loyal following and is a great choice for many data-related use cases.

It has a variety of functionalities. Let’s start with identifiers. You can see from the start that one of the unusual aspects of this language is the use of “.” (dot): the dot is a valid character inside an identifier and does not mean member access. R also has built-in constants for letters, pi, month names, month abbreviations and more.
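As a quick sketch of the points above (the variable names are made up for illustration), identifiers containing dots behave like any other name, and the built-in constants are ordinary vectors that can be indexed:

```r
# Dots are ordinary characters inside an identifier
total.sum <- 10
.fine.with.dot <- 20   # a leading dot is allowed (but hides the name from ls())
totalSum <- total.sum + .fine.with.dot

# Built-in constants are plain vectors that can be indexed
first.letters <- LETTERS[1:3]     # "A" "B" "C"
last.month <- month.name[12]      # "December"
short.months <- month.abb[1:2]    # "Jan" "Feb"
circle.area <- pi * 2^2           # area of a circle with radius 2

print(totalSum)
print(first.letters)
print(last.month)
```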

Operators

+, -, /, *
^ – exponent
%% – modulus
%/% – integer division
< , >, ==, != , <= , >=
!, &, &&, |, ||

x <- 1
y <- 2
x + y
[1] 3
x <- c(1,2,0,4)
y <- c(4,5)
x & y
[1] TRUE TRUE FALSE TRUE
x[1] && y[1]   # && expects single values; since R 4.3.0, x && y on longer vectors is an error
[1] TRUE

The operators are mostly what you would expect, with a few exceptions such as integer division (%/%) and modulus (%%). A user can also define custom infix operators, but their names must be surrounded by percent signs.

Variables are created with <-, which acts as both the creation and the assignment operator. The rest of the examples show ordinary addition, the creation of vectors with c(), and logical operations on vectors.
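To illustrate a user-defined infix operator, here is a minimal sketch. The operator name %+% is just an example chosen for this article, not something built into base R:

```r
# Define a custom infix operator; the name must be wrapped in % signs
# and quoted with backticks when it is defined.
`%+%` <- function(a, b) paste(a, b)

greeting <- "Hello" %+% "R"
print(greeting)        # [1] "Hello R"

# Built-in infix operators follow the same %...% pattern
print(7 %/% 2)         # integer division: 3
print(7 %% 2)          # modulus: 1
```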

Operators on Vectors

x <- c(1,2,3)
y <- c(4,5,6)
x * y
[1] 4 10 18
x <- c(1,2,3,4)
y <- c(4,5)
x ^ y
[1] 1 32 81 1024
x <- c(1,2,3,4)
y <- c(4,5,3)
x ^ y
[1] 1 32 27 256

Because R was designed for the statistical analysis of data, vectors and lists are used almost everywhere. For example, we can multiply a vector by a vector, and the operation is applied element by element. If the vectors have different lengths, the shorter one is repeated. R expects the length of the longer vector to be a whole multiple of the length of the shorter one. For example, let’s say you want to apply an operation between these two vectors:

x <- c(1,2,3,4)
y <- c(4,5,3)

Then, you would get a warning. In other words, when there is a mismatch in length (number of elements) of the operand vectors, the elements of the shorter one are recycled in a cyclic manner to match the length of the longer one. If the length of the longer vector is not a whole multiple of the length of the shorter one, R still computes the result but raises a warning: In x^y : longer object length is not a multiple of shorter object length.
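The recycling behaviour can be checked with a short sketch like the one below. Capturing the warning with tryCatch() is only one way to make it visible in a script; the variable names are illustrative:

```r
x <- c(1, 2, 3, 4)

# Length 4 is a multiple of length 2: y is recycled silently as 4, 5, 4, 5
y <- c(4, 5)
even.result <- x ^ y
print(even.result)     # 1 32 81 1024

# Length 4 is not a multiple of length 3: the result is still computed,
# but R signals a warning, captured here with tryCatch()
z <- c(4, 5, 3)
warning.text <- tryCatch(
  { x ^ z; "no warning" },
  warning = function(w) conditionMessage(w)
)
print(warning.text)

# suppressWarnings() returns the recycled result without the message
odd.result <- suppressWarnings(x ^ z)
print(odd.result)      # 1 32 27 256
```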

Assignment operators

The operators <- and = can be used, almost interchangeably, to assign to a variable in the same environment. The <<- operator is used for assigning to variables in the parent environments (more like global assignments). The rightward assignments, although available, are rarely used.
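A small sketch of the difference between <- and <<- follows; the counter variable and function names are invented for the example:

```r
counter <- 0

# <- inside a function creates a local variable; the outer counter is untouched
increment.local <- function() {
  counter <- counter + 1
  counter
}

# <<- looks for counter in the enclosing environments and modifies it there
increment.global <- function() {
  counter <<- counter + 1
  invisible(counter)
}

local.value <- increment.local()   # returns 1, outer counter is still 0
after.local <- counter
increment.global()
after.global <- counter            # outer counter is now 1

# Rightward assignment works but is rarely seen
42 -> answer

print(c(local.value, after.local, after.global, answer))
```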

In a function call, use only `=` to pass a named parameter.

Function calls with named parameters

```
spjoin <- function(a, b) print(paste(a, b))

spjoin(a = 1, b = 2)   # [1] "1 2"
spjoin(2, 3)           # [1] "2 3"
spjoin(b = 4, a = 2)   # [1] "2 4"

# a is matched by name first (a = 1); the remaining positional argument,
# the result of b <- 3, is then assigned to the parameter b
spjoin(b <- 3, a = 1)  # [1] "1 3"
b                      # [1] 3   (b <- 3 also created b in the calling context)

# no names at all: the result of b <- 4 goes to a, and 2 goes to b
spjoin(b <- 4, 2)      # [1] "4 2"
```

If else

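“If else” is almost like in C, but in R it is an expression: its value is the value of the last statement in the branch that was executed.

> x <- -5
> y <- if(x > 0) {5;4} else {6;7}
> y
[1] 7

Ifelse function

ifelse(test, yes, no)

ifelse() is the vectorized form: for each element of test it returns the corresponding element of yes when the test is TRUE and of no otherwise. In both cases the result can be assigned to a variable.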
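The difference between the if/else expression and the ifelse() function can be seen in a short sketch (the values and labels are illustrative):

```r
x <- -5

# if/else is an expression: its value is the last expression of the branch taken
y <- if (x > 0) { 5; 4 } else { 6; 7 }
print(y)               # [1] 7

# if() looks at a single condition; ifelse() is vectorised and
# picks a value per element
values <- c(-2, 0, 3)
signs <- ifelse(values > 0, "positive", "not positive")
print(signs)           # "not positive" "not positive" "positive"
```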

Looping

As for looping, “for” iterates over the elements of a vector: for (item in vector) { ... }. R also includes a “while” structure, similar to C, except that instead of “continue” it has “next”. It also includes an unconditional loop, “repeat”, which is left with “break”.
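A minimal sketch of the three loop forms, with made-up counters, might look like this:

```r
# for: iterate over the elements of a vector
total <- 0
for (i in 1:5) {
  if (i %% 2 == 0) next    # skip even numbers (like "continue" in C)
  total <- total + i       # adds 1, 3, 5
}

# while: repeat as long as the condition holds
n <- 0
while (n < 3) {
  n <- n + 1
}

# repeat: unconditional loop, must be left with break
steps <- 0
repeat {
  steps <- steps + 1
  if (steps >= 4) break
}

print(c(total, n, steps))  # 9 3 4
```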

Structures

There are three data structures we’ll go through:

Vector
List
DataFrame

Vector

> x <- c(1, 5, 4, 9, 0)
> typeof(x)
[1] "double"
> length(x)
[1] 5
> x <- c(1, 5.4, TRUE, "hello")
> x
[1] "1" "5.4" "TRUE" "hello"
> typeof(x)
[1] "character"

The vector is a basic data structure in R. It contains elements of the same type. The data types can be logical, integer, double, character, complex or raw.

A vector’s type can be checked with the typeof() function. Another important property of a vector is its length. This is the number of elements in the vector and can be checked with the function length().

Since a vector must have elements of the same type, the c() function will coerce elements to a common type if they are different.

Coercion goes from lower to higher types: from logical to integer to double to character.
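The coercion order can be verified step by step with typeof(). This is a small sketch; the variable names are only for illustration:

```r
# Each combination is coerced to the "highest" type present
logical.and.integer <- c(TRUE, 2L)        # integer
integer.and.double <- c(2L, 3.5)          # double
double.and.character <- c(3.5, "text")    # character

print(typeof(logical.and.integer))
print(typeof(integer.and.double))
print(typeof(double.and.character))

# Explicit coercion uses the as.*() family; TRUE becomes 1
print(as.integer(TRUE))
print(as.character(5.4))
```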

To access an element of a vector, the usual method is square brackets. You can also access elements of a vector by using another vector, by providing the vector of indexes which you want to get.
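The common ways of indexing a vector are sketched below, using an example vector invented for this purpose:

```r
x <- c(10, 20, 30, 40, 50)

second <- x[2]                    # single element; indexing starts at 1
picked <- x[c(1, 3, 5)]           # vector of positions
without.first <- x[-1]            # negative index drops that position
large <- x[x > 25]                # logical vector keeps the TRUE positions

print(second)          # 20
print(picked)          # 10 30 50
print(without.first)   # 20 30 40 50
print(large)           # 30 40 50
```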

Modify

> x <- c(-7, -6, -4,  0,  4,  6); x
[1] -7 -6 -4  0  4  6

> x[3] <- 8; x        # modify 3rd element
[1] -7 -6  8  0  4  6

> x[x<4] <- 3; x      # modify elements less than 4
[1] 3 3 8 3 4 6

> x <- x[1:3]; x      # truncate x to first 3 elements
[1] 3 3 8

We can delete a vector by simply assigning a NULL to it.

> x <- c(-7, -6, -4,  0,  4,  6); x

[1] -7 -6 -4  0  4  6

> x <- NULL

> x

NULL

> x[2]

NULL

To modify a vector, assign a new value to the element you select by index. You can also select a set of elements by a condition and assign to all of them at once. Reassigning the variable itself, as in x <- x[1:3], replaces the original vector.

List

A list can be created using the list() function:

> x <- list("a" = 45.1, "b" = TRUE, "c" = 4:9)

Its structure can be examined with the str() function:

> str(x)
List of 3
 $ a: num 45.1
 $ b: logi TRUE
 $ c: int [1:6] 4 5 6 7 8 9

Lists can be accessed in a similar fashion to vectors. Integer, logical or character vectors can be used for indexing.

x$a

[1] 45.1

x$b

[1] TRUE

> x[c('a','b')]

$a

[1] 45.1

$b

[1] TRUE

Tags are optional. We can create the same list without the tags, for example list(45.1, TRUE, 4:9). In such a scenario, numeric indices are used by default.
Indexing with [ as shown above will give us a sublist, not the content inside the component. To retrieve the content, we need to use [[.
However, this approach will allow us to access only a single component at a time.
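The difference between [ and [[ is easy to check with class(). The sketch below reuses the list from above:

```r
x <- list("a" = 45.1, "b" = TRUE, "c" = 4:9)

sub <- x["a"]          # single bracket: a list of length 1
value <- x[["a"]]      # double bracket: the content itself

print(class(sub))      # "list"
print(class(value))    # "numeric"

# The same list without tags is indexed by position
untagged <- list(45.1, TRUE, 4:9)
print(untagged[[3]])   # 4 5 6 7 8 9

# $ is shorthand for [[ with a name
print(identical(x$c, x[["c"]]))   # TRUE
```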

Access: x$name is the same as x[["name"]]

x["name"] returns a sublist holding the name and value pair

x[["name"]] <- "Keyt" # modify

x[["married"]] <- FALSE # add new

x[["age"]] <- NULL # remove

Dataframe

The data frame is a two dimensional data structure in R. It is a special case of a list which has each component of equal length.

Each component forms the column and contents of the component form the rows.

names(df); nrow(df); ncol(df) ⇔ length(df)

Create:

> x <- data.frame("SN" = 1:2, "Age" = c(21,15), "Name" = c("John","Dora"))
> str(x) # structure of x
'data.frame': 2 obs. of 3 variables:
 $ SN  : int 1 2
 $ Age : num 21 15
 $ Name: chr "John" "Dora"

Notice above that the third column, Name, is a character vector. Since R 4.0.0, the data.frame() function keeps character vectors as they are by default; earlier versions converted them into factors unless stringsAsFactors = FALSE was passed. Many data input functions of R, like read.table(), read.csv(), read.delim() and read.fwf(), also read data into a data frame.

We can use either [, [[ or $ operator to access columns of a data frame.

> x["Name"]
  Name
1 John
2 Dora

> x$Name
[1] "John" "Dora"

> x[["Name"]]
[1] "John" "Dora"

> x[[3]]
[1] "John" "Dora"
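Putting the data frame basics together, here is a minimal sketch. The third row ("Ann") is added only for illustration, and stringsAsFactors is set explicitly so the result does not depend on the R version:

```r
# stringsAsFactors is set explicitly so the result does not depend on the R version
df <- data.frame(
  SN = 1:3,
  Age = c(21, 15, 30),
  Name = c("John", "Dora", "Ann"),
  stringsAsFactors = FALSE
)

print(names(df))       # "SN" "Age" "Name"
print(nrow(df))        # 3
print(ncol(df))        # 3
print(length(df))      # 3: a data frame is a list of columns

# Rows can be filtered with a logical condition, columns picked by name
adults <- df[df$Age >= 18, c("Name", "Age")]
print(adults$Name)     # "John" "Ann"
```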

Functions

Syntax
Calling
Return
Documentation

func_name <- function (argument) {
  statement
}

Syntax of return()

return(expression)

Functions without return()

If there are no explicit returns from a function, the value of the last evaluated expression is returned automatically in R.
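Both styles of returning a value are shown in the sketch below; the function names sign.label and square are invented for the example:

```r
# Explicit return(): leaves the function immediately
sign.label <- function(x) {
  if (x < 0) {
    return("negative")
  }
  "non-negative"       # last evaluated expression is the return value
}

# No return() at all: the value of x * x is returned
square <- function(x) {
  x * x
}

print(sign.label(-3))  # "negative"
print(sign.label(3))   # "non-negative"
print(square(4))       # 16
```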

How to call a function?

Named Arguments

pow <- function (x,y) {
  x^y
}

> pow(8, 2)
[1] 64

In the above function call, the formal arguments are matched to the actual arguments in positional order, so x and y are assigned 8 and 2 respectively. Arguments can also be passed by name, as in pow(x = 8, y = 2) or pow(y = 2, x = 8); in that case the order does not matter.
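Positional, named and mixed calls can be compared side by side. In this sketch, pow.default is a hypothetical variant added only to show a default argument value:

```r
pow <- function(x, y) {
  x^y
}

positional <- pow(8, 2)          # x = 8, y = 2
named <- pow(y = 2, x = 8)       # names given, order does not matter
mixed <- pow(y = 2, 8)           # y by name, the remaining 8 goes to x

# Default values make an argument optional
pow.default <- function(x, y = 2) {
  x^y
}

print(c(positional, named, mixed, pow.default(8)))   # 64 64 64 64
```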

Call with vectors

When a vector is supplied for a parameter where a single value might be expected, the function body runs once and the arithmetic inside it is applied element by element, with the usual recycling:

pow <- function(x,y) {
  print("enter")
  x^y
}

> pow (x = c(4,5,6), y = 2)
[1] "enter"
[1] 16 25 36

> pow (x = c(4,5,6,3), y = c(2,3))
[1] "enter"
[1]  16 125  36  27

Function docs

Documentation for functions (Roxygen)

You add roxygen comments to your source file.
roxygen2::roxygenise() converts roxygen comments to .Rd files.
R converts .Rd files to human readable documentation.

#' Sum two numbers
#'
#' @param a A number
#' @param b A number
#' @return The sum of \code{a} and \code{b}
#' @examples
#' summ(1, 1)
#' summ(10, 1)
summ <- function(a, b) {
  a + b
}

testthat, usethis, pipe (from magrittr)

Install packages

install.packages("testthat")

install.packages("usethis")

Create test: usethis::use_test("test_show")
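As a rough idea of what such a test file could contain, here is a minimal sketch that tests the summ() function from the documentation example. The test description is made up, and the requireNamespace() guard is only there so the snippet can be sourced even when testthat is not installed:

```r
# The function under test (same as the roxygen example)
summ <- function(a, b) {
  a + b
}

# Run the test only when testthat is installed, so the sketch is safe to source
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("summ adds two numbers", {
    testthat::expect_equal(summ(1, 1), 2)
    testthat::expect_equal(summ(10, 1), 11)
  })
}
```

In a real package the function would live under R/ and only the test_that() call would go into the test file.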

Links:

Chapter 12 Testing

https://testthat.r-lib.org/

If you want to see me present the Python Operator and provide more context on all the above, check out the video of the full presentation at the top of this blog post! I dug into Python Operator in the last 5 minutes.

I hope you found this light introduction to the R programming language informative. R has an abundance of possible uses and an enormous number of libraries, many of which require deep knowledge of statistics. So I advise you to find some time to take some courses on the R language and dig deeper. R has great testing capabilities, offers interoperability with Python, can operate in the cloud, and works with most databases.