General Programming Crash Course

Optional Class: Uncovered Concepts of h2l2c

Author

Justin Landis

The goal of this document is to provide a brief overview of some topics that are pervasive in most programming languages. It aims to give a working understanding of logical operators, if statements, loops, and classes.

Important

For more formal and extensive details please see the Advanced R Book.

logical operators

We have talked about the base types in Lectures 1 and 2. As a reminder:

# character types wrapped in quotes ""
character_obj <- "hello world"
# integer type (numbers followed by L)
integer_obj <- 1L
# double type
numeric_obj <- 1
# logical type
logical_obj <- c(TRUE, FALSE)

In this section we will discuss how to use logical operators. This will lead into our next section on conditionals.

Logical operators combine or compare vectors and return a logical vector. These are convenient for filtering data, because you can use the returned logical vector to index another vector!
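A minimal sketch of that filtering idea, using a made-up vector `scores` (the `>=` comparison is described further below). The comparison returns one TRUE/FALSE per element, and that logical vector is then used inside `[ ]` to keep only the matching elements.

```r
# hypothetical data for illustration
scores <- c(45, 82, 67, 91, 58)

# the comparison returns one TRUE/FALSE per element
passed <- scores >= 60
passed
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

# indexing with the logical vector keeps only the TRUE positions
scores[passed]
#> [1] 82 67 91
```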

AND operator &

This operator can be used with logical and numeric vectors. Suppose that you had two vectors assigned to x and y. You would use & like so: x & y. The values are compared element by element.

This operator will return TRUE when both x AND y are TRUE.

Table: all possible combinations of TRUE and FALSE values.

x value   y value   Outcome of x & y
TRUE      TRUE      TRUE
TRUE      FALSE     FALSE
FALSE     TRUE      FALSE
FALSE     FALSE     FALSE
Note

For numeric vectors, any value equal to 0 is equivalent to FALSE and otherwise considered TRUE.
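A quick illustration of that rule with two made-up numeric vectors: only positions where both values are non-zero come back TRUE.

```r
# 0 behaves like FALSE; any non-zero number (even negative) behaves like TRUE
a <- c(0, 2, -1, 0)
b <- c(1, 1, 0, 0)
a & b
#> [1] FALSE  TRUE FALSE FALSE
```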

OR operator |

This operator can also be used with logical and numeric vectors. It is used the same way as &, for example x | y.

This operator will return TRUE when x OR y (or both) is TRUE.

Table: all possible combinations of TRUE and FALSE values.

x value   y value   Outcome of x | y
TRUE      TRUE      TRUE
TRUE      FALSE     TRUE
FALSE     TRUE      TRUE
FALSE     FALSE     FALSE
Note

As with &, you can also use | with numeric vectors, where 0 is treated as FALSE and any other value as TRUE.
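A small sketch of | in a filtering context, assuming a hypothetical vector of ages: flag anyone who is either under 18 or 65 and over. The last line shows the numeric case, where only 0 | 0 gives FALSE.

```r
# hypothetical ages; flag anyone who is a minor OR a senior
ages <- c(5, 17, 30, 70)
is_minor_or_senior <- ages < 18 | ages >= 65
is_minor_or_senior
#> [1]  TRUE  TRUE FALSE  TRUE

# numeric operands work too: only 0 | 0 gives FALSE
c(0, 3, 0) | c(0, 0, 2)
#> [1] FALSE  TRUE  TRUE
```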

Not operator !

This operator is a convenient way to negate a logical expression. Unlike & and |, which require two operands, ! is applied to a single vector.

x value   Outcome of !x
TRUE      FALSE
FALSE     TRUE

Sometimes using ! with a single variable is sufficient for our needs, but we can do a lot more by combining other logical operators together.

x value   y value   Outcome of !(x & y)   Outcome of !(x | y)
TRUE      TRUE      FALSE                 FALSE
TRUE      FALSE     TRUE                  FALSE
FALSE     TRUE      TRUE                  FALSE
FALSE     FALSE     TRUE                  TRUE
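Because these operators work element by element, the whole table above can be reproduced in one step. A minimal sketch: `x` and `y` hold the four combinations in the same row order as the table, and `truth_table` is just an illustrative name.

```r
# the four TRUE/FALSE combinations, in the same order as the table
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

# each column is computed for all four rows at once
truth_table <- data.frame(x, y, not_and = !(x & y), not_or = !(x | y))
truth_table
```

Printing `truth_table` should show the same four rows as the table.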

Equalities and Inequalities

Table: equalities/inequalities and how to use them.

Plain English                      Operator   Example   Object Types
Is x equal to y                    ==         x == y    All types
Is x not equal to y                !=         x != y    All types
Is x greater than y                >          x > y     numeric, logical
Is x greater than or equal to y    >=         x >= y    numeric, logical
Is x less than y                   <          x < y     numeric, logical
Is x less than or equal to y       <=         x <= y    numeric, logical
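A short sketch of these comparisons on a made-up vector. Each comparison returns a logical vector, so comparisons can be combined with & and | from the previous sections, for example to test whether values fall inside a range.

```r
x <- c(3, 8, 12)

x > 5              # element-wise comparison
#> [1] FALSE  TRUE  TRUE
x >= 5 & x <= 10   # combine two comparisons to test a range
#> [1] FALSE  TRUE FALSE
x != 8
#> [1]  TRUE FALSE  TRUE
```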

Doing more with ()

Just like in algebra, parentheses () control the order in which expressions are evaluated. Suppose we had three logical vectors, x, y, and z.

The expression x & (y | z) will evaluate y | z first, and then combine x with that result using &.

x       y       z       (y | z)   x & (y | z)
TRUE    TRUE    TRUE    TRUE      TRUE
FALSE   TRUE    TRUE    TRUE      FALSE
TRUE    FALSE   TRUE    TRUE      TRUE
FALSE   FALSE   TRUE    TRUE      FALSE
TRUE    TRUE    FALSE   TRUE      TRUE
FALSE   TRUE    FALSE   TRUE      FALSE
TRUE    FALSE   FALSE   FALSE     FALSE
FALSE   FALSE   FALSE   FALSE     FALSE
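Parentheses matter because, without them, R evaluates & before | (much like multiplication happens before addition). A quick check using the row x = FALSE, y = FALSE, z = TRUE from the table:

```r
x <- FALSE
y <- FALSE
z <- TRUE

# without parentheses, `&` is evaluated first: (x & y) | z
x & y | z
#> [1] TRUE

# with parentheses, `|` is evaluated first, matching the table
x & (y | z)
#> [1] FALSE
```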

Other Notable R functions

  • all(x) returns TRUE if all values of x are TRUE

  • any(x) returns TRUE if at least one value of x is TRUE

  • which(x) returns the indices of x that are TRUE

  • x %in% y returns a logical vector the same length as x, with TRUE for each element of x that is found in y and FALSE otherwise
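A small sketch of these four functions on made-up vectors:

```r
x <- c(TRUE, FALSE, TRUE)
all(x)    # not every value is TRUE
#> [1] FALSE
any(x)    # at least one value is TRUE
#> [1] TRUE
which(x)  # positions of the TRUE values
#> [1] 1 3

# hypothetical vectors: which fruits appear in the second set?
fruits <- c("apple", "kiwi", "plum")
fruits %in% c("apple", "banana", "plum")
#> [1]  TRUE FALSE  TRUE
```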

if statements

Sometimes we want to execute code only when a certain condition is met. For example, we may compute a result with logical operators and then run some code only if that result is TRUE.

# imagine `condition` was some result
# from a prior analysis
condition <- TRUE

if (condition) {
  print("Condition was TRUE!")
}
[1] "Condition was TRUE!"
if (!condition) {
  print("Condition was not TRUE")
}

Sometimes we know that we want to run code if condition is TRUE and another chunk if condition is FALSE. In this case, we would use an if else statement.

# imagine `condition` was some result
# from a prior analysis
condition <- FALSE

if (condition) {
  print("Condition was TRUE!")
} else {
  print("Condition was not TRUE")
}
[1] "Condition was not TRUE"

We can also test multiple conditions with if else if statements!

condition1 <- FALSE
condition2 <- TRUE

if (condition1) {
  print("Condition 1 was TRUE!")
} else if (condition2) {
  print("Condition 2 was TRUE")
} else {
  print("No condition was TRUE")
}
[1] "Condition 2 was TRUE"
Note

You can chain as many else if statements together as you need, but long chains can lead to hard-to-read code.
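One detail worth knowing: if expects a condition of length one, and in R 4.2.0 and later a longer logical vector is an error. A common pattern is to collapse the vector first with all() or any() from the logical operators section. A sketch, where `check_values()` is a hypothetical helper written only for this example:

```r
# hypothetical helper: summarise a numeric vector with if / else if / else
check_values <- function(values) {
  if (all(values > 0)) {          # all() reduces the vector to one TRUE/FALSE
    "all positive"
  } else if (any(values > 0)) {   # any() does the same
    "some positive"
  } else {
    "none positive"
  }
}

check_values(c(1, 5, 2))
#> [1] "all positive"
check_values(c(-1, 5, -2))
#> [1] "some positive"
check_values(c(-1, -5))
#> [1] "none positive"
```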

Loops

for loop

The for loop has special syntax and works with any vector-like object. It runs its body once for each element of the input; if the input has length zero, the body never runs.

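# `for` syntax:  for (<iterator> in <input vector>) { <body> }
#   element  : iterator placeholder; takes each value of x_vector in turn
#   in       : keyword that is part of the loop syntax
#   x_vector : the input vector to loop over
x_vector <- c(10, 20, 30)
for (element in x_vector) {
  # Do something with `element`
  print(element)
}
[1] 10
[1] 20
[1] 30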
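A typical use of a for loop is accumulating a result. A minimal sketch with a hypothetical `prices` vector; the second loop uses seq_along() to get positions instead of values, which is safer than 1:length(prices) because 1:length() yields c(1, 0) when the vector is empty.

```r
prices <- c(2.5, 4, 1.5)   # hypothetical values
total <- 0
for (price in prices) {
  total <- total + price   # add each element in turn
}
total
#> [1] 8

# loop over positions when the index itself is needed
for (i in seq_along(prices)) {
  cat("Item", i, "costs", prices[i], "\n")
}
```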

while loop

The while loop repeatedly evaluates some code as long as a condition remains TRUE.

# `while` syntax:  while (<condition>) { <body> }
#   condition : an expression that evaluates to a single TRUE or FALSE;
#               it is checked again before every iteration
while (condition) {
  # Do something here
  # ...
}
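A small concrete sketch: keep doubling a number while it is below 100, counting the passes. The condition is checked before each pass, so the loop stops as soon as n reaches 128.

```r
n <- 1
steps <- 0
while (n < 100) {
  n <- n * 2          # change something the condition depends on
  steps <- steps + 1
}
n
#> [1] 128
steps
#> [1] 7
```

If nothing inside the body ever makes the condition FALSE, a while loop will run forever.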

repeat loop

repeat is special syntax that evaluates the following expression over and over with no built-in stopping condition. This is different from for loops, which stop after the last element of the input vector is reached, and from while loops, which stop as soon as the condition is no longer TRUE.

To prevent the next bit of code from running forever, I have included the control flow keyword break within an if statement.

count <- 0L
start <- Sys.time()
repeat {
  count <- count + 1L
  time_diff <- Sys.time() - start
  # if we reach a time diff greater 
  # than a second, break out of loop
  if (time_diff > 1) break
}
cat("`repeat` ran ", count, " times in a second")
`repeat` ran  38581  times in a second
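The count above depends on how fast the machine is. As a deterministic sketch, the loop below applies the Collatz rule (halve an even number, otherwise multiply by 3 and add 1) until the value reaches 1; the problem is chosen only as an illustration of a loop whose number of iterations is not known in advance.

```r
n <- 6
steps <- 0
repeat {
  if (n == 1) break   # the only way out of the loop
  if (n %% 2 == 0) {
    n <- n / 2
  } else {
    n <- 3 * n + 1
  }
  steps <- steps + 1
}
steps
#> [1] 8
```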

Control flow

Technically, everything we have talked about is control flow (see ?Control). But more specifically, we have two keywords that help control the flow of execution within for, while, and repeat loops.

next

The next keyword will skip the rest of the code block and start the next iteration of the loop.

for (i in 1:10) {
  if (i != 5L) next
  print("We found the 5")
}
[1] "We found the 5"
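A slightly more practical sketch of next: skip odd numbers and add up only the even ones.

```r
even_total <- 0
for (i in 1:10) {
  if (i %% 2 != 0) next   # skip the rest of the body for odd numbers
  even_total <- even_total + i
}
even_total
#> [1] 30
```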

break

The break keyword will stop the loop entirely: the rest of the code block is skipped and no further iterations run.

for (i in 1:10) {
  print(i)
  if (i == 5L) break
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
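A common use of break is stopping a search once the answer is found. A minimal sketch with a hypothetical `values` vector, finding the position of the first value greater than 10. Note that break only exits the innermost loop it appears in.

```r
values <- c(3, 7, 12, 5, 20)   # hypothetical data
first_big <- NA
for (i in seq_along(values)) {
  if (values[i] > 10) {
    first_big <- i   # remember the position
    break            # no need to look any further
  }
}
first_big
#> [1] 3
```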

Classes

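We have talked about object “types” before and have loosely related them to object “classes”. Technically, these are two distinct properties of every R object: typeof() reports how the object is stored internally, while class() reports the class(es) used to choose methods.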

Every object’s class can be modified by the user at any time.

obj <- 1:10
class(obj)
[1] "integer"
typeof(obj)
[1] "integer"
class(obj) <- "foo"
obj
 [1]  1  2  3  4  5  6  7  8  9 10
attr(,"class")
[1] "foo"
typeof(obj)
[1] "integer"

Changing the class of your objects is the first step in creating custom behaviors for your code through R’s various class systems. Custom behaviors for a custom class are created by defining methods for generic functions.

Beyond custom methods, a custom class may also carry additional data in the object’s attributes(). Giving a class a consistent structure makes the code that analyzes and uses those objects more predictable.
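A minimal sketch of storing extra data as an attribute, assuming a hypothetical class "measurement" that records its units alongside the numbers. The underlying type stays "double"; only the attributes change.

```r
# hypothetical class "measurement" that stores its units as an attribute
m <- c(12.1, 15.3, 9.8)
attr(m, "units") <- "cm"
class(m) <- "measurement"

attr(m, "units")
#> [1] "cm"
typeof(m)
#> [1] "double"
```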

R has (at the moment) about 5 class systems to choose from. We do not have time to discuss all of them in detail. Instead I will discuss the most pervasive class system, S3. For the other class systems, I will briefly talk about where they are used and some important key features, but I will not show how to construct a new class or define methods for them.

S3

S3 is the simplest class system in R, and much of base R is built from this design style.

As we saw before, we can create a new class by assigning a character vector to class().

obj <- list()
class(obj) <- "foo"
obj
list()
attr(,"class")
[1] "foo"
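In practice, it is common to wrap the class() assignment in a small constructor function so every object of the class is built the same way. A minimal sketch, where `new_foo()` and its `value` field are hypothetical names chosen for this example:

```r
# hypothetical constructor for the "foo" class
new_foo <- function(value = numeric()) {
  stopifnot(is.numeric(value))                 # basic check on the input
  structure(list(value = value), class = "foo")
}

x <- new_foo(c(1, 2, 3))
class(x)
#> [1] "foo"
x$value
#> [1] 1 2 3
```

structure() sets the attributes (here, the class) in the same call that creates the list.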

generic functions

S3 defines methods for generic functions through a specific naming convention (<generic function>.<class name>). We have been interacting with many generic functions in R already. One of them is the print() generic function.

print
function (x, ...) 
UseMethod("print")
<bytecode: 0x557b6ad08fc8>
<environment: namespace:base>

We can see that the only expression in the body of print() is UseMethod("print"). This is what causes R to dispatch: it looks for a method named print.<class name> that matches the class of x. We can make our own print method for our custom class "foo".

print.foo <- function(x, ...) {
  cat("<My custom class `foo`>\n")
  # print methods conventionally return their input invisibly
  invisible(x)
}
obj
<My custom class `foo`>

We can also create our own generic function!

my_generic <- function(x) {
  UseMethod("my_generic")
}
my_generic.foo <- function(x) {
  print("my_generic: foo method")
}
my_generic(obj)
[1] "my_generic: foo method"

If we try to call my_generic() on some other object that isn’t of class "foo", we will get an error:

my_generic(list())
Error in UseMethod("my_generic"): no applicable method for 'my_generic' applied to an object of class "list"

If we want my_generic() to work with any object, we need to create a default method.

my_generic.default <- function(x) {
  print("my_generic: default method")
}
my_generic(list())
[1] "my_generic: default method"
A note on method dispatch

S3 is very simple. While this dispatch mechanism does allow for some customization, it is limited in that it dispatches on only one argument (by default, the first argument of your generic).

Other class systems like S4 and S7 support multiple dispatch, in which the method called can depend on the classes of more than one argument.

Inheritance

Inheritance is the concept that objects can “inherit” behaviors that have already been defined for other classes. In S3, R checks each class in the class attribute in order and dispatches to the first one that has a method available. Let’s define a new class "bar" that inherits from "foo".

obj2 <- obj
class(obj2) <- c("bar", "foo")
obj2
<My custom class `foo`>

All we did here was copy our original object and add a new class. With this, we can see that the print method we defined for "foo" still works for our "bar" object because it inherits from the "foo" class.

Similarly we can define a method on my_generic for the "bar" class.

my_generic.bar <- function(x) {
  print("my_generic: bar method")
}
my_generic(obj)
[1] "my_generic: foo method"
my_generic(obj2)
[1] "my_generic: bar method"

As seen above, my_generic() dispatches to the bar method for obj2 because "bar" comes before "foo" in its class attribute.
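A child method does not have to replace the parent's behavior completely. Calling NextMethod() inside a method passes control to the method for the next class in the class attribute. A self-contained sketch, where the generic `describe()` and the classes "parent" and "child" are hypothetical names used only here:

```r
describe <- function(x, ...) UseMethod("describe")
describe.parent <- function(x, ...) "parent behavior"
describe.child <- function(x, ...) {
  # add child-specific output, then hand off to the next class in line
  c("child behavior", NextMethod())
}

obj3 <- structure(list(), class = c("child", "parent"))
describe(obj3)
# returns c("child behavior", "parent behavior")
```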

S4

S4 is a more complex framework. Class definitions are more formal, and this framework supports multiple dispatch as well as multiple inheritance. You will see this class system frequently in R packages hosted on Bioconductor, so you may end up working with objects built from this framework.

We do not have the time to discuss the details of this class system. Instead, please refer to the Advanced R book documentation on S4 classes.

R6

R6 is a less widely used framework. Here, methods do not belong to generics but live on the object itself.

R6 objects cannot be “copied” with simple assignment the way you would copy a normal R object. R6 is built on environments, and environments use reference semantics. This loosely means that any time we assign an environment to a new variable and then change something about the new variable, we are ultimately affecting both variables!

env <- new.env()
env$value <- "hello"
# try to "copy"
env2 <- env
# change `value`
env2$value <- "world"
# check original
env$value
[1] "world"
env2$value
[1] "world"
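For contrast, most other R objects, such as lists, follow copy-on-modify semantics: the same steps as the environment example leave the original untouched.

```r
lst <- list(value = "hello")
# "copy" the list, just like the environment above
lst2 <- lst
# modifying lst2 gives it its own copy
lst2$value <- "world"
lst$value
#> [1] "hello"
lst2$value
#> [1] "world"
```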

This can be advantageous if you want only one copy of your data to exist throughout your code. If this is undesirable, don’t worry: R6 objects include an object$clone() method by default that allows you to make a copy.

For more specific details, please refer to the Advanced R book documentation on R6 classes.

S7

S7 is the newest class system, still in development, with the aim of replacing S3 and S4. It is not yet widely used by the community, but it offers a refreshing and consistent design pattern. For those who are interested in exploring, check out its GitHub repository or documentation.