# character types wrapped in quotes ""
character_obj <- "hello world"
# integer type (numbers followed by L)
integer_obj <- 1L
# double type
numeric_obj <- 1
# logical type
logical_obj <- c(TRUE, FALSE)
General Programming Crash Course
Optional Class: Uncovered Concepts of h2l2c
The goal of this document is to provide a brief overview of some topics that are pervasive in most programming languages. It aims to give an introductory understanding of logical operators, `if` statements, loops, and classes.
For more formal and extensive details please see the Advanced R Book.
logical operators
We have talked about the base types in Lectures 1 and 2. As a reminder:
In this section we will discuss how to use logical operators. This will lead into our next section on conditionals.
Logical operators compare two vectors element by element and return a logical vector. They are convenient for filtering data, since you can use the returned logical vector to index another vector!
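As a minimal sketch of the filtering idea described above, the example below builds a logical vector and uses it to index another vector. The `ages` data and the variable names are made up for illustration, and the `>=` comparison is covered later in this document.

```r
# Hypothetical data, invented for this example
ages <- c(12, 35, 67, 18, 42)

# The comparison returns a logical vector, one value per element
is_adult <- ages >= 18

# Indexing with a logical vector keeps only the elements marked TRUE
adults <- ages[is_adult]
print(adults)
```

Only the positions where `is_adult` is `TRUE` survive the indexing step.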
AND operator &
This operator can be used between logical and numeric vector types. Suppose that you had two vectors assigned to `x` and `y`. You would use `&` like so: `x & y`.
This operator will return `TRUE` when both `x` AND `y` are `TRUE`.
x value | y value | Outcome of `x & y` |
---|---|---|
TRUE | TRUE | TRUE |
TRUE | FALSE | FALSE |
FALSE | TRUE | FALSE |
FALSE | FALSE | FALSE |
For numeric vectors, any value equal to 0 is equivalent to `FALSE` and otherwise considered `TRUE`.
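The truth table and the numeric rule above can be checked with a short sketch. The vector names here are illustrative only.

```r
# All four combinations from the truth table
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)
and_result <- x & y
print(and_result)

# Numeric values: 0 behaves like FALSE, anything else like TRUE
nums <- c(0, 1, -2.5, 0)
num_result <- nums & TRUE
print(num_result)
```

Note that `&` works element by element, and the single `TRUE` is recycled across `nums`.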
OR operator |
This operator can also be used between logical and numeric vector types. It is used the same way as `&`, for example `x | y`.
This operator will return `TRUE` when either `x` OR `y` is `TRUE`.
x value | y value | Outcome of `x \| y` |
---|---|---|
TRUE | TRUE | TRUE |
TRUE | FALSE | TRUE |
FALSE | TRUE | TRUE |
FALSE | FALSE | FALSE |
As before, you can also use `|` with numeric types.
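A brief sketch of `|` in practice, assuming some made-up temperature readings (`temps` is hypothetical). It flags values that fall outside a range, then shows the numeric case.

```r
# Hypothetical readings, invented for this example
temps <- c(-5, 12, 35, 20)

# TRUE when a reading is below 0 OR above 30
extreme <- temps < 0 | temps > 30
print(temps[extreme])

# Numeric operands: a result is FALSE only when both values are 0
num_or <- c(0, 3) | c(0, 0)
print(num_or)
```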
Not operator !
This operator is a convenient way to negate a logical expression. Unlike before, where `&` and `|` require two variables to be used, this operator is used on a single variable.
x value | Outcome of `!x` |
---|---|
TRUE | FALSE |
FALSE | TRUE |
Sometimes using `!` with a single variable is sufficient for our needs, but we can do a lot more by combining it with other logical operators.
x value | y value | Outcome of `!(x & y)` | Outcome of `!(x \| y)` |
---|---|---|---|
TRUE | TRUE | FALSE | FALSE |
TRUE | FALSE | TRUE | FALSE |
FALSE | TRUE | TRUE | FALSE |
FALSE | FALSE | TRUE | TRUE |
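The negated combinations in the table above can be reproduced directly. The sketch below also checks a closely related fact (De Morgan's laws): negating an AND is the same as OR-ing the negations, and vice versa. Variable names are illustrative.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

not_and <- !(x & y)
not_or <- !(x | y)
print(not_and)
print(not_or)

# `!` is applied before `&` and `|`, so no extra parentheses are needed here
same_as_or <- identical(not_and, !x | !y)
same_as_and <- identical(not_or, !x & !y)
print(c(same_as_or, same_as_and))
```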
Equalities and Inequalities
Plain English | Operator | Example | Object Types |
---|---|---|---|
Is x equal to y | `==` | `x == y` | All types |
Is x not equal to y | `!=` | `x != y` | All types |
Is x greater than y | `>` | `x > y` | numeric, logical |
Is x greater than or equal to y | `>=` | `x >= y` | numeric, logical |
Is x less than y | `<` | `x < y` | numeric, logical |
Is x less than or equal to y | `<=` | `x <= y` | numeric, logical |
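A few of the operators from the table, applied to two small example vectors (the values are arbitrary). Each comparison is made element by element and returns a logical vector.

```r
x <- c(1, 5, 10)
y <- c(1, 7, 3)

eq <- x == y   # equal
ne <- x != y   # not equal
gt <- x > y    # greater than
le <- x <= y   # less than or equal to
print(rbind(eq, ne, gt, le))

# Logical values compare as numbers: TRUE is 1 and FALSE is 0
lgl_gt <- TRUE > FALSE
print(lgl_gt)
```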
Doing more with ()
Just like in algebra, parentheses `()` are used to prioritize execution of expressions. Suppose we had three logical vectors, `x`, `y`, and `z`.
The expression `x & (y | z)` will evaluate `y | z` before comparing `x` with its result.
x | y | z | `(y \| z)` | `x & (y \| z)` |
---|---|---|---|---|
TRUE | TRUE | TRUE | TRUE | TRUE |
FALSE | TRUE | TRUE | TRUE | FALSE |
TRUE | FALSE | TRUE | TRUE | TRUE |
FALSE | FALSE | TRUE | TRUE | FALSE |
TRUE | TRUE | FALSE | TRUE | TRUE |
FALSE | TRUE | FALSE | TRUE | FALSE |
TRUE | FALSE | FALSE | FALSE | FALSE |
FALSE | FALSE | FALSE | FALSE | FALSE |
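To see why the parentheses matter, the sketch below evaluates the same three values with and without them. In R, `&` is evaluated before `|` when no parentheses are present, so the two expressions can disagree. The values chosen correspond to the second row of the table above.

```r
x <- FALSE
y <- TRUE
z <- TRUE

# Parentheses force `y | z` to be evaluated first
with_parens <- x & (y | z)

# Without parentheses this is read as (x & y) | z
without_parens <- x & y | z

print(c(with_parens, without_parens))
```

When in doubt, adding parentheses makes the intended order explicit for both R and the reader.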
Other Notable R functions
`all(x)` returns `TRUE` if all values of `x` are `TRUE`
`any(x)` returns `TRUE` if at least one value of `x` is `TRUE`
`which(x)` returns the indices of `x` that are `TRUE`
`x %in% y` returns a logical vector the length of `x`, with `TRUE` for each element of `x` that exists in `y` and `FALSE` otherwise
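The four functions listed above, applied to small example inputs. The `fruits` vector and the other names are invented for illustration.

```r
x <- c(TRUE, FALSE, TRUE)

all_true <- all(x)    # are all values TRUE?
any_true <- any(x)    # is at least one value TRUE?
true_idx <- which(x)  # positions of the TRUE values
print(true_idx)

# Hypothetical character data
fruits <- c("apple", "kiwi", "pear")
found <- fruits %in% c("pear", "apple")
print(found)
```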
if statements
Sometimes we want to execute code only when a certain condition is met. Typically the condition is the result of some logical operation, and the code inside the braces runs only when it is `TRUE`.
# imagine `condition` was some result
# from a prior analysis
condition <- TRUE
if (condition) {
  print("Condition was TRUE!")
}
[1] "Condition was TRUE!"
if (!condition) {
  print("Condition was not TRUE")
}
Sometimes we know that we want to run code if `condition` is `TRUE` and another chunk if `condition` is `FALSE`. In this case, we would use an `if else` statement.
# imagine `condition` was some result
# from a prior analysis
condition <- FALSE
if (condition) {
  print("Condition was TRUE!")
} else {
  print("Condition was not TRUE")
}
[1] "Condition was not TRUE"
We can also test multiple conditions with `if else if` statements!
condition1 <- FALSE
condition2 <- TRUE
if (condition1) {
  print("Condition 1 was TRUE!")
} else if (condition2) {
  print("Condition 2 was TRUE")
} else {
  print("No condition was TRUE")
}
[1] "Condition 2 was TRUE"
You can chain multiple `if else if` statements together, but doing so may lead to hard-to-read code.
Loops
for loop
The `for` loop has special syntax and iterates over the elements of any vector-like object. If the object has length zero, the body of the loop never runs.
#. `for` syntax
#. ^    iterator placeholder
#. |    ^       syntax of the loop
#. |    |       ^  input vector
#. |    |       |  ^
#. |    |       |  |
   for (element in x_vector) {
     # Do something with `element`
     # ...
   }
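A concrete version of the template above, as a minimal sketch. The contents of `x_vector` and the names `total` and `squares` are made up for illustration. The second loop iterates over positions with `seq_along()`, which is a common pattern when results need to be stored by index.

```r
x_vector <- c(2, 4, 6)

# Iterate over the elements themselves
total <- 0
for (element in x_vector) {
  total <- total + element
}
print(total)

# Iterate over positions to fill a pre-allocated result vector
squares <- numeric(length(x_vector))
for (i in seq_along(x_vector)) {
  squares[i] <- x_vector[i]^2
}
print(squares)
```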
while loop
The `while` loop repeatedly evaluates some code for as long as some condition is `TRUE`.
#. `while` syntax
#. ^      some expression that evaluates to
#. |      `TRUE` or `FALSE`
#. |      ^
#. |      |
   while (condition) {
     # Do something here
     # ...
   }
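A small worked example of the template above: halve a number until it is no longer greater than 1, counting the steps. The starting value and variable names are arbitrary choices for illustration.

```r
n <- 100
steps <- 0L

# The condition is re-checked before every iteration
while (n > 1) {
  n <- n / 2
  steps <- steps + 1L
}

print(steps)
print(n)
```

The loop ends as soon as the condition evaluates to `FALSE`, so the body must eventually change something the condition depends on.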
repeat loop
`repeat` is special syntax that repeats the next expression with no built-in stopping condition. This differs from a `for` loop, which stops after the last element of the input vector is reached, and from a `while` loop, whose body is evaluated only while the condition is `TRUE`.
To prevent the next bit of code from running forever, I have included the control flow keyword `break` within an `if` statement.
count <- 0L
start <- Sys.time()
repeat {
  count <- count + 1L
  time_diff <- Sys.time() - start
  # if we reach a time diff greater
  # than a second, break out of loop
  if (time_diff > 1) break
}
# cat() already separates its arguments with a space
cat("`repeat` ran", count, "times in a second\n")
`repeat` ran 38581 times in a second
Control flow
Technically, everything we have talked about is control flow (see `?Control`). But more specifically, we have two keywords that help control the flow of execution within `for`, `while`, and `repeat` loops: `next` and `break`.
next
The `next` keyword will skip the rest of the code block and start the next iteration of the loop.
for (i in 1:10) {
  if (i != 5L) next
  print("We found the 5")
}
[1] "We found the 5"
break
The `break` keyword stops the loop entirely, so no further iterations are run.
for (i in 1:10) {
  print(i)
  if (i == 5L) break
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Classes
We have talked about object “types” before and have loosely related them to object “classes”. These are technically distinct attributes of every R object.
Every object’s class can be modified by the user at any time.
obj <- 1:10
class(obj)
[1] "integer"
typeof(obj)
[1] "integer"
class(obj) <- "foo"
obj
[1] 1 2 3 4 5 6 7 8 9 10
attr(,"class")
[1] "foo"
typeof(obj)
[1] "integer"
Changing the class of your objects is the first step in creating custom behaviors for your code through R’s various class systems. Custom behaviors for a custom class are created by defining methods on generic functions.
Beyond custom methods, a custom class may also carry additional data in the object’s `attributes()`. A class with consistent structure will lead to more structured analysis and execution.
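As a sketch of how a class can carry extra data in its attributes, the example below uses base R's `structure()` to set a class and an additional attribute in one call. The object name `measurement` and the `units` attribute are hypothetical, chosen only for illustration.

```r
# Attach a class and an extra attribute to a plain double vector
measurement <- structure(c(1.2, 3.4), class = "foo", units = "cm")

# The extra data travels with the object
unit_label <- attr(measurement, "units")
print(unit_label)
print(names(attributes(measurement)))

# The class changed, but the underlying type did not
print(class(measurement))
print(typeof(measurement))
```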
R has (at the moment) about 5 class systems to choose from. We do not have time to discuss all of them in detail. Instead I will discuss the most pervasive class system, S3. For the other class systems, I will briefly talk about where they are used and some important key features, but I will not show how to construct a new class or define methods for them.
S3
S3 is the simplest class system in R, and much of base R is built from this design style.
As we saw before, we can create a new class by passing in a character vector to `class()`.
obj <- list()
class(obj) <- "foo"
obj
list()
attr(,"class")
[1] "foo"
generic functions
S3 defines methods for generic functions through a specific naming convention (`<generic function>.<class name>`). We have been interacting with many generic functions in R already. One of them is the `print()` generic function.
print
function (x, ...)
UseMethod("print")
<bytecode: 0x557b6ad08fc8>
<environment: namespace:base>
We can see that the only expression in the body of `print()` is `UseMethod("print")`. This is what causes R to dispatch to the `print` method that matches the class of the object. We can make our own print method for our custom class `"foo"`.
print.foo <- function(x) {
  cat("<My custom class `foo`>\n")
}
obj
<My custom class `foo`>
We can also create our own generic function!
my_generic <- function(x) {
  UseMethod("my_generic")
}
my_generic.foo <- function(x) {
  print("my_generic: foo method")
}
my_generic(obj)
[1] "my_generic: foo method"
If we try to call `my_generic()` on some other object that isn’t `"foo"`, we will get an error.
my_generic(list())
Error in UseMethod("my_generic"): no applicable method for 'my_generic' applied to an object of class "list"
If we want `my_generic()` to work with any object, we need to create a default method.
my_generic.default <- function(x) {
  print("my_generic: default method")
}
my_generic(list())
[1] "my_generic: default method"
S3 is very simple. While this dispatch mechanism does provide some customization, it is limited in that it only dispatches on one argument (the first argument of your function).
Other class systems like S4 and S7 support multiple dispatch generics, in which the method called depends on 1 or more arguments.
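The single-dispatch limitation of S3 described above can be seen in a short sketch. The generic `describe()` and its methods are hypothetical names invented for this example. Only the class of the first argument influences which method runs.

```r
# Hypothetical two-argument generic; S3 only looks at `x`
describe <- function(x, y) {
  UseMethod("describe")
}
describe.foo <- function(x, y) {
  "foo method"
}
describe.default <- function(x, y) {
  "default method"
}

a <- structure(list(), class = "foo")
b <- list()

first <- describe(a, b)   # `x` has class "foo"
second <- describe(b, a)  # `x` is a plain list; the class of `y` is ignored
print(c(first, second))
```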
Inheritance
Inheritance is the concept that objects can “inherit” behaviors that have already been defined by other classes. In S3, methods will dispatch on the first class that has a method available. Let's define a new class `"bar"` that inherits from `"foo"`.
obj2 <- obj
class(obj2) <- c("bar", "foo")
obj2
<My custom class `foo`>
All we did here was copy our original object, and add a new class. With this, we can see that the print method we defined for `"foo"` still works for our `"bar"` class because it inherits the `"foo"` class.
Similarly we can define a method on `my_generic` for the `"bar"` class.
my_generic.bar <- function(x) {
  print("my_generic: bar method")
}
my_generic(obj)
[1] "my_generic: foo method"
my_generic(obj2)
[1] "my_generic: bar method"
As seen above, `my_generic` dispatches to the bar method on `obj2` since in its class attribute, `"bar"` comes before `"foo"`.
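Two base R tools are often useful alongside the inheritance behavior shown above: `inherits()` checks whether a class appears anywhere in the class vector, and `NextMethod()` lets a method hand off to the method of the next class in that vector. The sketch below is self-contained, and the generic `greet()` is a hypothetical name used only for illustration.

```r
# A fresh object with the same class vector as in the text
obj2 <- structure(list(), class = c("bar", "foo"))

is_foo <- inherits(obj2, "foo")
is_baz <- inherits(obj2, "baz")
print(c(is_foo, is_baz))

# Hypothetical generic with a method for each class
greet <- function(x) {
  UseMethod("greet")
}
greet.foo <- function(x) {
  "hello from foo"
}
greet.bar <- function(x) {
  # Call the method of the next class in the class vector ("foo")
  inherited <- NextMethod()
  paste("bar, then", inherited)
}

greeting <- greet(obj2)
print(greeting)
```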
S4
S4 is a more complex framework. Class definitions are more formal and this framework supports multiple dispatch as well as multiple inheritance. You will see this class type more frequently with R packages hosted on Bioconductor and may deal with objects built from this framework.
We do not have the time to discuss the details of this class system. Instead, please refer to the Advanced R book documentation on S4 classes.
R6
R6 is a less widely used framework. Here, methods do not belong to generics, but live on the object itself.
R6 objects cannot be “copied” as you would a normal R object. R6 is built on environments, and environments use reference semantics. This loosely means that any time we assign an environment to a new variable and then change something about the new variable, we are ultimately affecting both variables!
env <- new.env()
env$value <- "hello"
# try to "copy"
env2 <- env
# change `value`
env2$value <- "world"
# check original
env$value
[1] "world"
env2$value
[1] "world"
This can be advantageous if you want only one copy of your data to exist through your code. If this is undesirable, don’t worry, R6 by default includes an `object$clone()` method that allows you to make a copy.
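A minimal sketch of the difference between assigning an R6 object and cloning it. It assumes the R6 package is installed, so the code is wrapped in a check and does nothing otherwise. The `Counter` class and its `add()` method are hypothetical, invented for this example.

```r
if (requireNamespace("R6", quietly = TRUE)) {
  # Hypothetical class with one field and one method
  Counter <- R6::R6Class("Counter",
    public = list(
      value = 0,
      add = function(n = 1) {
        self$value <- self$value + n
        invisible(self)
      }
    )
  )

  original <- Counter$new()
  alias <- original         # same underlying object
  copy <- original$clone()  # independent copy

  original$add(5)

  # `alias` sees the change, `copy` does not
  print(c(original$value, alias$value, copy$value))
}
```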
For more specific details please refer to the Advanced R book documentation on R6 classes.
S7
S7 is the newest class system in development with the aim of replacing S3 and S4. It is not yet widely used by the community, but is a refreshing and consistent design pattern. For those who are interested in exploring, check out the GitHub repository or the S7 documentation.