vector()
x <- vector("numeric", length = 10)
c()
x <- c("a", "b", "c")
x <- c(TRUE, FALSE)
as.numeric(x)
as.logical(x)
as.character(x)
attributes()
print()
m <- matrix(nrow = 2, ncol = 3)
m <- matrix(1:6, nrow = 2, ncol = 3)
m <- 1:10
dim(m) <- c(2, 5)
cbind(x, y)
rbind(x, y)
list
Lists are a special type of vector that can contain elements of different classes.
x <- list(1, "a", TRUE, 1 + 4i)
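As a small sketch of the point above, the following compares a list with an ordinary vector built from similar values. The names `classes` and `y` are only illustrative and do not come from the original notes.

```r
# A list keeps each element's own class.
x <- list(1, "a", TRUE, 1 + 4i)

# sapply() visits every element and reports its class.
classes <- sapply(x, class)
print(classes)

# By contrast, c() coerces mixed values to a single common type.
y <- c(1, "a", TRUE)
print(class(y))
```

The list preserves four different classes, while `c()` falls back to a character vector.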
Factor
Factors are used to represent categorical data. Factors can be unordered or ordered. One can think of a factor as an integer vector where each integer has a label.
lm()
glm()
x <- factor(c("yes", "yes", "no", "yes", "no"))
table(x)
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))
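The "integer vector with labels" view of a factor can be inspected directly. This is a minimal sketch; the name `codes` is an illustrative choice.

```r
# Set the level order explicitly so that "yes" is the first level.
x <- factor(c("yes", "yes", "no", "yes", "no"), levels = c("yes", "no"))

# The labels attached to the integer codes.
print(levels(x))

# Strip the class to see the underlying integer codes.
codes <- as.integer(x)
print(codes)

# Count how many observations fall in each level.
print(table(x))
```

Without the `levels` argument, the levels would be ordered alphabetically, so "no" would come first.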
Missing Values
Missing values are denoted by NA; NaN denotes the result of an undefined mathematical operation. A NaN value is also NA, but the converse is not true.
x <- c(1, 2, NaN, NA, 4)
is.na(x)
is.nan(x)
x <- c(1, 2, NA, 4, NA, 5)
bad <- is.na(x)
x[!bad]
#### What if there are multiple things and you want to take the subset with no missing values?
x <- c(1, 2, NA, 4, NA, 5)
y <- c("a", "b", NA, "d", NA, "f")
good <- complete.cases(x, y)
x[good]
y[good]
df[1:6,]
good <- complete.cases(df)
df[good,][1:6,]
x <- df[df$Month == 5, ]
summary(x$Ozone)
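The notes do not show where `df` comes from, so the sketch below uses a small made-up data frame, `toy`, with the same two column names, to show how `complete.cases` filters the rows of a data frame. All values are invented for illustration.

```r
# Hypothetical data frame with some missing Ozone readings.
toy <- data.frame(
  Ozone = c(41, NA, 12, 18, NA, 28),
  Month = c(5, 5, 5, 6, 6, 6)
)

# TRUE for rows with no missing value in any column.
good <- complete.cases(toy)
print(good)

# Keep only the complete rows.
clean <- toy[good, ]

# Subset to one month, then summarise a single column.
may <- clean[clean$Month == 5, ]
print(summary(may$Ozone))
```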
read.table() #for reading tabular data
read.csv()
write.table()
readLines() #for reading lines of a text file
writeLines()
source() #for reading in R code files
dump()
dump(c("x", "y"), file = "data.R")
rm(x, y)
source("data.R")
dget() #for reading in R code files
dput()
load() #for reading in saved workspaces
save()
unserialize() #for reading single R objects in binary form
serialize()
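A short round trip through `dput` and `dget` shows how the read and write functions pair up. This sketch writes to a temporary file; the object `y` and the name `path` are illustrative assumptions.

```r
# An object to write out in its deparsed (R code) form.
y <- list(a = 1:3, b = "text")

# Use a temporary file so nothing in the working directory is touched.
path <- tempfile(fileext = ".R")

# dput() writes R code that can recreate the object.
dput(y, file = path)

# dget() reads that code back and rebuilds the object.
new_y <- dget(path)
print(identical(y, new_y))

# Remove the temporary file.
unlink(path)
```

`dump` and `source` work in a similar way, except that `dump` takes object names and can store several objects in one file.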
Data are read in using connection interfaces.
file #opens a connection to a file
gzfile #opens a connection to a file compressed with gzip
bzfile #opens a connection to a file compressed with bzip2
url #opens a connection to a webpage
str(file)
con <- file("foo.txt", "r")
data <- read.csv(con)
close(con)
con <- gzfile("words.gz")
x <- readLines(con, 10)
con <- url("http://www.jhsph.edu", "r")
x <- readLines(con)
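The connection calls above depend on files and a website that may not be available, so here is a self-contained sketch that creates its own temporary files first. The file contents and the names `txt_path`, `gz_path`, `first_two`, and `from_gz` are made up for illustration.

```r
# Create a small text file to read from.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("first", "second", "third"), txt_path)

# Open a read connection, read only the first two lines, then close it.
con <- file(txt_path, "r")
first_two <- readLines(con, 2)
close(con)
print(first_two)

# The same pattern works for gzip-compressed files via gzfile().
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "w")
writeLines(c("alpha", "beta"), con)
close(con)

con <- gzfile(gz_path, "r")
from_gz <- readLines(con)
close(con)
print(from_gz)

# Clean up the temporary files.
unlink(c(txt_path, gz_path))
```

Reading through a connection is useful when only part of a file is needed, because `readLines` can stop after a given number of lines.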
args(paste)
function (..., sep = " ", collapse = NULL)
I have a data frame with 1,500,000 rows and 120 columns, all of which are numeric data. Roughly, how much memory is required to store this data frame?
1,500,000 × 120 × 8 bytes/numeric = 1,440,000,000 bytes ≈ 1.34 GiB (about 1.44 GB in decimal units)
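The same arithmetic can be checked in R. This is only a rough estimate of the raw data size; the variable names are illustrative, and a real data frame carries a small amount of extra overhead.

```r
rows <- 1500000
cols <- 120
bytes_per_numeric <- 8   # a double-precision number takes 8 bytes

# Total bytes needed for the raw numeric data.
total_bytes <- rows * cols * bytes_per_numeric
print(total_bytes)

# Convert to binary units: 2^20 bytes per MiB, 2^30 bytes per GiB.
total_mib <- total_bytes / 2^20
total_gib <- total_bytes / 2^30
print(round(total_gib, 2))
```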
3 Control structures
if, else: testing a condition
for: execute a loop a fixed number of times
while: execute a loop while a condition is true
repeat: execute an infinite loop
break: break the execution of a loop
next: skip an iteration of a loop
return: exit a function
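The constructs listed above can be combined in a few lines. The following is a minimal sketch; the variable and function names (`total`, `n`, `first_negative`) are invented for illustration.

```r
# for, if, next, break: sum the odd numbers from 1 to 7.
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 7) break        # stop the loop once i exceeds 7
  total <- total + i
}
print(total)

# while: loop as long as the condition holds.
count <- 0
while (count < 3) {
  count <- count + 1
}
print(count)

# repeat: loops forever unless break is called.
n <- 1
repeat {
  n <- n * 2
  if (n > 100) break
}
print(n)

# return: exit a function early with a value.
first_negative <- function(v) {
  for (val in v) {
    if (val < 0) return(val)
  }
  NULL
}
print(first_negative(c(3, 5, -2, -8)))
```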
4 Looping on the Command Line
lapply: Loop over a list and evaluate a function on each element
sapply: Same as lapply but try to simplify the result
apply: Apply a function over the margins of an array
tapply: Apply a function over subsets of a vector
mapply: Multivariate version of lapply
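Since `tapply` and `mapply` are less self-explanatory than the others, here is a brief sketch of both. The data and the names `values`, `groups`, `group_means`, and `repeated` are made up for illustration.

```r
# tapply: apply a function to subsets of a vector defined by a grouping.
values <- c(1, 2, 3, 4, 5, 6)
groups <- c("a", "a", "b", "b", "b", "c")
group_means <- tapply(values, groups, mean)
print(group_means)

# mapply: call a function with the i-th elements of several arguments.
# This is equivalent to list(rep(1, 3), rep(2, 2), rep(3, 1)).
repeated <- mapply(rep, 1:3, 3:1)
print(repeated)
```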
x <- list(a = 1:5, b = rnorm(10))
lapply(x, mean)
x <- 1:4
lapply(x, runif, min = 0, max = 10)
##### lapply and friends make heavy use of anonymous functions.
x <- list(a = matrix(1:4, 2, 2), b = matrix(1:6, 3, 2))
lapply(x, function(elt) elt[,1])
##### sapply will try to simplify the result of lapply if possible.
##### If the result is a list where every element is length 1, then a vector is returned
##### If the result is a list where every element is a vector of the same length (> 1), a matrix is returned.
##### If it can’t figure things out, a list is returned.
x <- list(a = 1:4, b = rnorm(10), c = rnorm(20, 1), d = rnorm(100, 5))
lapply(x, mean)
sapply(x, mean)
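The three simplification rules for `sapply` can each be triggered with a small input. This sketch uses deterministic data rather than random numbers so the results are easy to check; the names `as_vector`, `as_matrix`, and `as_list` are illustrative.

```r
x <- list(a = 1:4, b = 5:8)

# Every result has length 1, so a named vector is returned.
as_vector <- sapply(x, mean)
print(as_vector)

# Every result has the same length (2), so a matrix is returned,
# with one column per list element.
as_matrix <- sapply(x, range)
print(as_matrix)

# Results of different lengths cannot be simplified, so a list is returned.
as_list <- sapply(list(a = 1:2, b = 1:3), function(v) v * 2)
print(as_list)
```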
apply
apply is used to evaluate a function (often an anonymous one) over the margins of an array.
It is most often used to apply a function to the rows or columns of a matrix.
It can be used with general arrays, e.g. taking the average of an array of matrices
It is not really faster than writing a loop, but it works in one line!
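A short sketch of the uses described above follows: margin 1 means rows and margin 2 means columns. The object names are illustrative only.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)

# Margin 1: apply the function to each row.
row_sums <- apply(m, 1, sum)
print(row_sums)

# Margin 2: apply the function to each column.
col_means <- apply(m, 2, mean)
print(col_means)

# A 2 x 2 x 2 array can be seen as two stacked 2 x 2 matrices.
# Keeping margins 1 and 2 averages over the third dimension.
arr <- array(1:8, dim = c(2, 2, 2))
avg_matrix <- apply(arr, c(1, 2), mean)
print(avg_matrix)
```

For plain row and column sums or means, the dedicated functions `rowSums`, `colSums`, `rowMeans`, and `colMeans` do the same job and are much faster.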
dnorm(x, mean = 0, sd = 1, log = FALSE)
##### pnorm(q) = Φ(q); qnorm(p) = Φ⁻¹(p), the inverse function of Φ
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
x <- rnorm(10, 20, 10)
summary(x)
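The relationship between the density, distribution, and quantile functions can be checked numerically. This is a minimal sketch for the standard normal distribution; the variable names are illustrative.

```r
# dnorm: density at 0 equals 1 / sqrt(2 * pi).
d0 <- dnorm(0)
print(d0)

# pnorm: cumulative probability P(X <= q); half the mass lies below 0.
p0 <- pnorm(0)
print(p0)

# qnorm is the inverse of pnorm, so the round trip recovers the input.
p <- pnorm(1.96)
q <- qnorm(p)
print(c(p, q))

# lower.tail = FALSE gives the upper-tail probability P(X > q).
upper <- pnorm(1.96, lower.tail = FALSE)
print(p + upper)
```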
### Setting the random number seed with set.seed ensures reproducibility
### Always set the random number seed when conducting a simulation!
set.seed(1)
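To see what reproducibility means in practice, the sketch below draws from `rnorm` twice with the same seed. The names `first`, `second`, and `third` are illustrative.

```r
# Same seed, same sequence of random numbers.
set.seed(1)
first <- rnorm(5)

set.seed(1)
second <- rnorm(5)

print(identical(first, second))

# Without resetting the seed, the generator simply continues,
# so the next draw differs.
third <- rnorm(5)
print(identical(second, third))
```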