For general computations routinely performed in the scientific setting, the statistical programming language R has become a powerful option. [@R-base] This programming environment (a) is available on all major platforms (Mac, PC, unix, linux), (b) has many easy-to-use (and many not-so-easy-to-use) built-in functions for plotting data, generating curves, manipulating matrices, etc., and best of all (c) is freely available for download over the internet.

The Console Window and Simple Calculations

After installing R (and, perhaps, Rstudio) on your system and initializing,

34 + 17
[1] 51
sqrt(49)
[1] 7
besselI(0.25,1)
[1] 0.1259791
pi
[1] 3.141593

Calculations with Variables and Functions

Define variables and perform calculations…

E0 = 938; W = 8000
sqrt( 1 - (E0/(E0+W))^2 )
[1] 0.994478
# or, define a variable:
beta = sqrt( 1 - (E0/(E0+W))^2 )
# just typing the name of the variable displays its value:
beta
[1] 0.994478

Create a function as follows:

beta = function(x){ sqrt( 1 - (E0/(E0+x))^2 ) }
beta(8000)
[1] 0.994478

Make a list of numbers and perform a calculation on all of them

x = c(0,2,5,9,23)
beta(x)
[1] 0.0000000 0.0651981 0.1028413 0.1375393 0.2174718

Round off our answers to 3 digits

round(beta(x),3)
[1] 0.000 0.065 0.103 0.138 0.217

Saving/Resurrecting the Workspace

Variables and their values can be saved for future use:

save(list=ls(all.names=TRUE),file="vars.Rdata")

Once saved, the workspace can be cleared, using the command:

rm(list=ls())
ls()    # list all the variables in the work environment
character(0)

The data just saved can be re-loaded into the working environment:

load("vars.Rdata")
ls()    # list all the variables in the work environment
[1] "beta" "E0"   "W"    "x"   

Making Simple Plots

x = c(0,1,2,3,4,5,6,7,8,9)
y = beta(x)
plot(x,y)

Plot a curve of a function

curve(beta,0,10)

Plot the points, and add a curve to the same plot; also, add some new labels to the axes

plot(x,y,
   xlab="proton kinetic energy [MeV]", ylab="v/c", main="Velocity vs. Kinetic Energy")
curve(beta, 0, 10, add=TRUE, col="blue")

Creating Random Distributions

# Uniform Distribution of Npts points between 0 and 1
Npts = 500
rnum = runif(Npts,0,1)
plot(rnum)

hist(rnum)

# Normal (Gaussian) Distribution of Npts points, with mean=0, sigma=sig
sig=2
set.seed(231)  # set the random number seed to get same results each run
rnum = rnorm(Npts,0,sig)
plot(rnum)

plot(rnum,pch=".") # use "dots" instead of the default "circles"

hist(rnum)
curve(Npts/(sqrt(2*pi)*sig)*exp(-x^2/(2*sig^2)), add=TRUE, col="blue")

Decisions and Loops

imax=10
x = array(c(1:imax))
x
 [1]  1  2  3  4  5  6  7  8  9 10
i = 1
while (i<imax){
  x[i+1] = x[i] + 12
  i = i+1
}
x
 [1]   1  13  25  37  49  61  73  85  97 109

Can use ifelse(test,yes,no) to perform tests and make decisions

imax=10
x = runif(imax,-1,1)
round(x,2)
 [1] -0.17  0.29 -0.56  0.82 -0.32 -0.25 -0.25  0.24 -0.70  0.50
i = 1
while (i<imax+1){
  x[i] = ifelse(x[i]<0, abs(x[i]), x[i])
  i = i+1
}
round(x,2)
 [1] 0.17 0.29 0.56 0.82 0.32 0.25 0.25 0.24 0.70 0.50

R as a Vector Language

As the reader may have surmized by now, the inner workings of the R language are tailored to the use of vectors. Each variable is thought of as an ordered list of possible values and simple functions operate on theses values separately, by default. So, if x is a vector, then sin(x) will create a vector where the sine function is applied to each value of x:

x = c(0,1,2,3,4,5,6,7,8,9)
y = sin(x)
y
 [1]  0.0000000  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243
 [7] -0.2794155  0.6569866  0.9893582  0.4121185
x*y
 [1]  0.000000  0.841471  1.818595  0.423360 -3.027210 -4.794621 -1.676493
 [8]  4.598906  7.914866  3.709066

Note that x*y takes each element of x and multiplies it by the corresponding element of y. (If the two vectors do not have the same length, an error is produced.) Of course vector functions such as inner products and cross products can be performed as well, though special syntax and symbols must be used to perform the desired calculation. We will see examples of this below.

On the other hand, this syntax can be very useful at times. Consider the series \[ f(x) = \sum_{n=1}^{N} \frac{c_n}{n}\sin(nx) \] where the \(c_n\) are provided and the sum needs to be produced for a specific value of \(x\). One often would write a chunk of code that loops over a variable n, say, and sums up the terms then breaks out of the loop after N iterations, etc. This operation can now be written in single line of code

N = 32
x0 = 2
c_n = runif(N,-1,1) # just for illustration
fcn = sum( c_n/c(1:N)*sin(c(1:N)*x0) ) # <--- one line!
fcn
[1] 0.3568438

Matrix Manipulations

A matrix is essentially an ordered list; so, create a list of numbers and tell R where to break the list into rows or columns.

M = matrix( c(1,2,3,4), ncol=2, byrow=TRUE)
M
     [,1] [,2]
[1,]    1    2
[2,]    3    4

Nice to type the input this way, to visually emphasize the arrangement:

A = matrix( c(1, 2,
              3, 4  ), ncol=2, byrow=TRUE)
A
     [,1] [,2]
[1,]    1    2
[2,]    3    4
B = matrix( c(5, 6,
              7, 8  ), ncol=2, byrow=TRUE)
B
     [,1] [,2]
[1,]    5    6
[2,]    7    8

Multiply matrices – but be careful! If you write A * B, this will take each element of A (say a11) and multiply by the corresponding element in B (say b11). What we want is a real matrix multiply – performed using the %*% operator:

C = A * B
C
     [,1] [,2]
[1,]    5   12
[2,]   21   32

WRONG!!

Here’s the right way (in R ):

C = A %*% B
C
     [,1] [,2]
[1,]   19   22
[2,]   43   50

Take the transpose using t(), determinant using det()

t(A)
     [,1] [,2]
[1,]    1    3
[2,]    2    4
det(A)
[1] -2

To find the inverse of a matrix, we use the solve() function:

solve(A)
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5

Check:

  A      %*% solve(A)
     [,1]         [,2]
[1,]    1 1.110223e-16
[2,]    0 1.000000e+00
solve(A) %*%   A
     [,1]         [,2]
[1,]    1 4.440892e-16
[2,]    0 1.000000e+00

The function solve(A,B) solves the equation \(A\vec{x} = B\) for \(\vec{x}\). If \(B\) is missing in the solve() command, it is taken to be the identity and the inverse of \(A\) is returned.

Multiply a vector by a matrix to get a new vector; still use the operator %*%

x = c(2,1)
x
[1] 2 1
t(t(x))  # trick, to see it in a "usual" (up/down) vector format
     [,1]
[1,]    2
[2,]    1
A
     [,1] [,2]
[1,]    1    2
[2,]    3    4
A %*% x
     [,1]
[1,]    4
[2,]   10

Reading/Writing Data

Write a set of numbers to a file, and read them back in…

x = runif(500,3,9)
y = rnorm(500,2,6)
write.table(cbind(x,y), file="data.txt", row.names=FALSE)
data = read.table("data.txt", header = TRUE)
head(data)  # shows the first few lines...
xx = data[,1]  # first column
yy = data[,2]  # second column
plot(xx,yy)
abline(h=c(mean(yy),mean(yy)+c(-1,1)*sd(yy)),
   lty=c(1,2,2),lwd=2,col=c("red","blue","blue"))
abline(v=mean(xx),lty=1,lwd=2,col="green")

Note: The abline() function adds either horizontal, vertical, or general (\(y = a + bx\)) lines to an existing plot. The argument “lty” depicts the style of the line (solid, dotted, etc.).

Dataframes

Often times we wish to create or maybe read in a table of data, perhaps from measurements that were taken or from the results of a computer simulation, and take a quick look at the numbers, make simple plots, or pursue other manipulations. Above we read in data from a file and then identified new variables for each of the two columns of numbers. Actually, the read.table command will create a “data frame” in R which makes many new options available for our use. Below we generate some data and write it to a file; then read it back in, treating it as a data frame.

# Create some sample data and write to a file, with a header line
x = rnorm(50,0,2)
y = rnorm(50,0,5)
p = (y/5*2-5*x)/10
z = 2-(x+runif(50,-2,2))/2
filename = "data2.txt"
write.table(cbind(x,y,z,p), file=filename, row.names = FALSE)

Next, we read the file back in, creating a dataframe named “data2”.

filename = "data2.txt"
data2 = read.table(file=filename,header=TRUE,sep="")
# What have we created?
attributes(data2)
$names
[1] "x" "y" "z" "p"

$class
[1] "data.frame"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
[49] 49 50
head(data2)  # look at first few lines of the dataframe

Note that the names of the columns in our dataframe are taken from the header (first line) of the text file we read in. To access the data for one of the variables (columns) from our dataframe data we use the dollar sign:

# number of entries for "x"
length(data2$x) 
[1] 50
 # value of 23rd entry for "x"
data2$x[23]    
[1] 0.9182747
# perform calculation on all values of "x"
print(sqrt(abs(data2$x)))  
 [1] 0.6084816 0.8739309 1.4903011 1.9915934 1.4277139 1.6439511 1.2164426
 [8] 1.4701206 1.7993016 0.5850132 1.8616033 0.4317011 1.0924151 1.6296967
[15] 0.2154265 1.6189016 1.4255884 1.4386191 0.3784294 0.9138128 0.5334032
[22] 1.3422886 0.9582665 1.9949085 1.4001613 0.4762613 1.3154949 1.2312756
[29] 1.6939402 1.3210473 0.5674360 0.5617848 0.9786093 1.3660924 0.6112916
[36] 1.1206362 1.3710219 1.8047939 0.9928415 1.9122926 1.2847397 0.8249718
[43] 1.3083360 1.6665517 1.3239432 0.1611211 1.7620906 1.9686507 1.6027884
[50] 1.9826357
# create plots of each varialble vs. every other variable
pairs(data2)    

# plot one variable vs. another variable
plot(data2$x,data2$p)  

You can create a subset of your data set:

# find subset where abs(x)<0.8 AND abs(y)<1:
data.sub = subset(data2, subset=( abs(data2$x)<4 & abs(data2$y)<1) )
head(data.sub)
plot(data.sub$x,data.sub$p)  # plot one variable vs. another variable

#  note: a & b     "a is true AND b is true" (intersection)
#        a | b     "a is true OR  b is true"    (union)
#         !a            "a is NOT true"        (negation)
#      other logical options:  a< b  a<=b  a==b  a>=b   a!=b   

Next, we take our plot and fit the data to a linear model, finding the intercept and slope, and adding the corresponding line to the plot

fitparams = lm(p ~ x, data.sub)
fitparams

Call:
lm(formula = p ~ x, data = data.sub)

Coefficients:
(Intercept)            x  
     0.0067      -0.4989  
plot(data.sub$x,data.sub$p)
abline(fitparams,lty=2,col="red")

You can also add columns to your data frame using the already existing data if required. An example:

data2$a = sqrt(data2$x^2 + data2$y^2)
head(data2)

And Much, Much More

text manipulations; complex plotting control; optimizations; complex statistical analyses and modeling; …

visit https://www.r-project.org/ for more information.

Documenting Results

blah blah blah blah blah

Notebooks, rmarkdown and bookdown

blah blah blah blah blah

discuss options, how this web page was created using rmarkdown within an R notebook, now to download the full .Rmd document (at top of the web page), and how bookdown has been used to create full web sites for the courses

---
title: "Essential **R** for basic scientific computations"
output: 
   html_notebook:
      theme: united
      toc: yes
---


```{r options, echo=FALSE, eval=TRUE}
library(knitr)
library(ggplot2)
knitr::opts_chunk$set(echo=TRUE, eval=TRUE, cache = TRUE,
  results = 'html', fig.show='asis', warning=FALSE, message=FALSE)
```

For general computations routinely performed in the scientific setting, the statistical programming language **R** has become a powerful option. [@R-base]   This programming environment (a) is available on all major platforms (Mac, PC, unix, linux), (b) has many easy-to-use (and many not-so-easy-to-use) built-in functions for plotting data, generating curves, manipulating matrices, *etc.*, and best of all (c) is freely available for download over the internet. 

## The Console Window and Simple Calculations

After installing **R** (and, perhaps, **Rstudio**) on your system and initializing,

- in the Console window, type commands and hit *Return*:
```{r startout01, echo=TRUE}
34 + 17
sqrt(49)
besselI(0.25,1)
pi
```

- in the Source window, select lines that you wish to execute and (a) hit *Run*, or (b) hit *Command-Return* (MAC) or *Control-Return* (PC)
- in the console window, type `source("/dir/filename")` (quotation marks must be included) and hit *Return* to execute **R** commands stored in a text file located in directory and file */dir/filename* 


## Calculations with Variables and Functions

Define variables and perform calculations...

```{r calcs01}
E0 = 938; W = 8000
sqrt( 1 - (E0/(E0+W))^2 )
# or, define a variable:
beta = sqrt( 1 - (E0/(E0+W))^2 )
# just typing the name of the variable displays its value:
beta
```

Create a function as follows:

```{r calcs02}
beta = function(x){ sqrt( 1 - (E0/(E0+x))^2 ) }
beta(8000)
```

Make a list of numbers and perform a calculation on all of them
```{r calcs03}
x = c(0,2,5,9,23)
beta(x)
```

Round off our answers to 3 digits
```{r calcs04}
round(beta(x),3)
```

### Saving/Resurrecting the Workspace

Variables and their values can be saved for future use:

```{r}
save(list=ls(all.names=TRUE),file="vars.Rdata")
```

Once saved, the workspace can be cleared, using the command:

```{r}
rm(list=ls())
ls()    # list all the variables in the work environment
```

The data just saved can be re-loaded into the working environment:

```{r}
load("vars.Rdata")
ls()    # list all the variables in the work environment
```


## Making Simple Plots
```{r plots00}
x = c(0,1,2,3,4,5,6,7,8,9)
y = beta(x)
plot(x,y)
```

Plot a curve of a function
```{r plots01}
curve(beta,0,10)
```

Plot the points, and add a curve to the same plot; also, add some new labels to the axes
```{r plots02}
plot(x,y,
   xlab="proton kinetic energy [MeV]", ylab="v/c", main="Velocity vs. Kinetic Energy")
curve(beta, 0, 10, add=TRUE, col="blue")
```

## Creating Random Distributions
```{r randoms}
# Uniform Distribution of Npts points between 0 and 1
Npts = 500
rnum = runif(Npts,0,1)
plot(rnum)
hist(rnum)
# Normal (Gaussian) Distribution of Npts points, with mean=0, sigma=sig
sig=2
set.seed(231)  # set the random number seed to get same results each run
rnum = rnorm(Npts,0,sig)
plot(rnum)
plot(rnum,pch=".") # use "dots" instead of the default "circles"
hist(rnum)
curve(Npts/(sqrt(2*pi)*sig)*exp(-x^2/(2*sig^2)), add=TRUE, col="blue")
```

## Decisions and Loops

```{r loops01}
imax=10
x = array(c(1:imax))
x
i = 1
while (i<imax){
  x[i+1] = x[i] + 12
  i = i+1
}
x
```

Can use `ifelse(test,yes,no)` to perform tests and make decisions
```{r loops02}
imax=10
x = runif(imax,-1,1)
round(x,2)
i = 1
while (i<imax+1){
  x[i] = ifelse(x[i]<0, abs(x[i]), x[i])
  i = i+1
}
round(x,2)
```


## **R** as a Vector Language

As the reader may have surmized by now, the inner workings of the **R** language are tailored to the use of vectors.  Each variable is thought of as an ordered list of possible values and simple functions operate on theses values separately, by default.  So, if `x` is a vector, then `sin(x)` will create a vector where the sine function is applied to each value of `x`:

```{r}
x = c(0,1,2,3,4,5,6,7,8,9)
y = sin(x)
y
x*y
```

Note that `x*y` takes each element of `x` and multiplies it by the corresponding element of `y`.  (If the two vectors do not have the same length, an error is produced.)  Of course vector functions such as inner products and cross products can be performed as well, though special syntax and symbols must be used to perform the desired calculation.  We will see examples of this below.

On the other hand, this syntax can be very useful at times.  Consider the series
$$
f(x) = \sum_{n=1}^{N} \frac{c_n}{n}\sin(nx)
$$
where the $c_n$ are provided and the sum needs to be produced for a specific value of $x$.  One often would write a chunk of code that loops over a variable `n`, say, and sums up the terms then breaks out of the loop after `N` iterations, etc.  This operation can now be written in single line of code
```{r}
N = 32
x0 = 2
c_n = runif(N,-1,1) # just for illustration
fcn = sum( c_n/c(1:N)*sin(c(1:N)*x0) ) # <--- one line!
fcn
```



## Matrix Manipulations

A matrix is essentially an ordered list; so, create a list of numbers and tell **R** where to break the list into rows or columns.
```{r matrices01}
M = matrix( c(1,2,3,4), ncol=2, byrow=TRUE)
M
```
Nice to type the input *this* way, to visually emphasize the arrangement:
```{r matrices02}
A = matrix( c(1, 2,
              3, 4  ), ncol=2, byrow=TRUE)
A
```
```{r matrices03}
B = matrix( c(5, 6,
              7, 8  ), ncol=2, byrow=TRUE)
B
```

Multiply matrices -- but be careful!  If you write `A * B`, this will take each element of A (say a11) and multiply by the corresponding element in B (say b11).  What _we_ want is a real matrix multiply -- performed using the `%*%` operator: 
```{r matrices04a}
C = A * B
C
```

**WRONG!!**

Here's the right way (in **R** ):
```{r matrices04b}
C = A %*% B
C
```

Take the *transpose* using `t()`, *determinant* using `det()`
```{r matrices05}
t(A)
det(A)
```
To find the inverse of a matrix, we use the `solve()` function:
```{r matrices05b}
solve(A)
```

Check:
```{r matrices05c}
  A      %*% solve(A)
solve(A) %*%   A
```

The function `solve(A,B)` solves the equation $A\vec{x} = B$ for $\vec{x}$.  If $B$ is missing in the `solve()` command, it is taken to be the identity and the inverse of $A$ is returned.

Multiply a vector by a matrix to get a new vector; still use the operator `%*%`...
```{r matrices06}
x = c(2,1)
x
t(t(x))  # trick, to see it in a "usual" (up/down) vector format
A
A %*% x
```

## Reading/Writing Data
Write a set of numbers to a file, and read them back in...
```{r inout}
x = runif(500,3,9)
y = rnorm(500,2,6)
write.table(cbind(x,y), file="data.txt", row.names=FALSE)

data = read.table("data.txt", header = TRUE)
head(data)  # shows the first few lines...

xx = data[,1]  # first column
yy = data[,2]  # second column

plot(xx,yy)
abline(h=c(mean(yy),mean(yy)+c(-1,1)*sd(yy)),
   lty=c(1,2,2),lwd=2,col=c("red","blue","blue"))
abline(v=mean(xx),lty=1,lwd=2,col="green")
```

Note:  The `abline()` function adds either horizontal, vertical, or general ($y = a + bx$) lines to an existing plot.  The argument "lty" depicts the style of the line (solid, dotted, etc.).

## Dataframes

Often times we wish to create or maybe read in a table of data, perhaps from measurements that were taken or from the results of a computer simulation, and take a quick look at the numbers, make simple plots, or pursue other manipulations.  Above we read in data from a file and then identified new variables for each of the two columns of numbers.  Actually, the `read.table` command will create a "data frame" in **R** which makes many new options available for our use.  Below we generate some data and write it to a file; then read it back in, treating it as a data frame.

```{r makedata}
# Create some sample data and write to a file, with a header line
x = rnorm(50,0,2)
y = rnorm(50,0,5)
p = (y/5*2-5*x)/10
z = 2-(x+runif(50,-2,2))/2

filename = "data2.txt"
write.table(cbind(x,y,z,p), file=filename, row.names = FALSE)
```

Next, we read the file back in, creating a dataframe named "data2".
```{r makeframe}
filename = "data2.txt"
data2 = read.table(file=filename,header=TRUE,sep="")
# What have we created?
attributes(data2)
head(data2)  # look at first few lines of the dataframe
```

Note that the names of the columns in our dataframe are taken from the header (first line) of the text file we read in.  To access the data for one of the variables (columns) from our dataframe `data` we use the dollar sign:

```{r x}
# number of entries for "x"
length(data2$x) 
 # value of 23rd entry for "x"
data2$x[23]    
# perform calculation on all values of "x"
print(sqrt(abs(data2$x)))  
# create plots of each varialble vs. every other variable
pairs(data2)    
# plot one variable vs. another variable
plot(data2$x,data2$p)  
```

You can create a `subset` of your data set:

```{r attachframe}
# find subset where abs(x)<0.8 AND abs(y)<1:
data.sub = subset(data2, subset=( abs(data2$x)<4 & abs(data2$y)<1) )
head(data.sub)
plot(data.sub$x,data.sub$p)  # plot one variable vs. another variable
#  note: a & b     "a is true AND b is true" (intersection)
#        a | b     "a is true OR  b is true"    (union)
#         !a            "a is NOT true"        (negation)
#      other logical options:  a< b  a<=b  a==b  a>=b   a!=b   
```

Next, we take our plot and fit the data to a linear model, finding the intercept and slope, and adding the corresponding line to the plot

```{r linfit}
fitparams = lm(p ~ x, data.sub)
fitparams

plot(data.sub$x,data.sub$p)
abline(fitparams,lty=2,col="red")
```

You can also add columns to your data frame using the already existing data if required.  An example:

```{r add2frame}
data2$a = sqrt(data2$x^2 + data2$y^2)
head(data2)
```

### And Much, Much More

text manipulations; complex plotting control; optimizations; complex statistical analyses and modeling; ...

visit  https://www.r-project.org/ for more information.


## Documenting Results

blah blah blah blah blah 

### Notebooks, `rmarkdown` and `bookdown`

blah blah blah blah blah 

discuss options, how this web page was created using rmarkdown within an R notebook, now to download the full .Rmd document (at top of the web page), and how bookdown has been used to create full web sites for the courses



