$24
Packages needed: knitr, xtable, pander
Task 1: Evaluation of a standard random number generator
Code set-up
The code chunk below presents R code for the RANDU random number generator presented in the video lectures. The code chunk generates 100 pseudo-random numbers from using the RANDU generator and presents a histogram, empirical cdf, and Kolmogorov-Smirnov test to evaluate the performance of the algorithm. Try running it.
# generate random variates according to the RANDU generator
# X_{n+1} = 65539 X_n mod 2^31
n<-100 # number of variates
x<-1 # seed
for(i in 1:n){
x<-c(x, (65539*x[i])%%(2^31))
}
x<-x/2^31
x<-x[2:n]
#par(mfrow=c(2,1))
# histogram of the n RANDU variates
hist(x, main="", xlab="RANDU variates", ylab="Frequencies")
# empirical distribution function of the n RANDU variates and
# uniform(0, 1) probability plot
plot.ecdf(x, verticals= TRUE, do.p = FALSE, main="", ylab="Probability")
abline(0,1)
# Kolmogorov-Smirnov test of RANDU variates against U(0, 1) distribution
ks.test(x,"punif",0,1)
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.12773, p-value = 0.07229
## alternative hypothesis: two-sided
The problem
In this task, we will evaluate an original implementation of the rand command in Matlab for generating uniform random numbers. The random number generator has formulation
Ammend the code chunk above to generate 1000 random uniform variates from this random number generator.
##
## One-sample Kolmogorov-Smirnov test
##
## data: x
## D = 0.028771, p-value = 0.38
## alternative hypothesis: two-sided
Report the following: (these are all outputs of the code chunk above already)
• Present a histogram of the 1000 variates you generated.
• Present the empirical distribution function of your data and a probability plot under the assumption of independent and identically distributed data points (on the same set of axes as we did in the online lecture).
• Use your generations from (a) to perform a Kolmogorov-Smirnov test to validate the routine.
Questions:
• Do the pseudo-random numbers appear uniformly distributed? Why or why not?
Pseudo-random numbers don’t appear uniformly distributed since the Empirical cdf histogram is far from close to rectangular shape. Also, the p-value is 0.38 which is much larger than alpha = 0.05. We reject the Null Hypothesis.
• What are your conclusions from the Kolmogorov-Smirnov test at the level?
We can reject the Null Hypothesis because p-value is significantly larger than alpha level.
Task 2: Simulation of discrete distributions
A bag contains one red, two blue, three green, and four yellow marbles. A sample of three marbles is taken without replacement. Let B denote the number of blue marbles and Y denote the number of yellow marbles in the sample. The probabilities of each outcome is as follows:
# we will create a table using xtable and pander
library(knitr)
library(xtable)
library(pander)
table.elts = rbind(
c(0, "4/120", "24/120", "24/120", "4/120", "56/120"),
c(1, "12/120", "32/120", "12/120", "0/120"),
c(2, "4/120", "4/120", "0/120", "0/120"),
c("Yellow marg", "20/120", "60/120", "36/120", "4/120", "1"))
#row.names(table.elts) = cbind("# coin tosses", "Number of heads", "Proportion of heads")
colnames(table.elts) = cbind("B/Y", "0", "1", "2", "3", "Blue marg")
lab3.table = xtable(table.elts, caption = "Discrete distribution of marble game.", label="distr_table", align = "|l|rrrr|rr")
pander(lab3.table, hline.after = c(3))
Discrete distribution of marble game.
B/Y
0
1
2
3
Blue marg
0
4/120
24/120
24/120
4/120
56/120
1
12/120
32/120
12/120
0/120
1
2
4/120
4/120
0/120
0/120
2
Yellow marg
20/120
60/120
36/120
4/120
1
Code set-up
We can use the sample function of R to randomly generate from any sequence, including characters. For example, we can code the bag of marbles as
pop = c("r","b","b","g","g","g","y","y","y","y")
We may then sample from this bag of letters (marbles). The following code chunk draws three marbles (so one run of the experiment). We record the number of blue marbles and yellow marbles from this draw.
# True values:
# P(0 blue) = 56/120; P(1 blue) = 56/120; P(2 blue) = 8/120
# P(0 Y) = 20/120; P(1 Y) = 60/120; P(2 Y) = 36/120; P(3 Y) = 4/120
# Bag of marbles (population)
pop = c("r","b","b","g","g","g","y","y","y","y")
samp = sample(pop,3) # draw 3 marbles
blues = sum(samp=="b") # number of blue marbles drawn
yellows = sum(samp=="y") # number of yellow marbles drawn
c(blues, yellows) # just checking how many blue and yellow marbles are drawn
## [1] 1 0
For this task, you will repeat this experiment 10,000 times. You still need to store the number of blue and yellow marbles from each experiment. The easiest way is to just turn the blue and yellow variables into vectors in which to store the values each step of a for-loop over the 10,000 experiments.
A note, the table command in R is a handy way of summarizing output. In particular, if you have a storage vector blues of the number of blue marbles for each of 10,000 experiments, table(blues) will tabulate the number of experiments with 0 blue marbles drawn, 1 blue marble drawn, 2 blue marbles drawn, and 3 blue marbles drawn.
The problem
Perform a simulation experiment of drawing three marbles from the bag 10,000 times. I have initialized storage vectors for you.
# True values:
# P(0 blue) = 56/120; P(1 blue) = 56/120; P(2 blue) = 8/120
# P(0 Y) = 20/120; P(1 Y) = 60/120; P(2 Y) = 36/120; P(3 Y) = 4/120
simnum<-10000 # number of experiments
# Initialize storage vectors for the number of blue and yellow marbles drawn
# for each of the 10,000 runs of the experiment.
blues<-numeric(simnum); yellows<-numeric(simnum)
pop<-c("r","b","b","g","g","g","y","y","y","y")
for(i in 1:simnum){
samp<-sample(pop,3) # draw 3 marbles
blues[i]<-sum(samp=="b")
yellows[i]<-sum(samp=="y")
bp0<-sum(blues==0)/i #P(B=0)
bp1<-sum(blues==1)/i #P(B=1)
bp2<-sum(blues==2)/i #P(B=2)
bp3<-sum(blues==3)/i #P(B=3)
yp0<-sum(yellows==0)/i #P(Y=0)
yp1<-sum(yellows==1)/i #P(Y=1)
yp2<-sum(yellows==2)/i #P(Y=2)
yp3<-sum(yellows==3)/i #P(Y=3)
}
tableBlue <- data.frame(
BLUE = c(0:3, "Mean", "Variance"),
epmf = c(bp0, bp1, bp2, bp3, mean(blues), sd(blues)),
pmf = c("56/120", "56/120", "8/120", "0/120", " ", " "),
diff = c(abs(bp0-56/120), abs(bp1-56/120), abs(bp2-8/120), bp3, " ", " ")
)
tableYellow <- data.frame(
YELLOW = c(0:3, "Mean", "Variance"),
epmf = c(yp0, yp1, yp2, yp3, mean(yellows), sd(yellows)),
pmf = c("20/120", "60/120", "36/120", "4/120", " ", " "),
diff = c(abs(yp0-20/120), abs(yp1-60/120), abs(yp2-36/120), abs(yp3-4/120), " ", " ")
)
print(tableBlue)
## BLUE epmf pmf diff
## 1 0 0.474200 56/120 0.00753333333333334
## 2 1 0.460700 56/120 0.00596666666666668
## 3 2 0.065100 8/120 0.00156666666666666
## 4 3 0.000000 0/120 0
## 5 Mean 0.590900
## 6 Variance 0.609897
print(tableYellow)
## YELLOW epmf pmf diff
## 1 0 0.1654000 20/120 0.00126666666666667
## 2 1 0.5037000 60/120 0.00370000000000004
## 3 2 0.2975000 36/120 0.0025
## 4 3 0.0334000 4/120 6.66666666666663e-05
## 5 Mean 1.1989000
## 6 Variance 0.7463206
Reporting and questions
It is up to you how you want to present the values requested. Rather than getting into R Markdown tables (xtable and pander) for this lab report, I recommend using R data frames (data.frame command). The command setNames allows you to rename the columns of a data frame for purposes of clear labeling.
By empirical pmf (probability mass function), we mean the proportion of experiments in which we draw 0 marbles of the given color, 1 marble of the given color, 2 marbles of the given color, and 3 marbles of the given color. So if is a random variable denoting the number of blue marbles drawn, the empirical pmf is an estimate of , , , and .
• Present the empirical pmf of the number of blue marbles drawn.
• Present the empirical pmf of the number of yellow marbles drawn.
• Present the mean and variance of the number of blue marbles drawn.
• Present the mean and variance of the number of yellow marbles drawn.
• Compare the empirical pmf with the true values.