FI 450/597 – 9.16.2021 – Distributions (part 2)
Binomial (continued)
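There are two other distribution-related functions that we want to be comfortable using in our work: qbinom, which returns quantiles (for example, the median or the 10th percentile), and rbinom, which generates random outcomes.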
median_value <- qbinom(0.5,100,.6) # 100 trials with prob(success)=0.6
median_value
[1] 60
tenth_percentile <- qbinom(0.1,100,.6)
tenth_percentile
[1] 54
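qbinom inverts pbinom: for a probability level q it returns the smallest number of successes x with pbinom(x) >= q. A quick check of that relationship with the same 100-trial, p = 0.6 setup (the variable names n, prob and q10 are just for illustration):

```r
n <- 100      # number of Bernoulli trials
prob <- 0.6   # success probability per trial

q10 <- qbinom(0.1, n, prob)   # 10th percentile, as above

# The cumulative probability has reached 0.1 at q10 ...
pbinom(q10, n, prob)
# ... but had not yet reached 0.1 one success earlier
pbinom(q10 - 1, n, prob)
```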
We can also simulate binomial experiments!
Consider a single binomial experiment consisting of 10 Bernoulli trials, each with a success probability of 0.6. Doing this once will not give us a distribution, right? That is, this binomial experiment returns a single outcome: the number of successes in those 10 trials. The binomial distribution itself is the distribution of these outcomes.
If we want to run our experiment (consisting of 10 Bernoulli trials) 15 times and view the results, we can do this easily.
results <- rbinom(15,10,0.6)
results
[1] 7 8 7 6 6 4 6 5 5 4 4 6 9 4 7
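With only 15 repetitions the results bounce around quite a bit. A sketch that repeats the same 10-trial experiment many more times (100,000 is an arbitrary choice) and compares the simulated proportions with dbinom, and the simulated mean and variance with np = 6 and np(1 - p) = 2.4. The seed only makes the run reproducible:

```r
set.seed(450)                     # reproducible random draws
n <- 10; prob <- 0.6
sims <- rbinom(100000, n, prob)   # 100,000 binomial experiments

# Simulated proportion of each outcome 0..10 next to the exact probabilities
observed <- table(factor(sims, levels = 0:n)) / length(sims)
round(cbind(simulated = as.numeric(observed), exact = dbinom(0:n, n, prob)), 4)

mean(sims)   # should be close to n * prob = 6
var(sims)    # should be close to n * prob * (1 - prob) = 2.4
```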
Question
In the Intergalactic Shooting Olympics, an Iridium Medal is given for hitting the target 3 out of 3 times, and an Osmium Medal is awarded for hitting the target 5 out of 5 times. A well-trained stormtrooper hits the target 90% of the time, and the probability of hitting the target on any one attempt is independent of all other attempts. Each trooper will choose to shoot 3 times, or 5 times, but not both.
- What is the probability that a well-trained stormtrooper shooting 3 times will win an Iridium Medal?
# part a
p <- 0.9
dbinom(3,3,p)
[1] 0.729
- What is the probability that a well-trained stormtrooper will score at least two hits in three shots?
# part b
1-pbinom(1,3,p)
[1] 0.972
- Assume that the performance of each stormtrooper is independent of the performance of all others. The planet Zeus has 4 stormtroopers, each shooting 5 times for the Osmium Medal in the Shooting Olympics. What is the probability that Zeus will win at least one Osmium Medal?
# part c
stwins <- dbinom(5,5,p)
1-pbinom(0,4,stwins)
[1] 0.9718772
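As a sanity check, all three parts can be reproduced from first principles: (a) is p^3, (b) is P(X = 2) + P(X = 3), and (c) is a complement argument, since Zeus comes away empty-handed only if none of its 4 troopers hits 5 of 5, giving 1 - (1 - p^5)^4. The Monte Carlo lines at the end are an extra rough check of (c); the seed and the number of simulated Olympics are arbitrary:

```r
p <- 0.9

# Part (a): all three shots hit
part_a <- p^3                          # same as dbinom(3, 3, p)

# Part (b): at least two hits = exactly two or exactly three
part_b <- sum(dbinom(2:3, 3, p))       # same as 1 - pbinom(1, 3, p)

# Part (c): 1 - P(no trooper wins an Osmium Medal)
stwins <- p^5                          # one trooper hits 5 of 5
part_c <- 1 - (1 - stwins)^4
c(part_a, part_b, part_c)

# Monte Carlo check of (c): 100,000 simulated Olympics with 4 troopers each
set.seed(597)
medals <- rbinom(100000, 4, stwins)    # Osmium Medals won by Zeus per Olympics
mean(medals >= 1)                      # should be close to part_c
```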
Poisson
This distribution is all about arrivals, and these arrivals can be in time or space. We require only that arrivals are independent and occur at a constant mean rate.
Suppose the number of HW questions you can answer follows a Poisson process with a mean rate of 4 questions per hour. What is the probability that you answer exactly 6 questions in an hour?
lambda <- 4 # notice that this is the mean and variance
dpois(6,lambda)
[1] 0.1041956
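dpois evaluates the Poisson probability mass function P(X = k) = e^(-lambda) lambda^k / k!. Computing it directly for k = 6 should reproduce the value above:

```r
lambda <- 4
k <- 6
by_hand <- exp(-lambda) * lambda^k / factorial(k)   # e^(-4) * 4^6 / 6!
by_hand
dpois(k, lambda)                                   # built-in version
```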
What is the probability of finishing 3 or fewer questions in the next hour?
dpois(0,lambda)+dpois(1,lambda)+dpois(2,lambda)+dpois(3,lambda)
[1] 0.4334701
ppois(3,lambda)
[1] 0.4334701
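Because dpois is vectorised, the long sum above can be written with 0:3. The complementary question (more than 3 questions in the hour) uses the upper tail, which ppois can also return directly through its lower.tail argument:

```r
lambda <- 4
sum(dpois(0:3, lambda))                # P(X <= 3), same as ppois(3, lambda)
1 - ppois(3, lambda)                   # P(X > 3), i.e. 4 or more questions
ppois(3, lambda, lower.tail = FALSE)   # same upper tail, computed directly
```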
What if you went on a question-solving bender and solved problems for 1000 hours?
outcomes <- rpois(1000,lambda)
What would be the resulting distribution of problems solved per hour?
hist(outcomes)
Notice that the y axis of this histogram shows counts (frequencies), not probabilities. To see the distribution in terms of probabilities, plot the Poisson probability for each possible number of successes using plot's histogram-like option, type='h'.
successes <- 0:max(outcomes) # set up our x axis
plot(successes,dpois(successes,lambda=lambda), type='h', ylab="prob",main="Poisson Outcomes (lambda = 4)")
We can also pass the vector successes to dpois to get the corresponding probabilities.
dpois(successes,lambda=lambda)
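The type='h' plot shows the theoretical Poisson probabilities. To express the simulated outcomes themselves as probabilities, one option is to tabulate their relative frequencies and set them next to dpois. This sketch regenerates outcomes with an arbitrary seed so it runs on its own; the sample mean and variance should both land near lambda = 4:

```r
set.seed(4)
lambda <- 4
outcomes <- rpois(1000, lambda)   # 1000 simulated hours
successes <- 0:max(outcomes)

# Relative frequency of each count, including counts that never occurred
empirical <- as.numeric(table(factor(outcomes, levels = successes))) / length(outcomes)
round(cbind(successes, empirical, theoretical = dpois(successes, lambda)), 3)

mean(outcomes)   # near lambda
var(outcomes)    # also near lambda
```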
Of course, we can get the percentiles of a Poisson distribution using qpois.
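For example, the median and 10th percentile of the number of questions answered in an hour (lambda = 4). Like qbinom, qpois returns the smallest count whose cumulative probability reaches the requested level:

```r
lambda <- 4
qpois(0.5, lambda)                # median: P(X <= 3) is about 0.433, P(X <= 4) about 0.629
qpois(0.1, lambda)                # 10th percentile: P(X <= 1) is about 0.092, P(X <= 2) about 0.238
qpois(c(0.1, 0.5, 0.9), lambda)   # several percentiles at once
```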
Geometric Distribution
The geometric distribution builds (again) on Bernoulli trials. Let’s set ourselves up (read: set YOURSELF up) to use more of R’s built-in firepower to generate the distribution from scratch. However, as you might expect, we’ll also be able to simply conjure up the desired distribution with R’s built-in functions.
X <- 1:6 # we've seen this part
a.sample <- rep(sample(X),2) # and this...
a.sample
[1] 4 6 3 1 5 2 4 6 3 1 5 2
two.samples <- replicate(2,sample(X)) # and we mentioned this, but let's look closer
two.samples
[,1] [,2]
[1,] 3 5
[2,] 5 2
[3,] 6 3
[4,] 4 4
[5,] 1 6
[6,] 2 1
The replicate function ran our sampling twice and stored each outcome separately, as a column of the resulting matrix.
Check these out!
two.samples[1]
[1] 3
two.samples[3]
[1] 6
two.samples[8] # Why is this important, given what we did last class?
[1] 2
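Single-number indexing like two.samples[8] works because R stores a matrix column by column (column-major order). A small sketch with a fixed matrix m (holding the same values as two.samples above, so nothing depends on sample()) showing that element [i, j] sits at position (j - 1) * nrow + i:

```r
m <- matrix(c(3, 5, 6, 4, 1, 2,
              5, 2, 3, 4, 6, 1), ncol = 2)   # filled column by column

i <- 2; j <- 2
m[i, j]                    # row 2, column 2
m[(j - 1) * nrow(m) + i]   # the same element via a single index (position 8)
```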
two.samples[3,]
[1] 6 3
two.samples[,2]
[1] 5 2 3 4 6 1
So we can access sample i using two.samples[,i]. More useful functions:
2 %in% two.samples[,2] # is 2 in this vector?
[1] TRUE
two.samples[,2]==2 # is EACH element == 2?
[1] FALSE TRUE FALSE FALSE FALSE FALSE
which(two.samples[,2]==2) # which one(s) == 2?
[1] 2
which(c(1,1,1,2,3)==1) # be aware!
[1] 1 2 3
which(c(1,1,1,2,3)==1)[1] # where is the FIRST location?
[1] 1
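These pieces combine naturally when looking for the first success in a sequence of Bernoulli trials. A sketch on a fixed, made-up vector of trials (1 = success, 0 = failure); match() is a built-in shortcut for the same first position, and both approaches return NA when there is no success at all:

```r
trials <- c(0, 0, 1, 0, 1)   # hypothetical outcomes of five Bernoulli trials

first <- which(trials == 1)[1]   # position of the first success
first
match(1, trials)                 # same position via match()
first - 1                        # failures before the first success

which(c(0, 0, 0) == 1)[1]        # no success at all gives NA
```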
What the geometric distribution counts depends on who you ask…
Version 1:
A geometric distribution captures the number of Bernoulli trials it takes to get the first success.
Version 2:
A geometric distribution captures the number of failures in a series of Bernoulli trials occurring before we get the first success.
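In symbols, with success probability p on each trial, the two versions differ only by a shift of one. If Y counts the trials up to and including the first success (Version 1) and X counts the failures before it (Version 2), then Y = X + 1 and:

```latex
% Version 1: Y = number of trials up to and including the first success
P(Y = k) = (1-p)^{k-1}\, p, \qquad k = 1, 2, 3, \dots
% Version 2: X = number of failures before the first success, so X = Y - 1
P(X = k) = (1-p)^{k}\, p, \qquad k = 0, 1, 2, \dots
```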
Since you know that we are working with the (a,b,0) class of distributions, which of these versions is ours?
We have all the expected distribution functions available for the geometric distribution in R.
X <- seq(0,12)
p <- 0.4
plot(X,dgeom(X,p),main=sprintf('Geometric distribution (with p = %s)',p)) # check out that title!
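One way to see which convention R's geometric functions follow is to evaluate them directly. The sketch compares dgeom with the formula (1 - p)^x p, looks at dgeom(0, p), and checks the mean of simulated rgeom draws against (1 - p)/p, the mean number of failures before the first success. The seed and sample size are arbitrary:

```r
p <- 0.4
x <- 0:12

dgeom(0, p)                             # compare with p
all.equal(dgeom(x, p), (1 - p)^x * p)   # pmf over the plotted range

set.seed(16)
draws <- rgeom(100000, p)
mean(draws)   # compare with (1 - p) / p = 1.5
min(draws)    # smallest value that occurred
```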
Homework 4 – Question 3 (FI 597 only)
Using our single Bernoulli trial function and your own modified version of the multi-trial function (see notes from last class), generate and plot a geometric distribution with a Bernoulli success probability of 0.10 on the interval [0,20]. What does the x axis of your plot represent? Be careful not to give me a conditional probability!
Here’s a little hint to help you get started!
X <- seq(0,20)
counter <- rep(0,21)
for(i in 1:10000){
...