FI 450/597 – 9.14.2021 – Distributions (part 1)






Some basic R stuff to get us going


In R, we have some of the same structures we’ve had in Python.

a_vector <- 1:6 # vectors are kinda defaults in R
my_list_of_vectors <- list(1:6) # lists are multidimensional
my_list_of_vectors <- list(1:6,1:6) # lists are multidimensional

my_vector <- c(1,2,3,4,5,6)
typeof(my_vector)

[1] “double”

my_other_vector <- c('one','two','three','four','five','six')
typeof(my_other_vector)

[1] “character”

length(my_vector)

[1] 6

my_other_vector[0] # nope

character(0)

my_other_vector[1] # primitive... but whatevs

[1] “one”

my_other_vector[my_vector[1]] # cool!

[1] “one”

wait_what <- list(c(rep(my_other_vector,3)),'450/597',450/597,rep(my_list_of_vectors,2))


Working with these more complex data structures is doable in R, but it’s usually much easier to handle in Python.

There are also data.frame (in both Python (import pandas) and R) and data.table (in its nascient stages in Python, but not there yet, and in full effect in R) structures. The latter of which is one of humanity’s crowning achievements! We won’t be using them right away, but you can think of them as lists of same-length lists


Distributions in R


Generating a discrete uniform distribution

X <- c(1,2,3,4,5,6) # support
k <- length(X) # get the length of X
p_X <- rep(1/k,k) # generate a vector of repeated series of numbers (np.linspace?)

We can see this distribution!

plot(p_X) # boring

plot(p_X,xlab='X',ylab='p(x)',main='Uniform distributions are boring')

What about F(x)?

F_X <- cumsum(p_X) # cumulative summation of probs
my_F_plot <- plot(F_X) # there's no reason to be fancy when we're exploring!

Just like we did in Python, we can get random draws from our distribution with implicit uniform probability.

sample(X,1) # draw one at random

[1] 4

sample(X,2) # or two

[1] 3 1

sample(X) # what does this do? NOT rearrange, but sample same length without replacement!

[1] 3 5 6 1 2 4

sample(X,replace=TRUE)

[1] 5 2 5 1 2 2

rep(sample(X,6,replace=TRUE),5) # note how this works... EXACTLY like it should!

[1] 3 3 2 1 3 6 3 3 2 1 3 6 3 3 2 1 3 6 3 3 2 1 3 6 3 3 2 1 3 6

sample(X,replace=TRUE,prob=p_X) # assigned our probabilities (still uniform, though)

[1] 1 1 1 4 1 2

# but there is a ton to learn from this one line of code
sample(X,replace=TRUE,prob=c(1,rep(0,length(p_X)-1)))

[1] 1 1 1 1 1 1


Frequency distributions (beginning with (a,b,0)


Why are frequency distributions important?
What do we call the kinds of risk management techniques that are aimed at reducing the frequency of losses?


Bernoulli


Fundamental building block of our frequency distributions. It basically lets us say, “either something happens or it doesn’t.”

\[
X=\{1,0\}\\
\text{Pr}(p) + \text{Pr}(q)=1\\
q=1-p
\]

We’ve seen how to do this already.

sample(c(0,1),1)

[1] 1

# we can make functions in R, too
bern <- function(p){
  bern_out <- sample(c(1,0),1,prob=c(p,1-p))
  return(bern_out)}


Binomial


Consider a set of \(m\) events, each being a Bernoulli trial. That is, we have an \(m\)-convolution of iid Bernoulli random variables. The distribution of the potential \(m\)-trial outcomes is a binomial distribution.

\[
p_k = \binom{m}{k} p^k(q)^{m-k}\text{, with }k=0,…,m
\]

Since a binomial distribution describes the outcomes of repeated Bernoullis, we can create one by simulating a bunch of Bernoulli trials of length \(m\)!

m <- 30 # total number of iid Bernoullis
p <- 0.6 # prob of any one outcome

# what's going on here?
outcomes <- rep(0,m)
trial <- replicate(m,bern(p)) # that's what we wanted to do
trial <- function(m,p){
  t <- replicate(m,bern(p))
  t <- sum(t)
  return(t)
}
for(i in 1:100000){
  t <- trial(m,p)
  outcomes[t] = outcomes[t]+ 1
} # replicate our function m times
our_binom <- outcomes/sum(outcomes)

Let’s see how it looks.

plot(our_binom)

That’s normal. HA!


Luckily, R has an easier (and faster) way to generate many distributions, including a binomial.

m <- 30 # total number of iid Bernoullis
p <- 0.6 # prob of any one outcome
k <- seq(0, 30, 1) # make a "sequence" of numbers from 0 to 30 (inclusive)
f_m <- dbinom(k, m, p) # get a binomial "dbinom" is DENSITY of a binomial!

We can see that this looks just like our homemade version.

plot(f_m)


It is better to use a sequence for \(x\) values, rather than relying on the index.

plot(k,f_m)


Using this approach, it’s simple to generate another binomial distribution with different \(p\) value “inside” each of the Bernoullis. Also, we can add some color.

f_m2 <- dbinom(k,m,(p-.1)) # another one
plot(k,f_m,xlab='k',ylab='f_m(k)',
     main='Comparing binomials with different success probabilities', col='blue')
lines(k,f_m2,col='red',type='p')


Using the same function structure with pbinom rather than dbinom, we can generate the CDF for the distribution. Note that we’re using the same \(k\), \(m\), and \(p\) values.

F_m2 <- pbinom(k,m,p)
plot(k,F_m2,main='Binomial CDF')


Questions


Suppose you’re flipping an unfair coin with Pr(heads) = 0.6. If you flip the coin 25 times, what’s the probability that you get exactly 12 heads? What’s the probability that you get at least 11 heads?


Answers

dbinom(12,25,0.6) # k,m,p == num_successes, num_trials, prob_success_in_each

[1] 0.07596671

1-pbinom(10,25,0.6) # why 10?

[1] 0.9656085

pbinom(10,25,0.6,lower.tail=FALSE) # the 1-F seems more inline with our thinking

[1] 0.9656085