3 Chapter 03: Discrete Random Variables

Published

March 19, 2023

3.1 Problem 3.1

Let X be a discrete random variable with probability mass function given by

p(x) = \begin{cases} \frac{1}{4} & x = 0 \\ \frac{1}{2} & x = 1 \\ \frac{1}{8} & x = 2 \\ \frac{1}{8} & x = 3 \\ \end{cases}

Verify that p(x) is a probability mass function.
Find P(X \geq 2).
Find P(X \geq 2 | X \geq 1).
Find P(X \geq 1 \cup X \geq 2).

3.1.1 Part (a)

To be a valid probability function p(x) must satisfy the following conditions:

p(x) \geq 0 for all x.
\sum_{x} p(x) = 1.

The first condition is satisfied by construction. The second condition is satisfied by the following calculation:

\sum_{x} p(x) = \frac{1}{4} + \frac{1}{2} + \frac{1}{8} + \frac{1}{8} = 1

3.1.2 Part (b)

We can use the union rule as:

P(X \geq 2) = P(X = 2) + P(X = 3)

P(X \geq 2) = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}

3.1.3 Part(c)

We can use conditional probability as:

P(X \geq 2 | X \geq 1) = \frac{P(X \geq 2 \cap X \geq 1)}{P(X \geq 1)}

Since X \geq 2 is a subset of X \geq 1, we have that:

P(X \geq 2 \cap X \geq 1) = P(X \geq 2)

And Finally:

P(X \geq 2 | X \geq 1) = \frac{\frac{1}{4}}{\frac{3}{4}} = \frac{1}{3}

3.1.4 Part (d)

We can use the union rule as:

P(X \geq 1 \cup X \geq 2) = P(X \geq 1) + P(X \geq 2) \\ - P(X \geq 1 \cap X \geq 2)

Then, similarly to part (c), we can use the intersection rule as:

P(X \geq 1 \cap X \geq 2) = (X \geq 2)

P(X \geq 1 \cap X \geq 2) = \frac{1}{8} + \frac{1}{8} = \frac{1}{4}

And finally:

P(X \geq 1 \cup X \geq 2) = \frac{3}{4} + \frac{1}{4} - \frac{1}{4} = \frac{3}{4}

3.2 Problem 3.2

Let X be a discrete random variable with probability mass function given by

p(x) = \begin{cases} \frac{1}{4} & x = 0 \\ \frac{1}{2} & x = 1 \\ \frac{1}{8} & x = 2 \\ \frac{1}{8} & x = 3 \\ \end{cases}

Use sample to create a sample of size 10000 from X and estimate $P(X = 1) from your sample. Your result should be close to 1/2.
Use table on your sample from part (a) to estimate the pmf of X from your sample. Your result should be similar to the pmf given in the problem.

3.2.1 Part (a)

x <- sample(0:3, size = 10000, replace = TRUE, prob = c(1/4, 1/2, 1/8, 1/8))
mean(x == 1)

[1] 0.5007

3.2.2 Part (b)

x <- sample(0:3, size = 10000, replace = TRUE, prob = c(1/4, 1/2, 1/8, 1/8))
table(x) / 10000

x
     0      1      2      3 
0.2513 0.4970 0.1206 0.1311

3.3 Problem 3.3

Let X be a discrete random variable with probability mass function given by

p(x) = \begin{cases} \frac{C}{4} & x = 0 \\ \frac{C}{2} & x = 1 \\ C & x = 2 \\ 0 & \text{otherwise} \end{cases}

Find the value of C that makes p(x) is a probability mass function.

\sum_{x} p(x) = \frac{C}{4} + \frac{C}{2} + C = 1

C = \frac{4}{7}

3.4 Problem 3.4

Give an example of a probability mass function p whose associated random variable has mean 0.

By definition, the mean of a random variable is:

\mu = \sum_{x} x p(x)

We know the pmf is positive and the sum of all probabilities is 1, so we can choose any symetric random variable over the real numbers, with equal probabilities for both sides.

p(x) = \begin{cases} \frac{1}{4} & x = -2 \\ \frac{1}{4} & x = -1 \\ \frac{1}{4} & x = 1 \\ \frac{1}{4} & x = 2 \\ 0 & \text{otherwise} \end{cases}

The mean is:

\mu = -2 \frac{1}{4} + -1 \frac{1}{4} + 1 \frac{1}{4} + 2 \frac{1}{4} = 0

3.5 Problem 3.5

Find the mean of the random variable X given in Exercise 3.1.

\mu = \sum_{x} x p(x) = 0 \frac{1}{4} + 1 \frac{1}{2} + 2 \frac{1}{8} + 3 \frac{1}{8} = \frac{9}{8}

3.6 Problem 3.6

Let X be a random variable with pmf given by p(x) = 1/10 for x = 1, \ldots, 10 and p(x) = 0 for all other values of x. Find E[X] and confirm your answer using simulation.

E[x] = \sum_{x} x p(x) = 1 \frac{1}{10} + 2 \frac{1}{10} + \ldots + 10 \frac{1}{10} = 5.5

x <- sample(1:10, size = 10000, replace = TRUE, prob = rep(1/10, 10))
mean(x)

[1] 5.5081

3.7 Problem 3.7

Suppose you roll two ordinary dice. Calculate the expected value of their product.

First we need to find the pmf of the product of two dice:

Product	Outcomes	Probability
1	(1, 1)	1/36
2	(1, 2), (2, 1)	2/36
3	(1, 3), (3, 1)	2/36
4	(1, 4), (4, 1), (2, 2)	3/36
5	(1, 5), (5, 1)	2/36
6	(1, 6), (6, 1), (2, 3), (3, 2)	4/36
8	(2, 4), (4, 2)	2/36
9	(3, 3)	1/36
10	(2, 5), (5, 2)	2/36
12	(3, 4), (4, 3), (6, 2), (2, 6)	4/36
15	(3, 5), (5, 3)	2/36
16	(4, 4)	1/36
18	(6, 3), (3, 6)	2/36
20	(5, 4), (4, 5)	2/36
24	(6, 4), (4, 6)	2/36
25	(5, 5)	1/36
30	(6, 5), (5, 6)	2/36
36	(6, 6)	1/36

And then we can calculate the expected value:

E[x] = 1 \frac{1}{36} + 2 \frac{2}{36} + \ldots + 36 \frac{1}{36} = 12.25

We can also use simulation to confirm our answer:

dice_1 <- sample(1:6, size = 10000, replace = TRUE)
dice_2 <- sample(1:6, size = 10000, replace = TRUE)
mean(dice_1 * dice_2)

[1] 12.2766

3.8 Problem 3.8

Suppose that a hat contains slips of papers containing the numbers 1, 2, and 3. Two slips of paper are drawn without replacement. Calculate the expected value of the product of the numbers on the slips of paper.

We need to find the pmf:

Product	Outcomes	Probability
2	(2, 1), (1, 2)	2/6
3	(3, 1), (1, 3)	2/6
6	(3, 2), (2, 3)	2/6

And then we can calculate the expected value:

E[x] = 2 \frac{2}{6} + 3 \frac{2}{6} + 6 \frac{2}{6} = \frac{22}{6}

We can also use simulation to confirm our answer:

event <- replicate(100000, {
  hat <- sample(1:3, size = 2, replace = FALSE)
  prod(hat)
})
mean(event)

[1] 3.66855

3.9 Problem 3.9

Pick an integer from 0 to 999 with all possible numbers equally likely. What is the expected number of digits in your number?

We need to find the pmf:

Number of digits	Outcomes	Probability
1	0-9	10/1000
2	10-99	90/1000
3	100-999	900/1000

And then we can calculate the expected value:

E[x] = 1 \frac{10}{1000} + 2 \frac{90}{1000} + 3 \frac{900}{1000} = 2.89

We can also use simulation to confirm our answer:

number <- sample(0:999, size = 100000, replace = TRUE)
mean(nchar(as.character(number)))

[1] 2.88844

3.10 Problem 3.10

In the summer of 2020, the U.S. was considering pooled testing of COVID-19. This problem explores the math behind pooled testing. Since the availability of tests is limited, the testing center proposes the following pooled testing technique:

Two samples are randomly selected and combined. The combined sample is tested.
If the combined sample tests negative, then both people are assumed negative.
If the combined sample tests positive, then both people need to be retested for the disease.

Suppose in a certain population, 5% of the people being tested for COVID-19 actually have COVID-19. Let X be the total number of tests that are run in order to test two randomly selected people.

What is the pmf of X?
What is the expected value of X?
Repeat the above, but imagine that three samples are combined, and let Y be the total number of tests that are run in order to test three randomly selected people. If the pooled test is positive, then all three people need to be retested individually.
If your only concern is to minimize the expected number of tests given to the population, which technique would you recommend?

3.10.1 Part (a)

To find the pmf of X, we need to determine the probability of each possible value of X:

First, consider the case where neither of the two individuals selected has COVID-19. The probability that the first individual selected does not have COVID-19 is 0.95, and the probability that the second individual selected does not have COVID-19, given that the first individual does not have COVID-19, is also 0.95. Therefore, the probability that neither individual has COVID-19 is:

P = 0.95 \cdot 0.95 = 0.9025

In this case, only one test is needed, since the combined sample will test negative and both individuals can be assumed negative. So, in this case, X = 1 with probability 0.9025.

Now, consider the case where one or both of the individuals selected has COVID-19. The probability that at least one has COVID-19 is the complement of the probability that neither has COVID-19: P = 1 - 0.9025 = 0.0975

In this case, the combined sample will test positive, and both individuals will need to be retested. Since each individual will need to be tested once more, a total of three tests will be needed. So, in this case, X = 3 with probability 0.0975.

X	Probability
1	0.9025
3	0.0975

3.10.2 Part (b)

E[x] = 1 \cdot 0.9025 + 3 \cdot 0.0975 = 1.195

3.10.3 Part (c)

To find the pmf of Y, we need to determine the probability of each possible value of Y:

P = 0.95 \cdot 0.95 \cdot 0.95 = 0.857375

In this case, only one test is needed, since the combined sample will test negative and all three individuals can be assumed negative. So, in this case, Y = 1 with probability 0.857375.

Now consider the case where one or more of the individuals selected has COVID-19. The probability that at least one individual has COVID-19 is:

P = 1 - 0.857375 = 0.142625

In this case, the combined sample will test positive, and all three individuals will need to be retested. Since each individual will need to be tested once more, a total of four tests will be needed. So, in this case, Y = 4 with probability 0.142625.

Y	Probability
1	0.857375
4	0.142625

And the expected value of Y is:

E[y] = 1 \cdot 0.857375 + 4 \cdot 0.142625 = 1.442

3.10.4 Part (d)

Since the expected value of X is 1.195 and the expected value of Y is 1.442, we would recommend the technique that minimizes the expected number of tests, which is the technique that tests two samples at a time.

3.11 Problem 3.11

A roulette wheel has 38 slots and a ball that rolls until it falls into one of the slots, all of which are equally likely. Eighteen slots are black numbers, eighteen are red numbers, and two are green zeros. If you bet on “red” and the ball lands in a red slot, the casino pays you your bet; otherwise, the casino wins your bet.

What is the expected value of a $1 bet on red?
Suppose you bet $1 on red, and if you win you “let it ride” and bet $2 on red. What is the expected value of this plan?

3.11.1 Part (a)

We need to find the probability of winning, which is the probability that the ball lands in a red slot. There are 18 red slots, so the probability of winning is:

P = \frac{18}{38} = 0.4736

The pmf of the random variable X is:

X	Probability
-1	0.5263
1	0.4736

And the expected value of X is:

E[x] = -1 \cdot 0.5263 + 1 \cdot 0.4736 = -0.0527

3.11.2 Part (b)

Under this betting plan, there are three possible outcomes:

The ball lands on black or green on the first spin, and you lose $1.
The ball lands on red on the first spin, and you win $1. Then, the ball lands on black or green on the second spin, and you lose $2.
The ball lands on red on both spins, and you win $1 on the first spin and $2 on the second spin.

The probability of each of these outcomes can be calculated as follows:

Probability of losing on the first spin: \frac{20}{38} = 0.5263
Probability of winning on the first spin and losing on the second spin \frac{18}{38} \cdot \frac{20}{38} = 0.2493
Probability of winning on both spins \frac{18}{38} \cdot \frac{18}{38} = 0.2243

P = \frac{18}{38} \cdot \frac{18}{38} = 0.2243

The pmf of the random variable X is:

X	Probability
-2	0.2493
-1	0.5263
3	0.2243

And the expected value of X is:

E[x] = -2 \cdot 0.2493 - 1 \cdot 0.5263 + 3 \cdot 0.2243 = -0.352

3.12 Problem 3.12

One (questionable) roulette strategy is called bet doubling: You bet $1 on red, and if you win, you pocket the $1. If you lose, you double your bet so you are now betting $2 on red, but have lost $1. If you win, you win $2 for a $1 profit, which you pocket. If you lose again, you double your bet to $4 (having already lost $3). Again, if you win, you have $1 profit, and if you lose, you double your bet again. This guarantees you will win $1, unless you run out of money to keep doubling your bet.

Say you start with a bankroll of $127. How many bets can you lose in a row without losing your bankroll?
If you have a $127 bankroll, what is the probability that bet doubling wins you $1?
What is the expected value of the bet doubling strategy with a $127 bankroll?
If you play the bet doubling strategy with a $127 bankroll, how many times can you expect to play before you lose your bankroll?

3.12.1 Part (a)

The maximum number of bets you can lose in a row without losing your bankroll is 7.

Bet 1: Lose $1, bankroll is now $126
Bet 2: Lose $2, bankroll is now $124
Bet 3: Lose $4, bankroll is now $120
Bet 4: Lose $8, bankroll is now $112
Bet 5: Lose $16, bankroll is now $96
Bet 6: Lose $32, bankroll is now $64
Bet 7: Lose $64, bankroll is now $0

3.12.2 part (b)

The proability of losing all 7 bets is (\frac{20}{38})^7. Thus the probability of winning at least once is:

P = 1 - (\frac{20}{38})^7 = 0.9881

3.12.3 Part (c)

Based on the previous part, we can built the pmf of the random variable X:

X	Probability
-127	0.0119
1	0.9881

And the expected value of X is:

E[x] = -127 \cdot 0.0119 + 1 \cdot 0.9881 = - 0.5232

3.12.4 Part (d)

we can simulate this in R as follows:

event <- replicate(10000 , {
  bankroll <- 127
  num_rounds <- 0
  nth_bet <- 1
  while (bankroll > 0) {
    num_rounds <- num_rounds + 1
    bet_result <- sample(x = c(0, 1), size = 1, prob = c(20/38, 18/38))
    if (bet_result == 0) {
      bankroll <- bankroll - 2^(nth_bet - 1)
      nth_bet <- nth_bet + 1
    } else {
      bankroll <- bankroll + 2^(nth_bet - 1)
      nth_bet <- 1
    }
  }
  num_rounds
})

mean(event)

[1] 991.8872

3.13 Problem 3.13

Flip a fair coin 10 times and let X be the proportion of times that a head is followed by another head. Discard the sequence of ten tosses if you don’t obtain a head in the first nine tosses. What is the expected value of X? (Note: this is not asking you to estimate the conditional probability of getting heads given that you obtained heads).

Let X_{j} be the indicator random variable of HH appearing at position j for j = 1, 2, \dots, 9. These events aren’t quite independent from each other, so the binomial random variable does not accurante describe the distribution of X. However, we can still use the Binomial random variable to calculate the expected value of X, because the expectation is linear.

Getting two heads in a row has probability equal to p = 1/4. So, we look at HH is our success outcome and the rest \{HT, TH, TT \} consist of the failures outcomes. Since we are discarding the sequence of ten tosses if we don’t obtain a head in the first nine tosses, we have n = 9 and p = 1/4. Thus, the expected value of X is:

E[X] = n \cdot p = 9 \cdot \frac{1}{4} = 2.25

3.14 Problem 3.14

To play the Missouri lottery Pick 3, you choose three digits 0-9 in order. Later, the lottery selects three digits at random, and you win if your choices match the lottery values in some way. Here are some possible bets you can play:

$1 Straight wins if you correctly guess all three digits, in order. It pays $600.
$1 Front Pair wins if you correctly guess the first two digits. It pays $60.
$1 Back Pair wins if you correctly guess the last two digits. It pays $60.
$6 6-Way Combo wins if the three digits are different and you guess all three in any order. It pays $600.
$3 3-Way Combo wins if two of the three digits are the same, and you guess all three in any order. It pays $600.
$1 1-Off lets you win if some of your digits are off by 1 (9 and 0 are considered to be one off from each other). If you get the number exactly correct, you win $300. If you have one digit off by 1, you win $29. If you have two digits off by 1, you win $4, and if all three of your digits are off by 1 you win $9.

Consider the value of your bet to be your expected winnings per dollar bet. What value do each of these bets have?

3.14.1 Part (a)

$1 Straight wins if you correctly guess all three digits, in order. It pays $600.

The total amount of money you can win is $(600 - 1). The total amount of money you can lose is $1. The proability of winning is \frac{1}{n} \cdot \frac{1}{n} \cdot \frac{1}{n} where n=10 and the probability of losing is the complement. Thus, the expected value of this bet is:

E[X] = (600 - 1) \cdot \frac{1}{1000} - 1 \cdot \left(1 - \frac{1}{1000}\right) = -0.4

We can also simulate this in R as follows:

event <- replicate(100000 , {
  my_bet <- sample(x = 0:999, size = 1, replace = TRUE)
  lottery <- sample(x = 0:999, size = 1, replace = TRUE)
  if(lottery == my_bet) 599 else -1
})

mean(event)

[1] -0.37

3.14.2 Part (b)

$1 Front Pair wins if you correctly guess the first two digits. It pays $60.

Quite similar to the previous part, the total amount of money you can win is $(60 - 1). The total amount of money you can lose is $1. The proability of winning is \frac{1}{n} \cdot \frac{1}{n} where n=10 and the probability of losing is the complement. Thus, the expected value of this bet is:

E[X] = (60 - 1) \cdot \frac{1}{100} - 1 \cdot \left(1 - \frac{1}{100}\right) = -0.4

We can also simulate this in R as follows:

event <- replicate(100000 , {
  my_bet <- sample(x = 0:99, size = 1, replace = TRUE)
  lottery <- sample(x = 0:999, size = 1, replace = TRUE)
  if(lottery %/% 10 == my_bet) 59 else -1
})

mean(event)

[1] -0.3958

3.14.3 Part (c)

$1 Back Pair wins if you correctly guess the last two digits. It pays $60.

It’s basically the same. So, let’s just calculate the expected value:

E[X] = (60 - 1) \cdot \frac{1}{100} - 1 \left(1 - \frac{1}{100}\right) = -0.4

And just for the record, here is the simulation:

event <- replicate(100000 , {
  my_bet <- sample(x = 0:99, size = 1, replace = TRUE)
  lottery <- sample(x = 0:999, size = 1, replace = TRUE)
  if(lottery %% 100 == my_bet) 59 else -1
})

mean(event)

[1] -0.4174

3.14.4 Part (d)

$6 6-Way Combo wins if the three digits are different and you guess all three in any order. It pays $600.

With a 3-digit number (All digits different) we have 6 (3!) ways to guess all three digits in any order. So it means that the probability of winning is \frac{6}{1000}. Thus the expected value is:

E[X] = (600 - 6) \cdot \frac{6}{1000} - 6 \cdot \left(1 - \frac{6}{1000}\right) = -2.4

We can simulate this in R as follow:

event <- replicate(100000, {
  my_bet <- sample(x = 0:9, size = 3, replace = FALSE)
  lottery <- sample(x = 0:9, size = 3, replace = TRUE)
  if(all(my_bet %in% lottery)) 594 else -6
})

mean(event)

[1] -2.202

3.14.5 Part (e)

$3 3-Way Combo wins if two of the three digits are the same, and you guess all three in any order. It pays $600.

With a 3-digit number where 2 digits are the same we have 3 ways to guess all three digits in any order. So it means that the probability of winning is \frac{3}{1000}. Thus the expected value is:

E[X] = (600 - 3) \cdot \frac{3}{1000} - 3 \cdot \left(1 - \frac{3}{1000}\right) = -1.2

We can simulate this in R as follow:

event <- replicate(100000, {
  my_bet <- sample(x = 0:9, size = 2, replace = FALSE)
  my_bet <- c(my_bet, my_bet[1])
  lottery <- sample(x = 0:9, size = 3, replace = TRUE)
  if(identical(sort(my_bet), sort(lottery))) 597 else -3
})


mean(event)

[1] -1.248

3.14.6 Part (f)

$1 1-Off lets you win if some (*of your digits are off by 1 (9 and 0 are considered to be one off from each other). If you get the number exactly correct, you win $300. If you have one digit off by 1, you win $29. If you have two digits off by 1, you win $4, and if all three of your digits are off by 1 you win $9.

This case is a combination of the previous cases. So, let’s calculate the expected value:

E[X] = (300 - 1) \cdot \frac{1}{1000} + (29 - 1) \cdot \frac{6}{1000} +

(4 - 1) \cdot \frac{6}{1000} + (9 - 1) \cdot \frac{2}{1000} - 1 \cdot \left(1 - \frac{15}{1000}\right) = -0.484

We can simulate this in R as follow:

event <- replicate(100000, {
  my_bet <- sample(x = 0:9, size = 3, replace = TRUE)
  lottery <- sample(x = 0:9, size = 3, replace = TRUE)
  
  diff <- abs(my_bet - lottery)

  if(identical(my_bet, lottery)) 299
  else if(sum(diff) == 1) 28
  else if(all(diff <= 1) && sum(diff) == 2) 3
  else if(all(diff <= 1) && sum(diff) == 3) 8
  else -1

})

mean(event)

[1] -0.38679

3.15 Problem 3.15

Let k be a positive integer and let X be a random variable with pmf given by p(x) = 1/k for x=1,...,k and p(x) = 0 for all other values of x. Find E[X]

We can use the definition of the expected value:

E[X] = \sum_{x=1}^k x \cdot \frac{1}{k} = \frac{1}{k} \sum_{x=1}^k x

E[X] = \frac{1}{k} \cdot \frac{k(k+1)}{2} = \frac{k+1}{2}

3.16 Problem 3.16

Suppose you take a 20-question multiple choice test, where each question has four choices. You guess randomly on each question.

What is your expected score?
What is the probability you get 10 or more questions correct?

3.16.1 Part (a)

This is a Binomial Random Variable with n=20 and p=0.25. So, the expected value is:

E[X] = np = 20 \cdot 0.25 = 5

We can simulate this in R as follows:

event <- replicate(100000, {
  my_answers <- sample(x = 0:3, size = 20, replace = TRUE)
  correct_answers <- sample(x = 0:3, size = 20, replace = TRUE)
  sum(my_answers == correct_answers)
})

mean(event)

[1] 4.99481

But also we can use the built in function rbinom:

mean(rbinom(100000, size = 20, prob = 0.25))

[1] 4.99401

3.16.2 Part (b)

The probability of getting 10 or more questions correct is:

P(X \geq 10) = 1 - P(X < 10) = 1 - \sum_{x=0}^{9} \binom{20}{x} \cdot 0.25^x \cdot 0.75^{20-x}

The calcs are quite tedious for this one, so lets go stright to the code:

1 - sum(dbinom(0:9, size = 20, prob = 0.25))

[1] 0.01386442