Submit your work as a PDF, a Python notebook, or both if you want to separate your code from your report. Please type up your answers using Google Docs, LaTeX, Jupyter notebooks, CoCalc, or any other software that allows you to type text and math. Make sure your code is readable and commented.
Show your work for all exercises! Do not simply turn in final answers.
1. Call center data modeling
Complete the call center data modeling assignment we started in the Preclass work and Activity 2 breakouts of Session 2.2. You may reuse and build on all code or any other work from the class session.
In class we completed the Bayesian data modeling problem for 1 hour of the day. In this assignment you need to do the same analysis for all 24 hours of the day.
Compute a 95% posterior confidence interval over the number of calls per minute (the call rate λ) for each hour of the day — so you will have 24 confidence intervals. Also compute the posterior mean of λ for each hour of the day.
Present your results graphically using Matplotlib. Make a plot that looks like the one below. Each dot is at the posterior mean and each line shows a 95% confidence interval for a λ. You can use the errorbar() function in the plotting library to do this.
Write a paragraph (100–200 words) to accompany your plot and present your findings to the client. Carefully summarize how many calls you expect during different parts of the day, and how much uncertainty there is in your estimates. Remember that the client is not an expert in statistics, so make it easy for them to understand. You may also make additional plots to help communicate your results.
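The interval and plotting steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the class solution: it assumes each hour's data are waiting times (in minutes) between calls, modeled with an exponential likelihood and a Gamma prior on λ, so that the posterior is again a Gamma distribution. The data here are synthetic, and the names `waiting_times_by_hour`, `ALPHA_0`, `BETA_0`, `posterior_summary`, and `plot_rates` are hypothetical. Replace the synthetic data and the prior with the ones from the class session.

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the class data set): one array of waiting
# times in minutes per hour, generated from made-up call rates.
hours = np.arange(24)
made_up_rates = 2 + 10 * np.exp(-0.5 * ((hours - 12) / 4) ** 2)
waiting_times_by_hour = [
    rng.exponential(scale=1 / rate, size=int(60 * rate)) for rate in made_up_rates
]

# Assumed prior hyperparameters for lambda ~ Gamma(alpha, rate=beta).
ALPHA_0, BETA_0 = 1.0, 0.25


def posterior_summary(waiting_times, alpha0=ALPHA_0, beta0=BETA_0):
    """Posterior mean and central 95% interval for the call rate lambda."""
    # Conjugate update for an exponential likelihood with a Gamma prior.
    alpha = alpha0 + len(waiting_times)
    beta = beta0 + np.sum(waiting_times)
    posterior = stats.gamma(a=alpha, scale=1 / beta)  # SciPy uses scale = 1/rate
    lower, upper = posterior.ppf([0.025, 0.975])
    return posterior.mean(), lower, upper


summaries = np.array([posterior_summary(w) for w in waiting_times_by_hour])
means, lowers, uppers = summaries.T


def plot_rates(means, lowers, uppers):
    """Dot at each posterior mean, vertical line for each 95% interval."""
    fig, ax = plt.subplots(figsize=(8, 4))
    # errorbar expects distances below and above the point, not the endpoints.
    yerr = np.vstack([means - lowers, uppers - means])
    ax.errorbar(np.arange(len(means)), means, yerr=yerr, fmt="o", capsize=3)
    ax.set_xlabel("Hour of the day")
    ax.set_ylabel("Call rate λ (calls per minute)")
    ax.set_title("Posterior mean and 95% interval per hour")
    return fig, ax


fig, ax = plot_rates(means, lowers, uppers)
```

In a notebook the figure displays automatically; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.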
Stretch goal (optional)
1. Reparameterize the normal likelihood function in terms of the precision parameter, τ = 1/σ².
2. Prove that if you substitute σ = 1/√τ into the normal-inverse-gamma pdf, you get the normal-gamma pdf, which is the conjugate prior for the normal likelihood with unknown mean μ and precision τ.
3. As part of this transformation you will have to multiply by |dσ/dτ|. Explain why this is needed in your own words, so that a student who has not yet encountered this concept can understand it.
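Before writing the proof, it can help to check numerically that a change of variables needs the derivative factor. The sketch below does this for a simpler one-dimensional analogue, not for the full normal-inverse-gamma case: if σ² follows an inverse-gamma distribution with shape α and scale β, then τ = 1/σ² follows a gamma distribution with shape α and rate β, but only once the substituted density is multiplied by |dσ²/dτ| = 1/τ². The values of `alpha` and `beta` are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative hyperparameters (assumptions, not from the assignment).
alpha, beta = 3.0, 2.0
tau = np.linspace(0.1, 5.0, 50)

# Step 1: substitute sigma^2 = 1/tau into the inverse-gamma pdf.
substituted = stats.invgamma.pdf(1 / tau, a=alpha, scale=beta)

# Step 2: multiply by the absolute derivative |d(sigma^2)/d(tau)| = 1/tau^2.
jacobian = 1 / tau**2
transformed = substituted * jacobian

# Target: gamma pdf with shape alpha and rate beta (SciPy uses scale = 1/rate).
target = stats.gamma.pdf(tau, a=alpha, scale=1 / beta)

matches_with_jacobian = np.allclose(transformed, target)
matches_without_jacobian = np.allclose(substituted, target)
print(matches_with_jacobian, matches_without_jacobian)
```

The substituted density alone does not integrate to 1 over τ; the derivative factor accounts for how the substitution stretches or compresses intervals on the axis.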
More practice exercises (optional)
Below are additional practice exercises for you to attempt. These are optional and you can choose to do as many or as few as you want. These exercises will not be graded.
If you get stuck on any of them, contact your instructor with specific questions via email and during office hours. Just saying “I’m stuck” is not enough — explain what you tried and where you got stuck so your instructor can understand your thinking and where you might have missed something or made a mistake.
Answer the following questions using Python.
Generate 1000 samples from a normal distribution with mean 100 and standard deviation 10. How many of the numbers are at least 2 standard deviations away from the mean? How many do you expect to be at least 2 standard deviations away from the mean?
Toss a fair coin 50 times. How many heads do you have? How many heads do you expect to have?
Roll a 6-sided die 1000 times. How many 6s did you get? How many 6s do you expect to get?
How much area (probability) is to the right of 1.5 for a normal distribution with mean 0 and standard deviation 2?
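One possible way to set up these simulations is sketched below, using NumPy's random generator and SciPy's normal distribution. The seed and variable names are arbitrary choices; your observed counts will differ from run to run unless you fix a seed, while the expected values do not change.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible

# Normal samples: count values at least 2 standard deviations from the mean.
samples = rng.normal(loc=100, scale=10, size=1000)
observed_far = int(np.sum(np.abs(samples - 100) >= 2 * 10))
expected_far = 1000 * 2 * stats.norm.sf(2)  # two tails, about 45.5

# Fair coin: 1 represents heads, 0 represents tails.
heads = int(rng.integers(0, 2, size=50).sum())
expected_heads = 50 * 0.5

# Fair 6-sided die: integers 1 through 6 (upper bound is exclusive).
rolls = rng.integers(1, 7, size=1000)
sixes = int(np.sum(rolls == 6))
expected_sixes = 1000 / 6

# Area to the right of 1.5 under Normal(mean=0, sd=2), via the survival function.
area_right = stats.norm.sf(1.5, loc=0, scale=2)

print(observed_far, expected_far)
print(heads, expected_heads)
print(sixes, expected_sixes)
print(area_right)
```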
Let y be the number of 6s in 1000 rolls of a fair die.
Draw a sketch of the approximate distribution of y, based on the normal approximation.
Using the normal distribution function in SciPy, give approximate 5%, 25%, 50%, 75%, and 95% points for the distribution of y.
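A minimal sketch of the quantile calculation, assuming the usual normal approximation to a binomial count: y is approximately normal with mean np and standard deviation √(np(1 − p)), where n = 1000 and p = 1/6. The variable names are illustrative.

```python
import numpy as np
from scipy import stats

n, p = 1000, 1 / 6

# Mean and standard deviation of the binomial count of 6s.
mean = n * p
sd = np.sqrt(n * p * (1 - p))

# Percent point function (inverse CDF) of the approximating normal.
probs = [0.05, 0.25, 0.50, 0.75, 0.95]
quantiles = stats.norm.ppf(probs, loc=mean, scale=sd)

for prob, q in zip(probs, quantiles):
    print(f"{prob:.0%} point: {q:.1f}")
```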
A random sample of n students is drawn from a large population, and their weights are measured. The average weight of the sampled students is ȳ = 75 kg. Assume the weights in the population are normally distributed with unknown mean μ and known standard deviation 10 kg. Suppose your prior distribution for μ is normal with mean 180 and standard deviation 40.
Give your posterior distribution for μ. (Your answer will be a function of n.)
A new student is sampled at random from the same population and has a weight of y′ pounds. Give a posterior predictive distribution for y′. (Your answer will still be a function of n.)
For n = 10, give a 95% posterior interval for μ and a 95% posterior predictive interval for y′.
Do the same for n=100.
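Once you have derived the posterior by hand, a short script like the one below can be used to check your numbers. It is a sketch that assumes the standard conjugate result for a normal likelihood with known standard deviation and a normal prior on the mean, in which precisions (inverse variances) add. The function names are hypothetical, and the default arguments simply copy the numbers as stated in the exercise.

```python
import numpy as np
from scipy import stats


def posterior_for_mean(n, ybar=75.0, sigma=10.0, prior_mean=180.0, prior_sd=40.0):
    """Posterior mean and standard deviation of mu after n observations."""
    prior_precision = 1 / prior_sd**2
    data_precision = n / sigma**2
    post_var = 1 / (prior_precision + data_precision)
    # Precision-weighted average of the prior mean and the sample mean.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * ybar)
    return post_mean, np.sqrt(post_var)


def intervals(n, sigma=10.0, level=0.95):
    """Central intervals for mu and for a new observation y'."""
    post_mean, post_sd = posterior_for_mean(n, sigma=sigma)
    # A new observation adds the sampling variance to the posterior variance.
    pred_sd = np.sqrt(post_sd**2 + sigma**2)
    mu_interval = stats.norm.interval(level, loc=post_mean, scale=post_sd)
    pred_interval = stats.norm.interval(level, loc=post_mean, scale=pred_sd)
    return mu_interval, pred_interval


for n in (10, 100):
    mu_interval, pred_interval = intervals(n)
    print(n, mu_interval, pred_interval)
```

Comparing the two values of n shows that the interval for μ narrows as n grows, while the predictive interval stays at least as wide as the population spread allows.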
Perfectly and partially observed data in the exponential model.
(a) Suppose y | λ is exponentially distributed with rate λ, and the prior distribution of λ is Gamma(α, β). Suppose we observe that y ≥ 100, but do not observe the exact value of y. What is the posterior distribution, p(λ | y ≥ 100), as a function of α and β? Write down the posterior mean and variance of λ.
(b) In the above problem, suppose that we are now told that y is exactly 100. Now what are the posterior mean and variance of λ?
(c) Explain why the posterior variance of λ is higher in part (b) even though more specific information has been observed.
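To check your analytic answers for the censored and exact cases, you can approximate both posteriors on a grid. The sketch below assumes hypothetical prior values α = 2 and β = 50 (the exercise leaves them symbolic) and uses the fact that for an exponential model P(y ≥ 100 | λ) = exp(−100λ), while the density at y = 100 is λ exp(−100λ). The helper `grid_mean_var` is an illustrative name.

```python
import numpy as np

# Hypothetical prior hyperparameters, chosen only for illustration.
alpha, beta = 2.0, 50.0

# Grid over lambda; the posterior mass beyond 0.5 is negligible for these values.
lam = np.linspace(0, 0.5, 200_001)

# Unnormalized Gamma(alpha, rate=beta) prior density.
prior = lam ** (alpha - 1) * np.exp(-beta * lam)


def grid_mean_var(unnormalized):
    """Mean and variance of lambda under an unnormalized density on the grid."""
    weights = unnormalized / unnormalized.sum()
    mean = np.sum(weights * lam)
    var = np.sum(weights * (lam - mean) ** 2)
    return mean, var


# Censored observation: only y >= 100 is known, so use the survival probability.
censored_mean, censored_var = grid_mean_var(prior * np.exp(-100 * lam))

# Exact observation: y = 100, so use the exponential density at 100.
exact_mean, exact_var = grid_mean_var(prior * lam * np.exp(-100 * lam))

print(censored_mean, censored_var)
print(exact_mean, exact_var)
```

Substitute the same α and β into your closed-form posterior mean and variance and compare them with the printed values.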