The course instructor is Simon Wood (Fry GA.05). There are 2 lectures (see below) and one computer lab based tutorial (Tuesday 12 - see BB page) each week. 20% of the course mark will be from an assessed practical project. The remaining 80% comes from a 2.5 hour exam consisting of 4 questions (no choice): an example is provided below. Tutorial sheet questions provide the best preparation for the exam. Office hours are Tuesday 13-14 (see BB page).

- p-values and testing Section 8.5, also regularity conditions and test inversion.
- AIC Section 8.9 on Akaike's Information Criterion.
- Bayesian MCMC Sections 9-9.6 on Bayesian stochastic simulation by Metropolis-Hastings Markov chain Monte Carlo.
- Gibbs sampling Sections 9.7-9.11, also covering convergence diagnostics and credible intervals.
- DAGs, Gibbs and JAGS Section 10.
- More Bayes Section 11. Point estimation and model comparison.
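As a small taste of the Metropolis-Hastings material, here is a minimal random-walk sampler sketch. The N(0,1) target and unit proposal step are my own toy choices for illustration, not taken from the notes:

```r
## Toy random-walk Metropolis-Hastings sampler targeting N(0,1).
## Target density and proposal step size are illustrative choices only.
set.seed(1)
n <- 10000
x <- numeric(n)
x[1] <- 0                                    ## start at the mode
for (i in 2:n) {
  prop <- x[i - 1] + rnorm(1, sd = 1)        ## symmetric random-walk proposal
  ## log acceptance ratio: log target at proposal minus log target at current
  log.alpha <- dnorm(prop, log = TRUE) - dnorm(x[i - 1], log = TRUE)
  x[i] <- if (log(runif(1)) < log.alpha) prop else x[i - 1]
}
mean(x); sd(x)   ## should be roughly 0 and 1
```

Plotting `x` against iteration (a trace plot) is the usual first check that the chain is mixing, which connects to the convergence diagnostics covered later.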

Please install R on your own computer from CRAN, and the R package 'rjags' from the same place. 'rjags' also requires that you install JAGS as standalone software on your computer.
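In R, the package installation step looks like the following (install JAGS itself first, since 'rjags' links against it; if JAGS is missing, `library(rjags)` will fail with a load error):

```r
## One-off install of the rjags package from CRAN (requires JAGS installed first).
install.packages("rjags")
## If this loads without error, the setup is working:
library(rjags)
```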

- Lecture notes. These may be updated as the course progresses.
- Matrix notes. Essential revision notes on matrices.
- Core Statistics is a short textbook covering the material in this course, along with background and extensions. Chapter one should be reviewed for the essential background assumed by the course.
- CRAN is the place to get your own (highly recommended) free copy of the R statistical computing language and environment used in the course. Notice the *Documentation* section on the left of the CRAN page --- it has a lot of useful information.
- The JAGS user manual is a handy reference for when we cover Bayesian computation.
- D.R. Cox (2006) *Principles of Statistical Inference* also covers much of the material in the course.
- A.C. Davison (2003) *Statistical Models* covers everything we cover and much more.
- G. Casella and R.L. Berger (1990) *Statistical Inference* covers many of the topics in greater mathematical depth.
- Daniel Kahneman's *Thinking Fast and Slow* has a lot of interesting things to say about statistical reasoning, and the built-in flaws in how we humans tend to reason and make inferences.

Need to brush up on basic matrix algebra etc? Try

- Basic Matrix Algebra
- Simple matrix differentiation
- Simple explanation of multivariate Taylor expansion

Exam papers from before 2018 are not a good guide to the course exam, because the course content has been modified to increase the emphasis on practical application of statistical inference theory (e.g. the introduction of weekly labs and assessed coursework). While studying your notes, attending labs and handing in work for marking remain the best ways to prepare for the exam, here is an example exam (with solutions) to indicate roughly what to expect in terms of paper style and question format. Here is the 2019 exam and solution.

Given that the remaining lectures are now online (see above) I have uploaded all the remaining tutorial sheets, so that you can attempt them early if you work through the material early. In the current situation I would strongly recommend working online with your project group on tutorial problems. If you help someone else, the act of explaining is better than anything else at consolidating your knowledge. If you need help from someone else, the benefits are obvious. If you finish a tutorial sheet early and want the solutions, just email me with the subject line 'TOI solutions request'. There is an online computer lab/tutorial session at 12 on Tuesdays (see Blackboard page). To get tutorial work marked, upload it to the course Blackboard page as detailed there, by the end of Thursday.

- Tutorial sheet 1 (with solutions) is about making sure that you are up to speed with essential background.
- Tutorial sheet 2 (with solutions) is about practical inference with linear models.
- Tutorial sheet 3 (with solutions) covers some more inference with linear models.
- Tutorial sheet 4 (with solutions) covers some causal inference and linear model theory.
- Tutorial sheet 5 (with solutions) covers use of MLE theory.
- Tutorial sheet 6 (with solutions) covers more use of MLE theory.
- Tutorial sheet 7 (with solutions) covers more MLE.
- Tutorial sheet 8 (with solutions) covers an MCMC exercise and some critical interpretation.
- Tutorial sheet 9 (with solutions) covers Gibbs sampling with JAGS.
- Tutorial sheet 10 (with solutions) covers Gibbs sampling with JAGS - an extended example.

Here are some datasets used in labs. To read them directly into R, use something like:

`dat <- read.table("https://people.maths.bris.ac.uk/~sw15190/TOI/confound.txt")`
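If a dataset has a header row of column names, you may also need `header=TRUE`. A self-contained illustration (the file here is a temporary one created just for the example, not a course dataset):

```r
## Write a tiny whitespace-separated table to a temporary file, then read it back.
tf <- tempfile(fileext = ".txt")
writeLines(c("x y",
             "1 2.5",
             "2 3.1"), tf)
dat <- read.table(tf, header = TRUE)  ## header=TRUE uses the first row as column names
str(dat)   ## two columns, x and y
```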

CORRECTION: in class I said that the summation up to K in the model started at 1. For M0019 only, this is wrong: the summation should start at 0, or the form used in the slightly modified sheet below should be used.

The coursework assignments are here:

- MATH35600 practical.
- MATHM0019 practical.
- MATHM0019 alternative practical (only use this if already discussed with Arne Kovac).

In contrast to the above paper, some statistical shockers are being used to argue for the current policy. For example, an article in the Guardian argued that there would be no life lost as a result of the large recession likely to be caused by the current measures, and that the weight of evidence suggested that life expectancy would actually be increased by recession. Following the link for this 'weight of evidence' takes you to a single paper. Unfortunately the full text is behind a paywall, but there is a summary. Hopefully after our discussions on confounding and causality you can see what is wrong with going from their results to their conclusion (unless you can't think of anything else that might have changed over time and impacted life expectancy, in which case it's all fine). Sadly this kind of problem is not rare, although this particular example is rather spectacular. Note that I can't tell you for certain that a recession will lead to substantial loss of life, but there is a 5-8 year difference in life expectancy between the richest and poorest in the UK, and data comparing regions of the US show a 2/3 of a year reduction in life expectancy for each percentage point increase in unemployment. It's quite a gamble (with other people's lives) to simply assume that all these effects are down to confounding, which is what those saying there will be no life lost to recession are doing. In fact the gap between life expectancy of the top 10% and bottom 10% in the UK grew by 20 weeks in the aftermath of the 2008 financial crisis, an increase that is difficult to ascribe to confounding (see here for the data). That's a loss of life of 2 weeks per person if it were averaged over everyone, rather than being concentrated in the poorest group. 2 weeks is about the average loss of life per person expected if you did nothing to mitigate the covid-19 pandemic.
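The averaging step in that paragraph is easy to check directly, under the stated assumption that the 20 week growth in the life expectancy gap is concentrated in the poorest 10% of the population:

```r
## Spread a 20 week life expectancy loss, concentrated in the poorest 10%,
## evenly over the whole population (assumption from the text above).
gap.growth <- 20   ## weeks lost, bottom 10% relative to top 10%
share      <- 0.1  ## fraction of the population bearing the loss
gap.growth * share ## 2 weeks average loss per person
```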

Part of the current mood is down to a problem in communicating risks to non-specialists. An excellent attempt to redress this is the More or Less Corona special (although you may have the feeling that the first and last contributors have not quite managed to communicate with each other). David Spiegelhalter finds an excellent way of communicating what the Corona risks really mean to individuals - that if you get the disease (and show symptoms) you will essentially have a year's worth of your risk of dying compressed into a couple of weeks. So it is a risk you'd avoid if you can, but probably not with measures out of all proportion to those you usually take to avoid your risk of dying in a year. It's also not clear to me why we would take measures to protect others out of all proportion to the measures we are prepared to take to protect them over a year (and I routinely vote to spend more of my money on protecting others). Let me know if you have thoughts on that one. My guess is that the explanation is to be found in Daniel Kahneman's *Thinking Fast and Slow*, which would make better lockdown reading than anything I can teach you.

My minuscule contribution to the public discussion is here, while this article makes a number of interesting points, including the one that even the data on 'deaths from' corona are not what they might seem. Not reported there is that one of the big problems with reported data on death rates, in particular, is that while almost 100 percent of deaths are known and reported, only a small proportion of the cases is known. The only country that is close to attempting to rectify this is Iceland, where they have a testing programme aiming to directly get at true disease prevalence. It's still not fully randomized, so we still have confounding problems, but if we treat their data as coming from a near random sample we get some idea of the size of the problem. At time of writing Iceland has 890 cases having tested 3 percent of its population and has had 3 deaths, generally reported as a 0.3 percent crude death rate. But the 890 cases come from 3 percent of the population and the deaths from the whole population. If the 3 percent were a truly random sample, then there would have to be about 30000 cases in the whole population, suggesting a crude death rate of 1 in 10000. A more careful accounting using the published information from Iceland suggests 3-9000 cases in addition to the known ones, suggesting a crude death rate considerably below 1 in 1000. I have written 'crude' death rate here, because although it is what is commonly reported it neglects disease duration effects that matter.
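The scaling-up step for the Iceland figures is a one-line calculation, under the (strong) assumption that the 3 percent tested were a random sample:

```r
## Scale Iceland's tested-sample case count up to the whole population,
## assuming the 3% tested were a random sample (assumption from the text).
cases.tested <- 890
frac.tested  <- 0.03
deaths       <- 3
cases.pop <- cases.tested / frac.tested  ## about 30000 implied cases
deaths / cases.pop                       ## about 1 in 10000 crude death rate
```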

The 2019 coursework assignments are here:

- MATH35600 practical on Grouse and Hen Harriers.
- MATHM0019 practical on blowflies.

The 2018 course work is here:

- MATH35600 practical (2018)
- MATHM0019 practical (2018)
- Example of a nice practical report. This is nice because it is well structured and convincingly argued, it targets the intended audience given in the assignment sheet well, and the work itself is well done. It is not intended to be used as a template for writing a report!!