Theory of Inference 2015/16 (MATH 35610, MATH M0022)

This unit has a mission statement!

Helping people to make better choices under uncertainty.

Lecturer	Jonathan Rougier, j.c.rougier@bristol.ac.uk
Level	H/6 and M/7, 10cp, TB2
Official unit page	level H/6, level M/7
Timetable	1300-1350, Mon, Maths SM1
	1600-1650, Mon, Maths SM2
	1300-1350, Tue, Maths SM1
	Please note: All lectures start promptly on the hour and last for fifty minutes.
Office Hour	1100-1150, Mon, room 3.0, ST MICHAELS HILL 31-37

Navigation: Course outline, details, homework and assessment.

Click on 'details' to see a lecture-by-lecture summary.

Announcements

Office Hours. Mon 8 May, 11am, PC2. Tue 16 May, 3pm, SM2. Mon 22 May, 2pm, SM2.
May Bank Holiday: rescheduled lectures. The 1 May lectures at 1300 and 1600 have been cancelled. The new lectures are: Tue 2 May 1000 in SM3, and Fri 5 May 1700 in SM2 (these should be in your calendar). The Office Hour on Mon 1 May 1100 is also cancelled. Instead, I will be available on Tue 2 May between 1100 and 1300 in my office; feel free to come along whenever suits.
Level 7/M assignment now set.
Answers and feedback now available for HW2.
Office hour now confirmed (see above). See here for details of the room.
Office hours: Still awaiting confirmation of the time and place.

Course outline

The basic premise of inference is our belief that the things we would like to know are related to other things that we can measure. This premise holds over the whole of the sciences. The distinguishing features of statistical science are

A probabilistic approach to quantifying uncertainty, and, within that,
A concern to assess the principles under which we make good inferences, and
The development of tools to facilitate the making of such inferences, particularly those using data.

This course illustrates these features at a high level of generality, while also covering some of the situations in science and policy where they arise. See Details for more information.

Reading

There is a comprehensive handout for this course. The following are just suggestions if you are interested in following up the topics in the lectures.

For statistical theory, at the appropriate level of difficulty, see

D.R. Cox, 2006, Principles of Statistical Inference, Oxford University Press.
G. Casella and R.L. Berger, 2002, Statistical Inference, 2nd edn, Brooks/Cole.

For the use of probability and statistics in society,

I. Evans et al., 2011, Testing Treatments: Better Research for Better Healthcare, 2nd edn, London, UK: Pinter & Martin. Thought-provoking account of evidence in healthcare.
G. Gigerenzer, 2003, Reckoning with Risk: Learning to Live with Uncertainty, London, UK: Penguin.
S. Senn, 2003, Dicing with Death: Chance, Risk and Health, Cambridge, UK: Cambridge University Press.
D. Kahneman, 2012, Thinking, Fast and Slow, London, UK: Penguin. Awesome, although Ch4 had to be retracted.
P. Tetlock and D. Gardner, 2015, Superforecasting: The Art and Science of Prediction, London, UK: Random House (Penguin).

Comment on the exam

Previous exam papers are available on Blackboard. You should be aware that the course continues to evolve, and these questions cannot be taken as a reliable guide to the questions that will be set this year.

Answers to previous exam papers will not be made available. The exam is designed to assess whether you have attended the lectures, read and thought about your lecture notes and the handouts, done the homework, and read a bit more widely in the textbooks. Diligent students who have done the above will gain no relative benefit from studying the answers to previous exam questions. On the other hand, less diligent students may suffer the illusion that they will do well in the exam, when probably they will not.

Instead, I will supply 'exam-style' questions in the homeworks for revision purposes.

Finally, please note that in the exam ALL questions will be used for assessment. The number of marks will not necessarily be the same for each question.

Course details

Here is a summary of the course, looking as far ahead as seems prudent. This plan is subject to revision. There will be some time at the end for revision of the major themes.

Introduction: Models, prediction, and inference

Statistics, another short introduction

6 Mar, 1pm. Introduction. Random quantities. A statistical model is a family of probability distributions indexed by θ in Ω. Predictands, observables, observations. The challenge of prediction: to provide an algorithm to map from the observations y to a distribution for the predictands X. True endpoints and surrogate endpoints.
6 Mar, 4pm. The fundamental difficulty of science: the things we are interested in are not the same as the things we can measure. The statistical model as a working hypothesis, what we mean by the 'true' value Θ. Two approaches to collapsing to a single distribution for X: plug-in, and integrate-out. Why Frequentists prefer the former, and Bayesians the latter. Why we focus on inference even though our task is prediction.
7 Mar. Bayesian and Frequentist approaches to inference. Bayesians do everything through the posterior distribution; requires them to specify a prior distribution. Frequentists specify an algorithm, and then follow the Principle of certification. 'Unbiased estimator' as an example of a certificate for a point estimator for Θ.
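The two ways of collapsing to a single distribution for X can be sketched in a toy model. This is an illustration only, not taken from the handout: it assumes the observations are y successes in n Bernoulli(θ) trials, the predictand X is the number of successes in m further trials, the plug-in estimate is y/n, and the prior for integrate-out is Beta(a, b). All function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def plug_in_pmf(y, n, m):
    """Plug-in: Binomial(m, theta_hat) pmf, with theta_hat = y / n."""
    theta_hat = y / n
    return [comb(m, x) * theta_hat**x * (1 - theta_hat) ** (m - x)
            for x in range(m + 1)]


def integrate_out_pmf(y, n, m, a=1.0, b=1.0):
    """Integrate-out: Beta-Binomial pmf, averaging Binomial(m, theta)
    over the Beta(a + y, b + n - y) posterior (assumed Beta(a, b) prior)."""
    a_post, b_post = a + y, b + n - y
    return [comb(m, x)
            * exp(log_beta(a_post + x, b_post + m - x) - log_beta(a_post, b_post))
            for x in range(m + 1)]


def pmf_variance(pmf):
    """Variance of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(x * p for x, p in enumerate(pmf))
    return sum((x - mean) ** 2 * p for x, p in enumerate(pmf))


if __name__ == "__main__":
    y, n, m = 3, 10, 5  # made-up data
    print("plug-in variance:      ", pmf_variance(plug_in_pmf(y, n, m)))
    print("integrate-out variance:", pmf_variance(integrate_out_pmf(y, n, m)))
```

With these made-up numbers the integrate-out distribution is more spread out than the plug-in one, because it also carries the uncertainty about θ.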
Principles for statistical inference
Handout, Principles for Statistical Inference.
13 Mar, 1pm. Statistical principles: what can they do for us? The template for how we use principles. Our first example, the Weak Indifference Principle (WIP). Is this self-evident? Maybe not, but it does not matter because we can prove that it is implied by two principles which are self-evident, the Distribution Principle (DP) and the Transformation Principle (TP).
13 Mar, 4pm. The 'two instruments' thought experiment. The Weak Conditionality Principle (WCP) — this is self-evident. The Likelihood Principle — surely this cannot be defended? Birnbaum's bombshell, (WIP ∧ WCP) ↔ LP. All Frequentist inference violates the LP, because certification of an algorithm depends on the probabilities attached to outcomes which did not happen.
14 Mar. Stronger forms of conditionality. Ancillary random quantities and the Conditionality Principle (CP). LP → CP. Nuisance parameters and auxiliary random quantities, the Strong Conditionality Principle (SCP) and the Sure Thing Principle (STP). (LP ∧ STP) → SCP, although we did not prove this in the lecture (the proof is in the handout).
20 Mar, 1pm. Sequential experiments and stopping rules. The Stopping Rule Principle (SRP) — surely that cannot possibly be defended? Although it would be very handy. LP → SRP, one of the most beautiful results in mathematical statistics.
20 Mar, 4pm. Wrap-up on principles. Likelihood-based inference (LBI), more liked by philosophers and physicists than by applied statisticians. LBI respects the LP. Bayesian inference respects the LP (proof to follow tomorrow). Frequentist inference does not respect the LP. So are Frequentists either illogical or obtuse?
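The tension between the LP, stopping rules, and Frequentist certification can be seen in a standard textbook example, sketched here with made-up data (it is not claimed to be the example used in the lectures). Suppose 9 heads and 3 tails are observed, and θ is the probability of heads. Under a fixed-n design (12 tosses) and under a 'stop at the third tail' design the likelihood functions are proportional, so the LP says the inference about θ should be the same. The one-sided p-values for θ = 0.5 against θ > 0.5 nevertheless differ, because they depend on outcomes which did not happen.

```python
from math import comb

HEADS, TAILS = 9, 3  # made-up data for illustration


def binomial_likelihood(theta, heads=HEADS, tails=TAILS):
    """Fixed number of tosses n = heads + tails."""
    return comb(heads + tails, heads) * theta**heads * (1 - theta) ** tails


def negative_binomial_likelihood(theta, heads=HEADS, tails=TAILS):
    """Toss until the `tails`-th tail; the final toss is necessarily a tail."""
    return comb(heads + tails - 1, heads) * theta**heads * (1 - theta) ** tails


def binomial_p_value(heads, tails, theta0):
    """P(at least `heads` heads in heads + tails tosses; theta0)."""
    n = heads + tails
    return sum(comb(n, k) * theta0**k * (1 - theta0) ** (n - k)
               for k in range(heads, n + 1))


def negative_binomial_p_value(heads, tails, theta0):
    """P(at least `heads` heads before the `tails`-th tail; theta0), which is
    P(at most tails - 1 tails among the first heads + tails - 1 tosses)."""
    m = heads + tails - 1
    return sum(comb(m, j) * (1 - theta0) ** j * theta0 ** (m - j)
               for j in range(tails))


if __name__ == "__main__":
    for theta in (0.3, 0.5, 0.7):
        ratio = binomial_likelihood(theta) / negative_binomial_likelihood(theta)
        print(f"theta = {theta}: likelihood ratio = {ratio:.3f}")
    print("binomial p-value:         ", binomial_p_value(HEADS, TAILS, 0.5))
    print("negative binomial p-value:", negative_binomial_p_value(HEADS, TAILS, 0.5))
```

The likelihood ratio is the same constant for every θ, yet the two p-values fall on opposite sides of 0.05 (about 0.073 and 0.033).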
Statistical Decision Theory
Handout, Statistical Decision Theory. This is the full set of notes for this section.
21 Mar. Introduction, choosing an 'Ev' to suit the application and the consequences of a poor decision. The action set, the loss function, and the decision rule. The objective is to find a good decision rule. The Bayes rule. The Bayes Rule Theorem, which makes it easy to compute the Bayes rule for any y. Theorem: Bayes rules respect the Likelihood Principle.
I mentioned David MacKay today. Here is his obituary in the Guardian. His books are freely available: Information Theory, Inference, and Learning Algorithms; and Sustainable Energy Without the Hot Air.
27 Mar, 1pm. Admissible rules. Admissibility as an alternative way to rule out bad decision rules. Wald's Complete Class Theorem (CCT) which proves, under some conditions, that a decision rule is admissible if and only if it is a Bayes rule for some prior distribution.
28 Mar, 4pm. Finish blackboard proof of Wald's CCT. Decision rules for point estimation. The nature of the loss function: convex loss functions as generic loss functions capturing the idea that small errors are tolerable but large ones are not. Some situations where the loss function is not an even (symmetric about 0) function. Quadratic loss as an approximation to even convex loss functions. Conditional expectation is the Bayes rule for quadratic loss.
29 Mar. Set estimation. A loss function which captures the notion that set estimators should be small, and that they should contain Θ. These desiderata conflict. A necessary condition to be a Bayes rule (blackboard proof): the decision rule should be a level set of the conditional distribution. Bayesian high posterior density (HPD) regions satisfy this condition. So do level sets of the likelihood function.
24 Apr, 1pm. Hypothesis testing. The simple Neyman-Pearson case, completely solved. The 0-1 loss function, not a very defensible choice, but it has a simple Bayes rule, to choose the hypothesis with the largest posterior probability. Co-opting the theory of set estimators: this is a precise solution to the wrong problem! 'Accepting', 'rejecting' and 'undecided'. Types of hypothesis: simple versus composite, degenerate versus non-degenerate. Cannot accept a degenerate hypothesis.
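The computational content of the Bayes Rule Theorem (for each y, choose the action which minimises the posterior expected loss) can be sketched numerically. This is a minimal illustration under assumptions that are not in the handout: a finite parameter space, a uniform prior, made-up binomial data, and a fine grid of candidate actions. The function names are hypothetical.

```python
def posterior(prior, likelihood):
    """Posterior probabilities on a finite parameter space."""
    weights = [p * l for p, l in zip(prior, likelihood)]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_action(actions, thetas, post, loss):
    """The action minimising posterior expected loss, for the observed y."""
    def expected_loss(action):
        return sum(p * loss(action, theta) for theta, p in zip(thetas, post))
    return min(actions, key=expected_loss)


def quadratic_loss(action, theta):
    return (action - theta) ** 2


def zero_one_loss(action, theta):
    return 0.0 if action == theta else 1.0


# Assumed toy set-up: three candidate values for theta, uniform prior,
# and y = 7 successes in 10 Bernoulli(theta) trials.
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]
likelihood = [t**7 * (1 - t) ** 3 for t in thetas]
post = posterior(prior, likelihood)
posterior_mean = sum(t * p for t, p in zip(thetas, post))

if __name__ == "__main__":
    grid = [i / 1000 for i in range(1001)]  # point estimates in [0, 1]
    print("posterior mean:      ", posterior_mean)
    print("quadratic-loss action:", bayes_action(grid, thetas, post, quadratic_loss))
    print("0-1 loss action:      ", bayes_action(thetas, thetas, post, zero_one_loss))
```

Up to the resolution of the grid, the quadratic-loss action agrees with the posterior mean, and the 0-1 loss action is the value of θ with the largest posterior probability.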
Confidence procedures and p-values
Handout, Confidence sets.
24 Apr, 4pm. Confidence procedures: definition, coverage, exact and conservative. Reminder: the level set property (LSP). Computing the coverage of an arbitrary function C. Confidence sets. Families of confidence procedures. Super-uniform random quantities; a representation theorem for families of confidence procedures.
25 Apr. Question: is the set of families of confidence procedures with the LSP empty? Answer: no, indeed this set has uncountably many elements. Families based on Markov's inequality. Transformations and marginalisation: the obvious thing works. Coverage is preserved in the case where the transformation is bijective, otherwise coverage typically goes up.
2 May, 10am. P-values for families of confidence procedures. Definition of a p-value: the smallest α for which C(y; α) does not intersect Ω₀, where Ω₀ is the null hypothesis. Interesting fact: the p-value so defined is super-uniform. More general definition of a significance procedure: a statistic which is super-uniform under the null hypothesis. How to make a p-value for a composite null hypothesis out of a family of p-values for simple hypotheses.
2 May, 1pm. The Probability Integral Transform. The uncountable number of families of significance procedures, one for each mapping from the observables to the real line. Choosing a good one: larger values under decision-relevant departures from the model. Interpreting a small p-value. You can definitely say "Were my null hypothesis to be correct, something improbable has occurred". You cannot say "My model is incorrect". Not only is this an illogical deduction, but it is also fatuous: all models are incorrect. The 'crisis of reproducibility', and how ambitious or ignorant humans behave in practice. We should not be surprised.
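Super-uniformity of a p-value under the null can be checked exactly in a small discrete case. The sketch below is an illustration under assumptions not taken from the handout: a simple null hypothesis Y ~ Binomial(n, θ₀), the test statistic is y itself, and the p-value is the upper tail probability p(y) = P(Y ≥ y; θ₀). It verifies that P(p(Y) ≤ u; θ₀) ≤ u on a grid of values of u, with a small tolerance for floating-point rounding.

```python
from math import comb


def binomial_pmf(n, theta):
    """Probabilities of 0, 1, ..., n successes."""
    return [comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(n + 1)]


def upper_tail_p_values(pmf):
    """p(y) = P(Y >= y) under the null, for y = 0, 1, ..., n."""
    p_values, tail = [], 0.0
    for prob in reversed(pmf):
        tail += prob
        p_values.append(tail)
    return p_values[::-1]


def null_cdf_of_p_value(pmf, p_values, u):
    """P(p(Y) <= u) under the null."""
    return sum(prob for prob, p in zip(pmf, p_values) if p <= u)


def is_super_uniform(pmf, p_values, grid_size=1000, tol=1e-12):
    """Check P(p(Y) <= u) <= u for u on a grid in [0, 1]."""
    return all(
        null_cdf_of_p_value(pmf, p_values, i / grid_size) <= i / grid_size + tol
        for i in range(grid_size + 1)
    )


if __name__ == "__main__":
    pmf = binomial_pmf(10, 0.5)
    p_values = upper_tail_p_values(pmf)
    print("super-uniform under the null:", is_super_uniform(pmf, p_values))
```

Because Y is discrete, the inequality is typically strict: the p-value is super-uniform rather than exactly uniform, which is the discrete counterpart of the Probability Integral Transform.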

Homework and assessment

Homework

There will be a homework every week. Hand-in will usually be Mon 5pm, and hand-back Tue 1pm. Hand in using the box in the Maths Dept foyer, or at the lecture.

You are strongly encouraged to do the homeworks and to hand in your efforts, to be commented on and marked.

First homework.
Here are the answers. Modified, 14 Mar. I have added some comments which you should look at, concerning odds ratios and likelihood ratios.
Feedback: Poor technique costs marks. See my feedback comments on the first Bayesian Modelling B homework. Try not to insult your reader with an answer that says "Really, I just don't care enough to make an effort for you."
Second homework.
Here are the answers.
Feedback: Pay close attention to the notation. Capitals for random quantities, and a θ in every expectation which depends on θ. It is very logical once you understand the conventions, and helpful too. Some good answers this week, although the correct ones were not particularly elegant. Try for the most elegant answer you can: not under exam conditions, but whenever you are doing homework. Maths is not just about bashing through the algebra, it is about revealing the essence of a result by saying no more than you have to.
Third homework.
Here are the answers.
Feedback: These questions (2 and some of 3) were hard: quite ambiguous and discursive. Exactly the kinds of questions we must deal with all the time. Don't ever think there is one 'right answer' in statistics. The main challenge for the statistician is to understand the question; and her main contribution is often to help the question's poser to understand his own question a bit better.
Fourth homework.
Answers to questions 1, 2, 3.
Fifth homework: questions 4, 5, 6 from the fourth homework.
Answers to questions 4, 5, 6 as well.
Sixth homework.
Here are the answers.

Assessment (Level 7/M)

There is one assessment, counting 20% towards the final mark. This will be set in about Teaching Week 23, with the deadline two weeks later. You may want to consult the University regulations on assessed work and the Science Faculty regulations on late submissions.

Assignment. Set on 24 April, due 5pm on 8 May.