Theory of Inference 2015/16 (MATH 35610, MATH M0022)

This course has a mission statement!

Helping people to make better choices under uncertainty.


Lecturer: Jonathan Rougier, j.c.rougier@bristol.ac.uk
Level: H/6 and M/7, 10cp, TB2
Official unit page: level H/6, level M/7
Timetable: 1200-1250 Wed, Maths SM3
1500-1550 Thu, Maths SM4
1000-1050 Fri, Maths SM4
Please note: All lectures start promptly on the hour and last for fifty minutes.
Office Hour: 1300-1350 Thu, Maths PC1


Course outline

The basic premise of inference is our belief that the things we would like to know are related to other things that we can measure. This premise holds over the whole of the sciences. The distinguishing features of statistical science are
  1. A probabilistic approach to quantifying uncertainty, and, within that,
  2. A concern to assess the principles under which we make good inferences, and
  3. The development of tools to facilitate the making of such inferences, particularly those using data.
This course illustrates these features at a high level of generality, while also covering some of the situations in science and policy where they arise. See Details for more information.

Reading

There is a comprehensive handout for this course. The following are just suggestions if you are interested in following up the topics in the lectures.

For statistical theory, at the appropriate level of difficulty, see:

For the use of probability and statistics in society:

Comment on the exam

Previous exam papers are available on Blackboard. You should be aware that the course continues to evolve, and these questions cannot be taken as a reliable guide to the questions that will be set this year.

Answers to previous exam papers will not be made available. The exam is designed to assess whether you have attended the lectures, read and thought about your lecture notes and the handouts, done the homework, and read a bit more widely in the textbooks. Diligent students who have done the above will gain no relative benefit from studying the answers to previous exam questions. On the other hand, less diligent students may suffer the illusion that they will do well in the exam, when probably they will not.

Instead, I will supply 'exam-style' questions in the homeworks for revision purposes.

Finally, please note that in the exam ALL questions will be used for assessment. The number of marks will not necessarily be the same for each question.

Course details

Here is a summary of the course, looking as far ahead as seems prudent. This plan is subject to revision. There will be some time at the end for revision of the major themes.

  1. Foundational issues A handout, extracted from a book chapter. You must read sections 1.7.3 and 1.8. You might find section 1.9 interesting; Theorem 1.12 is beautiful and mysterious.

    Updated handout. This is ch2 of my unfinished book on Statistical Inference; I have made some improvements on the handout above. The course material is in sections 2.4 and 2.5. There is a fairly complete proof of Stiemke's Theorem in section 2.8.

    9 Mar. Chatty introduction. The power of abstraction in maths. Modern maths: the manipulation of symbols according to rules. Probability theory in the abstract. What not to say when someone asks you for a probability. The need for our probabilities (which are numbers) to have clear meanings that correspond, where possible, to the 'folk' understanding. Transparency and defensibility in the public arena.

    10 Mar. Three questions: (i) what is a probability? (ii) what are its properties? (iii) why these properties and not others? The Laws of Probability, the betting interpretation of probability, which applies to all propositions. Having a betting rate p = Pr(A) indicates a willingness to enter into a contract to gain 1-p if A is true, and to lose p if A is false (and other similar contracts). Coherence: it is irrational to give money to a bookmaker (but not irrational to give money to a charity). Vector inequalities, Stiemke's Theorem.
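    The incoherence idea from this lecture can be sketched numerically. The snippet below is an illustration, not course material: it takes the betting contract as defined above (payoff 1 - p if A is true, -p if A is false) and shows that rates violating the Laws of Probability, here Pr(A) + Pr(not A) = 1.2, produce a sure loss in every state of the world.

```python
# Sketch: incoherent betting rates guarantee a sure loss (a 'Dutch book').
# A betting rate p on proposition A means accepting the contract with
# payoff 1 - p if A is true and -p if A is false, i.e. payoff 1_A - p.

def payoff(rates, outcome):
    """Total payoff from holding the contract for every proposition,
    where rates[prop] is a betting rate and outcome[prop] is True/False."""
    return sum((1.0 if outcome[prop] else 0.0) - p
               for prop, p in rates.items())

# Incoherent rates: Pr(A) + Pr(not A) = 1.2 > 1, violating the Laws.
rates = {"A": 0.6, "not A": 0.6}

for a in (True, False):
    outcome = {"A": a, "not A": not a}
    print(a, round(payoff(rates, outcome), 10))  # -0.2 in both states
```

Whatever the truth of A, the holder of both contracts loses 0.2: exactly the irrationality that coherence rules out.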

    11 Mar. The Dutch Book Theorem: betting rates are coherent if and only if they obey the Laws of Probability. An elegant proof using Stiemke's Theorem. To be left for the homework: the same thing, except for conditional probabilities.

  2. Styles of inference

    16 Mar. Predictands, observables, and observations. Two questions: (i) how do we represent our beliefs about uncertain quantities? and (ii) how do we update those beliefs using the observations? Theorem: conditionalisation is coherence-preserving. So as long as we can construct a joint distribution over predictands and observables, we have answers to both of these questions.

    17 Mar. The challenge of specifying a joint distribution. Sometimes causal reasoning can help. Disease testing, where the disease status is the cause of the test outcome. What to ask your doctor when your test comes back positive: what is the test sensitivity? what is the test specificity? what is the base rate for this disease for people like me? Your doctor's job is to know the answer to these questions. "Extending the conversation" (D. Lindley). Statistical modelling often involves introducing additional quantities (parameters) which have the effect of simplifying the joint distribution through conditional independence.
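    The disease-testing calculation can be sketched as below. The three numbers are the ones the lecture says to ask your doctor for; the particular values used here are invented for illustration, not real clinical figures.

```python
# Sketch: Pr(disease | positive test) via Bayes' theorem, from the
# sensitivity, the specificity, and the base rate. All three values
# below are illustrative assumptions.

sensitivity = 0.99   # Pr(test positive | disease)
specificity = 0.95   # Pr(test negative | no disease)
base_rate   = 0.001  # Pr(disease) for 'people like me'

# total probability of a positive test
p_pos = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)

# Bayes' theorem
posterior = sensitivity * base_rate / p_pos
print(round(posterior, 4))  # about 0.0194
```

Even with an accurate test, the low base rate keeps the posterior probability under 2 percent: most positives are false positives.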

    There will be a homework on this topic to be contemplated over the Easter vacation.

    13 Apr. Recap, on the challenge of 'meaning'. And now we are going to move on! Specifying a 'family' of distributions for X using a statistical model f and a parameter space Ω; written X ~ {f, Ω}. Prediction versus inference. Different tribes of statisticians: Frequentists, 'Likelihood-ists', and Bayesians (neo-Bayesians and modern Bayesians). Bayesians can do both prediction and inference, but at the cost of specifying a 'prior distribution' over the parameter space. Frequentists and Likelihood-ists only really do inference. They can do without a prior, but they must be very careful to avoid the kinds of difficulties which would be automatically avoided were a prior to be used.

  3. Statistical Decision Theory

    Here is the handout.

    14 Apr. Introduction to Decision Theory. The Bayes Rule, and the Bayes Rule Theorem. Decision Theory as a framework for statistical inference: distinguishing between point estimation, set estimation, and hypothesis testing in terms of action sets. The need for loss functions.
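    A minimal sketch of the Bayes Rule idea, with made-up numbers: the Bayes action minimises posterior expected loss, and under zero-one loss with the action set equal to the parameter space this reduces to the posterior mode.

```python
# Sketch: the Bayes action minimises posterior expected loss.
# The posterior probabilities below are invented for illustration.

posterior = {"theta1": 0.2, "theta2": 0.5, "theta3": 0.3}

def loss(action, theta):
    return 0.0 if action == theta else 1.0   # zero-one loss

def expected_loss(action):
    return sum(loss(action, th) * p for th, p in posterior.items())

bayes_action = min(posterior, key=expected_loss)
print(bayes_action)  # "theta2", the posterior mode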

    15 Apr. Wald's Complete Class Theorem: a decision rule is admissible if and only if it is a Bayes Rule for some strictly positive prior probabilities. How to prove that your decision rule is admissible. What questions the client should ask the statistician.

    20 Apr, 9am. Just to be clear: not every inadmissible decision rule is dominated by every admissible rule. The complicated situation of choosing rules. Point estimation, where the action set is the parameter space. Different loss functions: asymmetric, zero-one.

    20 Apr, 12 noon. A generic loss function: convexity. Why the quadratic loss function approximates a convex loss function. Proof that the Bayes Rule for a quadratic loss function is the conditional expectation. Set estimation: the desire to have small estimators with a high probability of containing the 'true' value. Proof of the necessary condition that for a Bayes Rule, no point inside the set should have a smaller f(y; θ) than any point outside the set, for each y. Informally, set estimates should be level sets of f(y; ·).
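    The quadratic-loss result can be checked numerically: over a grid of candidate actions, the expected quadratic loss is minimised at the posterior mean. The posterior below is invented for illustration.

```python
# Sketch: numerical check that the posterior mean minimises expected
# quadratic loss E[(theta - d)^2], as proved in the lecture.
# The discrete posterior is made up for illustration.

posterior = {0.0: 0.1, 1.0: 0.3, 2.0: 0.4, 3.0: 0.2}

def exp_quad_loss(d):
    return sum(p * (theta - d) ** 2 for theta, p in posterior.items())

post_mean = sum(theta * p for theta, p in posterior.items())

# search a grid of candidate actions on [0, 3]
grid = [i / 100 for i in range(301)]
best = min(grid, key=exp_quad_loss)
print(round(post_mean, 6), best)  # 1.7 1.7
```

The grid search lands exactly on the posterior mean, as the theorem requires.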

    21 Apr. Hypothesis testing. The difficulty of specifying a loss function (outside of the rather simplistic zero-one loss function). Using set estimators to do hypothesis tests. 'Accept', 'reject', and 'undecided' have technical meanings. Limitations of using set estimators. One-tailed and two-tailed tests.

    I mentioned 'cheating'; the paper I described is A peculiar prevalence of p values just below .05. The 'crisis of reproducibility' was heralded by Why most published research findings are false (this paper is controversial). For more on reproducibility studies, start here.

    More woeful news on reproducibility: "In other words, we face a replication crisis in the field of biomedicine, not unlike the one we’ve seen in psychology but with far more dire implications."

  4. Confidence sets

    Here is the handout. All of this handout is interesting, but not all of it is examinable. You may want to skip 5.3.1, 5.3.2, 'Bootstrap calibration' in 5.3.3, 5.5.4, 5.6.

    22 Apr. Definition of a confidence procedure: this definition is not negotiable! Exact procedures, coverage, confidence sets. Determining the level of a mapping from y to a subset of Ω is easy, but we need to do it the other way around. An exact confidence procedure which is completely useless.

    23 Apr. The Marginalisation Theorem for confidence procedures. This is the intuitive result, but it has the unfortunate side effect of increasing the coverage of the confidence procedure in an unspecified way. So an exact 95% confidence procedure for θ does not give rise to an exact 95% confidence procedure for φ unless the mapping from θ to φ is bijective. We also discussed the general fact that sets in 3D+ spaces are hard to define and hard to visualise, except by enumeration.

    24 Apr. Families of confidence procedures. The representation theorem, which requires us to understand stochastic dominance. The elegant 'nesting' property. There do exist families of confidence procedures! A family which is exact, but useless. We'll find out why it is a bad choice in the next lecture.

    25 Apr. A good family of confidence procedures, based on Wilks's Theorem. This family has the Level Set Property, which we considered to be a necessary condition for admissibility under the natural loss function. Basically, you find the maximum of the log-likelihood function, drop down a specified amount that depends on the level and the size of the parameter space, and keep all of the θ values whose log-likelihood lies above this threshold. The only problem is that Wilks's Theorem is an asymptotic result which holds under some regularity conditions. That is why it gives a family of confidence procedures which are only 'approximately exact'.
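    The 'drop down from the maximum' recipe can be sketched for a Poisson rate. The data are invented, and the constant 3.841 (the 95% quantile of the chi-squared distribution on 1 degree of freedom) is taken as known; halving it gives the drop that Wilks's Theorem prescribes for a scalar parameter.

```python
import math

# Sketch: an approximate 95% confidence set for a Poisson rate lam,
# built from Wilks's Theorem: keep every lam whose log-likelihood is
# within 3.841/2 of the maximum. The data below are invented.

data = [2, 4, 3, 5, 3, 2, 4, 3]

def loglik(lam):
    return sum(x * math.log(lam) - lam for x in data)  # dropping log(x!)

mle = sum(data) / len(data)   # the Poisson MLE is the sample mean, 3.25
drop = 3.841 / 2              # half the chi-squared 95% quantile, 1 df

grid = [i / 100 for i in range(50, 801)]
conf_set = [lam for lam in grid if loglik(lam) >= loglik(mle) - drop]
print(min(conf_set), max(conf_set))  # an interval containing the MLE
```

Because the log-likelihood is unimodal here, the retained values form an interval around the MLE, which is exactly the Level Set Property in action.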

    4 May. P-values for confidence procedures. Reminder about hypothesis tests with confidence procedures: simple or composite, and, if composite, zero-measure or positive measure. 'Reject' or 'not reject' at a Type 1 error level of 5% (if zero measure) is just one bit of information. P-values give more. If the P-value of H0 is 0.001 then you might have the following dialogue (with yourself). "For this y, the 95% confidence set was a long way from H0; I had to increase the level to 99.9% before it touched. Therefore this y does not support H0." Definition of the P-value p(y; H0). We went on to prove that p(Y; θj) was super-uniform for all θj ∈ H0.

    5 May. P-values in general. Definition: a significance procedure is a statistic which is super-uniform under a specified simple hypothesis (generalizations to composite hypotheses). How to construct a significance procedure starting from a test statistic. t(y) = c is an example of a meaningless test statistic which, nonetheless, gives a well-defined significance procedure. Why P-values? We started to address this but will come back to it next time.

    6 May. More on P-values. Two different ways to compute a P-value for f0 = Poisson(λ = 3). Two good things to say about P-values: they can be tuned to detect decision-relevant departures from the null model, and they can handle more general situations than confidence procedures. How to interpret a P-value: Fisher's dichotomy. But f0 is never true, so what does this imply? The 'sweet spot' viewpoint.
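    As an illustration of how two P-value constructions can disagree for the same null, here are two standard recipes for f0 = Poisson(λ = 3) at an observed count y = 0. These are my own illustrative choices and may not be the exact two from the lecture.

```python
import math

# Sketch: two P-values for the null f0 = Poisson(lam = 3), at an
# observed count y = 0. Both are standard constructions, chosen here
# for illustration.

def pmf(x, lam=3.0):
    return math.exp(-lam) * lam ** x / math.factorial(x)

y = 0  # an illustrative observation

# (a) one-tailed P-value: Pr(X <= y) under f0
p_tail = sum(pmf(x) for x in range(y + 1))

# (b) likelihood-ordering P-value: total probability of every outcome
#     no more probable than y under f0 (mass beyond x = 60 is negligible)
p_lik = sum(pmf(x) for x in range(60) if pmf(x) <= pmf(y))

print(round(p_tail, 4), round(p_lik, 4))  # 0.0498 0.0833
```

The likelihood-ordering version also picks up the far right tail (counts of 7 and above are no more probable than 0 under this null), so the two recipes give genuinely different answers for the same data.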

Homework and assessment

Homework

There will be a homework every week. Hand-in dates and hand-back dates will usually be Wed 5pm for hand-in, and Fri 10am for hand-back. Hand in at the box in the Maths Dept foyer, or at the lecture.

You are strongly encouraged to do the homeworks and to hand in your efforts, to be commented on and marked.

  1. First homework. Hand in by 5pm Wed 16th Mar, in the Theory of Inference box in the foyer.

    Here are the answers.

    Feedback. Q3 was very hard; it carried an exam tariff but I doubt I would set such a question in practice. The two things to watch out for are: (i) you cannot evade the difficulty of defining 'probability' simply by referring to it as 'likelihood', and (ii) relative frequency is not always appropriate in quantifying uncertainty. So although it is tempting to say "In 2000 eruptions like this one, damage will result in 3 of them", all you are doing is passing the burden of meaning onto "eruptions like this one". No one knows what this means!

  2. Second week/vacation homework. Hand in by 5pm Wed 13th Apr.

    Here are the answers.

    Feedback. Q3 needed a bit more context. In particular, the results from Q2 were important because the treatment effect can be embedded in a bijective function of the parameters; for this reason, its MLE is equal to the difference of the MLEs of the two μ's. This is quite a deep result, because it does not apply to all functions of the parameters, but only a subset of them.

  3. Third week homework.

    Here are the answers.

    Feedback. The questions were generally done well, although in all cases more thoughtful comments would have been appreciated. There are other things in the sky besides ash. The low specificity (high number of false positives) could be due to mistaking clouds for ash. In the final question, you can infer that the optimal action is SAFE no matter what the test. This is partly because of the low base rate, but also partly because of the low specificity (as you can check by trying out different values).

  4. Fourth week homework.

    Here are the answers.

    Feedback. In Q1 it is necessary to answer all the parts fully. In Q3 and Q4 it is necessary to state and use the Bayes Rule theorem to derive the Bayes rule, and then to discuss the result. It is not possible to draw loss functions in 2D, and it is best not to try!

  5. Fifth week homework.

    Here are the answers (minor error in picture corrected).

    Feedback. These questions were quite challenging, which is why they have a high tariff. It may be hard to get full marks on challenging questions, but there are lots of marks for a good effort! When faced with a challenging question, about half the marks are for explaining the concepts and knowing what you need to show. So somewhere in your answer there should be a "We need to show that ...".

  6. Sixth week homework.

    Here are the answers. They are a bit rough! All suggestions welcome.

Assessment (Level 7/M)

There is one assessment, counting 20% towards the final mark. This will be set in about Teaching Week 23, with the deadline two weeks later. You may want to consult the University regulations on assessed work and the Science Faculty regulations on late submissions.
  1. Here is the assignment.