Regular seminar webpages:
Statistics | Probability | Complexity Science | RSS local group

New faces seminar

Friday 23 October 2009, room SM3, short research talks by new arrivals in the group
  • 3.40 Jonty Rougier: Accounting for the limitations of scientific models
  • 4.10 Nick Whiteley: Monte Carlo filtering of piecewise deterministic processes
  • 4.40 Wine, juice and nibbles
See main statistics seminar page for details.

Past seminars

New faces seminar

Friday 24 April 2009, room SM3, short research talks by new arrivals in the group
  • 3.40 Richard Everitt: Using interval arithmetic to help Monte Carlo algorithms discover modes
  • 4.10 Dan Lawson: Getting information out of your gut: applying hierarchical models to complex MCMC problems
  • 4.40 Wine, juice and nibbles
Abstracts
Richard Everitt
Using interval arithmetic to help Monte Carlo algorithms discover modes

Target densities with narrow, well-separated modes are rarely explored well by MCMC algorithms. Recently in the optimisation literature, interval arithmetic has been rediscovered as a method for locating such modes in a large class of objective functions; however, the use of this idea has mostly been restricted to functions in low dimensions. In this talk we introduce a method for exploiting interval arithmetic in MCMC and demonstrate that this can help MCMC to discover narrow, well-separated modes in situations where other techniques fail. We also show how a simulated annealing style variant can improve on existing interval-based optimisation techniques on high-dimensional objective functions.
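To make the idea concrete, here is a toy sketch of my own, not the algorithm from the talk: for a target that is a mixture of narrow Gaussian bumps, rigorous interval upper bounds can be written down by hand, and a branch-and-bound search over intervals certifies which regions cannot contain appreciable mass and isolates the ones that might. The target, threshold and tolerance below are illustrative assumptions; a real implementation would use a general interval arithmetic library rather than hand-derived bounds.

```python
import math

# Toy target: a mixture of three narrow, well-separated Gaussian bumps
# (purely illustrative; not the targets considered in the talk).
BUMPS = [(-8.0, 0.05, 0.3), (0.5, 0.02, 0.4), (9.0, 0.04, 0.3)]  # (mean, sd, weight)

def upper_bound(a, b):
    """Rigorous upper bound for the target on [a, b]: each bump is
    maximised at the point of [a, b] closest to its mean."""
    ub = 0.0
    for m, s, w in BUMPS:
        d = 0.0 if a <= m <= b else min(abs(a - m), abs(b - m))
        ub += w * math.exp(-0.5 * (d / s) ** 2)
    return ub

def find_mode_regions(lo, hi, tau=1e-3, tol=0.01):
    """Branch and bound: an interval whose upper bound is below tau provably
    contains no point of density tau or more and is discarded; the rest are
    split until narrow, leaving small boxes around the modes."""
    queue, hits = [(lo, hi)], []
    while queue:
        a, b = queue.pop()
        if upper_bound(a, b) < tau:
            continue
        if b - a < tol:
            hits.append((a, b))
        else:
            mid = 0.5 * (a + b)
            queue += [(a, mid), (mid, b)]
    return hits

boxes = find_mode_regions(-50.0, 50.0)
print(len(boxes), "candidate boxes, e.g.", boxes[:3])
# Midpoints of these boxes could seed parallel MCMC chains or an
# independence proposal covering all three modes.
```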

Dan Lawson
Getting information out of your gut: applying hierarchical models to complex MCMC problems

In biology there are many cases where a lot is known about the general process producing data, but obtaining large quantities of good data is difficult. In such cases it is necessary to combine statistical modelling with traditional mathematical modelling in order to make meaningful inferences. Hierarchical modelling allows data from multiple experiments to be combined in a meaningful way, and process modelling allows sparse data to be interpreted in a useful way. We describe an MCMC analysis of a complex ecological model for bacterial growth and competition and examine how hierarchical modelling helps to overcome problems of identifiability and dimensionality in complex models.
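As a minimal sketch of the partial pooling that hierarchical modelling provides, the toy normal-normal model below combines several small "experiments" with a hand-written Gibbs sampler. It is not the bacterial growth model of the talk, and all data and settings are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated "experiments": a few small, noisy data sets from related conditions
# (illustrative only; not the ecological model from the talk).
true_means = [2.0, 2.5, 1.5, 2.2]
data = [rng.normal(m, 1.0, size=n) for m, n in zip(true_means, [4, 6, 3, 5])]
sigma2 = 1.0                      # observation variance, assumed known here

J = len(data)
n = np.array([len(y) for y in data])
ybar = np.array([y.mean() for y in data])

# Gibbs sampler for the hierarchical model
#   y_ji ~ N(theta_j, sigma2),  theta_j ~ N(mu, tau2),
#   flat prior on mu, tau2 ~ Inv-Gamma(a0, b0).
a0, b0 = 2.0, 1.0
theta, mu, tau2 = ybar.copy(), ybar.mean(), 1.0
draws = []
for it in range(5000):
    prec = n / sigma2 + 1.0 / tau2
    mean = (n * ybar / sigma2 + mu / tau2) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / prec))
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / J))
    tau2 = 1.0 / rng.gamma(a0 + J / 2, 1.0 / (b0 + 0.5 * np.sum((theta - mu) ** 2)))
    draws.append(theta.copy())

post = np.mean(draws[1000:], axis=0)
print("raw experiment means:           ", np.round(ybar, 2))
print("partially pooled posterior means:", np.round(post, 2))
```

The pooled estimates are shrunk towards each other, which is the mechanism that lets sparse experiments borrow strength from one another.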

New faces seminar

Friday 12 December 2008, room SM3, short research talks by new arrivals in the group
  • 2.15 Andrew Wade: Stochastic billiards in unbounded planar domains
  • 2.45 Oliver Zobay: Mean field inference for Dirichlet process mixture models
  • 3.40 Jack O'Brien: Robust measures of evolutionary distance
  • 4.10 Larissa Stanberry: Statistical inference for random sets
  • 4.40 Wine, juice and nibbles
Abstracts
Andrew Wade
Stochastic billiards in unbounded planar domains

Motivated by ideal gas models in the low density regime, we study a randomly reflecting particle travelling at constant speed in an unbounded planar domain whose boundary satisfies a polynomial growth condition. The growth rate of the domain, together with the reflection distribution, determines the behaviour of the process. We will mention results on recurrence vs. transience, and on almost-sure bounds for the particle, including the rate of escape in the transient case. The proofs exploit a surprising relationship to Lamperti's problem of a process on the half-line with asymptotically zero drift. This is joint work with Mikhail Menshikov and Marina Vachkovskaia.

Oliver Zobay
Mean field inference for Dirichlet process mixture models

Mean field methods have recently attracted interest as possible alternatives to Monte Carlo integration in computational Bayesian inference. Under suitable circumstances, mean field may provide useful approximations to complicated posterior distributions. In this talk, first a short general introduction to the mean field approach is given. Then, its application to some Dirichlet process mixture models is considered. Some general structural features of the approximate posterior distributions are discussed, and their use for density estimation and cluster allocation is studied by comparing to MCMC results.

Jack O'Brien
Robust measures of evolutionary distance

Model misspecification plagues biological sequence analysis, leading to severe difficulties in counting the number of substitutions between pairs of sequences. Here I show how a semi-parametric procedure, integrating parametric Markov-chain estimates of distance over the empirical distribution of site patterns found in sequence data, yields distance estimates that are highly robust to model misspecification. This method encompasses a general form that allows previously unconsidered substitution types, providing a large (and statistically robust!) expansion of the "toolkit" for biological researchers. I conclude with some generalisations of this procedure and connections to other robust statistics.
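For orientation only, the simplest parametric Markov-chain distance, the Jukes-Cantor estimate, can be computed from the empirical site-pattern counts of a pairwise alignment. The talk's semi-parametric, misspecification-robust estimator is not reproduced here, and the sequences below are invented.

```python
import math
from collections import Counter

def jc_distance(seq1, seq2):
    """Jukes-Cantor distance: expected substitutions per site under the
    simplest Markov-chain substitution model (an illustrative baseline,
    not the semi-parametric estimator of the talk)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a in "ACGT" and b in "ACGT"]
    patterns = Counter(pairs)                 # empirical site-pattern counts
    p = sum(c for (a, b), c in patterns.items() if a != b) / len(pairs)
    if p >= 0.75:
        return float("inf")                   # saturation: estimate breaks down
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

print(jc_distance("ACGTTGCAAC", "ACGATGCTAC"))
```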

Larissa Stanberry
Statistical inference for random sets

In this talk I explore the basics of statistical inference for random sets. Random sets arise in various applications including image analysis, functional analysis, support estimation, and covariate range estimation. I will give a brief overview of existing definitions for the expectation of a random set and introduce a new definition based on the oriented distance function. I will discuss the problem of boundary estimation and talk about the choice of the loss function for set inference. Time permitting, I'll show some examples and simulations.
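A generic version of the distance-function idea can be sketched on a grid: average the oriented (signed) distance functions of the realisations and take the region where the average is non-positive as the expected set. This is the standard distance-average construction with made-up random discs, not necessarily the specific definition introduced in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Realisations of a random set: discs with random centres and radii
# (toy data, invented for illustration).
centres = rng.normal(0.0, 0.15, size=(50, 2))
radii = rng.uniform(0.8, 1.2, size=50)

# Oriented (signed) distance of a point to a disc: negative inside,
# positive outside, zero on the boundary.
xs = np.linspace(-2, 2, 201)
X, Y = np.meshgrid(xs, xs)
mean_sdist = np.zeros_like(X)
for c, r in zip(centres, radii):
    mean_sdist += np.hypot(X - c[0], Y - c[1]) - r
mean_sdist /= len(radii)

# "Expected set": points where the averaged oriented distance is <= 0.
expected_set = mean_sdist <= 0
area = expected_set.mean() * (xs[-1] - xs[0]) ** 2
print("area of the distance-average set:", round(area, 3))  # roughly a unit disc
```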

New faces seminar

Friday 15 February 2008, room SM3, short research talks by new arrivals in the group
  • 3.40 Adam Johansen: Point Estimation in Latent Variable Models via Sequential Monte Carlo
  • 4.10 Ludger Evers: Locally adaptive tree-based thresholding
Abstracts
Adam Johansen
Point Estimation in Latent Variable Models via Sequential Monte Carlo

Monte Carlo methods have been widely used to approximate expectations to perform inference within a Bayesian framework, but can be applied to a much wider range of problems. A novel approach to obtaining maximum likelihood or maximum a posteriori estimates in latent variable models using Sequential Monte Carlo is presented. Performance is illustrated with several examples.
Referenced paper
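A stripped-down sketch of the annealing flavour of this idea, for a posterior that can be evaluated directly rather than a latent variable model: particles are moved through a geometric bridge of targets from a diffuse initial distribution towards the posterior, and the best particle approximates the maximiser. All targets, schedules and tuning choices below are illustrative assumptions, not the algorithm of the referenced paper.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta):
    """Toy unnormalised log-posterior: two narrow modes; the one at 3 is higher."""
    return np.logaddexp(np.log(0.3) - 0.5 * ((theta + 2) / 0.2) ** 2,
                        np.log(0.7) - 0.5 * ((theta - 3) / 0.2) ** 2)

def log_p0(theta):
    """Diffuse initial distribution the particles are drawn from."""
    return -0.5 * (theta / 10.0) ** 2

N = 2000
theta = rng.normal(0.0, 10.0, N)
gammas = np.linspace(0.0, 1.0, 50) ** 3          # annealing schedule

for g_prev, g in zip(gammas[:-1], gammas[1:]):
    # Intermediate targets p0^(1-gamma) * pi^gamma; incremental importance weight:
    logw = (g - g_prev) * (log_post(theta) - log_p0(theta))
    w = np.exp(logw - logw.max())
    theta = theta[rng.choice(N, N, p=w / w.sum())]        # resample
    # One random-walk Metropolis move per particle, targeting the current bridge:
    prop = theta + rng.normal(0.0, 0.3, N)
    log_ratio = ((1 - g) * (log_p0(prop) - log_p0(theta))
                 + g * (log_post(prop) - log_post(theta)))
    theta = np.where(np.log(rng.uniform(size=N)) < log_ratio, prop, theta)

# With gamma = 1 the cloud targets the posterior; continuing to raise gamma
# beyond 1 would concentrate it further around the maximum a posteriori point.
print("best particle:", theta[np.argmax(log_post(theta))])
```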

New faces seminar

Friday 7 December 2007, room SM3, short research talks by new arrivals in the group
  • 4.00 Vanessa Didelez: The Selection Effect
  • 4.35 Søren Højsgaard: Predicting heat in dairy cows - a state-space model for cyclical progesterone measurements
  • 5.10 Wine, juice and nibbles
Abstracts
Vanessa Didelez
The Selection Effect

In some studies the participants are selected (by design or by mistake) conditional on a certain event; for example, in time-to-pregnancy studies some designs include only women who have given birth in a certain time interval, and case-control studies are another popular example where selection occurs by design. This selection can induce unexpected dependencies and bias. I will show how graphical models can be used to represent the selection effect and to address the question of whether causal inference, e.g. about the effect of an exposure on a disease outcome, is still feasible or will be biased. The same principle is further extended to the dynamic case where processes are observed conditional on reaching a certain state at a certain time.
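The core phenomenon is easy to reproduce in simulation (toy numbers, not from the talk): two variables generated independently become associated once we condition on a selection event that depends on both.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exposure X and outcome Y are generated independently, so the true
# association is zero.
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)

# Selection into the study depends on BOTH X and Y.
selected = rng.uniform(size=n) < 1 / (1 + np.exp(-(x + y)))

print("correlation in full population:", round(np.corrcoef(x, y)[0, 1], 3))
print("correlation among the selected:",
      round(np.corrcoef(x[selected], y[selected])[0, 1], 3))
# Conditioning on the selection event induces a spurious negative association.
```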

Søren Højsgaard, visiting from Aarhus University
Predicting heat in dairy cows - a state-space model for cyclical progesterone measurements

Detecting when a cow is in heat is very difficult, even for experienced farmers. Therefore much effort is put into developing systems/models for that purpose. One approach is to use the progesterone concentration in the milk. Progesterone can be measured routinely in automatic milking systems ("milking robots"), and measurements can be obtained on average every 10th hour. Shortly after the progesterone concentration drops to a low level, it is time for AI (artificial insemination). The optimal time window for insemination is about 6 hours. The drop in progesterone concentration is quite sudden and may hence not be detected before it is too late. The progesterone level exhibits a cyclicity across reproductive cycles, but the cycle length and the specific form of the progesterone curves vary from cow to cow (and may also vary within cows across parities).

To cope with these problems we consider a state-space model in which the cycle length and the parameters describing the shape of the progesterone curves are updated as data comes in from the milking system.
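A heavily simplified sketch of the filtering idea, using a local-level model and a Kalman filter on simulated data; the cyclical structure, cow-specific curve shapes and all numerical settings of the talk are ignored or invented here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated progesterone-like series: high plateau, then a sudden drop,
# observed with noise roughly every 10 hours (illustrative only).
level = np.concatenate([np.full(30, 25.0), np.full(10, 3.0)])
y = level + rng.normal(0.0, 2.0, size=level.size)

# Local-level state-space model: state_t = state_{t-1} + w_t,  y_t = state_t + v_t.
q, r = 0.5, 4.0            # state and observation noise variances (assumed)
m, p = y[0], 10.0          # filtered mean and variance
for t, obs in enumerate(y[1:], start=1):
    p += q                                  # predict
    k = p / (p + r)                         # Kalman gain
    innovation = obs - m                    # one-step-ahead prediction error
    m += k * innovation
    p *= (1 - k)
    # A large negative innovation together with a low filtered level suggests
    # the drop that precedes heat, i.e. the window in which to plan insemination.
    if innovation < -6 and m < 15:
        print(f"possible progesterone drop flagged at measurement {t}")
        break
```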

New faces seminar

Friday 16 November 2007, room SM3, short research talks by new arrivals in the group
  • 3.00 Geir Storvik: The DNA database search controversy revisited: Bridging the Bayesian - Frequentist gap
  • 3.35 Li Chen: Two Problems in Environmetrics
  • 4.10 Ana-Maria Staicu: Multilevel functional analysis with application to colon carcinogenesis
  • 4.45 Feng Yu: Asymptotic Behaviour of the Rate of Adaptation
  • 5.20 Wine, juice and nibbles
Abstracts and links
Geir Storvik, visiting from University of Oslo
The DNA database search controversy revisited: Bridging the Bayesian - Frequentist gap

Two different quantities have been proposed for quantifying the evidence in cases where a suspect is found by a search through a database of DNA profiles. The likelihood ratio, typically motivated from a Bayesian setting, is preferred by most experts in the field. The so-called np rule (which says that the evidence should be weakened by a factor of n when a database of size n is searched) has been motivated by frequentist arguments and was advocated by the American National Research Council (NRC) and by Stockmarr (1999).
The two quantities differ substantially and have given rise to the DNA database search controversy. Although several authors have criticized the different approaches, a full explanation of why these differences appear has been lacking.
In this talk I show that a P-value in a frequentist hypothesis-testing setting is approximately equal to the result of the np rule. I will argue, however, that a more reasonable procedure is conditional testing, which yields a P-value directly related to posterior probabilities and the likelihood ratio. This way of viewing the problem bridges the gap between the Bayesian and frequentist approaches. At the same time it indicates that the np rule should not be used to quantify evidence.
This is based on joint work with Thore Egeland.
Referenced paper in Biometrics
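As a back-of-the-envelope illustration of why the two quantities differ (my own sketch, not the paper's derivation): if each of the n innocent profiles in the database matches the crime-scene profile independently with probability p, then

```latex
\[
  \Pr(\text{at least one match among the } n \text{ innocent profiles})
      \;=\; 1 - (1-p)^{n} \;\approx\; np \qquad (np \ll 1),
\]
% the kind of frequentist P-value that motivates the np rule, whereas the
% likelihood ratio for the single individual who actually matches is of order
\[
  \mathrm{LR} \;\approx\; \frac{1}{p},
\]
% essentially unaffected by how many profiles were searched.
```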

Li Chen
Two Problems in Environmetrics

Data from the geophysical and environmental sciences are often available from both observations and numerical model output, and are rich in space, in time, or in both. Two problems are often of great interest: (i) how to assess the performance of the numerical model, and (ii) how to use both observations and numerical model output effectively to improve prediction. In this talk, I will address these two questions from a statistical point of view. Spatial and spatio-temporal modelling is essential to much of this work. Our proposed method is illustrated by an application to air quality studies.

Ana-Maria Staicu
Multilevel functional analysis with application to colon carcinogenesis

We describe the framework and inferential tools for multilevel functional data where the functions at the lowest level of the hierarchy are correlated. The model is motivated by a colon carcinogenesis study, where the data consist of measurements of a biomarker of interest at each cell within different colonic crypts, for randomly selected rats fed particular diets (the hierarchy: diets, random rats within diet, random crypts within rats). Previous approaches to analysing such data include a Bayesian wavelet-based procedure (Morris and Carroll, JRSSB 2006) as well as a Bayesian semiparametric method (Veerabhadran et al., Biometrics 2007), both computationally intensive techniques. We propose a new approach, also semiparametric, but much simpler and easier to implement. Our methodology is applied to an artificial data set and the results are briefly illustrated.

Feng Yu
Asymptotic Behaviour of the Rate of Adaptation

We consider the accumulation of beneficial and deleterious mutations in large asexual populations. The rate of adaptation is affected by the total mutation rate, the proportion of beneficial mutations, and the population size $N$. We show that, regardless of the mutation rate and the proportion of beneficial mutations (as long as the latter is strictly positive), the rate of adaptation is at least of order $\log^{1-\delta} N$, provided the population size is sufficiently large.
Referenced paper

Special seminar

Friday 9 November 2007, room SM3

2.45 Jim Ramsay, McGill University
Estimation of the quantile function. Joint work with Giles Hooker, Cornell.

The quantile function Q(u) is the inverse of the cumulative distribution function F(x); that is, Q[F(x)] = x and F[Q(u)] = u. John Tukey championed its use, pointing out that ordinary folks often present us with a probability u and want to know the event x associated with it, rather than presenting an event whose probability they don't know. Our particular interest is providing helpful information about rainfall on the Canadian prairies, and we want to be able to tell a producer about extremes of precipitation that they will only see, for example, once in a century. We will review the quantile function and its many interesting properties.

Emanuel Parzen and many others have discussed the problem of estimating Q from a sample of data. The definition of a strictly monotone function developed by Ramsay (JRSS-B, 1996) leads to an especially neat formulation of this estimation problem, and to some new approaches. In particular, we are working on the problem of estimating a distributed quantile function Q(u,t,r) where t indexes time and r indexes space. This generalizes the usual data smoothing problem, which only attempts to estimate the expectation of x, and quantile regression, which estimates a single quantile value.
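For orientation, the basic empirical estimate of Q from a sample is shown below; it is not the smooth, strictly monotone estimator of Ramsay (1996) discussed in the talk, and the data are simulated rather than the prairie precipitation records.

```python
import numpy as np

rng = np.random.default_rng(4)

# Mock precipitation totals in mm (simulated for illustration).
rain = rng.gamma(shape=2.0, scale=15.0, size=5000)

def Q(u):
    """Empirical quantile function: the value below which a fraction u
    of the sample falls."""
    return np.quantile(rain, u)

def F(x):
    """Empirical distribution function."""
    return np.mean(rain <= x)

print("median           Q(0.50):", round(Q(0.50), 1))
print("upper 1% level   Q(0.99):", round(Q(0.99), 1))
print("F(Q(0.9)), should be ~0.9:", round(F(Q(0.9)), 3))
```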

4.00 David Dunson, Duke University and NIEHS
Bayesian density regression via kernel stick-breaking processes

In many applications, there is interest in inference on changes in the conditional distribution of a response variable given one or more predictors. Motivated by data from reproductive and molecular epidemiology studies, we develop general nonparametric Bayes methods for conditional distribution estimation and inference, allowing both the mean and the residual distribution to change flexibly with predictors. We first propose a class of kernel stick-breaking processes (KSBP) for uncountable collections of dependent random probability measures. The KSBP generalizes the Dirichlet process to allow unknown distributions and partition structures to vary flexibly with predictors. Some theoretical properties are considered, and methodology is developed for posterior computation and inference. The methods are applied to premature delivery data from a large study of pregnancy outcomes using an infinite mixture of experts model. Priors for stochastically ordered collections of distributions are also described, and illustrated using DNA damage and repair studies.
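A truncated sketch of how kernel stick-breaking weights can depend on a predictor: the mixture weights at predictor value x are built by stick-breaking with sticks shrunk by a kernel centred at random locations, so the implied conditional distribution changes smoothly with x. This is one standard construction with an illustrative kernel, truncation level and parameters, not necessarily the exact specification of the talk.

```python
import numpy as np

rng = np.random.default_rng(5)

H = 30                                      # truncation level (assumed)
V = rng.beta(1.0, 1.0, size=H)              # stick-breaking fractions
locs = rng.uniform(0.0, 1.0, size=H)        # kernel locations in predictor space
atoms = rng.normal(0.0, 2.0, size=H)        # component-specific atoms

def weights(x, bandwidth=0.1):
    """Predictor-dependent mixture weights pi_h(x)."""
    k = np.exp(-0.5 * ((x - locs) / bandwidth) ** 2)   # kernel values in [0, 1]
    u = V * k
    pi = u * np.cumprod(np.concatenate(([1.0], 1.0 - u[:-1])))
    return pi / pi.sum()                    # renormalise the truncated weights

for x in (0.1, 0.5, 0.9):
    w = weights(x)
    print(f"x = {x}: conditional mean of the mixture is roughly {np.dot(w, atoms):+.2f}")
```

Because the weights, and hence the conditional distribution, vary with x, a mixture built on these weights can capture changes in both the mean and the residual distribution across predictor values.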