New faces seminar
Friday 23 October 2009, room SM3, short research talks by new arrivals in the group
- 3.40 Jonty Rougier: Accounting for the limitations of scientific models
- 4.10 Nick Whiteley: Monte Carlo filtering of piecewise deterministic processes
- 4.40 Wine, juice and nibbles
See main statistics seminar page for details.
Past seminars
New faces seminar
Friday 24 April 2009, room SM3, short research talks by new arrivals in the group
- 3.40 Richard Everitt: Using interval arithmetic to help Monte Carlo algorithms discover modes
- 4.10 Dan Lawson: Getting information out of your gut: applying hierarchical models to complex MCMC problems
- 4.40 Wine, juice and nibbles
Abstracts
Richard Everitt
Using interval arithmetic to help Monte Carlo algorithms discover modes
Target densities with narrow, well separated modes are rarely explored well
by MCMC algorithms. Recently in the optimisation literature, interval
arithmetic has been rediscovered as a method for discovering such modes in
a large class of objective functions. However, the use of this idea has
mostly been restricted to functions of low dimension. In this
talk we introduce a method for exploiting interval arithmetic in MCMC and
demonstrate that this can help MCMC to discover narrow, well separated
modes in situations where other techniques fail. We also show how a
simulated annealing style variant can improve on existing interval-based
optimisation techniques on high dimensional objective functions.
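As a rough illustration of the underlying mechanism (a minimal sketch, not the method introduced in the talk), interval arithmetic evaluates a function over a whole box at once and returns guaranteed lower and upper bounds; boxes whose upper bound is negligible can be discarded outright, and the survivors bisected until only small regions that may contain modes remain. The target density, its scale, the pruning threshold and the tolerance below are all invented for the example.

# A minimal sketch (not the speaker's algorithm) of how interval arithmetic
# gives guaranteed bounds on a function over a box, so regions that cannot
# contain appreciable mass are discarded and the rest bisected until
# candidate mode regions are isolated.
import math

def i_add(x, y):            # [a,b] + [c,d]
    return (x[0] + y[0], x[1] + y[1])

def i_neg(x):               # -[a,b]
    return (-x[1], -x[0])

def i_sq(x):                # [a,b]^2
    a, b = x
    if a >= 0:  return (a * a, b * b)
    if b <= 0:  return (b * b, a * a)
    return (0.0, max(a * a, b * b))

def i_exp(x):               # exp is monotone, so endpoints suffice
    return (math.exp(x[0]), math.exp(x[1]))

def i_scale(c, x):          # c * [a,b] for c > 0
    return (c * x[0], c * x[1])

def target_bounds(box):
    """Bounds on an (unnormalised) two-component mixture with narrow,
    well-separated modes at -20 and +20 (an invented toy target)."""
    comp = lambda m: i_exp(i_scale(50.0, i_neg(i_sq(i_add(box, (-m, -m))))))
    lo1, hi1 = comp(-20.0); lo2, hi2 = comp(20.0)
    return (lo1 + lo2, hi1 + hi2)

def find_modes(box, threshold=1e-3, tol=1e-2):
    """Branch-and-bound: discard boxes whose guaranteed upper bound is below
    `threshold`; bisect the rest until they are narrower than `tol`."""
    _, hi = target_bounds(box)
    if hi < threshold:
        return []
    if box[1] - box[0] < tol:
        return [box]
    mid = 0.5 * (box[0] + box[1])
    return find_modes((box[0], mid), threshold, tol) + \
           find_modes((mid, box[1]), threshold, tol)

print(find_modes((-50.0, 50.0)))   # only small boxes around -20 and +20 survive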
Dan Lawson
Getting information out of your gut: applying hierarchical models to
complex MCMC problems
In biology there are many cases where a lot is known about the general
process producing data but obtaining large quantities of good data is
difficult. In such cases it is necessary to combine statistical
modelling with traditional mathematical modelling in order to make
meaningful inferences. Hierarchical modelling allows data from multiple
experiments to be combined in a meaningful way, and process modelling
allows sparse data to be interpreted in a useful way. We describe an
MCMC analysis of a complex ecological model for bacterial growth and
competition and examine how hierarchical modelling helps to overcome
problems of identifiability and dimensionality in complex models.
New faces seminar
Friday 12 December 2008, room SM3, short research talks by new arrivals in the group
- 2.15 Andrew Wade: Stochastic billiards in unbounded planar domains
- 2.45 Oliver Zobay: Mean field inference for Dirichlet process mixture models
- 3.40 Jack O'Brien: Robust measures of evolutionary distance
- 4.10 Larissa Stanberry: Statistical inference for random sets
- 4.40 Wine, juice and nibbles
Abstracts
Andrew Wade
Stochastic billiards in unbounded planar domains
Motivated by ideal gas models in the low density regime,
we study a randomly reflecting particle travelling
at constant speed in an unbounded domain in the plane
with boundary satisfying a polynomial growth condition.
The growth rate of the domain, together with
the reflection distribution, determine the behaviour
of the process. We will mention results on
recurrence vs. transience, and on almost-sure bounds for the particle,
including the rate of escape in the transient case.
The proofs exploit a surprising relationship to
Lamperti's problem of a process on the half-line
with asymptotically zero drift.
This is joint work with Mikhail Menshikov and
Marina Vachkovskaia.
Oliver Zobay
Mean field inference for Dirichlet process mixture models
Mean field methods have recently attracted interest as possible
alternatives to Monte Carlo integration in computational Bayesian
inference. Under suitable circumstances, mean field may provide useful
approximations to complicated posterior distributions. In this talk,
first a short general introduction to the mean field approach is given.
Then, its application to some Dirichlet process mixture models is
considered. Some general structural features of the approximate
posterior distributions are discussed, and their use for density
estimation and cluster allocation is studied by comparing to MCMC results.
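For readers unfamiliar with the approach, the generic mean field recipe the abstract refers to can be summarised as follows (a standard textbook sketch; the specific treatment of Dirichlet process mixtures in the talk may differ in detail). The posterior $p(\theta \mid y)$ over latent variables $\theta = (\theta_1, \dots, \theta_m)$ is approximated by the best fully factorised distribution in the Kullback-Leibler sense,
\[
  q^\star = \arg\min_{q \in \mathcal{Q}} \mathrm{KL}\bigl( q(\theta) \,\big\|\, p(\theta \mid y) \bigr),
  \qquad
  \mathcal{Q} = \Bigl\{ q : q(\theta) = \textstyle\prod_{j=1}^{m} q_j(\theta_j) \Bigr\},
\]
and the optimum is found by coordinate ascent, updating one factor at a time via
\[
  \log q_j^\star(\theta_j) = \mathbb{E}_{q_{-j}}\bigl[ \log p(y, \theta) \bigr] + \text{const},
\]
where the expectation is taken over all factors other than $q_j$.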
Jack O'Brien
Robust measures of evolutionary distance
Model misspecification plagues biological sequence analysis, leading to
severe difficulties in counting the number of substitutions between
pairs of sequences. Here I show how a semi-parametric procedure
integrating parametric Markov Chain estimates of distance over the
empirical distribution of site patterns found in sequence data yields
distance estimates that are highly robust to model misspecification.
This method encompasses a general form that allows previously
unconsidered substitution types, providing a large (and statistically
robust!) expansion of the "toolkit" for biological researchers. I
conclude with some generalisations of this procedure and connections to
other robust statistics.
Larissa Stanberry
Statistical inference for random sets
In this talk I explore the basics of statistical inference for random sets. Random sets arise in various applications including image analysis, functional analysis, support estimation, and covariate range estimation. I will give a brief overview of existing definitions for the expectation of a random set and introduce a new definition based on the oriented distance function. I will discuss the problem of boundary estimation and talk about the choice of the loss function for set inference. Time permitting, I'll show some examples and simulations.
New faces seminar
Friday 15 February 2008, room SM3, short research talks by new arrivals in the group
- 3.40 Adam Johansen: Point Estimation in Latent Variable Models via Sequential Monte Carlo
- 4.10 Ludger Evers: Locally adaptive tree-based thresholding
Abstracts
Adam Johansen
Point Estimation in Latent Variable Models via Sequential Monte Carlo
Monte Carlo methods have been widely used to approximate expectations to perform inference within a Bayesian framework, but can be applied to a much wider range of problems. A novel approach to obtaining maximum likelihood or maximum a posteriori estimates in latent variable models using Sequential Monte Carlo is presented. Performance is illustrated with several examples.
Referenced paper
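One generic route by which Sequential Monte Carlo can be turned into a point-estimation tool (sketched here only as background; the construction in the talk and the referenced paper may differ) is to run an SMC sampler through a sequence of increasingly concentrated artificial targets,
\[
  \pi_{\gamma_t}(\theta) \;\propto\; p(\theta)\, p(y \mid \theta)^{\gamma_t},
  \qquad 1 = \gamma_1 < \gamma_2 < \cdots < \gamma_T,
\]
so that as $\gamma_t$ grows the targets concentrate around the maximum a posteriori value of $\theta$ and the particle population localises there; in latent variable models the likelihood $p(y \mid \theta)$ is itself an integral over the latent variables, which motivates working on a suitably extended space.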
New faces seminar
Friday 7 December 2007, room SM3, short research talks by new arrivals in the group
- 4.00 Vanessa Didelez: The Selection Effect
- 4.35 Søren Højsgaard: Predicting heat in dairy cows - a state-space model for cyclical progesterone measurements
- 5.10 Wine, juice and nibbles
Abstracts
Vanessa Didelez
The Selection Effect
In some studies the participants are selected (by design or by mistake) conditional on a certain event, e.g. in time-to-pregnancy studies some designs are such that only women who have given birth in a certain time interval are included; case-control studies are another popular example where selection occurs by design. This selection can induce unexpected dependencies and bias. I will show how graphical models can be used to represent the selection effect and to address the question of whether causal inference, e.g. about the effect of an exposure on a disease outcome, is still feasible or will be biased. The same principle is further extended to the dynamic case where processes are observed conditional on reaching a certain state at a certain time.
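A toy numerical illustration of how selection can induce dependence between quantities that are actually independent (a minimal sketch in the spirit of the abstract, not an example from the talk; all numbers are invented):

# Exposure X and another factor Y are generated independently, but both raise
# the chance of being selected into the study; among the selected units X and
# Y become negatively correlated, illustrating selection ("collider") bias.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)                          # exposure
y = rng.normal(size=n)                          # independent factor
selected = (x + y + rng.normal(size=n)) > 1.0   # selection depends on both

print("corr(X, Y) overall:    %+.3f" % np.corrcoef(x, y)[0, 1])
print("corr(X, Y) | selected: %+.3f" % np.corrcoef(x[selected], y[selected])[0, 1])
# Typical output: roughly 0.000 overall and about -0.33 among selected units.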
Søren Højsgaard, visiting from Aarhus University
Predicting heat in dairy cows - a state-space model for cyclical progesterone measurements
Detecting when a cow is in heat is very difficult - even for
experienced farmers. Therefore much effort is put into developing
systems/models for that purpose. One approach is to use the
progesterone concentration in the milk. Progesterone can be measured
routinely in automatic milking systems ("milking robots") and
measurements can be obtained on average every 10th hour. Shortly
after the progesterone concentration drops to a low level, it is time
for AI (artificial insemination). The optimal time window for
insemination is about 6 hours. The drop in progesterone concentration
is quite sudden and may hence not be detected before it is too
late. The progesterone level exhibits a cyclicity across reproductive
cycles but the cycle length and the specific form of the progesterone
curves vary from cow to cow (and may also vary within cows across
parities).
To cope with these problems we consider a state-space model in which
the cycle length and the parameters describing the shape of the
progesterone curves are updated as data comes in from the milking
system.
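As a loose illustration of the sequential-updating idea (a generic one-dimensional local-level filter, not the cyclical model described above; all parameter values and data are invented), the sketch below tracks an observed concentration online and flags a sudden drop through a large standardised innovation.

import numpy as np

def local_level_filter(y, sigma_obs=0.5, sigma_state=0.1, m0=0.0, v0=10.0):
    """One-dimensional Kalman filter for y[t] = level[t] + noise,
    level[t] = level[t-1] + drift noise.  Returns filtered means and the
    standardised innovations used to flag abrupt changes."""
    m, v = m0, v0
    means, innovations = [], []
    for obs in y:
        v_pred = v + sigma_state**2        # predict
        s = v_pred + sigma_obs**2          # innovation variance
        innov = (obs - m) / np.sqrt(s)     # standardised innovation
        k = v_pred / s                     # Kalman gain
        m = m + k * (obs - m)              # update mean
        v = (1 - k) * v_pred               # update variance
        means.append(m); innovations.append(innov)
    return np.array(means), np.array(innovations)

# Simulated concentrations: a high plateau followed by a sudden drop (the
# event of interest in the abstract); the filter flags it immediately.
rng = np.random.default_rng(1)
y = np.concatenate([20 + rng.normal(0, 0.5, 30), 2 + rng.normal(0, 0.5, 10)])
means, innov = local_level_filter(y, m0=20.0)
print("first time the drop is flagged:", int(np.argmax(innov < -3)))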
New faces seminar
Friday 16 November 2007, room SM3, short research talks by new arrivals in the group
- 3.00 Geir Storvik: The DNA database search controversy revisited: Bridging the Bayesian - Frequentist gap
- 3.35 Li Chen: Two Problems in Environmetrics
- 4.10 Ana-Maria Staicu: Multilevel functional analysis with application to colon carcinogenesis
- 4.45 Feng Yu: Asymptotic Behaviour of the Rate of Adaptation
- 5.20 Wine, juice and nibbles
Abstracts and links
Geir Storvik, visiting from University of Oslo
The DNA database search controversy revisited: Bridging the Bayesian - Frequentist gap
Two different quantities have been suggested for quantification of evidence in cases where a suspect is found by a search through a database of DNA profiles. The likelihood ratio, typically motivated from a Bayesian setting, is preferred by most experts in the field. The so-called np rule (which says that the evidence should be weakened by a factor of n when a database of size n is searched) has been motivated by frequentist arguments and has been recommended by the American National Research Council (NRC) and Stockmarr (1999).
The two quantities differ substantially and have given rise to the DNA database search controversy. Although several authors have criticized the different approaches, a full explanation of why these differences appear has been lacking.
In this talk I show that a P-value in a frequentist hypothesis-testing setting is approximately equal to the result of the np rule. I will argue, however, that a more reasonable procedure in this case is to use conditional testing, in which case a P-value directly related to posterior probabilities and the likelihood ratio is obtained. This way of viewing the problem bridges the gap between the Bayesian and frequentist approaches. At the same time it indicates that the np rule should not be used to quantify evidence.
This is based on joint work with Thore Egeland.
Referenced paper in Biometrics
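To give a feel for the size of the disagreement (purely illustrative numbers, not figures from the talk): with random-match probability p the likelihood ratio is roughly 1/p, whereas the np rule divides the strength of the evidence by the size n of the database searched.

# Back-of-the-envelope comparison of the two quantities (assumed, illustrative values).
p = 1e-9          # random-match probability of the profile (assumed)
n = 1_000_000     # size of the DNA database searched (assumed)

likelihood_ratio = 1 / p        # ~ 1e9
np_rule_value = 1 / (n * p)     # ~ 1e3, weaker by a factor of n

print(f"likelihood ratio: {likelihood_ratio:.3g}")
print(f"np-rule adjusted: {np_rule_value:.3g}")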
Li Chen
Two Problems in Environmetrics
Data from geophysical and environmental sciences are often available from both observations and numerical model output. They are rich in space, time, or both. Two problems are often of great interest: (i) How to assess the performance of the numerical model? (ii) How to use both observations and numerical model output effectively to improve prediction? In this talk, I will address these two questions from a statistical point of view. Spatial and spatial-temporal modelling is essential to much of this work. Our proposed method is illustrated by an application to air quality studies.
Ana-Maria Staicu
Multilevel functional analysis with application to colon carcinogenesis
We describe the framework and inferential tools for multilevel functional
data where the functions at the lowest hierarchy level are correlated. The
model is motivated by a colon carcinogenesis study, where data consists of
measurements for a biomarker of interest at each cell within different
colonic crypts for random rats fed with particular diets (the hierarchy:
diets, random rats within diet, random crypts within rats). Previous
approaches to analysing such settings include a Bayesian wavelet-based
procedure (Morris and Carroll, JRSSB 2006) as well as a Bayesian
semiparametric method (Veerabhadran et al., Biometrics 2007), both
computationally intensive techniques. We propose a new approach, also
semiparametric, but much simpler and easy to implement. Our methodology is
applied to an artificial data set and the results are briefly illustrated.
Feng Yu
Asymptotic Behaviour of the Rate of Adaptation
We consider the accumulation of beneficial and deleterious
mutations in large asexual populations. The rate of adaptation is
affected by the total mutation rate, proportion of beneficial mutations,
and population size. We show that regardless of mutation rates and
proportion of beneficial mutations (as long as it is strictly positive),
the adaptation rate is at least $O(\log^{1-\delta} N)$, where $N$ is the
population size, provided the population size is sufficiently large.
Referenced paper
Special seminar
Friday 9 November 2007, room SM3
2.45 Jim Ramsay, McGill University
Estimation of the quantile function. Joint work with Giles Hooker, Cornell.
The quantile function Q(u) is the inverse of the cumulative distribution function F(x); that is, Q[F(x)] = x and F[Q(u)] = u. John Tukey championed its use, pointing out that ordinary folks often present us with a probability u and want to know the event x that is associated with it, rather than with an event whose probability they don't know. Our particular interest is in providing helpful information about rainfall on the Canadian prairies, and we want to be able to tell a producer about extremes of precipitation that they will only see, for example, once in a century. We will review the quantile function and its many interesting properties.
Emanuel Parzen and many others have discussed the problem of estimating Q from a sample of data. The definition of a strictly monotone function developed by Ramsay (JRSS-B, 1996) leads to an especially neat formulation of this estimation problem, and to some new approaches. In particular, we are working on the problem of estimating a distributed quantile function Q(u,t,r) where t indexes time and r indexes space. This generalizes the usual data smoothing problem, which only attempts to estimate the expectation of x, and quantile regression, which estimates a single quantile value.
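A small numerical illustration of the "once in a century" reading of a quantile (synthetic data only, not the prairie records discussed in the talk): for annual precipitation maxima, the level exceeded in roughly one year per hundred is Q(0.99).

# Reading a "once in a century" extreme off an empirical quantile function.
# The Gumbel annual maxima below are simulated, with invented parameters.
import numpy as np

rng = np.random.default_rng(2)
annual_max_mm = rng.gumbel(loc=60.0, scale=15.0, size=500)   # synthetic annual maxima

u = 0.99
x = np.quantile(annual_max_mm, u)       # empirical Q(u)
print(f"empirical Q({u}) = {x:.1f} mm: exceeded about once per 100 years")

# Sanity check of the defining relation F[Q(u)] = u on the empirical
# distribution: the fraction of years not exceeding Q(0.99) is about 0.99.
print("fraction of years with maximum <= Q(u):", np.mean(annual_max_mm <= x))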
4.00 David Dunson, Duke University and NIEHS
Bayesian density regression via kernel stick-breaking processes
In many applications, there is interest in inference on changes in the
conditional distribution of a response variable given one or more
predictors. Motivated by data from reproductive and molecular
epidemiology studies, we develop general nonparametric Bayes methods for
conditional distribution estimation and inferences, allowing both the mean
and residual distribution to change flexibly with predictors. We first
propose a class of kernel stick-breaking processes (KSBP) for uncountable
collections of dependent random probability measures. The KSBP generalizes the Dirichlet process to allow unknown distributions and
partition structures to vary flexibly with predictors. Some theoretical
properties are considered, and methodology is developed for posterior
computation and inferences. The methods are applied to premature delivery data from a large study of pregnancy outcomes using an infinite mixture of experts model. Priors for stochastically ordered collections of distributions are also described, and illustrated using DNA damage and
repair studies.
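A sketch of the general form of a predictor-dependent stick-breaking construction of the kind the abstract describes (the precise KSBP specification in the talk may differ in its details). At predictor value x, a random probability measure is built from atoms $\theta_h$ with weights broken off sticks modulated by a kernel $K(x, \Gamma_h)$ taking values in $[0, 1]$:
\[
  G_x \;=\; \sum_{h=1}^{\infty} \pi_h(x)\, \delta_{\theta_h},
  \qquad
  \pi_h(x) \;=\; V_h\, K(x, \Gamma_h) \prod_{l < h}
      \bigl\{ 1 - V_l\, K(x, \Gamma_l) \bigr\},
\]
with $V_h$ stick-breaking weights and $\Gamma_h$ random kernel locations. Measures at nearby predictor values share atoms and have similar weights, which is what lets the conditional distribution change smoothly with x; taking $K \equiv 1$ recovers an ordinary stick-breaking construction of Dirichlet-process type that does not depend on x.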