|
PJG Home / Software
Nmix-- a program for Bayesian analysis of univariate normal mixtures, implementing the approach of Richardson and Green, Journal of the Royal Statistical Society, B, 59, 731-792 (1997).If you download the files, please send a message to me at ,
with your email address and any comments. I can then keep you informed of any updates.
Instructions (also included among the downloaded files)
INSTALLATION
The package distributed consists of:
main source file Nmix.f
auxiliary routines in
fortran and C algama.f gauss4.f pnorm.f rgamma.f sd.c
makefile Makefile
example data files enz.dat galx.dat lnacid.dat
this file readme.txt
It is provided as both a gzipped tarfile and as a zip archive.
In the former, the files all have Unix-style newline characters,
and in the latter Dos-style line endings. The makefiles should be
appropriate for Unix and Dos respectively.
For a free Unix-like shell that runs under Windows, try Cygwin
(www.cygwin.com).
The program has been run successfully:
using f77 and cc under Solaris on Sun workstations
using GNU compilers under Linux on Intel processors
using GNU compilers and Cygwin under Windows on Intel PCs
on Compaq Alpha workstations (with edits: iseed in the main
program has to be declared as integer*8, and the random
number routines are sensitive to rounding error unless
variables are declared as double precision)
on IBM workstations using f77 and cc under AIX 4.3.2 (with edit:
remove the -DSUNF option from cc in the Makefile)
---------------------------------------------
RUNNING
Nmix is a Fortran program for Unix-like systems, implementing
the methodology for univariate normal mixture
analysis described in Richardson and Green
(J. R. Statist. Soc. B, 1997, 59, 731-792; see also the
correction in J. R. Statist. Soc. B, 1998, 60, 661).
These notes assume familiarity with this paper.
The program reads a single main data file, and produces
multiple output text files summarising the posterior
distribution of the model, for subsequent analysis
and display (R or Splus are good choices for these
tasks). (Only very limited summary information
is computed by this program itself.)
The run of the program is controlled by options, switches
and parameter settings that have sensible default values,
and can be set by command line arguments in Unix shell
style. For example, the runs on the Enzyme dataset reported
in Section 4.1 of the paper, with run-length of 100000,
burn-in of 100000, and minimal output options could be
replicated by:
Nmix -n100000 -nb100000 enz
The runs described in Section 5.1.1 leading to Fig. 6(a)
would use the -b option, -b0.08 -b0.02 and -b0.005
respectively (given that range-based hyperparameters
are in use (see Section 2.4 of paper), by default).
The principal options are as follows, listed in
several main groups. In this list, N denotes
a numerical value passed in as a parameter of the model
or sampler, other options are logical settings
that are by default off, turned on by specifying the
option.
1. Model options
-prior disregard the likelihood and the data (except possibly
in setting prior settings, etc.), thus computing the
prior instead of the posterior.
-fix suppress dimension-changing moves, thus fixing the
number of mixture components (to the value set by the
-ki option below).
2. Parameter settings
defaults, including use of range-based hyperparameters,
are as used in Section 4.1.
-norange do NOT use range-based hyperparameters
(default: use them)
-lN set lambda (hyperparameter for k) (default -1)
lambda=-1: Uniform
lambda=0: p(k)=1/k
lambda>0: Poisson(lambda)
-dN set delta (hyperparameter for weights) (default 1)
-xN set xi (hyperparameter for means) (default 0)
-kN set kappa (hyperparameter for means) (default 1)
negative value signifies Gamma(e,f)
-spN set s (spacing parameter in prior for means)
(default 1)
(see rejoinder to discussion of paper, page 786)
-aN set alpha (hyperparameter for variances) (default 2)
-bN set beta (hyperparameter for variances) (default
Gamma(g,h))
-gN set g (hyperparameter for beta) (default 0.2)
-hN set h (hyperparameter for beta) (default 10)
-eN set e (hyperparameter for kappa) (default 0)
-fN set f (hyperparameter for kappa) (default 0)
The upper limit kmax of the number of components
is hard-wired equal to 30 in the program - it can
be changed by editing line 1 of Nmix.f (where
it is termed ncmax)
3. Monte Carlo sampler options
-nN number of sweeps (default 10000)
-nbN length of burn-in (default 0)
-kiN initial number of components (default 1)
-noempty do NOT use empty-component birth-death moves
-seedN random number seed (default 0, meaning use clock
time to initialise). In the log file for the run,
the seed actually used is printed, and the run
can be repeated with the same random numbers by
specifying that value in this option on the
subsequent run.
4. Output options
-pLETTERS additional output files, if LETTERS includes
d: .avn .den .dev
c: .pcl .scl
w: .wms.*
a: .z.*
(default: none of these)
a, w, and c each force d as well
See below for contents of these files.
-full same as -pd
-nsN sets nsamp: number of (equi-spaced) Monte Carlo
samples dumped in .out file (default 100)
-nspN sets nspace: spacing in sweeps between successive
records in trace files .ent, .bk, .dev, .wms*,
.z* (default 20)
-debug turn on verbose output
Input and output file names
The input data should be in a file with extension .dat,
with the sample size n on the first line, and the data
values y_1, y_2, ..., y_n following in free format, with arbitrary
spaces and newlines.
If the input file is NAME.dat, then all output files
are in a subdirectory NAME (which must be created before
running Nmix). All these output files are given names of the
form N.ext, where N is a unique integer assigned by the program
to label successive runs (it chooses the smallest positive integer
such that N.log does not already exist), and .ext signifies the
type of output the file contains.
The extensions and their meanings are:
(a) normal output:
.log logs information about run, parameter settings,
acceptance rates, etc.
.k posterior for k, Bayes factors, prior on k,
posterior for kpos, empty-component statistics
and, optionally, average deviances, etc.
.pe parameter estimates: posterior means for weights,
means and standard deviations for each component,
conditional on each value of k
.out [*] sample of output states: k, kpos,
weight, mean, standard deviation and number of
observations assigned, for each component
.bdlog [*] changes to k or kpos: number of sweep (file not
produced when -fix option set)
.ent [nsp] k, kpos, entropy
(b) additionally, if beta or kappa are random:
.bk [nsp] beta, kappa, k, kpos
(c) optionally, if -debug specified
.db verbose records of all moves of the sampler
(d) optionally, as determined by -p... options (see above)
.avn for each k, number of visits to k, average numbers
of observations assigned to each component
.den posterior expected density, on a grid of 200
equi-spaced
y values (spanning the range of the data, expanded by
5% in each direction): output is grid, density,
conditional
on k=1,2,...,6, and unconditionally
.dev [nsp] k, kpos, deviance, modified deviance
.pcl predictive classification: for each k, for each
j=1,2,..,k-1, for each y* on grid (see .den above),
posterior probability that z*=j
.scl within-sample classification: for each k, for each
j=1,2,..,k-1, for each i = 1,2,...,n, posterior
probability that z(i)=j
.wms.K [nsp]in a separate file for each k, current values for
weights, means and standard deviations of each
component j=1,2,...,k
(thus each sample record is of k consecutive lines)
.z.K [nsp] in a separate file for each k, current values for
allocations z(i),i=1,...,n
key: kpos = number of non-empty components
[nsp] - trace files with output dumped every nspace sweeps (see
-nsp option above)
[*] - other trace files
other output files contain information aggregated over the run.
Nmix is free of charge for educational and non-commercial research purposes.
Please report any problems with the code to the author, Peter Green, via
Email link
|