Publicly available datasets
The HGDP data is a good place to
start. You can download pre-phased data in PHASE output format, which is
very close to the input format required by ChromoPainter.
The human recombination map
(HapMap B37 data, obtained from the NIH)
can be processed with "convertrecfile.pl -M
hapmap" (this script is included with finestructure).
Our simulated data is available for download, for use with the Complex Example.
Our HGDP Coancestry (i.e. chunk
counts) matrix as described in the main paper is available, in case it
is of use to anyone. In addition, you can download
the HGDP Population results, as an R object.
This contains the coancestry matrix ("chunkcounts"), the list of populations
("poplist"), the population-wise average ("avemat") and SD ("sdmat")
matrices, and the tree ("hgdpdend"). It should be used with
the R library. The tree, the populations and the population-level
matrices all share the same order. The individuals
are ordered using the HGDP
ordering (trivially processed
from the
HGDP page, which has one line per haplotype rather than per individual), with individual POPi being the i-th
entry of that population; for example, HGDP00791 is Japanese27.
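The POPi labelling described above can be illustrated with a minimal Python sketch. It assumes the haplotype-level listing has already been parsed into (individual ID, population) pairs, one pair per haplotype; the function name label_individuals and the toy IDs and population names are hypothetical and are not taken from the HGDP files.

```python
from collections import Counter

def label_individuals(haplotype_rows):
    """Collapse one-row-per-haplotype (id, population) pairs into POPi labels.

    Assumes rows appear in the HGDP ordering; an individual is numbered
    by the position at which it first appears within its population.
    """
    labels = {}
    counts = Counter()
    for ind_id, pop in haplotype_rows:
        if ind_id in labels:
            continue  # further haplotype of an individual already labelled
        counts[pop] += 1
        labels[ind_id] = f"{pop}{counts[pop]}"
    return labels

# Toy input (hypothetical IDs): two haplotype rows per individual.
rows = [
    ("ID_A", "PopX"), ("ID_A", "PopX"),
    ("ID_B", "PopY"), ("ID_B", "PopY"),
    ("ID_C", "PopX"), ("ID_C", "PopX"),
]
labels = label_individuals(rows)
print(labels)
```

With the real ordering as input, the same counting would be expected to give labels of the form Japanese27 for the 27th Japanese individual.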
Sampling your own data
You do not need a recombination map to use our software. To perform the
analysis correctly, you need either sequencing data or dense SNP chip data.
Dense here means that neighbouring SNPs are in LD.
See the advice on
phasing.
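As a rough way to check whether a SNP panel is dense in the sense used above, one could compute the squared correlation (r^2) between adjacent SNPs on phased haplotypes. The sketch below is not part of our software: the function names are hypothetical, haplotypes are assumed to be equal-length lists of 0/1 alleles, and no particular r^2 cut-off is implied.

```python
def r_squared(snp_a, snp_b):
    """Squared allelic correlation between two biallelic SNPs coded 0/1."""
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("SNP vectors must be non-empty and of equal length")
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(a & b for a, b in zip(snp_a, snp_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, reported as 0 here
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom

def adjacent_r_squared(haplotypes):
    """r^2 for each pair of neighbouring SNPs; rows are haplotypes."""
    snps = list(zip(*haplotypes))  # transpose to one tuple per SNP
    return [r_squared(snps[i], snps[i + 1]) for i in range(len(snps) - 1)]

# Toy data: four haplotypes, three SNPs.
haps = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
ld = adjacent_r_squared(haps)
print(ld)
```

In this toy example the first two SNPs are perfectly correlated and the last two are independent; a panel in which most adjacent pairs look like the latter would not be dense in the above sense.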
As noted above, the human recombination map
(HapMap B37 data, obtained from the NIH)
can be processed with "convertrecfile.pl -M
hapmap".