Summary of tools for data preparation
Main tools:
- ChromoPainter, which creates coancestry matrices from PHASE format raw data files containing NO MISSING DATA.
- ChromoCombine, which calculates "c" and combines multiple ChromoPainter output files.
- fineSTRUCTURE, which assigns individuals to populations based on the ChromoCombine output (using MCMC or stochastic optimization).
Utility tools provided here:
- memorycap, which allows you to monitor and cap the memory used by a process. This is extremely helpful for managing runs on an institutional cluster when there are large numbers of both SNPs and individuals, for which ChromoPainter reserves a lot of memory.
- makeuniformrecfile.pl, which creates a uniform recombination file for use with the linkage model of ChromoPainter. For usage, see the Complex Example.
- convertrecfile.pl, which converts between CDF and PDF style (cumulative and per-interval) recombination map files, and can take a wide variety of map formats and convert them into a suitable format for ChromoPainter. For example, the HapMap B37 data obtained from NIH can be processed with "convertrecfile.pl -M hapmap"; other formats are also supported.
- neaverage.pl, which computes the average effective population size (Ne) estimated when ChromoPainter is run in EM mode for parameter estimation. For usage, see the Complex Example.
- plink2chromopainter.pl, a conversion script for going from PLINK style PED and MAP files to ChromoPainter's PHASE and MAP files.
- impute2chromopainter.pl, a conversion script for going from IMPUTE2 phased format (.haps files, which SHAPEIT also produces) to ChromoPainter's PHASE and MAP files.
- chromopainter2impute2.pl, a conversion script for going from PHASE format to IMPUTE2 and SHAPEIT ".haps" files.
- transpose.pl, a tool to rotate matrices; for example, if you prepared your files in Excel, they may be transposed relative to the layout required here.
- chromopainterindivrename.pl, a tool to add individual names to ChromoPainter output if you did not set this up correctly beforehand.
- phase2beagle.pl, to convert between ChromoPainter PHASE/RECOMBFILE and BEAGLE .bgl/.markers format.
- phasesubsample.pl, to extract subsets of a PHASE file (e.g. to test code, or to perform EM estimation on small datasets).
- phasescreen.pl, to remove non-varying SNPs or singletons from a PHASE file.
- ped2ippca.pl, to convert PED files to ipPCA's CSV format.
- FineSTRUCTURE R tools, for advanced plotting, using continent force files, and creating PCA plots from known populations.
- msms2cp.pl, to convert msms and ms format to ChromoPainter's PHASE format.
- finestructuregreedy.sh, to automate greedy optimisation of fineSTRUCTURE, avoiding the lengthy MCMC step (see Greedy Optimisation for details).
- hap2dip.pl, to convert a haploid ChromoPainter matrix (fineSTRUCTURE input matrix) into a diploid one, optionally adding names.
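The CDF/PDF distinction handled by convertrecfile.pl comes down to simple arithmetic: a cumulative map gives the genetic position of each SNP, while a per-interval map gives the recombination rate between consecutive SNPs. The sketch below shows that conversion under stated assumptions: the cumulative map is in centiMorgans, rates are expressed in Morgans per base pair, and the last SNP gets a rate of 0. The function names are hypothetical, and whether ChromoPainter expects exactly these units and end-of-file conventions should be checked against its documentation.

```python
def cumulative_to_rates(positions, cm):
    """Convert a cumulative map (cM at each SNP) into per-bp rates.

    Hypothetical helper: the rate at SNP i is the genetic distance to
    SNP i+1, converted from centiMorgans to Morgans (assumed unit),
    divided by the physical distance in base pairs. The last SNP has no
    following interval, so it gets a rate of 0 (assumed convention).
    """
    if len(positions) != len(cm):
        raise ValueError("positions and cm must have the same length")
    rates = []
    for i in range(len(positions) - 1):
        bp = positions[i + 1] - positions[i]
        if bp <= 0:
            raise ValueError("positions must be strictly increasing")
        morgans = (cm[i + 1] - cm[i]) / 100.0  # cM -> Morgans
        rates.append(morgans / bp)
    if positions:
        rates.append(0.0)  # no interval after the final SNP
    return rates


def rates_to_cumulative(positions, rates, start_cm=0.0):
    """Inverse conversion: accumulate per-bp rates back into cM."""
    cm = [start_cm]
    for i in range(len(positions) - 1):
        bp = positions[i + 1] - positions[i]
        cm.append(cm[-1] + rates[i] * bp * 100.0)  # Morgans -> cM
    return cm[: len(positions)]
```

For example, `cumulative_to_rates([1000, 2000, 4000], [0.0, 0.1, 0.3])` gives a rate of about 1e-6 Morgans per base pair for both intervals, and `rates_to_cumulative` recovers the original map.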
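The screening performed by phasescreen.pl can be illustrated by counting the minor allele at each SNP: a non-varying SNP has a minor-allele count of 0 and a singleton has a count of 1. The following is a minimal sketch, assuming the haplotypes are already loaded as equal-length strings of '0'/'1' with no missing data; the function name and the `drop_singletons` parameter are hypothetical and do not describe the script's actual options.

```python
def screen_snps(haplotypes, positions, drop_singletons=True):
    """Remove non-varying (and optionally singleton) SNPs.

    haplotypes: list of equal-length '0'/'1' strings, one per haplotype
                (assumed to contain no missing data).
    positions:  list of SNP positions, one per column.
    Returns (filtered_haplotypes, filtered_positions).
    """
    n_snps = len(positions)
    if any(len(h) != n_snps for h in haplotypes):
        raise ValueError("every haplotype must have one allele per SNP")
    n_haps = len(haplotypes)
    # Keep SNPs whose minor allele appears at least this many times.
    min_count = 2 if drop_singletons else 1
    keep = []
    for j in range(n_snps):
        ones = sum(1 for h in haplotypes if h[j] == "1")
        minor = min(ones, n_haps - ones)  # minor-allele count at SNP j
        if minor >= min_count:
            keep.append(j)
    new_haps = ["".join(h[j] for j in keep) for h in haplotypes]
    new_pos = [positions[j] for j in keep]
    return new_haps, new_pos
```

With haplotypes `["01100", "01111", "01000", "01010"]`, the first two SNPs are non-varying and the last is a singleton, so only the third and fourth SNPs survive.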
Other software that will likely be useful:
- IMPUTE2, which both PHASES data and IMPUTES MISSING SNPS. Both of these stages are necessary for the best use of our software, and the imputation stage is always necessary if your data contain missing values. IMPUTE2 is a convenient choice of phasing software since we provide a conversion script (impute2chromopainter.pl).
- PLINK, a popular software suite for manipulating genetic data. Although PLINK does not use the same file format as ChromoPainter, we provide plink2chromopainter.pl, which converts between the two (for both the UNLINKED and LINKED cases).
- The phasing pipeline provided with GERMLINE, which includes two very helpful tools: converters from PED to BGL and vice versa. The pipeline uses BEAGLE to do the phasing.
- The FCgene format converter can convert between many common file formats, including PLINK, which can then be converted into PHASE format for ChromoPainter using the plink2chromopainter.pl script (see above).
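When IMPUTE2 or SHAPEIT is used for phasing, the main reshaping step before ChromoPainter is a transpose: a .haps file has one row per SNP (five leading columns: SNP ID, rsID, base-pair position, allele coded 0, allele coded 1, followed by one 0/1 column per haplotype), whereas a PHASE file lists one row per haplotype. The sketch below shows only that transposition; it is an illustration, not the logic of impute2chromopainter.pl, the function name is hypothetical, and writing the PHASE header lines is deliberately left out (use the provided script, or check the ChromoPainter documentation for the exact layout).

```python
def parse_haps(lines):
    """Parse .haps rows into SNP positions and per-haplotype strings.

    Hypothetical helper. Each non-blank row is one SNP: five leading
    columns (SNP ID, rsID, position, allele 0, allele 1) followed by one
    0/1 column per haplotype. The allele columns are transposed so that
    each returned string holds one haplotype across all SNPs.
    Returns (positions, haplotypes).
    """
    positions = []
    columns = []  # one list of alleles per SNP
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        positions.append(int(fields[2]))
        alleles = fields[5:]
        if columns and len(alleles) != len(columns[0]):
            raise ValueError("inconsistent number of haplotypes")
        columns.append(alleles)
    n_haps = len(columns[0]) if columns else 0
    # Transpose: haplotype k collects the k-th allele from every SNP row.
    haplotypes = ["".join(col[k] for col in columns) for k in range(n_haps)]
    return positions, haplotypes
```

For two SNP rows `"1 rs1 100 A G 0 1 1 0"` and `"1 rs2 250 C T 1 1 0 0"`, this returns positions `[100, 250]` and haplotypes `["01", "11", "10", "00"]`.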