Tools for download


Summary of tools for data preparation

Main tools (links to more information and download):


  • ChromoPainter, which creates coancestry matrices from PHASE format raw data files containing NO MISSING DATA.
  • ChromoCombine, which calculates "c" and combines multiple chromopainter output files.
  • fineSTRUCTURE, which assigns individuals to populations based on the chromocombine output (using MCMC or stochastic optimization).
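ChromoPainter requires PHASE-format input with no missing data, so it can help to screen haplotypes before a run. The sketch below is a hypothetical helper, not part of the package. It assumes each haplotype is a string with one 0/1 character per SNP, and it treats any other character (for example "?") as a missing call.

```python
# Hypothetical pre-flight check (not part of ChromoPainter).
# Assumption: haplotypes are strings of allele codes, one character per SNP,
# with 0/1 for observed alleles; any other character is treated as missing.
VALID_ALLELES = set("01")


def find_missing(haplotypes, n_snps):
    """Return (haplotype_index, problem) pairs for rows that break the
    no-missing-data requirement or have the wrong number of SNPs."""
    problems = []
    for i, hap in enumerate(haplotypes):
        if len(hap) != n_snps:
            problems.append((i, f"length {len(hap)} != {n_snps} SNPs"))
        bad = sorted(set(hap) - VALID_ALLELES)
        if bad:
            problems.append((i, "non-0/1 characters: " + "".join(bad)))
    return problems


if __name__ == "__main__":
    haps = ["0101", "01?1", "0110"]
    print(find_missing(haps, 4))  # [(1, 'non-0/1 characters: ?')]
```

Rows flagged this way need imputation (for example with IMPUTE2) or removal before painting.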

Utility tools provided here:


  • memorycap, which allows you to monitor and cap the memory used by a process. This is extremely helpful for managing runs on an institutional cluster when there are large numbers of both SNPs and individuals, since ChromoPainter then reserves a lot of memory.
  • makeuniformrecfile.pl, which creates a uniform recombination file for use with the linkage model of chromopainter. For usage, see the Complex Example.
  • convertrecfile.pl, which converts between CDF- and PDF-style recombination map files, and can convert a wide variety of map formats into a format suitable for ChromoPainter. For example, the HapMap B37 data obtained from the NIH can be processed with "convertrecfile.pl -M hapmap"; other formats are also supported.
  • neaverage.pl, which computes the average value of the effective population size when using chromopainter in EM mode for parameter estimation. For usage, see the Complex Example.
  • plink2chromopainter.pl, a conversion script for going from PLINK style PED and MAP files to ChromoPainter's PHASE and MAP files.
  • impute2chromopainter.pl, a conversion script for going from IMPUTE2 phased format (.haps files, including SHAPEIT output) to ChromoPainter's PHASE and MAP files.
  • chromopainter2impute2.pl, a conversion script for going from PHASE format to IMPUTE2 and SHAPEIT ".haps" files.
  • transpose.pl, a tool to rotate (transpose) matrices; for example, if you prepared your files in Excel, they may be transposed relative to the orientation required here.
  • chromopainterindivrename.pl, a tool to add individual names into chromopainter output if you did not set this up correctly beforehand.
  • phase2beagle.pl, to convert between ChromoPainter PHASE/RECOMBFILE and BEAGLE .bgl/.markers format.
  • phasesubsample.pl, to extract subsets of a PHASE file (e.g. to test code, or to perform EM estimation on small datasets).
  • phasescreen.pl, to remove non-varying SNPs or singletons from a PHASE file.
  • ped2ippca.pl, to convert PED files to ippca's CSV format.
  • FineSTRUCTURE R tools, for advanced plotting, using continent force files, and creating PCA plots from known populations.
  • msms2cp.pl, to convert msms and ms format to ChromoPainter's PHASE format.
  • finestructuregreedy.sh, to automate greedy optimisation of fineSTRUCTURE, avoiding the lengthy MCMC step (see Greedy Optimisation for details).
  • hap2dip.pl, to convert a haploid chromopainter matrix (finestructure input matrix) into a diploid one, optionally adding names.
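The kind of filtering described for phasescreen.pl can be sketched as follows. This is an illustrative Python function, not the script itself. The name screen_snps and its interface are hypothetical, and it assumes biallelic SNPs coded 0/1 with no missing data. A SNP counts as non-varying when every haplotype carries the same allele, and as a singleton when the rarer allele occurs on exactly one haplotype.

```python
# Hypothetical sketch of SNP screening (not phasescreen.pl's actual code).
# Assumption: biallelic SNPs coded '0'/'1', no missing data, equal-length haplotypes.


def screen_snps(haplotypes, positions, drop_singletons=True):
    """Keep only SNP columns that vary across haplotypes and, optionally,
    are not singletons. Returns (filtered_haplotypes, filtered_positions)."""
    n_haps = len(haplotypes)
    keep = []
    for j in range(len(positions)):
        ones = sum(hap[j] == "1" for hap in haplotypes)  # count of allele '1'
        minor = min(ones, n_haps - ones)                 # minor allele count
        if minor == 0:
            continue  # non-varying SNP
        if drop_singletons and minor == 1:
            continue  # singleton SNP
        keep.append(j)
    new_haps = ["".join(hap[j] for j in keep) for hap in haplotypes]
    new_pos = [positions[j] for j in keep]
    return new_haps, new_pos


if __name__ == "__main__":
    haps = ["0011", "0101", "0001", "0110"]
    # Column 0 is non-varying and column 3 is a singleton, so both are dropped.
    print(screen_snps(haps, [100, 200, 300, 400]))
    # (['01', '10', '00', '11'], [200, 300])
```

Keeping the positions list in step with the haplotype columns matters because the retained SNPs must still line up with the positions (and any recombination map) used by ChromoPainter.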

Other software that will likely be useful:


  • IMPUTE2, which both PHASES data and IMPUTES MISSING SNPS. Both stages are necessary for the best use of our software, and the imputation stage is always necessary if your data contain missing values. It is a convenient choice of phasing software because we provide a conversion script (impute2chromopainter.pl).
  • PLINK, a popular software suite for manipulating genetic data. Although PLINK does not use the same file format as ChromoPainter, we provide plink2chromopainter.pl, which converts between the two (for both the UNLINKED and LINKED cases).
  • The phasing pipeline provided with GERMLINE, which includes two very helpful tools: converters from PED to BGL and vice versa. The pipeline uses BEAGLE for phasing.
  • The FCgene format converter can convert between many common file formats, including PLINK, which can be converted into PHASE format for ChromoPainter using the plink2chromopainter.pl script (see above).
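Phased .haps files store one SNP per line, whereas ChromoPainter's PHASE format (as assumed here) stores one haplotype per line, so the core of the conversion is a transpose. A minimal sketch follows. It assumes the SHAPEIT-style layout of five leading columns (two identifier columns, the base-pair position, and two allele columns) followed by one 0/1 column per haplotype. The function name is hypothetical, and the sketch does not write ChromoPainter's header or MAP lines, so use impute2chromopainter.pl for real conversions.

```python
# Hypothetical sketch of the .haps -> haplotype-row transpose
# (not impute2chromopainter.pl itself).
# Assumption: each line = 2 ID columns, position, allele A, allele B, then 0/1 per haplotype.


def haps_to_haplotypes(lines):
    """Convert SNP-major .haps lines into haplotype-major 0/1 strings plus positions."""
    positions = []
    columns = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        positions.append(int(fields[2]))  # third column: base-pair position
        columns.append(fields[5:])        # haplotype alleles for this SNP
    if not columns:
        return [], []
    n_haps = len(columns[0])
    if any(len(col) != n_haps for col in columns):
        raise ValueError("inconsistent number of haplotype columns")
    # Transpose: haplotype h is the h-th allele from every SNP line, in order.
    haplotypes = ["".join(col[h] for col in columns) for h in range(n_haps)]
    return haplotypes, positions


if __name__ == "__main__":
    example = [
        "1 rs1 1000 A G 0 1 1 0",
        "1 rs2 2000 C T 1 1 0 0",
        "1 rs3 3500 G A 0 0 0 1",
    ]
    print(haps_to_haplotypes(example))
    # (['010', '110', '100', '001'], [1000, 2000, 3500])
```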