PaintMyChromosomes.com
fineSTRUCTURE v2 & GLOBETROTTER

© 2012 Daniel Lawson.

12 Potential pitfalls

If your data are not in the format we expect, then anything can go wrong. We try to detect this, but we don’t test for everything. Check that your data are valid first!

The main pitfalls that can happen with valid data are:

  1. ChromoPainter parameter estimation fails. This happens when the default starting parameters are too far from the true parameters, so that the parameter estimation converges to a suboptimal solution (usually with an effectively infinite or zero recombination rate).
    • Symptoms: an implausible (tiny) value of ‘c’, or very many or very few chunks, i.e. row sums of the chunk count matrix that are close to the number of SNPs or about 1. The *EMprobs.out files probably haven’t converged. When running -combines2 you may get a warning about ‘c’ being out of the expected range.
    • Happens when: the genetic distance between SNPs is too large or too small. This is common with simulated data and with non-human data, particularly when makeuniformrecfile.pl, which assumes human-like SNP density, is used to make the recombination map.
    • Solutions: Rerun stage1 with a different starting value for Ne. Try a value either very much larger or very much smaller than the default, in the opposite direction to the inferred values. The default is 400000/number of donor haplotypes. Obtain the inferred estimate using grep Neinf <file>.cp. Set the starting value via -s1args:-in\ -iM\ --emfilesonly\ -n <value>, where you replace <value> with a number, e.g. 10 or 100000. (The other arguments are defaults that only experts should change.)
  2. ChromoPainter ‘c’ estimation fails.
    • Symptoms: Usually you will get a ‘ChromoCombine’ error and be told that no regions were found. You should rerun stage2.
    • Happens when: the parameters are badly inferred, there isn’t very much data, or the recombination rate is very low, resulting in high LD.
    • Solutions: Try rerunning with -reset 2 -s2chunksperregion <value> (ChromoPainter’s -k), setting <value> lower than the default and lower than the chunk count row sum for each chromosome of each individual. If that requires a <value> lower than about 20, see the next pitfall.
  3. ChromoPainter ‘c’ estimation went wrong, but passed tests.
    • Symptoms: MCMC results are over-split.
    • Happens when: the parameters are badly inferred, there isn’t very much data, or the recombination rate is very low, resulting in high LD.
    • Solutions: As for the previous pitfall. If that isn’t possible, you may have to resort to setting ‘c’ manually. -duplicate 3 <newroot>.cp -s34args:-c\ 1.0 will create a new MCMC run with c=1. This is typically conservative and will be a good baseline for deciding whether splits are clear or not.
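
The row-sum symptom from the first pitfall, and the default starting value for Ne, can be checked with a short script. The following is only a minimal sketch and is not part of fineSTRUCTURE. It assumes the chunk count matrix is a whitespace-delimited text table with optional comment lines starting with ‘#’, one header row, and then one row per recipient (a label followed by numbers). The function names and the thresholds low and high_fraction are hypothetical choices for illustration, so adjust them to your data.

```python
# Hypothetical helper, not part of fineSTRUCTURE or ChromoPainter.

def default_ne_start(n_donor_haplotypes):
    """Default starting Ne as stated in the text: 400000 / number of donor haplotypes."""
    if n_donor_haplotypes <= 0:
        raise ValueError("number of donor haplotypes must be positive")
    return 400000 / n_donor_haplotypes


def row_sums(matrix_text):
    """Return {recipient label: row sum} for a chunk count matrix given as text.

    Assumed layout: optional '#' comment lines, one header row, then
    rows of the form 'label value value ...'.
    """
    lines = [ln for ln in matrix_text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    sums = {}
    for ln in lines[1:]:  # skip the header row
        label, *values = ln.split()
        sums[label] = sum(float(v) for v in values)
    return sums


def flag_rows(sums, n_snps, low=2.0, high_fraction=0.5):
    """Flag rows whose sum is about 1 or close to the number of SNPs.

    The thresholds are illustrative assumptions, not values from the software.
    """
    flagged = {}
    for label, total in sums.items():
        if total <= low:
            flagged[label] = "very few chunks"
        elif total >= high_fraction * n_snps:
            flagged[label] = "very many chunks"
    return flagged


# Made-up example data, for illustration only.
EXAMPLE = """#Cfactor 0.25
Recipient IND1 IND2 IND3
IND1 0 0.6 0.5
IND2 400.0 0 450.0
IND3 30.5 20.5 0
"""

if __name__ == "__main__":
    print(default_ne_start(200))
    print(flag_rows(row_sums(EXAMPLE), n_snps=1000))
```

If any rows are flagged, compare the inferred Ne with the default start value and rerun stage1 from a starting value in the opposite direction, as described in the first pitfall.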