My methodology revolves around two themes - Data At Scale, and Modelling the process.
I am very proud of our work on CLARITY - short for Comparing simiLARITY matrices. This developing methodology is for comparing anything, to anything. As you will see in the Applications, we have already explored comparing Genes to Language, Culture to Economics, and Methylation biomarkers to Gene Expression.
You can compare anything to anything, as long as you can measure them both for the same set of items. The approach works by decomposing a similarity matrix into a "structure" and "relationship" in one dataset - a sort of soft clustering - and asking which elements of the structure are present in a second dataset.
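The decompose-then-compare idea can be illustrated with a small sketch. This is not the CLARITY algorithm itself: it assumes a simple eigendecomposition as a stand-in for learning the "structure", and the function names `learn_structure` and `persistence_residuals` are hypothetical. The structure `A` is learned from the first similarity matrix, the "relationship" `X` is then refit for the second matrix with `A` held fixed, and whatever cannot be explained is left in the residuals.

```python
import numpy as np

def learn_structure(Y, k):
    """Return the top-k eigenvectors of a symmetric similarity matrix.

    Assumption: eigenvectors stand in for the soft-clustering 'structure'.
    """
    Y = (Y + Y.T) / 2.0                      # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(Y)
    order = np.argsort(np.abs(eigvals))[::-1][:k]
    return eigvecs[:, order]

def persistence_residuals(A, Y_target):
    """Fit Y_target ~ A X A^T with A fixed and return the residual matrix."""
    X = A.T @ Y_target @ A                   # least-squares X, as A has orthonormal columns
    return Y_target - A @ X @ A.T

# Toy data: six items in two groups of three.
Z = np.eye(2)[[0, 0, 0, 1, 1, 1]]            # group membership for dataset 1
B1 = np.array([[1.0, 0.2], [0.2, 1.0]])      # relationship between groups
Y1 = Z @ B1 @ Z.T

# Second dataset, same structure but a different relationship.
B2 = np.array([[2.0, 0.5], [0.5, 1.5]])
Y2_same = Z @ B2 @ Z.T

# Second dataset in which item 2 has moved to the other group.
Z_moved = np.eye(2)[[0, 0, 1, 1, 1, 1]]
Y2_diff = Z_moved @ B1 @ Z_moved.T

A = learn_structure(Y1, k=2)
resid_same = persistence_residuals(A, Y2_same)
resid_diff = persistence_residuals(A, Y2_diff)

# Per-item residual size: large values flag items whose structure changed.
item_resid_diff = np.linalg.norm(resid_diff, axis=1)
print("residual norm, same structure:", np.linalg.norm(resid_same))
print("per-item residuals, changed structure:", np.round(item_resid_diff, 3))
```

When the structure is shared, the residuals vanish even though the relationship between groups differs. When one item changes group, the residuals concentrate on that item.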
Bayesian Epidemic Modelling
I have been part of a University-wide team to apply high quality modelling to understand and predict the Epidemic. Our released work addressed bed capacity modelling in the South-West of England, but of course our interests are much wider.
At the Institute of Statistical Sciences, our main role is to improve the quality of statistical tools that can be deployed in practice. To this end I'm leading a team of Undergraduate students exploring the application of Machine Learning tools to learn summary statistics using Approximate Bayesian Computation.
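As a minimal sketch of what learning a summary statistic for Approximate Bayesian Computation can look like, the example below assumes a toy Normal simulator, a uniform prior, and a linear regression standing in for the Machine Learning step. None of these choices come from the project itself, and all names are hypothetical. The regression maps simulated data to the parameter that generated it, and its prediction is then used as the summary in a plain rejection sampler.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=50):
    """Toy simulator (assumption): n draws from Normal(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)

def features(x):
    """Candidate features of a dataset, including an intercept term."""
    return np.array([1.0, x.mean(), np.median(x), x.std()])

# 1. Build a training set by simulating from the prior, Uniform(-5, 5).
train_theta = rng.uniform(-5, 5, size=2000)
train_features = np.array([features(simulate(t)) for t in train_theta])

# 2. Learn the summary statistic: regress the parameter on the features.
beta, *_ = np.linalg.lstsq(train_features, train_theta, rcond=None)

def summary(x):
    """Learned one-dimensional summary: the regression's prediction of theta."""
    return features(x) @ beta

# 3. Rejection ABC: keep prior draws whose simulated summary is closest
#    to the observed summary.
observed = simulate(2.0)                     # pretend the true value is unknown
s_obs = summary(observed)
prior_draws = rng.uniform(-5, 5, size=5000)
dist = np.array([abs(summary(simulate(t)) - s_obs) for t in prior_draws])
accepted = prior_draws[dist <= np.quantile(dist, 0.02)]

posterior_mean = accepted.mean()
print("accepted draws:", accepted.size)
print("approximate posterior mean:", round(float(posterior_mean), 2))
```

A more flexible learner, such as a neural network, could replace the regression in step 2 without changing steps 1 and 3.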
For a long time my research has used Bayesian Clustering to understand Genetics but lately I have been exploring the relationship between the now-standard approach to clustering we deployed in FineSTRUCTURE to that of the Stochastic Block Model and its many variants. With Prof Patrick Rubin-Delanchy I am looking into how Spectral methods can be used to perform clustering-like tasks, as described in CLARITY above. With Prof Robert Allison I am exploring more model-based approaches.
I've been working on genetics and evolution for my entire research career. Whilst most methodology I develop has wider application, I have always taken extra care to ensure that methodology for genetics takes into account the particular features of this data.
FineSTRUCTURE is a whole pipeline that deserves, and has, its own FineSTRUCTURE website. It is a sophisticated modelling tool that uses Data Science ideas - of identifying computational questions that can be answered, and wrapping them up in a statistical modelling framework that means something. The FineSTRUCTURE algorithm was developed in 2012 but is still the most accurate way to estimate fine-scale variations in Ancestry.
High Profile applications include:
Genomic Architecture is a description of how the whole genome comes together to construct a complex trait, such as height, education, body-mass-index, and so on. The relationship is extremely rich and of course depends on all sorts of variables such as cultural practice, personal circumstances, and so on.
My work focusses on population structure and how this has confounded previous analyses, as well as methods to limit this confounding. Key outputs include:
badMIXTURE is an important tool to compare the output of some claimed mixture to another dataset that may or may not show this mixture. It works by comparing mixtures generated using genome-wide unlinked markers (with tools such as ADMIXTURE) to results from FineSTRUCTURE above. These are theoretically the same if the mixture is true.
badMIXTURE is published under the title A tutorial on how not to over-interpret STRUCTURE and ADMIXTURE bar plots. As always, it turned out to be important to understand the details of what the models were doing in order to make the software appropriate for the complexity that is genetic data.
badMIXTURE is the spiritual precursor to CLARITY, which expands this idea to a much wider range of models.
I love applying data science in weird and wonderful places! This is a selection of the most notable occurrences of data science I've had the pleasure to be involved with.
Did you know that Religious change preceded economic change in the 20th century? Damian Ruck wrote this up in The Conversation.
We also established the Cultural prerequisites of socioeconomic development by structuring the changes into a coherent model.
In both cases this uses a large worldwide dataset consisting of several time-points, hundreds of countries and millions of questionnaire results. Sense making is done through dimensionality reduction to understandable variables, which can be modelled with Time Series methodology.
Wind Energy market models
We want Renewable energy to replace conventional fossil fuels. But how can governments use markets to make this happen? In Performance comparison of renewable incentive schemes using optimal control we showed that there are real implications to the choices made in market manipulation - for the same amount of support given to the industry, some schemes are markedly better than others!
A real out-there application of Mathematics is Historical Dynamics. We found that Apparent strength conceals instability in a model for the collapse of historical states. We tried very hard to make "qualitative data sets" from history, to assess whether our mathematical model was making consistent predictions or not.
The implications are truly fascinating: the empires and great states of the past may have failed not because of some external accident or event, but simply because human nature (game theory) says that human political systems will evolve to an unstable tipping point!
I currently teach on the following Units:
Bootcamp is how Compass - EPSRC Centre for Doctoral Training in Computational Statistics and Data Science takes diverse mathematicians and even computer scientists and brings them all up to a uniform high level for onward teaching in the Taught Course Programme, before students undertake their main PhD.
Data Science Toolbox
Data Science Toolbox is a truly unique experience. It contains everything a mathematician needs to know to do Data Science. Taught as part of the MSc Mathematics of Cybersecurity program, it is carefully integrated to use Cyber Security examples throughout, ensuring that students learn their data and models as well as the core Data Science that is needed to analyse it. We cover everything from Exploratory Data Analysis to calibrated models, Statistics to Machine Learning, RStudio to High Performance Parallel processing.
There is no doubt that Data Science is a hard topic that needs both Big Picture and gory details to be implemented and understood correctly. This 2-semester course lets students do exactly that.
School of Mathematics
University of Bristol, Fry Building, GA.06 Woodland Road Bristol, BS8 1UG.
Tel: +44 (0) 117 456 0044
dan.lawson [at] bristol.ac.uk
Alternative email address: danjlawson2000 [at] yahoo [dot] com