Palaeoanalytics

 

Using data science and machine learning to develop cross-disciplinary analytical methods in human evolutionary studies

Genomics has transformed human evolutionary studies as much as it has other parts of biology. One reason for this impact is the sheer scale of the data now available, together with the power of the analytical techniques used. Machine learning and data science have, in effect, swamped traditional approaches to human evolution. However, the palaeosciences – palaeontology, archaeology, earth sciences – have a major role to play: supplying hypotheses, providing contextual information and, above all, providing evidence for the evolution of the phenotype and extended phenotype.

The major challenge is to develop data structures and analytical methods for these aspects that can be integrated with genomics. The aim of this project is to take up this challenge and develop methods, drawn from machine learning and data science, that greatly enhance the quantity and quantification of the complex data of the palaeosciences – morphometrics of fossils, attributes of the millions of stone tools that reflect hominin behaviour, environmental context and more. The data are held in books, papers and reports, in text, tabular and image form, and will require advanced algorithm-based input methods. Turning them into usable data will depend on classifying the features that form the basis of the output data. Methods used will include string-searching algorithms, deep learning and computer vision. The primary output will be a protocol/workflow, leading from raw archived data to an analysable database, that can be applied widely to data relevant to modern human evolution.
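As a rough illustration of the string-searching step mentioned above, the sketch below pulls simple measurements out of free text and normalises them into records that could be loaded into a database. It is a minimal example under stated assumptions, not the project's method: the pattern, the attribute names, the `Measurement` record and the sample sentence are all hypothetical, and real archive text would need far richer rules.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: an attribute name followed by a number and a unit,
# e.g. "length 45.2 mm" or "thickness: 8 mm". Illustrative only.
MEASUREMENT = re.compile(
    r"\b(?P<attribute>length|width|thickness)\s*(?:of\s+|[:=]\s*)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>mm|cm)\b",
    re.IGNORECASE,
)


@dataclass
class Measurement:
    """One extracted attribute, with its value normalised to millimetres."""
    attribute: str
    value_mm: float


def extract_measurements(text: str) -> list[Measurement]:
    """Return every attribute/value pair found in the text, in order."""
    records = []
    for match in MEASUREMENT.finditer(text):
        value = float(match.group("value"))
        if match.group("unit").lower() == "cm":
            value *= 10.0  # convert centimetres to millimetres
        records.append(Measurement(match.group("attribute").lower(), value))
    return records


# Invented sample sentence standing in for a line from an excavation report.
sample = "Flake 12: length 45.2 mm, width 3 cm; thickness: 8 mm."
rows = extract_measurements(sample)
for row in rows:
    print(row.attribute, row.value_mm)
```

Keeping each extracted value in a typed record with a single unit is one possible design choice; it makes the later steps (classification, database loading) independent of how the source text happened to be written.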

The project will form the platform for integrating genomics and palaeo-phenotype data, and so greatly increase the range of analyses possible on the patterns and processes of human evolution. Human evolution is a central problem in biology, both for its intrinsic interest and for its implications for the medical and cognitive sciences and for the relationship between humans and biodiversity overall.