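A team of researchers affiliated with several institutions in China and two in the U.S. has used machine learning to get a much sharper look at the history of life. In their paper published in the journal Science, the group describes how they ran machine-learning algorithms on a supercomputer to synthesize fossil records of marine invertebrates spanning the Cambrian to the Early Triassic, producing a biodiversity timeline with a resolution of roughly 26,000 years.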

Scientists use fossils to date sedimentary rocks because such rocks generally cannot be dated directly. Prior research has shown that most species exist for only a limited span of time. If scientists know when a given species lived in a given area, they can use that information to date the local rocks in which its fossils are embedded. A drawback of this method is that it offers only coarse time resolution, which makes it hard to build a detailed timeline of events in Earth's history, such as mass extinctions.
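A minimal sketch of the overlapping-range idea, assuming each species' age range is already known: a layer can only be as old as the window in which every species found in it coexisted. The taxa, their ranges, and the `date_layer` helper are hypothetical and for illustration only. Even with several fossils, the window stays wide, which is the coarse resolution described above.

```python
from typing import Dict, List, Optional, Tuple

# Age range of each taxon as (oldest, youngest), in millions of years ago (Ma).
# These taxa and numbers are hypothetical, chosen only for illustration.
TAXON_RANGES: Dict[str, Tuple[float, float]] = {
    "species_A": (300.0, 290.0),
    "species_B": (295.0, 285.0),
    "species_C": (293.0, 280.0),
    "species_D": (270.0, 260.0),
}


def date_layer(fossils: List[str]) -> Optional[Tuple[float, float]]:
    """Return the (oldest, youngest) age window consistent with every fossil
    found in one rock layer, or None if the taxa never coexisted."""
    # The layer can be no older than the youngest first appearance...
    oldest = min(TAXON_RANGES[name][0] for name in fossils)
    # ...and no younger than the oldest last appearance.
    youngest = max(TAXON_RANGES[name][1] for name in fossils)
    if oldest < youngest:
        return None  # the ranges do not overlap
    return oldest, youngest


print(date_layer(["species_A"]))                            # (300.0, 290.0)
print(date_layer(["species_A", "species_B", "species_C"]))  # (293.0, 290.0)
print(date_layer(["species_A", "species_D"]))               # None
```

Adding more co-occurring species can only narrow the window, never widen it, but the result is still bounded by how well each species' own range is known.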

© Nanjing University

Jun-xuan Fan et al. A high-resolution summary of Cambrian to Early Triassic marine invertebrate biodiversity, Science (2020). 

DOI: 10.1126/science.aax4953

Significance: We have pressing, human-generated reasons to explore the influence of environmental change on biodiversity. Looking into the past can not only inform our understanding of this relationship but also help us to understand current change. Paleontological records depend on fossil availability and predictive modeling, however, and thus tend to give us a picture with large temporal jumps, millions of years wide. Such a scale makes it difficult to truly understand the action of environmental forces on ecological processes. Enabled by a supercomputer, Fan et al. used machine learning to analyze a large marine Paleozoic dataset, creating a record with time intervals of only ∼26,000 years (see the Perspective by Wagner). This fine-scale resolution revealed new events and important details of previously described patterns.
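Neither the summary nor the abstract spells out the search procedure beyond naming simulated annealing and genetic algorithms. The toy sketch below assumes the core task can be framed as finding one global order of fossil appearance events that contradicts the local order observed in each rock section as little as possible, and it uses plain simulated annealing (no genetic-algorithm component) to search for that order. The event labels, the `misfit` cost, and the `anneal` parameters are hypothetical; this is not the authors' code, and their analysis handled about 11,000 species from more than 3000 sections.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def misfit(order: Sequence[str], sections: List[List[str]]) -> int:
    """Count event pairs whose order in `order` contradicts a section.
    Each section lists its events from oldest (bottom) to youngest (top)."""
    pos = {event: i for i, event in enumerate(order)}
    cost = 0
    for section in sections:
        for older, younger in combinations(section, 2):
            if pos[older] > pos[younger]:
                cost += 1
    return cost


def anneal(events: List[str], sections: List[List[str]], steps: int = 20000,
           t0: float = 2.0, cooling: float = 0.9995,
           seed: int = 0) -> Tuple[List[str], int]:
    """Simulated annealing over event orders using random pairwise swaps."""
    rng = random.Random(seed)
    order = list(events)
    rng.shuffle(order)
    cost = misfit(order, sections)
    best, best_cost = order[:], cost
    temp = t0
    for _ in range(steps):
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_cost = misfit(order, sections)
        # Accept improvements always; accept worse orders with probability
        # exp(-increase / temperature), which shrinks as the system cools.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = order[:], cost
        else:
            order[i], order[j] = order[j], order[i]  # reject: undo the swap
        temp *= cooling
    return best, best_cost


# Hypothetical sections, each recording only some events, oldest first.
SECTIONS = [["E1", "E2", "E4"], ["E2", "E3", "E5"],
            ["E1", "E3", "E6"], ["E4", "E5", "E6"]]
EVENTS = ["E1", "E2", "E3", "E4", "E5", "E6"]
best_order, best_cost = anneal(EVENTS, SECTIONS)
print(best_order, best_cost)
```

Accepting occasional worse orders while the temperature is high lets the search escape local minima before it settles; no single section here contains every event, yet the combined order can satisfy all of them.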


Abstract: One great challenge in understanding the history of life is resolving the influence of environmental change on biodiversity. Simulated annealing and genetic algorithms were used to synthesize data from 11,000 marine fossil species, collected from more than 3000 stratigraphic sections, to generate a new Cambrian to Triassic biodiversity curve with an imputed temporal resolution of 26 ± 14.9 thousand years. This increased resolution clarifies the timing of known diversification and extinction events. Comparative analysis suggests that partial pressure of carbon dioxide (pCO₂) is the only environmental factor that seems to display a secular pattern similar to that of biodiversity, but this similarity was not confirmed when autocorrelation within that time series was analyzed by detrending. These results demonstrate that fossil data can provide the temporal and taxonomic resolutions necessary to test (paleo)biological hypotheses at a level of detail approaching those of long-term ecological analyses.
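The abstract's caution about detrending can be illustrated with synthetic data. The sketch below assumes the simplest case: two series that share nothing but a steady upward trend plus independent noise. Their raw correlation is high, yet it largely disappears once a least-squares straight line is removed from each. The series, the `linear_detrend` helper, and all numbers are made up; the paper's own detrending procedure may differ.

```python
import numpy as np


def linear_detrend(y: np.ndarray) -> np.ndarray:
    """Remove the least-squares straight-line trend from a series."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    return y - (slope * t + intercept)


rng = np.random.default_rng(42)
n = 500
t = np.arange(n)
# Two synthetic series (not real data) that share only a common upward
# trend; their fluctuations are independent Gaussian noise.
diversity = 0.05 * t + rng.normal(0.0, 1.0, n)
co2 = 0.03 * t + rng.normal(0.0, 1.0, n)

raw_r = np.corrcoef(diversity, co2)[0, 1]
detrended_r = np.corrcoef(linear_detrend(diversity), linear_detrend(co2))[0, 1]
print(f"raw r = {raw_r:.2f}, detrended r = {detrended_r:.2f}")
```

A shared long-term trend is enough to make two unrelated records look alike, which is why a resemblance between curves needs to be tested again after the trend is removed.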