Biodiversity studies that analyse large numbers of specimens are being published at an unprecedented scale. The study objects, primarily insects, are often collected with Malaise traps. These traps were invented by the Swedish entomologist René Malaise to collect sawflies, but they are also very efficient for other insects, in particular flying insects such as Diptera and Hymenoptera. Since the traps operate around the clock, a single trap can collect hundreds to thousands of specimens during a collecting period of 1–2 weeks. Multiple traps can be placed in a habitat to account for local variation and operated over several seasons to account for temporal variation in insect diversity and abundance. In this way, a near-complete sample of the insect species at a particular location can be obtained.
The analysis of organisms from Malaise trap samples requires two major steps. The first is the separation of individual specimens from bulk samples, which contain thousands of often tiny insects such as parasitoid wasps, flies and midges. The second is their identification, i.e. their assignment to known species. Traditional procedures for processing insects and other arthropods from Malaise trap samples do not scale, and allow only a small fraction of the specimens (and therefore species) to be studied. Any attempt to explore the full diversity would be defeated by the sheer number of insects, even in a single bulk sample. Apart from the labour required for processing, the vast majority of specimens in a sample cannot be assigned to a known species due to a severe lack of taxonomists. This is particularly true of megadiverse groups like Diptera and Hymenoptera. It is the case even in countries like Germany, with a taxonomic history of over 200 years, where hundreds or even thousands of species in certain groups of Diptera still await discovery and documentation (link to article in German).
DNA barcoding for biodiversity discovery and monitoring
DNA barcoding has become the de facto standard for identifying unknown animal specimens and discovering unknown species. And the method is scalable. The Centre for Biodiversity Genomics in Guelph, Canada, has processed about 10 million specimens of over 700,000 species worldwide. A new global project, BIOSCAN, aims to barcode 10 million specimens and to assemble barcode coverage for two million species. Other technological approaches aim to simplify the sequencing process by minimizing the required lab equipment and costs. This lowers the threshold for adopting DNA barcoding as the method of choice for species identification, discovery and monitoring, with a minimum of resources and expertise.
While sequencing technology continues to advance at a fast pace, preparing specimens from bulk samples remains a major obstacle in large-scale biodiversity projects that use mass-collecting devices like Malaise traps. Methods that circumvent this step exist, notably metabarcoding. However, metabarcoding has several drawbacks, including incomplete species coverage due to primer bias and/or the low DNA concentrations of rare species.
DiversityScanner – automated sorting of smaller insects using artificial intelligence methods
Sorting specimens from Malaise trap samples is extremely labour-intensive and requires the expertise of trained taxonomists. Since this remains a key issue for any large-scale biodiversity project, entomologists and machine learning specialists teamed up to remedy the situation. The biodiversity researcher Rudolf Meier from the Museum für Naturkunde in Berlin worked with the group of Christian Pylatiuk from the Karlsruhe Institute of Technology and with entomologists from the Zoologische Staatssammlung München and the Sapienza University of Rome. Together they developed the ‘DiversityScanner’, a robot that uses artificial intelligence to sort small insects automatically into different classes.
The DiversityScanner picks individual insects from a sample and photographs them. A computer then uses a type of artificial intelligence known as machine learning to compare the wings, antennae, legs and other characters of each individual with those of known specimens. In heat maps of the insect images, warmer colours such as red mark the body parts that are most important for the identification. In a further step, each insect is individually transferred to a plate with 96 wells. The samples can then be analysed genetically. A DNA barcode is generated for each insect and compared with known species in a public reference database.
The robot's accuracy is currently around 91%, i.e. about 9 out of 10 insects are correctly classified. The researchers recently published their study on the preprint server bioRxiv. According to them, the accuracy can be improved as more samples become available for training. The DiversityScanner software and the 3D printing plans have been made publicly available.
One major advantage of the DiversityScanner is its scalability. It thus addresses one of the major problems of biodiversity studies that deal with large numbers of specimens and species. Many of these insects cannot be assigned to any known species, either because of a lack of taxonomic expertise or because they belong to species still awaiting discovery. Identifying organisms through DNA barcoding has become quick, reliable and inexpensive, and the DiversityScanner holds great promise for speeding up sorting at a similar scale.
German Barcode of Life – GBOL III: Dark Taxa
Although great progress has been made in assembling the genetic reference database for German animals, many species, in particular insects, are not yet included in it. The new project ‘GBOL III: Dark Taxa’ aims to remedy the situation. The three-year project, supported by the Federal Ministry of Education and Research (BMBF), is devoted to the discovery of unknown species, so-called ‘dark taxa’, in our native fauna. It focuses on several megadiverse and little-known groups of Diptera and parasitoid Hymenoptera and uses an integrative taxonomic approach that includes DNA barcoding.
GBOL III aims to contribute to the BIOSCAN initiative of the Centre for Biodiversity Genomics at both global and national scales. It does so by laying the foundations for a large-scale biomonitoring system to record the biodiversity of our planet. We live in times of rising temperatures, increasing weather extremes, receding ice and unprecedented biodiversity loss. New technologies like the DiversityScanner are therefore urgently needed to overcome the obstacles researchers face when trying to record and monitor the species diversity on Earth, most of which is still poorly known or entirely unknown.
Wuehrl L, Pylatiuk C, Giersch M, Lapp F, von Rintelen T, Balke M, Schmidt S, Cerretti P, Meier R (2021) DiversityScanner: Robotic discovery of small invertebrates with machine learning methods. bioRxiv: 2021.05.17.444523. https://doi.org/10.1101/2021.05.17.444523. YouTube: https://youtu.be/ElJ5VSHa4OI
Science News (4 June 2021): Artificial intelligence could help biologists classify the world’s tiny creatures. doi:10.1126/science.abj8374
Morinière J, Balke M, Doczkal D, Geiger MF, Hardulak LA, Haszprunar G, Hausmann A, Hendrich L, Regalado L, Rulik B, Schmidt S, Wägele J, Hebert PDN (2019) A DNA barcode library for 5,200 German flies and midges (Insecta: Diptera) and its implications for metabarcoding‐based biomonitoring. Molecular Ecology Resources 19: 900–928. https://doi.org/10.1111/1755-0998.13022
Stefan Schmidt – ZSM