The taxonomic attribution of the skeletal material from Sima de los Huesos (SH, Spain) has puzzled researchers for decades. At the time of their discovery, the SH hominins were considered pre-Neanderthal on the basis of their morphology [1]. Later, tooth dimensions were found to link the SH sample to Middle Pleistocene specimens such as Mauer, Montmaurin, and Arago II [2], which are commonly ascribed to Homo heidelbergensis. More recently, a close relationship between Neanderthals and the SH specimens was suggested based on the morphology of the cranial vault and face, in particular the masticatory apparatus [3]. Decades of unresolved debate have produced three plausible scenarios for the status of SH in human ancestry: (I) SH is part of the variability of H. heidelbergensis; (II) SH should be considered H. neanderthalensis; (III) SH and H. heidelbergensis represent two separate evolutionary lineages that co-existed in Eurasia during the Middle Pleistocene, with SH being phylogenetically closer to Neanderthals. In this work, we applied supervised machine learning to test the affinity of SH (N: 126) to H. heidelbergensis (N: 13), H. neanderthalensis (N: 73), and H. sapiens (N: 403). The data consisted of mesio-distal and bucco-lingual diameters of the mandibular postcanine dentition. Because of the fragmentary nature of the dental fossil and archaeological records, the dataset contained missing values, which were estimated using Multiple Imputation via Predictive Mean Matching [4]. To avoid biases due to unequal sample sizes, missing data were estimated separately for each group. A recursive partitioning algorithm known as the Conditional Inference Tree (CIT) [5] was used to build a hierarchical model that classifies dental metric data into the categorical groups considered (SH, H. heidelbergensis, H. neanderthalensis, and H. sapiens). The data were divided into a training set and a test set. The training set (N: 105) was used to let the CIT algorithm determine the best classification solution for the dental data. The test set (N: 497) was used to assess classification accuracy. Our results show that when SH is considered part of H. heidelbergensis, it is easily misclassified as Neanderthal (33.6% accuracy). Conversely, SH is recognised as an independent group when the other H. heidelbergensis specimens are excluded from the analysis (95.6% accuracy). In both cases, the classification relies mainly on the bucco-lingual diameter of the third premolar, in association with the dimensions of the second and third molars. The accuracy obtained for the H. sapiens and H. neanderthalensis samples indicates that the model is capable of classifying discrete taxonomic groups. Our results support the hypothesis that SH represents an evolutionary lineage distinct from H. heidelbergensis and suggest that SH could be closely related to "classic" Neanderthals. This work represents the first application of supervised machine learning to palaeoanthropology and highlights the potential of this field for the study of human evolution.

References:
[1] Aguirre, E., & De Lumley, M. A. (1977). Fossil men from Atapuerca, Spain: their bearing on human evolution in the Middle Pleistocene. Journal of Human Evolution, 6(8), 681-688.
[2] de Castro, J. M. B. (1986). Dental remains from Atapuerca (Spain) I. Metrics. Journal of Human Evolution, 15(4), 265-287.
[3] Arsuaga, J. L., Martínez, I., Arnold, L. J., Aranburu, A., Gracia-Téllez, A., Sharp, W. D., ... & Poza-Rey, E. (2014). Neandertal roots: Cranial and chronological evidence from Sima de los Huesos. Science, 344(6190), 1358-1363.
[4] Morris, T. P., White, I. R., & Royston, P. (2014). Tuning multiple imputation by predictive mean matching and local residual draws. BMC Medical Research Methodology, 14(1), 75.
[5] Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323.
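The group-wise imputation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code, and several simplifications are assumed: it performs a single predictive-mean-matching draw, uses one hypothetical predictor (`md`) for one incomplete variable (`bl`), and fixes the number of donors at `k=3`. A full multiple imputation would repeat the procedure to obtain several completed datasets and would also perturb the regression coefficients between draws. All names and the toy values are illustrative, not real measurements.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def pmm_impute(rows, k=3, rng=None):
    """Fill missing 'bl' values (None) within ONE group.

    For each incomplete row, the k complete rows whose predicted 'bl'
    is closest to the row's own prediction act as donors, and the
    observed 'bl' of one randomly chosen donor is borrowed.
    """
    if rng is None:
        rng = random.Random(0)
    observed = [r for r in rows if r["bl"] is not None]
    if not observed:
        return list(rows)  # nothing to borrow from in this group
    a, b = fit_line([r["md"] for r in observed],
                    [r["bl"] for r in observed])
    filled = []
    for r in rows:
        if r["bl"] is None:
            pred = a + b * r["md"]
            donors = sorted(
                observed, key=lambda d: abs(a + b * d["md"] - pred)
            )[:k]
            r = {**r, "bl": rng.choice(donors)["bl"], "imputed": True}
        filled.append(r)
    return filled


def impute_by_group(rows, k=3, seed=0):
    """Run the imputation separately per group, so that a large group
    never supplies donor values to a small one."""
    rng = random.Random(seed)
    groups = {}
    for r in rows:
        groups.setdefault(r["group"], []).append(r)
    completed = []
    for members in groups.values():
        completed.extend(pmm_impute(members, k=k, rng=rng))
    return completed  # rows are returned grouped, not in input order


# Toy values for illustration only (hypothetical, in mm).
toy = [
    {"group": "a", "md": 7.0, "bl": 8.1},
    {"group": "a", "md": 7.4, "bl": 8.6},
    {"group": "a", "md": 7.8, "bl": 9.0},
    {"group": "a", "md": 7.5, "bl": None},
    {"group": "b", "md": 6.8, "bl": 7.9},
    {"group": "b", "md": 7.1, "bl": 8.2},
    {"group": "b", "md": 7.0, "bl": None},
]
completed = impute_by_group(toy)
```

Because donors are drawn only from the same group, every imputed value is a measurement actually observed in that group, which is the property that keeps the much larger H. sapiens sample from influencing the estimates for the smaller fossil samples.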

Of teeth and algorithms: machine learning reveals the taxonomy of Sima de los Huesos / Veneziano, Alessio; Towle, Ian; De Groote, Isabelle; Raia, Pasquale. - (2018). (Paper presented at the Annual Meeting of the European Society for the Study of Human Evolution 8, held in Faro, Portugal).

Of teeth and algorithms: machine learning reveals the taxonomy of Sima de los Huesos

Pasquale Raia
2018

Use this identifier to cite or link to this document: https://hdl.handle.net/11588/720753