In Italy, the Mediterranean (MED) buffalo breed is reared mainly to produce the famous “Mozzarella di Bufala Campana” PDO. Better knowledge of the genomic structure of the Mediterranean breed, in comparison with other buffalo breeds, is important for linking it to the breed's phenotypic characteristics. Buffaloes are of two types: river, to which MED belongs, and swamp. Here we studied MED variability in comparison with other river and swamp breeds using short-read (Illumina) and long-read (Oxford Nanopore, ONT) whole-genome sequencing (WGS) data. The UOA_WB_1 reference sequence, supplemented with the yak Y chromosome, was used. For the short-read data, 20 MED, 5 “river”, and 7 “swamp” males were analyzed to investigate small variants, in particular SNPs. BWA was used for alignment and BCFTOOLS for variant calling and filtering. Variant annotation was performed using snpEff. For the long-read data, eight MED males were analyzed. Minimap2 was used for alignment, and Clair3, GATK, and GLnexus for variant calling (SNPs and small InDels). Using the short-read data, almost 35 million biallelic SNPs were identified after quality control. Of these, almost 15 million, more than 12 million, and fewer than 10 million were monomorphic for the reference allele in MED, river, and swamp, respectively. The mean alternative allele frequency (BAF) was 0.18, 0.23, and 0.35 for MED, river, and swamp, respectively. The mean missingness was 0.08, 0.01, and 0.02 for MED, river, and swamp, respectively. After quality control, almost 18 million variants were identified using the ONT data, divided into SNPs (almost 16 million) and small InDels (almost 2 million). Annotation analyses identified variants with “HIGH” and “MODERATE/Loss of Function” impact, which have a potential functional effect on the encoded protein. This preliminary analysis was fundamental for the improvement of the existing buffalo SNP array. Indeed, almost 50 k SNPs were selected to fill the available spots in a new version of the array after filtering of existing variants.
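The per-group summaries reported above (mean alternative allele frequency, mean missingness, and the number of SNPs monomorphic for the reference allele) can be illustrated with a minimal Python sketch. The definitions used here are assumptions, not the authors' documented procedure: alternative allele frequency is taken per site as alternative alleles over called alleles, missingness as uncalled genotypes over samples, and both are averaged over sites. The function names and the toy genotypes are hypothetical.

```python
from statistics import mean


def site_stats(genotypes):
    """Return (alt_freq, missingness) for one biallelic site in one group.

    Genotypes are VCF-style strings such as "0/0", "0/1", "1|1" or "./.".
    alt_freq is None when no genotype at the site was called.
    """
    alleles = []
    missing = 0
    for gt in genotypes:
        calls = gt.replace("|", "/").split("/")
        if "." in calls:
            # Any uncalled allele makes the whole genotype count as missing.
            missing += 1
            continue
        alleles.extend(int(a) for a in calls)
    alt_freq = sum(alleles) / len(alleles) if alleles else None
    return alt_freq, missing / len(genotypes)


def group_summary(sites):
    """Summarise a group (e.g. MED, river, or swamp) across all its sites."""
    freqs, missingness, mono_ref = [], [], 0
    for genotypes in sites:
        alt_freq, miss = site_stats(genotypes)
        missingness.append(miss)
        if alt_freq is None:
            continue  # nothing called, so no frequency for this site
        freqs.append(alt_freq)
        if alt_freq == 0:
            # Every called allele is the reference allele.
            mono_ref += 1
    return {
        "mean_alt_freq": mean(freqs) if freqs else None,
        "mean_missingness": mean(missingness),
        "monomorphic_ref": mono_ref,
    }


# Toy data (hypothetical): three sites, four animals each.
toy_sites = [
    ["0/0", "0/0", "0/0", "./."],  # monomorphic reference, one missing call
    ["0/1", "1/1", "0/0", "0/0"],  # 3 alternative alleles out of 8
    ["1/1", "1/1", "0/1", "0/1"],  # 6 alternative alleles out of 8
]
summary = group_summary(toy_sites)
print(summary)
```

In practice these quantities would be derived from the filtered VCF rather than from hand-written lists; the sketch only makes the arithmetic behind the reported means explicit.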
Genomic Structure of Mediterranean Buffalo Breed: A Variant Analysis for a New Dedicated SNP Array / Milanesi, M.; Pietrucci, D.; Picarone, L.; Vignali, G.; Cosenza, G.; Gomez, M.; Chillemi, G. - In: BIOLOGY AND LIFE SCIENCES FORUM. - ISSN 2673-9976. - 33:1(2024). [10.3390/blsf2024033001]
Genomic Structure of Mediterranean Buffalo Breed: A Variant Analysis for a New Dedicated SNP Array
Cosenza G.;
2024