
New Bovine Genome Comparison

Since 2014, the standard reference for bovine genome studies has been the UMD 3.1 assembly, which can be downloaded e.g. here:

 

However, a few days ago a new assembly was released, called ARS-UCD1.2. This assembly can be downloaded here:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/263/795/GCA_002263795.2_ARS-UCD1.2/

Just to get a quick impression, I compared the two assemblies and checked how similar they are using LAST (http://last.cbrc.jp/).

First, I created a LAST database from the UMD 3.1 chromosomes like this:

lastdb -P0 -uNEAR -R01 $FOLDER/UMD31/UMD31-NEAR $FOLDER/UMD31/UMD3.1_chromosomes.fa

Then, I determined the substitution and gap frequencies with last-train:

last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 $FOLDER/UMD31/UMD31-NEAR $FOLDER/ARS/ARS_UCD12.fna > $FOLDER/UMD-ARS.mat

After the training, the alignment was performed with lastal (this is the parallel part of the Slurm script):

FOLDER="/wrk/daniel/References"
# One query file per chromosome; the array task ID selects which one to align
chr=("$FOLDER"/ARS/chr*)

lastal -m50 -E0.05 -C2 -p $FOLDER/UMD-ARS.mat $FOLDER/UMD31/UMD31-NEAR ${chr[$SLURM_ARRAY_TASK_ID]} | last-split -m1 > UMD-ARS-$SLURM_ARRAY_TASK_ID.maf
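
The array index in the command above maps each Slurm task to one chromosome file. As a minimal sketch of that mapping (not the actual script used here): the function name `pick_chromosome` is hypothetical, and it assumes that the task IDs start at 0, for example when the job is submitted with `sbatch --array=0-N`. Note that the shell expands the glob in lexical order, so `chr10` comes before `chr2`.

```shell
# Hypothetical helper: print the chromosome file for a 0-based task index.
# Arguments: directory holding the chr* files, task index.
pick_chromosome() {
    dir=$1
    idx=$2
    # Expand the glob into the positional parameters (lexically sorted).
    set -- "$dir"/chr*
    # Fail if nothing matched or the index is out of range.
    [ -e "$1" ] || return 1
    [ "$idx" -lt "$#" ] || return 1
    # Drop the first idx entries so the wanted file becomes $1.
    shift "$idx"
    printf '%s\n' "$1"
}

# Inside an array job, Slurm sets SLURM_ARRAY_TASK_ID; the default of 0
# only exists so that the line also works outside Slurm.
# query=$(pick_chromosome "$FOLDER/ARS" "${SLURM_ARRAY_TASK_ID:-0}")
```

Checking the index against the number of files makes a task fail early if the array range is larger than the number of chromosome files.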

Because I ran the alignment in parallel, one job per chromosome, every output file has its own header. These headers need to be removed when merging the files, keeping only a single copy:

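# Concatenate the per-chromosome outputs (a plain *.maf would also match all.maf on a rerun)
cat UMD-ARS-*.maf > all.maf
# Drop every comment/header line
sed '/^#/ d' < all.maf > temp.maf
# Keep one copy of the header (the first 22 lines here)
head -n 22 all.maf > headerLines
cat headerLines temp.maf > alignments.maf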
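
The merge above takes the first 22 lines as the header, which depends on how long the header happens to be for this run. A slightly more general sketch could look like the following, under the assumption that every per-chromosome file starts with a block of `#` lines and that only the first file's header should be kept. The function name `merge_maf` is made up for illustration.

```shell
# Hypothetical helper: merge MAF files, keeping the leading '#' header
# of the first file only and dropping all '#' lines from every file body.
merge_maf() {
    first=$1
    # Print the first file's header: '#' lines up to the first other line.
    awk '/^#/ { print; next } { exit }' "$first"
    # Print every non-comment line from all files, in the order given.
    for f in "$@"; do
        sed '/^#/ d' "$f"
    done
}

# Usage with the file names from this post:
# merge_maf UMD-ARS-*.maf > alignments.maf
```

This avoids the intermediate files and the hard-coded header length, but it is only equivalent to the commands above if the header really is the leading comment block.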

Finally, simple-sequence alignments were discarded from the merged file, the remaining alignments were converted to tabular format, and alignments with error probability > 10^-5 were discarded:

last-postmask alignments.maf |
maf-convert -n tab |
awk -F'=' '$2 <= 1e-5' > alignments.tab
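
To see what the `awk` filter does: it splits each line at `=` and keeps the line if the value after it is at most 1e-5. The sketch below assumes that each tabular line carries exactly one `=`, in a trailing `mismap=...` field. The wrapper name `filter_mismap` is hypothetical, and the two shortened data lines are invented purely for illustration; they are not real alignments.

```shell
# Hypothetical wrapper around the filter used in the pipeline above.
filter_mismap() {
    # With '=' as field separator, $2 is the text after 'mismap='.
    awk -F'=' '$2 <= 1e-5'
}

# Two made-up, shortened tab lines: one confident, one ambiguous alignment.
printf '100\tchr1\t0\t50\tmismap=1e-09\n80\tchr2\t0\t40\tmismap=0.01\n' |
    filter_mismap
# Only the first line (mismap=1e-09) is printed.
```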

From that tab file, the dotplot was then created:

last-dotplot -x 4000 -y 4000 alignments.tab alignment.png

This is what the dotplot looks like. The two assemblies are largely the same genome, but some regions have clearly changed. (Open the image and zoom in on the diagonal to see the differences.)

 

For the steps, I followed the tutorial here: https://github.com/mcfrith/last-genome-alignments