Since 2014, the standard reference for bovine genome studies has been the UMD 3.1 assembly, e.g. for download here:
However, a few days ago a new assembly called ARS-UCD1.2 was released. It can be downloaded here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/263/795/GCA_002263795.2_ARS-UCD1.2/
To get a quick impression, I aligned the two assemblies against each other and checked their similarity using LAST (http://last.cbrc.jp/).
First, I created a LAST database from the UMD 3.1 chromosomes:
lastdb -P0 -uNEAR -R01 $FOLDER/UMD31/UMD31-NEAR $FOLDER/UMD31/UMD3.1_chromosomes.fa
Then I determined the substitution and gap frequencies with last-train:
last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 $FOLDER/UMD31/UMD31-NEAR $FOLDER/ARS/ARS_UCD12.fna > $FOLDER/UMD-ARS.mat
After the training, the alignment was run with lastal (this is the parallel part of the Slurm script):
FOLDER="/wrk/daniel/References"
# One FASTA file per chromosome; each array task aligns exactly one of them
chr=("$FOLDER"/ARS/chr*)
lastal -m50 -E0.05 -C2 -p "$FOLDER/UMD-ARS.mat" "$FOLDER/UMD31/UMD31-NEAR" "${chr[$SLURM_ARRAY_TASK_ID]}" | last-split -m1 > "UMD-ARS-$SLURM_ARRAY_TASK_ID.maf"
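The per-chromosome files under $FOLDER/ARS/chr* and the matching Slurm array range are not shown above. The following is a minimal sketch of one way to produce them, not necessarily how it was done here. The helper names split_fasta and array_range, and the chr_<name>.fa file naming, are assumptions made for illustration.

```shell
# Sketch only: split_fasta and array_range are hypothetical helpers,
# not part of LAST or Slurm.

# Write every record of a multi-FASTA file ($1) to its own file in the
# directory $2, named chr_<first word of the header>.fa
split_fasta() {
    mkdir -p "$2"
    awk -v dir="$2" '
        /^>/ {
            if (out) close(out)      # keep the number of open files small
            name = substr($1, 2)     # header up to the first blank, without ">"
            out = dir "/chr_" name ".fa"
        }
        out { print > (out) }
    ' "$1"
}

# Print the 0-based array range that covers all chr* files in directory $1,
# matching the 0-based indexing of a bash array
array_range() {
    set -- "$1"/chr*
    [ -e "$1" ] || return 1          # no chr* files found
    echo "0-$(( $# - 1 ))"
}
```

With these helpers, the job could be submitted along the lines of `sbatch --array="$(array_range "$FOLDER/ARS")" align.sh`, where align.sh is a placeholder name for the script containing the lastal call. Every record in the input gets its own file, so any unplaced scaffolds would each become a separate array task.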
Because the alignment ran in parallel, one job per chromosome, every output file carries its own header. I merged the files and kept only a single copy of the header:
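# Merge the per-chromosome files (the narrower glob keeps all.maf itself out of the input)
cat UMD-ARS-*.maf > all.maf
# Strip every comment line, then put the first 22 lines (the header of the first file) back on top
sed '/^#/ d' < all.maf > temp.maf
head -n 22 all.maf > headerLines
cat headerLines temp.maf > alignments.maf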
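The merge above relies on the header being exactly 22 lines long. As a hedged alternative, the sketch below keeps whatever run of "#" lines sits at the top of the first file and drops all other "#" lines. merge_maf is a hypothetical helper name, not a LAST tool.

```shell
# Sketch only: merge_maf is a hypothetical helper, not part of LAST.

# Concatenate MAF files: keep the leading "#" header block of the first
# file, then print the non-comment lines of all files in the given order.
merge_maf() {
    # Header: print "#" lines until the first non-comment line is reached
    awk '/^#/ { print; next } { exit }' "$1"
    # Body: every line of every file that does not start with "#"
    awk '!/^#/' "$@"
}
```

Usage would be `merge_maf UMD-ARS-*.maf > alignments.maf`, which avoids the intermediate files and does not depend on the header length.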
Finally, alignments consisting mostly of simple sequence were discarded from the merged file, the remaining alignments were converted to tabular format, and alignments with an error probability > 10^-5 were discarded:
last-postmask alignments.maf |
maf-convert -n tab |
awk -F'=' '$2 <= 1e-5' > alignments.tab
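The awk filter works by splitting each line at "=", so the second field is whatever follows the first "=" on the line. The sketch below wraps the same idea in a function with an adjustable threshold. It assumes that each tabular line carries its error probability as a single mismap=<value> entry, and filter_mismap is a hypothetical name.

```shell
# Sketch only: filter_mismap is a hypothetical wrapper around the awk
# filter used above; it assumes one "mismap=<value>" entry per line.

# Read tabular alignments on stdin and keep those whose value after the
# first "=" is at most the threshold $1 (default 1e-5).
# Unlike the plain one-liner, lines without any "=" are dropped.
filter_mismap() {
    awk -F'=' -v max="${1:-1e-5}" 'NF > 1 && ($2 + 0) <= (max + 0)'
}
```

It would slot into the pipeline as `last-postmask alignments.maf | maf-convert -n tab | filter_mismap 1e-5 > alignments.tab`. Adding 0 forces a numeric comparison, so a value such as 1e-10 is compared as a number rather than as text.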
The dotplot was then created from that tab file:
last-dotplot -x 4000 -y 4000 alignments.tab alignment.png
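As a numeric companion to the dotplot, the aligned bases per chromosome pair can be tallied from the same tab file. This is only a sketch: pair_summary is a hypothetical helper, and it assumes the usual LAST tabular layout, with the reference sequence name in column 2, the aligned length on the reference in column 4, and the query sequence name in column 7.

```shell
# Sketch only: pair_summary is a hypothetical helper. Assumed columns of
# the LAST tabular format: 2 = reference name, 4 = aligned length on the
# reference, 7 = query name.

# Print "reference<TAB>query<TAB>aligned bases", largest total first
pair_summary() {
    tab=$(printf '\t')
    awk -F'\t' '
        !/^#/ && NF >= 11 { bases[$2 "\t" $7] += $4 }
        END { for (p in bases) print p "\t" bases[p] }
    ' "$1" | sort -t "$tab" -k3,3nr
}
```

Running `pair_summary alignments.tab | head` would list the chromosome pairs that share the most aligned sequence. Pairs away from the expected one-to-one matches are candidates for the regions that differ between the assemblies.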
This is what the dotplot looks like. The two assemblies look like pretty much the same genome, but some regions have clearly changed. (Open the image and zoom in on the diagonal to see the differences.)
For the steps, I followed the tutorial here: https://github.com/mcfrith/last-genome-alignments