New Bovine Genome Comparison

Since 2014, the standard reference for bovine genome studies has been the UMD 3.1 assembly, available for download e.g. here:


However, a few days ago a new assembly was released, called ARS-UCD1.2. This assembly can be downloaded here:

Just to get a quick impression, I compared both assemblies and checked their similarity using LAST.

First, I created a LAST database like this:

lastdb -P0 -uNEAR -R01 $FOLDER/UMD31/UMD31-NEAR $FOLDER/UMD31/UMD3.1_chromosomes.fa

Then, I determined the substitution and gap frequencies:

last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 $FOLDER/UMD31/UMD31-NEAR $FOLDER/ARS/ARS_UCD12.fna > $FOLDER/UMD-ARS.mat

After the training, the alignment was performed (here is the parallel part of the Slurm script):

# Glob the per-chromosome files straight into an array (indices start at 0)
chr=("$FOLDER"/ARS/chr*)

lastal -m50 -E0.05 -C2 -p "$FOLDER/UMD-ARS.mat" "$FOLDER/UMD31/UMD31-NEAR" "${chr[$SLURM_ARRAY_TASK_ID]}" | last-split -m1 > "UMD-ARS-$SLURM_ARRAY_TASK_ID.maf"
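
The array lookup in the Slurm part can be tried out without Slurm or LAST. Here is a minimal sketch, assuming three made-up file names (chr1, chr2, chr10) in a temporary folder and a task ID set by hand instead of by Slurm. It shows two things that are easy to trip over: bash array indices start at 0, and the glob sorts names as text, so index 1 points to chr10 and not to chr2.

```bash
#!/usr/bin/env bash
# Sketch only: the file names and the hand-set task ID are assumptions for illustration.
set -euo pipefail

demo_dir=$(mktemp -d)
touch "$demo_dir/chr1" "$demo_dir/chr2" "$demo_dir/chr10"

# Glob directly into an array; the shell sorts the matches as text.
chr=("$demo_dir"/chr*)

# Slurm sets this variable for each array task; here it is set by hand.
SLURM_ARRAY_TASK_ID=1

n_files=${#chr[@]}
picked=$(basename "${chr[$SLURM_ARRAY_TASK_ID]}")

echo "files: $n_files, valid task IDs: 0-$((n_files - 1))"
echo "task $SLURM_ARRAY_TASK_ID -> $picked"

rm -rf "$demo_dir"
```

So the array job has to cover the task IDs 0 to (number of files - 1), and the number in the output file name is the task ID, not the chromosome number.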

Because I ran the alignment in parallel for each chromosome, the repeated header lines of the per-chromosome files had to be removed:

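# Merge only the per-chromosome files, so that a rerun does not pick up all.maf or alignments.maf
cat UMD-ARS-*.maf > all.maf
# Drop every comment line
sed '/^#/ d' < all.maf > temp.maf
# Keep one copy of the header (22 lines here)
head -n 22 all.maf > headerLines
cat headerLines temp.maf > alignments.maf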
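
The hard-coded head -n 22 only works as long as the header has exactly 22 lines. A minimal sketch of a variant that does not depend on the header length could look like this. It assumes two tiny made-up files (part-0.maf and part-1.maf) as stand-ins for the real per-chromosome output, keeps the leading comment block of the first file, and then appends the non-comment lines of all files.

```bash
#!/usr/bin/env bash
# Sketch only: the two tiny "MAF" files are made-up stand-ins for the per-chromosome output.
set -euo pipefail

demo_dir=$(mktemp -d)
printf '# header line 1\n# header line 2\na score=10\ns chrA 0 4 + 100 ACGT\n\n' > "$demo_dir/part-0.maf"
printf '# header line 1\n# header line 2\na score=20\ns chrB 0 4 + 100 TTGA\n\n' > "$demo_dir/part-1.maf"

{
  # Leading comment block of the first file, however many lines it has.
  awk '/^#/ {print; next} {exit}' "$demo_dir/part-0.maf"
  # All non-comment lines from every part, in glob order.
  cat "$demo_dir"/part-*.maf | sed '/^#/ d'
} > "$demo_dir/alignments.maf"

header_count=$(grep -c '^#' "$demo_dir/alignments.maf")
block_count=$(grep -c '^a ' "$demo_dir/alignments.maf")
echo "header lines: $header_count, alignment blocks: $block_count"

rm -rf "$demo_dir"
```

The merged file then contains the header once and both alignment blocks.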

Finally, simple-sequence alignments were discarded from the merged file, the remaining alignments were converted to tabular format, and alignments with error probability > 10^-5 were discarded:

last-postmask alignments.maf |
maf-convert -n tab |
awk -F'=' '$2 <= 1e-5' >
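
To see what the awk filter does, here is a minimal sketch on three invented and shortened lines. It assumes that the error probability is stored as mismap=... at the end of each line and that this is the first field containing a "=" sign. Real maf-convert output has more columns, and the temporary file is only there for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch only: the three lines are invented and shortened; real tabular output has more columns.
set -euo pipefail

demo_tab=$(mktemp)
printf '500\tchr1\t0\t100\tmismap=1e-10\n'  > "$demo_tab"
printf '300\tchr2\t0\t80\tmismap=0.01\n'   >> "$demo_tab"
printf '450\tchr3\t0\t90\tmismap=1e-05\n'  >> "$demo_tab"

# Split each line at "="; $2 is then the mismap value. Keep lines with a value <= 1e-5.
kept=$(awk -F'=' '$2 <= 1e-5' "$demo_tab" | cut -f2 | paste -sd, -)
echo "kept: $kept"

rm -f "$demo_tab"
```

The line with mismap=0.01 is dropped, the other two are kept.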

The dotplot was then created from that tab file:

last-dotplot -x 4000 -y 4000 alignment.png

This is what the dotplot looks like. Overall the two assemblies look pretty much the same, but some regions have clearly changed! (Open the image and zoom in on the diagonal to see the differences.)


For the steps, I followed the tutorial here:


Some spring attunements

Okay, we still have snow in Finland, but one can already feel spring looming behind the clouds. And in spring there will again be plenty of musicians here and there. I didn't take this video myself; I ran into it on YouTube, and it is something of an attunement for spring!

Another update of bitools

During the last two weeks, I updated the bitools container twice. Two new tools were added to it:

1. velvet

An easy-to-use de novo assembler that I use for metagenome studies

2. Bandage

A tool to visualize the assembly graphs produced by velvet

Great book on statistical inference

During the last few days I came across a really nice book that gives an update on, or rather a refresher in, statistical inference. It is ‘Computer Age Statistical Inference: Algorithms, Evidence and Data Science’ by Bradley Efron and Trevor Hastie.

Trevor Hastie's webpage even has a download link to the PDF of the book. I can highly recommend it!

New version of the bitools docker container

I updated the Docker container that I maintain (bitools) to keep together all the bioinformatics tools that I currently use. Yesterday I added FEELnc to it, a tool to detect lncRNAs from RNA-seq data.

UPDATE: Apparently there was an issue with the ForkManager Perl module in the Docker container. I fixed that on 7.12.2017 and updated the Docker image v0.1.6 on Docker Hub.