Highly recommended artist
Just discovered, but highly recommended (Sarah Lesch):
Since 2014, the standard reference for genome studies in cattle has been the UMD 3.1 assembly, available for download, e.g., here:
However, a few days ago a new assembly called ARS-UCD1.2 was released. It can be downloaded here:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/002/263/795/GCA_002263795.2_ARS-UCD1.2/
To get a quick impression, I compared the two assemblies and checked their similarity using LAST (http://last.cbrc.jp/).
First, I created a LAST database from the UMD 3.1 chromosomes:
lastdb -P0 -uNEAR -R01 $FOLDER/UMD31/UMD31-NEAR $FOLDER/UMD31/UMD3.1_chromosomes.fa
Then, I used last-train to estimate the substitution and gap frequencies:
last-train -P0 --revsym --matsym --gapsym -E0.05 -C2 $FOLDER/UMD31/UMD31-NEAR $FOLDER/ARS/ARS_UCD12.fna > $FOLDER/UMD-ARS.mat
After the training, I ran the alignment (this is the parallel part of the SLURM script, one array task per ARS-UCD1.2 chromosome file):
FOLDER="/wrk/daniel/References"
# one query file per chromosome; the SLURM array index picks the file this task aligns
chr=("$FOLDER"/ARS/chr*)
lastal -m50 -E0.05 -C2 -p "$FOLDER/UMD-ARS.mat" "$FOLDER/UMD31/UMD31-NEAR" "${chr[$SLURM_ARRAY_TASK_ID]}" | last-split -m1 > "UMD-ARS-$SLURM_ARRAY_TASK_ID.maf"
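The script expects one query file per chromosome under `$FOLDER/ARS/`, but the post doesn't show how those `chr*` files were produced from `ARS_UCD12.fna`. Here is a minimal sketch of one way to do it, assuming an uncompressed multi-FASTA input. The function name `split_fasta` and the `chr001.fa`, `chr002.fa` (and so on) naming are hypothetical, not part of LAST:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of LAST): write each FASTA record of $1 into its
# own file in directory $2, numbered in input order: chr001.fa, chr002.fa, ...
split_fasta() {
  local in="$1" outdir="$2"
  mkdir -p "$outdir"
  awk -v dir="$outdir" '
    /^>/ {                              # a header line starts a new record
      if (out != "") close(out)         # keep the number of open files small
      n++
      out = sprintf("%s/chr%03d.fa", dir, n)
    }
    out != "" { print > out }           # copy the header and its sequence lines
  ' "$in"
}
```

For example, `split_fasta "$FOLDER/ARS/ARS_UCD12.fna" "$FOLDER/ARS"`. A full assembly FASTA typically also contains unplaced scaffolds, which would each get their own file here, so you may want to keep only the chromosome records; either way, the SLURM array range has to match the number of files.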
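Because the alignment ran in parallel, one job per chromosome, every output file has its own header. So I concatenated the files, removed all header lines, and put a single copy of the header back on top:
cat UMD-ARS-*.maf > all.maf                 # merge the per-chromosome outputs
sed '/^#/ d' < all.maf > temp.maf           # drop every header/comment line
head -n 22 all.maf > headerLines            # keep one copy of the header (22 lines in this run)
cat headerLines temp.maf > alignments.maf   # header followed by all alignments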
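The `head -n 22` step assumes the header is exactly 22 lines long, which may not hold for other LAST versions or options. A minimal sketch of the same merge that instead copies the leading `#` lines of the first file, assuming each file's header is the block of `#` lines at its top (the function name `merge_maf` is hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: merge MAF files into $1, keeping a single header.
# Usage: merge_maf OUT IN1 [IN2 ...]
merge_maf() {
  local out="$1"; shift
  # Header: the leading '#' lines of the first input file only.
  awk '/^#/ { print; next } { exit }' "$1" > "$out"
  # Body: every non-'#' line from all inputs, in the order given.
  awk '!/^#/' "$@" >> "$out"
}
```

For example, `merge_maf alignments.maf UMD-ARS-*.maf`. Passing the explicit `UMD-ARS-*.maf` pattern rather than `*.maf` keeps output files from an earlier run out of the merge.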
Finally, alignments consisting mostly of simple sequence were discarded (last-postmask), the remaining alignments were converted to tabular format, and alignments with an error probability > 10^-5 were removed:
last-postmask alignments.maf | maf-convert -n tab | awk -F'=' '$2 <= 1e-5' > alignments.tab
From this tab file, I then created the dot plot:
last-dotplot -x 4000 -y 4000 alignments.tab alignments.png
This is what the dot plot looks like: it is pretty much the same genome, but some regions have clearly changed! (Open the image and zoom in along the diagonal to see the differences.)
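To put a number on "pretty much the same genome", one could sum the aligned query span for every pair of chromosomes in the tabular output. A minimal sketch, assuming LAST's tab format (column 2 = reference name, column 7 = query name, column 9 = aligned span in the query); the function name `pair_summary` is hypothetical:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: for a LAST tabular file, print
#   reference_name <TAB> query_name <TAB> summed_query_span
# sorted by summed span, largest first. Comment lines are skipped.
pair_summary() {
  awk -F'\t' '
    !/^#/ && NF >= 12 { span[$2 "\t" $7] += $9 }
    END { for (k in span) print k "\t" span[k] }
  ' "$1" | sort -t$'\t' -k3,3nr
}
```

For example, `pair_summary alignments.tab | head`. If the assemblies are largely collinear, most of each query chromosome's aligned span should fall on a single reference chromosome, while sizeable spans on other pairs point to the kind of changes visible off the diagonal. The two assemblies may name their chromosomes differently, so the pairs may need to be matched up by hand.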
For the steps, I followed the tutorial here: https://github.com/mcfrith/last-genome-alignments
Okay, we still have snow in Finland, but one can already feel spring looming behind the clouds. And with spring, there will again be plenty of musicians here and there. I didn't record this video myself; I came across it on YouTube, and it is a nice way to get in the mood for spring!
During the last two weeks, I updated the bitools container twice. Two new tools were added to it:
1. velvet
An easy-to-use de novo assembler that I use for metagenome studies
2. Bandage
A tool to visualize the assembly graphs produced by velvet
Over the last few days, I came across a really nice book that gives an update on, or a refresher of, statistical inference: ‘Computer Age Statistical Inference: Algorithms, Evidence and Data Science’ by Bradley Efron and Trevor Hastie.
Trevor Hastie's webpage even has a download link for the PDF of the book. I can highly recommend it!