I just came back from a very interesting FAANG workshop in Prague, three days of talks about the functional annotation of animal genomes. Thanks a lot to the organizers!
More information about the event can be found here.
After the event we still had some time to discover the city.
Today I came across the concept of submodules in git for the first time. While cloning a repository from GitHub, I noticed that one folder in it remained empty. On closer inspection, I found a reference to another repository tree like this:
Here, the folder htslib is actually from a tree in a different repository. After I cloned the repository like this (I forked it before):
git clone https://github.com/fischuu/SE-MEI.git
the folder htslib remained empty. That is because files from submodules are not fetched by default. This needs to be done separately by first initializing the submodules (first, cd into the cloned repository)
git submodule init
and then update the files from it
git submodule update
After these steps, the repository should be complete. However, instead of initializing and updating the submodules separately, there is also a shortcut that fetches them all in one step by adding an additional parameter to the clone command like this:
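A minimal sketch of that shortcut, assuming the parameter meant here is git's --recurse-submodules option (the older spelling --recursive is still accepted). The helper name clone_with_submodules is made up for illustration:

```bash
#!/bin/bash
# Clone a repository and fetch all of its submodules in one step.
# Assumption: --recurse-submodules is the parameter referred to in the text.
# clone_with_submodules is a hypothetical helper name, not part of git.
clone_with_submodules() {
    git clone --recurse-submodules "$@"
}

# Usage, with the repository from above:
# clone_with_submodules https://github.com/fischuu/SE-MEI.git
```

For a repository that has already been cloned without its submodules, git submodule update --init --recursive combines the two separate steps into one call.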
Today I wrote a bash script that creates a random subset of a paired-end FASTQ file pair. It takes the names of the two FASTQ files as input, followed by the number of reads the sample should contain (a value of at most 1 is interpreted as a fraction of the reads).
The script is mainly based on this blog post. The code is rather rough; it could be more user-friendly and offer more options, but in its current form it does what I need it to do.
#!/bin/bash
round() {
printf "%.2f" "$1"
}
file1=$1
file2=$2
sample=$3
# Input test
if ! [[ $sample =~ ^-?[0-9]+([.][0-9]+)?$ ]]; then
>&2 echo "$sample is not a number"; exit 1;
fi
extension1="${file1##*.}"
extension2="${file2##*.}"
filename1="${file1%.*}"
filename2="${file2%.*}"
fn1=$filename1"_"$sample".fastq"
fn2=$filename2"_"$sample".fastq"
if [ $extension1 == "gz" ]; then
gunzip $file1;
file1=$filename1;
filename1="${file1%.*}"
fn1=$filename1"_"$sample".fastq"
fi
if [ $extension2 == "gz" ]; then
gunzip $file2;
file2=$filename2;
filename2="${file2%.*}"
fn2=$filename2"_"$sample".fastq"
fi
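# Number of reads in the input (one FASTQ record spans four lines)
lines=$(wc -l < "$file1")
reads=$((lines / 4))
echo "$reads"
echo "$sample"
# Values <= 1 are interpreted as a fraction of the reads
if (( $(awk -v s="$sample" 'BEGIN {print (s <= 1)}') )); then
sample=$(awk -v s="$sample" -v n="$reads" 'BEGIN {printf("%.0f", s * n)}')
fi
echo "$sample"
# Put both files side by side, join each 4-line record pair into one tab-separated line,
# draw $sample records by reservoir sampling and split them into two files again
paste "$file1" "$file2" | \
awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t");} }' | \
awk -v k="$sample" 'BEGIN{srand(systime() + PROCINFO["pid"]); }{ s=x++<k?x-1:int(rand()*x);
if(s<k)R[s]=$0}END{for(i in R)print R[i]}' | \
awk -F"\t" -v file1="$fn1" -v file2="$fn2" '{print $1"\n"$3"\n"$5"\n"$7 > file1;\
print $2"\n"$4"\n"$6"\n"$8 > file2}'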
if [ $extension1 == "gz" ]; then
gzip $fn1;
gzip $file1;
fi
if [ $extension2 == "gz" ]; then
gzip $fn2;
gzip $file2;
fi
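The sampling step in the middle of the pipeline is a reservoir sampler: it keeps k records from a stream of unknown length, each with the same probability. Here is a small sketch of just that step, run on plain numbers instead of FASTQ records. The function name reservoir_sample and the optional seed argument are my own additions for illustration, and srand() gets an explicit seed so that the sketch does not depend on the gawk-only systime() and PROCINFO:

```bash
#!/bin/bash
# Keep k lines chosen uniformly at random from stdin (reservoir sampling).
# Usage: reservoir_sample k [seed]   (hypothetical helper, not part of the script above)
reservoir_sample() {
    awk -v k="$1" -v seed="${2:-$RANDOM}" '
        BEGIN { srand(seed) }
        {
            # The first k lines fill the reservoir; afterwards line number x
            # overwrites a random slot with probability k/x.
            s = (x++ < k) ? x - 1 : int(rand() * x)
            if (s < k) R[s] = $0
        }
        END { for (i in R) print R[i] }'
}

# Draw 5 of the numbers 1..100; the order of the output is not defined.
seq 1 100 | reservoir_sample 5
```

Because the joined read pair sits on a single line in the script, sampling lines like this keeps both mates of a pair together.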
Here, we get the values for <UUID> and <filesystem> from the blkid command; the mount point can be chosen freely, and as an option I chose e.g. errors=remount-ro
Once the fstab is populated like this, just try to mount the disc by typing
Sometimes we have a whole bunch of Slurm jobs from different projects in the queue; some of them have been running for days, while others were just submitted. And then we notice: damn, the 100 jobs that I just submitted are wrong and need to be canceled. Unfortunately, there is no single Slurm command for that; it takes a bit of scripting.
The following script takes a Slurm job ID as input and cancels all jobs with a larger ID (that belong to the logged-in user…).
#!/bin/bash
declare -a jobs=()
if [ -z "$1" ] ; then
echo "Minimum Job Number argument is required. Run as '$0 jobnum'"
exit 1
fi
minjobnum="$1"
myself="$(id -u -n)"
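# Collect all job IDs of the current user that are larger than the given one
for j in $(squeue --user="$myself" --noheader --format='%i') ; do
# Array tasks are listed as <jobid>_<task>; compare only the numeric job ID
if [ "${j%%_*}" -gt "$minjobnum" ] ; then
jobs+=("$j")
fi
done
# Call scancel only if there is something to cancel
if [ "${#jobs[@]}" -eq 0 ] ; then
echo "No jobs with an ID larger than $minjobnum found."
exit 0
fi
scancel "${jobs[@]}"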
If you store this, e.g. as killLarger.sh, somewhere in your PATH, you can call it from anywhere to cancel all of your Slurm jobs with an ID larger than the one you pass in.
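Before letting such a script loose on real jobs, the selection logic can be tried on a made-up list of job IDs. The sketch below assumes that squeue prints array tasks as <jobid>_<task> and therefore compares only the part before the underscore; the function name jobs_larger_than and the IDs are invented for illustration:

```bash
#!/bin/bash
# Print every job ID from stdin whose numeric part is larger than the given minimum.
# jobs_larger_than is a hypothetical helper; it does not talk to Slurm at all.
jobs_larger_than() {
    local min="$1" id
    while read -r id ; do
        # Strip a possible array-task suffix (_<task>) before comparing
        if [ "${id%%_*}" -gt "$min" ] ; then
            echo "$id"
        fi
    done
}

# Dry run with invented job IDs instead of real squeue output
printf '%s\n' 1001 1002 1003_4 1005 | jobs_larger_than 1002
```

With real data, the printf line would be replaced by the squeue call from the script, and the resulting list could be inspected before passing it to scancel.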