
eggNOG7 annotator script

For our metagenomics pipeline I wrote a quick-and-dirty annotator script that can be found here and that might be useful for others.

https://github.com/fischuu/eggnog7_annotator

The tool takes protein sequences from gene predictions and annotates them (in the sense of intersecting them) with known proteins, their functions, and their taxonomic origin. That way, the tool can be used to shed a bit of light on plain protein sequences.

The current eggNOG annotator only supports database version 5 and was never made available for version 6. Now that eggNOG 7 has been published, we agreed within our MetaG pipeline group that it would be good to have a db7 annotator, so we added one to the pipeline and derived from that the stand-alone version linked above.

Understanding Whitefish in the Northern Baltic Sea

A new article I contributed to has been published in Fisheries Research.

The study focuses on European whitefish in the northern Baltic Sea, which exist as two types, or ecotypes: one that migrates between rivers and the sea (anadromous) and one that spawns entirely in the sea. These fish are caught together in fisheries, but the river-migrating type is endangered, while the sea-spawning type is doing better.

We looked at ways to tell the two types apart, using both physical traits and genetic data. This isn’t always straightforward — some fish show mixed traits, likely due to past stocking, habitat changes, or variable migration patterns. We also studied how fishing depth, season, and net size affect which type of fish is caught. The results show that fishing practices can strongly influence catch composition, highlighting that adjusting fishing rules by location and timing could help protect the vulnerable whitefish population.

The full article is available on ScienceDirect:
https://www.sciencedirect.com/science/article/pii/S0165783626000019

New R-package started

I just started a new R package called ‘SnakebiteTools’. There I would like to collect small helper functions for analysing and monitoring Snakemake runs. The output can later be used for resource optimisation, for checking the status of an ongoing Snakemake run (which can get messy for runs with plenty of jobs), and so on.
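Since the package is only just starting, here is a rough shell sketch of the kind of status check I have in mind. This is not part of SnakebiteTools, and the log-file location and naming pattern are assumptions about the default Snakemake layout: it simply counts the "Finished job" lines in the most recent Snakemake log.

```shell
# Hypothetical sketch (not from SnakebiteTools): check the progress of a
# Snakemake run by counting "Finished job" lines in the newest log file.
# Assumes the default log location .snakemake/log/ in the working directory.
latest_log=$(ls -t .snakemake/log/*.snakemake.log | head -n 1)
echo "Finished jobs so far: $(grep -c 'Finished job' "$latest_log")"
```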

Creating Drop-Down Menus in Excel

I often do not remember how to create simple drop-down menus in Excel, so I decided to write a short note here. What I want to have:

  1. In one tab, a column with the possible values for my drop-down menu, e.g. my project names.
  2. In another tab, a drop-down in each cell of a column that allows me to choose from these values.

This is in principle rather easy to achieve:

Step 1: Prepare the List on Another Sheet

  1. Open your Excel file and go to the sheet where you want to store the drop-down values (e.g., Projects).
  2. Enter the list of values in a column (e.g., A1:A10 in Projects).

Step 2: Name the List (Optional but Recommended)

  1. Select the range of values in Projects (e.g., A1:A10 or the whole column).
  2. Click on the Formula tab → Define Name.
  3. Enter a name (e.g., MyProjects) and click OK.

Step 3: Create the Drop-Down List

  1. Go to the sheet where you want the drop-down (e.g., Tasks).
  2. Select the cell(s) where you want the drop-down.
  3. Click on the Data tab → Data Validation.
  4. In the Allow box, choose List.
  5. In the Source box:
    • If you named the range: enter =MyProjects
    • If not: enter e.g.  =Projects!A1:A10
  6. Click OK.

If the first row contains a header (e.g. with column names), you will want to exclude it from the drop-down options. You can do that like this:

Method 1: Remove Data Validation from One Cell (Header Only)

  1. Click on the first cell of the column (e.g., A1).
  2. Go to the Data tab → Click Data Validation.
  3. In the pop-up, click Clear All → Click OK.
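Method 2: Exclude the Header via the Named Range

A second option, in case you defined a named range in Step 2 (this is my own suggestion, not part of the steps above): let the name start below the header row, so the header never shows up as an option in the first place. Assuming the header sits in Projects!A1 and the values in A2:A10, the name definition becomes:

```
Name:       MyProjects
Refers to:  =Projects!$A$2:$A$10
```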

Finding Files in My Folders

Managing disk space efficiently is essential, especially when working on systems with strict file quotas. Recently, I encountered a situation where I had exceeded my file limit and needed a quick way to determine which folders contained the most files. To analyze my storage usage, I used the following command:

for d in .* *; do [ -d "$d" ] && [ "$d" != . ] && [ "$d" != .. ] && echo "$d: $(find "$d" -type f | wc -l)"; done | sort -nr -k2

Breaking Down the Command

This one-liner efficiently counts files across all directories in the current location, including hidden ones. Here’s how it works:

  • for d in .* * – Loops through all files and directories, including hidden ones.
  • [ -d "$d" ] – Ensures that only directories are processed.
  • find "$d" -type f | wc -l – Counts all files (not directories) inside each folder, including subdirectories.
  • sort -nr -k2 – Sorts the results in descending order based on the number of files.

Why This is Useful

With this command, I quickly identified the directories consuming the most inodes and was able to take action, such as cleaning up unnecessary files. It’s an efficient method for understanding file distribution and managing storage limits effectively.
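As a side note: if your system has GNU coreutils, du can count inodes directly, which gives a similar overview with less typing. This is a sketch assuming GNU du (the --inodes option is not available in the BSD/macOS version):

```shell
# Inode usage (files + directories) per immediate subdirectory,
# largest consumers first. Requires GNU coreutils du.
du --inodes --max-depth=1 . | sort -nr | head
```

Note that du counts directories as inodes too, so the numbers will be slightly higher than the pure file counts from the find-based one-liner above.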

Alternative Approaches

If you only want to count files directly inside each folder (without subdirectories), you can modify the command like this:

for d in .* *; do [ -d "$d" ] && [ "$d" != . ] && [ "$d" != .. ] && echo "$d: $(find "$d" -maxdepth 1 -type f | wc -l)"; done | sort -nr -k2

This variation is useful when you need a more localized view of file distribution.