Finally on time

Winter is coming, and winter tires are coming even faster! Normally I’m part of the crowd that rushes to the workshop at the first snow; now I’m part of the (surprisingly small) crowd that already rushes there at the first night frost…

Holiday read arrived

With the autumn holiday coming up, my holiday read arrived just in time! I'm looking forward to a light read: my Python has become a little rusty lately, so reading about it from an introductory perspective is a nice kick-start. At first glance it looks really nice!

New updates

After quite some time I decided (once again) to start working on updates for this webpage. For now, I have added information on the different Snakemake pipelines I wrote for the most common bioinformatics use cases; have a look here: Pipelines

Let's see how long this flow lasts, though.

Creating weighted tables with R / sum of numerics associated to some categorical variable

The standard table() command calculates the frequency of each element of a vector, like this:

R> df <- data.frame(var = c("A", "A", "B", "B", "C", "C", "C"))
R> table(df)
df
A B C 
2 2 3 

So it tells us that A and B each occur twice and C occurs three times.

However, suppose we now have a situation like this:

df <- data.frame(var = c("A", "A", "B", "B", "C", "C", "C"), value = c(10, 20, 20, 40, 15, 25, 35))

That is, we have a categorical variable var and a numeric variable value, and for each level of var we would like to get the sum of value. For this we can simply use the base-R command aggregate:

R> aggregate(value ~ var, data = df, FUN = sum)
  var value
1   A    30
2   B    60
3   C    75
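If the result should look more like the output of table() itself, base R also offers xtabs() and tapply(). The following is only a minimal sketch on the same toy data; the object names wt and sums are my own choice for illustration.

```r
# Same toy data as above
df <- data.frame(var   = c("A", "A", "B", "B", "C", "C", "C"),
                 value = c(10, 20, 20, 40, 15, 25, 35))

# xtabs() sums the left-hand side of the formula within each level of var
# and returns a table object, similar to what table() produces
wt <- xtabs(value ~ var, data = df)
print(wt)

# tapply() computes the same per-group sums and returns a named 1-d array
sums <- tapply(df$value, df$var, sum)
print(sums)
```

Both calls give one sum per level of var; xtabs() is handy when the result is passed on to functions that expect a table.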

As I also often use the data.table package, here is a simple solution using it as well, assuming we convert the data frame (or already have a data table from some other source, like fread):

library("data.table")
dt <- data.table(df)

Then we can just sum over one column with respect to another column like this (and store the result in a new column tot):

dt[, .(tot = sum(value)), by = var]
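As a small extension, several summaries can be computed per group in one call. This is a sketch that assumes the data.table package is installed; the column names total and count and the object res are hypothetical names chosen for this example. Using keyby instead of by additionally sorts the result by the grouping column.

```r
library(data.table)

# Same toy data as above, created directly as a data.table
dt <- data.table(var   = c("A", "A", "B", "B", "C", "C", "C"),
                 value = c(10, 20, 20, 40, 15, 25, 35))

# Per group: the sum of value and the number of rows (.N);
# keyby sorts the output by var
res <- dt[, .(total = sum(value), count = .N), keyby = var]
print(res)
```

The count column reproduces what table() reports, so the plain and the weighted counts end up side by side.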