Creating weighted tables with R / sum of numerics associated to some categorical variable

Creating weighted tables with R / sum of numerics associated to some categorical variable

The normal table command table() calculates the frequency of each element of a vector like this:

R> df <- data.frame(var = c("A", "A", "B", "B", "C", "C", "C"))
R> table(df)
df
A B C 
2 2 3 

So, it tells us, we have two times A and B and three times C, accordingly.

However, if we have now the situation like this:

df <- data.frame(var = c("A", "A", "B", "B", "C", "C", "C"), value = c(10, 20, 20, 40, 15, 25, 35))

Meaning, we have a categorical variable var and a numeric variable value and for each categorical variable we would like to get the sum over the numerical variable, we can simply use the base-R command aggregate like this

R> aggregate(value ~ var, data = df, FUN = sum)
  var wt
1   A 40
2   B 60
3   C 70

As I often use also the data.table package, here is also a simple solution using this package, assuming we do (respective have a data table from some other source, like fread)

library("data.table")
dt <- data.table(df)

Then we can just sume over a column name with respect to another column like this (and assign the value into a new variable tot) :

setDT(dt)[, .(n = sum(value)), var] 

Leave a Reply

%d bloggers like this: