Creating weighted tables with R / sum of numeric values associated with a categorical variable
The standard table() command counts how often each element of a vector occurs, like this:
R> df <- data.frame(var = c("A", "A", "B", "B", "C", "C", "C"))
R> table(df)
df
A B C
2 2 3
So it tells us that A and B each occur twice and C occurs three times.
However, suppose we now have a situation like this:
df <- data.frame(var = c("A", "A", "B", "B", "C", "C", "C"), value = c(10, 20, 20, 40, 15, 25, 35))
That is, we have a categorical variable var and a numeric variable value, and for each category of var we would like the sum of value. For this we can simply use the base R command aggregate, like this:
R> aggregate(value ~ var, data = df, FUN = sum)
  var value
1   A    30
2   B    60
3   C    75
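If the result should look like the output of table() rather than a data frame, base R also offers xtabs() and tapply(). The following is only a minimal sketch using the same example data as above; the object names w and s are my own choice for illustration.

```r
# Same example data as above
df <- data.frame(var   = c("A", "A", "B", "B", "C", "C", "C"),
                 value = c(10, 20, 20, 40, 15, 25, 35))

# xtabs() with a left-hand side sums that column within each category
# and returns a table object, i.e. a "weighted" version of table()
w <- xtabs(value ~ var, data = df)
print(w)

# tapply() applies sum() to value within each category of var
# and returns a named array
s <- tapply(df$value, df$var, sum)
print(s)
```

Both calls give 30 for A, 60 for B and 75 for C, matching the aggregate() result.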
Since I also often use the data.table package, here is a simple solution using it, assuming we create a data table ourselves (or already have one from some other source, such as fread):
library("data.table")
dt <- data.table(df)
Then we can just sum over one column grouped by another column like this (storing the result in a new column tot):
dt[, .(tot = sum(value)), by = var]
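With data.table, the plain frequency from the beginning of this post and the sum can also be computed in one step. This is just a sketch on the same example data; the column names n and tot and the object name res are arbitrary choices for illustration.

```r
library(data.table)

# Same example data as above, created directly as a data.table
dt <- data.table(var   = c("A", "A", "B", "B", "C", "C", "C"),
                 value = c(10, 20, 20, 40, 15, 25, 35))

# .N is the number of rows in each group (what table() counts),
# sum(value) is the weighted total per group
res <- dt[, .(n = .N, tot = sum(value)), by = var]
print(res)
```

Here res has one row per category, with n equal to 2, 2, 3 and tot equal to 30, 60, 75.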