Couple The Data In All Possible Combinations
Solution 1:
To return a character matrix with these combinations using base R, try
do.call(rbind, t(sapply(split(df, df$Id), function(i) t(combn(i$Value, 2)))))
[,1] [,2]
[1,] "a" "c"
[2,] "a" "h"
[3,] "c" "h"
[4,] "f" "a"
Each row is a desired combination.
To break this down a bit, split
splits the data.frame by Id into a list of two data.frames. Then sapply
is fed this list and the combn
function to find the pairwise combinations within these data.frames. The result from each data.frame (which is a matrix) is transposed to fit your desired structure using t
. Finally, this list of matrices is fed to do.call
which uses rbind
to return the final matrix.
Note: There is an assumption that the value column is character (not the pesky factor variable type). This is easily accomplished in the read.
family of functions, like read.csv
and read.table
by adding the as.is=TRUE argument to your read function (or the longer stringsAsFactors=FALSE). If the variable is already a factor, you can wrap the i$Value
statement near the end in as.character
: as.character(i$Value)
and it will run as desired.
Solution 2:
Using R
you could try:
library(purrr)
df %>%
split(.$Id) %>%
map(~ t(combn(.$Value, 2)))
Which gives:
#$`1`
# [,1] [,2]
#[1,] "a" "c"
#[2,] "a" "h"
#[3,] "c" "h"
#
#$`2`
# [,1] [,2]
#[1,] f a
#Levels: a c f h
Solution 3:
Just another way (possibly slightly faster as it exploits the fact that you're looking for all pairs, and avoids combn
and t
):
require(data.table)
dt[, .( c1 = rep(Value, (.N:1)-1L), c2 = rep(Value, (1:.N)-1L) ), by=Id]
# Id c1 c2
# 1: 1 a c
# 2: 1 a h
# 3: 1 c h
# 4: 2 f a
.N
contains the number of observations for each group.
where dt is:
dt = fread('Id Value
1 a
2 f
1 c
1 h
2 a')
Post a Comment for "Couple The Data In All Possible Combinations"