Wide To Long Data Table Transformation With Variables In Columns And Rows
Solution 1:
The sample data set provided by the OP suggests that all data frames within the csv file
- have the same structure, i.e., the same number, names, and positions of columns
- and the monthly columns
V4
toV8
refer to the same months 1 to 5 for all "sub frames".
If this is true then we can treat the whole csv file as one data frame and convert it to the desired format by reshaping using melt()
and dcast()
from the data.table
package:
library(data.table)
setDT(df3)[, melt(.SD, id.vars = paste0("V", 1:3), na.rm = TRUE)][
V3 != "month", dcast(.SD, V1 + V2 + rleid(variable) ~ forcats::fct_inorder(V3))][
, setnames(.SD, 1:3, c("city", "address", "month"))]
cityaddressmonthxyzabc1:la63mainst1NANANA87035345866902:la63mainst2NANANA746745645674670003:la63mainst3NANANA3363 34530184:la63mainst4NANANA8636781855:la63mainst5NANANA43673345433286:nyc123mainst1585685345 53223NANANA7:nyc123mainst25675673673 563894NANANA8:nyc123mainst35679093453 564456NANANA9:nyc123mainst4358763467 32409NANANA10:nyc123mainst55694378856155NANANA11:sf953mainst1457456NA452111NA5345 NA12:sf953mainst23455 NA-218NA3673 NA13:sf953mainst3345345NA341892NA3453 NA14:sf953mainst456457NA52990NA3467 NA15:sf953mainst53634 NA2846 NA788NA
The fct_inorder()
function from Hadley's forcats
package is used here to order the columns by their first appearance instead of alphabetical order a, b, c, x, y, z.
Note that also the cities have been ordered alphabetically. If this is crcuial (but I doubt it is) the original order can be preserved as well by using
forcats::fct_inorder(V1) + V2 + rleid(variable) ~ forcats::fct_inorder(V3)
as dcast()
formula.
Data
Unfortunately, the OP didn't supply the result of dput(df3)
which made it unnecessarily difficult to reproduce the data set as printed in the question:
df3 <- readr::read_table(
" V1 V2 V3 V4 V5 V6 V7 V8
1 nyc 123 main st month 1 2 3 4 5
2 nyc 123 main st x 58568 567567 567909 35876 56943
3 nyc 123 main st y 5345 3673 3453 3467 788
4 nyc 123 main st z 53223 563894 564456 32409 56155
5
6 la 63 main st month 1 2 3 4 5
7 la 63 main st a 87035 7467456 3363 863 43673
8 la 63 main st b 345 456 345 678 345
9 la 63 main st c 86690 7467000 3018 185 43328
10
11 sf 953 main st month 1 2 3 4 5
12 sf 953 main st x 457456 3455 345345 56457 3634
13 sf 953 main st b 5345 3673 3453 3467 788
14 sf 953 main st z 452111 -218 341892 52990 2846"
)
library(data.table)
setDT(df3)[, V2 := paste(X3, V2)][, c("X1", "X3") := NULL]
setDF(df3)[]
V1 V2 V3 V4 V5 V6 V7 V8
1 nyc 123main st month 123452 nyc 123main st x 5856856756756790935876569433 nyc 123main st y 53453673345334677884 nyc 123main st z 5322356389456445632409561555 NA NA NA NA NA NA
6 la 63main st month 123457 la 63main st a8703574674563363863436738 la 63main st b3454563456783459 la 63main st c 86690746700030181854332810 NA NA NA NA NA NA
11 sf 953main st month 1234512 sf 953main st x 457456345534534556457363413 sf 953main st b534536733453346778814 sf 953main st z 452111 -218341892529902846
Solution 2:
It would help first if you had proper column names for your df, please insert column names once you read in the data.
I have use the following libraries, dplyr
and stringr
for this analysis and also renamed the first 3 columns:
df <- data.frame(stringsAsFactors=FALSE,
city = c("nyc", "nyc", "nyc"),
address = c("123 main st", "123 main st", "123 main st"),
month= c("x", "y", "z"),
X1 = c(58568L, 5345L, 53223L),
X2 = c(567567L, 3673L, 563894L),
X3 = c(567909L, 3453L, 564456L),
X4 = c(35876L, 3467L, 32409L),
X5 = c(56943L, 788L, 56155L)
)
df %>% gather(Type, Value, -c(city:month)) %>%
spread(month, Value) %>%
mutate(month= str_sub(Type, 2, 2)) %>%select(-Type) %>%select(c(city, address, month, x:z))
city address month x y z
1 nyc 123 main st 1585685345532232 nyc 123 main st 256756736735638943 nyc 123 main st 356790934535644564 nyc 123 main st 4358763467324095 nyc 123 main st 55694378856155
Post a Comment for "Wide To Long Data Table Transformation With Variables In Columns And Rows"