Skip to content Skip to sidebar Skip to footer

Wide To Long Data Table Transformation With Variables In Columns And Rows

I have a csv with multiple tables with variables stored in both rows and columns. About this csv: I'd want to go 'wide' to 'long' There are multiple 'data frames' in one csv There

Solution 1:

The sample data set provided by the OP suggests that all data frames within the csv file

  1. have the same structure, i.e., the same number, names, and positions of columns
  2. and the monthly columns V4to V8 refer to the same months 1 to 5 for all "sub frames".

If this is true then we can treat the whole csv file as one data frame and convert it to the desired format by reshaping using melt() and dcast() from the data.table package:

library(data.table)
setDT(df3)[, melt(.SD, id.vars = paste0("V", 1:3), na.rm = TRUE)][
  V3 != "month", dcast(.SD, V1 + V2 + rleid(variable) ~ forcats::fct_inorder(V3))][
    , setnames(.SD, 1:3, c("city", "address", "month"))]
cityaddressmonthxyzabc1:la63mainst1NANANA87035345866902:la63mainst2NANANA746745645674670003:la63mainst3NANANA3363  34530184:la63mainst4NANANA8636781855:la63mainst5NANANA43673345433286:nyc123mainst1585685345  53223NANANA7:nyc123mainst25675673673 563894NANANA8:nyc123mainst35679093453 564456NANANA9:nyc123mainst4358763467  32409NANANA10:nyc123mainst55694378856155NANANA11:sf953mainst1457456NA452111NA5345      NA12:sf953mainst23455   NA-218NA3673      NA13:sf953mainst3345345NA341892NA3453      NA14:sf953mainst456457NA52990NA3467      NA15:sf953mainst53634   NA2846      NA788NA

The fct_inorder() function from Hadley's forcats package is used here to order the columns by their first appearance instead of alphabetical order a, b, c, x, y, z.

Note that also the cities have been ordered alphabetically. If this is crcuial (but I doubt it is) the original order can be preserved as well by using

forcats::fct_inorder(V1) + V2 + rleid(variable) ~ forcats::fct_inorder(V3)

as dcast() formula.

Data

Unfortunately, the OP didn't supply the result of dput(df3) which made it unnecessarily difficult to reproduce the data set as printed in the question:

df3 <- readr::read_table(
  "     V1          V2    V3     V4      V5     V6      V7    V8
  1   nyc 123 main st month      1       2      3       4     5
  2   nyc 123 main st     x  58568  567567 567909   35876 56943
  3   nyc 123 main st     y   5345    3673   3453    3467   788
  4   nyc 123 main st     z  53223  563894 564456   32409 56155
  5                                                            
  6    la  63 main st month      1       2      3       4     5
  7    la  63 main st     a  87035 7467456   3363     863 43673
  8    la  63 main st     b    345     456    345     678   345
  9    la  63 main st     c  86690 7467000   3018     185 43328
  10                                                           
  11   sf 953 main st month      1       2      3       4     5
  12   sf 953 main st     x 457456    3455 345345   56457  3634
  13   sf 953 main st     b   5345    3673   3453    3467   788
  14   sf 953 main st     z 452111    -218 341892   52990  2846"
)
library(data.table)
setDT(df3)[, V2 := paste(X3, V2)][, c("X1", "X3") := NULL]
setDF(df3)[]
    V1          V2    V3     V4      V5     V6    V7    V8
1  nyc 123main st month      123452  nyc 123main st     x  5856856756756790935876569433  nyc 123main st     y   53453673345334677884  nyc 123main st     z  5322356389456445632409561555              NA            NA      NA     NA    NA    NA
6   la  63main st month      123457   la  63main st     a8703574674563363863436738   la  63main st     b3454563456783459   la  63main st     c  86690746700030181854332810             NA            NA      NA     NA    NA    NA
11  sf 953main st month      1234512  sf 953main st     x 457456345534534556457363413  sf 953main st     b534536733453346778814  sf 953main st     z 452111    -218341892529902846

Solution 2:

It would help first if you had proper column names for your df, please insert column names once you read in the data.

I have use the following libraries, dplyr and stringr for this analysis and also renamed the first 3 columns:

df <- data.frame(stringsAsFactors=FALSE,
        city = c("nyc", "nyc", "nyc"),
     address = c("123 main st", "123 main st", "123 main st"),
       month= c("x", "y", "z"),
          X1 = c(58568L, 5345L, 53223L),
          X2 = c(567567L, 3673L, 563894L),
          X3 = c(567909L, 3453L, 564456L),
          X4 = c(35876L, 3467L, 32409L),
          X5 = c(56943L, 788L, 56155L)
)

df %>% gather(Type, Value, -c(city:month)) %>% 
        spread(month, Value) %>%
        mutate(month= str_sub(Type, 2, 2)) %>%select(-Type) %>%select(c(city, address, month, x:z))

city     address month      x    y      z
1  nyc 123 main st     1585685345532232  nyc 123 main st     256756736735638943  nyc 123 main st     356790934535644564  nyc 123 main st     4358763467324095  nyc 123 main st     55694378856155

Post a Comment for "Wide To Long Data Table Transformation With Variables In Columns And Rows"