QBoard » Advanced Visualizations » Viz - Tableau » How can I transfer to a "tableau" format in R?

How can I transfer to a "tableau" format in R?

  • Is there an easy way to transfer to a tableau format(only keeps the parent level once)? I understand that the target format is not good in data science. But it makes the report easy to read. I can easily write it to excel and send the report to my boss..

    library(data.table)
    (dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                     Sub_Product=c("red","red","blue","yellow","pink","pink"),
                     Sub_Product1=c(1,2,3,4,5,6),
                     Value=c(100,200,300,400,500,600)))
    
    #>    Parent_Product Sub_Product Sub_Product1 Value
    #> 1:              A         red            1   100
    #> 2:              A         red            2   200
    #> 3:              A        blue            3   300
    #> 4:              B      yellow            4   400
    #> 5:              B        pink            5   500
    #> 6:              B        pink            6   600
    
    (target_dt <- data.table(Parent_Product=c("A",NA,NA,"B",NA,NA),
                            Sub_Product=c("red",NA,"blue","yellow","pink",NA),
                            Sub_Product1=c(1,2,3,4,5,6),
                            Value=c(100,200,300,400,500,600)))
    
    #>    Parent_Product Sub_Product Sub_Product1 Value
    #> 1:              A         red            1   100
    #> 2:           <NA>        <NA>            2   200
    #> 3:           <NA>        blue            3   300
    #> 4:              B      yellow            4   400
    #> 5:           <NA>        pink            5   500
    #> 6:           <NA>        <NA>            6   600​
      September 3, 2021 1:42 PM IST
    0
  • This could probably be consolidated into fewer steps, but at least this is transparent what's going on at each step:

    library(data.table)
    dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                      Sub_Product=c("red","red","blue","yellow","pink","pink"),
                      Sub_Product1=c(1,2,3,4,5,6),
                      Value=c(100,200,300,400,500,600))
    
    
    #create index 1:.N by Parent Product
    dt[, idx_Parent := 1:.N, by = .(Parent_Product)]
    #ditto for combo of parent product and sub product
    dt[, idx_Sub := 1:.N, by = .(Parent_Product, Sub_Product)]
    #Update idx > 1 to NA
    dt[idx_Parent > 1, Parent_Product := NA]
    dt[idx_Sub > 1, Sub_Product := NA]
    #clean up idx
    dt[, `:=` (idx_Parent = NULL, idx_Sub = NULL)]
    #report values
    dt
    #>    Parent_Product Sub_Product Sub_Product1 Value
    #> 1:              A         red            1   100
    #> 2:           <NA>        <NA>            2   200
    #> 3:           <NA>        blue            3   300
    #> 4:              B      yellow            4   400
    #> 5:           <NA>        pink            5   500
    #> 6:           <NA>        <NA>            6   600
      December 23, 2021 1:56 PM IST
    0
  • This could probably be consolidated into fewer steps, but at least this is transparent what's going on at each step:

    library(data.table)
    dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                      Sub_Product=c("red","red","blue","yellow","pink","pink"),
                      Sub_Product1=c(1,2,3,4,5,6),
                      Value=c(100,200,300,400,500,600))
    
    
    #create index 1:.N by Parent Product
    dt[, idx_Parent := 1:.N, by = .(Parent_Product)]
    #ditto for combo of parent product and sub product
    dt[, idx_Sub := 1:.N, by = .(Parent_Product, Sub_Product)]
    #Update idx > 1 to NA
    dt[idx_Parent > 1, Parent_Product := NA]
    dt[idx_Sub > 1, Sub_Product := NA]
    #clean up idx
    dt[, `:=` (idx_Parent = NULL, idx_Sub = NULL)]
    #report values
    dt
    #>    Parent_Product Sub_Product Sub_Product1 Value
    #> 1:              A         red            1   100
    #> 2:           <NA>        <NA>            2   200
    #> 3:           <NA>        blue            3   300
    #> 4:              B      yellow            4   400
    #> 5:           <NA>        pink            5   500
    #> 6:           <NA>        <NA>            6   600​

      September 4, 2021 12:32 PM IST
    0
  • Here is a combination of alistaire's and Frank's suggestions:

    dt[, lapply(.SD, function(x) {x[rowid(rleid(x)) > 1] <- NA; x})]
    ​


     Parent_Product Sub_Product Sub_Product1 Value
    1:              A         red            1   100
    2:           <NA>        <NA>            2   200
    3:           <NA>        blue            3   300
    4:              B      yellow            4   400
    5:           <NA>        pink            5   500
    6:           <NA>        <NA>            6   600
    7:              C         red            1   100
    8:           <NA>        <NA>            2    NA​


    Note that the second streak of "red" values in column Sub_Product is handled correctly (as opposed to duplicated()).

    To make the visual effect much more standing out we can use "" instead of NA:

    dt[, lapply(.SD, function(x) {x[rowid(rleid(x)) > 1] <- ""; x})]
    

     

    Parent_Product Sub_Product Sub_Product1 Value
    1:              A         red            1   100
    2:                                       2   200
    3:                       blue            3   300
    4:              B      yellow            4   400
    5:                       pink            5   500
    6:                                       6   600
    7:              C         red            1   100
    8:                                       2


    Unfortunately, this turns any repeating values in a streak of successive values to NA (or "", resp.) in all columns. For production use we probably want to control which columns should be included, e.g., only the second column

    cols <- "Sub_Product"   # apply to second column for demonstration
    dt[, (cols) := lapply(.SD, function(x) {x[rowid(rleid(x)) > 1] <- ""; x}), .SDcols = cols][]

     

     Parent_Product Sub_Product Sub_Product1 Value
    1:              A         red            1   100
    2:              A                        2   200
    3:              A        blue            3   300
    4:              B      yellow            4   400
    5:              B        pink            5   500
    6:              B                        6   600
    7:              C         red            1   100
    8:              C                        2   100


    Data

    OP's data have been extended to include a Parent_Product "C" with a second streak of "red" values in Sub_Product:

    library(data.table)
    dt <- fread("
    Parent_Product Sub_Product Sub_Product1 Value
                 A         red            1   100
                 A         red            2   200
                 A        blue            3   300
                 B      yellow            4   400
                 B        pink            5   500
                 B        pink            6   600
                 C         red            1   100
                 C         red            2   100
    ")



      September 13, 2021 1:37 PM IST
    0