How can I transfer to a "tableau" format in R?

QBoard » Advanced Visualizations » Viz - Tableau » How can I transfer to a "tableau" format in R?

User Dashboard

How can I transfer to a "tableau" format in R?

Back To Topics

Tags : R dplyr data.table tidyverse

Vaibhav Mali

259

Is there an easy way to transfer to a tableau format(only keeps the parent level once)? I understand that the target format is not good in data science. But it makes the report easy to read. I can easily write it to excel and send the report to my boss..

library(data.table)
(dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                 Sub_Product=c("red","red","blue","yellow","pink","pink"),
                 Sub_Product1=c(1,2,3,4,5,6),
                 Value=c(100,200,300,400,500,600)))

#>    Parent_Product Sub_Product Sub_Product1 Value
#> 1:              A         red            1   100
#> 2:              A         red            2   200
#> 3:              A        blue            3   300
#> 4:              B      yellow            4   400
#> 5:              B        pink            5   500
#> 6:              B        pink            6   600

(target_dt <- data.table(Parent_Product=c("A",NA,NA,"B",NA,NA),
                        Sub_Product=c("red",NA,"blue","yellow","pink",NA),
                        Sub_Product1=c(1,2,3,4,5,6),
                        Value=c(100,200,300,400,500,600)))

#>    Parent_Product Sub_Product Sub_Product1 Value
#> 1:              A         red            1   100
#> 2:           <NA>        <NA>            2   200
#> 3:           <NA>        blue            3   300
#> 4:              B      yellow            4   400
#> 5:           <NA>        pink            5   500
#> 6:           <NA>        <NA>            6   600

September 3, 2021 1:42 PM IST

Maryam Bains

317

This could probably be consolidated into fewer steps, but at least this is transparent what's going on at each step:

library(data.table)
dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                  Sub_Product=c("red","red","blue","yellow","pink","pink"),
                  Sub_Product1=c(1,2,3,4,5,6),
                  Value=c(100,200,300,400,500,600))


#create index 1:.N by Parent Product
dt[, idx_Parent := 1:.N, by = .(Parent_Product)]
#ditto for combo of parent product and sub product
dt[, idx_Sub := 1:.N, by = .(Parent_Product, Sub_Product)]
#Update idx > 1 to NA
dt[idx_Parent > 1, Parent_Product := NA]
dt[idx_Sub > 1, Sub_Product := NA]
#clean up idx
dt[, `:=` (idx_Parent = NULL, idx_Sub = NULL)]
#report values
dt
#>    Parent_Product Sub_Product Sub_Product1 Value
#> 1:              A         red            1   100
#> 2:           <NA>        <NA>            2   200
#> 3:           <NA>        blue            3   300
#> 4:              B      yellow            4   400
#> 5:           <NA>        pink            5   500
#> 6:           <NA>        <NA>            6   600

December 23, 2021 1:56 PM IST

Samar Patil

346 3

This could probably be consolidated into fewer steps, but at least this is transparent what's going on at each step:

library(data.table)
dt <- data.table(Parent_Product=c("A","A","A","B","B","B"),
                  Sub_Product=c("red","red","blue","yellow","pink","pink"),
                  Sub_Product1=c(1,2,3,4,5,6),
                  Value=c(100,200,300,400,500,600))


#create index 1:.N by Parent Product
dt[, idx_Parent := 1:.N, by = .(Parent_Product)]
#ditto for combo of parent product and sub product
dt[, idx_Sub := 1:.N, by = .(Parent_Product, Sub_Product)]
#Update idx > 1 to NA
dt[idx_Parent > 1, Parent_Product := NA]
dt[idx_Sub > 1, Sub_Product := NA]
#clean up idx
dt[, `:=` (idx_Parent = NULL, idx_Sub = NULL)]
#report values
dt
#>    Parent_Product Sub_Product Sub_Product1 Value
#> 1:              A         red            1   100
#> 2:           <NA>        <NA>            2   200
#> 3:           <NA>        blue            3   300
#> 4:              B      yellow            4   400
#> 5:           <NA>        pink            5   500
#> 6:           <NA>        <NA>            6   600

September 4, 2021 12:32 PM IST

Viaan Prakash

461

Here is a combination of alistaire's and Frank's suggestions:

dt[, lapply(.SD, function(x) {x[rowid(rleid(x)) > 1] <- NA; x})]

 Parent_Product Sub_Product Sub_Product1 Value
1:              A         red            1   100
2:           <NA>        <NA>            2   200
3:           <NA>        blue            3   300
4:              B      yellow            4   400
5:           <NA>        pink            5   500
6:           <NA>        <NA>            6   600
7:              C         red            1   100
8:           <NA>        <NA>            2    NA

Note that the second streak of "red" values in column Sub_Product is handled correctly (as opposed to duplicated()).

To make the visual effect much more standing out we can use "" instead of NA:

dt[, lapply(.SD, function(x) {x[rowid(rleid(x)) > 1] <- ""; x})]

Parent_Product Sub_Product Sub_Product1 Value
1:              A         red            1   100
2:                                       2   200
3:                       blue            3   300
4:              B      yellow            4   400
5:                       pink            5   500
6:                                       6   600
7:              C         red            1   100
8:                                       2

Unfortunately, this turns any repeating values in a streak of successive values to NA (or "", resp.) in all columns. For production use we probably want to control which columns should be included, e.g., only the second column

cols <- "Sub_Product"   # apply to second column for demonstration
dt[, (cols) := lapply(.SD, function(x) {x[rowid(rleid(x)) > 1] <- ""; x}), .SDcols = cols][]

 Parent_Product Sub_Product Sub_Product1 Value
1:              A         red            1   100
2:              A                        2   200
3:              A        blue            3   300
4:              B      yellow            4   400
5:              B        pink            5   500
6:              B                        6   600
7:              C         red            1   100
8:              C                        2   100

Data

OP's data have been extended to include a Parent_Product "C" with a second streak of "red" values in Sub_Product:

library(data.table)
dt <- fread("
Parent_Product Sub_Product Sub_Product1 Value
             A         red            1   100
             A         red            2   200
             A        blue            3   300
             B      yellow            4   400
             B        pink            5   500
             B        pink            6   600
             C         red            1   100
             C         red            2   100
")

September 13, 2021 1:37 PM IST

Member Sign In

Member Sign In

Create Account

How can I transfer to a "tableau" format in R?

Connect With Us