QBoard » Artificial Intelligence & ML » AI and ML - R » Split data frame string column into multiple columns

Split data frame string column into multiple columns

  • I'd like to take data of the form
    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
      attr          type
    1    1   foo_and_bar
    2   30 foo_and_bar_2
    3    4   foo_and_bar
    4    6 foo_and_bar_2
    

    and use split() on the column "type" from above to get something like this:

    attr type_1 type_2
    1    1    foo    bar
    2   30    foo  bar_2
    3    4    foo    bar
    4    6    foo  bar_2

    I came up with something unbelievably complex involving some form of apply that worked, but I've since misplaced that. It seemed far too complicated to be the best way. I can use strsplit as below, but then unclear how to get that back into 2 columns in the data frame.

    > strsplit(as.character(before$type),'_and_')
    [[1]]
    [1] "foo" "bar"
    
    [[2]]
    [1] "foo"   "bar_2"
    
    [[3]]
    [1] "foo" "bar"
    
    [[4]]
    [1] "foo"   "bar_2"

    Thanks for any pointers. I've not quite groked R lists just yet.

     

      December 8, 2020 11:30 AM IST
    0
  • Another option is to use the new tidyr package.
    library(dplyr)
    library(tidyr)
    
    before <- data.frame(
      attr = c(1, 30 ,4 ,6 ), 
      type = c('foo_and_bar', 'foo_and_bar_2')
    )
    
    before %>%
      separate(type, c("foo", "bar"), "_and_")
    
    ##   attr foo   bar
    ## 1    1 foo   bar
    ## 2   30 foo bar_2
    ## 3    4 foo   bar
    ## 4    6 foo bar_2
      December 8, 2020 11:33 AM IST
    0
  • Yet another approach: use rbind on out:
    before <- data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))  
    out <- strsplit(as.character(before$type),'_and_') 
    do.call(rbind, out)
    
         [,1]  [,2]   
    [1,] "foo" "bar"  
    [2,] "foo" "bar_2"
    [3,] "foo" "bar"  
    [4,] "foo" "bar_2"​

    And to combine:

    data.frame(before$attr, do.call(rbind, out))
      December 16, 2020 1:02 PM IST
    0
  • To add to the options, you could also use my splitstackshape::cSplit function like this:
    library(splitstackshape)
    cSplit(before, "type", "_and_")
    #    attr type_1 type_2
    # 1:    1    foo    bar
    # 2:   30    foo  bar_2
    # 3:    4    foo    bar
    # 4:    6    foo  bar_2
      December 16, 2020 1:17 PM IST
    0
  • base but probably slow:

    n <- 1
    for(i in strsplit(as.character(before$type),'_and_')){
         before[n, 'type_1'] <- i[[1]]
         before[n, 'type_2'] <- i[[2]]
         n <- n + 1
    }
    
    ##   attr          type type_1 type_2
    ## 1    1   foo_and_bar    foo    bar
    ## 2   30 foo_and_bar_2    foo  bar_2
    ## 3    4   foo_and_bar    foo    bar
    ## 4    6 foo_and_bar_2    foo  bar_2
      December 9, 2021 12:43 PM IST
    0
  • This question is pretty old but I'll add the solution I found the be the simplest at present.

    library(reshape2)
    before = data.frame(attr = c(1,30,4,6), type=c('foo_and_bar','foo_and_bar_2'))
    newColNames <- c("type1", "type2")
    newCols <- colsplit(before$type, "_and_", newColNames)
    after <- cbind(before, newCols)
    after$type <- NULL
    after
      December 15, 2021 12:31 PM IST
    0