R error using package tm (text-mining)

  •  

    I am trying to use the tm package to convert a vector of text strings into a corpus.

    My code looks something like this:

    Corpus(d1$Yes)

    where d1$Yes is a factor with 124 levels, each containing a text string.

    For example, d1$Yes[246] = "So we can get the boat out!"

    I'm receiving the following error: "Error: inherits(x, "Source") is not TRUE"

    I'm not sure how to remedy this.

      June 11, 2019 4:46 PM IST
  • You have to tell Corpus what kind of source you are using. Try:

    Corpus(VectorSource(d1$Yes))
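
Because d1$Yes is a factor, it may also be worth converting it to character first so that each document holds the text itself. A minimal sketch, assuming a small made-up data frame standing in for d1 (only the first string comes from the question; the other is hypothetical):

```r
library(tm)

# Hypothetical stand-in for the asker's data: a factor column of text strings
d1 <- data.frame(
  Yes = factor(c("So we can get the boat out!",
                 "More room for parking.",
                 "So we can get the boat out!"))
)

# Convert the factor to character, then wrap it in VectorSource() so that
# Corpus() knows each element of the vector is one document
corpus <- Corpus(VectorSource(as.character(d1$Yes)))

length(corpus)               # number of documents in the corpus
as.character(corpus[[1]])    # text of the first document
```

Passing the bare vector to Corpus() fails because Corpus() checks that its argument inherits from "Source"; VectorSource() supplies that wrapper.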
      June 11, 2019 4:50 PM IST
  • I ran into the same problem after updating the tm package to version 0.7-2. The documentation for DataframeSource() says:

    Details

    A data frame source interprets each row of the data frame x as a document. The first column must be named "doc_id" and contain a unique string identifier for each document. The second column must be named "text" and contain a "UTF-8" encoded string representing the document's content. Optional additional columns are used as document level metadata.

    I solved it with the following code:

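    # Read the raw data, keeping strings as character vectors rather than factors
    df_cmp <- read.csv("test_file.csv", stringsAsFactors = FALSE)

    # DataframeSource() expects the first two columns to be doc_id and text
    df_title <- data.frame(doc_id = row.names(df_cmp),
                           text = df_cmp$English.title,
                           stringsAsFactors = FALSE)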

    Try renaming your columns to doc_id and text before passing the data frame to DataframeSource().
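
To show how a data frame in that shape becomes a corpus, here is a minimal sketch. The data frame contents and the extra year column are hypothetical, made up for illustration in place of the CSV file:

```r
library(tm)

# Hypothetical data in the layout DataframeSource() requires:
# first column doc_id, second column text, anything else is metadata
df_title <- data.frame(
  doc_id = c("1", "2"),
  text   = c("First English title", "Second English title"),
  year   = c(2019, 2021),
  stringsAsFactors = FALSE
)

# Each row of the data frame becomes one document
corpus <- VCorpus(DataframeSource(df_title))

length(corpus)               # one document per row
as.character(corpus[[1]])    # text taken from the "text" column
meta(corpus)                 # extra columns kept as document-level metadata
```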

      August 27, 2021 1:02 PM IST
  • If you created a corpus by manipulating other objects in R, so the texts are not already
    stored on disk, and you want to save the documents to disk, you can simply use writeCorpus():

    > writeCorpus(ovid)

    This writes a character representation of the documents in the corpus to multiple files on disk.
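
As a rough end-to-end sketch of that step, the example below builds a small in-memory corpus and writes it to a temporary directory. The document texts, the out_dir location and the file names are all assumptions chosen for illustration:

```r
library(tm)

# Hypothetical in-memory corpus built from a character vector
docs <- c("Text of the first document.",
          "Text of the second document.")
corpus <- VCorpus(VectorSource(docs))

# Write one file per document into a temporary directory
out_dir <- file.path(tempdir(), "corpus_out")
dir.create(out_dir, showWarnings = FALSE)
writeCorpus(corpus, path = out_dir,
            filenames = c("first.txt", "second.txt"))

list.files(out_dir)                           # files created by writeCorpus()
readLines(file.path(out_dir, "first.txt"))    # content of the first file
```

If filenames is left out, writeCorpus() derives the file names from the document IDs, and path defaults to the current working directory.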
      August 14, 2021 1:17 PM IST