I'd like to apply qdap's polarity function to a vector of documents, each of which could contain multiple sentences, and obtain the corresponding polarity for each document. For example:

    library(qdap)
    polarity(DATA$state)$all$polarity
    # Results: -0.8165 -0.4082 0.0000 -0.8944 0.0000 0.0000 0.0000 -0.5774 0.0000 0.4082 0.0000
    # Warning message:
    # In polarity(DATA$state) : Some rows contain double punctuation.
    #   Suggested use of `sentSplit` function.

This warning can't be ignored, as polarity seems to add up the polarity scores of the individual sentences in a document, which can push document-level polarity scores outside the bounds. I'm aware of the option to first run sentSplit and then average across the sentences, perhaps weighting polarity by word count, but this is (1) inefficient (it takes roughly 4x as long as running on the full documents with the warning), and (2) it is unclear how to weight the sentences. That option would look something like this:

    DATA$id <- seq(nrow(DATA)) # For identifying and aggregating documents
    ...
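For reference, the sentence-split-then-aggregate option mentioned above could be sketched roughly as follows. This assumes qdap's built-in DATA example set and the column names that polarity()$all is documented to return ("polarity", "wc"); treat it as illustrative rather than a tuned solution:

```r
library(qdap)

# Split each document into sentences, score sentences individually,
# then aggregate back to the document level.
DATA$id <- seq(nrow(DATA))         # document identifier
sents <- sentSplit(DATA, "state")  # one row per sentence; id is carried along

scores <- polarity(sents$state)$all
sents$polarity <- scores$polarity
sents$wc <- scores$wc              # sentence word counts

# One choice of weighting: word-count-weighted mean polarity per document
doc_polarity <- sapply(split(sents, sents$id), function(d) {
  weighted.mean(d$polarity, d$wc, na.rm = TRUE)
})
```

Weighting by word count keeps a long neutral sentence from being drowned out by a short strongly polarized one, but it is only one reasonable choice; an unweighted mean is the simpler alternative.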
Given a lot of text documents (natural language, unstructured), what are the possible ways of annotating them with semantic meta-data? For example, consider a short document:

    I saw the company's manager last day.

To be able to extract information from it, it must be annotated with additional data to make it less ambiguous. The process of finding such meta-data is not in question, so assume it is done manually. The question is how to store these data in a way that further analysis can be done more conveniently and efficiently. A possible approach is to use XML tags (see below), but that seems too verbose, and maybe there are better approaches or guidelines for storing such meta-data on text documents.

    I saw the company's manager last day.
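The inline-XML approach the question alludes to might look like the following; the element and attribute names here are purely illustrative, not any standard vocabulary:

```xml
<doc>
  I saw <entity type="organization">the company</entity>'s
  <entity type="person" role="manager">manager</entity>
  <time value="yesterday">last day</time>.
</doc>
```

An alternative that avoids cluttering the text itself is stand-off annotation: the source text is left untouched, and labels are stored separately together with the character offsets they apply to.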
I am attempting to use the tm package to convert a vector of text strings to a corpus element.
My code looks something like this:
Corpus(d1$Yes)
where d1$Yes is a factor with 124 levels, each containing a text string.
For example, d1$Yes = "So we can get the boat out!"
I'm receiving the following error: "Error: inherits(x, "Source") is not TRUE"
I'm not sure how to remedy this.
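A likely fix, sketched under the assumption that d1$Yes is the factor described above: tm's Corpus() constructor expects a Source object rather than a raw vector, which is exactly what the inherits(x, "Source") check is complaining about. Converting the factor to character and wrapping it in VectorSource() satisfies it:

```r
library(tm)

# Corpus() requires a Source object; passing a bare factor or character
# vector raises: Error: inherits(x, "Source") is not TRUE
texts <- as.character(d1$Yes)          # factors store level indices, so convert first
corp  <- Corpus(VectorSource(texts))   # one document per vector element
```

The as.character() step matters: passing the factor directly can hand tm the integer level codes rather than the text strings.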
I had a tough evening today trying to convince one of my colleagues that NLP (Natural Language Processing) is the superset and Text Analytics is a subset of it. At best, the two are synonymous and can be used interchangeably.
Is that correct? Does anyone have crystal clarity on whether these terms have a well-defined boundary, or whether they can be used interchangeably?