
How to convert text data to numerical vectors to train a CNN classifier?

  • To train a CNN classifier on text data, are there different or better methods than one-hot encoding?
      January 11, 2021 5:06 PM IST
  • If a large amount of training data is available, word2vec is a better choice for converting text to vector embeddings, since the embeddings it learns become more accurate as the corpus grows.
    If little data is available, one-hot encoding is preferable. It is simple: each unique word in the corpus gets its own dimension, and a word is represented by a vector that is 1 in that dimension and 0 everywhere else. On a large corpus, however, this leads to the curse of dimensionality, because the vector length grows with the number of unique words.
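
A minimal sketch of one-hot encoding in plain Python, to make the dimensionality point concrete. The two-sentence corpus and the helper names `build_vocab` and `one_hot` are made up for illustration; lowercasing and whitespace tokenization are simplifying assumptions, not something the answer prescribes.

```python
def build_vocab(sentences):
    """Map each unique lowercase token to an integer index, in first-seen order."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a vector of length len(vocab) with a single 1 at the token's index."""
    vec = [0] * len(vocab)
    vec[vocab[token]] = 1
    return vec


# Hypothetical toy corpus, for illustration only.
corpus = ["there used to be stone age", "now it is digital age"]
vocab = build_vocab(corpus)

# Each token of the first sentence becomes one row; the row length equals the
# vocabulary size, so it grows with every new unique word in the corpus.
encoded = [one_hot(token, vocab) for token in corpus[0].split()]
print(len(vocab), len(encoded), len(encoded[0]))
```

Because every new unique word adds one more column, the rows become very long and sparse on a large corpus, which is where a dense embedding such as word2vec becomes attractive.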
      January 11, 2021 5:10 PM IST
  • Another option is a bag-of-words representation: we count the occurrences of each word from a fixed vocabulary list in the document. For example, the sentence “There used to be Stone Age” can be represented against that list as:

    “There” = 1
    “was” = 0
    “to” = 1
    “be” = 1
    “used” = 1
    “Stone” = 1
    “Bronze” = 0
    “Iron” = 0
    “Revolution” = 0
    “Digital” = 0
    “Age” = 1
    “of” = 0
    “Now” = 0
    “it” = 0
    “is” = 0
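
The counting above can be sketched in a few lines of Python. The vocabulary order is taken from the list in this post; the function name `bag_of_words` and the case-sensitive whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter

# Vocabulary list, in the same order as the example above.
VOCAB = ["There", "was", "to", "be", "used", "Stone", "Bronze", "Iron",
         "Revolution", "Digital", "Age", "of", "Now", "it", "is"]


def bag_of_words(sentence, vocab):
    """Count how often each vocabulary word occurs in the sentence."""
    counts = Counter(sentence.split())  # naive, case-sensitive whitespace split
    return [counts[word] for word in vocab]


vector = bag_of_words("There used to be Stone Age", VOCAB)
print(vector)  # [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

The vector has one entry per vocabulary word regardless of sentence length, and word order is discarded, so a CNN fed this representation cannot see which words were adjacent.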

      July 30, 2021 10:29 PM IST