
Do neural networks handle different input sizes?

  • When working with image or text-based data, are there any neural network architectures that can handle input with different sizes? If not, what are the best ways to handle these kinds of data?
      January 9, 2021 4:29 PM IST
  • For images, we can crop or resize during preprocessing so that every image has the same size. A model whose first layer expects a predefined input size cannot be trained directly on arbitrarily sized images.
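
    As a rough illustration of the crop and resize preprocessing mentioned above, here is a minimal sketch in plain Python. The helper names `center_crop` and `resize_nearest` and the toy images are assumptions made for this example; in practice an image library would do this work.

```python
def center_crop(image, size):
    """Cut a size x size window out of the middle of a 2D image (list of rows)."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size is larger than the image")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def resize_nearest(image, new_height, new_width):
    """Resize a 2D image with nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // new_height][c * width // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


# Two toy grayscale "images" of different sizes (made-up pixel values).
small = [[r * 3 + c for c in range(3)] for r in range(3)]  # 3 x 3
large = [[r * 6 + c for c in range(6)] for r in range(5)]  # 5 x 6

# Upscale the small image and crop the large one to a common 4 x 4 shape.
batch = [resize_nearest(small, 4, 4), center_crop(large, 4)]
shapes = [(len(img), len(img[0])) for img in batch]
print(shapes)  # [(4, 4), (4, 4)]
```

    Cropping discards the borders while resizing distorts or blurs the content, so which one fits depends on where the useful information sits in the image.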
      August 3, 2021 10:15 PM IST
  • Three possibilities come to mind.

    The easiest is zero-padding. You pick a fairly large fixed input size and fill the remainder with zeros whenever a concrete input is shorter. Of course, this is pretty limited, and it is certainly not useful if your inputs range from a few words to full texts.
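
    A minimal sketch of zero-padding, assuming the text has already been turned into lists of integer token ids and that id 0 is reserved for padding. The function name `pad_sequences` and the sample ids are chosen for this example only.

```python
def pad_sequences(sequences, max_len, pad_value=0):
    """Right-pad each sequence with pad_value up to max_len.

    Sequences longer than max_len are truncated, which is one reason
    the approach breaks down when lengths vary widely.
    """
    padded = []
    for seq in sequences:
        clipped = list(seq[:max_len])
        padded.append(clipped + [pad_value] * (max_len - len(clipped)))
    return padded


# Token ids for three sentences of different lengths (made-up values).
batch = [[4, 8, 15], [16], [23, 42, 7, 9, 11, 3]]
padded = pad_sequences(batch, max_len=5)
for row in padded:
    print(row)
# [4, 8, 15, 0, 0]
# [16, 0, 0, 0, 0]
# [23, 42, 7, 9, 11]
```

    Note that the last sentence loses a token: with a fixed maximum length, long inputs are cut while short inputs are mostly zeros.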

    Recurrent NNs (RNNs) are a very natural choice if you have texts of varying size as input. You feed in the words as word vectors (or embeddings) one after another, and the internal state of the RNN is supposed to encode the meaning of the full string of words. This is one of the earlier papers.
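
    To make the "one word after another" idea concrete, here is a small sketch of a plain (Elman-style) recurrent cell in pure Python. The weights are random and untrained, and the names `make_rnn` and `rnn_encode` are assumptions for this example. The only point is that the final hidden state has the same size no matter how many word vectors go in.

```python
import math
import random


def make_rnn(input_size, hidden_size, seed=0):
    """Create small random weights for a plain recurrent cell (untrained)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": matrix(hidden_size, input_size),
        "W_rec": matrix(hidden_size, hidden_size),
        "bias": [0.0] * hidden_size,
    }


def rnn_encode(params, word_vectors):
    """Feed word vectors in one at a time and return the final hidden state."""
    hidden = [0.0] * len(params["bias"])
    for x in word_vectors:
        # New state depends on the current word and the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in_row, x))
                + sum(w * hi for w, hi in zip(w_rec_row, hidden))
                + b
            )
            for w_in_row, w_rec_row, b in zip(
                params["W_in"], params["W_rec"], params["bias"]
            )
        ]
    return hidden


params = make_rnn(input_size=3, hidden_size=4)
short_text = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.5]]  # 2 made-up word vectors
long_text = short_text * 10                       # 20 word vectors

print(len(rnn_encode(params, short_text)), len(rnn_encode(params, long_text)))  # 4 4
```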

    Another possibility is using recursive NNs. This is basically a form of preprocessing in which a text is recursively reduced to a smaller number of word vectors until only one is left: your input, which is supposed to encode the whole text. This makes a lot of sense from a linguistic point of view if your input consists of sentences (which can vary a lot in size), because sentences are structured recursively. For example, the word vector for "the man" should be similar to the word vector for "the man who mistook his wife for a hat", because noun phrases act like nouns, etc. Often, you can use linguistic information to guide the recursion on the sentence. If you want to go way beyond the Wikipedia article, this is probably a good start.
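
    The recursive reduction can be sketched roughly as follows, assuming the sentence already comes with a binary parse tree, written here as nested 2-tuples of words. The toy embeddings, the single shared matrix `W`, and the names `compose` and `encode` are all illustrative assumptions; nothing is trained, so the resulting vectors carry no real meaning.

```python
import math
import random

DIM = 3
rng = random.Random(0)

# Hypothetical toy word vectors; a real system would use trained embeddings.
EMBEDDINGS = {
    word: [rng.uniform(-1, 1) for _ in range(DIM)]
    for word in ["the", "man", "who", "slept"]
}

# One shared composition matrix of shape DIM x (2 * DIM), untrained here.
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]


def compose(left, right):
    """Merge two child vectors into one parent vector of the same size."""
    children = left + right  # concatenation of the two child vectors
    return [math.tanh(sum(w * c for w, c in zip(row, children))) for row in W]


def encode(tree):
    """Encode a binary parse tree given as nested 2-tuples of words."""
    if isinstance(tree, str):
        return EMBEDDINGS[tree]
    left, right = tree
    return compose(encode(left), encode(right))


short_phrase = ("the", "man")
long_phrase = (("the", "man"), ("who", "slept"))

print(len(encode(short_phrase)), len(encode(long_phrase)))  # 3 3
```

    Both phrases end up as a single vector of length `DIM`, which is what lets a downstream model with a fixed input size consume them.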

      December 18, 2021 11:42 AM IST
  • If the data consists of images, resizing all of them to a fixed shape is necessary when the model architecture expects a particular input shape.

    When dealing with text data, one option is zero-padding, i.e. adding zeros to the shorter inputs so that all the data has a fixed shape. Alternatively, we can create word embeddings using tools like word2vec, where words or sentences (irrespective of the length of the text) are mapped to vectors of real numbers.
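
    One simple way to get a fixed-size vector for a whole sentence from word embeddings is to average the word vectors. This is an assumption added for illustration, not something word2vec does on its own, and the tiny `WORD_VECTORS` table below is made up rather than produced by a trained model.

```python
# Hypothetical 3-dimensional word vectors; real ones would come from a
# trained model such as word2vec.
WORD_VECTORS = {
    "neural": [1.0, 0.0, 2.0],
    "networks": [0.0, 2.0, 2.0],
    "learn": [2.0, 1.0, 2.0],
}
DIM = 3


def sentence_vector(tokens):
    """Average the vectors of known words; unknown words are skipped."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


print(sentence_vector(["neural", "networks"]))           # [0.5, 1.0, 2.0]
print(sentence_vector(["neural", "networks", "learn"]))  # [1.0, 1.0, 2.0]
```

    Averaging ignores word order, so it trades some information for a representation whose size does not depend on the length of the text.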
      January 9, 2021 4:31 PM IST