Unable to allocate array with shape and data type

  • I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18 while not facing the same issue on MacOS.

    I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with

    np.zeros((156816, 36, 53806), dtype='uint8')

    and while I'm getting an error on Ubuntu OS

    >>> import numpy as np
    >>> np.zeros((156816, 36, 53806), dtype='uint8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    numpy.core._exceptions.MemoryError: Unable to allocate array with shape (156816, 36, 53806) and data type uint8

    I'm not getting it on MacOS:

    >>> import numpy as np 
    >>> np.zeros((156816, 36, 53806), dtype='uint8')
    array([[[0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            ...,
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0]],
    
           [[0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            ...,
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0]],
    
           [[0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            ...,
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0]],
    
           ...,
    
           [[0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            ...,
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0]],
    
           [[0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            ...,
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0]],
    
           [[0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            ...,
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0],
            [0, 0, 0, ..., 0, 0, 0]]], dtype=uint8)

    I've read somewhere that np.zeros shouldn't really allocate all the memory needed for the array up front, only the memory for the non-zero elements. What confuses me is that the Ubuntu machine has 64 GB of memory, while my MacBook Pro has only 16 GB.

    versions:

    Ubuntu
    os -> ubuntu mate 18
    python -> 3.6.8
    numpy -> 1.17.0
    
    mac
    os -> 10.14.6
    python -> 3.6.4
    numpy -> 1.17.0

    PS: also failed on Google Colab

      December 8, 2020 11:11 AM IST
  • This is likely due to your system's overcommit handling mode.

    In the default mode, 0,

    Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.

    The exact heuristic is not well explained here, but it is discussed in more detail under "Linux over commit heuristic" and on this page.

    You can check your current overcommit mode by running

    $ cat /proc/sys/vm/overcommit_memory
    0
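
The same check can be done from Python. This is only a minimal sketch: the helper name read_overcommit_mode and the OVERCOMMIT_MODES table are made up for illustration, and the mode descriptions are paraphrased. It assumes the standard /proc path shown above and returns None where that file does not exist (for example on macOS or Windows).

```python
from pathlib import Path

# Paraphrased meanings of the three kernel overcommit modes.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "never overcommit (strict accounting)",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return the overcommit mode as an int, or None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None  # not Linux, or /proc is not mounted
    return int(p.read_text().strip())


mode = read_overcommit_mode()
if mode is None:
    print("No overcommit setting found; this is probably not Linux.")
else:
    print(mode, "->", OVERCOMMIT_MODES.get(mode, "unknown mode"))
```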

    In this case you're allocating
    >>> 156816 * 36 * 53806 / 1024.0**3
    282.8939827680588

    ~283 GiB (about 304 GB), and the kernel is effectively saying "obviously there is no way I can commit that many physical pages to this", so it refuses the allocation.
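
To see how large a request is before making it, the size can be computed without allocating anything. The helper below is a sketch; required_bytes is a hypothetical name, not a NumPy function. It multiplies the dimensions as plain Python integers and uses np.dtype(...).itemsize for the element size.

```python
import numpy as np


def required_bytes(shape, dtype):
    """Bytes a dense array of this shape and dtype needs, without allocating it."""
    count = 1
    for dim in shape:
        count *= int(dim)  # Python ints, so the product cannot overflow
    return count * np.dtype(dtype).itemsize


nbytes = required_bytes((156816, 36, 53806), "uint8")
print(nbytes)                      # 303755101056
print(round(nbytes / 1024**3, 1))  # 282.9 (GiB)
print(round(nbytes / 1000**3, 1))  # 303.8 (GB)
```

Comparing this number with the installed RAM plus swap gives a rough idea of whether heuristic mode is likely to refuse the request.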

    If (as root) you run:

    $ echo 1 > /proc/sys/vm/overcommit_memory

    This will enable "always overcommit" mode, and you'll find that indeed the system will allow you to make the allocation no matter how large it is (within 64-bit memory addressing at least).

    I tested this myself on a machine with 32 GB of RAM. With overcommit mode 0 I also got a MemoryError, but after changing it to 1 it works:
    >>> import numpy as np
    >>> a = np.zeros((156816, 36, 53806), dtype='uint8')
    >>> a.nbytes
    303755101056

    You can then go ahead and write to any location within the array, and the system will only allocate physical pages when you explicitly write to that page. So you can use this, with care, for sparse arrays.

      December 15, 2020 11:43 AM IST
  • I had this same problem on Windows and came across this solution. If someone else runs into it on Windows, what worked for me was increasing the pagefile size, as it was a memory overcommitment problem for me too.

    Windows 8

    Press the Windows key + X, then click System in the popup menu
    Tap or click Advanced system settings. You might be asked for an admin password or to confirm your choice
    On the Advanced tab, under Performance, tap or click Settings
    Tap or click the Advanced tab, and then, under Virtual memory, tap or click Change
    Clear the Automatically manage paging file size for all drives check box
    Under Drive [Volume Label], tap or click the drive that contains the paging file you want to change
    Tap or click Custom size, enter a new size in megabytes in the Initial size (MB) or Maximum size (MB) box, tap or click Set, and then tap or click OK
    Reboot your system

    Windows 10

    Press the Windows key
    Type SystemPropertiesAdvanced
    Click Run as administrator
    Under Performance, click Settings
    Select the Advanced tab
    Select Change...
    Uncheck Automatically manage paging file size for all drives
    Then select Custom size and fill in the appropriate size
    Press Set, then press OK, then exit from the Virtual Memory, Performance Options, and System Properties dialogs
    Reboot your system

    Note: I did not have enough memory on my system for the ~282 GB in this example, but for my particular case this worked.

    EDIT

    Suggested recommendations for page file size, from here:

    There is a formula for calculating the correct pagefile size. Initial size is one and a half (1.5) x the amount of total system memory. Maximum size is three (3) x the initial size. So let's say you have 4 GB (1 GB = 1,024 MB x 4 = 4,096 MB) of memory. The initial size would be 1.5 x 4,096 = 6,144 MB and the maximum size would be 3 x 6,144 = 18,432 MB.
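
The arithmetic in that quote can be written out as a small function. This is just a sketch of the quoted rule of thumb (the function name is made up for illustration), not a recommendation from Windows itself.

```python
def quoted_pagefile_sizes_mb(ram_mb):
    """Apply the quoted rule of thumb: initial = 1.5 x RAM, maximum = 3 x initial."""
    initial = 1.5 * ram_mb
    maximum = 3 * initial
    return int(initial), int(maximum)


# 4 GB of RAM, expressed in MB as in the quote.
print(quoted_pagefile_sizes_mb(4 * 1024))  # (6144, 18432)
```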

    Some things to keep in mind from here:

    However, this does not take into consideration other important factors and system settings that may be unique to your computer. Again, let Windows choose what to use instead of relying on some arbitrary formula that worked on a different computer.

    Also:

    Increasing page file size may help prevent instabilities and crashing in Windows. However, hard drive read/write times are much slower than they would be if the data were in your computer's memory. Having a larger page file is going to add extra work for your hard drive, causing everything else to run slower. Page file size should only be increased when encountering out-of-memory errors, and only as a temporary fix. A better solution is to add more memory to the computer.
      August 7, 2021 1:32 PM IST
  • Changing the data type to one that uses less memory works. In my case, I changed the data type to numpy.uint8:

    data['label'] = data['label'].astype(np.uint8)
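
A rough illustration of the saving, using a plain NumPy array rather than the DataFrame column above. The labels array is invented for the example. The range check matters because astype does not raise on values outside 0..255; it silently wraps them.

```python
import numpy as np

# Hypothetical class labels 0..9 stored as 64-bit integers.
labels = np.arange(1_000_000, dtype=np.int64) % 10

# Only downcast when every value fits in the target type.
info = np.iinfo(np.uint8)
if labels.min() >= info.min and labels.max() <= info.max:
    small = labels.astype(np.uint8)
else:
    small = labels  # leave unchanged rather than corrupt the data

print(labels.nbytes, small.nbytes)  # 8000000 1000000
```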
    
      January 7, 2022 12:43 PM IST
  • I came across this problem on Windows too. The solution for me was to switch from a 32-bit to a 64-bit version of Python. A 32-bit program, like a 32-bit CPU, can address at most 4 GB of memory (2^32 bytes). So if you have more than 4 GB of RAM, a 32-bit version cannot take advantage of it.

    With a 64-bit version of Python (the one labeled x86-64 in the download page), the issue disappears.

    You can check which version you have by starting the interpreter and reading its banner. With a 64-bit version, I now get: Python 3.7.5rc1 (tags/v3.7.5rc1:4082f600a5, Oct 1 2019, 20:28:14) [MSC v.1916 64 bit (AMD64)], where [MSC v.1916 64 bit (AMD64)] means "64-bit Python".
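
The same check can also be done from inside a running interpreter, which helps when the banner is not visible (for example in a notebook). A minimal sketch using only the standard library:

```python
import struct
import sys

# Size of a C pointer in this interpreter, in bits: 32 or 64.
pointer_bits = struct.calcsize("P") * 8
print(pointer_bits)

# sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
is_64bit = sys.maxsize > 2**32
print(is_64bit)
```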

      August 10, 2021 4:20 PM IST