Bonus: See https://twitter.com/karpathy/status/1013322763790999552
reshape tries to return a view if possible; otherwise it copies the data to a contiguous tensor and returns a view on that copy. From the docs:

"Returns a tensor with the same data and number of elements as input, but with the specified shape. When possible, the returned tensor will be a view of input. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior. See torch.Tensor.view() on when it is possible to return a view. A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in input."

No, it should not be a bug regarding the gradient flow. However, as shown with the reshape vs. permute example, the wrong operator might of course cause problems in your training. E.g. if you would like to swap some axes of an image tensor from NCHW to NHWC, you should use permute.
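To make the NCHW-to-NHWC point concrete, here is a minimal pure-Python sketch (no PyTorch needed). It models a single 3-channel, 1x2 image as a flat row-major buffer. The helper names `reshape_row_major` and `chw_to_hwc` are made up for this illustration and are not PyTorch APIs. The idea is that reshaping only re-chunks the buffer, while permuting actually moves the channel axis to the end.

```python
# Toy illustration (not PyTorch code): why reshape cannot replace permute
# when converting a CHW image to HWC.

def reshape_row_major(flat, shape):
    """Re-chunk a flat list into nested lists of the given shape (row-major)."""
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [reshape_row_major(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]

def chw_to_hwc(img):
    """Move the channel axis last: out[h][w][c] = img[c][h][w]."""
    C, H, W = len(img), len(img[0]), len(img[0][0])
    return [[[img[c][h][w] for c in range(C)] for w in range(W)]
            for h in range(H)]

C, H, W = 3, 1, 2
# Label each value with its channel and column so mix-ups are visible.
flat = ["R0", "R1", "G0", "G1", "B0", "B1"]
chw = reshape_row_major(flat, (C, H, W))

wrong = reshape_row_major(flat, (H, W, C))  # analogous to reshape(H, W, C)
right = chw_to_hwc(chw)                     # analogous to permute(1, 2, 0)

print(wrong)  # [[['R0', 'R1', 'G0'], ['G1', 'B0', 'B1']]]
print(right)  # [[['R0', 'G0', 'B0'], ['R1', 'G1', 'B1']]]
```

Both results have shape (H, W, C), but only the permuted one keeps each pixel's R, G and B values together.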
import torch

x = torch.arange(4 * 10 * 2).view(4, 10, 2)
y = x.permute(2, 0, 1)

# view works on contiguous tensors
print(x.is_contiguous())  # True
print(x.view(-1))

# reshape also works on non-contiguous tensors (equivalent to contiguous() + view)
print(y.is_contiguous())  # False
try:
    print(y.view(-1))  # raises: y's strides are incompatible with a flat view
except RuntimeError as e:
    print(e)
print(y.reshape(-1))
print(y.contiguous().view(-1))
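If you are wondering why the permuted tensor is no longer contiguous, the following is a rough pure-Python model of strides. It is a simplification for illustration only, not how PyTorch implements things internally (for instance, it ignores the special handling of size-1 dimensions), and the function names are invented here.

```python
# Simplified model of tensor strides (an illustration, not PyTorch internals).

def contiguous_strides(shape):
    """Strides (in elements) of a row-major tensor with this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))

def permute(shape, strides, dims):
    """Permuting reorders shape and strides; the data itself is untouched."""
    return tuple(shape[d] for d in dims), tuple(strides[d] for d in dims)

def is_contiguous(shape, strides):
    """True if the strides match the row-major layout for this shape."""
    return strides == contiguous_strides(shape)

x_shape = (4, 10, 2)
x_strides = contiguous_strides(x_shape)
y_shape, y_strides = permute(x_shape, x_strides, (2, 0, 1))

print(x_strides, is_contiguous(x_shape, x_strides))  # (20, 2, 1) True
print(y_shape, y_strides)                            # (2, 4, 10) (1, 20, 2)
print(is_contiguous(y_shape, y_strides))             # False
```

In this model, flattening y would need one fixed step size that walks its elements in logical order, and strides (1, 20, 2) do not allow that. That is roughly why view(-1) raises, while reshape falls back to copying.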
permute is quite different from view and reshape:
# view vs. permute
x = torch.arange(2 * 4).view(2, 4)
print(x.view(4, 2))
# tensor([[0, 1],
#         [2, 3],
#         [4, 5],
#         [6, 7]])
print(x.permute(1, 0))
# tensor([[0, 4],
#         [1, 5],
#         [2, 6],
#         [3, 7]])
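The two results above have the same shape (4, 2) but different contents, because they read the same underlying buffer with different strides. Here is a small pure-Python sketch of that idea; `materialize` is a hypothetical helper written for this post, not a PyTorch function.

```python
# Toy model: a "tensor" is a flat buffer plus a shape and strides.

def materialize(data, shape, strides):
    """Read a 2-D view out of the flat buffer as nested lists."""
    rows, cols = shape
    return [[data[r * strides[0] + c * strides[1]] for c in range(cols)]
            for r in range(rows)]

data = list(range(8))  # the buffer behind x = torch.arange(8)
# x itself has shape (2, 4) and row-major strides (4, 1).

# x.view(4, 2): new shape with fresh row-major strides (2, 1)
viewed = materialize(data, (4, 2), (2, 1))
# x.permute(1, 0): shape and strides of x are swapped, giving strides (1, 4)
permuted = materialize(data, (4, 2), (1, 4))

print(viewed)    # [[0, 1], [2, 3], [4, 5], [6, 7]]
print(permuted)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Neither operation moves any data in this model; only the indexing rule changes.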
In [7]: import torch
In [8]: a = torch.tensor([[1,2],[3,4]])
In [9]: a
Out[9]:
tensor([[ 1, 2],
[ 3, 4]])
In [11]: a.permute(1,0)
Out[11]:
tensor([[ 1, 3],
[ 2, 4]])
In [12]: a.view(4,1)
Out[12]:
tensor([[ 1],
[ 2],
[ 3],
[ 4]])
tensor.permute() permutes the order of the axes of a tensor.

tensor.view() reshapes the tensor (analogous to numpy.reshape) by reducing/expanding the size of each dimension (if one increases, the others must decrease).

In [12]: aten = torch.tensor([[1, 2, 3], [4, 5, 6]])
In [13]: aten
Out[13]:
tensor([[ 1, 2, 3],
[ 4, 5, 6]])
In [14]: aten.shape
Out[14]: torch.Size([2, 3])
tensor.view() reshapes the tensor to a different but compatible shape, i.e. one with the same total number of elements. For example, our input tensor aten has the shape (2, 3). It can be viewed as a tensor of shape (6, 1), (1, 6), etc.:
# reshaping (or viewing) 2x3 matrix as a column vector of shape 6x1
In [15]: aten.view(6, -1)
Out[15]:
tensor([[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6]])
In [16]: aten.view(6, -1).shape
Out[16]: torch.Size([6, 1])
Alternatively, it can be reshaped or viewed as a row vector of shape (1, 6), as in:
In [19]: aten.view(-1, 6)
Out[19]: tensor([[ 1, 2, 3, 4, 5, 6]])
In [20]: aten.view(-1, 6).shape
Out[20]: torch.Size([1, 6])
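The -1 in the calls above asks for that dimension to be inferred from the total number of elements. As a rough sketch of the arithmetic (a hypothetical helper, not PyTorch's actual implementation), it could look like this:

```python
def infer_shape(numel, shape):
    """Resolve a single -1 entry so that the sizes multiply to numel."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = 1
    for size in shape:
        if size != -1:
            known *= size
    if -1 in shape:
        if known == 0 or numel % known != 0:
            raise ValueError(f"shape {shape} is invalid for {numel} elements")
        # The inferred size is whatever is left after dividing out the known sizes.
        return tuple(numel // known if size == -1 else size for size in shape)
    if known != numel:
        raise ValueError(f"shape {shape} is invalid for {numel} elements")
    return tuple(shape)

print(infer_shape(6, (6, -1)))  # (6, 1)
print(infer_shape(6, (-1, 6)))  # (1, 6)
print(infer_shape(6, (-1,)))    # (6,)
```

A request such as (4, -1) for 6 elements has no valid answer, so the sketch raises a ValueError in that case.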
tensor.permute(), by contrast, is only used to reorder the axes. The example below makes this clear:
In [39]: aten
Out[39]:
tensor([[ 1, 2, 3],
[ 4, 5, 6]])
In [40]: aten.shape
Out[40]: torch.Size([2, 3])
# swapping the axes/dimensions 0 and 1
In [41]: aten.permute(1, 0)
Out[41]:
tensor([[ 1, 4],
[ 2, 5],
[ 3, 6]])
# since we permute the axes/dims, the shape changed from (2, 3) => (3, 2)
In [42]: aten.permute(1, 0).shape
Out[42]: torch.Size([3, 2])
You can also use negative indexing to do the same thing, as in:
In [45]: aten.permute(-1, 0)
Out[45]:
tensor([[ 1, 4],
[ 2, 5],
[ 3, 6]])
In [46]: aten.permute(-1, 0).shape
Out[46]: torch.Size([3, 2])
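Negative axis indices count from the end, so for a 2-D tensor -1 refers to the same axis as 1. A minimal sketch of that mapping, using a made-up helper `normalize_dims` (an illustration only, not part of PyTorch):

```python
def normalize_dims(dims, ndim):
    """Map negative axis indices to their non-negative equivalents."""
    out = []
    for d in dims:
        if not -ndim <= d < ndim:
            raise IndexError(f"dimension {d} out of range for {ndim} axes")
        out.append(d % ndim)  # e.g. -1 becomes ndim - 1
    return tuple(out)

print(normalize_dims((-1, 0), 2))     # (1, 0)
print(normalize_dims((-1, 0, 1), 3))  # (2, 0, 1)
```

So permute(-1, 0) on a 2-D tensor names the same axes as permute(1, 0), which matches the identical outputs above.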