TL;DR
When you hit this kind of error, especially when the code normally runs fine and only breaks after switching datasets, the dataset itself is most likely at fault. Debug along that line of thinking.
Problem description
With num_workers set to any value greater than 0, the DataLoader raises the following error:
Traceback (most recent call last):
File "/home/username/distort/main.py", line 131, in <module>
model, perms, accs = train_model(dinfos, args.mid, args.pretrained, args.num_classes, args.treps, args.testep, args.test_dist, device, args.distort)
File "/home/username/distort/main.py", line 65, in train_model
for img, y in train_dataloader:
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1376, in _next_data
return self._process_data(data)
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1402, in _process_data
data.reraise()
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/_utils.py", line 461, in reraise
raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
data = fetcher.fetch(index)
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in default_collate
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in <listcomp>
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 140, in default_collate
out = elem.new(storage).resize_(len(batch), *list(elem.size()))
RuntimeError: Trying to resize storage that is not resizable
With num_workers set to 0, a different error appears instead:
Traceback (most recent call last):
File "/home/username/distort/main.py", line 130, in <module>
model, perms, accs = train_model(dinfos, args.mid, args.pretrained, args.num_classes, args.treps, args.testep, args.test_dist, device, args.distort)
File "/home/username/distort/main.py", line 64, in train_model
for img, y in train_dataloader:
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 721, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
return self.collate_fn(data)
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in default_collate
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 175, in <listcomp>
return [default_collate(samples) for samples in transposed] # Backwards compatibility.
File "/home/username/miniconda3/envs/round11/lib/python3.9/site-packages/torch/utils/data/_utils/collate.py", line 141, in default_collate
return torch.stack(batch, 0, out=out)
RuntimeError: stack expects each tensor to be equal size, but got [3, 64, 64] at entry 0 and [1, 64, 64] at entry 32
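Both errors share the same root cause: default_collate batches samples with torch.stack, which requires every tensor in the batch to have exactly the same shape. A minimal reproduction, independent of any dataset (the shapes mirror the ones in the traceback):

```python
import torch

# default_collate ultimately calls torch.stack on the samples in a batch.
rgb = torch.zeros(3, 64, 64)   # a normal 3-channel image tensor
gray = torch.zeros(1, 64, 64)  # a single-channel (grayscale) image tensor

try:
    torch.stack([rgb, gray], 0)
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size, ...
```

With num_workers > 0 the same mismatch surfaces earlier, inside the worker's shared-memory resize path, which is why the message changes to "Trying to resize storage that is not resizable".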
Debugging
The second error is fairly easy to track down. In the custom Dataset class's __getitem__() method, add a check: whenever the loaded tensor's shape[0] is 1, print the path of the source file it came from.
It turned out the dataset (I was using tiny-imagenet-200) really does contain single-channel images; the dataset itself was the culprit after all.
Solution
In __getitem__(), use the tensor's expand method: for any tensor with the wrong number of channels, call expand(3, -1, -1). After that, the dataset loads correctly whether num_workers is 0 or any positive number.
One more note: some blog posts claim num_workers must match the number of GPU cores, which makes no sense. The first traceback above shows the failure has nothing to do with the CUDA libraries, so it cannot be a GPU issue. At least with the usual way of loading datasets, num_workers simply sets the number of CPU worker processes the DataLoader spawns.