[数据归一化]均值和方差设置

PyTorch提供了函数torchvision.transforms.Normalize用于标准化图像数据取值,其计算公式如下

# torchvision.transforms.Normalize(mean, std, inplace=False)
output[channel] = (input[channel] - mean[channel]) / std[channel]

在实践过程中,发现有好几种均值和方差的推荐

ToTensor

Normalize通常配合ToTensor一起使用。ToTensor的功能是将PIL Image或者numpy.ndarray格式的数据转换成tensor格式,同时将取值范围[0, 255]转换成[0.0, 1.0]

Convert a PIL Image or numpy.ndarray to tensor.

Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0] if the PIL Image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1) or if the numpy.ndarray has dtype = np.uint8

In the other cases, tensors are returned without scaling.

transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))

参考:Understanding transform.Normalize( )

最常用操作,将数据缩放到[-1, 1]之间

transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225))

参考:

How to get these data(means = [0.485, 0.456, 0.406] stds = [0.229, 0.224, 0.225]) in your code? #6

Why Pytorch officially use mean=[0.485, 0.456, 0.406] and std=[0.229, 0.224, 0.225] to normalize images?

上述均值和标准差来源于ImageNet数据集,如果使用PyTorch提供的预训练模型,推荐该设置

自定义均值和标准差

对于特定的数据集,可以直接通过训练集计算。具体实现参考:

Normalization in the mnist example

特征缩放

参考About Normalization using pre-trained vgg16 networkspytorch 计算图像数据集的均值和标准差实现了数据集的均值和标准差的计算,以CIFAR100为例

import torch
from tqdm import tqdm
from torchvision.datasets.cifar import CIFAR100
from torch.utils.data import DataLoader
import torchvision.transforms as transforms


def main():
    transform = transforms.Compose(
        [
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
        ]
    )
    data_set = CIFAR100('./data/cifar', train=True, transform=transform, download=True)
    data_loader = DataLoader(data_set, batch_size=24, num_workers=8, shuffle=False)

    nb_samples = 0.
    channel_mean = torch.zeros(3)
    channel_std = torch.zeros(3)
    for images, targets in tqdm(data_loader):
        # scale image to be between 0 and 1
        N, C, H, W = images.shape[:4]
        data = images.view(N, C, -1)

        channel_mean += data.mean(2).sum(0)
        channel_std += data.std(2).sum(0)
        nb_samples += N

    channel_mean /= nb_samples
    channel_std /= nb_samples
    print(channel_mean, channel_std)


if __name__ == '__main__':
    main()
################ 输出
tensor([0.5071, 0.4865, 0.4409]) tensor([0.1942, 0.1918, 0.1958])

注意:不同图像转换操作会带来不一样标准差结果