PyTorch Study Notes

PyTorch

Two core features:

  • An n-dimensional Tensor, similar to numpy but can run on GPUs
  • Automatic differentiation for building and training neural networks

The basic computational unit in PyTorch is the tensor.
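
A minimal sketch of creating a tensor and moving it to the GPU (assuming a CUDA device is available; otherwise it stays on the CPU):

```python
import torch

# Create a 2x3 tensor filled with values from a standard normal distribution
x = torch.randn(2, 3)

# Move it to the GPU when one is available; otherwise keep it on the CPU
if torch.cuda.is_available():
    x = x.cuda()

print(x)          # values and device
print(x.size())   # torch.Size([2, 3])
```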

Autograd

  • The autograd package provides automatic differentiation for all operations on Tensors
  • In the computation graph, the nodes are tensors and the edges are functions
  • It is a define-by-run framework

1. autograd.Variable

  Variable is a thin wrapper around a Tensor object that also holds a reference to how it was created. This reference allows retracing the whole chain of operations that created the data. A Variable is a node in the computation graph.

  • Attributes

    • Variable's data attribute – wraps a tensor of any type
    • Variable's creator attribute – read-only; for leaf Variables, creator is None
    • Variable's grad attribute – holds the accumulated gradient
    • Variable's requires_grad attribute – This is especially useful when you want to freeze part of your model, or you know in advance that you're not going to use gradients w.r.t. some parameters. Can be changed only on leaf Variables. requires_grad=False means no gradient is computed for this variable
    • Variable's volatile attribute – Boolean indicating that the Variable should be used in inference mode, i.e. don't save the history
  • Parameters

    • data – Tensor to wrap
    • requires_grad
    • volatile
  • Methods

    • backward(gradient=None, retain_variables=False)
      Parameters: gradient (required when the output is not a scalar); retain_variables (do not free the buffers after the backward pass, so backward can be called again)
    • directed acyclic graph (DAG) consisting of Function objects as nodes, and references between them being the edges
    • .creator is the entry point into the directed acyclic graph; only Variables produced as the result of an operation have a creator attribute
    • by following the path from any Variable to the leaves, it is possible to reconstruct the sequence of operations that has created the data, and automatically compute the gradients.
    • what you run is what you differentiate

      • torch.autograd.backward(variables, grad_variables, retain_variables=False)
      1. The graph is differentiated using the chain rule
      2. If any of the variables are non-scalar (i.e. their data has more than one element) and require gradient, the function additionally requires specifying grad_variables. Calling backward() on a Variable directly only supports scalar outputs (see the sketch after this list)
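
A minimal sketch of the define-by-run behavior described above (written for newer PyTorch releases, where Variable has been merged into Tensor; in older versions, wrap the tensors in torch.autograd.Variable first):

```python
import torch

# Leaf tensors; gradients will be accumulated into .grad
x = torch.ones(2, 2, requires_grad=True)

# Every operation adds a node to the graph as it runs
y = x + 2
z = (y * y * 3).mean()   # z is a scalar, so backward() needs no gradient argument

# Differentiate the graph with the chain rule
z.backward()

print(x.grad)     # dz/dx, same shape as x
print(y.grad_fn)  # the function that created y (.creator in older releases)
```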

2. autograd.Function

  Records operation history and defines formulas for differentiating ops. Every operation performed on Variables creates a new Function object. The history is retained in the form of a DAG of Functions, with edges denoting data dependencies

PyTorch's automatic differentiation is similar to TensorFlow's: both define a computation graph and use automatic differentiation to compute gradients. The difference is that TensorFlow is static: the graph is defined once, and then data is fed through that same graph over and over to compute gradients. PyTorch instead defines a new computation graph on every forward pass. A nice property of static graphs is that the model can be optimized ahead of time.

  • Attributes
    • saved_tensors
    • needs_input_grad
    • num_inputs – Number of inputs given to forward()
    • num_outputs – Number of tensors returned by forward()
    • requires_grad
    • previous_functions
  • Methods (see the sketch below)
    • backward()
    • forward()
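
A minimal sketch of a custom Function that saves what it needs in forward() and applies the chain rule in backward(). It uses the static-method form of newer PyTorch releases; the attribute names listed above come from the older Variable-era API:

```python
import torch

# A custom Function: forward computes x^2, backward applies d(x^2)/dx = 2x
class Square(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)        # stored for use in backward()
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return 2 * x * grad_output      # gradient w.r.t. the single input

x = torch.randn(3, requires_grad=True)
y = Square.apply(x).sum()
y.backward()
print(torch.allclose(x.grad, 2 * x))    # True
```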

Neural Networks

A higher-level abstraction built on top of the raw computation graph.

1. Containers

  • nn.Module – the base class for neural network modules; models you build should be subclasses of it (see the sketch after this list)

    parameters() returns an iterator over module parameters.
  1. nn.Parameter
  2. class torch.nn.Sequential
    Modules will be added to it in the order they are passed in the constructor.
  3. class torch.nn.ModuleList
    Holds submodules in a list.
  4. class torch.nn.ParameterList
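
A small sketch of the two ways described above to build a model: subclassing nn.Module and using nn.Sequential (the layer sizes are arbitrary, for illustration only):

```python
import torch
import torch.nn as nn

# A model is a subclass of nn.Module; parameters() iterates over its weights
class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

# The same network written with nn.Sequential: modules are applied
# in the order they were passed to the constructor
seq = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

net = TinyNet()
print(sum(p.numel() for p in net.parameters()))  # total number of parameters
print(seq(torch.randn(1, 4)).size())             # torch.Size([1, 2])
```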

2. Convolution Layers

  • The torch.nn.Conv1d / Conv2d / Conv3d convolution layers
    Parameters, taking the 2d convolution as an example:
    the input size is (N, Cin, (rows, columns)), where N is the number of samples, Cin is the number of input channels, and rows and columns are the size of a single channel of a single sample, i.e. nSamples x nChannels x Height x Width
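
A short sketch of this input-shape convention (the channel counts and kernel size are arbitrary):

```python
import torch
import torch.nn as nn

# 2d convolution: 3 input channels -> 16 output channels, 3x3 kernel
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

# Input follows nSamples x nChannels x Height x Width
x = torch.randn(8, 3, 32, 32)
print(conv(x).size())   # torch.Size([8, 16, 30, 30]); 32 - 3 + 1 = 30
```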

3. Loss functions

  • torch.nn.L1Loss (a constructor argument controls whether the loss is averaged over the elements)

  • torch.nn.MSELoss

  • torch.nn.CrossEntropyLoss
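
A short sketch of how these losses are called (the shapes are illustrative):

```python
import torch
import torch.nn as nn

# CrossEntropyLoss takes raw scores of shape (N, C) and class indices of shape (N,)
criterion = nn.CrossEntropyLoss()
scores = torch.randn(4, 10)            # 4 samples, 10 classes
target = torch.tensor([1, 0, 4, 9])    # ground-truth class per sample
print(criterion(scores, target))       # scalar loss

# MSELoss and L1Loss compare two tensors of the same shape
print(nn.MSELoss()(torch.randn(3), torch.zeros(3)))
```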

4. Non-linear Activations

  • class torch.nn.ReLU

  1. class torch.nn.ELU

    alpha – the alpha value for the ELU formulation

  2. torch.nn.PReLU

  3. torch.nn.LeakyReLU

  4. nn.Sigmoid

  5. torch.nn.Softplus
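
A short sketch applying a few of these activations to the same tensor (the values are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.linspace(-2, 2, 5)

# Activations are modules; functional forms such as torch.relu and torch.sigmoid also exist
print(nn.ReLU()(x))           # negative values are clamped to 0
print(nn.ELU(alpha=1.0)(x))   # alpha controls the negative saturation value
print(nn.LeakyReLU(0.1)(x))   # small slope for negative inputs
print(nn.Sigmoid()(x))        # squashed into (0, 1)
```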

5. Dropout layers

6. Recurrent layers

7. Pooling Layers

8. Vision functions

  • The Upsample function is the opposite of downsampling and is similar in spirit to interpolation
    Conceptually, what matters for upsampling is the output size (or scaling factor) and the sampling method. These map directly onto PyTorch's parameters: size (or scale_factor) and mode (nearest | bilinear | trilinear. Default: nearest). The implementation details are standard image processing; when using Upsample you only need to choose the [scale] and the [sampling method]
    Because Upsample only looks at the pixels around each sampled point, it can introduce distortion or other artifacts; deconvolution (transposed convolution) is a good alternative
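
A short sketch comparing nn.Upsample with a transposed convolution of matching stride (the shapes are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 8, 8)   # nSamples x nChannels x Height x Width

# Upsample only needs the scale (or target size) and the sampling mode
up = nn.Upsample(scale_factor=2, mode='nearest')
print(up(x).size())           # torch.Size([1, 3, 16, 16])

# Transposed convolution ("deconvolution") as a learnable alternative
deconv = nn.ConvTranspose2d(3, 3, kernel_size=2, stride=2)
print(deconv(x).size())       # torch.Size([1, 3, 16, 16])
```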

Weight Updates

torch.optim abstracts optimization algorithms and provides implementations of the common ones, including more sophisticated optimizers like AdaGrad, RMSProp, Adam, etc.
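
A minimal sketch of the usual update loop with torch.optim (the model, data, and learning rate are placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 1)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x, target = torch.randn(16, 4), torch.randn(16, 1)

for step in range(100):
    optimizer.zero_grad()                  # clear gradients from the previous step
    loss = criterion(model(x), target)
    loss.backward()                        # compute gradients of the loss
    optimizer.step()                       # update the weights
```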

Data Conversion

First convert the data to a NumPy array, then convert it to a torch.Tensor.
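
A short sketch of the round trip; note that torch.from_numpy shares memory with the NumPy array rather than copying it:

```python
import numpy as np
import torch

a = np.array([[1.0, 2.0], [3.0, 4.0]])

t = torch.from_numpy(a)   # shares memory with the NumPy array
b = t.numpy()             # back to NumPy, also shared

t.add_(1)                 # the in-place change is visible in a and b as well
print(a)
```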