# Representation Learning

Typical structure: input → function → output

In theory: this is just matrix multiplication from linear algebra.
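Concretely (using the same B/D/E shape names as the einsum example below), one layer maps a batch of inputs $X$ to outputs $Y$ via a single matrix product:

$$
Y = XW, \qquad X \in \mathbb{R}^{B \times D},\quad W \in \mathbb{R}^{D \times E},\quad Y \in \mathbb{R}^{B \times E}
$$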


The basic type that carries data is the Tensor. So what is a Tensor?
```python
import torch

torch.tensor(3.14)              # shape = []           (0-dim scalar)
torch.tensor([1, 2, 3])         # shape = [3]          (vector)
torch.tensor([[1, 2], [3, 4]])  # shape = [2, 2]       (matrix)
x = torch.randn(2, 3, 4, 5)     # shape = [2, 3, 4, 5] (4-dim tensor)
```
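Beyond its shape, a tensor carries a dtype and a device; a minimal sketch for inspecting them (the output comments assume CPU defaults):

```python
import torch

x = torch.randn(2, 3, 4, 5)
print(x.shape)   # torch.Size([2, 3, 4, 5])
print(x.ndim)    # 4 -- number of dimensions
print(x.dtype)   # torch.float32 -- the default floating dtype
print(x.device)  # cpu (unless moved to a GPU with .to("cuda"))
```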
Matrix multiplication, three equivalent spellings:
```python
y = x @ w               # operator form
y = x.matmul(w)         # method form
y = torch.matmul(x, w)  # function form
```
Element-wise multiplication, three equivalent spellings:
```python
y = x * w            # operator form
y = x.mul(w)         # method form
y = torch.mul(x, w)  # function form
```
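The two operations differ in their shape semantics: `@` contracts the inner dimension, while `*` requires broadcast-compatible shapes. A minimal sketch contrasting them (shapes chosen for illustration):

```python
import torch

x = torch.randn(2, 3)
w = torch.randn(3, 4)

y = x @ w               # (2, 3) @ (3, 4) -> (2, 4): inner dim contracted
print(y.shape)          # torch.Size([2, 4])

z = x * x               # element-wise: same (or broadcastable) shapes
print(z.shape)          # torch.Size([2, 3])

v = x * torch.randn(3)  # broadcasting: (2, 3) * (3,) -> (2, 3)
print(v.shape)          # torch.Size([2, 3])
```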
Matrix multiplication as an einsum expression:
```python
import torch

# Ordinary matrix product: the shared index d is contracted
X = torch.randn(2, 3)      # B=2, D=3
W = torch.randn(3, 4)      # D=3, E=4
Y = torch.einsum("bd,de->be", X, W)
print(Y.shape)             # torch.Size([2, 4])

# Batched matrix product: b is carried along, k is contracted
A = torch.randn(10, 3, 4)  # batch=10
B = torch.randn(10, 4, 5)
Y = torch.einsum("bik,bkj->bij", A, B)
print(Y.shape)             # torch.Size([10, 3, 5])
```
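Both expressions can be checked against the built-in operators (a quick sanity check, not from the original slide):

```python
import torch

X = torch.randn(2, 3)
W = torch.randn(3, 4)
print(torch.allclose(torch.einsum("bd,de->be", X, W), X @ W))  # True

# The batched case matches torch.bmm
A = torch.randn(10, 3, 4)
B = torch.randn(10, 4, 5)
print(torch.allclose(torch.einsum("bik,bkj->bij", A, B), torch.bmm(A, B)))  # True
```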
Linear layer (torch.nn.Linear):
```python
torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)
```
In PyTorch, the inputs and outputs are all tensors.
```python
# Simplified excerpt from the nn.Linear source:
self.in_features = in_features
self.out_features = out_features
# Note: the weight is stored transposed, as (out_features, in_features)
self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))

def forward(self, input: Tensor) -> Tensor:
    return F.linear(input, self.weight, self.bias)
```
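`F.linear` computes `input @ weight.T + bias`; a minimal sketch reproducing it by hand, which also shows why the weight is stored as `(out_features, in_features)`:

```python
import torch

m = torch.nn.Linear(20, 30)
x = torch.randn(128, 20)

# Reproduce F.linear manually: y = x @ W^T + b
y_manual = x @ m.weight.T + m.bias
print(torch.allclose(m(x), y_manual, atol=1e-5))  # True
```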
Official example from the PyTorch docs:
```python
m = nn.Linear(20, 30)
input = torch.randn(128, 20)
output = m(input)
print(output.size())  # torch.Size([128, 30])
```
Linear layers compose, and they also accept inputs with extra leading dimensions:

```python
m1 = nn.Linear(20, 30)
m2 = nn.Linear(30, 40)

# Stacking two layers
x = torch.randn(128, 20)
y1 = m1(x)   # shape [128, 30]
y2 = m2(y1)  # shape [128, 40]

# nn.Linear transforms only the last dimension
x = torch.randn(128, 4096, 30, 20)
y = m1(x)    # shape [128, 4096, 30, 30]
y = m2(y)    # shape [128, 4096, 30, 40]
```
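That the affine map is applied to the last dimension only can be verified with einsum (a quick check; smaller shapes than the slide so it runs instantly):

```python
import torch

m1 = torch.nn.Linear(20, 30)
x = torch.randn(2, 5, 7, 20)

y = m1(x)
print(y.shape)  # torch.Size([2, 5, 7, 30]) -- only the last dimension changes

# Equivalent einsum over the last dimension
y_ein = torch.einsum("...d,ed->...e", x, m1.weight) + m1.bias
print(torch.allclose(y, y_ein, atol=1e-5))  # True
```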
---

# Matrix Multiplication on the GPU

- The GPU partitions the matrices into many small tiles (thread blocks)
- Each block shuttles data at high speed between global memory and shared memory
- Inside the CUDA kernel:
  - Each thread computes one element (or a small patch) of $Y$
  - Warps/wavefronts cooperate to raise throughput
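The blocking idea can be modeled in plain Python (a toy sketch of the decomposition, not actual CUDA; `tiled_matmul` and the tile size are illustrative):

```python
import torch

def tiled_matmul(X, W, tile=32):
    """Toy CPU model of GPU tiling: each (i, j) tile of Y plays the role of one thread block."""
    B, D = X.shape
    _, E = W.shape
    Y = torch.zeros(B, E)
    for i in range(0, B, tile):            # row tiles of Y
        for j in range(0, E, tile):        # column tiles of Y
            acc = torch.zeros(min(tile, B - i), min(tile, E - j))
            for k in range(0, D, tile):    # walk the contracted dim, like staging tiles in shared memory
                acc += X[i:i+tile, k:k+tile] @ W[k:k+tile, j:j+tile]
            Y[i:i+tile, j:j+tile] = acc
    return Y

X, W = torch.randn(100, 70), torch.randn(70, 90)
print(torch.allclose(tiled_matmul(X, W), X @ W, atol=1e-3))  # True
```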