| Sequence | Class |
|---|---|
| [1, 1, -1, -1, 1, -1] = (())() | 1 |
| [1, -1, 1, -1] = ()() | 1 |
| [1, -1, 1, 1, -1, 1, -1, -1] = ()(()()) | 1 |
| [1, 1, -1, -1, -1, 1, -1, 1] = (()))()( | 0 |
| [1, -1, -1, 1, 1, -1] = ())(() | 0 |
| [1, -1, -1, 1] = ())( | 0 |
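
Concretely, class 1 corresponds to well-balanced sequences: reading +1 as an opening and -1 as a closing parenthesis, no prefix sum may go negative and the total must be zero. A minimal sketch of this ground-truth test (the function is ours, purely for illustration):

```
def is_balanced(seq):
    # seq: list of +1 (opening) / -1 (closing); illustration only
    s = 0
    for v in seq:
        s += v
        if s < 0:        # a closing parenthesis with nothing left to close
            return False
    return s == 0        # every opening parenthesis was closed
```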
---
## A simple binary sequence classification problem
We make it a bit more complicated with colored parentheses, for example with 10 symbols: 5 colors of parentheses, each with an opening and a closing symbol.
Rule: an opening parenthesis $i \in [0, 4]$ is matched by the closing parenthesis $j \in [5, 9]$ such that $i + j = 9$.
| Sequence | Class |
|---|---|
| [2, 0, 9, 7, 0, 9] = (())() | 1 |
| [1, 8, 3, 6] = ()() | 1 |
| [0, 9, 2, 4, 5, 2, 7, 7] = ()(()()) | 1 |
| [0, 2, 7, 9, 7, 2, 7, 3] = (()))()( | 0 |
| [1, 8, 9, 0, 1, 9] = ())(() | 0 |
| [1, 8, 7, 1] = ())( | 0 |
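
Here the ground truth requires a stack, since a closing symbol must match the color of the most recent unmatched opening one. A minimal sketch of this rule (the function is ours, for illustration):

```
def is_matched(seq):
    # Symbols 0-4 open, 5-9 close; i matches j iff i + j == 9
    stack = []
    for s in seq:
        if s <= 4:
            stack.append(s)
        elif not stack or stack.pop() + s != 9:
            return False     # wrong color, or nothing left to close
    return not stack         # every opening symbol was matched
```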
---
# Elman network (1990)
Initial hidden state: $h_0 = 0$
Update:
$$
h\_t = \mathrm{ReLU}(W\_{xh} x\_t + W\_{hh} h\_{t-1} + b\_h)
$$
Final prediction:
$$
y\_T = W\_{hy} h\_T + b\_y.
$$
--
```
import torch
from torch import nn

class RecNet(nn.Module):
    def __init__(self, dim_input, dim_recurrent, dim_output):
        super(RecNet, self).__init__()
        self.fc_x2h = nn.Linear(dim_input, dim_recurrent)
        self.fc_h2h = nn.Linear(dim_recurrent, dim_recurrent, bias = False)
        self.fc_h2y = nn.Linear(dim_recurrent, dim_output)

    def forward(self, x):
        # Recurrent state, initialized to zero (h_0 = 0)
        h = x.new_zeros(1, self.fc_h2y.weight.size(1))
        for t in range(x.size(0)):
            h = torch.relu(self.fc_x2h(x[t, :]) + self.fc_h2h(h))
        # Read out the prediction from the final state h_T
        return self.fc_h2y(h)
```
---
# Training
We encode the symbol at time $t$ as a one-hot vector $x_t$.
To simplify the processing of variable-length sequences, we process the samples (i.e. sequences) one at a time.
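
A possible encoding helper, assuming the sequence is given as a list of symbol indices (this helper is our sketch; the actual generator code is not shown):

```
import torch
import torch.nn.functional as F

def encode_sequence(symbols, nb_symbol):
    # (T, nb_symbol) float tensor whose row t one-hot encodes symbols[t]
    return F.one_hot(torch.tensor(symbols), num_classes = nb_symbol).float()
```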
```
RNN = RecNet(dim_input = nb_symbol, dim_recurrent = 50, dim_output = 2)
cross_entropy = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(RNN.parameters(), lr = learning_rate)

for k in range(nb_train):
    # One sample at a time: x is a (T, nb_symbol) one-hot sequence,
    # l its class label
    x, l = generator.generate_input()
    y = RNN(x)
    loss = cross_entropy(y, l)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
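
The accuracy can then be estimated on fresh samples; a sketch, assuming the same generator interface (nb_test is a name we introduce):

```
# Sketch: nb_test is introduced here; generator API assumed as above
nb_correct = 0
with torch.no_grad():
    for k in range(nb_test):
        x, l = generator.generate_input()
        nb_correct += (RNN(x).argmax(dim = 1) == l).long().sum().item()
print(f"test accuracy: {nb_correct / nb_test:.3f}")
```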
---
## Results
.left-column[