As @SpiderRico reminded me, I got it from this link:
import torch.nn as nn

class LM(nn.Module):
    def __init__(self, n_vocab, seq_size, embedding_size, lstm_size, pretrained_embed):
        super(LM, self).__init__()
        self.seq_size = seq_size
        self.lstm_size = lstm_size
        # frozen pretrained embeddings
        self.embedding = nn.Embedding.from_pretrained(pretrained_embed, freeze=True)
        self.lstm = nn.LSTM(embedding_size, lstm_size, batch_first=True)
        self.fc = nn.Linear(lstm_size, n_vocab)

    def forward(self, x, prev_state):
        embed = self.embedding(x)
        output, state = self.lstm(embed, prev_state)
        logits = self.fc(output)
        return logits, state
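A minimal sketch of how this class might be instantiated and called (every size and the random pretrained_embed tensor below are made-up placeholders):

import torch

n_vocab, seq_size, embedding_size, lstm_size = 100, 32, 64, 128
pretrained_embed = torch.randn(n_vocab, embedding_size)  # stand-in for real embeddings
model = LM(n_vocab, seq_size, embedding_size, lstm_size, pretrained_embed)

batch_size = 4
# initial (hidden, cell) state: shape (num_layers, batch, lstm_size)
prev_state = (torch.zeros(1, batch_size, lstm_size),
              torch.zeros(1, batch_size, lstm_size))

x = torch.randint(n_vocab, (batch_size, seq_size))
logits, state = model(x, prev_state)
print(logits.shape)  # (batch_size, seq_size, n_vocab)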
Perplexity (PPL) is one of the most common metrics for evaluating language models. Before diving in, we should note that the metric applies specifically to classical language models (sometimes called autoregressive or causal language models) and is not well defined for masked language models like BERT (see summary of the models).
Perplexity is defined as the exponentiated average negative log-likelihood of a sequence. If we have a tokenized sequence $X = (x_0, x_1, \dots, x_t)$, then the perplexity of $X$ is

$$\text{PPL}(X) = \exp \left\{ -\frac{1}{t} \sum_i^t \log p_\theta(x_i \mid x_{<i}) \right\}$$
where $\log p_\theta(x_i \mid x_{<i})$ is the log-likelihood of the $i$-th token conditioned on the preceding tokens $x_{<i}$ according to our model. Intuitively, it can be thought of as an evaluation of the model’s ability to predict uniformly among the set of specified tokens in a corpus. Importantly, this means that the tokenization procedure has a direct impact on a model’s perplexity, which should always be taken into consideration when comparing different models.
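To make the formula concrete, here is a tiny worked example with made-up conditional probabilities:

import torch

# made-up values of p(x_i | x_<i) for a 4-token sequence
token_probs = torch.tensor([0.2, 0.5, 0.1, 0.4])

# PPL(X) = exp(-1/t * sum_i log p(x_i | x_<i))
ppl = torch.exp(-token_probs.log().mean())
print(ppl)  # ~3.98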
This is also equivalent to the exponentiation of the cross-entropy between the data and model predictions. For more intuition about perplexity and its relationship to Bits Per Character (BPC) and data compression, check out this fantastic blog post on The Gradient.
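The base of the logarithm only changes the units: cross-entropy measured in nats exponentiates with e, the same value converted to bits exponentiates with 2, and both give the same perplexity. A quick check:

import math

nats = 1.38                # cross-entropy per token, natural log
bits = nats / math.log(2)  # the same quantity in bits

print(math.exp(nats))  # perplexity from nats
print(2 ** bits)       # identical perplexity from bits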
If we weren’t limited by a model’s context size, we would evaluate the model’s perplexity by autoregressively factorizing a sequence and conditioning on the entire preceding subsequence at each step, as shown below.
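For instance, a minimal sketch of that fully-factorized evaluation, assuming the Hugging Face transformers library and GPT-2 purely as an example model (for a sequence short enough to fit in one pass, passing labels=input_ids makes the model return the average negative log-likelihood over all predicted tokens):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

text = "Perplexity measures how well a model predicts a sequence."
input_ids = tokenizer(text, return_tensors="pt").input_ids

# labels=input_ids shifts the targets internally, so loss is the mean
# negative log-likelihood of each token given its full preceding context
with torch.no_grad():
    loss = model(input_ids, labels=input_ids).loss

print(torch.exp(loss))  # perplexity with the entire context available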
When working with approximate models, however, we typically have a constraint on the number of tokens the model can process. The largest version of GPT-2, for example, has a fixed length of 1024 tokens, so we cannot calculate $p_\theta(x_t \mid x_{<t})$ directly when $t$ is greater than 1024.
Instead, the sequence is typically broken into subsequences equal to the model’s maximum input size. If a model’s max input size is $k$, we then approximate the likelihood of a token $x_t$ by conditioning only on the $k-1$ tokens that precede it rather than the entire context. When evaluating the model’s perplexity of a sequence, a tempting but suboptimal approach is to break the sequence into disjoint chunks and add up the decomposed log-likelihoods of each segment independently.
This is quick to compute since the perplexity of each segment can be computed in one forward pass, but serves as a poor approximation of the fully-factorized perplexity and will typically yield a higher (worse) PPL because the model will have less context at most of the prediction steps.
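A sketch of that chunked approximation, reusing the model and input_ids from the sketch above (max_len stands in for the model's maximum input size; note the first token of every chunk is predicted with no context at all):

max_len = 1024  # the model's maximum input size, k
nlls = []
n_tokens = 0

# break the sequence into disjoint chunks of at most max_len tokens
for start in range(0, input_ids.size(1), max_len):
    chunk = input_ids[:, start:start + max_len]
    with torch.no_grad():
        out = model(chunk, labels=chunk)
    n_predicted = chunk.size(1) - 1      # the first token of a chunk is never predicted
    nlls.append(out.loss * n_predicted)  # undo the per-chunk mean
    n_tokens += n_predicted

print(torch.exp(torch.stack(nlls).sum() / n_tokens))  # chunked (pessimistic) perplexity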
When using cross-entropy loss, you can simply apply the exponential function torch.exp() to your loss to calculate perplexity. (PyTorch's cross-entropy is computed with the natural logarithm, so torch.exp() is the matching inverse.)
So here is a small dummy example:
import torch
import torch.nn.functional as F
num_classes = 10
batch_size = 1
# your model outputs / logits
output = torch.rand(batch_size, num_classes)
# your targets
target = torch.randint(num_classes, (batch_size,))
# getting loss using cross entropy
loss = F.cross_entropy(output, target)
# calculating perplexity
perplexity = torch.exp(loss)
print('Loss:', loss, 'PP:', perplexity)
In my case the output is:
Loss: tensor(2.7935) PP: tensor(16.3376)
Just be aware that if you want the per-word perplexity, you need a per-word loss as well.
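For example, a sketch of a token-level loss with reduction='none' (the sizes and pad_id are made up; masking keeps padded positions out of the average):

import torch
import torch.nn.functional as F

vocab, seq_len, pad_id = 10, 6, 0
logits = torch.rand(seq_len, vocab)            # per-token logits for one sequence
targets = torch.randint(1, vocab, (seq_len,))  # made-up target tokens
targets[-2:] = pad_id                          # pretend the last two positions are padding

# reduction='none' keeps one loss value per token
per_token_loss = F.cross_entropy(logits, targets, reduction='none')

# average only over real tokens before exponentiating
mask = targets != pad_id
print(torch.exp(per_token_loss[mask].mean()))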
Here is a neat example of a language model that might be interesting to look at; it also computes the perplexity from the output:
https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/02-intermediate/language_model/main.py#L30-L50
import math
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
total_loss = 0.
...
for batch, i in enumerate(range(0, train_data.size(0) - 1, bptt)):
    ...
    loss = criterion(output.view(-1, ntokens), targets)
    loss.backward()
    total_loss += loss.item()
    log_interval = 200
    if batch % log_interval == 0 and batch > 0:
        cur_loss = total_loss / log_interval
        ...
        print('ppl {:8.2f}'.format(math.exp(cur_loss)))
        ...