Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to work through TorchLeet and build up to karpathy/nanoGPT or karpathy/nanochat. A summary of the first three weeks.

Week 1

  • Create a Linear Regression model. (A full sketch follows this list.)
    • torch.nn.Linear to define a learnable model.
    • forward() for the forward pass.
    • model.parameters() holds all the learnable weights and is also what gets passed to the optimizer (SGD, Adam, etc.).
    • Use torch.no_grad() during inference.
  • Log the training metrics to TensorBoard. (Not a part of the TorchLeet repo.)
    • SummaryWriter from torch.utils.tensorboard. TensorFlow can push all the relevant logs directly via a callback inside the fit function; SummaryWriter instead gives fine-grained control to log anything.
    • add_scalar() to log the training loss.
    • add_graph() to log the model graph itself.
    • Load the TensorBoard Jupyter extension so that we don’t have to leave the notebook to look at the logs and pretty plots.

        %load_ext tensorboard
      
    • Load the TensorBoard UI inside the notebook.

        %tensorboard --logdir PATH_TO_LOG_DIR
      
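Putting the Week 1 pieces together, here is a minimal end-to-end sketch. The toy data, learning rate, and epoch count are made up for illustration:

    import torch
    from torch.utils.tensorboard import SummaryWriter

    # Toy data for y = 3x + 2 with a little noise.
    X = torch.randn(100, 1)
    y = 3 * X + 2 + 0.1 * torch.randn(100, 1)

    model = torch.nn.Linear(1, 1)   # one learnable weight, one bias
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    writer = SummaryWriter()        # writes to ./runs/ by default
    writer.add_graph(model, X)      # log the model graph itself

    for epoch in range(100):
        optimizer.zero_grad()
        y_pred = model(X)           # forward pass (calls model.forward)
        loss = criterion(y_pred, y)
        loss.backward()             # compute gradients
        optimizer.step()            # update model.parameters()
        writer.add_scalar("Loss/train", loss.item(), epoch)

    writer.close()

    # Inference: no gradient tracking needed.
    with torch.no_grad():
        print(model(torch.tensor([[4.0]])))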

Week 2

  • Create a Dataset. (A full sketch follows this list.)
    • Dataset class from torch.utils.data.
    • Created a subclass of Dataset for my specific dataset; added data, X, and y attributes to the class.
    • Since we will iterate through the rows of this dataset, defined the __len__ and __getitem__ methods. These enable code like len(dataset) and dataset[i], respectively.
  • DataLoader
    • Dataset only defines the data; DataLoader from torch.utils.data creates an iterator over it. It also adds capabilities like batching and shuffling, e.g.:

        dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
      
    • We can now run a for loop over this dataloader.

  • Good intro to the topic from PyTorch - Datasets and DataLoaders.
  • Trained the Linear Regression model using the dataloader. Faced some issues due to dtype mismatches; used torch.float32 everywhere to fix them.
  • The exercise only asked for a single-column dataset. Played around with a dataset with multiple columns.
  • Used TensorBoard for all the logging.
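
A minimal sketch of this week's Dataset/DataLoader workflow. The class name, toy data, and hyperparameters are my own, for illustration only:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class RegressionDataset(Dataset):
        def __init__(self, X, y):
            # torch.float32 everywhere avoids the dtype mismatches mentioned above.
            self.X = X.to(torch.float32)
            self.y = y.to(torch.float32)

        def __len__(self):
            return len(self.X)               # enables len(dataset)

        def __getitem__(self, idx):
            return self.X[idx], self.y[idx]  # enables dataset[i]

    # Toy multi-column data: 3 feature columns, 1 target column.
    X = torch.randn(100, 3)
    y = X @ torch.tensor([[1.0], [2.0], [3.0]]) + 0.5

    dataloader = DataLoader(RegressionDataset(X, y), batch_size=16, shuffle=True)

    model = torch.nn.Linear(3, 1)
    criterion = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(20):
        for X_batch, y_batch in dataloader:  # batched, shuffled iteration
            optimizer.zero_grad()
            loss = criterion(model(X_batch), y_batch)
            loss.backward()
            optimizer.step()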

Week 3

  • Two types of activation functions: with learnable parameters and without. (A sketch of both follows this list.)
  • An activation function with learnable parameters requires an nn.Module subclass, so that the parameters are registered (as nn.Parameter), autograd can compute their gradients through the forward and backward passes, and the optimizer can produce the final trained weights.
  • Created a custom activation without learnable parameters: \(\tanh(x) + x\).
  • Updated the Linear Regression class to have the final output go through \(\tanh(x) + x\), using torch.tanh().

      return self.custom_activation(self.linear(x))
    
  • This SO answer talks about how to write a custom activation function in different scenarios: non-learnable, learnable, learnable with PyTorch functions, and learnable without PyTorch functions.
  • Also learned about torch.nn.Parameter and the (now deprecated) torch.autograd.Variable.
  • Kept using TensorBoard for all the logging.
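
A sketch of both activation variants. The learnable version (a trainable scale on the tanh term) is my own illustration, not part of the exercise:

    import torch
    import torch.nn as nn

    def custom_activation(x):
        # Non-learnable custom activation: tanh(x) + x.
        return torch.tanh(x) + x

    class LearnableActivation(nn.Module):
        # Hypothetical learnable variant: a * tanh(x) + x, where the
        # scale `a` is trained along with the rest of the model.
        def __init__(self):
            super().__init__()
            # nn.Parameter registers the tensor so it appears in
            # model.parameters() and receives gradients from autograd.
            self.a = nn.Parameter(torch.tensor(1.0))

        def forward(self, x):
            return self.a * torch.tanh(x) + x

    class LinearRegression(nn.Module):
        def __init__(self, in_features):
            super().__init__()
            self.linear = nn.Linear(in_features, 1)
            # Swap in LearnableActivation() here to train `a` as well.
            self.custom_activation = custom_activation

        def forward(self, x):
            return self.custom_activation(self.linear(x))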


Next two weeks: Custom Loss Function (Huber Loss) and Deep Neural Network.