<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://trigonaminima.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://trigonaminima.github.io/" rel="alternate" type="text/html" /><updated>2026-03-08T11:29:02+00:00</updated><id>https://trigonaminima.github.io/feed.xml</id><title type="html">Playground</title><subtitle></subtitle><author><name>Shivam Rana</name></author><entry><title type="html">Move US Stocks from INDMoney to Interactive Brokers</title><link href="https://trigonaminima.github.io/2026/03/indmoney-to-ibkr/" rel="alternate" type="text/html" title="Move US Stocks from INDMoney to Interactive Brokers" /><published>2026-03-08T00:00:00+00:00</published><updated>2026-03-08T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2026/03/indmoney-to-ibkr</id><content type="html" xml:base="https://trigonaminima.github.io/2026/03/indmoney-to-ibkr/"><![CDATA[<p>I wanted to move my US stocks (securities) from INDMoney to Interactive Brokers. I didn’t find any clear steps documented anywhere. It took me multiple attempts to finally be able to move over my portfolio.</p>

<p>This post is divided into two parts: steps followed and FAQs.</p>

<h2 id="steps">Steps</h2>

<p>Absolute Pre-requisites</p>

<ol>
  <li>Have an active account on IBKR. That means the account is verified and ready to buy stocks.</li>
  <li>Make your fractional shares whole on INDMoney. That means either reducing 1.6 units to 1 unit or topping up to 2 units.</li>
  <li>Keep at least <em>USD 70</em> in the INDMoney wallet.</li>
</ol>

<p>Now, here are the steps to transfer:</p>

<ol>
  <li>Log into IBKR.</li>
  <li>In the menu, go to <code class="language-plaintext highlighter-rouge">Transfers (Deposit, Withdraw and Transfer History)</code> (or whatever they are calling it now).</li>
  <li>Go to <code class="language-plaintext highlighter-rouge">Transfer Positions</code>.</li>
  <li>Select <code class="language-plaintext highlighter-rouge">Incoming</code>.</li>
  <li>Select the region as <code class="language-plaintext highlighter-rouge">United States</code>, since DriveWealth, the broker used by INDMoney, is based in the US.</li>
  <li>Select <code class="language-plaintext highlighter-rouge">ACATS</code> as the transfer method (usually the first option and the most convenient).</li>
  <li>Choose <code class="language-plaintext highlighter-rouge">DriveWealth</code> in the broker dropdown.</li>
  <li>My account number looked like: <code class="language-plaintext highlighter-rouge">IFSC-XXX-&lt;Account number on the INDMoney app&gt;</code> (17 characters). The account number is the crucial part; one of my requests failed because of it. INDMoney doesn’t make it easy to find your account number. DON’T use the one shown on the app. Follow: <code class="language-plaintext highlighter-rouge">US Stocks Tab</code> → <code class="language-plaintext highlighter-rouge">Manage</code> → <code class="language-plaintext highlighter-rouge">US Stocks Reports</code> → <code class="language-plaintext highlighter-rouge">US Trades Report</code> to find your account number.</li>
  <li>Account Title and Tax Identification Number are pre-selected. Choose your Account Type. Mine was <code class="language-plaintext highlighter-rouge">Individual</code>.</li>
  <li>Decide between a <code class="language-plaintext highlighter-rouge">FULL ACATS</code> and a <code class="language-plaintext highlighter-rouge">PARTIAL ACATS</code> transfer. I did FULL ACATS. You can’t have any fractional shares for a full transfer. Refer to the FAQs for more details.</li>
  <li>Give IBKR authorisation to take appropriate actions when the positions you are transferring are not products IBKR supports. IBKR details what you are authorising them to do. I selected <code class="language-plaintext highlighter-rouge">Yes</code> for everything.</li>
  <li>Verify your details. Sign and submit.</li>
</ol>

<p>IBKR will validate and confirm the request, submit it to DriveWealth, and receive the stocks. You will get an email from IBKR about the completion or failure.</p>

<h2 id="faqs">FAQs</h2>

<p><strong>Q:</strong> What are the tax implications of the transfer process?<br />
<strong>A:</strong> My research told me that transferring stocks incurs <em>no capital gains tax</em>, as it’s an in-kind broker-to-broker transfer, not a sale. Verify with your CA. Regardless, maintain your transaction records from the earlier platform for tax filing purposes.</p>

<p><strong>Q:</strong> Who is INDMoney’s broker?<br />
<strong>A:</strong> DriveWealth</p>

<p><strong>Q:</strong> Which region should I select for the transfer?<br />
<strong>A:</strong> DriveWealth is a US broker, so select United States of America.</p>

<p><strong>Q:</strong> What is incoming ACATS and outgoing ACATS?<br />
<strong>A:</strong> Since I wanted to get the securities out of DriveWealth, it is going to be an <em>outgoing ACATS</em> for DriveWealth and <em>incoming ACATS</em> for Interactive Brokers.</p>

<p><strong>Q:</strong> Who initiates the ACATS (Automated Customer Account Transfer Service) transfer request?<br />
<strong>A:</strong> Always the receiving broker. So, there is no need to interact with INDMoney unless there is some snag in the process and you need help.</p>

<p><strong>Q:</strong> Is DriveWealth (INDMoney’s underlying broker) ACATS transfer enabled?<br />
<strong>A:</strong> Yes, I confirmed this with INDMoney.</p>

<p><strong>Q:</strong> Does DriveWealth charge a fee for the (outbound) transfer?<br />
<strong>A:</strong> Yes, I was charged <em>USD 65</em>. I confirmed this with an INDMoney customer support request. Their explanation:</p>

<blockquote>
  <p>For outgoing ACATS transfers, a fee is typically applied by your U.S. broker. This charge is levied by the broker currently holding your assets to facilitate the transfer out of their system.</p>
</blockquote>

<p><strong>Q:</strong> How do I pay for the DriveWealth fee?<br />
<strong>A:</strong> Keep at least USD 70 in the INDMoney wallet.</p>

<p><strong>Q:</strong> Does IBKR charge a fee for the (inbound) transfer?<br />
<strong>A:</strong> I was <em>not</em> charged anything.</p>

<p><strong>Q:</strong> Is there a penalty for transfer failures due to any errors?<br />
<strong>A:</strong> I was <em>not</em> charged anything. I transferred successfully on my third attempt.</p>

<p><strong>Q:</strong> How long does it take to complete the transfer after submission?<br />
<strong>A:</strong> It took 4 working days from submission to completion for me. Under standard conditions, ACATS transfers to IBKR complete in about 4–8 business days after submission.</p>

<blockquote>
  <p>Gemini says that some industry sources note 3–5 business days in smooth cases. </p>
</blockquote>

<p><strong>Q:</strong> Can I trade on INDMoney during my transfer process?<br />
<strong>A:</strong> I avoided it after I raised my transfer request. INDMoney’s response:</p>

<blockquote>
  <p>During the outgoing ACATS transfer, trading activity (including deposits, buys, and sells) will be temporarily restricted to ensure the process completes without errors.</p>
</blockquote>

<p><strong>Q:</strong> Can I trade on IBKR during my transfer process?<br />
<strong>A:</strong> Yes.</p>

<p><strong>Q:</strong> How does it work if the residency status has changed - INDMoney app having residency A and IBKR having residency B?<br />
<strong>A:</strong> Many people on this <a href="https://www.reddit.com/r/INDmoneyApp/comments/1o1wxdj/how_to_handle_us_stock_investments_after_becoming/">reddit thread</a> seemed to think it would work. I tried, and it worked for me.</p>

<p><strong>Q:</strong> My address is different on INDMoney and IBKR. Will it work?<br />
<strong>A:</strong> It worked for me. According to Gemini, the ACATS system relies on an <strong>exact match</strong> of key identifying information between the delivering account (INDmoney’s partner broker) and the receiving account (IBKR) to prevent unauthorized transfers. The key matching criteria are:</p>

<ul>
  <li>Account Title/Registration: Your name must be identical on both accounts.</li>
  <li>Account Type: (e.g., Individual, Joint, etc.) must match.</li>
  <li>Tax ID: (e.g., Social Security Number, or other tax identification used) must match.</li>
</ul>

<p>I supplied all three, so I guess that’s enough. I think it would have mattered if my country of residence had been the US, because then my Tax ID would have changed.</p>

<p><strong>Q:</strong> How do I track my transfer?<br />
<strong>A:</strong> INDMoney doesn’t tell you anything about the transfer or its status. You will not even get a notification/email after the transfer is complete. The only place to track the transfer is on IBKR: follow the same steps that you followed to initiate the transfer, and IBKR will guide you to the status window.</p>

<p><strong>Q:</strong> How are fractional shares handled?<br />
<strong>A:</strong> This took a lot of research. Everyone online mentioned that you can’t transfer fractional shares. INDMoney also confirmed this:</p>

<blockquote>
  <p>Please note that fractional shares cannot be transferred. These may be liquidated as part of the transfer process.</p>
</blockquote>

<p>So this was clear. What was not clear was: do I need to make them whole, or will it only transfer the complete units and leave the fractional units in INDMoney? The latter part of INDMoney’s response gave the impression that it would handle the liquidation on its own. This assumption was incorrect. My request was rejected because of fractional shares.</p>

<p>So, as I mentioned in the pre-requisites, make your fractional shares whole or exit entirely. That is, turn 3.55 units of NVDA into 3 or 4 or 0 units of NVDA.</p>

<p><strong>Q:</strong> Can I use partial transfer to avoid selling my fractional shares?<br />
<strong>A:</strong> I was always aiming for a full transfer. When the request was rejected due to fractional shares, I contemplated using a partial transfer to avoid exiting. I was avoiding selling for two reasons:</p>

<ul>
  <li>I wasn’t sure if selling would lead to any tax implications;</li>
  <li>Some of my stocks were bought at very attractive prices. I wanted to protect my average price.</li>
</ul>

<p>After some thought, I decided to make them whole. Exited some of the fractional shares and bought some of the others (when there was a dip).</p>

<p><strong>Q:</strong> Does the ACATS transfer also take care of transferring my buy/sell history, average price, and other associated history?<br />
<strong>A:</strong> I’m not sure what is supposed to be transferred. All I got was my average price. All the past buy/sell history was gone, and with it my CAGR-type metrics.</p>

<p><strong>Q:</strong> What resources did you refer to understand the process?<br />
<strong>A:</strong> Here are some links:</p>

<ul>
  <li><a href="https://paasa.com/blog/transfer-indmoney-to-paasa">Paasa Blog</a></li>
  <li><a href="https://support.vestedfinance.com/portal/en/kb/articles/how-do-i-migrate-my-us-brokerage-account-from-indmoney-stockal-to-vested-31-1-2024">Vested support</a></li>
  <li><a href="https://groww.in/blog/how-to-transfer-us-stocks-from-groww-to-external-platform">Groww blog</a></li>
  <li><a href="https://x.com/thetrickytrade/status/1749719803777147123">X/Twitter: 🚛 How to transfer your US Holdings out of Groww?</a></li>
  <li><a href="https://www.reddit.com/r/INDmoneyApp/comments/1meyz4u/has_anyone_successfully_transferred_their/">Reddit Question 1</a></li>
  <li><a href="https://www.reddit.com/r/INDmoneyApp/comments/1o1wxdj/how_to_handle_us_stock_investments_after_becoming/">Reddit discussion: how to handle INDMoney US investments after becoming NRI</a></li>
</ul>

<p><br /></p>]]></content><author><name>Shivam Rana</name></author><category term="Investing" /><summary type="html"><![CDATA[I wanted to move my US stocks (securities) from INDMoney to Interactive Brokers. I didn’t find any clear steps documented anywhere. It took me multiple attempts to finally be able to move over my portfolio.]]></summary></entry><entry><title type="html">PyTorch Fundamentals - Week 5</title><link href="https://trigonaminima.github.io/2025/11/pytorch-fundamentals3/" rel="alternate" type="text/html" title="PyTorch Fundamentals - Week 5" /><published>2025-11-29T00:00:00+00:00</published><updated>2025-11-29T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/11/pytorch-fundamentals3</id><content type="html" xml:base="https://trigonaminima.github.io/2025/11/pytorch-fundamentals3/"><![CDATA[<p>Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to follow along <a href="https://github.com/Exorust/TorchLeet">TorchLeet</a> and go up to <a href="https://github.com/karpathy/nanoGPT">karpathy/nanoGPT</a> or <a href="https://github.com/karpathy/nanochat">karpathy/nanochat</a>. Previously,</p>

<ol>
  <li><a href="/2025/11/pytorch-fundamentals2/">PyTorch Fundamentals - Week 4</a></li>
  <li><a href="/2025/11/pytorch-fundamentals/">PyTorch Fundamentals - Week 1, 2, &amp; 3</a></li>
</ol>

<p>Now, a summary of week 5.</p>

<ul>
  <li>Linear Regression through a custom DNN with a non-linearity
    <ul>
      <li>An <code class="language-plaintext highlighter-rouge">nn.Module</code> subclass that used an <code class="language-plaintext highlighter-rouge">nn.Sequential</code> for the dense linear layers.
        <div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="n">dense</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
      <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">input_dim</span><span class="p">,</span> <span class="n">dense_dims</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span>
      <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
      <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="o">*</span><span class="n">dense_dims</span><span class="p">),</span>
      <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
      <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">dense_dims</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">output_dim</span><span class="p">),</span>
  <span class="p">)</span>
</code></pre></div>        </div>
      </li>
    </ul>
  </li>
  <li>Learnt about <a href="https://docs.pytorch.org/docs/stable/generated/torch.nn.Sequential.html"><code class="language-plaintext highlighter-rouge">nn.Sequential</code></a> vs <code class="language-plaintext highlighter-rouge">nn.Module</code>.
    <ul>
      <li><code class="language-plaintext highlighter-rouge">nn.Sequential</code> is a subclass of <code class="language-plaintext highlighter-rouge">nn.Module</code>.</li>
      <li>It’s a convenient way to chain functions.</li>
      <li>Think: replacing <code class="language-plaintext highlighter-rouge">self.l3(self.l2(self.l1(x)))</code> with <code class="language-plaintext highlighter-rouge">self.layer123(x)</code> where <code class="language-plaintext highlighter-rouge">layer123</code> is an <code class="language-plaintext highlighter-rouge">nn.Sequential</code> of <code class="language-plaintext highlighter-rouge">l1</code>, <code class="language-plaintext highlighter-rouge">l2</code>, and <code class="language-plaintext highlighter-rouge">l3</code>.</li>
      <li>Also read: <a href="https://stackoverflow.com/q/68606661/2650427">What is difference between nn.Module and nn.Sequential</a></li>
    </ul>
  </li>
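  <li>To make the equivalence concrete, here is a minimal sketch (layer sizes and the class name are my own, not from the exercise) showing the same stack written both ways:
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
from torch import nn

# The same two-layer MLP, written both ways.
seq = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

class Explicit(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(4, 8)
        self.l2 = nn.ReLU()
        self.l3 = nn.Linear(8, 1)

    def forward(self, x):
        # Layers applied in the same order as seq(x).
        return self.l3(self.l2(self.l1(x)))

x = torch.randn(2, 4)
print(seq(x).shape, Explicit()(x).shape)  # both: torch.Size([2, 1])
</code></pre></div>    </div>
  </li>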
  <li>Tried three different dense-layer hidden-unit configs with each of the below loss functions:
    <ol>
      <li><code class="language-plaintext highlighter-rouge">nn.MSELoss()</code></li>
      <li><code class="language-plaintext highlighter-rouge">HuberLoss()</code> - created <a href="/2025/11/pytorch-fundamentals2/">last week</a></li>
      <li><code class="language-plaintext highlighter-rouge">nn.L1Loss()</code></li>
    </ol>

    <p>Dense layer hidden unit variants: <code class="language-plaintext highlighter-rouge">(3, 7)</code>, <code class="language-plaintext highlighter-rouge">(4, 8)</code>, <code class="language-plaintext highlighter-rouge">(5, 9)</code>.</p>
  </li>
  <li>Huber Loss blew MSE Loss and L1 Loss out of the water. (Figure from TensorBoard.)
    <figure class="image">
  <img src="https://trigonaminima.github.io/assets/2025-11/dnn_loss.png" alt="" style="text-align: center; margin: auto" width="300" />
  <!-- <figcaption style="text-align: center">Figure 1:</figcaption> -->
  </figure>
  </li>
  <li>TensorBoard at work
    <ul>
      <li>I was running a remote training pipeline with TensorBoard logging. This pipeline dumps the logs on S3.</li>
      <li>The pipeline was failing with the following error: <code class="language-plaintext highlighter-rouge">File system scheme 's3' not implemented</code>.</li>
      <li>Solution:
        <ul>
          <li>After multiple loops of [add + commit + push + job run] found the <a href="https://stackoverflow.com/a/71628326/2650427">solution</a>: <code class="language-plaintext highlighter-rouge">pip install tensorflow-io</code>. That alone is not the full solution.</li>
          <li>The <a href="https://github.com/tensorflow/tensorboard/issues/5480#issuecomment-2251363802">very last comment</a> on the <a href="https://github.com/tensorflow/tensorboard/issues/5480">tensorboard github issue</a> gives the final trick: Add <code class="language-plaintext highlighter-rouge">import tensorflow_io</code> to the pipeline even if not using it anywhere.</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
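<p>The <code class="language-plaintext highlighter-rouge">nn.Module</code> wrapper from the first bullet might look like this. A minimal sketch; the class name and default dims (the <code class="language-plaintext highlighter-rouge">(3, 7)</code> variant) are my own choices, not from the exercise.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
from torch import nn

class DenseRegressor(nn.Module):  # hypothetical name
    def __init__(self, input_dim=1, dense_dims=(3, 7), output_dim=1):
        super().__init__()
        self.dense = nn.Sequential(
            nn.Linear(input_dim, dense_dims[0]),
            nn.ReLU(),
            nn.Linear(*dense_dims),
            nn.ReLU(),
            nn.Linear(dense_dims[1], output_dim),
        )

    def forward(self, x):
        return self.dense(x)

model = DenseRegressor()
print(model(torch.randn(5, 1)).shape)  # torch.Size([5, 1])
</code></pre></div></div>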

<p><br /></p>]]></content><author><name>Shivam Rana</name></author><category term="DL" /><summary type="html"><![CDATA[Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to follow along TorchLeet and go up to karpathy/nanoGPT or karpathy/nanochat. Previously,]]></summary></entry><entry><title type="html">PyTorch Fundamentals - Week 4</title><link href="https://trigonaminima.github.io/2025/11/pytorch-fundamentals2/" rel="alternate" type="text/html" title="PyTorch Fundamentals - Week 4" /><published>2025-11-22T00:00:00+00:00</published><updated>2025-11-22T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/11/pytorch-fundamentals2</id><content type="html" xml:base="https://trigonaminima.github.io/2025/11/pytorch-fundamentals2/"><![CDATA[<p>Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to follow along <a href="https://github.com/Exorust/TorchLeet">TorchLeet</a> and go up to <a href="https://github.com/karpathy/nanoGPT">karpathy/nanoGPT</a> or <a href="https://github.com/karpathy/nanochat">karpathy/nanochat</a>. Previously,</p>

<ol>
  <li><a href="/2025/11/pytorch-fundamentals/">PyTorch Fundamentals - Week 1, 2, &amp; 3</a></li>
</ol>

<p>Now, a summary of week 4.</p>

<ul>
  <li>Custom Loss function: Huber Loss
    <ul>
      <li>A <code class="language-plaintext highlighter-rouge">torch.nn.Module</code> subclass with a <code class="language-plaintext highlighter-rouge">forward()</code> method to compute the loss.</li>
      <li>
        <p>Huber Loss is defined as:</p>

\[L_{\delta}(y, \hat{y}) =
      \begin{cases}
      \frac{1}{2}(y - \hat{y})^2 &amp; \text{for } |y - \hat{y}| \leq \delta, \\
      \delta \cdot (|y - \hat{y}| - \frac{1}{2} \delta) &amp; \text{for } |y - \hat{y}| &gt; \delta,
      \end{cases}\]

        <p>where:</p>
        <ul>
          <li>\(y\) is the true value,</li>
          <li>\(\hat{y}\) is the predicted value,</li>
          <li>\(\delta\) is a threshold parameter that controls the transition between L1 and L2 loss.</li>
        </ul>
      </li>
      <li>More details about the Huber Loss</li>
    </ul>
  </li>
  <li>Some custom losses in Keras and PyTorch: <a href="https://www.kaggle.com/code/bigironsphere/loss-function-library-keras-pytorch/notebook">Loss Function Library - Keras &amp; PyTorch</a></li>
  <li>Used the Linear Regression model to test the custom loss.</li>
  <li>Error 1: <code class="language-plaintext highlighter-rouge">RuntimeError: grad can be implicitly created only for scalar outputs</code>
    <ul>
      <li>Reason: the <code class="language-plaintext highlighter-rouge">forward()</code> function was returning a tensor with length &gt; 1. Got the hint from <a href="https://discuss.pytorch.org/t/loss-backward-raises-error-grad-can-be-implicitly-created-only-for-scalar-outputs/12152">PyTorch forums</a>.</li>
      <li>Fix: returned <code class="language-plaintext highlighter-rouge">loss.mean()</code> instead of <code class="language-plaintext highlighter-rouge">loss</code>.</li>
    </ul>
  </li>
  <li>Error 2: All the losses were <code class="language-plaintext highlighter-rouge">nan</code>: this was a genuine bug in my code.</li>
  <li>Implementation approach 1: Use masks (my approach)
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">y_true</span><span class="p">):</span>
      <span class="n">error</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">y_true</span> <span class="o">-</span> <span class="n">y_pred</span><span class="p">)</span>

      <span class="n">flag1</span> <span class="o">=</span> <span class="n">error</span> <span class="o">&lt;=</span> <span class="bp">self</span><span class="p">.</span><span class="n">d</span>
      <span class="n">flag2</span> <span class="o">=</span> <span class="o">~</span><span class="n">flag1</span>

      <span class="n">l2_loss</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">error</span><span class="o">**</span><span class="mi">2</span> <span class="o">*</span> <span class="n">flag1</span>
      <span class="n">l1_loss</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">d</span> <span class="o">*</span> <span class="p">(</span><span class="n">error</span> <span class="o">-</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">d</span><span class="p">)</span> <span class="o">*</span> <span class="n">flag2</span>
      <span class="n">loss</span> <span class="o">=</span> <span class="n">l2_loss</span> <span class="o">+</span> <span class="n">l1_loss</span>
      <span class="k">return</span> <span class="n">loss</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
</code></pre></div>    </div>
  </li>
  <li>Implementation approach 2: Use <a href="https://docs.pytorch.org/docs/stable/generated/torch.where.html"><code class="language-plaintext highlighter-rouge">torch.where()</code></a> (solution provided in <a href="https://github.com/Exorust/TorchLeet/blob/main/torch/basic/custom-loss/custom-loss_SOLN.ipynb">TorchLeet</a>)
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">y_pred</span><span class="p">,</span> <span class="n">y_true</span><span class="p">):</span>
      <span class="n">error</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">y_true</span> <span class="o">-</span> <span class="n">y_pred</span><span class="p">)</span>

      <span class="n">condition</span> <span class="o">=</span> <span class="n">error</span> <span class="o">&lt;=</span> <span class="bp">self</span><span class="p">.</span><span class="n">d</span>
      <span class="n">loss</span> <span class="o">=</span> <span class="n">torch</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">condition</span><span class="p">,</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="n">error</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">d</span> <span class="o">*</span> <span class="p">(</span><span class="n">error</span> <span class="o">-</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">d</span><span class="p">))</span>
      <span class="k">return</span> <span class="n">loss</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
</code></pre></div>    </div>
  </li>
  <li>Turns out, <code class="language-plaintext highlighter-rouge">torch.where()</code> is the most optimised way of doing this. It is vectorised and GPU-friendly. It is also a cleaner implementation of the same logic. Masking will require extra memory and extra operations (two multiplications, and one addition).</li>
  <li>Used tensorboard to visualise the training results.</li>
  <li>Read up more on <a href="https://docs.pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html#torch.optim.Optimizer.zero_grad"><code class="language-plaintext highlighter-rouge">optimizer.zero_grad()</code></a>.
    <ul>
      <li>PyTorch accumulates gradients by default. The <code class="language-plaintext highlighter-rouge">loss.backward()</code> will add to the previous gradients (can be accessed by <code class="language-plaintext highlighter-rouge">weight.grad</code>).</li>
      <li>If we don’t reset the gradients using <code class="language-plaintext highlighter-rouge">zero_grad()</code>, the new gradient will be a combination of the old and the newly-computed gradient. Since the old gradient was already used to update the model in the last iteration, the combined gradient will point in a different direction than the minimum (or maximum.) [<a href="https://stackoverflow.com/q/48001598/2650427">ref</a>]</li>
    </ul>
  </li>
  <li><strong>Q:</strong> When should we <em>skip</em> <code class="language-plaintext highlighter-rouge">zero_grad()</code>? <strong>A:</strong> When we want gradient accumulation on purpose.</li>
  <li>
    <p><strong>Q:</strong> When do we want gradient accumulation on purpose? <strong>A:</strong> In the following scenarios:</p>

    <ol>
      <li>Large batch size with limited GPU memory. Split the batch into mini-batches, accumulate gradients over all the mini-batches, and then run <code class="language-plaintext highlighter-rouge">optimizer.step()</code>. Used during training on smaller GPUs.</li>
      <li>Multiple loss components before a single update. Useful for multi-task learning. Losses that require multiple passes.</li>
      <li>Parallel training. When the model is split across devices, accumulate the gradients across the micro-batches and then update the parameters once.</li>
      <li>Training with noisy gradients. Accumulate over multiple steps with noisy gradients to smooth the gradients before updating.</li>
    </ol>
  </li>
</ul>]]></content><author><name>Shivam Rana</name></author><category term="DL" /><summary type="html"><![CDATA[Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to follow along TorchLeet and go up to karpathy/nanoGPT or karpathy/nanochat. Previously,]]></summary></entry><entry><title type="html">PyTorch Fundamentals - Week 1, 2, &amp;amp; 3</title><link href="https://trigonaminima.github.io/2025/11/pytorch-fundamentals/" rel="alternate" type="text/html" title="PyTorch Fundamentals - Week 1, 2, &amp;amp; 3" /><published>2025-11-10T00:00:00+00:00</published><updated>2025-11-10T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/11/pytorch-fundamentals</id><content type="html" xml:base="https://trigonaminima.github.io/2025/11/pytorch-fundamentals/"><![CDATA[<p>Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to follow along <a href="https://github.com/Exorust/TorchLeet">TorchLeet</a> and go up to <a href="https://github.com/karpathy/nanoGPT">karpathy/nanoGPT</a> or <a href="https://github.com/karpathy/nanochat">karpathy/nanochat</a>. Summary of the 1st three weeks.</p>

<h3 id="week-1">Week 1</h3>

<ul>
  <li>Create a Linear Regression model.
    <ul>
      <li><code class="language-plaintext highlighter-rouge">torch.nn.Linear</code> to define a learnable model.</li>
      <li><code class="language-plaintext highlighter-rouge">forward()</code> for forward pass.</li>
      <li><code class="language-plaintext highlighter-rouge">model.parameters()</code> returns all the learnable weights and is also passed to the optimizer (<code class="language-plaintext highlighter-rouge">SGD</code>, <code class="language-plaintext highlighter-rouge">Adam</code>, etc.)</li>
      <li>Use <code class="language-plaintext highlighter-rouge">torch.no_grad()</code> during inference.</li>
    </ul>
  </li>
  <li>Log the training logs to TensorBoard. (not a part of the TorchLeet repo)
    <ul>
      <li><code class="language-plaintext highlighter-rouge">SummaryWriter</code> from <code class="language-plaintext highlighter-rouge">torch.utils.tensorboard</code>. TensorFlow can directly use a callback inside the fit function to push all the relevant logs; the <code class="language-plaintext highlighter-rouge">SummaryWriter</code> gives fine-grained control to log anything.</li>
      <li><code class="language-plaintext highlighter-rouge">add_scalar()</code> to log the training loss.</li>
      <li><code class="language-plaintext highlighter-rouge">add_graph()</code> to log the graph itself.</li>
      <li>
        <p>Load the Jupyter tensorboard extension so that we don’t have to leave the notebook to look at the logs and pretty plots.</p>

        <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  %load_ext tensorboard
</code></pre></div>        </div>
      </li>
      <li>
        <p>Load the tensorboard UI inside the notebook.</p>

        <div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  %tensorboard <span class="nt">--logdir</span> PATH_TO_LOG_DIR
</code></pre></div>        </div>
      </li>
    </ul>
  </li>
</ul>
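<p>Putting these pieces together, a minimal end-to-end sketch (the synthetic data and hyper-parameters are my own, not from the exercise):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import torch
from torch import nn

# Synthetic data: y = 3x + 2 plus a little noise.
X = torch.randn(100, 1)
y = 3 * X + 2 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)  # learnable weight and bias
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():  # no autograd bookkeeping at inference time
    pred = model(torch.tensor([[1.0]]))

print(model.weight.item(), model.bias.item())  # close to 3 and 2
</code></pre></div></div>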

<h3 id="week-2">Week 2</h3>

<ul>
  <li>Create a Dataset
    <ul>
      <li><code class="language-plaintext highlighter-rouge">Dataset</code> class from <code class="language-plaintext highlighter-rouge">torch.utils.data</code>.</li>
      <li>Create a subclass of <code class="language-plaintext highlighter-rouge">Dataset</code> for my specific dataset. Added <code class="language-plaintext highlighter-rouge">data</code>, <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">y</code> attributes to the class.</li>
      <li>Since we will iterate through the rows of this dataset, defined <code class="language-plaintext highlighter-rouge">__len__</code> and <code class="language-plaintext highlighter-rouge">__getitem__</code> functions. These overloaded functions enable code like <code class="language-plaintext highlighter-rouge">len(dataset)</code> and <code class="language-plaintext highlighter-rouge">dataset[i]</code>, respectively.</li>
    </ul>
  </li>
  <li>DataLoader
    <ul>
      <li>
        <p><code class="language-plaintext highlighter-rouge">Dataset</code> only defines the dataset. <code class="language-plaintext highlighter-rouge">DataLoader</code> from <code class="language-plaintext highlighter-rouge">torch.utils.data</code> creates an iterator over it. It also brings other capabilities like batching and shuffling. Eg:</p>

        <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="n">dataloader</span> <span class="o">=</span> <span class="n">DataLoader</span><span class="p">(</span><span class="n">dataset</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div>        </div>
      </li>
      <li>
        <p>We can run a <code class="language-plaintext highlighter-rouge">for</code> loop on this <code class="language-plaintext highlighter-rouge">dataloader</code> now.</p>
      </li>
    </ul>
  </li>
  <li>Good intro to the topic from PyTorch - <a href="https://docs.pytorch.org/tutorials/beginner/basics/data_tutorial.html">Datasets and DataLoaders</a>.</li>
  <li>Trained the Linear Regression model using the dataloader. Faced some issues due to <code class="language-plaintext highlighter-rouge">dtype</code> mismatches - used <code class="language-plaintext highlighter-rouge">torch.float32</code> everywhere to fix it.</li>
  <li>The exercise only asked for a single column dataset. Played around with a dataset with multiple columns.</li>
  <li>Used Tensorboard for all the logging.</li>
</ul>
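<p>The <code class="language-plaintext highlighter-rouge">Dataset</code> subclass and <code class="language-plaintext highlighter-rouge">DataLoader</code> usage above can be sketched roughly as follows. The class name and toy data are illustrative, not taken from the exercise:</p>

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    """Wraps a feature tensor X and a target tensor y (illustrative names)."""
    def __init__(self, X, y):
        # keep everything in float32 to avoid dtype-mismatch errors later
        self.X = torch.as_tensor(X, dtype=torch.float32)
        self.y = torch.as_tensor(y, dtype=torch.float32)

    def __len__(self):
        # enables len(dataset)
        return self.X.shape[0]

    def __getitem__(self, i):
        # enables dataset[i]
        return self.X[i], self.y[i]

dataset = MyDataset([[1.0], [2.0], [3.0], [4.0]], [2.0, 4.0, 6.0, 8.0])
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# the dataloader is directly iterable in a training loop
for xb, yb in dataloader:
    print(xb.shape, yb.shape)
```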

<h3 id="week-3">Week 3</h3>

<ul>
  <li>Two types of activation functions – with learnable parameters and without.</li>
  <li>An activation function with learnable parameters requires an <code class="language-plaintext highlighter-rouge">nn.Module</code> subclass, so that PyTorch can track the parameters, run the <code class="language-plaintext highlighter-rouge">forward</code> pass, compute gradients in the backward pass, and produce the final trained weights.</li>
  <li>Created a custom activation <em>without</em> learnable parameters: \(\text{tanh}(x) + x\).</li>
  <li>
    <p>Updated the Linear Regression class to have the final output go through \(\text{tanh}(x) + x\) using <code class="language-plaintext highlighter-rouge">torch.tanh()</code>.</p>

    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">custom_activation</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">linear</span><span class="o">*</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
</code></pre></div>    </div>
  </li>
  <li>This <a href="https://stackoverflow.com/a/57013056/2650427">SO answer</a> talks about how to write a custom activation function in different scenarios: non-learnable, learnable, learnable with PyTorch functions, and learnable without PyTorch functions.</li>
  <li>Also learned about <code class="language-plaintext highlighter-rouge">torch.nn.Parameter</code> and <code class="language-plaintext highlighter-rouge">torch.nn.Variable</code>.</li>
  <li>Kept using Tensorboard for all the logging.</li>
</ul>
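<p>The non-learnable custom activation from Week 3 can be sketched like this. The class layout and names are illustrative, not taken from the original notebook:</p>

```python
import torch
import torch.nn as nn

def custom_activation(x):
    # tanh(x) + x has no learnable parameters, so a plain function is enough
    return torch.tanh(x) + x

class LinearRegression(nn.Module):
    """Linear layer whose output goes through the custom activation."""
    def __init__(self, in_features=1, out_features=1):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.custom_activation = custom_activation

    def forward(self, x):
        return self.custom_activation(self.linear(x))

model = LinearRegression()
out = model(torch.zeros(3, 1))
print(out.shape)
```

<p>Since the activation is a pure function of its input, autograd differentiates through it automatically; only a learnable activation would need its own <code class="language-plaintext highlighter-rouge">nn.Module</code> with registered parameters.</p>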

<p><br /></p>

<p>Next 2 weeks: Custom Loss Function (Huber Loss) and Deep Neural Network</p>

<hr />]]></content><author><name>Shivam Rana</name></author><category term="DL" /><summary type="html"><![CDATA[Brushing up on my PyTorch skills every week. Starting from scratch. Not in a hurry. The goal is to follow along TorchLeet and go up to karpathy/nanoGPT or karpathy/nanochat. Summary of the 1st three weeks.]]></summary></entry><entry><title type="html">Life Logging: Calls</title><link href="https://trigonaminima.github.io/2025/09/life-logging-calls-tasker/" rel="alternate" type="text/html" title="Life Logging: Calls" /><published>2025-09-28T00:00:00+00:00</published><updated>2025-09-28T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/09/life-logging-calls-tasker</id><content type="html" xml:base="https://trigonaminima.github.io/2025/09/life-logging-calls-tasker/"><![CDATA[<p>I am building a comprehensive set of tools to do life logging. General idea is:</p>

<ul>
  <li>Push everything to a sink; and</li>
  <li>Visualise the data in this sink.</li>
</ul>

<p>Objective is to do weekly reviews and take interventions if things are not BAU. Long term vision is to eventually have enough signals to give me a comprehensive understanding of myself (physical, mental, social, financial, etc).</p>

<p>This post is about logging calls using <a href="https://tasker.joaoapps.com/">Tasker for Android</a>.</p>

<h2 id="logging-phone-calls">Logging Phone Calls</h2>

<p>Steps:</p>

<ol>
  <li>Trigger on the event “Phone Idle” (the phone returns to an idle state after an incoming, outgoing, or missed call)</li>
  <li>Read the data provider <code class="language-plaintext highlighter-rouge">content://call_log/calls</code> to get the most recent call details</li>
  <li>Format the details for my use</li>
  <li>Push to an endpoint that saves this data in a table.</li>
</ol>

<p>This is what the logs look like:</p>

<blockquote>
  <p>#call(40) +type[miss] @num[Friend Number] @name[Friend Name] +mode[phone] +add[My Location]</p>
</blockquote>

<blockquote>
  <p>#call(40) +type[in] @num[Friend Number] @name[Friend Name] +mode[phone] +add[My Location]</p>
</blockquote>

<blockquote>
  <p>#call(84) +type[out] @num[Unsaved Caller’s Number] @name[&lt;null&gt;] +mode[phone] +add[My Location]</p>
</blockquote>

<details closed="">
<summary>Tasker profile to log phone calls</summary>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
</pre></td><td class="code"><pre>Profile: Log Phone Calls
        Event: Phone Idle

    Enter Task: Call Logs

    A1: Variable Set <span class="o">[</span>
        Name: %call_log_cols
        To: <span class="nb">date</span>, geocoded_location, countryiso, <span class="nb">type</span>, number, name, duration
        Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

    A2: SQL Query <span class="o">[</span>
        Mode: URI Formatted
        File: content://call_log/calls
        Columns: %call_log_cols
        Order By: <span class="nb">date </span>desc
        Output Column Divider: ,
        Variable Array: %call_logs

    A3: Multiple Variables Set <span class="o">[</span>
        Names: %qs_ts, %call_geocoded_location, %call_countryiso, %call_type, %call_number, %caller_name, %call_duration
        Variable Names Splitter: ,
        Values: %call_logs<span class="o">(</span>1<span class="o">)</span>
        Values Splitter: ,
        Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>
        Use Global Namespace: On <span class="o">]</span>

    A4: Variable Clear <span class="o">[</span>
        Name: %call_logs <span class="o">]</span>

    A5: If <span class="o">[</span> %qs_ts neq %LAST_CALLTS <span class="o">]</span>

        A15: If <span class="o">[</span> %call_type eq 1 <span class="o">]</span>

            A16: Variable Set <span class="o">[</span>
                Name: %call_type
                To: <span class="k">in
                </span>Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A17: Else
            If  <span class="o">[</span> %call_type eq 2 <span class="o">]</span>

            A18: Variable Set <span class="o">[</span>
                Name: %call_type
                To: out
                Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A19: Else
            If  <span class="o">[</span> %call_type eq 3 <span class="o">]</span>

            A20: Variable Set <span class="o">[</span>
                Name: %call_type
                To: miss
                Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A21: End If

        A22: Variable Set <span class="o">[</span>
            Name: %qs_note
            To: <span class="c">#call(%call_duration) +type[%call_type] @num[%call_number] @name[%caller_name] +mode[phone]</span>
            Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A23: Perform Task <span class="o">[</span>
            Name: Commons: POST Note &amp; Location
            Priority: %priority
            Local Variable Passthrough: On
            Limit Passthrough To: %qs_note, %qs_ts
            Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On
            Continue Task After Error:On <span class="o">]</span>

        A24: Variable Set <span class="o">[</span>
            Name: %LAST_CALLTS
            To: %qs_ts
            Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

    A25: End If
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<!-- <br> -->

<h2 id="logging-whatsapp-calls">Logging WhatsApp Calls</h2>

<figure class="highlight-placeholder"></figure><p>Unlike normal phone calls, WhatsApp calls can’t be read from a data provider, and they don’t appear in the phone’s call log either. WhatsApp also doesn’t support exporting its call logs. The only method left was to read WhatsApp’s notifications to get the details. Here are the steps:</p>

<ol>
  <li>Every time WhatsApp gives a notification, run the next set of steps.</li>
  <li>If the notification is for an (audio/video) call, extract the details into the relevant variables.</li>
  <li>Push to an endpoint that saves this data in a table.</li>
</ol>

<p>Caveats:</p>

<ol>
  <li>WhatsApp generates separate call-related notifications:
    <ul>
      <li>incoming audio/video call</li>
      <li>missed audio/video call (after an incoming call is missed)</li>
      <li>if multiple calls have piled up, a separate notification of “2+ missed calls from …”</li>
      <li>an outgoing call just says “calling…”, so there is no audio/video label.</li>
    </ul>
  </li>
  <li>Since this is just a call notification (incoming, outgoing), there is no call duration available.</li>
</ol>

<p>This is what the logs look like:</p>

<blockquote>
  <p>#call(-1) +type[miss] @num[null] @name[Friend Name] +mode[whatsapp-video] +add[My Location]</p>
</blockquote>

<blockquote>
  <p>#call(-1) +type[in] @num[null] @name[Friend Name] +mode[whatsapp-video] +add[My Location]</p>
</blockquote>

<blockquote>
  <p>#call(-1) +type[out] @num[null] @name[Friend Name] +mode[whatsapp-any] +add[My Location]</p>
</blockquote>

<details closed="">
<summary>Tasker profile to log WA calls</summary>

<figure class="highlight"><pre><code class="language-shell" data-lang="shell"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
</pre></td><td class="code"><pre>Profile: Log WhatsApp Calls
    	Event: Notification <span class="o">[</span> Owner Application:WhatsApp Title:<span class="k">*</span> Text:<span class="k">*</span> Subtext:<span class="k">*</span> Messages:<span class="k">*</span> Other Text:<span class="k">*</span> Cat:<span class="k">*</span> New Only:On <span class="o">]</span>

    Enter Task: WhatsApp Call Logs

    A3: If <span class="o">[</span> %evtprm7 eq call <span class="o">]</span>

        A4: Multiple Variables Set <span class="o">[</span>
             Names: %qs_ts, %call_geocoded_location, %call_countryiso, %call_type_str, %call_number, %caller_name, %call_duration, %call_type,%call_mode
             Variable Names Splitter: ,
             Values: %TIMEMS,,,%evtprm3,null,%evtprm2,-1,null,any
             Values Splitter: ,
             Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A5: If <span class="o">[</span> %call_type_str ~R .<span class="k">*</span>Calling.<span class="k">*</span> <span class="o">]</span>

            A6: Variable Set <span class="o">[</span>
                 Name: %call_type
                 To: out
                 Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A7: Else
            If  <span class="o">[</span> %call_type_str ~R .<span class="k">*</span>Incoming.<span class="k">*</span> <span class="o">]</span>

            A8: Variable Set <span class="o">[</span>
                 Name: %call_type
                 To: <span class="k">in
                 </span>Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A9: Else
            If  <span class="o">[</span> %call_type_str ~R .<span class="k">*</span>Missed.<span class="k">*</span> <span class="o">]</span>

            A10: Variable Set <span class="o">[</span>
                  Name: %call_type
                  To: miss
                  Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A11: End If

        A12: If <span class="o">[</span> %call_type_str ~R .<span class="k">*</span>voice.<span class="k">*</span> <span class="o">]</span>

            A13: Variable Set <span class="o">[</span>
                  Name: %call_mode
                  To: voice
                  Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A14: Else
            If  <span class="o">[</span> %call_type_str ~R .<span class="k">*</span>video.<span class="k">*</span> <span class="o">]</span>

            A15: Variable Set <span class="o">[</span>
                  Name: %call_mode
                  To: video
                  Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A16: End If

        A17: Variable Set <span class="o">[</span>
              Name: %qs_note
              To: <span class="c">#call(%call_duration) +type[%call_type] @num[%call_number] @name[%caller_name] +mode[whatsapp-%call_mode]</span>
              Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On <span class="o">]</span>

        A18: Flash <span class="o">[</span>
              Text: %qs_note
              Continue Task Immediately: On
              Dismiss On Click: On <span class="o">]</span>

        A19: Perform Task <span class="o">[</span>
              Name: Commons: POST Note &amp; Location
              Priority: %priority
              Local Variable Passthrough: On
              Limit Passthrough To: %qs_note, %qs_ts
              Structure Output <span class="o">(</span>JSON, etc<span class="o">)</span>: On
              Continue Task After Error:On <span class="o">]</span>

        A20: Write File <span class="o">[</span>
              File: Download/wa_calls.txt
              Text: %qs_ts, %call_geocoded_location, %call_countryiso, %call_type_str, %call_number, %caller_name, %call_duration, %call_type,%call_mode
             %qs_note

              Append: On
              Add Newline: On <span class="o">]</span>

    A21: End If
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<p><br /></p>

<p>Bye.</p>]]></content><author><name>Shivam Rana</name></author><category term="Quantified-self" /><summary type="html"><![CDATA[I am building a comprehensive set of tools to do life logging. General idea is:]]></summary></entry><entry><title type="html">[Mini] Life Logging</title><link href="https://trigonaminima.github.io/2025/09/life-logging/" rel="alternate" type="text/html" title="[Mini] Life Logging" /><published>2025-09-27T00:00:00+00:00</published><updated>2025-09-27T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/09/life-logging</id><content type="html" xml:base="https://trigonaminima.github.io/2025/09/life-logging/"><![CDATA[<p>I am building a comprehensive set of tools to do life logging. General idea is:</p>

<ul>
  <li>Push everything to a sink; and</li>
  <li>Visualise the data in this sink.</li>
</ul>

<p>Objective is to do weekly reviews and take interventions if things are not BAU. Long term vision is to eventually have enough signals to give me a comprehensive understanding of myself (physical, mental, social, financial, etc).</p>

<p>Current progress in reverse chronology:</p>

<ul>
  <li>2025: <a href="/2025/09/life-logging-calls-tasker/">Logging phone calls</a></li>
</ul>

<p>I have tried this multiple times in various formats over the years. Here are my previous efforts in reverse chronology:</p>

<ul>
  <li>2023: <a href="/2023/06/google-fit-data/">Google Fit data sync and analysis</a> – my most successful attempt and still in-use. Pushed me to keep things simple.</li>
  <li>2021: <a href="/2021/08/flutter_app_3/">Flutter app to do the logging 3</a> – couldn’t manage building this along with work.</li>
  <li>2021: <a href="/2021/08/flutter_app_2/">Flutter app to do the logging 2</a></li>
  <li>2021: <a href="/2021/07/flutter_app_1/">Flutter app to do the logging 1</a></li>
  <li>2018: <a href="/2018/04/chatting-up-2/">Analysis of my WA chats 2</a> – good analysis on my chatting habits, but nothing new. Led me to work on some solo research projects.</li>
  <li>2016: <a href="/2016/06/chatting-up/">Analysis of my WA chats</a></li>
  <li>2014: <a href="/2014/11/gamification-of-life/">Gamification of Life</a> – too much information to manage, eventually started feeling like a chore.</li>
</ul>

<p>Bye.</p>]]></content><author><name>Shivam Rana</name></author><category term="Quantified-self" /><summary type="html"><![CDATA[I am building a comprehensive set of tools to do life logging. General idea is:]]></summary></entry><entry><title type="html">Consolidated Recommendation Systems</title><link href="https://trigonaminima.github.io/2025/02/consolidated-recsys/" rel="alternate" type="text/html" title="Consolidated Recommendation Systems" /><published>2025-02-13T00:00:00+00:00</published><updated>2025-02-13T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/02/consolidated-recsys</id><content type="html" xml:base="https://trigonaminima.github.io/2025/02/consolidated-recsys/"><![CDATA[<p>This post is a quick summary of <a href="https://netflixtechblog.medium.com/lessons-learnt-from-consolidating-ml-models-in-a-large-scale-recommendation-system-870c5ea5eb4a">Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System</a>. I have also added a few questions I got while reading it. I end the post with what we do at work to deal with this.</p>

<h2 id="summary">Summary</h2>

<ul>
  <li>Recommendation System: candidate gen + ranking.</li>
  <li>
    <p>A typical ranking model pipeline:</p>

    <ol>
      <li>Label prep</li>
      <li>Feature prep</li>
      <li>Model training</li>
      <li>Model evaluation</li>
      <li>Model deployment (with inference contract)</li>
    </ol>
  </li>
  <li>Each recommendation use case (e.g.: discover page, notifications, related items, category exploration, search) will have a version of the above pipeline.</li>
  <li>
    <p>As use cases increase, the team will need to maintain multiple such pipelines. Maintaining multiple pipelines is time-consuming and increases the points of failure.</p>

    <figure class="image">
  <img src="https://trigonaminima.github.io/assets/2025-02/consolidated_recsys_neflix_1.webp" alt="" style="text-align: center; margin: auto" />
  <figcaption style="text-align: center">Figure 1: Figure from the Netflix blog linked at the start.</figcaption>
  </figure>
  </li>
  <li>Since the pipelines share the same components, we can consolidate them.</li>
  <li>
    <p>Consolidated pipeline:</p>

    <ol>
      <li>Label prep for each use case separately</li>
      <li>Stratified union of all the prepared labels</li>
      <li>Feature prep (separate categorical feature representing the use case)</li>
      <li>Model training</li>
      <li>Model evaluation</li>
      <li>Model deployment (with inference contract)</li>
    </ol>

    <figure class="image">
  <img src="https://trigonaminima.github.io/assets/2025-02/consolidated_recsys_neflix_2.webp" alt="" style="text-align: center; margin: auto" width="100" />
  <figcaption style="text-align: center">Figure 2: Figure from the Netflix blog linked at the start.</figcaption>
  </figure>
  </li>
  <li>
    <p>Label prep for each use case separately</p>

    <ol>
      <li>Each use case will have different ways of generating the labels.</li>
      <li>Use case context details are added as separate features.
        <ul>
          <li>Search context: search query, region</li>
          <li>Similar items context: source item</li>
        </ul>
      </li>
      <li>When the use case is search, context features specific to the similar item use case will be filled with default values.</li>
    </ol>
  </li>
  <li>
    <p>Union of all the prepared labels</p>

    <ol>
      <li>Final labelled set: a% samples from use case-1 labels + b% samples from use case-2 labels + … + z% samples from use case-n labels</li>
      <li>The proportions [a, b, …, z] come from stratification</li>
      <li>Q: How is this stratification done? Platform traffic across different use cases?</li>
      <li>Q: What are the results when these proportions are business-driven? Eg: contribution to revenue.</li>
    </ol>
  </li>
  <li>
    <p>Feature prep</p>

    <ol>
      <li>All use case specific features added to the data.</li>
      <li>If a feature is only used for use case 1 then it will contain default value for all the other use cases.</li>
      <li>Add a new categorical feature <code class="language-plaintext highlighter-rouge">task_type</code> to the features to inform the model about the target reco task.</li>
    </ol>
  </li>
  <li>Model training happens as usual: feature vector and labels. Architecture remains the same. Optimisation remains the same.</li>
  <li>
    <p>Model evaluation</p>

    <ol>
      <li>Check the appropriate eval metrics to check the model.</li>
      <li>Q: How do we judge if the model performed well for all the use cases?</li>
      <li>Q: Will it require a separate evaluation set for each use case?</li>
      <li>Q: Can there be a 2nd order Simpson’s paradox here: the consolidated model performs well, but when tried for individual use cases, its performance is low? My hunch: no.</li>
    </ol>
  </li>
  <li>
    <p>Model deployment (with inference contract)</p>

    <ol>
      <li>Deploy the same model in the respective environment made for each use case. That env will have all the specific network-related knobs: batch size, throughput, latency, caching policy, parallelism, etc.</li>
      <li>Generic API contract to support the heterogeneous context (search query for search, source item for related items use case.)</li>
    </ol>
  </li>
  <li>
    <p>Caveats</p>

    <ol>
      <li>The consolidated use cases should be related (eg: ranking for movies in the search and discover page)</li>
      <li>One definition of related can be: ranking the same entities.</li>
    </ol>
  </li>
  <li>
    <p>Advantages</p>

    <ol>
      <li>Reduces maintenance costs (less code; fewer deployments)</li>
      <li>Quick model iterations to all the use cases
        <ul>
          <li>Updates (new features, architecture, etc) for one use case can be applied to other use cases.</li>
          <li>If consolidated tasks are related, then new features don’t cause regression in practice.</li>
        </ul>
      </li>
      <li>Can be extended to any related use case from offline and online POV.</li>
      <li>Cross-learning: the model potentially gains more (hidden) learning from the other tasks. Eg: having search data gives more data to the model learning for related-items task.
        <ul>
          <li>Q: Is this happening? How can we verify this? One way: Train an independent model on the use-case specific data and compare its performance with the consolidated model’s performance on the same task.</li>
        </ul>
      </li>
    </ol>
  </li>
  <li>I was confused about what to call this learning paradigm. <a href="https://en.wikipedia.org/wiki/Multi-task_learning">Wikipedia</a> says that it is multi-task learning.</li>
</ul>
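<p>Steps 2 and 3 of the consolidated pipeline above (the stratified union plus the <code class="language-plaintext highlighter-rouge">task_type</code> feature) can be sketched like this. The use-case names, proportions, and context columns are made up for illustration, not taken from the Netflix blog:</p>

```python
import pandas as pd

def stratified_union(labelled_sets, proportions, n_total, seed=42):
    """Union per-use-case label sets in given proportions, tagging each row
    with a task_type feature (illustrative helper, not from the blog)."""
    parts = []
    for task, df in labelled_sets.items():
        n = int(proportions[task] * n_total)
        sample = df.sample(n=min(n, len(df)), random_state=seed).copy()
        sample["task_type"] = task  # tells the model which reco task this row is
        parts.append(sample)
    return pd.concat(parts, ignore_index=True)

# toy per-use-case labelled sets, each with its own context feature
search = pd.DataFrame({"item_id": range(100), "label": [1, 0] * 50,
                       "search_query": "q"})       # search-specific context
similar = pd.DataFrame({"item_id": range(50), "label": [0, 1] * 25,
                        "source_item": 7})         # similar-items context
train = stratified_union({"search": search, "similar_items": similar},
                         {"search": 0.6, "similar_items": 0.4}, n_total=100)
# context features missing for a use case get default values
train = train.fillna({"search_query": "", "source_item": -1})
print(train["task_type"].value_counts())
```

<p>Each row carries every use case’s context columns; the ones that don’t apply are filled with defaults, which is what lets a single model serve all the tasks.</p>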

<h2 id="practice-at-my-work">Practice at my work</h2>

<ul>
  <li>The models are not merged across different tasks like relevance and search.</li>
  <li>Within relevance ranking tasks (discover, similar items, category exploration), we have a common base ranker model.</li>
  <li>On top of that, we have different heuristics to make it better for that particular section.</li>
  <li>Advantages:
    <ul>
      <li>There is only one main model for all related tasks.</li>
      <li>Keeps the heuristics logic simple and, thus, easy to maintain.</li>
    </ul>
  </li>
  <li>Challenges
    <ul>
      <li>Heuristics are crude/manual/semi-automated → we may be leaving some gains on the table. There are bandit-based approaches to automating it, though.</li>
      <li>It loses out on cross-learning opportunities.</li>
    </ul>
  </li>
</ul>]]></content><author><name>Shivam Rana</name></author><category term="RecSys" /><summary type="html"><![CDATA[This post is a quick summary of Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System. I have also added a few questions I got while reading it. I end the post with what we do at work to deal with this.]]></summary></entry><entry><title type="html">Document Your Progress at Work</title><link href="https://trigonaminima.github.io/2025/01/document-your-progress/" rel="alternate" type="text/html" title="Document Your Progress at Work" /><published>2025-01-13T00:00:00+00:00</published><updated>2025-01-13T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2025/01/document-your-progress</id><content type="html" xml:base="https://trigonaminima.github.io/2025/01/document-your-progress/"><![CDATA[<p><strong>How can you ensure that your contributions are also recognized?</strong></p>

<p>A common challenge, especially in larger organizations, is that your manager may not always be fully aware of the specifics of your work, and your manager’s manager likely has even less visibility. It isn’t due to a lack of interest but rather the sheer volume of responsibilities and information they handle. Additionally, even for you, it’s hard to remember all the details beyond the highlights. I find a proactive strategy essential for such scenarios: sending <strong>regular progress digests.</strong></p>

<p>These digests are concise, structured email updates that you send periodically to both your direct manager and their manager. The aim is to offer a clear snapshot of your activities, their impact, and your forthcoming plans. See it as a method to keep your supervisors well-informed, especially when you lack regular direct interactions.</p>

<p>That’s it. That is the idea. You can be creative and apply it however you want. However you decide to do it, you will see gains.</p>

<p>In the next section, I list the <strong>key points</strong> I usually consider in my snapshots.</p>

<h2 id="key-elements-of-an-effective-progress-digest">Key Elements of an Effective Progress Digest</h2>

<p>To ensure your digests are both informative and impactful, here’s what you can include:</p>

<ul>
  <li><strong>Specific Task Details</strong>: Provide project specifics and relevant links to the completed/picked coding tasks. It entails a 1-sentence project description, PR links, JIRA tickets and other code artefacts.</li>
  <li><strong>Data Science Related</strong>: If applicable, detail the models you’ve trained and deployed. Any A/B experiments launched and test results of the ones that concluded. Also, share the project solutioning doc here.</li>
  <li><strong>Documentation Efforts</strong>: Highlight any documentation you’ve created or maintained. You can also merge this with other points.</li>
  <li><strong>Impact and Results</strong>: Clearly articulate the outcomes of your tasks and their value to the team and company.</li>
  <li><strong>Initiatives and Discussions</strong>: Share any new ideas you’ve put forward or discussions you’ve initiated.</li>
  <li><strong>Future Plans</strong>: Outline your planned next steps.</li>
</ul>

<h2 id="benefits">Benefits</h2>

<p>The effort invested in creating these digests yields substantial career benefits:</p>

<ul>
  <li><strong>Enhances Diligence</strong>: Summarizing your work makes you more conscious of your efforts.</li>
  <li><strong>Boosts Positive Perception</strong>: You are perceived as a proactive and accomplished individual.</li>
  <li><strong>Creates a Performance Record</strong>: These digests serve as a valuable record of your work, especially useful during performance reviews.</li>
  <li><strong>Ensures Visibility</strong>: Even if managers don’t respond directly to each email, they will read them, which ensures they are aware of your work and its progress.</li>
  <li><strong>Effective at Any Stage:</strong> While this practice is advantageous when starting a new job (or joining a new team), I have found it beneficial at any stage.</li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>Actively managing your visibility is key to long-term career growth. Sending out regular progress digests ensures that your work is recognized. You also establish a record of your accomplishments and demonstrate your value. This practice requires regular work but has good returns.</p>

<p>PS: I learned this trick on a tech podcast many years ago. If anyone knows which podcast or episode, please share it with me, and I will link it here.</p>

<p><strong>Update: 14th Jan</strong></p>

<p>PS: A related idea of <a href="https://jvns.ca/blog/brag-documents/">brag documents</a> explained beautifully by <a href="https://jvns.ca/">Julia Evans</a>. Shared on this <a href="https://news.ycombinator.com/item?id=42695837">HN comment</a>.</p>]]></content><author><name>Shivam Rana</name></author><summary type="html"><![CDATA[How can you ensure that your contributions are also recognized?]]></summary></entry><entry><title type="html">Confidence Intervals and Coverage</title><link href="https://trigonaminima.github.io/2024/09/confidence-intervals-and-coverage/" rel="alternate" type="text/html" title="Confidence Intervals and Coverage" /><published>2024-09-15T00:00:00+00:00</published><updated>2024-09-15T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2024/09/confidence-intervals-and-coverage</id><content type="html" xml:base="https://trigonaminima.github.io/2024/09/confidence-intervals-and-coverage/"><![CDATA[<h2 id="confidence-interval-ci">Confidence Interval (CI)</h2>

<ul>
  <li>CI is an interval.</li>
  <li>An interval which is expected to contain the parameter being estimated (eg: population mean.)</li>
  <li>Typical confidence levels are 95% and 99%.</li>
  <li>The confidence level of a confidence interval is also called the nominal coverage (probability).</li>
  <li>CI with 95% confidence: random interval which contains the parameter to be estimated 95% of the time.</li>
  <li>Two ways to mention a confidence level of 95%:
    <ul>
      <li>Confidence interval with \(\gamma = 0.95\); 95% confidence</li>
      <li>Confidence interval with \(\alpha = 0.05\); 95% confidence: \(1-\alpha = 0.95\)</li>
    </ul>
  </li>
  <li>
    <p>Mathematical representation</p>

\[P(u(X)&lt;\theta &lt;v(X))=\gamma\]

    <ul>
      <li>\(\theta\) is the parameter to be estimated (eg: population mean or median).</li>
      <li>\(X\) is a random variable from a probability distribution with parameter \(\theta\)</li>
      <li>\(u(X)\) and \(v(X)\) are random variables containing parameter \(\theta\) with probability \(\gamma\)</li>
      <li>Confidence level \(\gamma\) &lt; 1 (but close to 1). eg: 0.95</li>
    </ul>
  </li>
  <li>
    <p>Mathematical representation in case of normal distribution:</p>

\[\text{CI} = \bar{x} \pm z^* \left(\frac{\sigma}{\sqrt{n}}\right)\]

    <p>Where:</p>
    <ul>
      <li>\(\bar{x}\) is the sample mean.</li>
      <li>\(z^*\) is the critical value corresponding to the desired confidence level</li>
      <li>\(\sigma\) is the population standard deviation.</li>
      <li>\(n\) is the sample size.</li>
      <li>The quantity \(\displaystyle {\sigma }_{\bar {x}}={\frac {\sigma }{\sqrt {n}}}\) is also called the <a href="https://en.wikipedia.org/wiki/Standard_error">standard error of the mean</a>.</li>
      <li>The critical value for a 95% confidence level corresponds to the 97.5th percentile of the distribution. Reason: the probability of \(\theta\) lying outside the interval is 5%, split as 2.5% in each tail (if symmetric). So the range runs from the 2.5th to the 97.5th percentile.</li>
    </ul>
  </li>
  <li>We can calculate the critical value \(z^*\) as follows:
    <ul>
      <li>If the sample size is small (&lt; 30) or we do not know the std dev, then we use the t-statistic (Student’s t-distribution.) The t-distribution is wider and has heavier tails than the normal distribution, reflecting the increased uncertainty in small samples. Thus, it accounts for the extra variability.</li>
      <li>If the sample size is large enough to make CLT valid, we use normal distribution (Z-distribution) –&gt; z-score. Eg: a z-score of 1.96 for a 95% confidence level.</li>
      <li>As the sample size increases, both methods converge.</li>
      <li>When in doubt, it is safer to use the t-statistic, since it converges to the z-score as the sample size grows.</li>
    </ul>
  </li>
  <li>Ref:
    <ul>
      <li><a href="https://en.wikipedia.org/wiki/Confidence_interval">Confidence interval</a> wiki</li>
      <li><a href="https://en.wikipedia.org/wiki/Confidence_interval#Interpretation">Interpretation</a></li>
      <li><a href="https://en.wikipedia.org/wiki/Confidence_interval#Common_misunderstandings">Common misunderstandings</a></li>
    </ul>
  </li>
</ul>
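<p>As a quick sketch of the last point, we can compute the critical values directly with <code class="language-plaintext highlighter-rouge">scipy.stats</code>: the t-critical value is larger for small samples and converges to the z-critical value as the degrees of freedom grow.</p>

```python
import scipy.stats as st

# z-critical value for a 95% confidence level: 97.5th percentile
z_star = st.norm.ppf(0.975)
print(round(z_star, 3))  # ~1.96

# t-critical values shrink toward z* as the sample size (df + 1) grows
for df in (5, 30, 1000):
    print(df, round(st.t.ppf(0.975, df=df), 3))
```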

<h2 id="ci-width">CI Width</h2>

<ul>
  <li>At a fixed confidence level, a narrower CI means a more precise estimate.</li>
  <li>Factors that impact the width of CI are sample size, variance/standard deviation, and confidence level.
    <ul>
      <li>Sample size high –&gt; narrow CI</li>
      <li>High variance/standard dev –&gt; wider CI</li>
      <li>Higher confidence level –&gt; wider CI (the interval must widen to contain the parameter more often)</li>
    </ul>
  </li>
</ul>
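<p>A small numeric sketch of the sample-size effect, assuming a normal population with known \(\sigma = 1\): since the width is \(2 z^* \sigma / \sqrt{n}\), quadrupling the sample size halves the width.</p>

```python
import numpy as np
import scipy.stats as st

sigma = 1.0                   # population std dev (assumed known here)
z_star = st.norm.ppf(0.975)   # critical value for 95% confidence

# width = 2 * z* * sigma / sqrt(n): quadrupling n halves the width
widths = {n: 2 * z_star * sigma / np.sqrt(n) for n in (25, 100, 400)}
for n, w in widths.items():
    print(n, round(w, 3))
```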

<h2 id="coverage">Coverage</h2>

<ul>
  <li>Coverage (probability): the probability that a confidence interval will include the true value (eg: population mean.)</li>
  <li>The proportion of CIs (at a particular confidence level) that contain the true value (eg: population mean.)</li>
  <li>95% CI coverage: For example, if you calculate a 95% confidence interval for a population mean, you are saying that if you were to take many samples and calculate a confidence interval from each one, approximately 95% of those intervals would contain the true population mean.</li>
  <li>Probability matching: if coverage probability is the same as nominal coverage probability.
<img src="https://trigonaminima.github.io/assets/2024-09/coverage_probability.png" alt="" width="500" style="text-align: center; margin: auto" />
    <ul>
      <li>Nominal coverage = 50%</li>
      <li>Coverage = 10/20 = 50% (blue CIs contain the true mean)</li>
      <li>Probability matching since coverage is the same as nominal coverage.</li>
      <li><a href="https://en.wikipedia.org/wiki/File:Normal_distribution_50%25_CI_illustration.svg">Image ref</a></li>
    </ul>
  </li>
  <li>Ref:
    <ul>
      <li><a href="https://en.wikipedia.org/wiki/Coverage_probability">Coverage probability</a> wiki</li>
      <li><a href="https://en.wikipedia.org/wiki/Neyman_construction">Confidence interval construction</a></li>
    </ul>
  </li>
</ul>
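<p>The coverage definition above can be checked empirically: draw many samples from a known population, build a 95% CI from each, and count how often the true mean lands inside. A sketch (the exact fraction will vary with the seed, but it should sit close to the nominal 95%):</p>

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(42)
true_mean, n, trials = 0.0, 30, 2000

hits = 0
for _ in range(trials):
    sample = rng.standard_normal(n)
    lo, hi = st.t.interval(
        confidence=0.95, df=n - 1, loc=np.mean(sample), scale=st.sem(sample)
    )
    hits += lo <= true_mean <= hi

print(hits / trials)  # close to the nominal 0.95
```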

<h2 id="implementation-and-explorations">Implementation and Explorations</h2>

<p>Now, we will go through the above concepts in code.</p>

<details close="">
<summary>Common imports</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">st</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>
<p><br /></p>

<h3 id="compute-ci">Compute CI</h3>

<p>We implement the CI computation with both the t-distribution and the standard normal distribution to obtain the critical value.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
</pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">confidence_interval_t</span><span class="p">(</span><span class="n">sample</span><span class="p">,</span> <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">):</span>
    <span class="s">"""
    Calculate the confidence interval for the mean of a sample using the t-distribution.

    This function is appropriate when the population standard deviation is unknown and
    the sample size is small (n &lt; 30), although it works for any sample size.

    Parameters:
    sample (numpy.ndarray): The sample data as a NumPy array.
    confidence (float): The desired confidence level (default
    is 0.95 for a 95% confidence interval).

    Returns:
    tuple: Lower and upper bounds of the confidence interval.
    """</span>
    <span class="c1"># Ensure the sample is a NumPy array
</span>    <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>

    <span class="n">sample_mean</span> <span class="o">=</span> <span class="n">sample</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
    <span class="c1"># Use Bessel's correction (ddof=1) for sample standard deviation
</span>    <span class="n">sample_std</span> <span class="o">=</span> <span class="n">sample</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">ddof</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">sample_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="n">standard_error</span> <span class="o">=</span> <span class="n">sample_std</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>

    <span class="c1"># Determine the critical value for the specified confidence level
</span>    <span class="n">critical_value</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">ppf</span><span class="p">((</span><span class="mi">1</span> <span class="o">+</span> <span class="n">confidence</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">,</span> <span class="n">df</span><span class="o">=</span><span class="n">sample_size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">margin_of_error</span> <span class="o">=</span> <span class="n">critical_value</span> <span class="o">*</span> <span class="n">standard_error</span>

    <span class="n">lower_bound</span> <span class="o">=</span> <span class="n">sample_mean</span> <span class="o">-</span> <span class="n">margin_of_error</span>
    <span class="n">upper_bound</span> <span class="o">=</span> <span class="n">sample_mean</span> <span class="o">+</span> <span class="n">margin_of_error</span>

    <span class="k">return</span> <span class="n">lower_bound</span><span class="p">,</span> <span class="n">upper_bound</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>We use <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.t.html"><code class="language-plaintext highlighter-rouge">stats.t.ppf</code></a> to get the critical value using the t-distribution. We can replace that with <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html"><code class="language-plaintext highlighter-rouge">stats.norm.ppf</code></a> for the z-score.</p>

<details close="">
<summary>Confidence interval using standard normal distribution</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
</pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">confidence_interval_norm</span><span class="p">(</span><span class="n">sample</span><span class="p">,</span> <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">):</span>
    <span class="s">"""
    Calculate the confidence interval for the mean of a sample using the normal
    distribution (Z-distribution).

    This function is appropriate when the population standard deviation is
    known, or when the sample size is large (n &gt;= 30), allowing the
    Central Limit Theorem to approximate the sample mean's distribution as normal.

    Parameters:
    sample (numpy.ndarray): The sample data as a NumPy array.
    confidence (float): The desired confidence level (default
    is 0.95 for a 95% confidence interval).

    Returns:
    tuple: Lower and upper bounds of the confidence interval.
    """</span>
    <span class="c1"># Ensure the sample is a NumPy array
</span>    <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>

    <span class="n">sample_mean</span> <span class="o">=</span> <span class="n">sample</span><span class="p">.</span><span class="n">mean</span><span class="p">()</span>
    <span class="c1"># Use Bessel's correction (ddof=1) for sample standard deviation
</span>    <span class="n">sample_std</span> <span class="o">=</span> <span class="n">sample</span><span class="p">.</span><span class="n">std</span><span class="p">(</span><span class="n">ddof</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
    <span class="n">sample_size</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="n">standard_error</span> <span class="o">=</span> <span class="n">sample_std</span> <span class="o">/</span> <span class="n">np</span><span class="p">.</span><span class="n">sqrt</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>

    <span class="c1"># Determine the critical value for the specified confidence level
</span>    <span class="n">critical_value</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">ppf</span><span class="p">((</span><span class="mi">1</span> <span class="o">+</span> <span class="n">confidence</span><span class="p">)</span> <span class="o">/</span> <span class="mi">2</span><span class="p">)</span>
    <span class="n">margin_of_error</span> <span class="o">=</span> <span class="n">critical_value</span> <span class="o">*</span> <span class="n">standard_error</span>

    <span class="n">lower_bound</span> <span class="o">=</span> <span class="n">sample_mean</span> <span class="o">-</span> <span class="n">margin_of_error</span>
    <span class="n">upper_bound</span> <span class="o">=</span> <span class="n">sample_mean</span> <span class="o">+</span> <span class="n">margin_of_error</span>

    <span class="k">return</span> <span class="n">lower_bound</span><span class="p">,</span> <span class="n">upper_bound</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>
<p><br />
Let’s compare the results with scipy implementations.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre><span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random_sample</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>

<span class="k">print</span><span class="p">(</span><span class="s">"defined functions:"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"tstat:</span><span class="se">\t</span><span class="s">"</span><span class="p">,</span> <span class="n">confidence_interval_t</span><span class="p">(</span><span class="n">sample</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"norm:</span><span class="se">\t</span><span class="s">"</span><span class="p">,</span> <span class="n">confidence_interval_norm</span><span class="p">(</span><span class="n">sample</span><span class="p">))</span>

<span class="k">print</span><span class="p">(</span><span class="s">"scipy functions:"</span><span class="p">)</span>
<span class="n">interval</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
    <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">,</span> <span class="n">df</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
<span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"tstat:</span><span class="se">\t</span><span class="s">"</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span>

<span class="n">interval</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span><span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">))</span>
<span class="k">print</span><span class="p">(</span><span class="s">"norm:</span><span class="se">\t</span><span class="s">"</span><span class="p">,</span> <span class="n">interval</span><span class="p">)</span>

<span class="c1"># defined functions:
# tstat: (0.2756144976802315, 0.7458592632198344)
# norm:	 (0.3070236240157737, 0.7144501368842922)
# scipy functions:
# tstat: (0.2756144976802315, 0.7458592632198344)
# norm:	 (0.3070236240157737, 0.7144501368842922)</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>The results match. In the following sections, we will use the scipy functions.</p>

<h3 id="ci-t-distribution-vs-ci-z-distribution">CI with t-Distribution vs. CI with Z-Distribution</h3>

<p>We will verify that the confidence intervals from the two methods converge as the sample size increases. We will try both on samples drawn from the following distributions:</p>

<ul>
  <li>Uniform</li>
  <li>Standard normal</li>
  <li>Poisson</li>
</ul>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
</pre></td><td class="code"><pre><span class="n">sample_sizes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">sample_means</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">t_interval_95_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_95_r</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">norm_interval_95_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">norm_interval_95_r</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">sample_size</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">5</span><span class="p">):</span>
    <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">random_sample</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
    <span class="n">t_interval_95</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
        <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">,</span> <span class="n">df</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">norm_interval_95</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">norm</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
        <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">sample_sizes</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
    <span class="n">sample_means</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">))</span>
    <span class="n">t_interval_95_l</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval_95</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">t_interval_95_r</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval_95</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="n">norm_interval_95_l</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">norm_interval_95</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">norm_interval_95_r</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">norm_interval_95</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
    <span class="c1"># print(sample_size, t_interval_95, norm_interval_95)
</span>
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span>
    <span class="n">sample_sizes</span><span class="p">,</span> <span class="n">t_interval_95_l</span><span class="p">,</span> <span class="n">t_interval_95_r</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"b"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.4</span>
<span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span>
    <span class="n">sample_sizes</span><span class="p">,</span> <span class="n">norm_interval_95_l</span><span class="p">,</span> <span class="n">norm_interval_95_r</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"r"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.4</span>
<span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Sample size"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"CI upper and lower bounds"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span><span class="s">"Uniform Distribution"</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>Replace <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.random_sample.html"><code class="language-plaintext highlighter-rouge">np.random.random_sample</code></a> with <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.standard_normal.html"><code class="language-plaintext highlighter-rouge">np.random.standard_normal</code></a> and <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.poisson.html"><code class="language-plaintext highlighter-rouge">np.random.poisson</code></a> to get standard normal and the Poisson random samples.</p>

<p>Here are the results:</p>

<figure class="third t_vs_std_norm gallery-popup">
  
  
  <a href="/assets/2024-09/t_norm_ci_uniform_sample.png" title="Blue part is t-distribution. Red part is standard normal distribution. For each sample drawn from a uniform distribution, the CI bounds are plotted. Blue is wider than red and then they merge as sample size gets large enough.">
    <img src="/assets/2024-09/t_norm_ci_uniform_sample.png" alt="" />
  </a>
  
  
  <a href="/assets/2024-09/t_norm_ci_normal_sample.png" title="Blue part is t-distribution. Red part is standard normal distribution. For each sample drawn from a normal distribution, the CI bounds are plotted. Blue is wider than red and then they merge as sample size gets large enough.">
    <img src="/assets/2024-09/t_norm_ci_normal_sample.png" alt="" />
  </a>
  
  
  <a href="/assets/2024-09/t_norm_ci_poisson_sample.png" title="Blue part is t-distribution. Red part is standard normal distribution. For each sample drawn from a Poisson distribution, the CI bounds are plotted. Blue is wider than red and then they merge as sample size gets large enough.">
    <img src="/assets/2024-09/t_norm_ci_poisson_sample.png" alt="" />
  </a>
  
  
  <figcaption></figcaption>
  
</figure>

<p>In each figure (click to zoom), the blue region corresponds to the t-distribution-based CI, and the red region to the standard-normal-based CI. We can observe that:</p>

<ol>
  <li>t-distribution-based CIs are wider than standard-normal-based CIs.</li>
  <li>As the sample size increases, both converge.</li>
</ol>

<h3 id="ci-width-simulations">CI Width Simulations</h3>

<p>Let’s visualise how the CI width changes with different factors: confidence level, sample size, and standard deviation or variance.</p>

<h4 id="confidence-level---gamma">Confidence Level - \(\gamma\)</h4>

<p>The width of the CI increases as the confidence level increases. Intuition: to be more confident that the interval contains the parameter, we must widen the range between the lower and upper limits.</p>

<details collapse="">
<summary>Code</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
</pre></td><td class="code"><pre><span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">standard_normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
<span class="n">sample_mean</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
<span class="n">sample_sem</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>

<span class="n">cis</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_r</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">ci</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">5</span><span class="p">):</span>
    <span class="n">ci</span> <span class="o">=</span> <span class="n">ci</span> <span class="o">*</span> <span class="mf">0.01</span>
    <span class="n">t_interval</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
        <span class="n">confidence</span><span class="o">=</span><span class="n">ci</span><span class="p">,</span> <span class="n">df</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">sample_mean</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">sample_sem</span>
    <span class="p">)</span>
    <span class="n">cis</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">ci</span><span class="p">)</span>
    <span class="n">t_interval_l</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">t_interval_r</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span><span class="n">cis</span><span class="p">,</span> <span class="n">t_interval_l</span><span class="p">,</span> <span class="n">t_interval_r</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"g"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.4</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"CI level"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"CI upper and lower bounds"</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<figure class=" ci_width_ci gallery-popup">
  
  
  <a href="/assets/2024-09/ci_level_with_ci.png" title="As the confidence level increases from 0 to 100%, the confidence interval widens. When the confidence level is 0, there will not be any CI. When the confidence level is 100%, the CI will contain all the data.">
    <img src="/assets/2024-09/ci_level_with_ci.png" alt="" />
  </a>
  
  
  <figcaption></figcaption>
  
</figure>

<h4 id="sample-size">Sample Size</h4>

<p>The width of the CI shrinks as the sample size increases. Intuition: a larger sample gives a more confident estimate of the normal distribution's parameters, and thus the confidence interval narrows.</p>

<details collapse="">
<summary>Code</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
</pre></td><td class="code"><pre><span class="n">sample_sizes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">sample_means</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">t_interval_95_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_95_r</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">norm_interval_95_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">norm_interval_95_r</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">sample_size</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">10000</span><span class="p">,</span> <span class="mi">10</span><span class="p">):</span>
    <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">standard_normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="n">sample_size</span><span class="p">)</span>
    <span class="n">t_interval_95</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
        <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">,</span> <span class="n">df</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">sample_sizes</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">sample_size</span><span class="p">)</span>
    <span class="n">sample_means</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">))</span>
    <span class="n">t_interval_95_l</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval_95</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">t_interval_95_r</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval_95</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span>
    <span class="n">sample_sizes</span><span class="p">,</span> <span class="n">t_interval_95_l</span><span class="p">,</span> <span class="n">t_interval_95_r</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"r"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.4</span>
<span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Sample size"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"CI upper and lower bounds"</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<figure class=" ci_width_sample_size gallery-popup">
  
  
  <a href="/assets/2024-09/ci_vs_sample_size.png" title="When sample size increases, the confidence interval becomes narrow and more centered around mean, 0.0.">
    <img src="/assets/2024-09/ci_vs_sample_size.png" alt="" />
  </a>
  
  
  <figcaption></figcaption>
  
</figure>

<h4 id="standard-deviation-variance">Standard Deviation (Variance)</h4>

<p>As with the confidence level, increasing the variance widens the CI: a larger variance means more dispersion in the data, which leads to a wider interval.</p>

<details collapse="">
<summary>Code</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="code"><pre><span class="n">stds</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">sample_means</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">t_interval_95_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_95_r</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">norm_interval_95_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">norm_interval_95_r</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">std</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">200</span><span class="p">,</span> <span class="mi">1</span><span class="p">):</span>
    <span class="n">std</span> <span class="o">=</span> <span class="n">std</span> <span class="o">*</span> <span class="mf">0.01</span>
    <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">std</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">1000</span><span class="p">)</span>
    <span class="n">t_interval_95</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
        <span class="n">confidence</span><span class="o">=</span><span class="mf">0.95</span><span class="p">,</span> <span class="n">df</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span> <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">stds</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">std</span><span class="p">)</span>
    <span class="n">t_interval_95_l</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval_95</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">t_interval_95_r</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval_95</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">fill_between</span><span class="p">(</span><span class="n">stds</span><span class="p">,</span> <span class="n">t_interval_95_l</span><span class="p">,</span> <span class="n">t_interval_95_r</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"r"</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.4</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_xlabel</span><span class="p">(</span><span class="s">"Standard Deviation"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"CI upper and lower bounds"</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<figure class=" ci_width_std gallery-popup">
  
  
  <a href="/assets/2024-09/ci_vs_stddev.png" title="The x-axis shows the standard deviation of a normal distribution going from 0 to 2. As the standard deviation increases (meaning more vairance in the data,) the 95% confidence interval on the sample widens, and thus, more unreliable.">
    <img src="/assets/2024-09/ci_vs_stddev.png" alt="" />
  </a>
  
  
  <figcaption></figcaption>
  
</figure>

<h3 id="probability-matching">Probability Matching</h3>

<p>The coverage probability will not always equal the nominal coverage probability; when the two match, we have probability matching. In the figure below, 7 out of 100 confidence intervals do not contain the true mean (black). This gives a coverage of 93%, which differs from the nominal 95%, so there is no probability matching.</p>

<details collapse="">
<summary>Code</summary>


<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
</pre></td><td class="code"><pre><span class="n">population</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">standard_normal</span><span class="p">(</span><span class="n">size</span><span class="o">=</span><span class="mi">100000</span><span class="p">)</span>

<span class="n">ci_ids</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_l</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">t_interval_r</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">ci_contains_true_value</span> <span class="o">=</span> <span class="p">[]</span>

<span class="n">ci_level</span> <span class="o">=</span> <span class="mf">0.95</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">100</span><span class="p">):</span>
    <span class="n">sample</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">choice</span><span class="p">(</span><span class="n">population</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="mi">10000</span><span class="p">,</span> <span class="n">replace</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
    <span class="n">t_interval</span> <span class="o">=</span> <span class="n">st</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">interval</span><span class="p">(</span>
        <span class="n">confidence</span><span class="o">=</span><span class="n">ci_level</span><span class="p">,</span>
        <span class="n">df</span><span class="o">=</span><span class="nb">len</span><span class="p">(</span><span class="n">sample</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span>
        <span class="n">loc</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">mean</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span>
        <span class="n">scale</span><span class="o">=</span><span class="n">st</span><span class="p">.</span><span class="n">sem</span><span class="p">(</span><span class="n">sample</span><span class="p">),</span>
    <span class="p">)</span>
    <span class="n">ci_ids</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
    <span class="n">t_interval_l</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
    <span class="n">t_interval_r</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">t_interval</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>

    <span class="k">if</span> <span class="n">t_interval</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">&lt;=</span> <span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">t_interval</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
        <span class="n">ci_contains_true_value</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">ci_contains_true_value</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

<span class="n">cols</span> <span class="o">=</span> <span class="p">[</span><span class="s">"g"</span> <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">else</span> <span class="s">"red"</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">ci_contains_true_value</span><span class="p">]</span>
<span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">vlines</span><span class="p">(</span><span class="n">ci_ids</span><span class="p">,</span> <span class="n">t_interval_l</span><span class="p">,</span> <span class="n">t_interval_r</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="n">cols</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.3</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">axhline</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"black"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_ylabel</span><span class="p">(</span><span class="s">"CI upper and lower bounds"</span><span class="p">)</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_xticks</span><span class="p">([])</span>
<span class="n">_</span> <span class="o">=</span> <span class="n">ax</span><span class="p">.</span><span class="n">set_title</span><span class="p">(</span>
    <span class="sa">f</span><span class="s">"Coverage = </span><span class="si">{</span><span class="nb">sum</span><span class="p">(</span><span class="n">ci_contains_true_value</span><span class="p">)</span> <span class="o">*</span> <span class="mi">100</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">ci_contains_true_value</span><span class="p">)</span><span class="si">}</span><span class="s">%"</span>
    <span class="sa">f</span><span class="s">" (Nominal Coverage = </span><span class="si">{</span><span class="n">ci_level</span><span class="o">*</span><span class="mi">100</span><span class="si">}</span><span class="s">%)"</span>
<span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<figure class=" coverage_probab gallery-popup">
  
  
  <a href="/assets/2024-09/coverage_probability_2.png" title="Each line represents a 95% CI on a random sample. Out of 100 confidence intervals, 7 CIs (marked in red) do not contain the true mean (black line.) Thus, we get a coverage of 93%, which is not the same as 95%, hence no probability matching.">
    <img src="/assets/2024-09/coverage_probability_2.png" alt="" />
  </a>
  
  
  <figcaption></figcaption>
  
</figure>

<h3 id="conclusion">Conclusion</h3>

<p>A confidence interval is an interval built from a sample that is expected to contain the distribution parameter we are trying to estimate (e.g., the mean). "Expected" means that not every CI will actually contain the mean. Relative to the population mean, a sample can fall into one of three cases:</p>

<ol>
  <li>It does not contain the mean</li>
  <li>It contains the mean somewhere in the middle</li>
  <li>It contains the mean, but only as an outlier</li>
</ol>

<p>Since the confidence interval is built from this sample using the normal distribution, the CI may fail to contain the mean in the 1st and 3rd scenarios. That is why we take a confidence level of 95% or more: it handles the 3rd scenario (demonstrated in the simulation section).</p>

<p>Since narrower confidence intervals are more informative (and more reliable), we should try to</p>

<ul>
  <li>take higher confidence levels (95% or more);</li>
  <li>have a bigger sample size; and</li>
  <li>have less variance in the data.</li>
</ul>
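<p>These three effects can be sketched together with the same <code class="language-plaintext highlighter-rouge">scipy.stats.t.interval</code> call used throughout this post (a minimal illustration, not code from the simulations above; the sample names are made up for this sketch):</p>

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)

def ci_width(sample, confidence=0.95):
    # Width of the t-based confidence interval around the sample mean.
    lo, hi = st.t.interval(
        confidence=confidence,
        df=len(sample) - 1,
        loc=np.mean(sample),
        scale=st.sem(sample),
    )
    return hi - lo

small = rng.normal(loc=0, scale=1, size=100)
large = rng.normal(loc=0, scale=1, size=10_000)
noisy = rng.normal(loc=0, scale=5, size=100)

# Larger sample -> narrower CI; more variance -> wider CI;
# higher confidence level -> wider CI.
print(ci_width(large) < ci_width(small))              # True
print(ci_width(noisy) > ci_width(small))              # True
print(ci_width(small, 0.99) > ci_width(small, 0.95))  # True
```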

<p><br /></p>]]></content><author><name>Shivam Rana</name></author><category term="ML" /><summary type="html"><![CDATA[Confidence Interval (CI)]]></summary></entry><entry><title type="html">Lognormal to Normal Distribution</title><link href="https://trigonaminima.github.io/2024/01/lognormal-to-normal/" rel="alternate" type="text/html" title="Lognormal to Normal Distribution" /><published>2024-01-14T00:00:00+00:00</published><updated>2024-01-14T00:00:00+00:00</updated><id>https://trigonaminima.github.io/2024/01/lognormal-to-normal</id><content type="html" xml:base="https://trigonaminima.github.io/2024/01/lognormal-to-normal/"><![CDATA[<p>The Normal and lognormal distributions are fundamental concepts in statistics. I recently used the relationship between these two distributions in a project. In this blog post, I want to share what I learned.</p>

<p>Outline</p>

<ol>
  <li><a href="#dist">Normal &amp; Lognormal Distributions</a></li>
  <li><a href="#log2normal">Lognormal to Normal</a></li>
  <li><a href="#normal2log">Normal to Lognormal</a></li>
  <li><a href="#conclusion">Conclusion</a></li>
</ol>

<h2 id="normal--lognormal-distributions">Normal &amp; Lognormal Distributions<a name="dist"></a></h2>

<p>The normal distribution is also called the bell curve or Gaussian distribution. The position of the bell's peak marks the mean, and the width of the bell's base represents the spread of values (the standard deviation). Thus, the shape changes as we change mu (\(\mu\)) and sigma (\(\sigma\)): \(\mu\) is the mean or average of the sample, and \(\sigma\) is the standard deviation. We denote a normal distribution as:</p>

\[{\mathcal {N}}(\mu ,\sigma ^{2})\]

<p>Find more details about the normal distribution on <a href="https://en.wikipedia.org/wiki/Normal_distribution">Wikipedia</a>. Here are two ways of defining a normal distribution in Python.</p>

<ul>
  <li>Using python stdlib</li>
</ul>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre><span class="kn">from</span> <span class="nn">statistics</span> <span class="kn">import</span> <span class="n">NormalDist</span>
<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="mi">5</span>
<span class="n">norm_dist</span> <span class="o">=</span> <span class="n">NormalDist</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

<ul>
  <li>Using scipy</li>
</ul>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="mi">5</span>
<span class="n">norm_dist</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p><br /></p>

<p>We get a lognormal distribution when we exponentiate a normally distributed variable. The result is a lopsided curve with a longer tail on the right side, where larger values occur. We denote the lognormal distribution as follows:</p>

\[{\displaystyle \ X\sim \operatorname {Lognormal} \left(\ \mu _{x},\sigma _{x}^{2}\ \right)\ }\]

<p>Since the log of the lognormal distribution is a normal distribution, we can denote the relationship as follows:</p>

\[{\displaystyle \ln(X)\sim {\mathcal {N}}(\mu ,\sigma ^{2})}\]

<p>Find more details about the lognormal distribution on <a href="https://en.wikipedia.org/wiki/Log-normal_distribution">Wikipedia</a>. We define a lognormal distribution in Python as follows. The Python stdlib does not have a lognormal implementation.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="mi">5</span>
<span class="n">norm_dist</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">mu</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>

<p>Note: the <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.lognorm.html"><code class="language-plaintext highlighter-rouge">scipy.stats.lognorm</code></a> takes mu and sigma of the underlying <em>normal distribution</em> from which we derive the lognormal distribution. While providing the <code class="language-plaintext highlighter-rouge">scale</code> parameter, we take the exponentiation of the mean of the normal distribution. I found the documentation inadequate in explaining the parameters. This <a href="https://stackoverflow.com/q/8870982/2650427">SO question</a> has answers that discuss the meaning of the parameters.</p>
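<p>One way to sanity-check this parameterisation (a quick sketch, not part of the original discussion) is to draw lognormal samples and confirm that their logs recover the underlying normal's mu and sigma:</p>

```python
import numpy as np
import scipy.stats as stats

mu, sigma = 5, 0.5
lognorm_dist = stats.lognorm(s=sigma, scale=np.exp(mu))

# ln(X) of lognormal samples should look like N(mu, sigma^2).
logs = np.log(lognorm_dist.rvs(size=200_000, random_state=0))
print(np.mean(logs))  # close to 5.0
print(np.std(logs))   # close to 0.5
```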

<p><br />
Here is how both the distributions look for the same mu (\(\mu\)) and sigma (\(\sigma\)).</p>

<details close="">
<summary>Code to generate the below plot.</summary>


<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>


<span class="c1"># all distributions
</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="mi">5</span>
<span class="n">norm_d1</span> <span class="o">=</span> <span class="n">NormalDist</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">lognorm_d1</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">mu</span><span class="p">))</span>
<span class="n">lognorm_d1</span><span class="p">.</span><span class="n">mu</span><span class="p">,</span> <span class="n">lognorm_d1</span><span class="p">.</span><span class="n">sigma</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span>

<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">1</span>
<span class="n">norm_d2</span> <span class="o">=</span> <span class="n">NormalDist</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">lognorm_d2</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">mu</span><span class="p">))</span>
<span class="n">lognorm_d2</span><span class="p">.</span><span class="n">mu</span><span class="p">,</span> <span class="n">lognorm_d2</span><span class="p">.</span><span class="n">sigma</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span>

<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">4</span><span class="p">,</span> <span class="mf">0.3</span>
<span class="n">norm_d3</span> <span class="o">=</span> <span class="n">NormalDist</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">lognorm_d3</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">mu</span><span class="p">))</span>
<span class="n">lognorm_d3</span><span class="p">.</span><span class="n">mu</span><span class="p">,</span> <span class="n">lognorm_d3</span><span class="p">.</span><span class="n">sigma</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span>

<span class="c1"># norm y
</span><span class="n">norm_x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">norm_y1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">norm_d1</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">norm_x</span><span class="p">])</span>
<span class="n">norm_y2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">norm_d2</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">norm_x</span><span class="p">])</span>
<span class="n">norm_y3</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">norm_d3</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">norm_x</span><span class="p">])</span>

<span class="c1"># lognorm y (separate grid so the normal plot keeps its own x-axis)
</span><span class="n">lognorm_x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">800</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">lognorm_y1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">lognorm_d1</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lognorm_x</span><span class="p">])</span>
<span class="n">lognorm_y2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">lognorm_d2</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lognorm_x</span><span class="p">])</span>
<span class="n">lognorm_y3</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">lognorm_d3</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">lognorm_x</span><span class="p">])</span>


<span class="c1"># Set the figsize
</span><span class="n">fig1</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">norm_x</span><span class="p">,</span> <span class="n">norm_y1</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">norm_d1</span><span class="p">.</span><span class="n">mean</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">norm_d1</span><span class="p">.</span><span class="n">stdev</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">norm_x</span><span class="p">,</span> <span class="n">norm_y2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">norm_d2</span><span class="p">.</span><span class="n">mean</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">norm_d2</span><span class="p">.</span><span class="n">stdev</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">norm_x</span><span class="p">,</span> <span class="n">norm_y3</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">norm_d3</span><span class="p">.</span><span class="n">mean</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">norm_d3</span><span class="p">.</span><span class="n">stdev</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="n">fig2</span><span class="p">,</span> <span class="n">ax2</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">lognorm_x</span><span class="p">,</span> <span class="n">lognorm_y1</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">lognorm_d1</span><span class="p">.</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">lognorm_d1</span><span class="p">.</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">lognorm_x</span><span class="p">,</span> <span class="n">lognorm_y2</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">lognorm_d2</span><span class="p">.</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">lognorm_d2</span><span class="p">.</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">lognorm_x</span><span class="p">,</span> <span class="n">lognorm_y3</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">lognorm_d3</span><span class="p">.</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">lognorm_d3</span><span class="p">.</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>

<span class="n">fig1</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'norm_dist.svg'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">1200</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">'tight'</span><span class="p">)</span>
<span class="n">fig2</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'lognorm_dist.svg'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">1200</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">'tight'</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>


For the normal distribution, instead of using <code>NormalDist.pdf()</code>, we can also use <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html"><code>numpy.random.Generator.normal</code></a> to draw a sample and plot a histogram of it. Similarly, for the lognormal distribution, instead of <code>stats.lognorm.pdf()</code>, we can use <a href="https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.lognormal.html"><code>numpy.random.Generator.lognormal</code></a>.

</details>
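<p>A minimal sketch of that sampling-based alternative (the seed and parameter values below are illustrative):</p>

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed fixed only for reproducibility

mu, sigma = 5, 0.5
norm_samples = rng.normal(mu, sigma, 10_000)
lognorm_samples = rng.lognormal(mu, sigma, 10_000)

# sample statistics approximate the distribution parameters
print(norm_samples.mean(), norm_samples.std())
# lognormal samples are strictly positive
print(lognorm_samples.min())
```

<p>A density-normalised histogram of these samples (e.g. <code>plt.hist(norm_samples, bins=50, density=True)</code>) approximates the pdf curves plotted above.</p>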

<figure class="half distributions gallery-popup">
  
  
  <a href="/assets/2024-01/norm_dist.svg" title="">
    <img src="/assets/2024-01/norm_dist.svg" alt="" />
  </a>
  
  
  <a href="/assets/2024-01/lognorm_dist.svg" title="">
    <img src="/assets/2024-01/lognorm_dist.svg" alt="" />
  </a>
  
  
  <figcaption>Normal and lognormal distributions with different mu and sigma.</figcaption>
  
</figure>

<h2 id="lognormal-to-normal">Lognormal to Normal<a name="log2normal"></a></h2>

<p>As mentioned in the previous section, taking the logarithm of a lognormally distributed variable gives a normally distributed one. So, if \({\displaystyle X\sim \operatorname {Lognormal} \left(\mu ,\sigma ^{2} \right)}\), then \({\displaystyle \ln(X)\sim {\mathcal {N}}(\mu ,\sigma ^{2})}\).</p>

<p>Let us understand this by code.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">default_rng</span><span class="p">()</span>

<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="mi">5</span>
<span class="n">lognorm_samples</span> <span class="o">=</span> <span class="n">rng</span><span class="p">.</span><span class="n">lognormal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
<span class="c1"># take the log of lognorm samples to derive the normal dist.
</span><span class="n">norm_samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">lognorm_samples</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">norm_samples</span><span class="p">.</span><span class="n">mean</span><span class="p">(),</span> <span class="n">norm_samples</span><span class="p">.</span><span class="n">std</span><span class="p">())</span>
</pre></td></tr></tbody></table></code></pre></figure>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>5.005339216906491 0.4934326302969564
</code></pre></div></div>

<p>The parameters (mean and std) of the derived normal distribution (line 8) closely match the original parameters we provided to the lognormal dist (line 6).</p>

<details close="">
<summary>Code to generate the below plots</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="code"><pre><span class="c1"># log normal dist
</span><span class="n">fig1</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">lognorm_samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"orange"</span><span class="p">)</span>

<span class="n">x1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">800</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">lognorm_d</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">mu</span><span class="p">))</span>
<span class="n">lognorm_y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">lognorm_d</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">x1</span><span class="p">])</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">lognorm_y</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="c1"># normal dist
</span><span class="n">fig2</span><span class="p">,</span> <span class="n">ax2</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">norm_samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"orange"</span><span class="p">)</span>

<span class="n">x2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">norm_d</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">norm_y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">norm_d</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">x2</span><span class="p">])</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x2</span><span class="p">,</span> <span class="n">norm_y</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">fig1</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'lognorm_dist2.svg'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">1200</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">'tight'</span><span class="p">)</span>
<span class="n">fig2</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'norm_dist2.svg'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">1200</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">'tight'</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<figure class="half lognormal_to_normal gallery-popup">
  
  
  <a href="/assets/2024-01/lognorm_dist2.svg" title="">
    <img src="/assets/2024-01/lognorm_dist2.svg" alt="" />
  </a>
  
  
  <a href="/assets/2024-01/norm_dist2.svg" title="">
    <img src="/assets/2024-01/norm_dist2.svg" alt="" />
  </a>
  
  
  <figcaption>Lognormal to Normal conversion.</figcaption>
  
</figure>

<p>Conclusion: to convert from a lognormal to a normal distribution, take the logarithm of the lognormal sample.</p>

<h2 id="normal-to-lognormal">Normal to Lognormal<a name="normal2log"></a></h2>

<p>If the logarithm of a lognormal variable is normally distributed, the reverse also holds: exponentiating a normal variable gives a lognormal one. In notation, if \({\displaystyle Y\sim {\mathcal {N}}(\mu ,\sigma ^{2})}\), then \({\displaystyle \exp(Y)\sim \operatorname {Lognormal} \left(\mu ,\sigma ^{2} \right)}\).</p>

<p>Let’s again understand this through code.</p>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="code"><pre><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">scipy.stats</span> <span class="k">as</span> <span class="n">stats</span>

<span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">default_rng</span><span class="p">()</span>

<span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span> <span class="o">=</span> <span class="mi">5</span><span class="p">,</span> <span class="p">.</span><span class="mi">5</span>
<span class="n">norm_samples</span> <span class="o">=</span> <span class="n">rng</span><span class="p">.</span><span class="n">normal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>

<span class="c1"># take the exp of norm samples to derive the lognormal dist.
</span><span class="n">lognorm_samples</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">norm_samples</span><span class="p">)</span>

<span class="c1"># fit a lognorm distribution to get the mean and std dev
</span><span class="n">shape</span><span class="p">,</span> <span class="n">loc</span><span class="p">,</span> <span class="n">scale</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">lognorm_samples</span><span class="p">)</span>
<span class="n">mean</span><span class="p">,</span> <span class="n">stddev</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">log</span><span class="p">(</span><span class="n">scale</span><span class="p">),</span> <span class="n">shape</span>
<span class="k">print</span><span class="p">(</span><span class="n">mean</span><span class="p">,</span> <span class="n">stddev</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>4.984256782660331 0.5067622675605842
</code></pre></div></div>

<p>The parameters (mean and std) of the derived lognormal distribution (line 10) closely match the original parameters we provided to the normal dist (line 7). Note that we used the <code class="language-plaintext highlighter-rouge">scipy.stats.lognorm.fit</code> method to fit the lognorm distribution on the data. It returns three parameters: <code class="language-plaintext highlighter-rouge">shape</code>, <code class="language-plaintext highlighter-rouge">loc</code> and <code class="language-plaintext highlighter-rouge">scale</code>. The <code class="language-plaintext highlighter-rouge">shape</code> is the standard deviation of the underlying normal distribution, and taking the logarithm of the <code class="language-plaintext highlighter-rouge">scale</code> gives its mean. We did not need this step when converting the lognormal to a normal distribution (previous section), because there we could compute the mean and std directly from the log of the samples. Read this <a href="https://stackoverflow.com/a/8748722/2650427">SO answer</a> for more details.</p>
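<p>One caveat, as a hedged sketch: on sampled data, letting <code>fit</code> also estimate <code>loc</code> can drift away from the two-parameter lognormal used in this post. Fixing <code>floc=0</code> recovers it directly (the seed and sample size below are illustrative):</p>

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(seed=42)  # seed fixed only for reproducibility

mu, sigma = 5, 0.5
lognorm_samples = np.exp(rng.normal(mu, sigma, 10_000))

# fix loc at 0 so the fit matches the two-parameter lognormal
shape, loc, scale = stats.lognorm.fit(lognorm_samples, floc=0)
mean, stddev = np.log(scale), shape
print(mean, stddev)  # close to the original mu = 5 and sigma = 0.5
```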

<details close="">
<summary>Code to generate the below plots</summary>

<figure class="highlight"><pre><code class="language-python" data-lang="python"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</pre></td><td class="code"><pre><span class="c1"># normal dist
</span><span class="n">fig1</span><span class="p">,</span> <span class="n">ax1</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">norm_samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"orange"</span><span class="p">)</span>

<span class="n">x1</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">7</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">norm_d</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">norm</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span>
<span class="n">norm_y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">norm_d</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">x1</span><span class="p">])</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">norm_y</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax1</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="c1"># lognormal dist
</span><span class="n">fig2</span><span class="p">,</span> <span class="n">ax2</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="mi">3</span><span class="p">))</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">lognorm_samples</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span><span class="p">,</span> <span class="n">density</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">"orange"</span><span class="p">)</span>

<span class="n">x2</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">800</span><span class="p">,</span> <span class="mi">500</span><span class="p">)</span>
<span class="n">lognorm_d</span> <span class="o">=</span> <span class="n">stats</span><span class="p">.</span><span class="n">lognorm</span><span class="p">(</span><span class="n">s</span><span class="o">=</span><span class="n">sigma</span><span class="p">,</span> <span class="n">scale</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">mu</span><span class="p">))</span>
<span class="n">lognorm_y</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">lognorm_d</span><span class="p">.</span><span class="n">pdf</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">x2</span><span class="p">])</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x2</span><span class="p">,</span> <span class="n">lognorm_y</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s">"mu = </span><span class="si">{</span><span class="n">mu</span><span class="si">}</span><span class="s">; sigma = </span><span class="si">{</span><span class="n">sigma</span><span class="si">}</span><span class="s">"</span><span class="p">)</span>
<span class="n">ax2</span><span class="p">.</span><span class="n">legend</span><span class="p">()</span>

<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
<span class="n">fig1</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'norm_dist3.svg'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">1200</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">'tight'</span><span class="p">)</span>
<span class="n">fig2</span><span class="p">.</span><span class="n">savefig</span><span class="p">(</span><span class="s">'lognorm_dist3.svg'</span><span class="p">,</span> <span class="nb">format</span><span class="o">=</span><span class="s">'svg'</span><span class="p">,</span> <span class="n">dpi</span><span class="o">=</span><span class="mi">1200</span><span class="p">,</span> <span class="n">bbox_inches</span><span class="o">=</span><span class="s">'tight'</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>

</details>

<figure class="half normal_to_lognormal gallery-popup">
  
  
  <a href="/assets/2024-01/norm_dist3.svg" title="">
    <img src="/assets/2024-01/norm_dist3.svg" alt="" />
  </a>
  
  
  <a href="/assets/2024-01/lognorm_dist3.svg" title="">
    <img src="/assets/2024-01/lognorm_dist3.svg" alt="" />
  </a>
  
  
  <figcaption>Normal to Lognormal conversion.</figcaption>
  
</figure>

<p>Conclusion: to convert from a normal to a lognormal distribution, take the exponential of the normal sample.</p>

<h2 id="conclusion">Conclusion<a name="conclusion"></a></h2>

<p>We started with the Normal and Lognormal distributions and their definitions in Python, then converted each distribution into the other. It took me some effort to figure out how to do the conversion; with this post, I have tried to clear up the confusion.</p>

<p>If you are interested in how other distributions look, your search is over. This <a href="https://stackoverflow.com/q/37559470/2650427">SO answer</a> has visualisations of all the distributions available in <a href="http://docs.scipy.org/doc/scipy/reference/stats.html">scipy.stats</a>.</p>

<p><strong>Update: 18th Jan</strong>: Someone asked me the following question on reddit.</p>

<blockquote>
  <p>For what purpose are you converting between normal and lognormal? The two functions share the same parameters but thats about it. ln(data) is a non-destructive transformation but the process can obscure patterns just as often as it reveals them. Certain advanced statistical tests that require a normal distribution cannot necessarily have the results applied to the lognormal data.</p>
</blockquote>

<p>This stranger is correct that patterns can be obscured, or rather, that different patterns emerge after a log transformation. In my case, though, it did not matter.</p>

<p>I wanted to match customers with items that fall within their spending range. The formulation was: given the customer and outlet distributions, I can compute their overlap to get a <em>match percentage</em>, which is then applied on top of the relevance scores.</p>
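<p>As a sketch of that idea, the overlap of two normal distributions can be computed with <code>statistics.NormalDist.overlap</code>; the distribution parameters below are hypothetical, not from the actual project:</p>

```python
from statistics import NormalDist

import numpy as np

# hypothetical customer and outlet spend distributions (log scale)
customer = NormalDist(mu=5.0, sigma=0.5)
outlet = NormalDist(mu=5.3, sigma=0.4)

# NormalDist.overlap gives the shared area under the two pdfs (0 to 1)
match_pct = customer.overlap(outlet)

# cross-check with a simple Riemann sum over the pointwise minimum
x = np.linspace(2, 8, 20_000)
pdf_min = np.minimum([customer.pdf(i) for i in x], [outlet.pdf(i) for i in x])
numeric = pdf_min.sum() * (x[1] - x[0])

print(round(match_pct, 3), round(numeric, 3))
```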

<p>Looking at the customers’ spend history, I saw that it was lognormally distributed, and a similar trend held in the restaurants’ order history. Since computing the overlap in the production environment was easier with normal distributions, I was okay with the conversion. I will cover this in more detail in a future post.</p>]]></content><author><name>Shivam Rana</name></author><category term="ML" /><summary type="html"><![CDATA[The Normal and lognormal distributions are fundamental concepts in statistics. I recently used the relationship between these two distributions in a project. In this blog post, I want to share what I learned.]]></summary></entry></feed>