<template>
  <section name="288f" class="section section--body section--first section--last">
    <div class="section-content">
      <div class="section-inner sectionLayout--insetColumn"><h3 name="1e8a" id="1e8a" class="graf graf--h3 graf--leading graf--title">
        Minimising sampling error by discovering posts&nbsp;early.</h3><h4 name="698c" id="698c" class="graf graf--h4 graf-after--h3 graf--subtitle">
        <strong class="markup--strong markup--h4-strong">Using deep learning to predict the time posts are
          to be published</strong></h4>
        <p name="e08c" id="e08c" class="graf graf--p graf-after--h4">A necessary condition for the
          effective use of uniform sampling is to discover new posts in a timely manner. If a post is
          discovered late, then even perfect subsequent uniform sampling will not compensate for the
          post’s first likes being missed, and this can easily result in a double-digit error. Imagine
          gathering survey data in the US and completely ignoring the Pacific Time Zone, i.e. 1/3 of the
          country. A sample gathered in this way won’t be random and won’t represent the country’s real
          opinion.</p>
        <p name="4397" id="4397" class="graf graf--p graf-after--p">Because the number of
          likes grows non-linearly, and the growth is fastest at the beginning of the life of
          a post, any delay can be critical. A delay of only an hour may mean missing 20% of the
          total likes.</p>
        <p name="fd23" id="fd23" class="graf graf--p graf-after--p">The simplest way to combat delay is to
          check for new posts more frequently: once an hour, say, or once every half hour. But this is
          also the most uneconomical way to do it, given the quantity of queries to Instagram. Influencers
          are hardly likely to post while they are asleep; they may have a favorite time or day of the
          week to post. If we can grasp each influencer’s individual “timetable” and arrange our checks
          accordingly, we may be able to save substantially on queries.</p>
        <h3 name="46cd" id="46cd" class="graf graf--h3 graf-after--p">How influencers post</h3>
        <p name="30d9" id="30d9" class="graf graf--p graf-after--h3">To understand whether we have a chance
          of working out influencers’ behavior patterns, let’s visualize some of the data we have.</p><h4 name="67cd" id="67cd" class="graf graf--h4 graf-after--p">Intervals between posts depending
          on the time of&nbsp;day</h4>
        <figure name="4b3d" id="4b3d" class="graf graf--figure graf--layoutOutsetLeft graf-after--h4">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 525px; max-height: 427px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 62.4%;"></div>
            <img class="graf-image" data-image-id="1*Zba6p-8Ab9HLPXWTD-yvqg.png" data-width="845" data-height="527" data-is-featured="true" src="/img/articles/max_600_1_Zba6p-8Ab9HLPXWTD-yvqg.png"></div>
        </figure>
        <p name="77a5" id="77a5" class="graf graf--p graf-after--figure">There are clear 24-, 48- and
          72-hour peaks, corresponding to influencers who post strictly once every one, two or three days.
          Perhaps these are influencers who use scheduled posting services.</p>
        <figure name="baef" id="baef" class="graf graf--figure graf--layoutOutsetLeft graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 525px; max-height: 437px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 64.3%;"></div>
            <img class="graf-image" data-image-id="1*hzZaZyMokQZt2zzvfHwtUA.png" data-width="820" data-height="527" src="/img/articles/max_600_1_hzZaZyMokQZt2zzvfHwtUA.png"></div>
        </figure>
        <p name="17ad" id="17ad" class="graf graf--p graf-after--figure">On a smaller scale, we see peaks
          corresponding to posting once an hour or once every few hours. These probably also come from scheduled
          posting services.</p><h4 name="821c" id="821c" class="graf graf--h4 graf-after--p">Number of
          posts by time of&nbsp;day.</h4>
        <figure name="0080" id="0080" class="graf graf--figure graf--layoutOutsetLeft graf-after--h4">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 525px; max-height: 433px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 63.3%;"></div>
            <img class="graf-image" data-image-id="1*uYQt22eCBPZHpxhZE_dGYQ.png" data-width="832" data-height="527" src="/img/articles/max_600_1_uYQt22eCBPZHpxhZE_dGYQ.png"></div>
        </figure>
        <p name="64f9" id="64f9" class="graf graf--p graf-after--figure">Peaks and troughs in activity over
          the course of the day can easily be seen: these are connected to the natural daily rhythms of
          sleep and waking and to the difference in activity between working and non-working hours.</p>
        <p name="daa0" id="daa0" class="graf graf--p graf-after--p">Instagram’s audience is international,
          and is distributed across various time zones around the world. To get a better understanding of
          these peaks and troughs, let’s visualize the audience in different countries separately.</p>
      </div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="0d24" id="0d24" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 579px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 57.9%;"></div>
            <img class="graf-image" data-image-id="1*HoheQMpNx_F1b_9qZQEVlw.png" data-width="1484" data-height="859" src="/img/articles/max_1000_1_HoheQMpNx_F1b_9qZQEVlw.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="21a8" id="21a8" class="graf graf--p graf-after--figure">On each
        country’s graph, the daily cycle is even more pronounced. In some countries we also observe
        a surge in activity before the start of the working day and after it finishes.</p>
        <h3 name="ef80" id="ef80" class="graf graf--h3 graf-after--p">Detailed posting&nbsp;patterns</h3>
        <p name="14fd" id="14fd" class="graf graf--p graf-after--h3">Visual analysis of total activity shows
          that posts are not distributed uniformly through the 24-hour period: there are regular patterns.
          Now let’s turn from general analysis to a more detailed look at the ways individual influencers
          post.</p><h4 name="7e76" id="7e76" class="graf graf--h4 graf-after--p">Consistent
          influencers</h4>
        <p name="48a0" id="48a0" class="graf graf--p graf-after--h4">First let’s look at the “consistent”
          influencers — the ones with the smallest range of intervals between their posts. As we see, most
          of these influencers post exactly once in 24 hours, with some small variation.</p></div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="a9ea" id="a9ea" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 524px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 52.400000000000006%;"></div>
            <img class="graf-image" data-image-id="1*GnpWz59pa7M19kJvnmFNPQ.png" data-width="2665" data-height="1396" src="/img/articles/max_1000_1_GnpWz59pa7M19kJvnmFNPQ.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><h4 name="a263" id="a263" class="graf graf--h4 graf-after--figure">
        Inconsistent influencers</h4>
        <p name="08b2" id="08b2" class="graf graf--p graf-after--h4">The next graph shows the “inconsistent”
          influencers, who have a wide variation in the intervals between posts. These are mostly people who
          post in batches of two or three posts at a time, with big gaps between the batches. We see
          that the interval between posts varies cyclically, from a matter of seconds to several days.</p>
      </div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="d819" id="d819" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 510px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 51%;"></div>
            <img class="graf-image" data-image-id="1*Wuvk2FIPxewA6RVyOQyIfA.png" data-width="2736" data-height="1396" src="/img/articles/max_1000_1_Wuvk2FIPxewA6RVyOQyIfA.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><h4 name="2c06" id="2c06" class="graf graf--h4 graf-after--figure">
        Prolific influencers</h4>
        <p name="a8fe" id="a8fe" class="graf graf--p graf-after--h4">This graph shows the influencers who
          have the smallest intervals between their posts, i.e. the ones who post many times in an hour.
          These are mostly store accounts that post their catalogs to Instagram. We observe regular pauses
          between posting periods (single surges): evidently they only post during working hours, and take
          evenings and weekends off.</p></div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="47c6" id="47c6" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 515px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 51.5%;"></div>
            <img class="graf-image" data-image-id="1*zGX7pEnDF4856aHDzblWzA.png" data-width="2710" data-height="1396" src="/img/articles/max_1000_1_zGX7pEnDF4856aHDzblWzA.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><h4 name="22e3" id="22e3" class="graf graf--h4 graf-after--figure">
        Infrequent influencers</h4>
        <p name="54ca" id="54ca" class="graf graf--p graf-after--h4">This graph shows the influencers with
          the longest intervals between posts, i.e. the ones who post rarely. It is hard to trace any
          obvious regularities with this group.</p></div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="3b30" id="3b30" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 519px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 51.9%;"></div>
            <img class="graf-image" data-image-id="1*nHP39ijyeRt_ymyzIG2p6Q.png" data-width="2691" data-height="1396" src="/img/articles/max_1000_1_nHP39ijyeRt_ymyzIG2p6Q.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="426d" id="426d" class="graf graf--p graf-after--figure">So,
        we’ve seen on the one hand that posting patterns show a number of obvious regularities, and on the
        other hand that these regularities vary for different influencers — while there are a lot of
        influencers whose posting shows no pattern at all. Creating a set of rules by hand that would be
        equally suited to any influencer seems impossible. The only way of solving this problem within a
        reasonable timeframe is by using <em class="markup--em markup--p-em">machine learning</em>.</p>
        <h3 name="e71d" id="e71d" class="graf graf--h3 graf-after--p">Let’s try machine&nbsp;learning</h3><h4 name="2d5b" id="2d5b" class="graf graf--h4 graf-after--h3">A bit of&nbsp;theory</h4>
        <p name="198b" id="198b" class="graf graf--p graf-after--h4">The first question that needs to be
          asked when we’re developing a machine learning model is this: what do we actually want to
          forecast? At first glance it might seem that we need to predict the time when the next post will
          appear. That is, our model has to learn the function <strong class="markup--strong markup--p-strong"><em class="markup--em markup--p-em">F,</em></strong></p>
        <figure name="c7a6" id="c7a6" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 375px; max-height: 50px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 13.3%;"></div>
            <img class="graf-image" data-image-id="1*A8dTg0AKI-wMo1PvRKSFFQ.png" data-width="375" data-height="50" src="/img/articles/max_800_1_A8dTg0AKI-wMo1PvRKSFFQ.png"></div>
        </figure>
        <p name="b3f7" id="b3f7" class="graf graf--p graf-after--figure">where:</p>
        <figure name="6cb5" id="6cb5" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 133px; max-height: 38px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 28.599999999999998%;"></div>
            <img class="graf-image" data-image-id="1*QXgBukqDu1AhokH0LP2ObQ.png" data-width="133" data-height="38" src="/img/articles/max_800_1_QXgBukqDu1AhokH0LP2ObQ.png"></div>
          <figcaption class="imageCaption">the previous posting&nbsp;history</figcaption>
        </figure>
        <figure name="eefe" id="eefe" class="graf graf--figure graf-after--figure">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 75px; max-height: 42px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 56.00000000000001%;"></div>
            <img class="graf-image" data-image-id="1*OY-OtD0PQw6KRkbCdjcF3w.png" data-width="75" data-height="42" src="/img/articles/max_800_1_OY-OtD0PQw6KRkbCdjcF3w.png"></div>
          <figcaption class="imageCaption">the time of the next post, which is what we would like to
            forecast.
          </figcaption>
        </figure>
        <p name="60fe" id="60fe" class="graf graf--p graf-after--figure">This is actually a bad idea.
          Forecasting the exact time of the next post is like forecasting the toss of a coin. If it’s a
          fair coin, we know that on average our tosses will split 50/50 between heads and tails. But forecasting each specific
          toss is not possible. It’s the same situation with an influencer. Let’s say we know they prefer
          to post in the evening. But on any specific day maybe they won’t post at all, because they’re
          traveling; or maybe they’ll have other things to do, and end up posting later than usual; or
          maybe they’ll post twice in one go. To forecast the exact time, we would need to know a whole
          set of factors from influencers’ lives: we don’t have access to these factors, and even if we
          did, we wouldn’t use them because it would be a violation of their privacy.</p>
        <p name="15c5" id="15c5" class="graf graf--p graf-after--p">So it’s better to forecast the <em class="markup--em markup--p-em">probability</em> that an influencer will post at some
          arbitrary point in time. The probability of observing a given number of events within a fixed time period
          is described in the simplest case by the <a href="https://en.wikipedia.org/wiki/Poisson_distribution" data-href="https://en.wikipedia.org/wiki/Poisson_distribution" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">Poisson
            distribution</a>:</p>
        <blockquote name="ad44" id="ad44" class="graf graf--blockquote graf-after--p">Using a Poisson
          distribution presupposes that events take place independently of one another. In our case this
          may not be true: an influencer may decide to post exactly once per day, no more and no less, and
          in this event the probability of a post will be 0% if they have already posted today and 100% if
          they haven’t. So the probability depends on previous events. But, as we shall see, what matters
          is not the absolute probability but the <em class="markup--em markup--blockquote-em">intensity</em> value, and this will still be
          correct (intensity = 1 for one post a day).
        </blockquote>
        <figure name="4499" id="4499" class="graf graf--figure graf-after--blockquote">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 292px; max-height: 113px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 38.7%;"></div>
            <img class="graf-image" data-image-id="1*gZQ98T27ZZrSVFHCbZOgdA.png" data-width="292" data-height="113" src="/img/articles/max_800_1_gZQ98T27ZZrSVFHCbZOgdA.png"></div>
        </figure>
        <p name="1f20" id="1f20" class="graf graf--p graf-after--figure">where:</p>
        <p name="823d" id="823d" class="graf graf--p graf-after--p"><em class="markup--em markup--p-em">k </em>– the number of events per unit of time (in our case
          the events are posts, and the unit of time could be, say, a day)</p>
        <p name="5c3a" id="5c3a" class="graf graf--p graf-after--p">λ – the mathematical expectation, i.e.
          the average observed number of events per unit of time. This parameter is also called the <em class="markup--em markup--p-em">intensity</em>.</p>
        <p name="95eb" id="95eb" class="graf graf--p graf-after--p">Let’s say an influencer averages three
          posts a day (λ = 3). Then, according to the Poisson distribution, we obtain the following
          probabilities of observing <em class="markup--em markup--p-em">k </em>posts over one specific
          day:</p>
        <ul class="postList">
          <li name="669a" id="669a" class="graf graf--li graf-after--p">probability of seeing 0 (no)
            posts:
          </li>
        </ul>
        <figure name="af90" id="af90" class="graf graf--figure graf-after--li">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 110px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15.7%;"></div>
            <img class="graf-image" data-image-id="1*s5GOLEP5eEKcrPoB9vdFFA.png" data-width="721" data-height="113" src="/img/articles/max_800_1_s5GOLEP5eEKcrPoB9vdFFA.png"></div>
        </figure>
        <ul class="postList">
          <li name="6a94" id="6a94" class="graf graf--li graf-after--figure">probability of seeing 1
            post:
          </li>
        </ul>
        <figure name="ed39" id="ed39" class="graf graf--figure graf-after--li">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 109px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15.6%;"></div>
            <img class="graf-image" data-image-id="1*ravTLiNIfg4rYaGaLmyHqw.png" data-width="725" data-height="113" src="/img/articles/max_800_1_ravTLiNIfg4rYaGaLmyHqw.png"></div>
        </figure>
        <ul class="postList">
          <li name="bcef" id="bcef" class="graf graf--li graf-after--figure">the other values are shown on
            the graph:
          </li>
        </ul>
        <figure name="9851" id="9851" class="graf graf--figure graf-after--li">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 386px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 55.2%;"></div>
            <img class="graf-image" data-image-id="1*cP7gc4mlPRkQl5iWJsFhSw.png" data-width="1189" data-height="656" src="/img/articles/max_800_1_cP7gc4mlPRkQl5iWJsFhSw.png"></div>
        </figure>
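        <p class="graf graf--p">As a quick check of these numbers, the probabilities can be computed directly from the standard Poisson formula. The snippet below is only a minimal sketch: the function name <em class="markup--em markup--p-em">poisson_pmf</em> and the range of <em class="markup--em markup--p-em">k </em>values are our own choices for illustration, and λ = 3 is taken from the example above.</p>
        <pre class="graf graf--pre">
```python
import math


def poisson_pmf(k, lam):
    """Probability of seeing exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 3.0  # an average of three posts a day, as in the example above

# Probabilities of seeing 0, 1, ..., 8 posts on one specific day.
probabilities = [poisson_pmf(k, lam) for k in range(9)]

for k, p in enumerate(probabilities):
    print(f"P(k = {k}) = {p:.3f}")
```
        </pre>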
        <p name="2256" id="2256" class="graf graf--p graf-after--figure">If influencers always posted at the
          same intensity, i.e. if condition <em class="markup--em markup--p-em">λ = const </em>were met,
          then we could stop our analysis here: we wouldn’t even need machine learning. But in real life
          the intensity is always changing. An influencer might find a new topic area, get inspired, and
          start posting several times more often than usual. Or, conversely, the influencer might give
          up their account and switch to something else altogether, or go on holiday, and thus stop
          posting entirely. In this event, their posting intensity will tend towards zero. Thus, in real
          life <em class="markup--em markup--p-em">λ </em>is not a constant, but is a function that varies
          with time:</p>
        <figure name="ba73" id="ba73" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 179px; max-height: 50px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 27.900000000000002%;"></div>
            <img class="graf-image" data-image-id="1*yFdkH-jmO8c-5xM2Xwu0ZQ.png" data-width="179" data-height="50" src="/img/articles/max_800_1_yFdkH-jmO8c-5xM2Xwu0ZQ.png"></div>
        </figure>
        <p name="2fe9" id="2fe9" class="graf graf--p graf-after--figure">Our task is to discover this
          function: then we can assess the probability of a new post appearing at any arbitrary point in
          time, and start modeling influencers’ behavior.</p>
        <h3 name="9f54" id="9f54" class="graf graf--h3 graf-after--p">From theory to&nbsp;practice</h3>
        <p name="9150" id="9150" class="graf graf--p graf-after--h3">It’s time to make the transition from
          theory to practice. Let’s build a machine learning model that can learn the integral
          function</p>
        <figure name="0aac" id="0aac" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 371px; max-height: 50px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 13.5%;"></div>
            <img class="graf-image" data-image-id="1*Ae2Es04GdE5CmvLlswOfmA.png" data-width="371" data-height="50" src="/img/articles/max_800_1_Ae2Es04GdE5CmvLlswOfmA.png"></div>
        </figure>
        <ul class="postList">
          <li name="1f52" id="1f52" class="graf graf--li graf-after--figure"><em class="markup--em markup--li-em">history</em> – influencer’s posting history,
          </li>
          <li name="e08d" id="e08d" class="graf graf--li graf-after--li"><em class="markup--em markup--li-em">t</em> – relative time since their last post.
          </li>
        </ul>
        <p name="516d" id="516d" class="graf graf--p graf-after--li">We shall forecast the intensity for the
          next 24 hours after the post. For training purposes we need to provide a loss function showing
          how well our forecasts are working. As a loss function, we’ll use the <a href="https://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood" data-href="https://en.wikipedia.org/wiki/Likelihood_function#Log-likelihood" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">negative log
            likelihood</a>:</p>
        <figure name="0fb4" id="0fb4" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 613px; max-height: 50px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 8.200000000000001%;"></div>
            <img class="graf-image" data-image-id="1*UbfZQoj7ekuCYJYV1nmuUg.png" data-width="613" data-height="50" src="/img/articles/max_800_1_UbfZQoj7ekuCYJYV1nmuUg.png"></div>
        </figure>
        <p name="9ba4" id="9ba4" class="graf graf--p graf-after--figure">where:</p>
        <figure name="82b1" id="82b1" class="graf graf--figure graf--layoutOutsetLeft graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 271px; max-height: 46px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 17%;"></div>
            <img class="graf-image" data-image-id="1*_Einelb-GuFnROQBxEyOyg.png" data-width="271" data-height="46" src="/img/articles/max_600_1__Einelb-GuFnROQBxEyOyg.png"></div>
        </figure>
        <p name="dea5" id="dea5" class="graf graf--p graf-after--figure">This is the probability of
          observing exactly <em class="markup--em markup--p-em">X </em>posts over the 24
          hours following moment <em class="markup--em markup--p-em">t</em>, for a Poisson distribution
          characterized by parameter λ. The number of posts <em class="markup--em markup--p-em">X </em>is
          taken from real data. The more accurately we predict the value of λ, the higher the likelihood
          calculated from the real number of posts will be, and the lower the loss.</p>
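        <p class="graf graf--p">To make the behavior of this loss concrete, here is a minimal sketch of the standard Poisson negative log likelihood for a single 24-hour window. We assume it matches the formula above up to notation; the function name <em class="markup--em markup--p-em">poisson_nll</em>, the observed count and the candidate values of λ are hypothetical and chosen only for illustration.</p>
        <pre class="graf graf--pre">
```python
import math


def poisson_nll(lam, x):
    """Negative log likelihood of observing x posts when the forecast intensity is lam."""
    # -log(lam**x * exp(-lam) / x!), where lgamma(x + 1) equals log(x!)
    return lam - x * math.log(lam) + math.lgamma(x + 1)


observed_posts = 3  # hypothetical number of posts seen in the 24-hour window

# Loss for several candidate forecasts of the intensity.
losses = {lam: poisson_nll(lam, observed_posts) for lam in (0.5, 1.0, 3.0, 6.0)}
best_lam = min(losses, key=losses.get)

for lam, loss in losses.items():
    print(f"lambda = {lam}: loss = {loss:.3f}")
print("lowest loss at lambda =", best_lam)
```
        </pre>
        <p class="graf graf--p">Among these candidates the loss is lowest when the forecast intensity equals the observed number of posts, which is exactly the behavior we want from the training signal.</p>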
        <p name="bce7" id="bce7" class="graf graf--p graf-after--p">Our training is really analogous to the
          <a href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation" data-href="https://en.wikipedia.org/wiki/Maximum_likelihood_estimation" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">Maximum Likelihood
            Estimation</a> (MLE) procedure, except that we are looking not for a fixed value of
          parameter λ, common to the whole data set, but for the value at each given moment in time, <em class="markup--em markup--p-em">t</em>, that characterizes the next 24 hours.</p>
        <p name="5553" id="5553" class="graf graf--p graf-after--p">For training, we shall use a deep
          learning model consisting of a <a href="https://en.wikipedia.org/wiki/Recurrent_neural_network" data-href="https://en.wikipedia.org/wiki/Recurrent_neural_network" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">Recurrent Neural Network</a> and several
          hidden layers. The RNN receives the prior posting history of an influencer, and the
          first hidden layer receives the RNN’s final states together with time <em class="markup--em markup--p-em">t</em>.</p>
        <p name="ad74" id="ad74" class="graf graf--p graf-after--p">Let’s see what we get as a result of our
          training, taking particular influencers’ posts as examples. Light blue shows the forecast
          intensity, and orange triangles and vertical lines indicate moments in time when the influencers
          posted.</p></div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="d728" id="d728" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 150px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15%;"></div>
            <img class="graf-image" data-image-id="1*ns6aovGrYiIHVJ9evRUFtg.png" data-width="1881" data-height="283" src="/img/articles/max_1000_1_ns6aovGrYiIHVJ9evRUFtg.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="e51b" id="e51b" class="graf graf--p graf-after--figure">We see
        from this example that if the frequency of posts increases, the value of parameter λ rises, just as
        we would expect; and if there are no posts for a while, it drops. When an influencer stops posting
        new content, the value of λ falls almost to zero. When a new post appears after a long pause, the
        value of λ jumps: the model sees that the influencer is still active and starts to
        anticipate a stream of new posts.</p></div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="c094" id="c094" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 153px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15.299999999999999%;"></div>
            <img class="graf-image" data-image-id="1*ZpcuWZ9-A9UKPHvwS-rmmQ.png" data-width="1881" data-height="288" src="/img/articles/max_1000_1_ZpcuWZ9-A9UKPHvwS-rmmQ.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="7f04" id="7f04" class="graf graf--p graf-after--figure">In this
        example the model has observed a three-month posting history and identified the influencer’s
        “favorite days” on which to post. Correspondingly, the predicted intensity starts to rise in
        anticipation of the influencer’s next post. Even when there are no posts, the
        anticipated intensity still rises and falls in a cyclical pattern.</p></div>
      <div class="section-inner sectionLayout--outsetColumn">
        <figure name="a597" id="a597" class="graf graf--figure graf--layoutOutsetCenter graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 1000px; max-height: 153px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15.299999999999999%;"></div>
            <img class="graf-image" data-image-id="1*LsAIjkm0yqHoWdzO2LAxjg.png" data-width="1881" data-height="288" src="/img/articles/max_1000_1_LsAIjkm0yqHoWdzO2LAxjg.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="e94e" id="e94e" class="graf graf--p graf-after--figure">This
        example also shows a weekly pattern, but of a somewhat different kind: the influencer doesn’t
        usually post at the weekend. The intensity remains roughly constant throughout the working week, and
        then at the weekend it drops.</p>
        <p name="4318" id="4318" class="graf graf--p graf-after--p">As we see, our model is quite capable of
          identifying an influencer’s behavior patterns and predicting their probability of posting over
          time in accordance with these patterns.</p>
        <h3 name="8406" id="8406" class="graf graf--h3 graf-after--p">A model for real application</h3>
        <p name="9266" id="9266" class="graf graf--p graf-after--h3">The model that forecasts intensity
          (i.e. parameter <em class="markup--em markup--p-em">λ</em> for a Poisson distribution) is a good
          one from the theoretical standpoint. But it does not directly answer the question that interests
          us in practice:</p>
        <blockquote name="b173" id="b173" class="graf graf--blockquote graf-after--p"><em class="markup--em markup--blockquote-em">When do we need to check for a new post?</em>
        </blockquote>
        <p name="5a1c" id="5a1c" class="graf graf--p graf-after--blockquote">To answer this question, we
          need to integrate the function <em class="markup--em markup--p-em">λ(t)</em>:</p>
        <figure name="e8dd" id="e8dd" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 396px; max-height: 125px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 31.6%;"></div>
            <img class="graf-image" data-image-id="1*E1txDc_9OaDzJdYduHfsCA.png" data-width="396" data-height="125" src="/img/articles/max_800_1_E1txDc_9OaDzJdYduHfsCA.png"></div>
        </figure>
        <p name="d21f" id="d21f" class="graf graf--p graf-after--figure">where:</p>
        <figure name="29a9" id="29a9" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 71px; max-height: 38px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 53.5%;"></div>
            <img class="graf-image" data-image-id="1*Etnmqpj2SfZ5ea5Y-cPiCA.png" data-width="71" data-height="38" src="/img/articles/max_800_1_Etnmqpj2SfZ5ea5Y-cPiCA.png"></div>
          <figcaption class="imageCaption">current time</figcaption>
        </figure>
        <figure name="4caa" id="4caa" class="graf graf--figure graf-after--figure">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 100px; max-height: 38px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 38%;"></div>
            <img class="graf-image" data-image-id="1*JxoDi3qbJx4beD7M92C_Lg.png" data-width="100" data-height="38" src="/img/articles/max_800_1_JxoDi3qbJx4beD7M92C_Lg.png"></div>
          <figcaption class="imageCaption">proposed check&nbsp;time</figcaption>
        </figure>
        <p name="3f94" id="3f94" class="graf graf--p graf-after--figure">First we set the proposed check
          time equal to the current time, then we let it gradually increase until the calculated
          value of the integral Λ exceeds some pre-chosen threshold. Of course, calculating
          the integral by numeric approximation from individual points where the model has computed
          forecasts for <em class="markup--em markup--p-em">λ</em> is an inexact, inconvenient and
          resource-intensive procedure. That is why for real use it’s better to build another model, which
          will forecast the check time directly.</p>
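        <p class="graf graf--p">As a rough illustration of the integration procedure just described, the sketch below accumulates the integral with the trapezoidal rule until it reaches the threshold. The function name <code>next_check_time</code>, the one-minute step and the 72-hour horizon are assumptions made for this example, not details of the original system. For a Poisson process, Λ is the expected number of posts in the window, so 1 − exp(−Λ) is the probability that at least one post has appeared.</p>
<pre class="graf graf--pre">
```python
def next_check_time(intensity, t_now, threshold, step=1.0 / 60.0, horizon=72.0):
    """Return the first time at which the integral of intensity reaches threshold.

    intensity: callable returning the forecast posting rate lambda(t) in posts per hour.
    All times are in hours; step and horizon are illustrative defaults.
    """
    accumulated = 0.0  # running value of the integral (Lambda)
    t = t_now
    previous = intensity(t)
    for _ in range(int(round(horizon / step))):
        t_next = t + step
        current = intensity(t_next)
        accumulated += 0.5 * (previous + current) * step  # trapezoidal rule
        if accumulated >= threshold:
            return t_next
        t, previous = t_next, current
    return t  # threshold never reached: fall back to the end of the horizon


# Toy forecast: a constant rate of one post every two hours.
print(next_check_time(lambda t: 0.5, t_now=10.0, threshold=0.25))
```
</pre>
        <p class="graf graf--p">Every step of the loop needs a fresh forecast of λ, which is the cost that motivates a model predicting the check time directly.</p>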
        <p name="7b30" id="7b30" class="graf graf--p graf-after--p">Just as with the previous model, we need
          to decide what the loss function will be. We need to meet two contradictory conditions at once.
          On the one hand, we need to check as rarely as we can, so as not to generate a large volume of
          Instagram queries. On the other hand, we need to keep the delay to a minimum, i.e. we need to
          discover a post at a moment when it hasn’t yet received many likes, and keeping the delay down
          means checking as often as we can. Our loss function will express a balance between these two
          conditions.</p>
        <p name="b469" id="b469" class="graf graf--p graf-after--p">First let’s sort out how to evaluate
          delays. The growth in a post’s number of likes is non-linear: it starts off very fast, then it
          slows down and by the time two days have elapsed from the moment of the post’s publication the
          growth has virtually stopped. We can visualize the number of likes for various posts on a graph
          like this:</p>
        <figure name="a826" id="a826" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 437px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 62.5%;"></div>
            <img class="graf-image" data-image-id="1*kr-CFXjMQrghAFmfLercEQ.png" data-width="1351" data-height="844" src="/img/articles/max_800_1_kr-CFXjMQrghAFmfLercEQ.png"></div>
        </figure>
        <p name="ca80" id="ca80" class="graf graf--p graf-after--figure">The dependence of the number of
          likes on time can be approximately modeled with the formula:</p>
        <figure name="b990" id="b990" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 404px; max-height: 63px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15.6%;"></div>
            <img class="graf-image" data-image-id="1*AnmxFNOjlHct5AkFVPM3lA.png" data-width="404" data-height="63" src="/img/articles/max_800_1_AnmxFNOjlHct5AkFVPM3lA.png"></div>
        </figure>
        <p name="7062" id="7062" class="graf graf--p graf-after--figure">where:</p>
        <ul class="postList">
          <li name="8ec5" id="8ec5" class="graf graf--li graf-after--p"><em class="markup--em markup--li-em">t </em>—<em class="markup--em markup--li-em"> </em>time
            in hours
          </li>
          <li name="95bc" id="95bc" class="graf graf--li graf-after--li"><em class="markup--em markup--li-em">α, β </em>—<em class="markup--em markup--li-em"> </em>empirical
            coefficients. In our case <em class="markup--em markup--li-em">α</em> = 4.2 and <em class="markup--em markup--li-em">β</em> = 0.7.
          </li>
        </ul>
        <p name="7ca7" id="7ca7" class="graf graf--p graf-after--li">The modeled values appear as the light
          blue line on the graph.</p>
        <p name="1206" id="1206" class="graf graf--p graf-after--p">Using this formula, we can measure the
          delay as the fraction of the total number of likes that we end up missing. Thus, the delay is
          a value in the interval [0, 1]. This is the first part of our loss function.</p>
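        <p class="graf graf--p">The formula itself is only shown as an image, so the following is a sketch under an assumption rather than the article’s exact expression. A Weibull-type saturation curve, 1 − exp(−(<em class="markup--em markup--p-em">t</em>/<em class="markup--em markup--p-em">α</em>)^<em class="markup--em markup--p-em">β</em>), with <em class="markup--em markup--p-em">α</em> = 4.2 and <em class="markup--em markup--p-em">β</em> = 0.7 closely reproduces the missed-likes figures quoted in the Results section (about 10% after ten minutes, 20% after half an hour, 30% after an hour). The name <code>likes_fraction</code> is hypothetical.</p>
<pre class="graf graf--pre">
```python
import math

ALPHA = 4.2  # empirical coefficient quoted in the article
BETA = 0.7   # empirical coefficient quoted in the article


def likes_fraction(t_hours, alpha=ALPHA, beta=BETA):
    """Fraction of a post's final likes collected t_hours after publication.

    ASSUMPTION: Weibull-type curve 1 - exp(-(t / alpha) ** beta); the
    article's own formula is given only as an image.
    """
    t = max(t_hours, 0.0)  # no likes before publication
    return 1.0 - math.exp(-((t / alpha) ** beta))


# Discovering a post after a delay d misses likes_fraction(d) of its likes.
for minutes in (10, 30, 60):
    print(minutes, round(likes_fraction(minutes / 60.0), 2))  # roughly 0.10, 0.20, 0.31
```
</pre>
        <p class="graf graf--p">Under this assumed curve the value after 48 hours is above 0.99, which agrees with the observation that growth has virtually stopped after two days.</p>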
        <p name="7bd0" id="7bd0" class="graf graf--p graf-after--p">The second part is very simple: it is
          the inverse of the length of the forecast interval between the current time and the next check
          time. The longer this interval is, the fewer checks there will be; that is, the inverse of the
          interval is analogous to the check frequency.</p>
        <p name="7725" id="7725" class="graf graf--p graf-after--p"><strong class="markup--strong markup--p-strong">So our loss function is:</strong></p>
        <figure name="09b0" id="09b0" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 683px; max-height: 50px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 7.3%;"></div>
            <img class="graf-image" data-image-id="1*khXj47PMuEqo12st9M2Flg.png" data-width="683" data-height="50" src="/img/articles/max_800_1_khXj47PMuEqo12st9M2Flg.png"></div>
        </figure>
        <p name="6497" id="6497" class="graf graf--p graf-after--figure"><em class="markup--em markup--p-em">k </em>is the coefficient that regulates the balance between
          check frequency and the size of the delay. It is set manually based on business considerations:
          what the checking budget is, and how critical delays are. If the coefficient rises, the check
          frequency will drop and the delays will rise; if it goes down, the opposite will happen.</p>
        <figure name="a2d3" id="a2d3" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 288px; max-height: 58px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 20.1%;"></div>
            <img class="graf-image" data-image-id="1*zEiPTQnmNBJDX7unr3SwAw.png" data-width="288" data-height="58" src="/img/articles/max_800_1_zEiPTQnmNBJDX7unr3SwAw.png"></div>
        </figure>
        <figure name="c0b6" id="c0b6" class="graf graf--figure graf-after--figure">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 121px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 17.299999999999997%;"></div>
            <img class="graf-image" data-image-id="1*AmXf9LPjCOKIyLHQch3kpQ.png" data-width="888" data-height="154" src="/img/articles/max_800_1_AmXf9LPjCOKIyLHQch3kpQ.png"></div>
        </figure>
        <p name="7aa8" id="7aa8" class="graf graf--p graf-after--figure">where:</p>
        <figure name="b1c0" id="b1c0" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 17px; max-height: 46px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 270.6%;"></div>
            <img class="graf-image" data-image-id="1*AdKPqqm9u96NsZKs7yE-Sw.png" data-width="17" data-height="46" src="/img/articles/max_800_1_AdKPqqm9u96NsZKs7yE-Sw.png"></div>
          <figcaption class="imageCaption">is the forecast time interval from the current time to the
            next&nbsp;check.
          </figcaption>
        </figure>
        <figure name="861b" id="861b" class="graf graf--figure graf-after--figure">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 79px; max-height: 58px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 73.4%;"></div>
            <img class="graf-image" data-image-id="1*eOSVFsOu3P8x40MoZDx-Gg.png" data-width="79" data-height="58" src="/img/articles/max_800_1_eOSVFsOu3P8x40MoZDx-Gg.png"></div>
          <figcaption class="imageCaption">is the time interval from the current time to the next&nbsp;posts.
          </figcaption>
        </figure>
        <p name="f694" id="f694" class="graf graf--p graf-after--figure">Only posts where there was a delay
          are considered, i.e. posts that satisfy the condition:</p>
        <figure name="f3b9" id="f3b9" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 246px; max-height: 58px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 23.599999999999998%;"></div>
            <img class="graf-image" data-image-id="1*vef8ve6ejmEG3yKlgzkuTQ.png" data-width="246" data-height="58" src="/img/articles/max_800_1_vef8ve6ejmEG3yKlgzkuTQ.png"></div>
        </figure>
        <p name="4f55" id="4f55" class="graf graf--p graf-after--figure">If there are no such posts
          then:</p>
        <figure name="68c9" id="68c9" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 246px; max-height: 38px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 15.4%;"></div>
            <img class="graf-image" data-image-id="1*LYO2PRGfhEbb1D4gvCm1FQ.png" data-width="246" data-height="38" src="/img/articles/max_800_1_LYO2PRGfhEbb1D4gvCm1FQ.png"></div>
        </figure>
        <p name="b361" id="b361" class="graf graf--p graf-after--figure"><em class="markup--em markup--p-em">α </em>and<em class="markup--em markup--p-em"> β </em>are
          the coefficients for modeling the dynamics of likes (see above).</p>
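        <p class="graf graf--p">Because the formulas above are images, the following is only a sketch of how the two parts might be combined, not the exact loss. It assumes that <em class="markup--em markup--p-em">k</em> multiplies the frequency term (which matches the behavior described for <em class="markup--em markup--p-em">k</em>), that a post counts as delayed when it is published before the forecast check, that the delay term is averaged over delayed posts, and that likes follow a Weibull-type curve. The names <code>missed_fraction</code> and <code>check_loss</code> are hypothetical.</p>
<pre class="graf graf--pre">
```python
import math


def missed_fraction(delay_hours, alpha=4.2, beta=0.7):
    # ASSUMED Weibull-type like-growth curve; the article's formula is an image.
    return 1.0 - math.exp(-((max(delay_hours, 0.0) / alpha) ** beta))


def check_loss(tau_check, taus_to_posts, k):
    """Delay term plus k times the inverse check interval (all times in hours).

    tau_check: forecast interval from the current time to the next check.
    taus_to_posts: intervals from the current time to the following posts.
    k: manually chosen trade-off coefficient.
    """
    # ASSUMPTION: a post is delayed when it appears before the forecast check.
    delays = [tau_check - tau for tau in taus_to_posts if tau_check > tau]
    # ASSUMPTION: average the missed-likes fraction over delayed posts; 0 if none.
    delay_term = sum(missed_fraction(d) for d in delays) / len(delays) if delays else 0.0
    return delay_term + k / tau_check


# A check just before the post costs only the frequency term;
# a check one hour after it also pays for the missed likes.
print(check_loss(4.9, [5.0], k=0.1), check_loss(6.0, [5.0], k=0.1))
```
</pre>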
        <p name="caeb" id="caeb" class="graf graf--p graf-after--p">During training, we repeatedly
          choose a random point within each account’s posting history to act as the current time, and
          forecast the next check time relative to that “current” time. Thus, after a sufficiently lengthy
          period of training, we hit almost every interval between posts and make forecasts for a whole
          set of different points in the history.</p>
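        <p class="graf graf--p">The sampling step can be sketched as follows. The input format (a sorted list of publication times in hours) and the function name <code>sample_training_point</code> are assumptions for illustration; the original pipeline may represent the history differently.</p>
<pre class="graf graf--pre">
```python
import bisect
import random


def sample_training_point(post_times, rng=random):
    """Pick a random "current" time inside one account's posting history.

    post_times: sorted publication times in hours (hypothetical input format).
    Returns the sampled time and the intervals from it to every later post.
    """
    t_now = rng.uniform(post_times[0], post_times[-1])
    first_later = bisect.bisect_right(post_times, t_now)  # index of the next post
    return t_now, [t - t_now for t in post_times[first_later:]]


history = [0.0, 5.0, 9.0, 30.0]  # made-up post times, in hours
t_now, intervals = sample_training_point(history, random.Random(0))
print(t_now, intervals)
```
</pre>
        <p class="graf graf--p">Drawing a new point on every pass is what lets a long training run cover almost every gap between posts.</p>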
        <h3 name="17d8" id="17d8" class="graf graf--h3 graf-after--p">Results</h3>
        <p name="30cc" id="30cc" class="graf graf--p graf-after--h3">Let’s take a look at the results we get
          from our trained model. We’ll visualize real posts and our model’s predicted checking points on
          the same timeline. Ideally, if we knew just when each influencer was going to post, each check
          would be right after a new post had appeared and there would be no checks at all, or very few
          checks, in the gaps between posts. But, as we said at the beginning of the article, that ideal
          is unattainable. On the other hand, it would be possible not to apply any models at all and just
          check once an hour, for example, to see whether each influencer had posted something new. Then
          the checks would be distributed evenly through the gaps between posts, but there would be a lot
          of wasted checks.</p>
        <p name="554d" id="554d" class="graf graf--p graf-after--p">The checks obtained using our model will
          have to be somewhere in between these two extremes. That is, there should be fewer checks during
          intervals when the probability of a post is low, and more checks during intervals when the
          probability of discovering a new post is high.</p>
        <p name="95f6" id="95f6" class="graf graf--p graf-after--p">We’ll depict checks as little light
          green points, and posts as bigger points colored on a scale from blue to yellow depending on how
          many likes we missed due to delay.</p>
        <figure name="92cc" id="92cc" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 88px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 12.6%;"></div>
            <img class="graf-image" data-image-id="1*U6MyTw_wQlfmPsNs4NQqWQ.png" data-width="1370" data-height="173" src="/img/articles/max_800_1_U6MyTw_wQlfmPsNs4NQqWQ.png"></div>
        </figure>
        <p name="12a8" id="12a8" class="graf graf--p graf-after--figure">We must also remember that likes
          appear very quickly, so that a delay of only ten minutes means we miss 10% of them; if we delay
          for half an hour, we miss 20%; and if we delay for an hour, we miss 30%.</p>
        <p name="7cfe" id="7cfe" class="graf graf--p graf-after--p">Let’s look at our model’s forecasts for
          influencers from the test group (i.e. influencers the model had not seen during its
          training):</p></div>
      <div class="section-inner sectionLayout--fullWidth">
        <figure name="86c2" id="86c2" class="graf graf--figure graf--layoutFillWidth graf-after--p">
          <div class="aspectRatioPlaceholder is-locked">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 9.5%;"></div>
            <img class="graf-image" data-image-id="1*nx2VnH_jD5k7UQMeFOPAXQ.png" data-width="1881" data-height="179" src="/img/articles/max_2560_1_nx2VnH_jD5k7UQMeFOPAXQ.png"></div>
        </figure>
        <figure name="b592" id="b592" class="graf graf--figure graf--layoutFillWidth graf-after--figure">
          <div class="aspectRatioPlaceholder is-locked">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 10.4%;"></div>
            <img class="graf-image" data-image-id="1*-v3wlcrKA_KEbwMZ_CAnSQ.png" data-width="1881" data-height="196" src="/img/articles/max_2560_1_-v3wlcrKA_KEbwMZ_CAnSQ.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="e35b" id="e35b" class="graf graf--p graf-after--figure">The
        first diagram shows the influencer’s whole history; the second shows a section of the history, so as
        to provide a more detailed view of the intervals between checks that our model generates. The y-axis
        shows the intervals between checks: the higher the green point, the greater the interval. We see
        that at the beginning of the history the model is still adapting to the influencer’s behavior and we
        observe quite substantial delays, with short gaps between checks. Later, as information about the
        influencer’s habits is accumulated, the checks become less frequent and more accurate.</p>
        <p name="c3ed" id="c3ed" class="graf graf--p graf-after--p">The second diagram shows the part of the
          history where the model settled down to a stable working mode. We see an evident daily cycle,
          with many fewer checks during the night than during the day. The times of posts coincide with
          the periods when checks are conducted most frequently: our model has been trained
          successfully.</p></div>
      <div class="section-inner sectionLayout--fullWidth">
        <figure name="29ca" id="29ca" class="graf graf--figure graf--layoutFillWidth graf-after--p">
          <div class="aspectRatioPlaceholder is-locked">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 9.5%;"></div>
            <img class="graf-image" data-image-id="1*dH5Ygxo-_ev4EfwUD7-ZSw.png" data-width="1881" data-height="179" src="/img/articles/max_2560_1_dH5Ygxo-_ev4EfwUD7-ZSw.png"></div>
        </figure>
        <figure name="baad" id="baad" class="graf graf--figure graf--layoutFillWidth graf-after--figure">
          <div class="aspectRatioPlaceholder is-locked">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 10.4%;"></div>
            <img class="graf-image" data-image-id="1*_wC5Bi0pq4iaU1_sVLy-WQ.png" data-width="1881" data-height="196" src="/img/articles/max_2560_1__wC5Bi0pq4iaU1_sVLy-WQ.png"></div>
        </figure>
      </div>
      <div class="section-inner sectionLayout--insetColumn"><p name="f5f0" id="f5f0" class="graf graf--p graf-after--figure">Here is another
        account. The first graph shows the model gradually increasing the interval between checks (the green
        points appear higher up) when the influencer does not post anything new. Why keep checking an account
        so often if it’s not posting anything?</p>
        <p name="104b" id="104b" class="graf graf--p graf-after--p">On the second graph we see the model
          adapting to an influencer’s changing behavior. The influencer was initially posting once a day,
          and the check frequency diagram shows a clear increase in frequency during the middle of the
          day, the time when the chance of a post is highest. Then the influencer starts posting twice a
          day, and we can clearly see how the checks quickly adapt to this new behavior: instead of one
          trough corresponding to the shorter interval between checks during the middle of the day, we
          start to see a “flat area” on the right-hand side of the graph, reflecting the evenly shorter
          interval throughout the day.</p>
        <p name="754e" id="754e" class="graf graf--p graf-after--p">The success of our model is confirmed
          not just visually, but also in figures. As a baseline we use uniform checks once every
          <em class="markup--em markup--p-em">n</em> minutes. If the gaps between checks are set by our
          model, we can achieve the same rate of missed likes (about 15%) with 2–4 times fewer checks than
          the baseline. If, on the other hand, we fix the number of checks and compare the rate of missed
          likes, then the model achieves a rate that is 1.5–2 times lower than the baseline.</p>
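        <p class="graf graf--p">A comparison of this kind needs two ingredients: a baseline schedule and the discovery delay of each post under a given schedule. The sketch below shows one way to compute both; the function names and the toy post times are assumptions, and the delays it returns would still have to be converted into a missed-likes rate with the like-growth curve described earlier.</p>
<pre class="graf graf--pre">
```python
import bisect


def uniform_checks(start, end, every_minutes):
    """Baseline schedule: one check every `every_minutes` minutes (times in hours)."""
    step = every_minutes / 60.0
    count = int((end - start) / step) + 1
    return [start + i * step for i in range(count)]


def discovery_delays(check_times, post_times):
    """Delay in hours between each post and the first check at or after it.

    Both lists must be sorted; posts published after the last check are skipped.
    """
    delays = []
    for post in post_times:
        i = bisect.bisect_left(check_times, post)
        if i != len(check_times):
            delays.append(check_times[i] - post)
    return delays


posts = [1.25, 9.5, 20.1]  # made-up post times, in hours
for minutes in (30, 120):
    checks = uniform_checks(0.0, 24.0, minutes)
    print(minutes, len(checks), max(discovery_delays(checks, posts)))
```
</pre>
        <p class="graf graf--p">The same <code>discovery_delays</code> function can be applied to the check times produced by a model, so both schedules are scored in exactly the same way.</p>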
        </div>
    </div>
  </section>
</template>

<script>
export default {
  name: "MinimisingSamplingError"
}
</script>
