<template>
  <section name="2c10" class="section section--body section--first">
    <div class="section-content">
      <div class="section-inner sectionLayout--insetColumn"><h3 name="142d" id="142d" class="graf graf--h3 graf--leading graf--title">
        Klug is MORE accurate at gender and age recognition than Microsoft &amp;&nbsp;Amazon!</h3>
        <p name="794d" id="794d" class="graf graf--p graf-after--h3">Determining gender and age from a
          photo is no longer a novelty: the problem has been studied extensively, and several
          commercial computer vision services offer practical implementations:
          <a href="https://aws.amazon.com/rekognition/" data-href="https://aws.amazon.com/rekognition/" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">AWS
            Rekognition</a>, <a href="https://clarifai.com/" data-href="https://clarifai.com/" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">Clarifai</a>,
          <a href="https://www.faceplusplus.com/" data-href="https://www.faceplusplus.com/" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">Face++</a> and <a href="https://azure.microsoft.com/en-us/services/cognitive-services/face/" data-href="https://azure.microsoft.com/en-us/services/cognitive-services/face/" class="markup--anchor markup--p-anchor" rel="noopener" target="_blank">Microsoft Azure
            cognitive services</a>.</p>
        <p name="8c8f" id="8c8f" class="graf graf--p graf-after--p">We are not proponents of reinventing the
          wheel, so at first we tried to build our pipeline on the services listed above.
          Unfortunately, the price/performance ratio did not meet our expectations: commercial
          services give very inaccurate answers (see the test results below) and are expensive at
          large volumes (we need to analyze millions of photos). We therefore decided to develop our own
          in-house solution for this task.</p>
        <p name="ff8b" id="ff8b" class="graf graf--p graf-after--p">There are two main components in our
          system:</p>
        <ol class="postList">
          <li name="bfe0" id="bfe0" class="graf graf--li graf-after--p">Face detector: If a face is found
            in the photo, the service returns its coordinates and the positions of the
            reference points (landmarks), such as the eyes and lips. This information is then used for
            normalization, i.e., to bring all detected faces to a single scale and geometric
            position. Normalization is done with an affine transformation, which is calculated from the
            landmarks.
          </li>
          <li name="28cf" id="28cf" class="graf graf--li graf-after--li">Gender and age classifier: This
            service takes as input a normalized fragment of the photo containing a face. It
            outputs the probability that the photo depicts a man or a woman, the expected age, and its
            confidence interval.
          </li>
        </ol>
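        <p class="graf graf--p">The normalization step described above can be illustrated with a minimal sketch. It is a simplification, not our production code: it fits a similarity transform (rotation, scale and translation, a special case of an affine transformation) from only two landmarks, the eyes. The canonical eye positions and the function names are hypothetical and chosen only for the example.</p>
<pre v-pre class="graf graf--pre">
```python
# Hypothetical canonical eye positions inside a 112x112 crop (illustrative values).
CANON_LEFT_EYE = complex(38.0, 46.0)
CANON_RIGHT_EYE = complex(74.0, 46.0)


def fit_similarity(left_eye, right_eye):
    """Return (a, b) such that z -> a * z + b maps the detected eyes onto the canonical ones.

    Points are treated as complex numbers x + y*1j. Multiplying by `a` rotates
    and scales, adding `b` translates.
    """
    p1, p2 = complex(*left_eye), complex(*right_eye)
    a = (CANON_RIGHT_EYE - CANON_LEFT_EYE) / (p2 - p1)
    b = CANON_LEFT_EYE - a * p1
    return a, b


def apply_transform(a, b, point):
    """Map one (x, y) landmark into the normalized coordinate frame."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# A face rotated by 90 degrees: the eyes lie on a vertical line in the photo.
a, b = fit_similarity((0.0, 0.0), (0.0, 36.0))
mouth = apply_transform(a, b, (10.0, 18.0))
```
</pre>
        <p class="graf graf--p">A full affine fit would use all landmarks and solve a least-squares problem, but the principle is the same: the transform is computed from the landmarks and then applied to the whole face region.</p>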
        <p name="a842" id="a842" class="graf graf--p graf-after--li">The detector is based on the
          architecture published in 2016 in “Joint Face Detection and Alignment
          using Multi-task Cascaded Convolutional Networks” (Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, Yu
          Qiao).</p>
        <figure name="bbb4" id="bbb4" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder"><img class="graf-image" data-image-id="1*6HsMj9u3VSF4tqW3zBcv3Q.png" src="/img/articles/max_800_1_6HsMj9u3VSF4tqW3zBcv3Q.png">
          </div>
        </figure>
        <p name="a8bb" id="a8bb" class="graf graf--p graf-after--figure">The detector creates several copies
          of the image at different resolutions (so that it can find both large and small faces in photos
          of any resolution), searches each copy for faces, and gradually refines the estimated
          face coordinates by passing the image through a cascade of several convolutional
          networks. At the last stage, the coordinates of the landmarks are determined as well.</p>
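        <p class="graf graf--p">As a rough sketch of how the resolutions of these copies might be chosen: the 12-pixel first-stage window and the 0.709 shrink factor below are assumptions borrowed from common implementations of this architecture, not values taken from our system, and the function name is hypothetical.</p>
<pre v-pre class="graf graf--pre">
```python
def pyramid_scales(width, height, min_face_size=32, net_input=12, factor=0.709):
    """List the scale factors at which copies of the image are produced.

    net_input and factor are assumed values; a factor of 0.709 roughly halves
    the image area at every level.
    """
    # Scale so that a face of min_face_size pixels fills the first-stage window.
    scale = net_input / min_face_size
    shortest_side = min(width, height) * scale
    scales = []
    # Keep shrinking until the copy becomes smaller than the window itself.
    while shortest_side >= net_input:
        scales.append(scale)
        scale *= factor
        shortest_side *= factor
    return scales


scales = pyramid_scales(640, 480)
```
</pre>
        <p class="graf graf--p">Each scale corresponds to one copy of the image. Raising the minimum face size shortens the list and therefore reduces the amount of computation.</p>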
        <figure name="cbe0" id="cbe0" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder"><img class="graf-image" data-image-id="1*-Fp_DHA1mehW1laoua23dg.png" src="/img/articles/max_800_1_-Fp_DHA1mehW1laoua23dg.png">
          </div>
        </figure>
        <p name="7834" id="7834" class="graf graf--p graf-after--figure">The detector works with good
          accuracy (&gt; 95%) and can find even small faces measuring only a dozen or so pixels.
          For practical purposes, we set a minimum face size of 32 pixels, because with smaller faces it is
          difficult to determine the age.</p>
        <figure name="58ee" id="58ee" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder"><img class="graf-image" data-image-id="1*y2mHjtultcwxq12LXS5bnw.png" src="/img/articles/max_800_1_y2mHjtultcwxq12LXS5bnw.png">
          </div>
        </figure>
        <p name="2290" id="2290" class="graf graf--p graf-after--figure">The gender and age classifier is a Klug
          development that uses a common computer vision architecture, <em class="markup--em markup--p-em">Deep Residual Networks</em>. The basic component of this
          architecture is the <em class="markup--em markup--p-em">Residual Block</em>. The main idea of this
          block is the skip connection, which passes a copy of the input
          around the main computing unit and adds it to that unit's output, making it possible to train very deep neural
          networks with hundreds of layers. In fact, the popularity of the term <em class="markup--em markup--p-em">“Deep Learning”</em> is largely due to the
          widespread use of this architecture.</p>
        <figure name="9d1f" id="9d1f" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder"><img class="graf-image" data-image-id="1*30chGBW74la4LHPh5Qs_tg.png" src="/img/articles/max_800_1_30chGBW74la4LHPh5Qs_tg.png">
          </div>
        </figure>
        <figure name="a2d3" id="a2d3" class="graf graf--figure graf-after--figure">
          <div class="aspectRatioPlaceholder"><img class="graf-image" data-image-id="1*suejVGJd-AQI8QQ9XFDpIw.png" src="/img/articles/max_800_1_suejVGJd-AQI8QQ9XFDpIw.png">
          </div>
        </figure>
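        <p class="graf graf--p">The skip connection can be shown in a few lines. This is a toy sketch on plain lists, with a hypothetical stand-in for the convolutional layers that a real residual block contains.</p>
<pre v-pre class="graf graf--pre">
```python
def residual_block(x, transform):
    """Return x + F(x): the skip connection adds the untouched input to the block output."""
    fx = transform(x)
    return [xi + fi for xi, fi in zip(x, fx)]


def tiny_transform(x, weight=0.1):
    """Stand-in for the main computing unit (convolutions in a real network)."""
    return [max(0.0, weight * xi) for xi in x]  # scaled input followed by ReLU


# Stack 50 blocks: the input still reaches the output through the skip path.
signal = [1.0, -2.0, 3.0]
for _ in range(50):
    signal = residual_block(signal, tiny_transform)
```
</pre>
        <p class="graf graf--p">If the transform outputs zeros, the block simply returns its input, which is why adding more blocks does not have to make the network harder to train.</p>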
        <p name="8502" id="8502" class="graf graf--p graf-after--figure">To save computing resources, we
          used a relatively shallow neural network of only 50 layers.</p>
        <p name="fc25" id="fc25" class="graf graf--p graf-after--p">The neural network simultaneously
          performs both classification (determining gender) and regression (determining the numerical
          value of age). Training also takes place for both tasks at once. This approach,
          called multi-task learning, improves the effectiveness of the neural network (since information
          about gender is useful for determining age and vice versa) and also reduces
          training time, since only one neural network needs to be trained instead of two.</p>
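        <p class="graf graf--p">One common way to train two tasks at once is to add their losses into a single value. The sketch below assumes binary cross-entropy for gender and absolute error for age, with a hypothetical weighting factor; the article does not state which losses or weights our network actually uses.</p>
<pre v-pre class="graf graf--pre">
```python
import math


def gender_loss(p_female, is_female):
    """Binary cross-entropy for the classification head."""
    p = min(max(p_female, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    return -math.log(p) if is_female else -math.log(1.0 - p)


def age_loss(predicted_age, true_age):
    """Absolute error for the regression head."""
    return abs(predicted_age - true_age)


def multitask_loss(p_female, predicted_age, is_female, true_age, age_weight=0.1):
    """Single scalar that trains both heads and the shared layers together.

    age_weight is a hypothetical balancing factor, not a value from the article.
    """
    return gender_loss(p_female, is_female) + age_weight * age_loss(predicted_age, true_age)


loss = multitask_loss(p_female=0.5, predicted_age=30.0, is_female=True, true_age=25.0)
```
</pre>
        <p class="graf graf--p">Because both terms flow back through the same shared layers, features that help with one task are also available to the other.</p>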
        <p name="0095" id="0095" class="graf graf--p graf-after--p">The training was done on our own dataset,
          built from publicly available social network data (400k photos). The dataset was
          additionally checked and labeled by our employees and 100 independent assessors.
          Every photo was assessed by at least three assessors to determine gender and age.
          The assessors analyzed not only the image but also background information and
          comments (e.g. “My 21st birthday” or “celebrating Kate’s 30th”). Over the three-month project, this
          gave us high-quality training data for our machine learning.</p>
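        <p class="graf graf--p">How several assessments of one photo might be merged into a single label can be sketched as follows. The aggregation rule here (majority vote for gender, median for age) is an assumption made for illustration; the article does not describe the exact rule we used.</p>
<pre v-pre class="graf graf--pre">
```python
from collections import Counter
from statistics import median


def aggregate_labels(votes):
    """Merge several assessors' (gender, age) votes for one photo into one label."""
    genders = Counter(gender for gender, _ in votes)
    gender, count = genders.most_common(1)[0]
    return {
        "gender": gender,
        "age": median(age for _, age in votes),
        # Share of assessors who agree on gender; low values could be sent for re-review.
        "gender_agreement": count / len(votes),
    }


label = aggregate_labels([("f", 21), ("f", 23), ("m", 30)])
```
</pre>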
        <p name="31a0" id="31a0" class="graf graf--p graf-after--p">We compared the results of
          our system with those of commercial computer vision systems on three
          datasets:</p>
        <ol class="postList">
          <li name="9a30" id="9a30" class="graf graf--li graf-after--p">Our own dataset (a held-out part
            that was not used in training)
          </li>
          <li name="0f7a" id="0f7a" class="graf graf--li graf-after--li"><a href="https://talhassner.github.io/home/projects/Adience/Adience-data.html" data-href="https://talhassner.github.io/home/projects/Adience/Adience-data.html" class="markup--anchor markup--li-anchor" rel="noopener" target="_blank">Adience</a>
            (standard dataset for testing gender &amp; age classification tasks)
          </li>
          <li name="fdd6" id="fdd6" class="graf graf--li graf-after--li"><a href="https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/" data-href="https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/" class="markup--anchor markup--li-anchor" rel="noopener" target="_blank">IMDB-WIKI</a>.
            It is newer and larger than the Adience dataset. As it turned out, it contains a huge
            number of labeling errors, so we discarded the IMDB data as unsuitable for measuring
            accuracy, which left only the Wiki data. In the Wiki data, we also
            corrected the gender labels where there were obvious errors.
          </li>
        </ol>
        <p name="598f" id="598f" class="graf graf--p graf-after--li">For age, we measured the <em class="markup--em markup--p-em">mean absolute percentage error</em>, which
          is the average deviation from the true age as a percentage of that age (the lower the percentage, the closer
          to the real age):</p>
        <figure name="1fc0" id="1fc0" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder"><img class="graf-image" data-image-id="1*AG63xC4H5jXGldJtaWo2MQ.png" src="/img/articles/max_800_1_AG63xC4H5jXGldJtaWo2MQ.png">
          </div>
        </figure>
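        <p class="graf graf--p">For reference, the mean absolute percentage error can be computed as in the short sketch below. The function name and the sample ages are illustrative only and are not taken from our test data.</p>
<pre v-pre class="graf graf--pre">
```python
def mean_absolute_percentage_error(true_ages, predicted_ages):
    """Average of |predicted - true| / true, expressed in percent."""
    errors = [abs(p - t) / t for t, p in zip(true_ages, predicted_ages)]
    return 100.0 * sum(errors) / len(errors)


mape = mean_absolute_percentage_error([20, 40], [22, 36])
```
</pre>
        <p class="graf graf--p">In this toy example both predictions are off by a tenth of the true age (2 years out of 20 and 4 years out of 40), so the error is 10%. Note that the same absolute mistake counts for more on younger faces.</p>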
        <p name="580c" id="580c" class="graf graf--p graf-after--figure">Photos from social networks. The
          distribution of ages corresponds to the natural distribution in social networks.</p>
        <figure name="9699" id="9699" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 227px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 32.5%;"></div>
            <img class="graf-image" data-image-id="1*L2IrIFEDo68XrvoCSS_ZtQ.png" data-width="951" data-height="309" src="/img/articles/max_800_1_L2IrIFEDo68XrvoCSS_ZtQ.png"></div>
        </figure>
        <p name="eef8" id="eef8" class="graf graf--p graf-after--figure">For the tests, we selected photos of
          people aged 13–44 (the most relevant ages for Instagram). We also discarded photographs
          taken before 2008, since their style and the way they were produced
          (scans of analog media) differ from modern photos.</p>
        <figure name="7050" id="7050" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 241px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 34.4%;"></div>
            <img class="graf-image" data-image-id="1*R5gSie4N-NDkmmZ09O6z7w.png" data-width="953" data-height="328" src="/img/articles/max_800_1_R5gSie4N-NDkmmZ09O6z7w.png"></div>
        </figure>
        <p name="ea32" id="ea32" class="graf graf--p graf-after--figure">Adience is a dataset in which age
          is given as a range. The <em class="markup--em markup--p-em">Age Accuracy</em>
          index is the percentage of predicted ages that fall in the correct range. <em class="markup--em markup--p-em">Age Accuracy One-Off</em> is the percentage that fall in
          either the correct range or one of the two adjacent ranges.</p>
        <figure name="8516" id="8516" class="graf graf--figure graf-after--p">
          <div class="aspectRatioPlaceholder is-locked" style="max-width: 700px; max-height: 258px;">
            <div class="aspectRatioPlaceholder-fill" style="padding-bottom: 36.8%;"></div>
            <img class="graf-image" data-image-id="1*-sPr-H3Me0xAmTo1iyJcqA.png" data-width="962" data-height="354" src="/img/articles/max_800_1_-sPr-H3Me0xAmTo1iyJcqA.png"></div>
        </figure>
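        <p class="graf graf--p">Both Adience metrics can be expressed with one small function. The sketch assumes that the age ranges are numbered in ascending order and that each predicted age has already been mapped to the index of a range; the function name and sample indices are hypothetical.</p>
<pre v-pre class="graf graf--pre">
```python
def age_accuracy(true_groups, predicted_groups, tolerance=0):
    """Share of predictions whose age-range index is within `tolerance` of the truth.

    tolerance=0 gives Age Accuracy; tolerance=1 gives Age Accuracy One-Off.
    """
    hits = sum(
        1
        for t, p in zip(true_groups, predicted_groups)
        if tolerance >= abs(p - t)
    )
    return hits / len(true_groups)


true_idx = [0, 3, 5, 7]
pred_idx = [0, 4, 2, 7]
exact = age_accuracy(true_idx, pred_idx)
one_off = age_accuracy(true_idx, pred_idx, tolerance=1)
```
</pre>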
        <p name="32c5" id="32c5" class="graf graf--p graf-after--figure">In this test, the age accuracy of
          our service was inferior to that of the commercial services (Azure ranks first). The explanation
          is that Adience is an academic dataset in which all ages from 0 to 80 years are present
          in approximately equal proportions. We, in contrast, trained our system on the age distribution observed
          in real-life social networks, which is very far from uniform (ages 18–30 dominate).
          Accordingly, our accuracy is worse on a uniform distribution because, when the neural network
          is unsure of the age, it prefers an age in the 18–30 range, all else being
          equal.</p>
        <p name="0c66" id="0c66" class="graf graf--p graf-after--p">If the goal were to show a good result
          specifically on Adience, we would train our system on a sample drawn uniformly across all
          ages.</p>
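        <p class="graf graf--p">One possible way to obtain such uniform sampling from a skewed dataset is to weight each photo inversely to how common its age is. The following is a minimal sketch under that assumption; the five-year bin width, the function name and the toy ages are hypothetical.</p>
<pre v-pre class="graf graf--pre">
```python
import random
from collections import Counter


def uniform_age_weights(ages, bin_width=5):
    """Per-photo sampling weights that give every age bin the same total weight."""
    bins = [age // bin_width for age in ages]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]


# Toy data skewed towards young adults, as in social networks.
ages = [19, 22, 24, 27, 29, 23, 8, 64]
weights = uniform_age_weights(ages)
# Each age bin that occurs in the data is now drawn with equal probability.
batch = random.choices(ages, weights=weights, k=4)
```
</pre>
        <p class="graf graf--p">With these weights a rare age is drawn as often as a common one, which removes the preference for the 18–30 range at the cost of seeing the same rare photos more often.</p>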
      </div>
    </div>
  </section>
</template>

<script>
export default {
  name: "ArticleGenderAndAge"
}
</script>
