
Understanding OpenCV LBP implementation


I need some help understanding LBP-based face detection.

I have the following questions related to face detection implemented on OpenCV:

  1. In lbpCascade_frontal_face.xml (which ships with OpenCV), what are internalNodes, leafValues, tree, features, etc.? I know the algorithm uses them, but I do not understand what each one means. For example, why is a particular feature chosen for a particular stage rather than another? How is the feature or node selected?
  2. What are the feature values in LBP_frontal_face_classifier.xml? I know each one is a vector of 4 integers, but how should I use them? I assumed stage 0 would access the first feature, but the features are not accessed in that order. What is the access pattern?

  3. The papers in the literature give only a high-level overview, mostly describing how the LBP is calculated from neighboring pixels. How are these LBP values then used against the elements in the classifier?

  4. How does the integral image help in calculating the LBP value of a pixel? I know how it is used for Haar features; I need to understand LBP.

I have read several papers and articles, but none clearly describes in detail how LBP-based face detection works. No document lays out the steps someone should follow to develop a face detection program on their own.

Please help me on these if you could. I would be grateful.


Solution

  • I refer you to an earlier answer of mine, which touches lightly on the topic but does not explain the XML cascade format.

    Let's look at a fake example, modified for clarity, of a cascade with only a single stage and three features.

    <!-- stage 0 -->
    <_>
      <maxWeakCount>3</maxWeakCount>
      <stageThreshold>-0.75</stageThreshold>
      <weakClassifiers>
        <!-- tree 0 -->
        <_>
          <internalNodes>
            0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
            -16385 587145899 -24005</internalNodes>
          <leafValues>
            -0.65 0.88</leafValues></_>
        <!-- tree 1 -->
        <_>
          <internalNodes>
            0 -1 0 -163512766 -769593758 -10027009 -262145 -514457854
            -193593353 -524289 -1</internalNodes>
          <leafValues>
            -0.77 0.72</leafValues></_>
        <!-- tree 2 -->
        <_>
          <internalNodes>
            0 -1 2 -363936790 -893203669 -1337948010 -136907894
            1088782736 -134217726 -741544961 -1590337</internalNodes>
          <leafValues>
            -0.71 0.68</leafValues></_></weakClassifiers></_>
    

    Somewhat later....

    <features>
      <_>
        <rect>
          0 0 3 5</rect></_>
      <_>
        <rect>
          0 0 4 2</rect></_>
      <_>
        <rect>
          0 0 6 3</rect></_>
      <_>
        <rect>
          0 1 4 3</rect></_>
      <_>
        <rect>
          0 1 3 3</rect></_>
    

    ...

    Let us look first at the tags of a stage:

    • The maxWeakCount for a stage is the number of weak classifiers in the stage. Each weak classifier is marked <!-- tree --> in the comments and is what I call an LBP feature.
      • In this example, the number of LBP features in stage 0 is 3.
    • The stageThreshold is the minimum that the weights of the stage's features must add up to for the stage to pass.
      • In this example the stage threshold is -0.75.

    Turning to the tags describing an LBP feature:

    • The internalNodes are an array of 11 integers. The first two are meaningless for LBP cascades. The third is the index into the <features> table of <rect>s at the end of the XML file (a <rect> describes the geometry of the feature). The last eight are 32-bit values which together constitute the 256-bit LUT I spoke of in my earlier answer. This LUT is computed by the training process, which I don't fully understand myself.
      • In this example, the first feature of the stage references rectangle 3 (counting from 0), which is described by the four integers 0 1 4 3.
    • The leafValues are the two weights associated with a feature, the fail weight first and the pass weight second. Depending on the bit selected from the internalNodes during feature evaluation, one of those two weights is added to a total. This total is compared to the stage's <stageThreshold>: bool stagePassed = (sum >= stageThreshold - EPS);, where EPS is 1e-5, determines whether the stage has passed or failed. The weights are also determined by the training process.
      • In this example the first feature's fail weight is -0.65 and the pass weight is 0.88.
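
As a minimal sketch of how these fields could be pulled apart, the Python below splits the first tree's <internalNodes> from the example above into the feature index and the eight LUT words. The name parse_internal_nodes and the choice to store the words as unsigned values are assumptions made for illustration; they are not part of OpenCV.

```python
def parse_internal_nodes(text):
    """Split an <internalNodes> body into (feature_index, lut_words)."""
    values = [int(token) for token in text.split()]
    if len(values) != 11:
        raise ValueError("expected 11 integers, got %d" % len(values))
    # values[0] and values[1] are ignored, as described above.
    feature_index = values[2]
    # Keep the eight LUT words as unsigned 32-bit values so that
    # individual bits can be tested without worrying about sign.
    lut_words = [v & 0xFFFFFFFF for v in values[3:]]
    return feature_index, lut_words


# The <features> table from the example, one (x, y, width, height) per <rect>.
RECTS = [(0, 0, 3, 5), (0, 0, 4, 2), (0, 0, 6, 3), (0, 1, 4, 3), (0, 1, 3, 3)]

# Tree 0 of stage 0 from the example.
TREE0 = """0 -1 3 -67130709 -21569 -1426120013 -1275125205 -21585
           -16385 587145899 -24005"""

feature_index, lut_words = parse_internal_nodes(TREE0)
rect = RECTS[feature_index]
```

Here feature_index comes out as 3, so rect is the tuple for the <rect> 0 1 4 3. The features are looked up by this index, not in file order.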

    Lastly, the <features> tag. It consists of an array of <rect> tags, each containing 4 integers that describe the geometry of a feature. Given a processing window (24x24 in your case), the first two integers give the feature's x and y pixel offset within the processing window, and the next two give the width and height of one subrectangle out of the 9 that are needed for the LBP feature to be evaluated.

    In essence then, a tag <rect> ft.x ft.y ft.width ft.height </rect> situated within a processing window of size pW.width x pW.height, checking whether a face is present at (pW.x, pW.y), corresponds to...

    https://i.sstatic.net/NL0XX.png
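
In case the figure is unavailable, here is a rough sketch of the geometry in Python. It assumes the 16 corner points are numbered row by row, with p[0..3] along the top edge of the 3x3 grid; that numbering and the names grid_points and fits_in_window are assumptions made for illustration, and the figure remains the reference.

```python
def grid_points(pw_x, pw_y, rect):
    """Return the 16 corner points (x, y) of the 3x3 grid of subrectangles.

    Points are numbered row by row (an assumption): p[0..3] is the top
    edge, p[12..15] the bottom edge.
    """
    ft_x, ft_y, ft_w, ft_h = rect
    x0, y0 = pw_x + ft_x, pw_y + ft_y
    return [(x0 + (i % 4) * ft_w, y0 + (i // 4) * ft_h) for i in range(16)]


def fits_in_window(rect, pw_width=24, pw_height=24):
    """Check that all nine subrectangles lie inside the processing window."""
    ft_x, ft_y, ft_w, ft_h = rect
    return ft_x + 3 * ft_w <= pw_width and ft_y + 3 * ft_h <= pw_height


# Rectangle 3 from the example, with the window placed at (10, 20).
p = grid_points(10, 20, (0, 1, 4, 3))
```

With this numbering, the central subrectangle R4 has corners p[5], p[6], p[9] and p[10].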

    To evaluate the LBP then, it suffices to read the integral image at points p[0..15] and use p[BR]+p[TL]-p[TR]-p[BL] to compute the integral of each of the nine subrectangles. The central subrectangle, R4, is compared to each of the eight others, clockwise starting from R0, to produce an 8-bit LBP (the bits are packed [msb 01258763 lsb]).
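
The evaluation just described can be sketched in plain Python as follows. Two things here are assumptions rather than statements from the text above: a bit is set when the neighbour's sum is greater than or equal to the centre's (which is how I understand OpenCV to do it), and the helper names integral_image, rect_sum and lbp_code are made up for this example.

```python
def integral_image(img):
    """Summed-area table of size (h+1) x (w+1) with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Pixel sum over a rectangle: BR + TL - TR - BL."""
    return ii[y + h][x + w] + ii[y][x] - ii[y][x + w] - ii[y + h][x]


# Subrectangle indices in the order they are packed, msb first.
NEIGHBOUR_ORDER = (0, 1, 2, 5, 8, 7, 6, 3)


def lbp_code(ii, pw_x, pw_y, rect):
    """Compute the 8-bit LBP of one feature for a window at (pw_x, pw_y)."""
    ft_x, ft_y, ft_w, ft_h = rect
    x0, y0 = pw_x + ft_x, pw_y + ft_y
    # Sums of R0..R8, laid out row by row in a 3x3 grid.
    sums = [rect_sum(ii, x0 + (k % 3) * ft_w, y0 + (k // 3) * ft_h, ft_w, ft_h)
            for k in range(9)]
    centre = sums[4]
    code = 0
    for k in NEIGHBOUR_ORDER:
        # Assumed convention: bit is 1 when the neighbour is >= the centre.
        code = (code << 1) | (1 if sums[k] >= centre else 0)
    return code


# Tiny 3x3 image with 1x1 subrectangles: only R0 is brighter than the centre.
ii = integral_image([[9, 0, 0],
                     [0, 5, 0],
                     [0, 0, 0]])
code = lbp_code(ii, 0, 0, (0, 0, 1, 1))
```

Because R0 occupies the most significant bit, this toy input yields 128. Each subrectangle costs four table reads no matter how large it is, which is what the integral image buys you.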

    This 8-bit LBP is then used as an index into the feature's (2^8 = 256)-bit LUT (the <internalNodes>), selecting a single bit. If this bit is 1, the feature is inconsistent with a face; if 0, it is consistent with a face. The appropriate weight (from <leafValues>) is then returned and added to the weights of all other features to produce an overall stage sum. This is then compared to <stageThreshold> to determine whether the stage passed or failed.
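
A sketch of the lookup and the stage decision in Python might look like this. The bit layout is an assumption based on my understanding of OpenCV's evaluation code: the upper 3 bits of the LBP select one of the eight 32-bit words and the lower 5 bits select the bit within it. The names lut_bit and stage_passes are hypothetical.

```python
EPS = 1e-5


def lut_bit(lut_words, code):
    """Select one bit of the 256-bit LUT using the 8-bit LBP as index."""
    # Assumed layout: word = code >> 5, bit within the word = code & 31.
    word = lut_words[code >> 5] & 0xFFFFFFFF
    return (word >> (code & 31)) & 1


def stage_passes(trees, codes, stage_threshold):
    """Sum one leaf weight per tree and compare against the threshold.

    trees is a list of (lut_words, (fail_weight, pass_weight)) pairs and
    codes holds the LBP computed for each tree's feature.
    """
    total = 0.0
    for (lut_words, (fail_weight, pass_weight)), code in zip(trees, codes):
        # Bit 1 means "inconsistent with a face", so take the fail weight.
        total += fail_weight if lut_bit(lut_words, code) else pass_weight
    return total >= stage_threshold - EPS, total


# LUT of tree 1 from the example cascade.
TREE1_LUT = [-163512766, -769593758, -10027009, -262145, -514457854,
             -193593353, -524289, -1]

# Synthetic LUTs paired with the example's leaf values, to show both outcomes.
LEAVES = [(-0.65, 0.88), (-0.77, 0.72), (-0.71, 0.68)]
all_pass = [([0] * 8, leaf) for leaf in LEAVES]
all_fail = [([-1] * 8, leaf) for leaf in LEAVES]

passed, passed_sum = stage_passes(all_pass, [0, 0, 0], -0.75)
failed, failed_sum = stage_passes(all_fail, [0, 0, 0], -0.75)
```

A full detector would run this for every stage in order and reject the window as soon as one stage fails, which is why most windows are discarded after only a few features.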

    If there's something else I didn't explain well enough I can clarify.