Once you have a *.vec file created from your positive images, you are ready to train your classifier. OpenCV comes with a built-in training utility:

opencv_haartraining.exe -data "C:\My Documents\HaarClassifier\haarcascade" -vec "C:\My Documents\Positive Test Set\positives.vec" -bg "C:\My Documents\Negative Test Set\negatives.txt" -npos 1134 -nneg 625 -nstages 20

Explanation of the command above:

opencv_haartraining.exe is a utility distributed with the OpenCV library. It is usually found in the "bin" folder of the directory where you installed OpenCV.
ex. C:\OpenCV2.2\bin

Below are the options available for this utility as well as what the defaults are.

Usage: opencv_haartraining.exe
  -data <dir_name>
  -vec <vec_file_name>
  -bg <background_file_name>
  [-bg-vecfile]
  [-npos <number_of_positive_samples = 2000>]
  [-nneg <number_of_negative_samples = 2000>]
  [-nstages <number_of_stages = 14>]
  [-nsplits <number_of_splits = 1>]
  [-mem <memory_in_MB = 200>]
  [-sym (default)] [-nonsym]
  [-minhitrate <min_hit_rate = 0.995000>]
  [-maxfalsealarm <max_false_alarm_rate = 0.500000>]
  [-weighttrimming <weight_trimming = 0.950000>]
  [-eqw]
  [-mode <BASIC (default) | CORE | ALL>]
  [-w <sample_width = 24>]
  [-h <sample_height = 24>]
  [-bt <DAB | RAB | LB | GAB (default)>]
  [-err <misclass (default) | gini | entropy>]
  [-maxtreesplits <max_number_of_splits_in_tree_cascade = 0>]
  [-minpos <min_number_of_positive_samples_per_cluster = 500>]

In my example above I kept everything at the default, aside from the number of positives and negatives and how many stages I wanted. Typically, the more stages you use, the better the false alarm rate you will achieve, but generating your final cascade will take more time. It does not make sense to increase the number of stages when you have only a small number of positive and negative samples. At each stage the system will attempt to build a classifier with the desired hit rate; it then calculates that classifier's false alarm rate, and if the false alarm rate is higher than the maximum false alarm rate it will reject the classifier and build the next one.
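To get a feel for how the per-stage targets combine, here is a rough back-of-the-envelope Python sketch, assuming every stage just meets the default -minhitrate of 0.995 and -maxfalsealarm of 0.5 from the usage listing above; the per-stage rates simply multiply across the cascade:

# Rough estimate of overall cascade rates, assuming each stage exactly
# meets the default per-stage targets shown in the usage listing above.
min_hit_rate = 0.995     # -minhitrate default
max_false_alarm = 0.5    # -maxfalsealarm default
n_stages = 20            # -nstages from the example command

print(min_hit_rate ** n_stages)     # ~0.90     overall hit rate
print(max_false_alarm ** n_stages)  # ~9.5e-07  overall false alarm rate

This is why adding stages improves the overall false alarm rate but also eats slightly into the overall hit rate, and why each stage needs enough samples to be trainable.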

A small excerpt from the training output:

Tree Classifier
Stage
+---+
|  0|
+---+

   0

Parent node: 0

*** 1 cluster ***
POS: 282 283 0.996466
NEG: 155 0.54007
BACKGROUND PROCESSING TIME: 0.03
Precalculation time: 0.07
+----+----+-+---------+---------+---------+---------+
|  N |%SMP|F|  ST.THR |    HR   |    FA   | EXP. ERR|
+----+----+-+---------+---------+---------+---------+
|   1|100%|-|-0.874803| 1.000000| 1.000000| 0.139588|
+----+----+-+---------+---------+---------+---------+
|   2|100%|+|-0.722001| 1.000000| 0.774194| 0.105263|
+----+----+-+---------+---------+---------+---------+
|   3| 91%|-|-0.179203| 0.996454| 0.303226| 0.155606|
+----+----+-+---------+---------+---------+---------+
Stage training time: 59.56
Number of used features: 3

Parent node: 0
Chosen number of splits: 0

Total number of splits: 0

N – Current feature (weak classifier) for this stage
%SMP – Percentage of samples used
F – '+' if symmetry is specified
ST.THR – Stage threshold
HR – Hit rate based on ST.THR (hitrate/numpos)
FA – False alarm rate based on ST.THR (falsealarm/numneg)
EXP. ERR – Strong classifier error of the AdaBoost algorithm, based on the threshold
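Tying the legend back to the table above: features keep being added to the stage until the HR column meets -minhitrate and the FA column drops below -maxfalsealarm, which is why this stage stopped after three features. A small Python check using the last row of the excerpt:

# Per-stage stopping condition, using the last row of the table above
# (HR = 0.996454, FA = 0.303226) and the default training targets.
min_hit_rate = 0.995
max_false_alarm = 0.5

hr, fa = 0.996454, 0.303226
print(hr >= min_hit_rate and fa <= max_false_alarm)  # True -> stage is done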

One thing I observed during the training phase: on some forums you will find posts where the trainer has been running for more than four days, and some people saying it took them a number of weeks. Sometimes the trainer gets into an infinite-loop state, and as a user you will never know that it is not really running. I found this out after wondering why my trainer had been sitting on the same training stage for four days. For example, if you specify 20 stages and at, say, stage 17 your negative acceptance ratio (e.g. NEG: 365 2.52601e-006) gets to about that rate or below, the trainer will just sit there and you must exit out of it. After finding this out I went into the source code and made some modifications so it will not get into this state, as well as tailoring it to my own wants and needs.
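My actual source changes are not reproduced here, but as an illustration of the general idea, below is a minimal Python sketch of an external watchdog (the names monitor_haartraining and NEG_RATIO_THRESHOLD are my own, hypothetical choices) that echoes the trainer's output and terminates the process if the NEG acceptance ratio collapses below a tiny threshold, instead of letting it spin forever:

import re
import subprocess
import sys

NEG_RATIO_THRESHOLD = 1e-5
NEG_LINE = re.compile(r"NEG:\s+\d+\s+([0-9eE.+-]+)")

def monitor_haartraining(cmd):
    # Hypothetical watchdog: run opencv_haartraining, echo its output, and
    # terminate it if the NEG acceptance ratio collapses (the stuck state
    # described above).
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    for line in proc.stdout:
        sys.stdout.write(line)
        m = NEG_LINE.search(line)
        if m and float(m.group(1)) < NEG_RATIO_THRESHOLD:
            print("NEG acceptance ratio collapsed; stopping the trainer.")
            proc.terminate()
            break
    proc.wait()

# Example call (same arguments as the command at the top of this post):
# monitor_haartraining([r"C:\OpenCV2.2\bin\opencv_haartraining.exe",
#                       "-data", r"C:\My Documents\HaarClassifier\haarcascade",
#                       "-vec", r"C:\My Documents\Positive Test Set\positives.vec",
#                       ...])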

After the training completes successfully, it will generate an XML file that is your combined cascade of all the weak classifiers.

A small excerpt from the XML output:

<?xml version="1.0"?>
<opencv_storage>
<haarcascade8 type_id="opencv-haar-classifier">
  <size>
    24 24</size>
  <stages>
    <_>
      <!-- stage 0 -->
      <trees>
        <_>
          <!-- tree 0 -->
          <_>
            <!-- root node -->
            <feature>
              <rects>
                <_>
                  0 6 18 7 -1.</_>
                <_>
                  9 6 9 7 2.</_></rects>
              <tilted>0</tilted></feature>
            <threshold>-7.8396856784820557e-002</threshold>
            <left_val>6.6973441839218140e-001</left_val>
            <right_val>-6.9123142957687378e-001</right_val></_></_>
        <_>

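Once training has produced the final cascade XML, you can load it with OpenCV and run detection. Below is a minimal sketch using OpenCV's Python bindings; the file names haarcascade.xml and test.jpg are placeholders for your own cascade and test image:

import cv2

# Load the trained cascade (placeholder file name) and run it on a test image.
cascade = cv2.CascadeClassifier("haarcascade.xml")
img = cv2.imread("test.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# detectMultiScale returns (x, y, w, h) rectangles for each detection.
objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
for (x, y, w, h) in objects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detections.jpg", img)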

2 Comments on Training a Haar Classifier

  1. ckmoon says:

    Thank you for the good blog content. When I train, my trainer gets into an infinite-loop state. You say that "After finding this out I went into the source code and made some modifications."

    I really have a question for you: what is your code, and where do you insert it?

    I have been suffering from this problem for several days.

    I look forward to your answer.

    Thank you.

  2. Amin says:

    Hi,
    I do exactly the things that you do, but when I try to run it I get:
    "opencv_haartraining has stopped working"

    Anything that helps would be appreciated. I really need it. Thank you 🙂
