Object Detection: Part 1 | Student Competition: Computer Vision Training. Following along are instructions in the video below:
The topic of object detection will be covered in two parts. In Part 1, we will learn about specific object detection, where you can tell whether an object is detected and, if it is, find its location.

We will look into how to detect particular objects: by using template matching, by extracting histogram of oriented gradients (HOG) features, by using the cascade object detector, and by using the Training Image Labeler app to select regions of interest in training images. Let's begin by looking at template matching. Template matching matches an actual image patch against an input image by sliding the patch over the input image.

The location of the template match, loc, can be obtained by setting up the system object vision.TemplateMatcher and then using the step syntax shown here with the grayscale input image Igray and template T. Let's move to MATLAB and see how this works. Let's load the bike template MAT-file,
which contains the template image of a bike. We can view it using the imshow command. Now let's load an input bike image and view it as well. Recall that the template matcher needs a grayscale image, so let's convert this RGB image to grayscale. We can now set up the vision.TemplateMatcher system object.

There is a Metric property in vision.TemplateMatcher that controls how the difference between the input image pixels and the corresponding template pixels is computed. By default this is set to 'Sum of absolute differences', where you sum the absolute values of the differences between corresponding pixels. Other options are 'Sum of squared differences' and 'Maximum absolute difference'. We can also see that the SearchMethod property defaults to 'Exhaustive', which performs template matching by shifting the template in single-pixel increments throughout the image.

We can set the SearchMethod property to 'Three-step' to search for the minimum difference using a steadily decreasing step size. Although there is a risk of obtaining a wrong detection with this method, it is much less computationally expensive.
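The exhaustive sum-of-absolute-differences search just described can be sketched in a few lines of NumPy. This is an illustrative Python rewrite, not MATLAB's actual vision.TemplateMatcher; the function name and demo data are invented:

```python
import numpy as np

def match_template_sad(image, template):
    """Exhaustive template matching: slide the template one pixel at a
    time over every valid position and return the (row, col) of the
    placement with the smallest sum of absolute differences (SAD)."""
    ih, iw = image.shape
    th, tw = template.shape
    best_loc, best_sad = (0, 0), float("inf")
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            patch = image[r:r + th, c:c + tw]
            sad = np.abs(patch.astype(int) - template.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_loc = sad, (r, c)
    return best_loc

# Tiny demo: cut a 3x3 patch out of a random 10x10 image and recover
# its location (the SAD is exactly 0 at the true position).
rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(10, 10), dtype=np.uint8)
tmpl = img[4:7, 2:5].copy()
print(match_template_sad(img, tmpl))  # (4, 2)
```

The 'Three-step' method trades this dense scan for a coarse-to-fine search with a shrinking step size, which is faster but can settle on a wrong local minimum.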
We can now calculate the location where the template matches the input image by calling step on the vision.TemplateMatcher system object. This returns the x and y coordinates of the location. To visualize the location on the image, we can insert a marker into the image. This is very similar to inserting a shape and can be done using insertMarker.

The first input insertMarker needs is the image into which the marker should be inserted; the second input is the exact location of the marker, which in our case is loc. We can also specify what shape and size to use for the marker; in our case, a circle of size 10.
We can visualize the result using the imshow command to confirm that the marker was indeed inserted in the correct location. Let's put all these commands in a script. If the Command History section is not open in the MATLAB desktop, we can access all our previously executed commands by pressing the up-arrow key. Now we can select all the relevant commands and right-click to create a script from them. Let's put in some comments and clean up the script a little. It is always a good idea to add comments to all scripts: it helps others understand the code better, and helps us recall things quicker if we haven't viewed the script in a while. Let's save the script as TemplateMatching.

We can now read in different input images using the script and see how well this method works. For this image, we can see that the template matcher works and has correctly located the bike. Let's try another image.
Once again, the template matcher has correctly located the bike. Let's try a different image. For this image, however, the method fails. This is because template matching is not scale invariant: it fails when the object appears at a different scale.

So let's take a look at another technique for object detection, one that is scale invariant.
The histogram of oriented gradients (HOG) technique counts occurrences of gradient orientations in localized portions of an image. HOG descriptors are obtained by first dividing the image into groups of cells; note that a cell is simply a collection of neighboring pixels. Then, for each cell, a histogram of gradient directions, or edge orientations, for the pixels within the cell is compiled. The combination of these histograms represents the descriptor.

Let's go back to MATLAB to help visualize what extracted HOG features look like. I have a template called HOGVisualization.m; let's use that to create a script. Let's first load the input image and view it. We will use the extractHOGFeatures command to extract the HOG features from the image. The first input it needs is the image from which it will extract the features. For the second input, we can either provide specific location points from which we want the features, or alternatively provide a cell size to use for the entire image. The default cell size is 8-by-8; let's use 16-by-16, as the larger the cell size, the easier it is to view the extracted features. This command returns two outputs: the first encodes local shape information from regions within the image, and the second is a visualization matrix for the image I at the given cell size. Let's visualize the results by plotting viz16 on top of the original image, and run the script.
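The per-cell histograms behind this visualization can be sketched in NumPy. This is an illustrative Python sketch of the core idea only; MATLAB's extractHOGFeatures additionally normalizes the histograms over overlapping blocks, which is omitted here:

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, n_bins=9):
    """For each cell (a square block of neighboring pixels), build a
    histogram of gradient orientations weighted by gradient magnitude."""
    img = image.astype(float)
    gy, gx = np.gradient(img)                      # per-pixel gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    n_cy, n_cx = img.shape[0] // cell_size, img.shape[1] // cell_size
    hists = np.zeros((n_cy, n_cx, n_bins))
    bin_width = 180.0 / n_bins
    for i in range(n_cy):
        for j in range(n_cx):
            rows = slice(i * cell_size, (i + 1) * cell_size)
            cols = slice(j * cell_size, (j + 1) * cell_size)
            bins = np.minimum((ang[rows, cols] / bin_width).astype(int),
                              n_bins - 1)
            for k in range(n_bins):
                hists[i, j, k] = mag[rows, cols][bins == k].sum()
    return hists

# A vertical edge produces purely horizontal gradients, so all the
# histogram mass lands in the 0-degree bin.
edge = np.zeros((16, 16))
edge[:, 8:] = 255.0
h = hog_cell_histograms(edge, cell_size=8)
print(h.shape)  # (2, 2, 9)
```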
In this image we can see the gradient directions and edge orientations for the whole image. If we zoom in, it is easier to see the vector fields representing the different gradient directions in different parts of the image. HOG features are very good at classifying objects with fixed aspect ratios, even if the scale varies. For our bike example, the ratio of the width and height of a bike remains constant in spite of the change in scale, so HOG features are ideal for bike detection. They are also very good for detecting upright human beings, faces, and so on. A point to note about extracted features: the smaller the cell size, the longer the encoded feature vector that is returned, so if extracting HOG features from the frames of a video,
we need to be aware of the amount of memory used by the smaller cell sizes. One way of implementing object detection using HOG features is the cascade object detector. Let's see the workflow for this. The cascade object detector is so called because it performs the detection over a number of cascaded stages. We first select a set of positive images and a set of negative images to train the detector. Positive images are any images that contain the object of interest; for our example, we have a set of images that contain bikes. Negative images are any images that do not contain the object of interest. The point to note about negative images is that they need to have backgrounds similar to those of the positive images; they should not have plain black or white backgrounds, or even be images of other objects.
We provide these as input to trainCascadeObjectDetector, which then starts training the classifier by sliding a window over the image and extracting the HOG features for that window automatically. During training, at each stage of the classifier, the region defined by the current location of the sliding window is labeled as either positive or negative: positive indicates that an object was found, negative indicates that no object was found. If the label is negative, the classification of this region is complete and the classifier slides the window to the next location. If the label is positive, the classifier passes the region to the next stage. This process continues until it reaches the final stage of the cascaded classifier. The detector reports an object found at the current window location only when the final stage classifies the region as positive.

Two properties can be specified by the user at the time of training: the number of cascaded stages to be used, and the acceptable false alarm rate for each stage. These two properties determine the effectiveness of the trained classifier; for example, using more cascaded stages provides more accurate results but takes longer to train the classifier. Once the training is complete, the trained cascade object detector is stored in an XML file. This XML file can then be used by vision.CascadeObjectDetector to detect the objects of interest in an image or video.
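The stage-by-stage early rejection described above can be sketched as follows. The stage functions and thresholds below are invented toy stand-ins for the real trained stages, purely to show the control flow:

```python
def cascade_classify(window_features, stages):
    """A cascade accepts a window only if every stage accepts it.
    The first stage that rejects stops the evaluation, which is what
    keeps sliding-window detection cheap on mostly object-free images."""
    for score, threshold in stages:
        if score(window_features) < threshold:
            return False      # negative: done, slide to the next window
    return True               # positive at every stage: object found

# Toy stages: each inspects one feature value of the window.
stages = [
    (lambda f: f[0], 0.5),    # stage 1: cheap, rejects most windows
    (lambda f: f[1], 0.5),    # stage 2: runs only on survivors
    (lambda f: f[2], 0.5),    # stage 3: final, most selective
]
print(cascade_classify([0.9, 0.8, 0.7], stages))  # True
print(cascade_classify([0.9, 0.1, 0.7], stages))  # False, rejected at stage 2
```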
So the first step in creating a cascade object detector is to provide a set of positive images to trainCascadeObjectDetector. To do this, we will be using an app in MATLAB called the Training Image Labeler. The app allows you to interactively select rectangular regions of interest, or ROIs, from a set of images. The ROIs define the locations of objects, which are used to train the classifier.

Let's move to MATLAB and see what this means. In Apps, navigate to Image Processing and Computer Vision and select Training Image Labeler. Let's add some images of bikes to this app.
We can select the regions of interest in these images manually. Once we have selected the regions of interest, we click Export ROIs. This creates a variable in the workspace; let's call the variable bikes. We can see that bikes has been created in the workspace. This is a 1-by-4 struct, which contains the name of each image and the corresponding bounding boxes selected by us in the app. Four images is not enough to train a classifier, so I have a MAT-file, bikePositive.mat, which has the ROI information from 346 images. The process for creating the variable is the same as the one I showed for 4 images, only with a much larger data set.

It is not necessary to finish specifying the ROIs for all the images in one session: the app allows us to save our session and load it at a later time to continue from where we left off. Now that we have our struct variable in the workspace with all the positive ROIs, let's use the template script CODTraining to perform the next steps. We begin by loading the same bikePositive MAT-file. Next we create a variable called negativeFolder, which will contain the location of all the negative images. Now that we have all the positive and negative images, we can start training the cascade object detector. As mentioned earlier,
we can specify two parameters: the number of cascaded stages to be used for the training, and the acceptable false alarm rate for each stage. The default number of stages is 20; let's start with something smaller, like 5. The default acceptable false alarm rate is 50 percent; let's set this to 5 percent. Now we use trainCascadeObjectDetector to start the training.
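It may help to see why a seemingly loose per-stage false alarm rate still yields a selective detector: if the stages behaved independently, the rates would multiply across stages. This is a back-of-the-envelope approximation, not a guarantee about any particular trained detector:

```python
def overall_false_alarm(per_stage_rate, n_stages):
    """Each stage passes roughly a fraction `per_stage_rate` of the
    non-object windows on to the next stage, so N (assumed independent)
    stages pass about per_stage_rate ** N of them overall."""
    return per_stage_rate ** n_stages

print(overall_false_alarm(0.50, 20))  # the defaults: ~9.5e-07
print(overall_false_alarm(0.05, 5))   # the settings used here: ~3.1e-07
```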
The first input we need to provide is the name of the XML file we want as output. Then we provide the positive and negative bike images, followed by the number of cascaded stages and the false alarm rate we selected. Different XML files can be created by changing the number of cascaded stages and/or the false alarm rate, so it is a good idea to incorporate these values in the name of the file. Training a cascade object detector can take a while, so I will not be running this script at this time.

I have previously run the script with different numbers of stages and false alarm rates to create 24 different XML files; creating those 24 files took about two hours. The naming convention used for the files is: first the number of stages, which ranges from 5 to 15, followed by the false alarm rate of 10, 7.5, 5, or 2.5 percent. Since it is not possible to put dots in file names, 100 stands for a 10.0 percent false alarm rate, 75 stands for 7.5 percent, and so on. Let's use the trained object detectors stored in the XML files to see how well they can detect bikes in images and videos. Let's use the template script HOGDetectImage to create a script to detect a bike in an image.
The first thing we do is create a detector using vision.CascadeObjectDetector and specify which XML file to use. Let's first use the XML file for five stages with a false alarm rate of 10 percent. Next, let's load in the image that the template matcher had failed to identify correctly. The bounding box can then be found by calling step on the vision.CascadeObjectDetector object. Once the bounding box is found, we can insert it into the image using insertShape and view the result using imshow. Let's run the script.

We can see that the bike has indeed been detected correctly; however, there are quite a few false positives as well. This is because the number of cascaded stages is only 5, which is not very high. Let's change it to a higher value of 9 stages. This definitely improves the result a lot, but there is still one false positive in the image. We can either increase the number of stages even higher, to 11, which gives the desired result, or, as an alternative, keep the number of stages at 9 and change the false alarm rate to a stricter value, from 10 percent down to 2.5 percent.
This also provides the desired result, so either option is valid. Now let's use the template script HOGDetectVideo to extend the image script to work on videos. We begin the same way, by creating a detector using vision.CascadeObjectDetector and specifying which XML file to use; this time let's use five stages and a five percent false alarm rate. Next we use vision.VideoFileReader to read in a video file. Since the detector looks for objects frame by frame, it takes a while for the detector to work through the whole video file, so rather than using a vision.DeployableVideoPlayer system object as we have in the past, it is better to write the result of the detection to an output video file, which we can view later. This can be done using the vision.VideoFileWriter system object; its input is the name of the file we want as output. Next we add a while loop to check whether the video is done, and as long as there are frames remaining, we keep updating I with the next frame, detect the bounding box in that frame, insert a shape for the bounding box into the frame, and write the updated image to the output file. Finally, we release the detector, video file reader, and video file writer objects.

Again, running this script through the whole video file takes a while, so the script was run offline and the results have already been saved. Let's look at the results. This one uses five cascaded stages with a false alarm rate of 5 percent, and while it is able to detect the bike correctly, it also produces a large number of false positives. This is the result of using nine cascaded stages, and we can immediately see that the false positives have been reduced significantly. The couple of false positives in the top right corner of the image occur because some of the branches in the tree look like the spokes of a bike. We can see that different numbers of stages and false alarm rates affect the accuracy of the results, so how do we know which one to use? As mentioned earlier, I created 24 different XML files using this script.
It took about two hours to create all the files, because as the number of stages increased, it took longer and longer to train the detector. Then I created a validation folder with twelve images of bikes and twelve images with no bikes; note that these images were not in the training set. I wrote a simple validation script to test the effectiveness of the different classifiers. First I created a truth variable: every image that had a bike was assigned a value of 1, and every image with no bike was assigned a value of 0. For each XML file, the detector looped through the 24 test images and calculated the number of true positives, false positives, false negatives, and true negatives detected by each classifier. I used a very simple algorithm: if more than one bike was detected in a positive image, I assumed at least one of the detections was in the correct location, but did not validate the position of the bounding box.
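The lenient counting rule just described can be sketched as follows; the truth vector and per-image detection counts below are made-up examples, not the actual validation data:

```python
def confusion_counts(truth, n_detections):
    """Tally TP/FP/FN/TN from ground truth (1 = image contains a bike)
    and the number of detections reported per image. In a positive image,
    one detection is assumed correct (its position is not checked) and
    any extras count as false positives."""
    tp = fp = fn = tn = 0
    for has_bike, n in zip(truth, n_detections):
        if has_bike:
            if n == 0:
                fn += 1        # missed the bike entirely
            else:
                tp += 1        # assume one detection is the bike
                fp += n - 1    # the rest are false positives
        elif n == 0:
            tn += 1            # correctly reported no bike
        else:
            fp += n            # every detection is a false positive
    return tp, fp, fn, tn

truth        = [1, 1, 1, 0, 0, 0]
n_detections = [1, 3, 0, 0, 2, 0]
print(confusion_counts(truth, n_detections))  # (2, 4, 1, 2)
```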
All the other detections in the image were considered false positives. The script could be made more accurate by adding these checks. Then I wrote a couple of scripts to visualize the results. This figure has four subplots: true positives (correctly detecting a bike), false alarms or false positives (detecting a bike in a negative image), misses or false negatives (failing to detect a bike in a positive image), and true negatives (correctly detecting no bike in a negative image). This figure has increasing false alarm rates on the x axis. If we look at the blue line (number of stages = 5), we can see that it was able to correctly detect all the positive images, but it also gave a lot of false alarms, and it was hardly able to detect any true negatives. Another way to look at the same data is to plot the increasing number of stages on the x axis. As the number of stages increases, we see the false alarm rates go down and true negatives start getting detected correctly; however, the number of misses starts to increase as well. So it can be seen that as the number of stages increases, the classifier goes from detecting nearly everything in the image as a bike to getting more and more inclined to say there is no bike in the image at all. Let's wrap up this discussion by talking about some best practices for training cascade object detectors. First,
use as many samples as possible: the more training samples used, the more robust the trained classifier will be. Also, try to train with the same objects and backgrounds that you are trying to identify during the task; for example, when training a character recognition algorithm for the SUAS competition, use pictures of actual targets with similar ground textures. Finally, do not use pictures of incorrect targets as negative images; try to use backgrounds instead.

So, to wrap up, we learned how to detect specific objects in an image or video by using template matching, extracting histogram of oriented gradients features, using the cascade object detector, and using the Training Image Labeler app.

This concludes the video.