[OpenR8 solution] Image_40Labels_SSD512_Caffe (image analysis using the SSD512 algorithm and the Caffe library for text detection with up to 40 categories)
  1. Image_40Labels_SSD512_Caffe

 

Image_40Labels_SSD512_Caffe is built on the Caffe deep learning framework. It first trains a model with the SSD (Single Shot MultiBox Detector) algorithm, and then uses the trained model to detect text in images and classify it into up to 40 categories. The training image size is 512 x 512.

 

The main process is shown in Fig. 1 below.

 

First, we need to prepare the images that we want the model to learn from, box-select each target object in the image, and tag each box with its category, so that the model knows what kind of object each boxed region contains.

 

Then, through a series of flow files, two txt list files are generated. These tell the model which files are used for training and which for testing, and which categories those files are divided into. The training flow file is then run on the two txt list files. After training is complete, choose the appropriate flow file to test the images you want to evaluate.

 

Fig. 1. SSD text detection flow.

 

 

  1. Step 1: Pre-processing - Mark the areas of interest

 

  1. Purpose:

Prepare the images and box-select (mark) the areas of interest.

 

  1. Introduction to the content:

In the image, box-select the object that you want the model to learn. For example, if you want the model to learn text, prepare an image containing text, then box-select the text in the image and tag it as text. As a result, the model will know that the boxed area of the image is text. By analogy, whatever you want the model to learn, prepare the images and box-select and tag the objects.

So the files we need to prepare include the following two items:

(1) Images (images that you want the model to learn from, and images used to test the accuracy of the model)

(2) An xml file that marks the target locations in each image (generated automatically when the boxes are selected and saved)
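labelImg saves each annotation as a Pascal VOC style xml file, with one object entry per box. The following is a minimal Python sketch that reads one such xml file and prints its boxes; the file name example.xml is only illustrative.

import xml.etree.ElementTree as ET

# Read one labelImg annotation (Pascal VOC xml); the file name is illustrative only.
root = ET.parse('data/train_annotation/example.xml').getroot()
for obj in root.findall('object'):
    name = obj.find('name').text          # the category tag, e.g. "text"
    box = obj.find('bndbox')
    xmin = int(box.find('xmin').text)
    ymin = int(box.find('ymin').text)
    xmax = int(box.find('xmax').text)
    ymax = int(box.find('ymax').text)
    print(name, xmin, ymin, xmax, ymax)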

 

  1. Example: In this Image_40Labels_SSD512_Caffe scenario:

To detect the text in an image, we mark the target text boxes in the image with labelImg.exe (Fig. 2 below). As shown by the red box in Fig. 3, the text is box-selected and tagged with the category text. When tagging is complete and the result is saved, the .xml file is generated automatically.

Place the images in the specified folder locations, as shown in Fig. 4 below.

 

【Training】

Place the images in:

OpenR8/solution/Image_40Labels_SSD512_Caffe/data/train_image

Place the xml files in:

OpenR8/solution/Image_40Labels_SSD512_Caffe/data/train_annotation

 

【Test】

Place the images in:

OpenR8/solution/Image_40Labels_SSD512_Caffe/data/test_image

Place the xml files in:

OpenR8/solution/Image_40Labels_SSD512_Caffe/data/test_annotation

 

  1. Additional Instructions:

The tagging software labelImg.exe is included with OpenR8; its file path is OpenR8 > solution > Image_40Labels_SSD512_Caffe > labelImg.exe, as shown in Fig. 2 below. For instructions, refer to the Open Source Robot Club [ezAI simple AI] labelImg usage guide (Windows version).

 

Fig. 2. Location of the labeling tool.

Fig. 3. Mark the target object in the image.

Fig. 4. Placement locations of the marked images and xml files in the training sample folders.

 

 

  1. Step 2: Pre-processing - Set up txt list files for the file locations

 


 

  1. Purpose:

Define the locations of the images and their annotation files used for training and testing.

 

  1. Introduction to the content:

(1) If the computer has a graphics card (GPU) installed, click "R8_Python3.6_GPU.bat" and open "1_prepare_train_txt.flow".

If the computer does not have a graphics card installed, click "R8_Python3.6_CPU.bat" and open "1_prepare_train_txt.flow".

 

Fig. 5. Location of R8_Python3.6_GPU.bat and R8_Python3.6_CPU.bat.

Fig. 6. Open 1_prepare_train_txt.flow.

 

(2) Confirm the file paths of the training samples and the annotation files described below. After confirmation, press Run to produce train.txt. When "Press any key to continue ..." appears, train.txt has been produced successfully.

 

At present, the path of the training samples is fixed at solution/Image_40Labels_SSD512_Caffe/data/train_image.

At present, the path of the training sample annotation files is fixed at solution/Image_40Labels_SSD512_Caffe/data/train_annotation.

 

【train.txt】is produced at OpenR8/solution/Image_40Labels_SSD512_Caffe/data/train.txt.

 

Fig. 7. 1_prepare_train_txt.flow produces train.txt: results of the operation.

 

(3) If the computer has a graphics card (GPU) installed, click "R8_Python3.6_GPU.bat" and open "2_prepare_test_txt.flow".

If the computer does not have a graphics card installed, click "R8_Python3.6_CPU.bat" and open "2_prepare_test_txt.flow".

 

Fig. 8. Open 2_prepare_test_txt.flow.

 

(4) Confirm the paths of the test samples and the test annotation files described below. After confirmation, press Run to produce test.txt. When "Press any key to continue ..." appears, test.txt has been produced successfully.

 

At present, the path of the test samples is fixed at solution/Image_40Labels_SSD512_Caffe/data/test_image.

At present, the path of the test sample annotation files is fixed at solution/Image_40Labels_SSD512_Caffe/data/test_annotation.

 

Fig. 9. 2_prepare_test_txt.flow produces test.txt: results of the operation.

Fig. 10. Location of train.txt and test.txt.

Fig. 11. train.txt content diagram.

Fig. 12. test.txt content diagram.
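Conceptually, 1_prepare_train_txt.flow and 2_prepare_test_txt.flow pair each image with its xml annotation and write one line per pair. The following Python sketch shows the idea for the training list only; it assumes the common SSD list format of image path, a space, then annotation path, and the exact format actually produced by the flows is the one shown in Fig. 11 and Fig. 12.

import os

# Assumed folder layout of this solution (relative to the solution folder).
image_dir = 'data/train_image'
annotation_dir = 'data/train_annotation'

with open('data/train.txt', 'w') as f:
    for file_name in sorted(os.listdir(image_dir)):
        stem, _ = os.path.splitext(file_name)
        xml_path = os.path.join(annotation_dir, stem + '.xml')
        if os.path.exists(xml_path):
            # One line per sample: image path, a space, annotation path.
            f.write('%s %s\n' % (os.path.join(image_dir, file_name), xml_path))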

 

 

  1. Step 3: Pre-processing - Set up the label category file

 


 

  1. Purpose:

The previous step created the list files that tell the model which files are used for training and which for testing.

This step sets up the category file, which tells the model which categories the data is divided into, so that the model can learn to classify the objects in the images.

 

  1. Introduction to the content:

Please create a "labelmap.prototxt" file under the OpenR8/solution/Image_40Labels_SSD512_Caffe/data path. The file content format is as follows:

item {
  name: "none_of_the_above"
  label: 0
  display_name: "background"
}
item {
  name: "Category label name"
  label: 1
  display_name: "Display name of this category"
}
……
item {
  name: "Category label name"
  label: n
  display_name: "Display name of this category"
}

 

Please note: The "name" and "display_name" values above must match the category names, and the number of categories, in the xml files produced when marking the areas of interest. For example, if the marking uses two categories, cat and dog, then the "name" and "display_name" values must be consistent with the cat and dog category names, as in Fig. 13.

 

  1. Example:

At present, labelmap.prototxt has already been created for the Image_40Labels_SSD512_Caffe example in the OpenR8 folder, as described below.

 

In the OpenR8/solution/Image_40Labels_SSD512_Caffe/data folder you can see labelmap.prototxt, which can be opened with Notepad++ or Notepad, as shown in Fig. 14 below.

 

The contents of the file are shown in Fig. 15 below. The images are classified into none_of_the_above (none of the above) as well as A, B, C, D, E, F ... Z and 0, 1, 2, 3 ... 9, a total of 37 categories, with label numbers 0, 1, 2, 3 ... 36. (A small generation sketch follows Fig. 15.)

 

Fig. 14. labelmap.prototxt file path.

Fig. 15. labelmap.prototxt content diagram.
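Because the 37 categories follow a simple pattern (background plus A-Z plus 0-9), a file with the same structure as the bundled labelmap.prototxt could be generated with a short script. This is only a convenience sketch; the solution already ships with the finished labelmap.prototxt, it writes to a separate file name so nothing is overwritten, and the name values must match the labels used in the labelImg xml files.

import string

# background + A-Z + 0-9 = 37 categories, labels 0..36 (matching Fig. 15).
categories = ['none_of_the_above'] + list(string.ascii_uppercase) + list(string.digits)

with open('labelmap_generated.prototxt', 'w') as f:
    for label, name in enumerate(categories):
        display_name = 'background' if label == 0 else name
        f.write('item {\n')
        f.write('  name: "%s"\n' % name)
        f.write('  label: %d\n' % label)
        f.write('  display_name: "%s"\n' % display_name)
        f.write('}\n')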

 

 

  1. Step 4: Pre-processing - annoset_to_lmdb

 


 

This step converts the txt list files defined in the previous steps into the lmdb files required for the subsequent training and testing of the model.

 

  1. Open annoset_to_lmdb flow file

If the computer has a graphics card installed, double-click "R8_Python3.6_GPU.bat"; if it does not, double-click "R8_Python3.6_CPU.bat" => click "File" => "Open" => "OpenR8 > solution > Image_40Labels_SSD512_Caffe" => "3_annoset_to_lmdb.flow". The procedure is shown in Fig. 16, Fig. 17, Fig. 18, and Fig. 19.

 

For any questions about opening the software or loading the solution, refer to the "OpenR8 operating manual".

 

Fig. 16. R8_Python3.6_GPU.bat and R8_Python3.6_CPU.bat.

Fig. 17. File Open.

Fig. 18. Select 3_annoset_to_lmdb.flow in the Image_40Labels_SSD512_Caffe folder.

Fig. 19. Loading 3_annoset_to_lmdb.flow.

 

  1. Run the annoset_to_lmdb flow file

After the parameters are confirmed, click Run. If an old lmdb file exists, a prompt asking whether to remove it will appear; please click "Yes", as shown in Fig. 20 below.

 

When execution is complete, train_lmdb and test_lmdb are generated, as shown in Fig. 21 and Fig. 22 below.

 

Fig. 20. Delete the old lmdb files.

Fig. 21. Run complete.

Fig. 22. Generate train_lmdb and test_lmdb.

 

  1. annoset_to_lmdb flow file parameter description

 

Fig. 23. Input files (left) converted to output files (right).

 

This item sets the paths of the input files and the path of the output lmdb; the file items are shown in Fig. 23 above. The parameter paths use default values, so this part can be skipped and the flow run directly.

 

The following introduces each step of this flow file, in order from first to last:

 

Caffe_Init: Caffe framework initialization.

 

File_DeleteDir (Fig. 24): When an old lmdb folder already exists, the old files must be removed before a new lmdb is generated. (For training purposes.)

As shown below, selecting "File_DeleteDir" in the flow area lists the parameters of this step in the function list. Click "Edit" next to "dirName (String)" to show this folder-name variable in the variable area; its "Value", "data/train_lmdb/", is the lmdb folder. (A plain-Python sketch of this step follows Fig. 24.)

 

Fig. 24. File_DeleteDir.
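In plain Python terms, what File_DeleteDir does here is roughly the following. This is only a minimal sketch; the real step is a built-in OpenR8 function, not this code.

import os
import shutil

# Remove the old lmdb folder, if any, so a fresh one can be generated.
lmdb_dir = 'data/train_lmdb/'
if os.path.isdir(lmdb_dir):
    shutil.rmtree(lmdb_dir)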

 

Caffe_ObjectDetect_CreateTrainData (Fig. 25): Generate lmdb files. (For training purposes)

 

Fig. 25. Caffe_ObjectDetect_CreateTrainData.

 

The last two "File_DeleteDir" and "Caffe_ObjectDetect_CreateTrainData" are as follows: In order to produce the lmdb used in the test model, set the way such as the lmdb of the training generated above.

 

 

  1. Step 5: Training Models - train

 


 

This introduction is divided into two parts: the first describes how to start training, and the second describes the details of each block of the flow, for readers interested in what each block does.

 

  1. When the data preparation is complete, start training the model.

First, use OpenR8 to open and load the 4_train.flow file, as shown in Fig. 26 and Fig. 27.

 

Next, verify whether the computer supports GPU acceleration (if GPU acceleration is not supported, go to the GPU field of "Caffe_Train" and change the value from "all" to ""). Press "Run" to start training the model; this step takes some time while the program builds the model. See Fig. 28 below.

 

When execution is complete, a trained model is generated; its model files are placed as follows (Fig. 29).

Path : OpenR8/solution/Image_40Labels_SSD512_Caffe/data/

File name 1 : xxx.caffemodel

File name 2 : xxx.solverstate

 

Fig. 26. 4_train.flow file path.

Fig. 27. Open 4_train.flow.

Fig. 28. Run 4_train.flow and its execution process diagram.

Fig. 29. The model files produced after training is complete.

 

  1. 4_train process introduction:

 

►Caffe_Init : Caffe framework initialization.

 

►Caffe_Train :

【CaffeObject】The Caffe object passed on from the preceding initialization.

【GPU】Whether the device used to train the model should use GPU acceleration. If so, set the value to all; otherwise, leave it blank. (If you set it to all, make sure the device has a GPU.)

【solverPath】The main parameter. This is the solver.prototxt input file required for Caffe training, which contains the number of training iterations, the training parameters, and the sample lists. See Fig. 30 and Fig. 31 below, followed by a sketch of typical solver fields.

【continueTrainModelPath】Continue training from a previously trained model. Fill in the path of the model you want to continue training from; the model extension must be .caffemodel.

 

Fig. 30. Caffe_Train of the 4_train.flow process.

Fig. 31. solver.prototxt content diagram.
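The following is a hedged sketch of the kind of fields a Caffe solver.prototxt contains. The field names are standard Caffe SolverParameter fields; the actual values and file paths used by this solution may differ and should be checked against Fig. 31.

train_net: "data/train.prototxt"     # network definition used for training
test_net: "data/test.prototxt"       # network definition used for testing
test_iter: 100                       # number of test batches per evaluation
test_interval: 1000                  # run a test every N training iterations
base_lr: 0.001                       # initial learning rate
lr_policy: "step"
gamma: 0.1
stepsize: 40000                      # multiply the learning rate by gamma every N iterations
momentum: 0.9
weight_decay: 0.0005
max_iter: 120000                     # total number of training iterations
snapshot: 10000                      # save xxx.caffemodel / xxx.solverstate every N iterations
snapshot_prefix: "data/ssd512"       # prefix of the saved model files
solver_mode: GPU                     # GPU or CPU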

 

 

  1. Step 6: Test the models from each training stage

 

  1. Purpose: To test the accuracy of the model at each training stage, in order to find a suitable model and avoid overfitting to the training samples, which would lead to poor results on the test samples.

 

  1. Introduction to the content: 

 

(1) Training samples: "Open OpenR8" => "Open and load 5_train_result.flow" => "Run the flow" => "Produce train_result.txt"

(2) Test samples: "Open OpenR8" => "Open and load 6_test_result.flow" => "Run the flow" => "Produce test_result.txt"

 

(3) Select the model with an appropriate number of training iterations:

As shown in Fig. 32, by observing the contents of train_result.txt and test_result.txt we can see that the results become acceptable after a certain number of training iterations, or that the model has converged after a certain stage. It is therefore most appropriate to choose the model from that iteration count.

 

Fig. 32. train_result.txt and test_result.txt content.

 

 

  1. Step 7: Test a trained model – inference_image

 


 

When the model has been trained, you can test the trained model with this step.

 

First use OpenR8 to open and load "7_inference.flow", as shown in Fig. 33 and Fig. 34.

  1. Test flow

 

►Select the image you want to test, as shown in Fig. 35.

If the selected image is replaced with 1-1.png, Fig. 38 shows the detection results for 1-1.png.

 

►Verify whether the computer supports GPU acceleration. If it does not, click "Edit" next to the GPU parameter of "Caffe_ObjectDetect_ReadNet" and change the "Value" in the variable area from "all" to "" to cancel GPU acceleration.

 

►Select a model with good training results to detect the image. Click "..." next to the caffeModelPath parameter of "Caffe_ObjectDetect_ReadNet" to select the trained model (.caffemodel file) you want to use.

 

►View the result message. Press Run to detect the coordinates of the targets in the image and their categories using the trained model, as shown in Fig. 36.

 

►View the result messages and images. If you want to see the detected boxes drawn on the image, press Debug to display the object detection results.

As shown in Fig. 37, the detected text in the image is box-selected.

 

※About "Choosing a model with better training effect" can use "5_train_result.flow" and "6_test_result.flow" to measure the classification results of all caffemodel files to train_image and test_image, and can root according to the classification results, decide which training model to choose.

 

Fig. 33. 7_inference.flow.

Fig. 34. Loading 7_inference.flow.

Fig. 35. 7_inference: select test images.

Fig. 36. 7_inference: run results.

Fig. 37. 7_inference: Debug results.

Fig. 38. 7_inference: test result for another image.

 

  1. Introduction to the 7_inference.flow process content

 

►Caffe_Init: Caffe framework initialization.

 

►Image_Open:

【imageFileName (String)】The path of the image to infer.

【image (Image)】The image read from the file name above is stored in this variable.

 

►Caffe_ObjectDetect_ReadNet

【CaffeObject (Object)】

【GPU (String)】Whether to enable GPU acceleration.

【deployPath (String)】Read the network structure model, data/deploy.prototxt.

【caffeModelPath (String)】Read the trained model (extension .caffemodel).

【labelPath (String)】Read the category file used when tagging with labelImg.exe, data/predefined_classes.txt.

【meanFilePath (String)】

 

►Caffe_ObjectDetect_InferenceImage

【CaffeObject (Object)】

【OutputResult (Json)】The detection result message, output in Json format.

【GPU (String)】Whether to enable GPU acceleration.

【image (Image)】The image to infer.

【confidenceThreshold (Float)】The confidence threshold.

(A conceptual pycaffe sketch of the ReadNet and InferenceImage steps is given after the Debug_Image description below.)

 

►Json_Print

【json (Json)】Display the test result message from Caffe_ObjectDetect_InferenceImage.

 

►Image_DrawRectJson

【image (Image)】The image to infer.

【json (Json)】The inference result message.

【imageDrawRectJson (Image)】The image to infer with the inference result message drawn on it.

 

►Debug_Image

【image (Image)】Display the image with the inference result message drawn on it.

【displayPercentage (Int)】The display percentage of the image. A value of 200 means 200% (the image is shown at twice its size), 50 means 50% (the image is shown at half its size), and so on for zooming in and out.

【windowTitle (String)】The title of the display window.
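Conceptually, the Caffe_ObjectDetect_ReadNet and Caffe_ObjectDetect_InferenceImage steps above correspond to standard SSD inference with pycaffe. The following is a hedged sketch of that idea, not the OpenR8 API; it assumes a Caffe build that includes the SSD layers, the usual SSD output blob name detection_out, and illustrative file names.

import numpy as np
import caffe  # requires a Caffe build with the SSD layers

deploy_path = 'data/deploy.prototxt'      # network structure (deployPath)
model_path = 'data/xxx.caffemodel'        # a trained snapshot (caffeModelPath)
image_path = 'data/test_image/1-1.png'    # image to infer (illustrative)

caffe.set_mode_gpu()                      # or caffe.set_mode_cpu()
net = caffe.Net(deploy_path, model_path, caffe.TEST)
net.blobs['data'].reshape(1, 3, 512, 512)

# Preprocess: resize to 512x512, HWC->CHW, RGB->BGR, subtract a mean.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.array([104.0, 117.0, 123.0]))  # common SSD mean; may differ here
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

image = caffe.io.load_image(image_path)
net.blobs['data'].data[...] = transformer.preprocess('data', image)

# SSD's DetectionOutput layer returns rows of
# [image_id, label, confidence, xmin, ymin, xmax, ymax], with coordinates in [0, 1].
detections = net.forward()['detection_out'][0, 0]
for det in detections:
    if det[2] >= 0.5:  # confidence threshold, like confidenceThreshold above
        print('label %d  conf %.2f  box %s' % (int(det[1]), det[2], det[3:7]))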

 

  1. If you want to infer the test results of multiple images, you can open the 8_inference_folder.flow file. Place the images to be inferred in a custom folder, as shown in Fig. 39. Set the path of the image folder you want to infer, as in Fig. 40, press Run to start inference over multiple images, and press Debug if you want to see the detected boxes drawn on the images. The settings of the 8_inference_folder.flow process can follow the same approach as 7_inference.flow.

 

Fig. 39. Image folder to infer.

Fig. 40. Inference of test results for multiple images.

 

 

