Shape detection techniques are an important aspect of computer vision and are used to transform raw image data into the symbolic representations needed for object recognition and localization.
In this article, a notebook is presented that develops a system to detect four types of 3D shapes: Cube, Cylinder, Spheroid and Sphere.
The model is built on top of MobileNet v1, leveraging transfer learning in order to build a lightweight but accurate CNN model.
It is implemented on the Cainvas Platform, which provides seamless execution of Python notebooks to build AI systems that can eventually be deployed on the edge (i.e. an embedded system such as a compact MCU).
MobileNet v1 — Base Model
MobileNets are based on a streamlined architecture that uses depth-wise separable convolutions to build lightweight deep neural networks.
MobileNet was chosen because this project is intended for deployment on mobile devices at the edge, so it makes sense to build on a class of efficient models (MobileNets) that were pre-trained for mobile and embedded vision applications and are well suited to fine-tuning.
Loading MobileNet:
base_model = MobileNet(
    input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3),
    alpha=ALPHA,
    depth_multiplier=1,
    dropout=0.001,
    include_top=False,
    weights="imagenet",
    classes=4,
    backend=keras.backend,
    layers=keras.layers,
    models=keras.models,
    utils=keras.utils
)
Here the model parameters are set as follows: IMAGE_SIZE = 224, ALPHA = 0.75 and EPOCHS = 20.
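For reference, the code snippets in this article assume imports and constants along these lines (a sketch; the original notebook's exact import cell is not shown):

import numpy as np
import keras
from keras.applications.mobilenet import MobileNet, preprocess_input
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

IMAGE_SIZE = 224   # input resolution, as stated above
ALPHA = 0.75       # MobileNet width multiplier
EPOCHS = 20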
3D Shapes Dataset
The dataset used here is a custom extracted dataset with images of size (224, 224). It consists of 4 directories containing the images corresponding to the 4 classes of shapes.
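The expected directory layout is sketched below (the folder names are an assumption based on the four classes; flow_from_directory infers the labels from them):

3D_Shapes_Dataset/
├── Cube/
├── Cylinder/
├── Sphere/
└── Spheroid/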
All the images used for Training and Testing are pre-processed as follows:
def prepare_image(file):
    img = image.load_img(img_path + file, target_size=(IMAGE_SIZE, IMAGE_SIZE))
    img_array = image.img_to_array(img)
    img_array_expanded_dims = np.expand_dims(img_array, axis=0)
    return keras.applications.mobilenet.preprocess_input(img_array_expanded_dims)
Visualizing a sample of the Training Dataset:
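A minimal matplotlib sketch that produces such a preview, showing one image per class (assuming the directory layout above):

import os
import matplotlib.pyplot as plt

classes = ['Cube', 'Cylinder', 'Sphere', 'Spheroid']
plt.figure(figsize=(8, 2))
for i, cls in enumerate(classes):
    # pick the first file from each class directory
    folder = os.path.join('3D_Shapes_Dataset', cls)
    sample = os.listdir(folder)[0]
    img = image.load_img(os.path.join(folder, sample),
                         target_size=(IMAGE_SIZE, IMAGE_SIZE))
    plt.subplot(1, 4, i + 1)
    plt.imshow(img)
    plt.title(cls)
    plt.axis('off')
plt.show()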
Building the Model — Transfer Learning
def build_finetune_model(base_model, dropout, fc_layers, num_classes):
    # Freeze the pre-trained MobileNet layers
    for layer in base_model.layers:
        layer.trainable = False

    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    for fc in fc_layers:
        # New FC layer, random init
        x = Dense(fc, activation='relu')(x)
        x = Dropout(dropout)(x)

    # New softmax layer
    predictions = Dense(num_classes, activation='softmax')(x)

    finetune_model = Model(inputs=base_model.input, outputs=predictions)
    return finetune_model
FC_LAYERS = [100, 50]
dropout = 0.5

finetune_model = build_finetune_model(
    base_model,
    dropout=dropout,
    fc_layers=FC_LAYERS,
    num_classes=4
)
The model to be fine-tuned is built by adding a few extra layers to the base MobileNet model. Here, we add 2 fully connected dense layers of 100 and 50 neurons respectively, each with a 'relu' activation function and a dropout of 0.5, on top of the last layer of MobileNet, followed by a final output layer for the predictions: another dense layer with 4 output neurons (each corresponding to an output class of shapes) and a 'softmax' activation function.
Training the Model — Fine-Tuning
Now that our transfer-learning model is built, we can train (fine-tune) it on the dataset described earlier, using the Keras ImageDataGenerator to apply MobileNet's preprocessing to the images and produce a training generator (code shown below).
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

train_generator = train_datagen.flow_from_directory(
    '3D_Shapes_Dataset',
    target_size=(IMAGE_SIZE, IMAGE_SIZE),
    color_mode='rgb',
    batch_size=32,
    class_mode='categorical',
    shuffle=True
)
The CNN model built earlier is now compiled with the Adam optimizer and a categorical cross-entropy loss; the metric tracked during training is the accuracy of the model.
The training generator defined above is then fit to the compiled model as shown in the code below.
finetune_model.summary()

finetune_model.compile(
    optimizer='Adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

step_size_train = train_generator.n // train_generator.batch_size

history = finetune_model.fit_generator(
    generator=train_generator,
    steps_per_epoch=step_size_train,
    epochs=EPOCHS,
    shuffle=True
)

finetune_model.save('shape_model.h5')
The summary of the model can be viewed as output before training is initiated within the notebook. Finally, after training is completed, the model is saved as a Keras model (.h5).
Testing the Model
The model achieves an accuracy of 99%, and since it is classifying just the geometric shape of the object, it does not overfit even at such a high level of accuracy.
The model is tested on real-world objects as well as internet images to better understand its capabilities.
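For illustration, a minimal single-image inference sketch (the file name test_cube.jpg is a placeholder; flow_from_directory assigns class indices alphabetically by folder name):

from keras.models import load_model

model = load_model('shape_model.h5')
class_names = ['Cube', 'Cylinder', 'Sphere', 'Spheroid']  # alphabetical order

img = image.load_img('test_cube.jpg', target_size=(IMAGE_SIZE, IMAGE_SIZE))
arr = np.expand_dims(image.img_to_array(img), axis=0)
probs = model.predict(preprocess_input(arr))[0]
print(class_names[np.argmax(probs)], '({:.2%})'.format(probs.max()))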
Following are the results:
Conclusion
This 3D shape detection system can be used to classify objects even in real time. A further advancement of this project would be to convert the Keras CNN model to a minimal edge-deployable format such as .tflite or .onnx in order to deploy it onto an edge AIoT embedded module/MCU such as the OpenMV Cam or the Raspberry Pi. This deployment is possible through the Cainvas Platform by making use of its compiler, deepC, thus effectively bringing AI out to the edge in actual, physical real-world use cases.
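As a hedged sketch of the .tflite conversion step (using TensorFlow 2's converter; the actual Cainvas/deepC compilation flow may differ):

import tensorflow as tf

# Load the saved Keras model and convert it to TensorFlow Lite
model = tf.keras.models.load_model('shape_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training optimization
with open('shape_model.tflite', 'wb') as f:
    f.write(converter.convert())

The resulting shape_model.tflite can then be loaded by a TFLite-capable runtime on the target device.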
Source: Medium - Abdul Mannan Zafar