Gadde Sai Shailesh


Sign Language Detection


Introduction

Sign language recognition has been studied for years, yet we are still far from a complete solution that is widely available in society.

Most work on this problem follows one of two approaches: contact-based systems, such as sensor gloves, or vision-based systems that use only cameras. The latter is much cheaper, and the rise of deep learning makes it even more appealing.

Although new and accessible technologies are emerging to help people with hearing disabilities, there is still plenty of work to be done. For example, advances in machine learning could help the deaf and hard-of-hearing even further by offering better ways to communicate through computer vision applications. Our project aims to do just that.

We sought to create a system that is capable of identifying American Sign Language (ASL) hand gestures. Since ASL has both static and dynamic hand gestures, we needed to build a system that can identify both types of gestures. This article will detail the phases of our project.

Dataset

American Sign Language Letters - v1 v1

This dataset was exported via roboflow.ai on October 20, 2020, at 4:54 PM GMT.

It includes 1728 images. Letters are annotated in TensorFlow TFRecord (raccoon) format.

The following pre-processing was applied to each image:

Auto-orientation of pixel data (with EXIF-orientation stripping)

Resize to 416x416 (Stretch)

The following augmentation was applied to create 3 versions of each source image:

50% probability of horizontal flip

Random crop of between 0 and 20% of the image

Random rotation of between -5 and +5 degrees

Random shear of between -5° and +5° horizontally and -5° and +5° vertically

Random brightness adjustment of between -25 and +25%

Random Gaussian blur of between 0 and 1.25 pixels
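To make these settings concrete, the sketch below samples one set of parameters per augmented copy from the ranges listed above. It is an illustration only: Roboflow applies the actual transformations, and the uniform sampling as well as the names `sample_augmentation` and `augmentations_for_image` are assumptions made here.

```python
import random


def sample_augmentation(rng):
    """Draw one set of augmentation parameters from the listed ranges."""
    return {
        "horizontal_flip": rng.random() < 0.5,       # 50% probability
        "crop_fraction": rng.uniform(0.0, 0.20),     # 0 to 20% of the image
        "rotation_deg": rng.uniform(-5.0, 5.0),      # -5 to +5 degrees
        "shear_x_deg": rng.uniform(-5.0, 5.0),       # horizontal shear
        "shear_y_deg": rng.uniform(-5.0, 5.0),       # vertical shear
        "brightness_pct": rng.uniform(-25.0, 25.0),  # -25 to +25 %
        "blur_px": rng.uniform(0.0, 1.25),           # Gaussian blur, in pixels
    }


def augmentations_for_image(seed, versions=3):
    """Return the parameter sets for the augmented versions of one image."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    return [sample_augmentation(rng) for _ in range(versions)]


for params in augmentations_for_image(seed=0):
    print(params)
```

Seeding the generator per image keeps the sampled parameters reproducible between runs.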

Model Used

Before we begin the setup, make sure to change the runtime type in Colab to GPU so that we can make use of the free GPU provided.

There are many models ready to download from the TensorFlow Model Zoo.

Be careful in choosing which model to use as some are not made for Object Detection. For this tutorial we will be using the following model:

SSD MobileNet V2 FPNLite 320x320.

Download it into your Colab Notebook.
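A minimal sketch of the download step follows. The base URL, the release date folder and the archive name `ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8` are assumptions based on how the TF2 Detection Model Zoo publishes its archives, so check them against the model zoo page before use. The helper names are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed download location of the TF2 Detection Model Zoo archives.
MODEL_ZOO_BASE = "http://download.tensorflow.org/models/object_detection/tf2"


def model_archive_url(model_name, release_date="20200711"):
    """Build the archive URL for a model name (release date is an assumption)."""
    return f"{MODEL_ZOO_BASE}/{release_date}/{model_name}.tar.gz"


def download_and_extract(url, dest_dir):
    """Download a .tar.gz archive into dest_dir and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive_path = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))
    with tarfile.open(archive_path) as archive:
        archive.extractall(dest)
    # The archive is expected to unpack into a folder named after itself.
    return dest / archive_path.name[: -len(".tar.gz")]


# Example (requires network access):
# url = model_archive_url("ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8")
# model_dir = download_and_extract(url, "pre-trained-models")
```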

Let's Understand the Code

Let's Import All the Required Libraries

We want to automate the whole process, so that when a user uploads images with their annotations, they receive the weights of the last checkpoint.

Here is my approach to automating the process.

How to upload and unzip the dataset?

We upload the dataset to Google Colab as a zip file and then unzip it so that we can use it.

dataset.zip contains two folders: train and test.

The train folder contains images and their respective XML files.

The test folder contains images and their respective XML files.
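The unzip step can be sketched with the standard library as below. It assumes that `train` and `test` sit at the top level of dataset.zip and that images use the listed file extensions. The helper names are hypothetical.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image formats


def extract_dataset(zip_path, dest_dir):
    """Unzip the dataset and return the train and test folder paths."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    train_dir, test_dir = dest / "train", dest / "test"
    for folder in (train_dir, test_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"Expected folder missing: {folder}")
    return train_dir, test_dir


def images_without_xml(folder):
    """List image files that have no XML annotation with the same stem."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
        and not p.with_suffix(".xml").exists()
    )
```

Checking for unannotated images right after extraction catches incomplete uploads before any TFRecords are generated.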

How to automate the process?

We create some temporary folders in order to store the configuration file, the tfrecords, weights and everything required.

Defining The Paths

We define the paths of everything that we use in this project.
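One way to create the working folders and collect the paths in a single place is sketched below. All folder and file names except `label_map.pbtxt` and `pipeline.config` are hypothetical choices for illustration, not the names used in the original notebook.

```python
from pathlib import Path


def make_workspace(root):
    """Create the working folders and return a dict of the paths used later."""
    root = Path(root)
    paths = {
        "dataset": root / "dataset",
        "annotations": root / "annotations",
        "pretrained": root / "pre-trained-models",
        "model": root / "models" / "my_ssd_mobilenet",
    }
    for folder in paths.values():
        folder.mkdir(parents=True, exist_ok=True)

    # File paths are added after the loop so they are not created as folders.
    paths["label_map"] = paths["annotations"] / "label_map.pbtxt"
    paths["train_record"] = paths["annotations"] / "train.record"
    paths["test_record"] = paths["annotations"] / "test.record"
    paths["pipeline_config"] = paths["model"] / "pipeline.config"
    return paths
```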

Installing the Required Libraries

Let's install the required libraries.
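After installation, a quick check like the sketch below can confirm that the packages this project relies on are importable. The module list is taken from the imports and libraries mentioned in this article. The `missing_modules` helper is a hypothetical name.

```python
import importlib.util

# Top-level modules used by the scripts and the inference step in this article.
REQUIRED_MODULES = ["tensorflow", "object_detection", "pandas", "PIL", "pyttsx3"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing modules:", missing_modules())
```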

How to Create Label Maps From XML and Generate TFRecords?

Creating the CSV and the label map.

xml_to_csv.py contains:

"""
Usage:
# Create train data:
python xml_to_csv.py -i [PATH_TO_IMAGES_FOLDER]/train -o [PATH_TO_ANNOTATIONS_FOLDER]/train_labels.csv

# Create test data:
python xml_to_csv.py -i [PATH_TO_IMAGES_FOLDER]/test -o [PATH_TO_ANNOTATIONS_FOLDER]/test_labels.csv
"""

import os
import glob
import pandas as pd
import argparse
import xml.etree.ElementTree as ET


def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines them in a single Pandas DataFrame.

    Parameters:
    ----------
    path : {str}
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    list of str
        The sorted, de-duplicated class names found in the annotations
    """
    classes_names = []
    xml_list = []
    for xml_file in glob.glob(path + "/*.xml"):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        size = root.find("size")
        for member in root.findall("object"):
            # Look elements up by tag name instead of position so that the
            # order of child elements in the XML does not matter.
            class_name = member.find("name").text
            bndbox = member.find("bndbox")
            classes_names.append(class_name)
            value = (
                root.find("filename").text,
                int(size.find("width").text),
                int(size.find("height").text),
                class_name,
                int(bndbox.find("xmin").text),
                int(bndbox.find("ymin").text),
                int(bndbox.find("xmax").text),
                int(bndbox.find("ymax").text),
            )
            xml_list.append(value)
    column_name = [
        "filename",
        "width",
        "height",
        "class",
        "xmin",
        "ymin",
        "xmax",
        "ymax",
    ]
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    classes_names = list(set(classes_names))
    classes_names.sort()
    return xml_df, classes_names


def main():
    # Initiate argument parser
    parser = argparse.ArgumentParser(
        description="Sample TensorFlow XML-to-CSV converter"
    )
    parser.add_argument(
        "-i",
        "--inputDir",
        help="Path to the folder where the input .xml files are stored",
        type=str,
    )
    parser.add_argument(
        "-o", "--outputFile", help="Name of output .csv file (including path)", type=str
    )

    parser.add_argument(
        "-l",
        "--labelMapDir",
        help="Directory path to save label_map.pbtxt file is specified.",
        type=str,
        default="",
    )

    args = parser.parse_args()

    if args.inputDir is None:
        args.inputDir = os.getcwd()
    if args.outputFile is None:
        args.outputFile = args.inputDir + "/labels.csv"

    assert os.path.isdir(args.inputDir)
    # os.makedirs("") raises an error, so only create the folder when the
    # output path actually contains a directory component.
    output_dir = os.path.dirname(args.outputFile)
    if output_dir:
        os.makedirs(output_dir, exist_ok=True)
    xml_df, classes_names = xml_to_csv(args.inputDir)
    xml_df.to_csv(args.outputFile, index=False)
    print("Successfully converted xml to csv.")
    if args.labelMapDir:
        os.makedirs(args.labelMapDir, exist_ok=True)
        label_map_path = os.path.join(args.labelMapDir, "label_map.pbtxt")
        print("Generate `{}`".format(label_map_path))

        # Create the `label_map.pbtxt` file
        pbtxt_content = ""
        for i, class_name in enumerate(classes_names):
            pbtxt_content = (
                pbtxt_content
                + "item {{\n    id: {0}\n    name: '{1}'\n}}\n\n".format(
                    i + 1, class_name
                )
            )
        pbtxt_content = pbtxt_content.strip()
        with open(label_map_path, "w") as f:
            f.write(pbtxt_content)


if __name__ == "__main__":
    main()
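To see what the label map produced by main() looks like, the sketch below rebuilds the same text for a short, made-up class list. `build_label_map` is a hypothetical helper that mirrors the format string used in the script. The ids start at 1 because the Object Detection API reserves 0 for the background class.

```python
def build_label_map(class_names):
    """Return label_map.pbtxt text with ids assigned in sorted order from 1."""
    items = []
    for index, name in enumerate(sorted(set(class_names)), start=1):
        items.append(
            "item {{\n    id: {0}\n    name: '{1}'\n}}".format(index, name)
        )
    return "\n\n".join(items)


print(build_label_map(["B", "A", "B"]))
# item {
#     id: 1
#     name: 'A'
# }
#
# item {
#     id: 2
#     name: 'B'
# }
```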

Now let's create the tfrecords:

generate_tfrecord.py contains:

""" Sample TensorFlow XML-to-TFRecord converter

usage: generate_tfrecord.py [-h] [-x XML_DIR] [-l LABELS_PATH] [-o OUTPUT_PATH] [-i IMAGE_DIR] [-c CSV_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -x XML_DIR, --xml_dir XML_DIR
                        Path to the folder where the input .xml files are stored.
  -l LABELS_PATH, --labels_path LABELS_PATH
                        Path to the labels (.pbtxt) file.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path of output TFRecord (.record) file.
  -i IMAGE_DIR, --image_dir IMAGE_DIR
                        Path to the folder where the input image files are stored. Defaults to the same directory as XML_DIR.
  -c CSV_PATH, --csv_path CSV_PATH
                        Path of output .csv file. If none provided, then no file will be written.
"""

import os
import glob
import pandas as pd
import io
import xml.etree.ElementTree as ET
import argparse

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'    # Suppress TensorFlow logging (1)
import tensorflow.compat.v1 as tf
from PIL import Image
from object_detection.utils import dataset_util, label_map_util
from collections import namedtuple

# Initiate argument parser
parser = argparse.ArgumentParser(
    description="Sample TensorFlow XML-to-TFRecord converter")
parser.add_argument("-x",
                    "--xml_dir",
                    help="Path to the folder where the input .xml files are stored.",
                    type=str)
parser.add_argument("-l",
                    "--labels_path",
                    help="Path to the labels (.pbtxt) file.", type=str)
parser.add_argument("-o",
                    "--output_path",
                    help="Path of output TFRecord (.record) file.", type=str)
parser.add_argument("-i",
                    "--image_dir",
                    help="Path to the folder where the input image files are stored. "
                         "Defaults to the same directory as XML_DIR.",
                    type=str, default=None)
parser.add_argument("-c",
                    "--csv_path",
                    help="Path of output .csv file. If none provided, then no file will be "
                         "written.",
                    type=str, default=None)

args = parser.parse_args()

if args.image_dir is None:
    args.image_dir = args.xml_dir

label_map = label_map_util.load_labelmap(args.labels_path)
label_map_dict = label_map_util.get_label_map_dict(label_map)


def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines
    them in a single Pandas dataframe.

    Parameters:
    ----------
    path : str
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    """

    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        size = root.find('size')
        for member in root.findall('object'):
            # Look elements up by tag name instead of position so that the
            # order of child elements in the XML does not matter.
            bndbox = member.find('bndbox')
            value = (root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     member.find('name').text,
                     int(bndbox.find('xmin').text),
                     int(bndbox.find('ymin').text),
                     int(bndbox.find('xmax').text),
                     int(bndbox.find('ymax').text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def class_text_to_int(row_label):
    # Map a class name to the integer id defined in the label map
    return label_map_dict[row_label]


def split(df, group):
    # Group the annotation rows by image so each image yields one example
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    # Box coordinates are stored as fractions of the image width and height
    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):

    writer = tf.python_io.TFRecordWriter(args.output_path)
    path = os.path.join(args.image_dir)
    examples = xml_to_csv(args.xml_dir)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    print('Successfully created the TFRecord file: {}'.format(args.output_path))
    if args.csv_path is not None:
        examples.to_csv(args.csv_path, index=None)
        print('Successfully created the CSV file: {}'.format(args.csv_path))


if __name__ == '__main__':
    tf.app.run()
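Note that create_tf_example stores each bounding box as fractions of the image size rather than as pixel values. The sketch below isolates that step for a single box. `normalize_box` is a hypothetical helper, and the numbers are made up for a 416x416 image.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel coordinates to fractions of the image width and height."""
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    return (xmin / width, ymin / height, xmax / width, ymax / height)


print(normalize_box(104, 52, 312, 364, 416, 416))
# (0.25, 0.125, 0.75, 0.875)
```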

Alternatively, there is another way to create TFRecords that also applies augmentation, which increases the size of the dataset.

We have two options:

Create the TFRecords ourselves

Upload the annotations to Roboflow and get the dataset in TFRecord format

Creating the TFRecords ourselves is a bit tedious, because the XML produced by annotation tools can vary. For the sake of ease, I suggest using Roboflow for this task. It also provides an option to perform additional data augmentation, which increases the size of the dataset.

How to make changes in the pipeline.config file?

We copy the pipeline.config file of SSD MobileNet V2 FPNLite 320x320 to our model's folder, so that we can adapt the configuration to our use case without changing the original file.

Now, let's make changes to our pipeline.config file.

The most important settings we need to change are:

batch_size is the number of training examples the model processes in each training step. A suitable value is 8. It could be higher or lower depending on the computing power available.

A good suggestion given on StackOverflow is:

Max batch size = available GPU memory bytes / 4 / (size of tensors + trainable parameters)

fine_tune_checkpoint is the path to the checkpoint from which training starts (a checkpoint is how TensorFlow stores a model's weights).

If you are starting the training for the first time, set this to the checkpoint of the pre-trained model.

If you want to continue training from a previously trained checkpoint, set it to the respective checkpoint path. Training then builds on the weights already learned instead of starting from scratch.
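These edits can be made by hand or scripted. The sketch below rewrites the fields in the config text with regular expressions. It is only one possible approach, under the assumptions that the training `batch_size` is the first one in the file and that the fields appear as `name: value` pairs. The Object Detection API also ships `config_util` for editing configs programmatically. `update_pipeline_config` is a hypothetical helper.

```python
import re


def update_pipeline_config(text, batch_size, fine_tune_checkpoint, num_classes=None):
    """Return pipeline.config text with the given fields replaced."""
    # Only the first batch_size is changed; it is assumed to belong to
    # train_config (eval_config may define its own batch_size later on).
    text = re.sub(
        r"\bbatch_size:\s*\d+",
        lambda m: f"batch_size: {batch_size}",
        text,
        count=1,
    )
    # The colon directly after the name keeps fine_tune_checkpoint_type intact.
    text = re.sub(
        r'\bfine_tune_checkpoint:\s*"[^"]*"',
        lambda m: f'fine_tune_checkpoint: "{fine_tune_checkpoint}"',
        text,
    )
    if num_classes is not None:
        text = re.sub(
            r"\bnum_classes:\s*\d+",
            lambda m: f"num_classes: {num_classes}",
            text,
        )
    return text


# Example usage with hypothetical paths:
# config_text = open("models/my_ssd_mobilenet/pipeline.config").read()
# config_text = update_pipeline_config(config_text, 8, "pre-trained-models/checkpoint/ckpt-0")
```

Using a function as the replacement avoids backslashes in paths being interpreted as regular-expression escapes.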

Let's Train!

Now, let's train the model.

The notebook prints the training command. Run the printed command to train the model for 5000 steps.
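A sketch of how such a command could be assembled and printed is shown below. The flags `--model_dir`, `--pipeline_config_path` and `--num_train_steps` are those accepted by the Object Detection API's `model_main_tf2.py` script. The script location and the workspace paths are assumptions that depend on where the repository was cloned, and `build_train_command` is a hypothetical helper.

```python
import shlex


def build_train_command(script_path, model_dir, pipeline_config_path,
                        num_train_steps=5000):
    """Build the shell command that starts training."""
    parts = [
        "python",
        script_path,
        f"--model_dir={model_dir}",
        f"--pipeline_config_path={pipeline_config_path}",
        f"--num_train_steps={num_train_steps}",
    ]
    # Quote each part so paths containing spaces survive the shell.
    return " ".join(shlex.quote(str(part)) for part in parts)


print(build_train_command(
    "models/research/object_detection/model_main_tf2.py",  # assumed location
    "workspace/models/my_ssd_mobilenet",                    # hypothetical path
    "workspace/models/my_ssd_mobilenet/pipeline.config",    # hypothetical path
))
```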

Now, let's run inference

I have added text-to-speech using pyttsx3, so that when the user shows a sign, we can hear the output.

This could be helpful to people who don't know sign language.

First, let's define the paths to our weights.

Now, let's restore our weights.
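A minimal sketch of the restore step is given below, assuming checkpoints are saved as `ckpt-N.index` files in the model folder. `latest_checkpoint_prefix` is a hypothetical standard-library helper (TensorFlow's `tf.train.latest_checkpoint` serves a similar purpose). `restore_detection_model` follows the usual Object Detection API pattern and needs TensorFlow and the `object_detection` package installed.

```python
import re
from pathlib import Path


def latest_checkpoint_prefix(checkpoint_dir):
    """Return the path prefix (e.g. .../ckpt-6) of the highest-numbered checkpoint."""
    best_step, best_prefix = -1, None
    for index_file in Path(checkpoint_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)", index_file.stem)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            best_prefix = str(index_file.with_suffix(""))
    return best_prefix


def restore_detection_model(pipeline_config_path, checkpoint_prefix):
    """Build the detection model from the config and load the checkpoint weights."""
    # Imported lazily so the helper above works without TensorFlow installed.
    import tensorflow as tf
    from object_detection.builders import model_builder
    from object_detection.utils import config_util

    configs = config_util.get_configs_from_pipeline_file(pipeline_config_path)
    detection_model = model_builder.build(
        model_config=configs["model"], is_training=False
    )
    checkpoint = tf.train.Checkpoint(model=detection_model)
    # expect_partial() silences warnings about optimizer state not being used.
    checkpoint.restore(checkpoint_prefix).expect_partial()
    return detection_model
```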

We can either upload images to test our model or use our camera.

First, let's see how we can do it with our camera.

We are using pyttsx3 for text-to-speech conversion.
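With a camera, detection runs on every frame, so speaking each result directly would repeat the same letter many times. The sketch below shows one possible way to handle this, under the assumption that a sign should be spoken only when it is confident and differs from the last spoken one. `SpeechGate` and the 0.7 threshold are illustrative choices, not part of the original notebook. `speak` uses the pyttsx3 calls `init`, `say` and `runAndWait`.

```python
class SpeechGate:
    """Decide when a detected sign should be spoken aloud."""

    def __init__(self, min_score=0.7):
        self.min_score = min_score
        self.last_spoken = None

    def should_speak(self, label, score):
        """Return True only for confident detections that differ from the last one spoken."""
        if score < self.min_score:
            return False
        if label == self.last_spoken:
            return False
        self.last_spoken = label
        return True


def speak(text):
    """Say the text out loud with pyttsx3 (requires the pyttsx3 package)."""
    import pyttsx3  # imported lazily so SpeechGate can be used without it

    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()


# Inside the camera loop, for each frame's best detection:
# if gate.should_speak(label, score):
#     speak(label)
```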

Now let's see how we can do this with uploaded images.
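For a single uploaded image, the model returns a score and a class id for each candidate box, and the result shown to the user is typically the highest-scoring one. The sketch below picks that detection from plain Python lists. `top_detection`, the 0.5 threshold and the sample values are illustrative assumptions, and the class ids are assumed to already match the ids in the label map.

```python
def top_detection(scores, class_ids, id_to_name, min_score=0.5):
    """Return (label, score) of the best detection, or None if none is confident."""
    if len(scores) == 0:
        return None
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < min_score:
        return None
    return id_to_name[int(class_ids[best])], float(scores[best])


# Made-up detections for one image: three candidate boxes
labels = {1: "A", 2: "B", 3: "C"}
print(top_detection([0.2, 0.91, 0.4], [3, 1, 2], labels))
# ('A', 0.91)
```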