Hand-Based Biometric System Using Convolutional Neural Networks

Data security is an increasingly important topic, and with it the security and reliability of end-user identity verification, i.e. authentication. In recent years, banks have begun to replace password authentication with more secure methods because passwords are not considered secure enough. Current legislation even requires banks to implement multifactor authentication of their clients. Banks therefore consider biometric authentication as one of the possible options. Biometric authentication verifies a user's identity by means of the user's unique biometric characteristics; examples include facial recognition, iris scanning, and fingerprints. This paper deals with another biometric feature that could be used for authentication in mobile banking applications: the hand. As almost all mobile phones have an integrated camera, hand authentication can make a banking information system more secure and its user interface more convenient. Although the idea of hand biometric authentication is not entirely new and there are many ways of implementing it, our approach based on convolutional neural networks is innovative and its results are promising. This paper presents an approach to identifying users in which convolutional neural networks are used both for hand feature extraction and for bank user identity validation.


Introduction
In general, security is a topical issue both for the public and the private sectors (Fuka, Baťa & Lešáková, 2017). Under the revised European Union directive on payment services in the internal market (PSD2) and the related technical standards of the European Banking Authority (EBA), banks are required to implement two-factor authentication for payment transaction verification. In the past, it was enough for a bank to authenticate its electronic banking client in only one way (often by a static password or by a one-time password sent by SMS). Nowadays, however, at least two independent authentication methods must be used. This means that authentication must combine two of the three authentication options: knowledge-based (what you know), token-based (what you have), and biometric (what you are) (Joshi, 2012).
Higher e-banking security requirements, which are also enforced by the corresponding legislation, force banks to look for new ways of verifying their clients' identities. Banks therefore often consider implementing biometric systems in their banking information systems, part of which is a mobile application through which clients control their bank accounts. By definition, biometric systems use unique biometric characteristics to determine a person's identity (Clodfelter, 2010). Generally, these biometric characteristics can be split into two categories based on their nature: physiological (e.g. fingerprint, iris, face, hand) and behavioral (e.g. signature, gait, gesture) (Faundez-Zanuy, 2006). In practice, biometric systems can work in two modes: verification mode or identification mode. The goal of verification mode is to determine whether the user has claimed a true identity, based on a comparison with that user's record in the database. In identification mode, users do not claim an identity; instead, their identity is determined automatically based on a comparison with the records of all users in the database (Jain et al., 2000). Biometric systems are considered to be highly secure and practical compared to other methods (Prabhakar, 2003; Unar, 2014).
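The difference between the two modes can be illustrated with a minimal Python sketch. The feature vectors, the user names, the Euclidean distance, and the threshold of 0.2 are all illustrative assumptions and are not taken from any system described in this paper.

```python
import math

# Hypothetical enrolled templates: user id -> feature vector (made-up values).
templates = {
    "alice": [0.10, 0.80, 0.30],
    "bob": [0.90, 0.20, 0.50],
}

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(claimed_id, sample, threshold=0.2):
    """Verification (1:1): compare the sample only with the claimed user's template."""
    template = templates.get(claimed_id)
    return template is not None and distance(sample, template) <= threshold

def identify(sample, threshold=0.2):
    """Identification (1:N): search all templates and return the best match or None."""
    best_id = min(templates, key=lambda uid: distance(sample, templates[uid]))
    return best_id if distance(sample, templates[best_id]) <= threshold else None

sample = [0.12, 0.78, 0.33]
print(verify("alice", sample))  # the claimed identity is checked against one record
print(verify("bob", sample))
print(identify(sample))         # no identity is claimed; all records are searched
```

Verification needs a single comparison per attempt, whereas the cost of identification grows with the number of enrolled users.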
Of course, banks are concerned not only with increasing the security of their information systems (two-factor authentication) but also with the convenience of the user interface of the applications through which their clients control their bank accounts. At present, banks often allow their clients to sign in to the mobile banking application with a fingerprint. Fingerprint readers are quite often integrated into present-day mobile devices but are still not standard equipment in cheaper devices. In addition, certain groups of users, especially the elderly, have problems with this method of verification. Unlike a fingerprint reader, an integrated camera is common in today's mobile devices.
The integrated camera on mobile devices can be used to recognize a person by face or by hand. Face recognition encounters security issues; for example, there are cases in which an impostor signed in by using a freely available photograph of the victim's face. Moreover, using the face as a biometric feature requires a front-facing (selfie) camera. If we want to maximize the number of users for whom biometric authentication is accessible on a mobile device, it is preferable to choose the hand as the biometric characteristic. We believe that these integrated cameras represent considerable potential for hand-based biometric systems, and we focus on increasing the reliability of this way of verifying end-user identity.

Current hand-based biometric systems
In the relevant current literature on hand-based biometric systems, we encounter the traditional model of a biometric system, where images of the hand are obtained by using a camera or by a scanner in the first step. Characteristic features are then extracted for subsequent comparison with the saved templates.
The biometric sample is most often obtained by a CCD camera (Duta, 2009) or by a scanner (Gonzalez et al., 2003; Ferrer et al., 2007). Most hand-based biometric systems work with 2D hand images (Xu, 2013). However, hand-based biometric systems are also emerging that work with 3D models, which can be obtained by using two cameras, mirrors (Sanchez-Reillo, 2000), or a commercially available 3D digitizer (Ilua et al., 2014).
When the hand image is obtained, the image is pre-processed and the features are extracted, thus creating a feature set. A requirement for these features is that they have the lowest possible internal variability and the highest possible external variability. This means that they are as stable as possible for a given user and that they differ as much as possible between different users. Feature sets are obtained mainly by traditional methods based on geometric measurements: the width and length of the fingers, finger area, and the radii of circles on the fingers and palm. The number of measurements can vary; commonly, 13 to 50 measurements are used (Morales et al., 2008; Adán et al., 2008). Another traditional method extracts the shape of the hand by a boundary-selection algorithm for silhouette extraction (Fouquier et al., 2008). Feature extraction is followed by classification, in which statistical methods can be used as classifiers, e.g. methods based on the Euclidean distance (Charfi, 2015) or the Hamming distance (Sanchez-Reillo, 2000), a Gaussian mixture model, or alternatively the Active Appearance Model (Gross et al., 2007). Moreover, it is possible to use machine learning algorithms such as k-nearest neighbors (Osslan et al., 2011), Support Vector Machines (Marcos, 2005; Shanmukhappa & Sanjeevakumar, 2016), or neural networks, e.g. BPNN (Firas et al., 2014). The latest methods for extracting biometric features include Siamese convolutional neural networks, which have a symmetrical structure and share the values of some parameters (Ungureanu et al., 2020).
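The requirement of low internal and high external variability, and the two distance measures mentioned above, can be sketched in Python as follows. The measurements are invented for illustration (they are not taken from any cited dataset), and the way the variability is averaged here is one simple choice among many.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two real-valued feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Made-up geometric measurements (e.g. finger length and width in cm),
# two samples per hypothetical user.
samples = {
    "user_a": [[7.1, 1.8, 8.0], [7.0, 1.9, 8.1]],
    "user_b": [[6.2, 1.5, 7.1], [6.3, 1.5, 7.0]],
}

# Internal variability: mean distance between samples of the same user.
intra = mean(
    euclidean(p, q)
    for vectors in samples.values()
    for p, q in combinations(vectors, 2)
)

# External variability: mean distance between samples of different users.
inter = mean(
    euclidean(p, q)
    for (_, va), (_, vb) in combinations(samples.items(), 2)
    for p in va
    for q in vb
)

print(f"intra={intra:.3f} inter={inter:.3f}")
print(hamming([1, 0, 1, 1], [1, 1, 1, 0]))
```

A feature set is useful for classification when the internal value is much smaller than the external one, as it is for these toy vectors.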

Research objective and methodology
So far, deep learning neural networks have been used for feature extraction in biometric systems (Qin & Yacoubi, 2017; Baldominos et al., 2018), and convolutional neural networks can be included among them. The objective of this paper is to increase the security of information systems by proposing, validating, and evaluating a modern approach to hand-based biometric systems based on convolutional neural networks. This type of neural network is used not only for hand feature extraction but also for user identity validation. The original concept (Prihodova, 2019) will be extended to further types of convolutional neural networks, and their parameters will be optimized with respect to user validation accuracy.
Generally, convolutional neural networks can be used for feature extraction and for image classification. The problem of identification can be understood as a classification problem in which each enrolled user is one class. The typical architecture of a convolutional neural network is shown in Figure 1. It comprises the input layer and hidden layers with different functions.
The first hidden layer is a convolutional layer with an activation function, for example ReLU. This convolutional layer extracts features and serves for color and edge detection. Deeper convolutional layers detect more complex features. The second hidden layer is the pooling layer. The pooling layer reduces the spatial size of the feature maps and is followed by further convolutional layers with an activation function, each followed by a pooling layer. At the end of the convolutional neural network are fully connected layers with the Softmax activation function, used for classification.
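The sequence of layers described above can be traced on a toy example. The following pure-Python sketch is only an illustration of the layer types: the 5x5 image, the 2x2 edge kernel, and the weights of the fully connected layer are made-up assumptions, and real networks learn these values and use many channels.

```python
import math

def conv2d(image, kernel):
    """Valid 2D cross-correlation with stride 1 (the 'convolution' of CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(fmap):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 5x5 image: dark on the left, bright on the right (one vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges

features = max_pool(relu(conv2d(image, kernel)))  # 5x5 -> 4x4 -> 2x2
flat = [v for row in features for v in row]

# Made-up fully connected layer with two output classes.
weights = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
scores = [sum(w * x for w, x in zip(ws, flat)) for ws in weights]
probs = softmax(scores)
print(features, probs)
```

The convolution responds only where the edge is, pooling halves the size of the feature map while keeping that response, and Softmax converts the final scores into class probabilities.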
For training and testing our convolutional neural networks, the acquired data (hand images) will be split into training and test data by using the hold-out method, with a ratio of 80% training data to 20% test data. Images of users' hands will be resampled using the nearest neighbor interpolation method to the input size required by the individual convolutional neural networks.
Convolutional neural networks will then be used for the identification of particular persons. First, the convolutional neural networks will be trained on the training data by backpropagation, with the Learning Rate and Momentum parameters being optimized. Then, the proposed models will be validated by using the test data, and their accuracy will be determined, compared with each other, and compared with existing approaches.

Model creation and validation
The hand-based biometric system model based on a convolutional neural network was created and validated in MATLAB, and experiments were conducted on an Intel Core i5 at 1.3 GHz. Three pre-trained convolutional neural networks were used in turn in the models: the most commonly used convolutional neural networks AlexNet, GoogLeNet, and ResNet (Karpathy, 2017). AlexNet was chosen because it is the network that popularized CNNs in the field of computer vision in 2012, so it serves as a point of comparison for newer networks. One of the more modern networks is GoogLeNet; its authors dramatically reduced the number of parameters (to 4 million, compared to 60 million in AlexNet). The case study tests whether this enormous reduction in the number of parameters affects the recognition of hand images. The last tested network is ResNet, which, as one of the few convolutional neural networks, dispenses with the large fully connected layers at the end of the network. AlexNet consists of 8 layers, GoogLeNet of 22 layers, and ResNet of 18 layers. For training and validating the proposed model, 'The Hong Kong Polytechnic University Contact-free 3D/2D Hand Images Database version 1.0' (HKPU, 2019) was used. It is a commonly used reference database, and therefore our results can be compared with those of previous research. The database consists of 570 right-hand images from 114 persons, each person having provided five hand images. The participants were mainly students and employees of Hong Kong Polytechnic University, aged 18 to 50, from multiple ethnic groups. The database was created over four months. All hand images were taken indoors. During acquisition, the hand had to be held at a distance of 0.7 m from the scanner, a distance chosen empirically to maximize the relative size of the hand in the acquired image frame. Participants were asked to keep their hands parallel to the digitizer, palm facing the camera.
Each time a scan was taken, the participant changed the position of the hand slightly. To simplify the hand segmentation task, the background behind the user's hand was kept black. The obtained images were labeled with an identification number for each user. The resolution of these images is 640 x 480 pixels (Kanhangad et al., 2011). A sample of the right-hand pictures from the database is shown in Figure 2.
Fig. 2. Right-hand images from the database. Source: (Kanhangad et al., 2011).
This dataset of 570 hand images was split into training data (456 images) and test data (114 images) by using the hold-out method with a ratio of 80:20.
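A hold-out split of this kind can be sketched in Python as follows. The random shuffling, the fixed seed, and the (person, shot) identifiers are assumptions made for illustration; the text does not state how the images were assigned to the two subsets, for example whether each person contributed exactly one test image.

```python
import random

def hold_out_split(items, train_fraction=0.8, seed=0):
    """Shuffle the items once and cut them into a training and a test part."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical identifiers: 114 persons with five images each (570 in total).
image_ids = [(person, shot) for person in range(114) for shot in range(5)]

train, test = hold_out_split(image_ids)
print(len(train), len(test))
```

With 570 images and an 80:20 ratio, the two parts contain 456 and 114 images, and no image appears in both.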
After that, these images were converted to the required size of 227x227x3 for AlexNet and 224x224x3 for both the GoogLeNet and ResNet models by using the nearest neighbor interpolation method. Resizing the input is the only step of the pre-processing phase.
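Nearest neighbor interpolation copies, for every output pixel, the value of one source pixel and computes no new values. The Python sketch below uses a simple floor-based index mapping as an assumption; image libraries differ in how they align pixel centers, so their output may differ slightly at individual pixels.

```python
def resize_nearest(image, new_h, new_w):
    """Resize an image (list of rows) by copying the nearest source pixel.

    Pixels may be numbers or (R, G, B) tuples; they are copied unchanged.
    """
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

small = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(resize_nearest(small, 2, 2))

# A blank 640 x 480 frame (480 rows, 640 columns) reduced to 227 x 227.
frame = [[0] * 640 for _ in range(480)]
resized = resize_nearest(frame, 227, 227)
print(len(resized), len(resized[0]))
```

Because the target size is square and the source is not, the hand is scaled by different factors horizontally and vertically.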
After that, mini-batch training, one variant of gradient descent with backpropagation, was used to train the individual variants of convolutional neural networks. Mini-batch training was chosen because the model is updated more frequently than in batch gradient descent, which allows more robust convergence and helps avoid local minima. Another advantage is that mini-batch training is computationally more efficient than stochastic gradient descent. Its disadvantage, however, is that error information must be accumulated across each mini-batch of training examples, as in batch gradient descent. Next, the size of the batch had to be determined. Given the computing hardware on which the implementation was performed and the size of the dataset, the batch size was set to 128.
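How a training set is cut into mini-batches can be sketched in Python as follows. This is an illustration only: the shuffling and the handling of the last, smaller batch are assumptions, and some frameworks discard an incomplete final batch instead of using it.

```python
import random

def mini_batches(items, batch_size, seed=0):
    """Yield shuffled mini-batches; the last one may be smaller than batch_size."""
    order = list(items)
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

# 456 training images with a batch size of 128, as in the text.
batches = list(mini_batches(range(456), batch_size=128))
print([len(batch) for batch in batches])
```

The model parameters are updated once per mini-batch, so one pass over the data yields several updates instead of the single update of batch gradient descent.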
The training of the neural networks was performed experimentally. The network parameters that kept the learning error below the desired limit were found as the result of many different test scenarios. The parameters tuned during the optimization phase included the Learning Rate and Momentum; the best results were obtained with a Learning Rate of 0.001 and a Momentum of 0.9. The contribution of the previous step (Momentum) was thus set quite high. The learning rate was set very low, which makes learning take longer; however, when the learning rate was decreased further, the classification results deteriorated significantly.
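The role of the two parameters can be illustrated with a common formulation of gradient descent with momentum. The Python sketch below applies it to a one-dimensional toy objective; the objective, the number of steps, and the exact form of the update are assumptions for illustration and are not the training code used in the experiments.

```python
def sgd_momentum_step(theta, velocity, grad, learning_rate=0.001, momentum=0.9):
    """One update: the new step blends the previous step with the current gradient."""
    velocity = momentum * velocity - learning_rate * grad
    return theta + velocity, velocity

# Toy objective f(theta) = (theta - 3)^2 with gradient 2 * (theta - 3).
theta, velocity = 0.0, 0.0
for _ in range(5000):
    theta, velocity = sgd_momentum_step(theta, velocity, 2 * (theta - 3))
print(round(theta, 3))
```

With a momentum of 0.9, nine tenths of the previous step carry over into the next one, which speeds up progress along a consistent gradient direction even though each individual gradient step is small.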
A large amount of input data is needed to retrain a convolutional neural network so that it provides satisfactory results, and the dataset we used is relatively small. To avoid overtraining (overfitting) the network, early stopping was used: learning was terminated at the moment when the test error started to grow again. All experiments were run for a maximum of 30 epochs with 45 iterations; at these values, there was no overtraining of the network. Training the individual networks took from 30 minutes to 5 hours.
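The stopping rule can be sketched in Python as follows. The error values and the patience parameter are illustrative assumptions; the function only replays a given sequence of per-epoch errors and does not train a network.

```python
def early_stopping(errors_per_epoch, max_epochs=30, patience=1):
    """Return (best_epoch, last_epoch_run) for a sequence of per-epoch errors.

    Training stops once the error has failed to improve for `patience`
    consecutive epochs, or when max_epochs is reached.
    """
    best_epoch, best_error, bad_epochs, epoch = 0, float("inf"), 0, 0
    for epoch, error in enumerate(errors_per_epoch[:max_epochs], start=1):
        if error < best_error:
            best_epoch, best_error, bad_epochs = epoch, error, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # the error started to grow again
    return best_epoch, epoch

# Made-up error curve: improves for four epochs, then starts to rise.
errors = [0.40, 0.25, 0.12, 0.08, 0.09, 0.15]
print(early_stopping(errors))
```

In practice, the parameters saved at the best epoch are the ones kept, so the epochs after the minimum do not affect the final model.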
All 114 test images were used to test the models: for each image, the person's identity was determined by our convolutional neural networks. The success rate of the biometric system is expressed as the ratio of correctly classified persons to the total number of classified persons.
The proposed models of hand-based biometric systems based on the convolutional neural network had an accuracy of 98% to 100% during the training phase. After the training, the trained models were tested against the test data, and the accuracy ranged from 94.74% to 100.00%. In the testing, the user identification accuracy reached 100% in the model with ResNet. The results of training and testing accuracy of all convolutional neural networks are shown in Table 1, and Table 2 shows the comparison of our results with related works. Table 3 compares our approach with other biometric characteristics.
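The success rate defined above can be computed as in the following Python sketch. The predictions are made up: six misidentifications are assumed only because 108 correct answers out of 114 give 94.74%, and the sketch does not reproduce the per-image results of the experiments.

```python
def identification_accuracy(predicted_ids, true_ids):
    """Ratio of correctly identified persons to all classified persons."""
    correct = sum(p == t for p, t in zip(predicted_ids, true_ids))
    return correct / len(true_ids)

# One test image per hypothetical person, 114 in total.
true_ids = list(range(114))
predicted = list(true_ids)
for i in range(6):
    predicted[i] = (true_ids[i] + 1) % 114  # six made-up misidentifications

print(f"{identification_accuracy(predicted, true_ids):.2%}")
```

With 114 test images, each misidentified image lowers the accuracy by about 0.88 percentage points, so accuracies on a test set of this size can only take coarse steps.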

Discussion
As pointed out, hand-based biometric systems have a vast potential in mobile electronic banking applications that are subject to increased security requirements because almost all current mobile devices are equipped with a rear integrated camera through which biometric features can be obtained.
In this paper, a hand-based biometric system based on convolutional neural networks, used both for hand feature extraction and for user identity validation, was proposed, modeled, optimized, verified, and evaluated on actual data. Overall, three types of neural networks (AlexNet, GoogLeNet, and ResNet) were tested, and the results were compared with each other and with existing methods.
The results of this comparison support our proposed hand-based biometric authentication model. All three variants (AlexNet, GoogLeNet, and ResNet) achieve more accurate results than previous work; ResNet was able to identify users with 100% reliability on our data. This paper is a case study aiming to demonstrate whether it is at all possible to use such a system. The results of the system are partly influenced by the relatively small number of persons, as a small dataset often causes network overtraining. To partially address the overtraining that the variable properties of hand images can cause, the data used was acquired in a partially constrained setting (the hand was always at the same distance from the sensor, the fingers always slightly outstretched, and the background uniformly black).
Attacks on the biometric system itself are another security threat. The biometric system can be attacked in all its phases, the most common attacks being forgeries of biometric patterns.
Since the proposed method does not include liveness detection, it is possible to bypass it by photographing a hand, which can be done, for example, when a person is sleeping. However, obtaining such a photograph is more complicated than obtaining a fingerprint, since people leave a large number of fingerprints behind. One way of preventing these types of attacks is to use thermal images of the hand, which could detect liveness.
Further research can be directed toward a combination of two characteristics, hand and face, which could give much better results. With the ever-improving parameters of cameras in mobile devices, it is also possible to focus on research in which a better camera would be used.