Computer vision systems and deep learning for the recognition of athlete's movement: A review article

Detecting people in videos and tracking their movement is an important research topic. Tracking people and studying their behaviour can yield a large body of information that helps researchers study their reactions. Detection and tracking techniques are used in the sports field, where the athlete's movement is studied and analyzed within the game. Based on the information obtained from tracking the athlete's movement, it is possible to improve playing performance, avoid injuries and choose the best playing strategy. In some sports, such as gymnastics, the accuracy of the athlete's performance determines the points awarded to the athlete. This study reviews a set of articles that relied on computer vision and deep learning to recognize and analyze the athlete's movement. The articles are confined to the years 2015 to 2022 and deal with different indoor and outdoor sports. Certainly, the study of indoor games is better because the influence


Introduction
Computer vision is a branch of computer science. It is currently employed in various applications and fields, including medicine, engineering and industry. In the past, factories used complicated mechanisms to monitor their products, and the worker had to be very careful when inspecting them. If the worker was inattentive or lacked expertise, some of the products that passed inspection might be defective. In such a case, the process is inaccurate and, at the same time, exhausts the workers, which creates a demand for more workers [1]. In general, the purpose of building a computer vision system in any field, whether medicine, sport or any other, is to obtain an image and then analyze its components, i.e., study its content. For example, the movement of an object or a specific color can be identified by tracking it in the image, especially in medical diagnosis [2,3]. In fact, building a computer vision system is a simulation of the human vision system. The difference between them is that the computer vision system uses digital cameras, while the human vision system uses the eye [2][3][4]. The algorithms used to analyze the relationships between the image components are equivalent to the way the human mind works, as clarified in figure (1).

Figure 1: Human vision vs. Computer Vision
Many fields, including sports, have begun to rely on computer vision systems and deep learning in order to get the best results. In the sports field, the movement of players inside stadiums is analyzed using computer vision and deep learning techniques. By reviewing the results of the performance analysis, coaches can choose the playing strategy as well as the players best able to implement it.

Computer Vision in the Sports Field
Computer vision is used in many fields, including sports. Many computer vision systems can be used to determine the location of the players or the ball and to track their path inside the field. Coaches can also benefit from such systems when analyzing the opponent's strategy. These systems can help the referee correct a decision that was taken with an insufficient view and would otherwise be inaccurate. Such cases reflect the need for computer vision systems in the sports field [5]. In [5], the most important computer vision techniques used in the sports field are discussed, whether to track the ball or the player inside the stadium or to measure the player's performance.

Detection and Tracking of the Player
One of the basic elements of winning any match is the players' performance on the field. Any coach can measure the performance of the opposing team's players. A player whose performance is being analyzed is closely monitored, so he may not show his real performance [6]. In this regard, STATS SportVu is one of the most important computer vision systems used to track the movement of basketball players. This system uses six cameras, with two cameras on each side of the field and two cameras at the top. It analyzes information such as the player's movement, speed and position as well as the ball movement. It is now used in NBA and football matches. It uses three high-resolution cameras in addition to an optional set of three cameras in order to collect information about the players and the ball inside the field. In baseball, systems such as Sportvision are used to track the player's movement [5,7].

Detection and Tracking of the Ball
Many computer vision systems are available for tracking the ball. Such systems make it possible to measure the speed of the ball as well as the accuracy of passes [5]. One of the most famous ball-tracking systems is Hawk-Eye. This system was introduced in 2001 and was initially used for cricket, but it is now used in tennis, where it shows whether the ball crossed the court line or not. The system uses ten cameras operating at 340 frames per second. In addition, there are systems for football that monitor the goal line, the ball crossing the field lines, and offside positions [5].
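As a rough illustration of how ball speed might be derived from tracked positions, the following minimal sketch assumes that a tracking system already reports the 3D position of the ball in metres for consecutive frames. The function name and the sample coordinates are hypothetical; only the frame rate of 340 frames per second is taken from the text above. It is not a description of how any commercial system works internally.

```python
import math

def ball_speed_mps(p_prev, p_curr, fps=340.0):
    """Approximate ball speed in metres per second.

    p_prev, p_curr: (x, y, z) positions in metres in two consecutive frames
    (assumed to come from an upstream tracking system).
    fps: camera frame rate; the time between frames is 1 / fps seconds.
    """
    distance = math.dist(p_prev, p_curr)  # Euclidean distance in metres
    return distance * fps                 # distance / (1 / fps)

# Hypothetical positions: the ball moved 0.1 m along x between two frames.
speed = ball_speed_mps((0.0, 0.0, 1.0), (0.1, 0.0, 1.0))
print(f"{speed:.1f} m/s")
```

In practice, a speed computed from only two frames is noisy, so averaging over several consecutive frames would be a natural refinement.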

3. Types of Sports
In general, sports are divided into games that need a ball and those that do not need a ball but depend on the player's performance. In addition, games are divided into indoor games (held inside sports halls) and outdoor games (held in open stadiums), as shown in figure (2) [8].

Figure 2: Classification of different types of sports
Applying computer vision techniques to sports video clips and analyzing these clips is of interest to the large audience that is passionate about sports, and it is very important for the sports industry, coaches and experts. Sports video analysis provides a great deal of information that can be used in many applications. For example, identifying the player and the team to which he belongs, as well as tracking the player inside the stadium, can help assess the player's performance on the field. Such analysis is now considered one of the criteria used to grade the player's performance. These techniques can also be used to track the ball in order to count the number of passes on the field and to determine whether the ball is inside or outside the field. By studying how the players move inside the arena, the strategy employed by the team can be understood [8]. Detecting and tracking a player within the sports arena is not easy, and many obstacles hinder accurate detection. First, many players share the same features, which makes them difficult to detect and track, especially within one team, where the sportswear is the same. Additionally, the player may blend into the background, which makes it difficult to identify the object. Finally, if the player is far from the camera, only a few pixels represent him, which makes it difficult to track this player and others [8].

4. Deep Learning
Artificial intelligence (AI) in general is an attempt to simulate human thinking, build solutions to problems and find the best way to solve them. In this respect, deep learning is one of the most important approaches in AI. Deep learning is now widely used in many fields, particularly the medical fields. Recently, many articles have been published on detecting and identifying the Covid-19 disease, in which researchers used deep learning techniques to reveal the most important characteristics for distinguishing and identifying this disease. Moreover, these techniques are used to detect objects in a video clip or image, where the characteristics of these objects are studied by analyzing their movement and reactions. In general, deep learning is divided into two basic stages. The first stage extracts the features, while the second stage exploits these features to distinguish and identify the objects in the image or video clip.

Literature Review
This section reviews the articles that relied on computer vision to analyze sports videos of different games. Table (1) shows the analysis of these articles.

Basketball
Basketball is one of the most popular games, and it is preferably played in indoor halls in order to avoid external effects, such as wind. Each basketball team consists of five players. The winning team is the one that scores more points. The number of points awarded for a basket varies: a shot made from outside the area surrounding the basket counts for three points, a shot made from inside that area counts for only two points, and a free throw awarded after a foul counts for one point [9][10][11][12][13][14].
In 2016, Rajiv and Rob presented an article on tracking the ball during three-point shots. They relied on an RNN to model the path of the ball so that the shot accuracy could be determined. As for data, they relied on 20,000 shots taken from 631 NBA games, divided into 80% for training and 20% for testing. At each moment in time, they took three spatial values of the ball, representing its position along the width and length of the court and its height above the court. In addition, they used other variables to increase the accuracy of prediction, such as the speed of the ball and the angle at which the ball falls toward the hoop. The experiments showed that the amount of data used in the training phase had a large effect: the prediction accuracy was 0.870 when training on only 40% of the data, and it rose to 0.906 when training on 80% of the data. The experiments also showed that the results obtained using the proposed method were better than those obtained using traditional statistical methods or physical measures. Nevertheless, the prediction was not fully accurate, due to the effect of noise in some video clips as well as the lack of ideal weights for the network [15].
In some basketball games, the result is decided in the last minutes, especially when the difference between the points of the two teams is very small. One article suggested a deep learning method to choose the best player to shoot the ball, as well as the attack plan that can be used to win. Four players can also be selected for this plan. The data relied upon in the study was collected from the matches of 25 women's teams in a private university for the 2021 season. From the input data, 21 features were derived, relating to the player, the player's position, the opponent's defensive plan, and the places from which shots were made successfully. Then, the best four players to carry out the play were selected. This method also takes into account the time remaining until the end of the game: the greater the time remaining, the greater the chance of scoring. The results showed that if the time remaining is 20 seconds, the probability of scoring is 73%, while if it is 6 seconds, the probability is 13%. The reported accuracy of the method is 64.4% and the total time is 826 milliseconds. However, this method did not take into account whether the point to be scored would tie or win the game. Furthermore, the method of choosing the best player for the plan did not take into account the changes that might take place in the other team during the attack [16].
Basketball is a team game that depends on the performance of the entire team, with an advantage if the team has two individual players with a high shooting ability. Hence, the coach determines the ideal positions for these players to improve the team's performance. At the same time, analyzing the game data in real time is preferable, because a decision made during the game affects the outcome of the current game. If the analysis takes place after the end of the game, it is useful only for statistics that can be used in other games. One article proposed filming the game with a high-resolution camera and extracting features from the footage, such as the two best players at three-point or two-point throws, as well as plans and ways to move the players. This data is then entered into deep networks to be used in the analysis process [17].
Researchers presented a method for analyzing basketball videos to reduce injury, improve the players' skills, and help the coach choose the best players who can decide the outcome in favor of the team [18].
A group of researchers suggested a way to identify basketball players using a single side camera. A side image does not give enough information about the body, although much of this information is needed to identify the player. The side image hides many features that may help in the analysis, such as the player's facial features and number, which help distinguish one player from the others. Therefore, a convolutional neural network was relied on, which brought the accuracy of distinguishing players to 95% [19].
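The additional variables mentioned above, namely the speed of the ball and the angle at which it falls, can be derived from the three spatial values of the trajectory. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the sampling interval `dt` and the sample points are hypothetical, and the angle is measured from the horizontal plane using only the last two trajectory samples.

```python
import math

def trajectory_features(points, dt):
    """Return (speed, descent_angle_deg) at the end of a ball trajectory.

    points: list of (x, y, z) positions in metres, where x and y lie in the
            court plane and z is the height above the court (assumed layout).
    dt:     time in seconds between consecutive samples (assumed constant).
    """
    (x0, y0, z0), (x1, y1, z1) = points[-2], points[-1]
    # Finite-difference estimate of the velocity components.
    vx, vy, vz = (x1 - x0) / dt, (y1 - y0) / dt, (z1 - z0) / dt
    speed = math.sqrt(vx * vx + vy * vy + vz * vz)
    horizontal_speed = math.hypot(vx, vy)
    # Angle below the horizontal plane; positive while the ball is falling.
    descent_angle_deg = math.degrees(math.atan2(-vz, horizontal_speed))
    return speed, descent_angle_deg

# Hypothetical samples: the ball moves 1 m forward while dropping 1 m.
speed, angle = trajectory_features([(0.0, 0.0, 4.0), (1.0, 0.0, 3.0)], dt=1.0)
print(round(speed, 3), round(angle, 1))
```

Features of this kind could then be appended to the raw positions at each time step before they are passed to a sequence model.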

Soccer
Soccer is the most famous of all games and has many followers. In this game, each team consists of eleven players who compete with the opposing team. The winner of the match is the team that scores more goals. Analyzing soccer videos is very important, as they contain information for measuring the player's performance inside the field, the accuracy of shots at the goal and the accuracy of passes. By studying how the players move inside the arena, the strategy used by the coach to manage the team can be analyzed. In addition, the ball can be tracked inside the field, e.g., to determine whether it crossed the goal line or not [20][21][22][23][24][25].
A group of researchers presented a way to detect the player and the ball in the arena using YOLO3. It should be noted that detecting the ball inside the arena is very difficult, because the ball is in continuous movement at varying speeds. This applies to the player as well. After the ball and the players are detected, they are tracked using SORT, which resulted in an accuracy of 93.7% on the selected criteria [26].
In general, the individuals found on the soccer field are divided into three main groups: the players, the main referee, and the assistant referees. The researchers presented a method that uses a convolutional neural network to extract the most important features in order to classify the individuals inside the soccer field. First, the data is received; then augmentation is performed in order to increase the number of images in the database. Finally, the object inside the stadium is classified [27].
In another study, a five-stage convolutional neural network was designed to identify the players on the field, with ISSIA-CNR used as the database [29].
A group of researchers presented a method to distinguish three of the main events in soccer. It was used to distinguish goals as well as yellow cards. In this regard, 400 video clips were used, with VGG16 relied on to extract their features. Then, a recurrent neural network (Bi-LSTM) was used to carry out the classification [28].
A group of researchers presented a method to determine the parameters of soccer by relying on YOLO to detect objects inside the stadium. With image dimensions of 608 * 608 and 16 frames, the accuracy was 97.6%. When the image dimensions and the number of frames were changed to 416 * 416 and 30, respectively, the accuracy was 90.03% [30].
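The step of linking detections to existing tracks can be illustrated with a simplified sketch. SORT associates boxes using the intersection over union (IoU) between predicted track boxes and new detections; the version below deliberately swaps in a greedy matching in place of SORT's optimal assignment and omits its motion model, so it is only an assumption-laden illustration. The function names, the box format `(x1, y1, x2, y2)` and the threshold of 0.3 are hypothetical choices, not values from the reviewed article.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def greedy_match(tracks, detections, min_iou=0.3):
    """Pair track boxes with detection boxes, highest IoU first.

    Returns a list of (track_index, detection_index) pairs. Pairs whose
    IoU falls below min_iou are left unmatched.
    """
    scored = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in scored:
        if score < min_iou:
            break  # remaining candidates overlap even less
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    return matches

# Hypothetical boxes: two tracked players and two detections in the next frame.
tracks = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(21, 21, 31, 31), (1, 1, 11, 11)]
print(sorted(greedy_match(tracks, detections)))  # [(0, 1), (1, 0)]
```

A small, fast-moving ball may overlap very little between frames, which is one reason why ball tracking is harder than player tracking with an overlap-based association.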

Other Games
Volleyball is a team sport in which each team consists of 6 players. The two teams compete to score more points than each other. This game depends on the players' movements, such as jumping and hitting the ball, on preventing the player from touching the net between the two teams, and on determining whether the ball is inside or outside the court. This shows that there are many aspects of this game that can be considered and studied [31][32][33].
There are many other games in which the player's performance can be analyzed, such as gymnastics, where the way certain jumps are performed is studied. There are also other games in which the player is analyzed or the ball is tracked inside the stadium, such as cricket or badminton [34][35][36].
A group of researchers presented a method for tracking a badminton player by relying on Faster R-CNN to detect the player and determine his movement inside the court, whether the match was singles or doubles. The researchers initially created a database by collecting clips from YouTube for three competitions held in different years, two of which were singles and one doubles. Then, these videos were split into frames using the VirtualDub program, and only 100 frames were taken from each video clip. There were five methods. The first method was the training process, the second one was applied to the second method, while the third one was applied to the first and second methods. The fourth method was based on the third rule. The fifth was based on the first, third, and last method. The results showed that relying on the last method was better, because all the characteristics were studied [37].
A group of researchers presented an article that characterizes the movement of a badminton player, distinguishing only two movements: hitting the shuttlecock and not hitting it. The study was based on the application of four convolutional neural network models (VGG16, AlexNet, VGG19, GoogleNet). The data for this game was taken from the tennis tournament held in 2017 whose videos are available on YouTube. The videos were cut into a set of frames using a program made for this purpose. Then, the data was fed into the convolutional neural network models in order to determine a set of features based on which the badminton player's movement can be classified. A comparative study of the tested models was then conducted. It was found that GoogleNet was the best model, as it obtained an accuracy of 87.5%, while AlexNet obtained an accuracy of 81.3%. The other models obtained an accuracy of 50%. However, this method did not address the player's other movements, as it only discussed two of them [38].
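Taking a fixed number of frames from each clip can be sketched as follows. The reviewed study does not state how its 100 frames were chosen, so the evenly spaced selection below is an assumption, and the function name and the clip length in the example are hypothetical. The sketch only computes the frame indices; the frames themselves would be read with whatever video tool is in use.

```python
def sample_frame_indices(total_frames, wanted=100):
    """Return up to `wanted` evenly spaced frame indices from a clip.

    If the clip has no more than `wanted` frames, every frame is kept.
    """
    if total_frames <= wanted:
        return list(range(total_frames))
    step = total_frames / wanted  # greater than 1, so indices never repeat
    return [int(i * step) for i in range(wanted)]

# Hypothetical clip of 3000 frames reduced to 100 frames.
indices = sample_frame_indices(3000)
print(len(indices), indices[:3], indices[-1])  # 100 [0, 30, 60] 2970
```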
AlexNet, one of the convolutional neural network models, was relied on for feature extraction, with classification performed using an SVM. Another convolutional neural network model, GoogleNet, was used in order to compare the results. The data was obtained from five videos on YouTube, which were disassembled into 1496 frames. The study was conducted in order to classify five major movements in the game of badminton, relying on local and global features. The results clarified that the GoogleNet model provided good results when relying on global features, while the other model provided good results in terms of local features [39].
Four convolutional neural network models (VGG19, VGG16, AlexNet, GoogleNet) were relied on to distinguish five badminton player movements. The database consisted of five badminton clips available on YouTube, which were disassembled into 1496 frames. Of these, 1196 frames were used for training and 300 for testing. After applying the four models, it was found that AlexNet gave the best results in terms of accuracy and required the least training time, followed by GoogleNet, while the other two models had similar results. The experiment was then repeated on the recognition of a single movement, the smash. It showed that GoogleNet was better than AlexNet, with an accuracy of 90.3%, but AlexNet was better in terms of training time by about 23 seconds.
The movement of badminton players was also analyzed for professional players with up to five years of experience. The data was collected from Pahang University. The smash movements were analyzed by disassembling the videos into 8324 frames. The images were then converted from the RGB color system to grayscale in order to reduce the data and speed up the training process. After collecting the data, three convolutional neural network models were applied, namely VGG-16, ResNet-18, and GoogleNet, and their performance was evaluated. Their practical implementation revealed that ResNet-18 achieved the highest accuracy, which reached 98.86% [41].
Cricket is popular in some countries. A group of researchers presented a deep learning method to analyze the movements of a cricket player within clips of 5 or 6 seconds, with 4 frames taken per second. The cricket database was created by collecting 800 video clips, which were divided into a training set and a test set, and two methods were then tested. The first method was based on VGG16 and LSTM, while the second method was based on a 3D CNN. Eight cricket player movements were distinguished. The results showed that the accuracy obtained using the second method was 90%, while that of the first method was 80% [42].
Cricket has many movements, which allows researchers and other interested parties to perform many operations to distinguish the movements of the game or to identify one of the players on the field. One study presented a method to identify cricket bowlers. A database of 8100 images was collected. The images were compiled from videos on YouTube and show 18 players from seven countries performing this movement. The images were divided into 80% for training and 20% for testing. VGG16 was used to identify the player, and the accuracy reached 93.3% [43].
Cricket is important in India and its results have a strong impact on the public, as is the case with football in other countries. The researcher analyzed the process of shooting the ball and compared the results with the standard shooting process. The database was created from YouTube clips, which were disassembled into 2074 frames, divided into 80% for training and 20% for testing. The proposed algorithm was applied using a convolutional neural network built from four convolutional layers. After running the proposed model and using the RMSE measure, it was found that the accuracy of the system was 90% [44].
Wrong decisions in any match can lead to the loss of a team that does not deserve to lose, which is an injustice to that team. Some games are complex, so the referees present in the arena may not be sufficient and a third match referee is needed to make the final decision. This process requires high accuracy, sometimes takes a long time and can still be incorrect; therefore, relying on technology is better in terms of speed and accuracy. One article therefore suggested a method that recognizes the movement of the cricket player, in which the third referee's decisions are supported by a convolutional neural network. The article relied on six databases in addition to a set of images added to these databases. The data set was divided into 80% for training and 20% for testing. The proposed method gave an average accuracy of 73.3% [45].
Volleyball is among the games that have popularity and followers around the world. The videos of volleyball players were analyzed by first identifying the players in the video. After that, the key points of the player's skeleton were determined in order to define the most important points in the player's movement. The player's movement was determined accurately, as the study adopted a convolutional neural network, and the accuracy of the results was 93% [46].
Tracking a single player may be easy, but identifying and tracking a group of players is more difficult and requires many operations and high accuracy, especially since these players are moving and may be hidden from view. One article suggested tracking the movements of handball players in order to identify the most active players, which can be used to determine the best player in the match. In addition, it can help the coach to identify the most efficient player through his movements [47].
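Several of the studies reviewed above divide their frames into 80% for training and 20% for testing. A minimal sketch of such a split is given below; the function name, the fixed random seed and the use of integer frame identifiers are assumptions made for illustration, and the reviewed articles do not state whether they shuffled the frames before splitting.

```python
import random

def split_frames(frames, train_tenths=8, seed=0):
    """Shuffle a copy of `frames` and split it into training and test parts.

    train_tenths=8 gives the 80% / 20% split used in several reviewed
    studies. Integer arithmetic avoids floating-point rounding surprises.
    """
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = len(shuffled) * train_tenths // 10
    return shuffled[:cut], shuffled[cut:]

# 1496 frames, as in the badminton study that reports 1196 training
# frames and 300 test frames.
train, test = split_frames(range(1496))
print(len(train), len(test))  # 1196 300
```

When consecutive frames of the same clip are nearly identical, splitting by clip rather than by frame would avoid placing near-duplicates in both parts; this is a design consideration, not something the reviewed articles report.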

Results and Analysis
This section discusses the results and observations found when reviewing some articles on the use of deep learning in the sports field. It is clear that the games in which the player's movement is studied most often are football and basketball. This is mainly because these games are popular, which means that there is great interest in analyzing the player's movement in them. Figure (5) shows the application of deep learning to indoor and outdoor sports. It is noted that most of the games under study are indoor games, because the effect of the weather is negligible in indoor halls. Therefore, the results obtained are better than those for outdoor games. Figure (6) shows the type of data used, namely an existing dataset, collected data, or both. It is clear that the convolutional neural network can be used to build deep learning models that can be employed in analyzing the player's movements. In addition, the network architecture is built to suit the task.

Conclusion
The great progress made in the field of technology is attributed to increases in processor speed and memory size and to the development of methods and algorithms that perform certain operations. This helps to reduce complexity and to obtain results faster. One of the areas of artificial intelligence that has witnessed the greatest development is deep learning. Deep learning is now considered a revolution in the progress of science because its requirements, namely larger amounts of data and faster processors, are now available. One of the most important topics that can benefit from deep learning is the tracking of objects in several fields, including medicine, engineering, and sports. In this article, the benefit of applying deep learning algorithms to the analysis of sports games, whether held in indoor halls or in open stadiums, was studied. The analysis of the match was applied to tracking the player and the ball. Tracking using a computer vision system was better than relying on human vision, which is subject to error. All articles obtained high accuracy, and their analysis indicates that the best results were obtained by relying on YOLO3 for detecting players. It was found that there was no dataset for most games, and most researchers built their own data from video clips on YouTube. The videos on YouTube were the best source for building the dataset.

Figure (4) shows the games where deep learning is used in order to analyze the player's performance.

Figure 4: Deep learning in the sports field

Figure 6: DataSet vs. collect
Figure (7) shows the object under tracking. It clarifies that tracking the player is performed more often than tracking the ball, for several reasons, including that most games do not contain a ball. In addition, tracking the ball is more difficult than tracking the player due to the small size of the ball and its speed.