Prediction of Rising Stars in the Game
Among all sports, cricket, which originated in England, is the second most
popular game and now has roots around the globe. Rising Star Prediction (RSP)
is made based on the current contributions of a rising star. A star in cricket
is an experienced player with extraordinary performance throughout his career,
whereas a rising cricketer or rising star (RS) is an emerging player who
currently has a low profile but could become a star cricketer in the future
based on consistent improvements in performance. The players can
be categorized into different classes based on their performance evolution.
The figure gives the different classes of performance evolution. Finding RSs in
academic networks, future prediction of citation counts, and temporal expert
finding are among the many such proposals. However, these proposals did not
consider the sports domain. Where the sports domain is considered, some
sports-related issues are discussed, but the authors do not present a proper
prediction mechanism. There are further proposals for ranking batsmen based on
performance metrics and for ranking cricket teams by employing the h-index and
PageRank. The criterion for RSP incorporates the early-years performance of
an emerging player together with the performances of his Co-players, so that
the emerging player finds the opportunity to learn from the playing strategies
of Co-players under the same playing conditions in order to improve his
performance.
There are three categories of features (Co-batsmen, Team and Opposite teams)
for batsmen, and similar categories (Co-bowlers, Team and Opposite teams) for
bowlers. The work is briefly described as follows:
An efficient methodology for RSP within the cricket domain that incorporates
the concept of Co-players.
A set of 9 features is formulated for RSP of batsmen, as well as a set of 11
features for bowlers.
By testing different classification algorithms on the datasets, four
appropriate machine learning classification algorithms are selected for binary
classification. The performance of the employed machine learning algorithms is
critically examined during the evaluation phase.
RSP is made with high accuracy, and rankings of leading RSs from both domains
based on three defined metrics are presented and compared with the ICC rankings
of players from 2013-2016.
This innovative idea can be used for RSP in other sports domains such as
baseball, football and basketball.
Basic concepts and terminologies, including a brief introduction to the game
of cricket, its rules and regulations, and ranking metrics, are explained. The
ICC issues the rankings on a regular basis. Each team gains or loses points by
winning or losing cricket matches, respectively. Over a particular time span,
these points are used by the ICC to rank the teams. Some of the basic metrics
for ranking batsmen are: 1) runs, 2) batting average and 3) batting
strike rate. Similarly, the metrics for ranking bowlers are: 1) bowling
average, 2) bowling strike rate and 3) bowling economy.
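The ranking metrics listed above can be sketched in a few lines of Python. The text does not give the formulas, so this sketch assumes the standard cricket definitions (average per dismissal or per wicket, strike rate per 100 balls for batting and balls per wicket for bowling, economy per six-ball over). The function and parameter names are illustrative only.

```python
def batting_average(runs, dismissals):
    """Runs scored per dismissal; None if the batsman was never dismissed."""
    return runs / dismissals if dismissals else None


def batting_strike_rate(runs, balls_faced):
    """Runs scored per 100 balls faced; None if no balls were faced."""
    return 100.0 * runs / balls_faced if balls_faced else None


def bowling_average(runs_conceded, wickets):
    """Runs conceded per wicket taken; None if no wickets were taken."""
    return runs_conceded / wickets if wickets else None


def bowling_strike_rate(balls_bowled, wickets):
    """Balls bowled per wicket taken; None if no wickets were taken."""
    return balls_bowled / wickets if wickets else None


def bowling_economy(runs_conceded, balls_bowled):
    """Runs conceded per six-ball over; None if no balls were bowled."""
    return 6.0 * runs_conceded / balls_bowled if balls_bowled else None
```

Note that lower values are better for all three bowling metrics, while higher values are better for the batting metrics.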
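The training data is a set of tuples with $n$ training examples
$(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)$, where $X_i$ denotes the feature
vector of cricketer $c_i$, $X_i \in \mathbb{R}^m$, $\mathbb{R}^m$ is the real
feature space, $m$ is the total count of features, and $n$ is the total count
of cricketers. Moreover, for RSP a prediction function $P_{RS}$ is defined as
follows:

$$y = P_{RS}(c_i \mid X), \tag{1}$$

$$P_{RS}(c_i \mid X) \begin{cases} < 0 & \text{if } y = -1 \ (\text{not RS}), \\ > 0 & \text{if } y = +1 \ (\text{RS}). \end{cases}$$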
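As a minimal sketch of this decision rule, the sign of a real-valued score can be mapped to the two labels. The linear scorer below is purely hypothetical and stands in for whatever classifier produces the score. Treating a score of exactly zero as "not RS" is also an assumption, since the definition leaves that case open.

```python
def linear_score(features, weights, bias=0.0):
    """Hypothetical scorer: weighted sum of the m feature values plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_rs(score):
    """Map a real-valued score to a label: +1 (RS) or -1 (not RS).

    Assumption: a score of exactly 0 is treated as not RS.
    """
    return 1 if score > 0 else -1
```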
The important characteristics for the comparative analysis of two widespread
generative classifiers, BN and NB, are as follows.
Bayesian Network (BN): A BN is a directed acyclic graph representing the joint
probability distribution over a set of random variables in terms of their
conditional dependencies.
Naïve Bayesian (NB): NB is the first successor classifier of BN, with the
additional assumption of independence between the features.
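To illustrate the independence assumption, the following is a small sketch of a Gaussian naive Bayes classifier in plain Python. The Gaussian likelihood, the variance floor, and all function names are assumptions made for illustration; the source does not state which NB variant or implementation was used.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [
            sum((v - mu) ** 2 for v in col) / n + var_floor
            for col, mu in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the label with the highest unnormalized log posterior."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for v, mu, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))
```

Because each feature contributes its own term to the sum, no dependency structure between features has to be learned, which is the simplification relative to a general BN.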
Support Vector Machines (SVM): Among
state-of-the-art binary classifiers based on supervised machine learning,
Support Vector Machines (SVM) have gained broad popularity due to their
efficient investigation of data while identifying
patterns. More precisely, to separate two different classes efficiently, the
SVM model constructs the optimal hyperplane with the largest functional margin.
Moreover, it can handle both linear and non-linear data.
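The margin idea can be sketched with a linear SVM trained by subgradient steps on the regularized hinge loss. This is an illustrative simplification, not the solver used in the source: the training procedure, the hyperparameters `lam`, `lr` and `epochs`, and the function names are all assumptions, and only the linear case is covered.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit w, b by per-sample subgradient steps on the regularized hinge loss.

    Labels in y must be +1 or -1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # The regularization term shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # The point is inside the margin or misclassified: push it out.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def svm_decision(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

A positive decision value corresponds to the RS class and a negative value to the non-RS class. Handling non-linear data would additionally require a kernel, which this sketch omits.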
Classification And Regression Tree (CART): CART is fundamentally a
non-parametric model used for making predictions on the underlying data.
CART comprises three main steps:
1) Maximum tree construction
2) Right selection of tree size
3) Classification of unseen data based
on the previously trained tree.
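The tree construction step repeatedly chooses the split that makes the resulting groups as pure as possible. The sketch below shows one such split search on a single feature, assuming Gini impurity as the splitting criterion (a common choice for CART classification; the source does not say which criterion was used). Function names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit data.
    """
    best_threshold, best_impurity = None, gini(labels)
    candidates = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2.0
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best_impurity:
            best_threshold, best_impurity = threshold, impurity
    return best_threshold, best_impurity
```

A full tree would apply this search recursively over all features and then prune to the selected size; both parts are left out here.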
Recall and balanced F-measure are standard metrics that are employed to check
the performance of binary classification models.
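These metrics can be computed directly from the true and predicted labels. The sketch below assumes labels of +1 (RS) and -1 (not RS) and takes the balanced F-measure to be the harmonic mean of precision and recall; returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the source.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, balanced F-measure) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```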
The impact of the defined features for RSP is analyzed, and all the
state-of-the-art classifiers show outstanding performance. This subsection
provides the analysis for learning RSs from the defined features for the WA (B)
measure based batting dataset. Thus, the proposed features can be generalized
for RSP in the cricket domain. For some
numbers of instances, every classifier predicts RSs with 100% accuracy.
However, overall NB dominates the remaining classifiers, achieving an average
learning accuracy of 94.5% for 10-100 instances. The second-best performance
is shown by the SVM model with an average accuracy of 92.6%. BN stands third
with an average learning accuracy of 91.1%, while CART is ranked last with
an average accuracy of 90.1% for 10-100 instances.
The influence of the defined features for RSP is analyzed, and all the
state-of-the-art classifiers show excellent performance. This subsection
provides the analysis for learning RSs from the defined features for the
WA (Bow) measure based bowling dataset. Thus, the proposed features can be
generalized for RSP in the cricket domain.
For the batting domain, the top 10 RS batsmen are ranked based on WA (B),
PE (B) and RS (B). The metrics WA (B) and PE (B) are formally presented. The
third metric, RS (B), is an aggregate score calculated by adding all the
features that are positively correlated with batting performance and
subtracting the negatively correlated features. More precisely, among the 9
defined features for RS, the 6 features belonging to the Co-batsmen and Team
categories are positively correlated with RS batsmen performance, because
higher values of these features indicate a higher chance for an emerging
batsman of becoming an RS. On the contrary, the three features of the Opposite
teams category are negatively correlated with the performance of a batsman.
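The aggregate score described above can be sketched as a sum of the positively correlated feature values minus the sum of the negatively correlated ones. The player names and feature values below are hypothetical placeholders, and the sketch assumes the features are already on comparable scales, which the text does not specify.

```python
def rs_score(positive_features, negative_features):
    """Aggregate score: add positively correlated features, subtract the rest."""
    return sum(positive_features) - sum(negative_features)


def rank_by_rs_score(players, top_k=10):
    """Rank players by descending RS score.

    players maps a name to (positive feature values, negative feature values).
    """
    scored = {name: rs_score(pos, neg) for name, (pos, neg) in players.items()}
    return sorted(scored, key=scored.get, reverse=True)[:top_k]


# Hypothetical batsmen: 6 Co-batsmen/Team features, 3 Opposite teams features.
example_players = {
    "player_a": ([1, 2, 1, 2, 1, 2], [1, 1, 1]),
    "player_b": ([1, 1, 1, 1, 1, 1], [2, 2, 2]),
}
```

With these placeholder values, `rank_by_rs_score(example_players)` places `player_a` ahead of `player_b`.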
For the bowling domain, the top 10 RS bowlers are ranked based on WA (Bow),
PE (Bow) and RS (Bow). The metrics WA (Bow) and PE (Bow) are formally presented
in the former subsection. The third metric, RS (Bow), is an aggregate score
calculated by adding all the features that are positively correlated with
bowling performance and subtracting the negatively correlated features.
are explicitly adopted for rising star prediction in batting and bowling
domains. More precisely, three categories (Co-players, Team and Opposite teams)
are incorporated, in which 9 and 11 features are defined for the prediction of
batting and bowling rising stars, respectively. Two types of datasets are
generated based on the weighted average and performance evolution metrics. The
defined features are tested by employing generative (BN and NB) and
discriminative (SVM and CART) machine learning algorithms. For the batting
domain, the Co-batsmen category surpasses the remaining two categories, while
in the bowling domain, the Team category performs best for rising star
prediction. Overall, it is observed that NB
outperforms the remaining models. Finally, ranking lists of rising stars based
on weighted average, performance evolution and rising star score are presented
for both domains. These rankings are compared with the ICC rankings during
2013-16, and it is found that the presented approaches are effective for rising
star prediction. Therefore, these features can also be used for rising star
prediction in the Test and T20 formats. Moreover, some additional features,
such as opposite team diversity, home or away, 100s and 50s (for batsmen), and
4- and 5-wicket hauls (for bowlers), can also be incorporated in order to get
even better results. Finding RSs within cricket and other domains is quite
useful, so that the authorities (coaches, managers, etc.) can put effort into
maximizing the expertise of such RSs in order to get optimal performances in
the future. A similar methodology can be adopted for RSP in different sports
domains and other organizations.