Cricket result (match outcome) prediction using Random forest algorithm in python

 

 PROJECT ID: PYTHON06

 

PROJECT NAME: Cricket result (match outcome) prediction using Random forest algorithm

 

PROJECT CATEGORY: MCA / BCA / BCCA / MCM / POLY / ENGINEERING

 

PROJECT ABSTRACT:

Cricket generates a large amount of statistical data: batting records, bowling records, individual player records, scorecards of the matches played, and so on. This data can be used to predict the results of games, which has made result prediction an interesting problem. Many viewers try to predict, at some stage of a tournament, which team will win the upcoming matches and thereby the tournament. This report aims to solve the problem of predicting game results by identifying the important attributes in the data set and applying data mining algorithms. I have limited my area of study to the Indian Premier League, the domestic Twenty20 tournament held in India every year during the summer. The previous work that I read and used as a reference either predicted game results for sports in general or covered sports such as basketball and soccer. My report describes in detail the attribute selection techniques and the data mining algorithms used to solve this problem of result prediction in cricket. I have used accuracy as the evaluation criterion to measure how well the prediction performs. The report also suggests future work for any student interested in continuing the study of this problem and improving upon my results.

Cricket is played in many countries around the world, with many domestic and international tournaments. It is a game played between two teams of 11 players each. The result is a win, a loss or a tie. However, because cricket cannot be played in rain, a game is sometimes washed out due to bad weather. The game is also extremely unpredictable, because the momentum can shift from one team to the other at any stage. The result is often decided on the last ball of a close match. Given this unpredictability, there is huge interest among spectators in predicting the result, either at the start of the game or during it. Many spectators also bet on the outcome. Keeping all these possibilities in mind, this report studies the problem of predicting the result before the game has started, based on the statistics available in the data set. The study uses the Indian Premier League data set covering all nine seasons played so far, i.e. from 2008 to 2016.

Data Set Description

The new combined data set that I generated contains the data from both the .csv files and the .json files. It consists of 21 attributes and 584 instances, and spans all seasons of the Indian Premier League from 2008 to 2016. Winning_Team is the class attribute, and the goal of this project is to predict the winning team of a match. The attributes in the data set are as follows:

1) Season: The season in which the match was played

2) Match_Number: The match number in the current season

3) Team1: The Playing team 1

4) Team2: The Playing team 2

5) Venue: Playing venue

6) Home_Team: The team playing at the home venue

7) Toss_Winner: The team winning the toss

8) Toss_Decision: The decision of the toss-winning team to bat or field first

9) Player_of_Match: The best player in the match in terms of performance

10) Team_Batting_First: The team which bats first

11) Team_Batting_Second: The team which bats second

12) First_Innings_Score: The total score of the team batting in 1st innings

13) Overs_Played_In_First_Innings: The total number of overs played in 1st innings

14) Wickets_Lost_In_First_Innings: The total number of wickets lost in 1st innings

15) First_Innings_Run_Rate: The rate at which the batting team scores in 1st innings

16) Second_Innings_Score: The total score of the team batting in 2nd innings

17) Overs_Played_In_Second_Innings: The total number of overs played in 2nd innings

18) Wickets_Lost_In_Second_Innings: The total number of wickets lost in 2nd innings

19) Second_Innings_Run_Rate: The rate at which the batting team scores in 2nd innings

20) Winning_Margin: The margin of runs by which the winning team won the match

21) Winning_Team: This is the class attribute i.e. the winning team
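Since the stated goal is to predict the result before the game starts, attributes such as innings scores or the player of the match are only known afterwards. The following is a minimal sketch, using only the Python standard library, of how rows in the layout above could be separated into pre-match features and the Winning_Team class attribute, with the class attribute encoded as integers. The sample rows, the team and ground names, the choice of pre-match attributes, and the helper names `split_features_target` and `label_encode` are all assumptions made for illustration; the report does not state which 10 attributes were finally selected.

```python
import csv
import io

# Two made-up rows in the layout described above (values are illustrative only).
SAMPLE_CSV = """Season,Match_Number,Team1,Team2,Venue,Home_Team,Toss_Winner,Toss_Decision,First_Innings_Score,Winning_Team
2008,1,Team A,Team B,Ground X,Team A,Team B,field,165,Team B
2008,2,Team C,Team A,Ground Y,Team C,Team C,bat,142,Team A
"""

# Attributes known before the first ball is bowled (an assumed selection).
PRE_MATCH = ["Season", "Team1", "Team2", "Venue", "Home_Team", "Toss_Winner", "Toss_Decision"]
TARGET = "Winning_Team"


def split_features_target(rows):
    """Keep only pre-match attributes as features; return (features, targets)."""
    features = [{name: row[name] for name in PRE_MATCH} for row in rows]
    targets = [row[TARGET] for row in rows]
    return features, targets


def label_encode(values):
    """Map each distinct string to an integer code, assigned in sorted order."""
    codes = {value: index for index, value in enumerate(sorted(set(values)))}
    return [codes[value] for value in values], codes


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
features, targets = split_features_target(rows)
encoded_targets, target_codes = label_encode(targets)

print(features[0])
print(encoded_targets, target_codes)
```

The same integer encoding could be applied to each categorical feature column before passing the data to a classifier. Dropping the post-match columns keeps information about the outcome from leaking into the features.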

SOFTWARE REQUIREMENTS:

OS : Windows

Python IDE : Python 2.7.x and above

Language : Python Programming

Database : MySQL

 

HARDWARE REQUIREMENTS:

RAM : 4GB and Higher

Processor : Intel i3 and above

Hard Disk : 500GB Minimum

 

CONCLUSION

The model used to predict match results was built successfully, with an accuracy of about 60% to 70%. Using attribute selection algorithms, the list of attributes was cut down from the 21 available in the data set to the 10 most important ones. Four data mining algorithms were applied: J48, Random Forest, Naïve Bayes and KNN. The prediction results were better with K-fold cross-validation than with a percentage split. Random Forest gave the best accuracy, 71.08%. The accuracy nevertheless remained low because the data set contained only 574 instances spread over 11 classes, an average of about 52 instances per class. As a rule of thumb, at least 100 instances per class are needed to identify the patterns in a data set and predict with high accuracy, so with 11 classes at least 1100 instances would have been needed. With more instances in the data set, accuracy may improve in future, because the model would have the flexibility to deduce better rules and identify more patterns than it can with fewer instances.
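The conclusion compares K-fold cross-validation with a percentage split as ways of estimating accuracy. The sketch below shows how the two estimates could be obtained for a Random Forest with scikit-learn. It is not the project's own code: the data is synthetic (generated to have 574 instances, 10 attributes and 11 classes, matching the figures quoted above), and the 66%/34% split, the 10 folds and the 100 trees are assumed settings that the report does not specify. The accuracies it prints therefore say nothing about the real IPL data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the match data: 574 instances, 10 attributes, 11 classes.
X, y = make_classification(
    n_samples=574, n_features=10, n_informative=6, n_redundant=0,
    n_classes=11, n_clusters_per_class=1, random_state=0,
)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Percentage split: train on 66% of the instances, test once on the other 34%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.34, stratify=y, random_state=0
)
split_accuracy = model.fit(X_train, y_train).score(X_test, y_test)

# 10-fold cross-validation: every instance is used for testing exactly once,
# and the reported accuracy is the mean over the 10 folds.
cv_scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
cv_accuracy = cv_scores.mean()

print(f"Percentage split accuracy: {split_accuracy:.3f}")
print(f"10-fold CV accuracy: {cv_accuracy:.3f} (+/- {cv_scores.std():.3f})")
```

With a data set this small, a single percentage split tests on fewer than 200 instances, so its estimate depends heavily on which matches happen to land in the test set. Cross-validation averages over several such splits, which is one reason it is often preferred for small data sets.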

TABLE OF CONTENTS

·        Title Page      

·        Declaration

·        Certification Page

·        Dedication

·        Acknowledgements

·        Table of Contents

·        List of Tables

·        Abstract

 

CHAPTER SCHEME

CHAPTER ONE: INTRODUCTION

CHAPTER TWO: OBJECTIVES

CHAPTER THREE: PRELIMINARY SYSTEM ANALYSIS

·         Preliminary Investigation

·         Present System in Use

·         Flaws In Present System

·         Need Of New System

·         Feasibility Study

·         Project  Category

CHAPTER FOUR: SOFTWARE ENGINEERING AND PARADIGM APPLIED   

·         Modules

·         System / Module Chart

CHAPTER FIVE: SOFTWARE AND HARDWARE REQUIREMENT

CHAPTER SIX: DETAIL SYSTEM ANALYSIS

·         Data Flow Diagram

·         Number of modules and Process Logic

·         Data Structures  and Tables

·         Entity- Relationship Diagram

·         System Design

·         Form Design 

·         Source Code

·         Input Screen and Output Screen

CHAPTER SEVEN: TESTING AND VALIDATION CHECK

CHAPTER EIGHT: SYSTEM SECURITY MEASURES

CHAPTER NINE: IMPLEMENTATION, EVALUATION & MAINTENANCE

CHAPTER TEN: FUTURE SCOPE OF THE PROJECT

CHAPTER ELEVEN: SUGGESTION AND CONCLUSION

CHAPTER TWELVE: BIBLIOGRAPHY & REFERENCES

Other Information

 

PROJECT SOFTWARE

ZIP

PROJECT REPORT PAGES

60-80 Pages

CAN BE USED IN

Marketing (MBA)

PROJECT COST

1500/- Only

PDF SYNOPSIS COST

250/- Only

PPT PROJECT COST

300/- Only

PROJECT WITH SPIRAL BINDING

1750/- Only

PROJECT WITH HARD BINDING

1850/- Only

TOTAL COST

(SYNOPSIS, SOFTCOPY, HARDBOOK, and SOFTWARE, PPT)

2500/- Only

DELIVERY TIME

1 OR 2 Days

(In case of urgency, call: 8830288685)

SUPPORT / QUERY

www.projectsready.in

CALL

8830288685

Email

help@projectsready.in

[Note: We provide hard binding and spiral binding only in the Nagpur region]
