Rainfall prediction using machine learning algorithm in python

 


 

 PROJECT ID: PYTHON22

 

PROJECT NAME: Rainfall prediction using machine learning algorithm

 

PROJECT CATEGORY: MCA / BCA / BCCA / MCM / POLY / ENGINEERING

 

PROJECT ABSTRACT:

Rainfall Prediction is the application of science and technology to predict the amount of rainfall over a region. Determining rainfall accurately is important for the effective use of water resources, crop productivity, and the pre-planning of water structures.

In this project, we use Linear Regression to predict the amount of rainfall. Linear Regression tells us how many inches of rainfall we can expect.
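As a minimal sketch of this step, the snippet below fits scikit-learn's LinearRegression to predict rainfall in inches. The feature names in the comments (average temperature, dew point, humidity) are assumptions standing in for the real dataset's columns, and the data here is synthetic; substitute the cleaned Austin data before drawing any conclusions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the cleaned weather data:
# columns assumed to be avg temperature (F), dew point (F), humidity (%)
rng = np.random.default_rng(0)
X = rng.uniform(low=[30.0, 20.0, 30.0], high=[100.0, 75.0, 100.0], size=(200, 3))
# Fabricated target: rainfall in inches loosely tied to humidity
y = 0.01 * X[:, 2] - 0.002 * X[:, 0] + rng.normal(0.0, 0.05, size=200)

model = LinearRegression()
model.fit(X, y)

# Predict expected rainfall (inches) for one day's readings
predicted = model.predict([[72.0, 60.0, 85.0]])
```

With the real dataset, `X` would hold the cleaned weather attributes and `y` the recorded precipitation column.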

The dataset is a public weather dataset from Austin, Texas, available on Kaggle.

Data Cleaning:

Data comes in many forms, most of it messy and unstructured, and it rarely arrives ready to use. Datasets, large and small, come with a variety of issues: invalid fields, missing or extra values, and values in formats different from the one we require. To bring the data into a workable, structured form, we need to "clean" it and make it ready to use. Common cleaning steps include parsing, one-hot encoding, and removing unnecessary data.
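The cleaning steps above can be sketched with pandas. The column names and the `'T'` (trace rainfall) and `'-'` (missing) placeholder values below are assumptions modeled on common weather-CSV conventions; verify them against the actual file before reusing this.

```python
import pandas as pd

# Tiny stand-in for the raw CSV (column names are assumptions)
raw = pd.DataFrame({
    "Date": ["2014-01-01", "2014-01-02", "2014-01-03"],
    "PrecipitationSumInches": ["0.46", "T", "-"],  # 'T' = trace, '-' = missing
    "Events": ["Rain", "Thunderstorm", "Fog"],
})

df = raw.copy()
# Parse: trace rainfall -> 0.0, missing marker -> NaN, then to numeric
df["PrecipitationSumInches"] = (
    df["PrecipitationSumInches"].replace({"T": "0.0", "-": None}).astype(float)
)
# Remove rows whose target value is missing
df = df.dropna(subset=["PrecipitationSumInches"])
# One-hot encode the categorical Events column
df = pd.get_dummies(df, columns=["Events"])
```

The same pattern (replace placeholders, coerce to numeric, drop or fill missing rows, one-hot encode categoricals) applies to the full dataset.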

 

SOFTWARE REQUIREMENTS:

OS                 : Windows

Python Version     : Python 2.7.x and above

Language           : Python Programming

Database           : MySQL

 

HARDWARE REQUIREMENTS:

RAM                : 4 GB and higher

Processor          : Intel i3 and above

Hard Disk          : 500 GB minimum

 

Models

We chose different classifiers, each belonging to a different model family (Linear, Tree-based, Distance-based, Rule-based, and Ensemble). All the classifiers were implemented using scikit-learn, except for the Decision Table, which was implemented using Weka.

The following classification algorithms have been used to build prediction models to perform the experiments:

3.3.1 Logistic Regression

Logistic Regression is a classification algorithm used to predict a binary outcome (1/0, Yes/No, True/False) given a set of independent variables. To represent a binary or categorical outcome, we use dummy variables. We can also think of logistic regression as a special case of linear regression in which the outcome variable is categorical and the log of odds serves as the dependent variable. In simple words, it predicts the probability of an event occurring by fitting data to a logit function. This makes Logistic Regression a good fit, as ours is a binary classification problem.
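A minimal sketch of this classifier with scikit-learn, on synthetic data: the two features (standing in for, say, humidity and cloud cover) and the rain/no-rain rule are fabricated for illustration, not taken from the project's dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary rain/no-rain data (features are placeholders)
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 100.0, size=(300, 2))   # e.g. humidity, cloud cover
y = (X[:, 0] + X[:, 1] > 100).astype(int)    # 1 = rain, 0 = no rain

clf = LogisticRegression()
clf.fit(X, y)

# Probability of rain for one observation, via the fitted logit function
rain_probability = clf.predict_proba([[90.0, 80.0]])[0, 1]
```

`predict_proba` returns the fitted probability of each class, which is exactly the "probability of occurrence of an event" described above.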

3.3.2 Decision Tree

Decision Trees have a natural "if … then … else …" construction that fits easily into a programmatic structure. They are also well suited to categorization problems, where attributes or features are systematically checked to determine a final category. The technique works for both categorical and continuous input and output variables. In it, we split the population or sample into two or more homogeneous sets (sub-populations) based on the most significant splitter/differentiator among the input variables. This characteristic makes the Decision Tree a good fit for our problem, since our target variable is a binary categorical variable.
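The splitting idea can be sketched with scikit-learn's DecisionTreeClassifier on a one-feature synthetic example; the "rain whenever humidity exceeds 60" rule below is made up purely to show the tree recovering a threshold split.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: a single humidity feature with an invented threshold rule
rng = np.random.default_rng(2)
humidity = rng.uniform(0.0, 100.0, size=(300, 1))
rain = (humidity[:, 0] > 60).astype(int)     # 1 = rain (fabricated rule)

# The tree learns "if humidity > t then rain else no rain" style splits
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(humidity, rain)
```

Inspecting the fitted tree (e.g. with `sklearn.tree.export_text`) shows the learned split point near the true threshold.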

3.3.3 K-Nearest Neighbour

K-Nearest Neighbour (KNN) is a non-parametric, lazy learning algorithm. Non-parametric means there is no assumption about the underlying data distribution; in other words, the model structure is determined from the dataset. Lazy means the algorithm builds no model during training: all training data is used in the testing phase. KNN performs better with a small number of features than with a large one; as the number of features increases, more data is required, and the increase in dimension also leads to overfitting. However, since we have performed feature selection, which reduces dimensionality, KNN looks like a good candidate for our problem.
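A sketch of KNN with scikit-learn, again on synthetic rain/no-rain data: note that `fit` merely stores the training points (the "lazy" behavior described above), and all the distance computation happens at prediction time.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature rain/no-rain data (features are placeholders)
rng = np.random.default_rng(3)
X = rng.uniform(0.0, 100.0, size=(400, 2))
y = (X[:, 0] + X[:, 1] > 100).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# fit() only stores the training set; neighbors are found at predict time
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
```

With real data, the features would first be scaled (KNN is distance-based) and reduced via the feature selection mentioned above.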

 

3.3.4 Decision table

A Decision Table provides a handy and compact way to represent complex business logic. In a decision table, the logic is divided into conditions, actions (decisions), and rules representing the various components that form the business logic [11]. This classifier was implemented using Weka.

3.3.5 Random Forest

Random Forest is a supervised ensemble learning algorithm. "Ensemble" means that it takes a collection of "weak learners" and has them work together to form one strong predictor. Here, we have a collection of decision trees, known as a "forest". To classify a new object based on its attributes, each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes over all the trees in the forest.
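The voting scheme can be sketched with scikit-learn's RandomForestClassifier; the data below is the same kind of synthetic rain/no-rain stand-in used in the earlier sketches, not the project's dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature rain/no-rain data (features are placeholders)
rng = np.random.default_rng(4)
X = rng.uniform(0.0, 100.0, size=(300, 2))
y = (X[:, 0] + X[:, 1] > 100).astype(int)

# 100 decision trees, each trained on a bootstrap sample of the data;
# predict() returns the majority vote over all trees
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
votes = forest.predict([[90.0, 80.0]])
```

The individual trees are available as `forest.estimators_`, so the per-tree "votes" described above can be inspected directly.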

TABLE OF CONTENTS

·        Title Page      

·        Declaration

·        Certification Page

·        Dedication

·        Acknowledgements

·        Table of Contents

·        List of Tables

·        Abstract

 

CHAPTER SCHEME

CHAPTER ONE: INTRODUCTION

CHAPTER TWO: OBJECTIVES

CHAPTER THREE: PRELIMINARY SYSTEM ANALYSIS

·         Preliminary Investigation

·         Present System in Use

·         Flaws In Present System

·         Need Of New System

·         Feasibility Study

·         Project  Category

CHAPTER FOUR: SOFTWARE ENGINEERING AND PARADIGM APPLIED   

·         Modules

·         System / Module Chart

CHAPTER FIVE: SOFTWARE AND HARDWARE REQUIREMENT

CHAPTER SIX: DETAIL SYSTEM ANALYSIS

·         Data Flow Diagram

·         Number of modules and Process Logic

·         Data Structures  and Tables

·         Entity- Relationship Diagram

·         System Design

·         Form Design 

·         Source Code

·         Input Screen and Output Screen

CHAPTER SEVEN: TESTING AND VALIDATION CHECK

CHAPTER EIGHT: SYSTEM SECURITY MEASURES

CHAPTER NINE: IMPLEMENTATION, EVALUATION & MAINTENANCE

CHAPTER TEN: FUTURE SCOPE OF THE PROJECT

CHAPTER ELEVEN: SUGGESTION AND CONCLUSION

CHAPTER TWELVE: BIBLIOGRAPHY & REFERENCES

Other Information

 

PROJECT SOFTWARE

ZIP

PROJECT REPORT PAGE

60-80 Pages

CAN BE USED IN

Marketing (MBA)

PROJECT COST

1500/- Only

PDF SYNOPSIS COST

250/- Only

PPT PROJECT COST

300/- Only

PROJECT WITH SPIRAL BINDING

1750/- Only

PROJECT WITH HARD BINDING

1850/- Only

TOTAL COST

(SYNOPSIS, SOFTCOPY, HARDBOOK, and SOFTWARE, PPT)

2500/- Only

DELIVERY TIME

1 OR 2 Days

(In case Urgent Call: 8830288685)

SUPPORT / QUERY

www.projectsready.in

CALL

8830288685

Email

help@projectsready.in

[Note: We Provide Hard Binding and Spiral Binding only Nagpur Region]


 

 
