
Open access
Published: 05 July 2017

Analysis of agriculture data using data mining techniques: application of big data

Jharna Majumdar 1 ,
Sneha Naraseeyappa 1 &
Shilpa Ankalaki 1

Journal of Big Data volume 4, Article number: 20 (2017)


In the agriculture sector, farmers and agribusinesses have to make innumerable decisions every day, and the many factors influencing those decisions interact in intricate ways. An essential issue for agricultural planning is accurate yield estimation for the numerous crops involved. Data mining techniques are a necessary approach for reaching practical and effective solutions to this problem. Agriculture has been an obvious target for big data. Environmental conditions, variability in soil, input levels, input combinations and commodity prices have made it all the more relevant for farmers to use information and get help in making critical farming decisions. This paper focuses on analysing agriculture data and finding optimal parameters to maximize crop production using data mining techniques such as PAM, CLARA, DBSCAN and multiple linear regression. Mining the large amount of existing crop, soil and climatic data, and analysing new, non-experimental data, optimizes production and makes agriculture more resilient to climatic change.

Today, India ranks second worldwide in farm output. Agriculture is demographically the broadest economic sector and plays a significant role in the overall socio-economic fabric of India. Crop production is a unique business that depends on many climatic and economic factors, including soil, climate, cultivation, irrigation, fertilizers, temperature, rainfall, harvesting, pesticides and weeds. Historical crop yield information is also important for the supply chain operations of companies in industries that use agricultural products as raw material, such as livestock, food, animal feed, chemicals, poultry, fertilizer, pesticides, seed and paper. An accurate estimate of crop production and risk helps these companies plan supply chain decisions such as production scheduling. Businesses such as the seed, fertilizer, agrochemical and agricultural machinery industries plan production and marketing activities based on crop production estimates [1, 2]. Crop yield information supports decision making by farmers and the government in two ways:

It gives farmers a historical crop yield record together with a forecast, which reduces their risk.

It helps the government in making crop insurance policies and policies for supply chain operation.

Data mining techniques play a vital role in the analysis of data. Data mining is the computing process of discovering patterns in large data sets, involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems. Unsupervised learning (clustering) and supervised learning (classification) are two different types of learning method in data mining. Clustering is the process of examining a collection of “data points” and grouping them into “clusters” according to some distance measure. The goal is that data points in the same cluster are a small distance from one another, while data points in different clusters are a large distance from one another. Cluster analysis divides data into well-formed groups, and well-formed clusters should capture the “natural” structure of the data [3]. This paper focuses on the PAM, CLARA and DBSCAN clustering methods, which are used to group the districts of Karnataka that have similar crop production.

Literature survey

Clustering is considered an unsupervised classification process [4]. A large number of clustering algorithms have been developed for different purposes [4, 5, 6]. Clustering techniques can be categorised into partitioning, hierarchical, density-based, grid-based and model-based methods.

Partitioning clustering algorithms, such as K-means, K-medoids (PAM), CLARA and CLARANS, assign objects to k clusters (k being a predefined number of clusters) and iteratively reallocate objects to improve the quality of the clustering result. Hierarchical clustering algorithms arrange objects in tree-structured clusters, i.e., a cluster can contain data points or representatives of lower-level clusters [7]. The idea of density-based clustering methods is that, for each point of a cluster, the neighbourhood of a given radius has to contain at least a minimum number of points, i.e. the density in the neighbourhood should reach some threshold [8].

Researchers around the world have developed and evaluated different forecasting methodologies in the field of agriculture. Some of these studies are summarised here. Ramesh and Vishnu Vardhan analysed agriculture data for the years 1965–2009 in the East Godavari district of Andhra Pradesh, India. Rainfall data was grouped into 4 clusters using the K-means clustering method. Multiple linear regression (MLR) was used to model the linear relationship between a dependent variable and one or more independent variables; the dependent variable is rainfall and the independent variables are year, area of sowing and production. The purpose of that work is to find suitable data models that achieve high accuracy and high generality in terms of yield prediction capability [9].

Bangladesh grows several varieties of rice, which have different cropping seasons [10]. For this reason, a prior study examined the climate of Bangladesh (temperature and rainfall) and its effect on the agricultural production of rice. That study then applied regression analysis with temperature and rainfall. Temperature has an adverse effect on crop production. The data was taken from the Bangladesh Agricultural Research Council (BARC) for the past 20 years with 7 attributes: rainfall, maximum and minimum temperature, sunlight, wind speed, humidity and cloud coverage. In pre-processing, the whole dataset was divided into three phases of four months each (March to June, July to October, November to February). For each phase, the average of every attribute was computed and associated with it. This pre-processing was done for each rice variety. In clustering, the pre-processed tables were analysed to find groups of regions that share similar weather attributes.

Soil characteristics have also been studied and analysed using data mining techniques. For example, k-means clustering has been used to cluster soils in combination with GPS-based technologies [11]. Alberto Gonzalez-Sanchez, Juan Frausto-Solis and Waldo Ojeda-Bustamante carried out an extensive study of the predictive ability of machine learning techniques such as multiple linear regression, regression trees, artificial neural networks, support vector regression and k-nearest neighbour for crop yield prediction [12]. Wheat yield prediction using machine learning and advanced sensing techniques was carried out by Pantazi, Dimitrios Moshou, Thomas Alexandridis and Abdul Mounem Mouazen [13]. The aim of their work is to predict within-field variation in wheat yield, based on on-line multi-layer soil data and satellite imagery of crop growth characteristics. Supervised self-organizing maps, capable of handling existing information from different soil and crop sensors by utilizing an unsupervised learning algorithm, were used. The software tool ‘Crop Advisor’, developed by S. Veenadhari, B. Misra and CD Singh [14], is a user-friendly web page for predicting the influence of climatic parameters on crop yields. The C4.5 algorithm is used to find the most influential climatic parameter for the yields of selected crops in selected districts of Madhya Pradesh.

The objective of the proposed work is to analyse agriculture data using data mining techniques. In the proposed work, agriculture data has been collected from the following sources:

Dataset in agricultural sector [ https://data.gov.in/ , http://raitamitra.kar.nic.in/statistics ],

Crop wise agriculture data [html://CROPWISE_NORMAL_AREA],

Agriculture data of different districts [ http://14.139.94.101/fertimeter/Distkar.aspx , http://raitamitra.kar.nic.in/ENG/statistics.asp ],

Agriculture data based on weather, temperature, and relative humidity [ http://dmc.kar.nic.in/trg.pdf ].

The input dataset consists of 6 years of data with the following parameters: year, state (Karnataka, 28 districts), district, crop (cotton, groundnut, jowar, rice and wheat), season (kharif, rabi, summer), area (in hectares), production (in tonnes), average temperature (°C), average rainfall (mm), soil pH value, soil type, major fertilizers, nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), minimum rainfall required and minimum temperature required.

In the proposed work, a modified approach to the DBSCAN method is used to cluster the districts that have similar temperature, rainfall and soil type. PAM and CLARA are used to cluster the districts according to crop production and so identify those with maximum production (wheat is considered as the example crop). Based on these analyses, the optimal parameters for maximum crop production are obtained. The multiple linear regression method is used to forecast the annual crop yield.

Modified approach of DBSCAN

DBSCAN is a basic algorithm for density-based clustering of large amounts of data that contain noise and outliers. DBSCAN has two parameters, Eps and MinPts. However, traditional DBSCAN cannot determine the optimal Eps value by itself [15]. Automatic determination of the optimal Eps value is therefore one of the most necessary modifications to DBSCAN. Figure 1 outlines the modified approach to the DBSCAN method.
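To make the roles of Eps and MinPts concrete, the following is a minimal sketch of plain (unmodified) DBSCAN in Python. It is not the implementation used in this work; the function names `region_query` and `dbscan`, the Euclidean distance and the brute-force neighbour search are all assumptions chosen for brevity.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later be claimed as a border point
            continue
        cluster += 1                       # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: density-reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels
```

With a small Eps, isolated points stay labelled as noise; with a large Eps, separate groups merge. This sensitivity is why an automatic choice of Eps matters.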

Determine the Eps value automatically

The modified DBSCAN proposes a method to find the minimum points (MinPts) and Epsilon (radius value) automatically. A KNN plot is used to find the Epsilon value, and the input to the KNN plot (the K value) is normally user defined. To avoid a user-defined K value, the Batchelor Wilkins clustering algorithm is applied to the database to obtain the K value along with the respective cluster centres. This K value is given as input to the KNN plot.
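The Batchelor Wilkins algorithm is usually described as a maximum-distance (max-min) heuristic: it keeps adding the point farthest from all current centres as a new centre until that point is no longer far enough away. The sketch below follows that common description and is an assumption rather than the exact procedure used here; in particular the function name `batchelor_wilkins`, the choice of the first point as the first centre and the threshold `fraction=0.5` are illustrative.

```python
import math
from itertools import combinations

def batchelor_wilkins(points, fraction=0.5):
    """Return the indices of the chosen cluster centres; K is len(result).

    Assumes at least two distinct points.
    """
    indices = range(len(points))
    centres = [0]  # first centre: arbitrarily the first point
    # second centre: the point farthest from the first centre
    centres.append(max(indices, key=lambda i: math.dist(points[i], points[0])))
    while True:
        # distance from every point to its nearest current centre
        nearest = [min(math.dist(p, points[c]) for c in centres) for p in points]
        candidate = max(indices, key=lambda i: nearest[i])
        # mean distance between the centres found so far
        pairs = list(combinations(centres, 2))
        mean_gap = sum(math.dist(points[a], points[b]) for a, b in pairs) / len(pairs)
        if nearest[candidate] > fraction * mean_gap:
            centres.append(candidate)   # far from every centre: open a new cluster
        else:
            return centres              # no remaining point is far enough away
```

The number of centres returned plays the role of the K value that is passed on to the KNN plot.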

Determination of Eps and Minpts

The Epsilon (Eps) value can be found by drawing a “K-distance graph” for all data points in the dataset for a given ‘K’, obtained by the Batchelor Wilkins algorithm [16]. First, the distance from each point to each of its ‘K’ nearest neighbours is calculated. The KNN plot is then drawn from the sorted average distances. Once the graph is plotted, a knee point is determined in order to find the optimal Eps value [15].
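The K-distance curve and its knee can be sketched in a few lines of Python. The curve follows the description above (sorted mean distance to the K nearest neighbours). The knee rule used here, the point farthest from the straight line joining the two ends of the curve, is one common heuristic and is an assumption; the function names are hypothetical.

```python
import math

def k_distance_curve(points, k):
    """Sorted mean distance from each point to its k nearest neighbours (needs k < len(points))."""
    curve = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(sum(dists[:k]) / k)
    return sorted(curve)

def knee_value(curve):
    """Value on the curve that lies farthest from the chord joining its end points."""
    n = len(curve)
    y0, y1 = curve[0], curve[-1]

    def gap(i):
        # proportional to the perpendicular distance from (i, curve[i]) to the chord
        return abs((y1 - y0) * i - (n - 1) * (curve[i] - y0))

    return curve[max(range(n), key=gap)]

def estimate_eps(points, k):
    """Candidate Eps: the knee of the K-distance curve."""
    return knee_value(k_distance_curve(points, k))
```

Because the knee depends on how the two axes are scaled, the value returned is best treated as a starting point to be checked against the plotted curve.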

Partition around medoids (PAM)

PAM is a partitioning-based algorithm that breaks the input data into a number of groups. It finds a set of centrally located objects called medoids. Each data point is then assigned to its nearest medoid to form the clusters. The algorithm has two phases:

BUILD phase: a collection of k objects is selected for an initial set S.

Arbitrarily choose k objects as the initial medoids.

Until no change, do:

(Re)assign each object to the cluster with the nearest medoid.

Improve the quality of the k medoids (randomly select a non-medoid object, O random, and compute the total cost of swapping a medoid with O random).

SWAP phase: one tries to improve the quality of the clustering by exchanging selected objects with unselected objects. Choose the swap with the minimum cost.

Example: for each medoid m1 and each non-medoid data point d, swap m1 and d and recompute the cost (the sum of distances of points to their medoid); if the total cost of the configuration increased, undo the swap. Figure 2 depicts the steps involved in the PAM algorithm.

Fig. 2 PAM algorithm steps
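The two phases can be sketched in Python as follows. This is an illustrative version, not the code used in this work: it assumes Euclidean distance, uses a greedy BUILD step instead of an arbitrary initial choice, and in each SWAP iteration applies the single cheapest exchange. The names `total_cost` and `pam` are hypothetical.

```python
import math

def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (medoids are indices)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """Return (medoid indices, cluster label per point, total cost)."""
    n = len(points)
    # BUILD: greedily add the object that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(min(rest, key=lambda i: total_cost(points, medoids + [i])))
    best = total_cost(points, medoids)
    # SWAP: apply the cheapest medoid/non-medoid exchange until none lowers the cost.
    while True:
        swaps = [(total_cost(points, medoids[:pos] + [cand] + medoids[pos + 1:]), pos, cand)
                 for pos in range(k) for cand in range(n) if cand not in medoids]
        if not swaps:
            break
        cost, pos, cand = min(swaps)
        if cost >= best:
            break                      # every swap would raise (or keep) the cost
        medoids[pos], best = cand, cost
    # Assign each object to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda c: math.dist(p, points[medoids[c]])) for p in points]
    return medoids, labels, best
```

Each SWAP iteration evaluates every medoid and non-medoid pair against the whole data set, which is what makes PAM expensive on large data and motivates sampling-based variants.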

CLARA (clustering large applications)

CLARA (clustering large applications) was designed by Kaufman and Rousseeuw to handle large datasets and relies on sampling [17, 18]. Instead of finding representative objects for the entire data set, CLARA draws a sample of the data set, applies PAM to the sample and finds the medoids of the sample. To come up with better approximations, CLARA draws multiple samples and gives the best clustering as the output. Here, for accuracy, the quality of a clustering is measured by the average dissimilarity of all objects in the entire data set, not only those in the sample. Figure 3 outlines the steps involved in the CLARA algorithm.

Fig. 3 Steps involved in the CLARA algorithm
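The sampling idea can be sketched in Python as below. This is a simplified illustration under stated assumptions: `pam_on_sample` is a small PAM-style swap search written for this example, the default sample size of 40 + 2k follows the value commonly quoted for CLARA, and the number of samples, the seed and all function names are hypothetical.

```python
import math
import random

def cost(points, medoids):
    """Sum of distances from every point to its nearest medoid (medoids are coordinates)."""
    return sum(min(math.dist(p, m) for m in medoids) for p in points)

def pam_on_sample(sample, k):
    """PAM-style search on a sample: start from its first k points, apply best swaps."""
    medoids = list(sample[:k])
    best = cost(sample, medoids)
    while True:
        trials = [(cost(sample, medoids[:i] + [c] + medoids[i + 1:]), i, c)
                  for i in range(k) for c in sample if c not in medoids]
        if not trials:
            return medoids
        trial_cost, i, c = min(trials)
        if trial_cost >= best:
            return medoids             # no swap improves the sample clustering
        medoids[i], best = c, trial_cost

def clara(points, k, n_samples=5, sample_size=None, seed=0):
    """Return (best medoids, average dissimilarity over the whole data set)."""
    rng = random.Random(seed)
    size = min(len(points), sample_size or 40 + 2 * k)
    best_medoids, best_avg = None, math.inf
    for _ in range(n_samples):
        sample = rng.sample(points, size)           # draw a random sample
        medoids = pam_on_sample(sample, k)          # cluster only the sample
        avg = cost(points, medoids) / len(points)   # judge it on ALL objects
        if avg < best_avg:
            best_medoids, best_avg = medoids, avg
    return best_medoids, best_avg
```

The expensive swap search only ever sees `size` objects, while the final comparison between samples uses the entire data set.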

Multiple linear regression to forecast the crop yield

Multiple linear regression is a variant of “linear regression” analysis. The model is built to establish the relationship between one dependent variable and two or more independent variables [19]. For a given dataset where \( x_{1} , \ldots , x_{k} \) are independent variables and \( Y \) is a dependent variable, multiple linear regression fits the dataset to the model:

\[ Y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{k} x_{k} + \varepsilon \]

where \( \beta_{0} \) is the y-intercept, the parameters \( \beta_{1} , \beta_{2} , \ldots , \beta_{k} \) are called the partial coefficients and \( \varepsilon \) is the error term. In matrix form

\[ Y = X\beta + \varepsilon \]

Before applying multiple linear regression to forecast the crop yield, it is necessary to identify the significant attributes in the database. Not all attributes in the database are significant: changing the value of some attributes has no effect on the dependent variable, and such attributes can be neglected. A p value test is performed on the database to find the significant attributes, and multiple linear regression is applied only to the significant attributes to forecast the crop yield.
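As an illustration of how the coefficients of such a model can be estimated, the sketch below solves the least-squares normal equations \( (X^{T}X)\beta = X^{T}Y \) in plain Python. It is an assumption about one possible fitting procedure, not the tool used in this work; the function name `fit_mlr` is hypothetical, and statistical libraries normally use more numerically stable factorizations.

```python
def fit_mlr(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    rows: one sequence of k independent values per observation
    y:    one dependent value per observation
    """
    X = [[1.0] + list(r) for r in rows]            # prepend the intercept column
    n = len(X[0])
    # Augmented normal equations: [X^T X | X^T y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * t for row, t in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Toy data generated from y = 2 + 3*x1 - 1*x2 (illustrative values only)
coefficients = fit_mlr([(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)], [2, 5, 1, 4, 7])
```

In this setting, only the attributes that pass the p value test would be supplied as columns of `rows`.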

Evaluation methods

Data mining algorithms work on different principles and can be influenced by different kinds of associations in the data. To compare them under fair conditions, this work looks for the optimal clustering method for agriculture data analysis. The proposed work adopts external quality metrics [3], namely Purity, Homogeneity, Completeness, V-measure, Rand Index, Precision, Recall and F-measure, to compare the PAM, CLARA and DBSCAN clustering methods.

Purity is computed by assigning each cluster to the class that is most frequent in the cluster. Homogeneity indicates the extent to which each cluster contains only members of a single class. Completeness indicates the extent to which all members of a given class are assigned to the same cluster. V-measure is the harmonic mean of the homogeneity and completeness scores. Rand Index measures the percentage of pairwise decisions that are correct. Precision is the fraction of pairs placed in the same cluster that truly belong together. Recall is the fraction of pairs that truly belong together that were placed in the same cluster. F-measure is the harmonic mean of precision and recall. Higher metric values represent better cluster quality.
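The definitions above translate directly into code. The following Python sketch computes the eight metrics from a list of true class labels and a list of predicted cluster labels. The function names are hypothetical, and the entropy-based formulas for homogeneity and completeness follow the usual V-measure definitions, which is an assumption about the exact variant used here.

```python
import math
from collections import Counter
from itertools import combinations

def purity(truth, pred):
    """Fraction of objects that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(truth, pred):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(truth)

def pair_counts(truth, pred):
    """(TP, FP, FN, TN) over all pairs of objects."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_class, same_cluster = truth[i] == truth[j], pred[i] == pred[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def pair_metrics(truth, pred):
    """(Rand index, precision, recall, F-measure)."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    rand = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return rand, precision, recall, f

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def conditional_entropy(a, b):
    """H(a | b): remaining uncertainty about a once b is known."""
    n = len(a)
    joint, marginal = Counter(zip(a, b)), Counter(b)
    return -sum(c / n * math.log(c / marginal[bv]) for (_, bv), c in joint.items())

def v_measure(truth, pred):
    """(homogeneity, completeness, V-measure)."""
    h_class, h_cluster = entropy(truth), entropy(pred)
    hom = 1.0 if h_class == 0 else 1 - conditional_entropy(truth, pred) / h_class
    com = 1.0 if h_cluster == 0 else 1 - conditional_entropy(pred, truth) / h_cluster
    v = 2 * hom * com / (hom + com) if hom + com else 0.0
    return hom, com, v
```

All of these are external metrics: they need a reference labelling (`truth`) to compare against, unlike internal metrics that use only the data and the clustering.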

Experimental results

Before applying the DBSCAN algorithm to the dataset, the MinPts and Eps values need to be determined. The Batchelor Wilkins algorithm is applied to the dataset to determine the K value (number of clusters) automatically. For the dataset used in the proposed work, the K value obtained from the Batchelor Wilkins algorithm is 7, with the districts shown in Fig. 4 as cluster centres.

Fig. 4 Cluster centres obtained from the Batchelor Wilkins algorithm

The KNN plot is drawn using the K value obtained from the Batchelor Wilkins algorithm (here K = 7) to determine the Epsilon value and the minimum points for DBSCAN. Figure 5 depicts the resulting KNN plot. The Eps value is found by computing the slope of the line between pairs of points on the plot and locating the pair with the greatest slope. The greatest slope is located at 0.4, which is taken as the optimal Eps value [20].

Fig. 5 KNN plot for the given dataset

Fig. 6 Districts of Karnataka considered for the analysis

The DBSCAN clustering algorithm is applied to the dataset, using the optimal Eps value, to cluster the districts of Karnataka that have similar rainfall, temperature and soil type.

Figure 6 depicts the districts of Karnataka that are considered for the analysis.

Figures 7, 8 and 9 depict the districts of Karnataka that have similar temperature range, rainfall range and soil type, respectively.

Fig. 7 Districts which have a similar temperature range

Fig. 8 Districts which have similar rainfall during the 6 year period

Fig. 9 Districts which have a similar soil type

To apply the PAM algorithm to the dataset, the user first needs to give k (the number of clusters); k is set to 3 in the current experiment. Crop yield is categorised into LOW, MODERATE and HIGH production. All districts are grouped into 3 clusters using the PAM clustering method. The resulting clusters are shown in Table 1.

Wheat crop production in the different districts of Karnataka is shown in Fig. 10.

Fig. 10 Production of yield in tonnes per hectare of different districts

The analysis shows that North Karnataka districts such as Bijapur, Dharwad, Bagalkot, Belgaum, Raichur, Bellary, Chitradurga and Davangere have the maximum wheat crop production.

The districts in the dataset are grouped into 3 clusters using the CLARA algorithm. The clusters are shown in Fig. 11, which represents the districts that have similar factors such as area, production, rainfall and temperature. The result of the CLARA algorithm is shown in Table 2.

Temperature and wheat crop production in the different districts of Karnataka are shown in Fig. 12. From Fig. 12, we can see that the optimal temperature for wheat crop production is 29.9 °C.

Fig. 11 Results of CLARA in R language

Fig. 12 Plot of temperature vs. production

Multiple linear regression

Before applying multiple linear regression, the “p value test” is performed on the dataset to determine the significant attributes. Table 3 lists the significant values. An independent variable with a “p value” of less than 0.05 indicates that the “null hypothesis” can be rejected, which means the variable has an effect in the regression analysis, so such variables can be added to the model. If the p value is more than the common alpha level of 0.05, the variable is said to be not significant to the model.

Table 4 shows the multiple linear regression equation for the yield of different crops. For example, for the wheat crop, if all the independent variables are zero, the yield is 112. A 1 unit increase in temperature reduces the yield by 4.14e−02 units, a 1 unit increase in rainfall increases the yield by 1.34e−04 units, a 1 unit increase in pH increases the yield by 0.079153 units, a 1 unit increase in nitrogen reduces the yield by 1.31e−03 units, a 1 unit increase in potassium decreases the yield by 0.00167 units and a 1 unit increase in water requirement decreases the yield by 0.28125 units.
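Read as an equation, the wheat example is an intercept plus one term per attribute. The short Python sketch below evaluates it using only the coefficients quoted in the text. The dictionary keys and the function name `predict_wheat_yield` are hypothetical, and the units of each input are assumed to be those of the underlying dataset, which the text does not restate.

```python
# Coefficients quoted in the text for the wheat crop
WHEAT_INTERCEPT = 112.0
WHEAT_COEFFICIENTS = {
    "temperature": -4.14e-02,
    "rainfall": 1.34e-04,
    "ph": 0.079153,
    "nitrogen": -1.31e-03,
    "potassium": -0.00167,
    "water_requirement": -0.28125,
}

def predict_wheat_yield(**inputs):
    """Intercept plus coefficient * value for every attribute supplied.

    Attributes that are not supplied are treated as zero.
    """
    return WHEAT_INTERCEPT + sum(
        WHEAT_COEFFICIENTS[name] * value for name, value in inputs.items()
    )

# Effect of one extra unit of temperature, everything else held at zero
temperature_effect = predict_wheat_yield(temperature=1.0) - predict_wheat_yield()
```

Each coefficient is the change in predicted yield for a 1 unit change in that attribute with the others held fixed, which is how the sentences above should be read.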

For a 1 unit increase in pH, the yield of crops such as jowar, rice and wheat will increase, but the yield of groundnut and cotton will decrease.

Results for optimal temperature and rainfall for wheat—Table 5

Table 5 shows the optimal parameters for achieving higher wheat production.

Comparison of clustering methods

As mentioned earlier, the clustering methods are compared using external quality metrics. Table 6 compares the PAM, CLARA and DBSCAN methods for clustering the districts that have similar crop productivity.

Table 6 and Fig. 13 show the comparison of the PAM, CLARA and DBSCAN clustering methods. Higher quality metric values indicate better clustering quality. From Fig. 13, DBSCAN has the higher value for most of the quality metrics. DBSCAN gives better clustering quality than PAM and CLARA, and CLARA gives better clustering quality than PAM.

Crops are usually selected for their economic importance. However, the agricultural planning process requires yield estimation for several crops. For this reason, five crops were selected for this work using data availability as the key measure: a crop was selected when enough data samples were available across the 6 years under analysis. The present work is therefore limited to 5 crops: cotton, wheat, groundnut, jowar and rice. The analysis of the wheat crop is discussed in this paper as an example.

The present work covers the PAM, CLARA and modified DBSCAN clustering methods and the multiple linear regression method. PAM and CLARA are traditional clustering methods, whereas the DBSCAN method is modified by introducing the Batchelor Wilkins clustering method to determine the ‘k’ value and the KNN method to determine the minimum points and radius value automatically. Using these methods, the crop data set is analysed and the optimal parameters for wheat crop production are determined. Multiple linear regression is used to find the significant attributes and to form the equation for yield prediction.

Some works measure the quality of clustering methods using internal quality metrics [21], while others use external quality metrics. In this work, the evaluation is limited to external quality metrics, which combine several kinds of metric [22]: set matching metrics, metrics based on counting pairs and metrics based on entropy. The clustering methods were ranked, from best to worst, according to the purity, homogeneity, completeness, V-measure, precision, recall and Rand index results, in the following order: DBSCAN, CLARA and PAM.

Various data mining techniques are applied to the input data to assess which method performs best. The present work used the data mining techniques PAM, CLARA and DBSCAN to obtain the optimal climate requirements of wheat, such as the optimal temperature range, the worst temperature and the rainfall needed to achieve higher wheat production. The clustering methods are compared using quality metrics. According to the analysis of the clustering quality metrics, DBSCAN gives better clustering quality than PAM and CLARA, and CLARA gives better clustering quality than PAM. The proposed work can also be extended to analyse soil and other factors for the crop, and to increase crop production under different climatic conditions.

1. Veenadhari S, Misra B, Singh CD. Data mining techniques for predicting crop productivity: a review article. In: IJCST. 2011;2(1).

2. Gleason CP. Large area yield estimation/forecasting using plant process models. Paper presented at the winter meeting of the American Society of Agricultural Engineers, Palmer House, Chicago, Illinois. 1982; 14–17.

3. Majumdar J, Ankalaki S. Comparison of clustering algorithms using quality metrics with invariant features extracted from plant leaves. In: Paper presented at international conference on computational science and engineering. 2016.

4. Jain A, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv. 1999;31(3):264–323.

5. Jain AK, Dubes RC. Algorithms for clustering data. New Jersey: Prentice Hall; 1988.

6. Berkhin P. A survey of clustering data mining technique. In: Kogan J, Nicholas C, Teboulle M, editors. Grouping multidimensional data. Berlin: Springer; 2006. p. 25–72.

7. Han J, Kamber M. Data mining: concepts and techniques. Massachusetts: Morgan Kaufmann Publishers; 2001.

8. Ester M, Kriegel HP, Sander J, Xu X. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Paper presented at international conference on knowledge discovery and data mining. 1996.

9. Ramesh D, Vishnu Vardhan B. Data mining techniques and applications to agricultural yield data. In: International journal of advanced research in computer and communication engineering. 2013;2(9).

10. MotiurRahman M, Haq N, Rahman RM. Application of data mining tools for rice yield prediction on clustered regions of Bangladesh. IEEE. 2014;2014:8–13.

11. Verheyen K, Adrianens M, Hermy S Deckers. High resolution continuous soil classification using morphological soil profile descriptions. Geoderma. 2001;101:31–48.

12. Gonzalez-Sanchez A, Frausto-Solis J, Ojeda-Bustamante W. Predictive ability of machine learning methods for massive crop yield prediction. Span J Agric Res. 2014;12(2):313–28.

13. Pantazi XE, Moshou D, Alexandridis T, Mouazen AM. Wheat yield prediction using machine learning and advanced sensing techniques. Comput Electron Agric. 2016;121:57–65.

14. Veenadhari S, Misra B, Singh D. Machine learning approach for forecasting crop yield based on climatic parameters. In: Paper presented at international conference on computer communication and informatics (ICCCI-2014), Coimbatore. 2014.

15. Rahmah N, Sitanggang IS. Determination of optimal epsilon (Eps) value on DBSCAN algorithm to clustering data on peatland hotspots in Sumatra. IOP Conference Series: Earth and Environmental Science. 2016;31:012012.

16. Forbes G. The automatic detection of patterns in people’s movements. Dissertation, University of Cape Town. 2002.

17. Ng RT, Han J. CLARANS: a method for clustering objects for spatial data mining. In: IEEE Transactions on Knowledge and Data Engineering. 2002;14(5).

18. Kaufman L, Rousseeuw PJ. Finding groups in data: an introduction to cluster analysis. Wiley. 1990. doi: 10.1002/9780470316801 .

19. Multiple linear regression. http://www.originlab.com/doc/Origin-Help/Multi-Regression-Algorithm . Accessed 3 July 2017.

20. Elbatta MNT. An improvement for DBSCAN algorithm for best results in varied densities. Dissertation, Gaza (PS): Islamic University of Gaza. 2012.

21. Kirkland O, De La Iglesia B. Experimental evaluation of cluster quality measures. 2013. 978-1-4799-1568-2/13. IEEE.

22. Meila M. Comparing clustering. In: Proceedings of COLT 2003. 2003.

Authors’ contributions

JM, Dean R&D, Professor and Head of the Department of M.Tech CSE at NMIT, has 40 years of experience in India and abroad, and guided and gave extensive help in developing the data mining algorithms. SN, Assistant Professor in the Department of M.Tech CSE at NMIT, developed the PAM and CLARA algorithms with the help of Dr. Jharna Majumdar. SA, Assistant Professor in the Department of M.Tech CSE at NMIT, developed the modified approach to DBSCAN, multiple linear regression and the quality metrics for cluster comparison with the guidance and help of Dr. Jharna Majumdar. All authors together analysed the crop data set to determine the optimal parameters to maximise the crop yield. All authors read and approved the final manuscript.

Acknowledgements

The authors express their sincere gratitude to Prof N.R Shetty, Advisor and Dr H.C Nagaraj, Principal, Nitte Meenakshi Institute of Technology for giving constant encouragement and support to carry out research at NMIT.

The authors extend their thanks to the Vision Group on Science and Technology (VGST), Government of Karnataka, for acknowledging our research and providing financial support to set up the infrastructure required to carry out the research.

Competing interests

The authors declare that they have no competing interests.

This work was supported by the Research Department of Computer science, Nitte Meenakshi Institute of Technology.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and affiliations.

Department of M.Tech CSE, NMIT, Bangalore, 560064, India

Jharna Majumdar, Sneha Naraseeyappa & Shilpa Ankalaki


Corresponding author

Correspondence to Jharna Majumdar .

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License ( http://creativecommons.org/licenses/by/4.0/ ), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article.

Majumdar, J., Naraseeyappa, S. & Ankalaki, S. Analysis of agriculture data using data mining techniques: application of big data. J Big Data 4 , 20 (2017). https://doi.org/10.1186/s40537-017-0077-4


Received: 25 February 2017

Accepted: 31 May 2017

Published: 05 July 2017

DOI: https://doi.org/10.1186/s40537-017-0077-4



ANALYSIS AND PREDICTION IN AGRICULTURAL DATA USING DATA MINING TECHNIQUES

Agriculture contributes nearly sixteen percent of India's total GDP and ten percent of its total exports, which helps to increase foreign exchange. The population of India is continuously increasing, and to meet the food needs of this growing population, agricultural yield should be boosted. Knowledge discovered from raw data is useful for many purposes, and data mining techniques are a good choice for discovering it. This paper aims to analyze the agricultural data of India using data mining algorithms and to find useful information in the results that would help to improve agricultural yield. Various mining algorithms applied to agricultural data were studied. The data mining techniques applied in this paper are the clustering algorithms K-means, DBSCAN and EM; the results of these algorithms are analyzed.

Related Papers

Divya Sindhu

Manoj Chaudhari

This Paper is Basically applied to the Advancement in Farming by technological Evolution as the growth in Computing and information Assessment, Retrieval and Storage have provided vast amount of Data. Data mining Techniques have been extensively seed n large amount of Datasets and Variables. But the main challenge is to extract information from this data which results n various methodologies and techniques such as Data Mining that can easily provide Results and Conclusions. Data Mining is emerging research field in Agriculture crop yield analysis. In this paper our focus is on the applications of Data Mining techniques in agricultural field. Different Data Mining techniques are in use, such as HM, K-Nearest Neighbor(KNN), Decision Tree(DT) and Support Vector Machines(SVM) in Agricultural Data as a tool for mining. Different Data sets are evaluated and hence outcomes with Different Data Mining Techniques. This paper discusses a process model for analyzing data, and describes the supp...

International Journal for Research in Applied Science and Engineering Technology

Sachin Chawhan

Crop management of certain agriculture region is depends on the climatic conditions of that region because climate can make huge impact on crop productivity. Real time weather data can helps to attain the good crop management. Data mining is the process of discovering of new pattern from large data sets, this technology which is employed in inferring useful knowledge that can be put to use from a vast amount of data, various data mining techniques such as classification, prediction, clustering and outlier analysis can be used for the purpose. Real time weather data can helps to attain good crop management. Weather is one of the meteorological data that is rich by important knowledge. In this paper we include the hybrid model to improve the agriculture productivity by using data mining techniques. KeywordsData mining techniques, Existing System, Problem definition, Proposed System, Market Size, etc.

Bulgarian Journal of Agricultural Science

Boris Milovic

Milovic, B. and V. Radojevic, 2015. Application of data mining in agriculture. Bulg. J. Agric. Sci., 21: 26-34. Today, agricultural organizations work with large amounts of data, and it is necessary to process and retrieve the significant data within this abundance of agricultural information. Information and communications technology makes it possible to automate the extraction of significant data in order to obtain knowledge and trends. This eliminates manual tasks, makes it easier to extract data directly from electronic sources and transfer it to a secure electronic documentation system, and in turn enables lower production costs, higher yield and a higher market price. In addition to information about crops, data mining enables agricultural enterprises to predict trends in customers' conditions or behavior, by analyzing data from different perspectives and finding connections and relationships in seemingly unrelated data. Raw data of agricultural enterpris...

eSAT Journals

The agrarian sector in India faces a serious problem in maximizing crop productivity. More than 60 percent of the crop still depends on monsoon rainfall. Recent developments in information technology for agriculture have made crop yield prediction an interesting research area. Yield prediction remains a major problem to be solved from the available data, and data mining techniques are well suited to this purpose. Different data mining techniques are used and evaluated in agriculture for estimating the coming year's crop production. This paper presents a brief analysis of crop yield prediction using the Multiple Linear Regression (MLR) technique and a density-based clustering technique for a selected region, the East Godavari district of Andhra Pradesh, India.

International Journal of Advanced Research in Computer Science

Gurpinder Singh

iir publications

Dr. S . Rajeswari

Data mining is used to fetch the needed information from large databases. Nowadays, data mining concepts and techniques are used to resolve agricultural problems. Here we discuss how data mining techniques are applied in the agricultural field.

International Journal of Engineering Sciences & Research Technology

Ijesrt Journal

With the evolution of computer-based data storage systems, we have accumulated huge repositories of data. But this data is not very helpful until we know what we can do with it. We need to draw inferences from this immense data so that we can make knowledge-driven decisions. Data mining is the process of knowledge discovery in databases (KDD), and mining agricultural patterns is one of its applications. Over the last few decades, data mining in agriculture has become an active research area. Until now, data mining techniques were used mainly in business and corporate sectors, but they are now also being used to extract useful agricultural data. With the help of KDD and data mining we extract meaningful data sets from a gigantic amount of data. K-means clustering is used to group the given set of data; when applied to a large data set, this technique improves the quality of the mined data. We have applied this method to study the production and consumption of crops in various parts of India. The factors that affect crop production, such as soil type and weather, are taken into consideration. For graphical representation we have used a spatial join with the algorithm.

International Journal of Engineering Research and Technology (IJERT)

IJERT Journal

https://www.ijert.org/data-mining-in-agriculture-a-novel-approach
https://www.ijert.org/research/data-mining-in-agriculture-a-novel-approach-IJERTV9IS080107.pdf

Data mining is an approach through which, in a synchronized manner, we can find workable solutions that help increase growth. Farmers face many issues and difficulties due to improper understanding and implementation of the activities that would enhance their growth and productivity. A large amount of data is available for analysis and scrutiny; however, the portion related to the agriculture sector is small. Hence the segregation and processing of this data from its sources has to be done with a proper methodology. Places with multiple grain crops and differing soil structures make it complex to estimate crop yield accurately in both quantity and quality. Creating a close link between customer expectations and the production capabilities of the agriculture sector can be a win-win situation at both ends; this can be achieved by capturing data segment-wise and in a structured manner. The customer will then be able to fulfil his requirements as he wishes, rather than settling for what is offered to him. The application of such techniques enables us to predict and analyse various problems, and helps farmers make difficult farming decisions based on conditions, soil fertility, crop duration, disease and other important factors that can result in poor yield. The agrarian economy can get a boost and improve its finances by making use of such data mining techniques, and farmers can become self-reliant in their needs.

A Survey of Application of ML and Data Mining Techniques for Smart Irrigation System


Published on 3.4.2024 in Vol 26 (2024)

Public Discourse, User Reactions, and Conspiracy Theories on the X Platform About HIV Vaccines: Data Mining and Content Analysis


Data Mining Techniques in the Agricultural Sector

Conference paper
First Online: 01 December 2021
B. G. Mamatha Bai &
N. S. Rashmi

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 790))

801 Accesses

Data mining denotes discovering useful information from large volumes of data. It has useful applications in many sectors. In this work, we concentrate mainly on Data Mining Techniques in the Agricultural sector (DMTA). Agriculture is a fundamental human need, and in a nation like India the economy is greatly affected by the agricultural sector. The sector's success or failure depends on weather conditions and soil parameters. At present, farmers grow crops based on knowledge acquired from past generations. Because traditional farming techniques are practiced, crops are produced in excess or in shortage without meeting the real need. No scheme is in place to educate farmers, although a variety of new techniques is available to solve such issues. This paper presents the results obtained by analyzing the trends of the past 10 years using the DMTA model to forecast the optimal parameters required to obtain the highest production of Ragi, Groundnut and Paddy. The techniques used for analysis are Bisecting K-Means, DBSCAN, OPTICS, Hierarchical Complete Linkage and STING. All the districts of Karnataka and various parameters of the individual crops are considered.

Agriculture
Bisecting K-means
Data Mining
Hierarchical Complete Linkage

Acknowledgements

The authors express their sincere gratitude to Prof. N. R Shetty, Advisor, Dr. H C Nagaraj, Principal, Dr. Jharna Majumdar, Dean R&D, and Dr. Ramachandra A C, HoD, ECE, Nitte Meenakshi Institute of Technology, for their constant encouragement and support in carrying out research at NMIT. The authors also thank the Vision Group on Science and Technology (VGST), Government of Karnataka, for recognizing our research and providing financial support to set up the infrastructure required to carry it out.

Author information

Authors and affiliations.

Department of CSE, Nitte Meenakshi Institute of Technology, Bangalore, Karnataka, India

B. G. Mamatha Bai & N. S. Rashmi

Editor information

Editors and affiliations.

Nitte Meenakshi Institute of Technology, Bengaluru, Karnataka, India

N. R. Shetty

National Institute of Advanced Studies, Bengaluru, Karnataka, India

L. M. Patnaik

H. C. Nagaraj

Prasad N. Hamsavath

Algorithm: Bisecting K-means

Input: Dataset of distinct crops, K value, distance function.

Output: Grouping into clusters.

Step 1 : Choose a cluster to split.

Step 2 : Use the basic K-means algorithm to find 2 sub-clusters (the bisecting step).

Step 3 : Repeat Step 2 several times and keep the split that produces the clustering with the highest overall similarity.

Step 4 : Repeat Steps 1 to 3 until the required number of clusters is reached.
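
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the chapter's implementation: the choice of which cluster to split (the one with the largest sum of squared errors, SSE), the use of lowest SSE as a stand-in for "highest overall similarity", the `trials` parameter, and the sample points are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, rng, max_iter=50):
    """Basic K-means with K=2 (Step 2); returns two lists of points."""
    centers = rng.sample(points, 2)
    for _ in range(max_iter):
        groups = ([], [])
        for p in points:
            nearer_first = sq_dist(p, centers[0]) <= sq_dist(p, centers[1])
            groups[0 if nearer_first else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller discards it
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return groups

def bisecting_kmeans(points, k, trials=5, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        # Step 1 (assumed rule): split the cluster with the largest SSE.
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        target = max(candidates, key=sse)
        # Steps 2-3: bisect several times and keep the lowest-SSE split.
        best = None
        for _ in range(trials):
            left, right = two_means(target, rng)
            if not left or not right:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, left, right)
        if best is None:
            break  # no usable split found
        clusters.remove(target)
        clusters.extend([best[1], best[2]])
    return clusters  # Step 4: the loop ends once k clusters exist

# Hypothetical two-feature records, invented purely for illustration.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
clusters = bisecting_kmeans(points, k=3)
for cluster in clusters:
    print(sorted(cluster))
```

Other rules for Step 1, such as always splitting the largest cluster, are equally common; the one used here is only one reasonable option.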

Algorithm: Agglomerative Complete Linkage

Input : Dataset of distinct crops

Output : Clusters

Step 1 : Prepare the data by computing the (dis)similarity between every pair of objects in the data set.

Step 2 : Use the linkage function to group objects into a hierarchical cluster tree, based on the distances computed in Step 1.

Step 3 : Under the linkage function, objects or clusters in close proximity are merged. With complete linkage, the distance between two clusters is the largest distance between any pair of their members.

Step 4 : Decide where to cut the hierarchical tree into clusters. This produces a partition of the data.
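
As a rough illustration of these steps, the following plain-Python sketch merges clusters bottom-up using complete linkage. It is written under stated assumptions: Euclidean distance is used as the dissimilarity, the tree is "cut" simply by stopping when a requested number of clusters remains, and the function name and sample points are hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def complete_linkage(points, num_clusters):
    """Agglomerative clustering; returns clusters as lists of point indices."""
    n = len(points)
    # Step 1: dissimilarity between every pair of objects.
    d = {(i, j): dist(points[i], points[j])
         for i in range(n) for j in range(i + 1, n)}

    def cluster_distance(a, b):
        # Complete linkage: the largest distance between any two members.
        return max(d[(min(i, j), max(i, j))] for i in a for j in b)

    clusters = [[i] for i in range(n)]  # every object starts on its own
    # Steps 2-3: repeatedly merge the two closest clusters.
    # Step 4 (assumed cut rule): stop when num_clusters remain.
    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dxy = cluster_distance(clusters[x], clusters[y])
                if best is None or dxy < best[0]:
                    best = (dxy, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Hypothetical points, invented purely for illustration.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
result = complete_linkage(points, num_clusters=3)
print(result)  # [[0, 1], [2, 3], [4]]
```

Cutting at a distance threshold instead of a cluster count would fit Step 4 just as well; only the loop condition would change.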

Algorithm: STING

Input: Dataset of distinct crops

Output: Clusters

Step 1 : Determine a layer to begin with.

Step 2 : For each cell in this layer, calculate the confidence interval (or estimated range) of the probability that the cell is relevant to the query.

Step 3 : From the interval calculated above, label the cell as relevant or not relevant.

Step 4 : If this layer is the bottom layer, go to Step 6; otherwise, go to Step 5.

Step 5 : Go down the hierarchy by one level. Go to Step 2 for those cells that make up the relevant cells of the higher-level layer.

Step 6 : If the specification of the query is met, go to Step 8; otherwise, go to Step 7.

Step 7 : Retrieve the data that fall into the relevant cells and process them further. Return the results that meet the requirement of the query. Go to Step 9.

Step 8 : Find the regions of relevant cells. Return those regions that meet the requirement of the query. Go to Step 9.

Step 9 : Stop.
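
The top-down traversal in Steps 1 to 5 and Step 8 can be illustrated with a heavily simplified sketch. The assumptions are substantial: cells store only a point count rather than the fuller set of statistics a STING grid keeps, relevance is decided by an exact count threshold instead of a confidence interval, points are assumed to lie in the unit square, and the result is a list of bottom-layer cells rather than merged regions. All names and sample values are hypothetical.

```python
def build_layers(points, depth):
    """Build the grid hierarchy. layers[0] is the single root cell and
    layers[depth] is the bottom 2**depth x 2**depth grid; each layer
    maps (row, col) to the number of points inside that cell."""
    size = 2 ** depth
    bottom = {}
    for x, y in points:
        col = min(int(x * size), size - 1)  # clamp x == 1.0 into the last cell
        row = min(int(y * size), size - 1)
        bottom[(row, col)] = bottom.get((row, col), 0) + 1
    layers = [bottom]
    for _ in range(depth):
        parent = {}
        for (r, c), count in layers[0].items():
            key = (r // 2, c // 2)  # each parent covers a 2x2 block of children
            parent[key] = parent.get(key, 0) + count
        layers.insert(0, parent)
    return layers

def query_dense_cells(layers, min_count):
    """Return bottom-layer cells holding at least min_count points,
    visiting only the children of relevant higher-level cells."""
    # Steps 1-3: start at the top layer and label the relevant cells.
    relevant = [cell for cell, n in layers[0].items() if n >= min_count]
    # Steps 4-5: descend one level at a time through relevant cells only.
    for level in range(1, len(layers)):
        next_relevant = []
        for r, c in relevant:
            for dr in (0, 1):
                for dc in (0, 1):
                    child = (2 * r + dr, 2 * c + dc)
                    if layers[level].get(child, 0) >= min_count:
                        next_relevant.append(child)
        relevant = next_relevant
    # Step 8 (simplified): report the relevant bottom cells.
    return sorted(relevant)

# Hypothetical normalized (x, y) points, invented purely for illustration.
points = [(0.05, 0.05), (0.1, 0.1), (0.2, 0.15),
          (0.9, 0.9), (0.95, 0.85), (0.5, 0.5)]
layers = build_layers(points, depth=2)
dense = query_dense_cells(layers, min_count=2)
print(dense)  # [(0, 0), (3, 3)]
```

Pruning is safe in this simplified setting because a parent's count is the sum of its children's counts, so a parent below the threshold cannot contain a child that reaches it.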

Cite this paper.

Mamatha Bai, B.G., Rashmi, N.S. (2022). Data Mining Techniques in the Agricultural Sector. In: Shetty, N.R., Patnaik, L.M., Nagaraj, H.C., Hamsavath, P.N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Lecture Notes in Electrical Engineering, vol 790. Springer, Singapore. https://doi.org/10.1007/978-981-16-1342-5_7

DOI : https://doi.org/10.1007/978-981-16-1342-5_7

Published : 01 December 2021

Publisher Name : Springer, Singapore

Print ISBN : 978-981-16-1341-8

Online ISBN : 978-981-16-1342-5

eBook Packages : Engineering, Engineering (R0)

COMMENTS

A comprehensive review of Data Mining techniques in smart agriculture
Data Mining is expected to play an important role in Smart Agriculture for managing real-time data analysis with massive data. The aim of this paper is to review ongoing studies and research on smart agriculture using the recent practice of Data Mining, to solve a variety of agricultural problems. Previous.
Analysis of agriculture data using data mining techniques: application
This paper focuses on the analysis of the agriculture data and finding optimal parameters to maximize the crop production using data mining techniques like PAM, CLARA, DBSCAN and Multiple Linear Regression. ... The data has been taken from the "Bangladesh Agricultural Research Council (BARC)" for past 20 years with 7 attributes: "rainfall ...
(PDF) Data Mining for Smart Agriculture
Data mining enables farmers to identify potentially interesting and unknown patterns in large volume of datasets. This paper discusses about what are the techniques related to data mining and ...
Data Mining in Agriculture
Data mining is a process of extracting hidden information and knowledge that people do not know in advance and have potential utilization value from a large number of noisy, incomplete, fuzzy, and random data. It is the integration of multiple discipline involving database technology, artificial intelligence, mathematical statistics, machine learning, pattern recognition, high-performance ...
A review of the application of data mining techniques for decision
This paper provides a review of research on the application of data mining techniques for decision making in agriculture. The paper reports the application of a number of data mining techniques including artificial neural networks, Bayesian networks and support vector machines. The review has outlined a number of promising techniques that have been used to understand the relationships of ...
The Role of Innovative Data Mining Approaches for Analyzing and
In agriculture, data mining might aid in yield prediction, climate and rainfall forecasts, seed and soil conditions, and crop production. ... The aim of this paper is to define the role of "Data Mining Techniques" for estimating crop yield in an agricultural context among emerging nations. This research paper has considered mixed method technique ...
Data Mining in Smart Agriculture
Abstract. This paper aims to present the ways in which data from agriculture can be analyzed in order to make predictions that could contribute in decision-making processes. With the industrialization of agriculture, the amount of data collected through this environment has increased considerably. In this paper, data mining methodologies were ...
Data Mining in Agriculture
Students interested in a hands-on approach using MATLAB may also find the book useful due to the sample solutions provided." (R. Wan, Journal of the Operational Research Society, Vol. 61, 2010) "The book … presents in a comprehensive way most up-to-date data mining techniques and their application to problems from agriculture domain. …
DATA MINING TRENDS IN AGRICULTURE : A REVIEW
Data mining in agriculture is a relatively novel research field. Agriculture data are highly diversified in terms of nature, interdependency and use of resources for farming. The major problem of ...
Agriculture Analysis Using Data Mining And Machine Learning Techniques
Agriculture is an important application in India. Modern technologies can improve the situation of farmers and decision making in the agricultural field. Python is used as a front end for analysing the agricultural data set. Jupyter Notebook is the data mining tool used to predict the crop production. The parameter includes in the dataset are precipitation, temperature ...
Machine Learning for Smart Agriculture and Precision Farming ...
It can be thought of as a form of fusion of agriculture, making agricultural domain knowledge compatible with data mining research. 4.1.12 Scalability of Data Mining algorithms. Smart agriculture is responsible for generating enormous amounts of data as a result of all the various gadgets in use.
(Pdf) Analysis and Prediction in Agricultural Data Using Data Mining
Data Mining is emerging research field in Agriculture crop yield analysis. In this paper our focus is on the applications of Data Mining techniques in agricultural field. Different Data Mining techniques are in use, such as HM, K-Nearest Neighbor(KNN), Decision Tree(DT) and Support Vector Machines(SVM) in Agricultural Data as a tool for mining.
Computers and Electronics in Agriculture
Gandhi and Armstrong published a review paper on the application of data mining in the agricultural sector in general, dealing with decision making. They concluded that further research needs to be done to see how the implementation of data mining into complex agricultural datasets could be realized ( Gandhi and Armstrong, 2016a , Gandhi and ...
PDF A Survey on Data Mining Techniques in Agriculture
A Survey on Data Mining Techniques in Agriculture. Abstract--Data mining is a fast emerging and highly rising research oriented field in agriculture for formulating and analysing various conditions on crop yield. In this paper our focus is on studying and experimenting the applications of data mining techniques in agricultural field.
A Survey of Application of ML and Data Mining Techniques for Smart
This paper reviews our current research in agriculture analytics on an open-source platform using data mining and machine learning techniques. Various sensors are used to collect data that provides real-time analytics on the weather forecast, soil moisture, air temperature, PH, humidity. The smart irrigation system is paired with different hardware and development application. The science of ...
data mining techniques Latest Research Papers
The information gain, gain ratio, gini decrease, chi-square, and relieff are used to rank the features. This work comprises the introduction, literature review, and proposed methodology parts. In this research paper, a new method of analyzing skin disease has been proposed in which six different data mining techniques are used to develop an ...
Data Mining in Precision Agriculture: Management of Spatial ...
This paper gives a short overview of the available data, points out in detail the main issue with the classical learning approaches and presents a novel spatial cross-validation technique to overcome the problems with the classical approach towards the aforementioned yield prediction task. Keywords. Precision Agriculture; Spatial Data Mining ...
Agriculture
Farmers' participation in ecological tourism management in nature reserves is an important way to increase income. Based on 921 pieces of household survey data from 44 villages in six nature reserves in Liaoning Province, this paper uses multiple linear regression (OLS) and propensity score matching (PSM) to explore the impact of ecotourism on rural household income. The research results ...
Predictive machine learning in optimizing the performance of electric
The paper introduces an operation research model for optimizing the performance of EV Batteries. It also looks at challenges unique to battery systems and ways to overcome them. The study showcases ML models' ability to predict battery behavior for real-time monitoring, efficient energy use, and proactive maintenance.
Big data in agriculture: Between opportunity and solution
Still, the use cases may serve as demonstrative examples of the status of big data in the agricultural domain. CYBELE is a collaborative project, funded by a competitive grant by the EU Horizon 2020 programme. Therefore, the selected use cases are at the forefront of state of the art in big data technology and the agriculture domain in Europe.
A survey of data mining techniques applied to agriculture
This survey provides a brief overview of data mining techniques applied to agriculture. Data mining and their applications to different research areas are a fertile research field. Different techniques have been proposed for mining data over the years. The 10 most used data mining techniques are discussed in a recent paper (Wu et al. 2008 ).
Journal of Medical Internet Research
Background: The initiation of clinical trials for messenger RNA (mRNA) HIV vaccines in early 2022 revived public discussion on HIV vaccines after 3 decades of unsuccessful research. These trials followed the success of mRNA technology in COVID-19 vaccines but unfolded amid intense vaccine debates during the COVID-19 pandemic. It is crucial to gain insights into public discourse and reactions ...
Mapping smart farming: Addressing agricultural challenges in data
The research and applications in SF adhere to the basic agricultural IoT architecture. Fig. 1 illustrates various agricultural applications, including monitoring, precision irrigation, fertilization, and prediction services facilitated by a fundamental three-layer IoT conceptual framework. The perception layer includes agricultural sensing devices, actuators, and controllers for data ...
Data Mining Techniques in the Agricultural Sector
Data mining denotes discovering the useful information from large volume of data. It has useful areas of implementation in many sectors. In this work, we are mainly concentrating on Data Mining Techniques in the Agricultural sector (DMTA). Agriculture is a fundamental human need. The economy is greatly affected by the Agricultural Sector in a ...
