Identify and describe some approaches to Data Mining and Analytics.
Data mining helps find new patterns and relationships in data. For predictive modeling and complex data mining, analytics tools such as spreadsheets with statistical functions are used. It opens a new and exciting way for a company to do business by letting it dig into its information and reach new, potentially transformative conclusions (O’Brien & Marakas, 2013). Data analytics, on the other hand, draws conclusions by examining and analyzing raw data: it scans and filters the data to answer a question (Duan & Xiong, 2015). It helps a business make informed decisions and, for example, gives science-centered industries a platform to validate current theories.
Tools such as regression, classification, decision trees, neural networks, and market basket analysis are used as approaches to data mining (Mikut & Reischl, 2011). Regression predicts a continuous value, often visualized with a scatter plot, while classification assigns data to discrete categories. When buying a house, for example, predicting the price from the location and size is regression, whereas labeling the neighborhood as high- or low-crime, or as walkable or not, is classification. A decision tree, on the other hand, lays out the possible outcomes of each statement as branches, with each branch depicting a different situation and possibly branching further. A neural network, inspired by biological neural networks, determines the class using weighted combinations of attributes. Finally, market basket analysis is one of the most commonly used approaches in data mining. It determines which products a customer purchases together with another product. This can then be used for cross-selling, product placement and affinity promotion to increase sales.
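To make the regression versus classification distinction concrete, here is a minimal sketch in plain Python. The house sizes, prices, neighbourhood figures and labels are all invented for illustration, and the one-nearest-neighbour rule is just one simple classifier chosen for brevity; it is not taken from the cited sources.

```python
# Hypothetical toy data: house size (square metres) and sale price (thousands).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Regression: the output is a continuous value (a price).
intercept, slope = fit_line(sizes, prices)
predicted_price = intercept + slope * 90

# Hypothetical labelled neighbourhoods: (crime rate per 1000, walk score).
labelled = [
    ((5, 90), "desirable"),
    ((8, 80), "desirable"),
    ((40, 30), "avoid"),
    ((35, 20), "avoid"),
]

def classify(point):
    """Classification: return the label of the closest labelled example."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    nearest = min(labelled, key=lambda item: squared_distance(item[0], point))
    return nearest[1]

# Classification: the output is a discrete category.
label = classify((10, 70))
print(predicted_price, label)
```

The regression returns a number on a continuous scale, while the classifier can only ever return one of the predefined labels.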
A company can use several tools to gather data, group it and visualize it as part of data analytics. One is import.io, which grabs information from different websites. If you are looking for information about mobile phones, the application takes that keyword and pulls the data relevant to you. Another tool is NodeXL, which visualizes networks and relationships while also providing exact calculations. If you want to find people talking about your product on Twitter, you can feed the tool the keyword and it will provide a graph of the people discussing it. Google search operators are another approach. They filter search results down to those that are relevant and important to the organization; for example, you can restrict a search to results from the last year that include the phrase “monthly report”. Google Fusion Tables also allows data to be visualized: people or a company can gather all their data, visualize it and even share it through the platform. Finally, OpenRefine can be thought of as housecleaning software. It checks spelling, spacing and other errors by grouping similar entries and making them ready for analysis. Suppose two people produce differently formatted reports on the same thing, one capitalizing each word and the other leaving extra spaces between values. With OpenRefine, a company can make the formatting consistent by grouping the matching entries.
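The grouping idea described for OpenRefine can be sketched in a few lines of Python. This is only an illustration of the general "normalize, then group by the normalized key" idea, not OpenRefine's actual code; the function names and the sample entries are hypothetical.

```python
import string
from collections import defaultdict

def fingerprint(value):
    """Build a normalized key: trim, lowercase, drop punctuation, sort unique words."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)

def group_variants(values):
    """Group raw entries that share the same normalized key."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)

# Hypothetical entries typed by two different people.
entries = ["Monthly Report", "monthly  report", " Report, Monthly", "Annual Report"]
groups = group_variants(entries)
for key, variants in groups.items():
    print(key, "<-", variants)
```

Entries that differ only in capitalization, spacing, punctuation or word order end up in the same group, which an analyst could then merge into one consistent spelling.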
Duan, L., & Xiong, Y. (2015). Big data analytics and business analytics. Journal of Management Analytics, 2(1), 1-12.
Mikut, R., & Reischl, M. (2011). Data mining tools. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(5), 431-443.
O’Brien, J., & Marakas, G. (2013). Introduction to information systems (6th ed.). New York: McGraw-Hill.
Data mining is the extraction of hidden predictive information from large databases. It is a powerful new technology with great potential to help banks and financial institutions focus on the most important information in their data warehouses (Pei, Han, & Lakshmanan, 2001). Data mining tools predict future trends and behaviors, allowing banks and financial institutions to make proactive, knowledge-driven decisions. They also answer business questions that traditionally were too time-consuming to resolve, and they search the data for hidden patterns. Most corporate houses, business houses, commercial banks and financial institutions already collect and refine large quantities of data, which prepares the ground for implementing a data mining solution. Nowadays these techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online (Hong & Mozetic, 2001).
Data mining is supported by three technologies:
Massive data collection
Fast, high-storage, powerful multiprocessor computers
Powerful data mining algorithms
The size of a bank or financial institution’s databases generally depends on the kind of activities it carries out. However, a typical bank or financial institution engaged in retail activities may have databases in the petabyte range. The accompanying need for improved computational engines can now be met cost-effectively with parallel multiprocessor computer technology. The last component is the data mining algorithms, which cull information out of the large mass of raw data residing in the databases (Hecht-Nielsen, 2001).
The most commonly used approaches and techniques in data mining (Hosking & Pednault, 2007) are:
Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure. Using machine learning and deep learning, the network is trained on existing data and then organizes and analyzes new data in the same way.
Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi-Square Automatic Interaction Detection (CHAID). Other algorithms, such as heuristic search algorithms (for example A*), are also available for data mining.
Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
Rule induction: The extraction of useful if-then rules from data based on statistical significance. It is a rule-based approach in which the data is analyzed step by step, one rule at a time.
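As a rough illustration of how a decision tree method such as CART can turn data into an if-then rule, the sketch below picks the split threshold that minimizes weighted Gini impurity on a single numeric attribute. The income figures and risk labels are invented, and a real CART implementation considers many attributes and splits recursively; this shows only one split.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def best_threshold(values, labels):
    """Try midpoints between distinct values; return (threshold, weighted impurity)."""
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical loan applicants: annual income (thousands) and observed credit risk.
incomes = [20, 25, 30, 60, 70, 80]
risk = ["high", "high", "high", "low", "low", "low"]

threshold, impurity = best_threshold(incomes, risk)
print(f"IF income <= {threshold} THEN risk = high ELSE risk = low")
```

The chosen split reads directly as an if-then rule, which is the link between decision trees and the rule induction technique listed above.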
Generally, these techniques have been in use for more than a decade in specialized analysis tools that work with relatively large volumes of data. These capabilities are now evolving to integrate directly with industry-standard data warehouse and OLAP platforms. For data volumes as large as those held on the servers of Facebook, Google, Amazon and Alibaba, OLAP (online analytical processing) techniques are typically used alongside data warehousing and data mining.
Hecht-Nielsen, R. (2001). Neurocomputing and data mining. MA: Addison-Wesley.
Hong, J., & Mozetic, I. (2001). Incremental learning of attribute-based descriptions from examples, the method and user’s guide. In Report ISG 85-5 UIUCDCS-F Department of Computer Science, University of Illinois.
Hosking, J., & Pednault, E. (2007). A statistical perspective on data mining. Future Generation Computer Systems.
Pei, J., Han, J., & Lakshmanan, S. (2001). Mining frequent itemsets with convertible constraints. In Proc. Int. Conf. Data Engineering (ICDE’01).
After the collection and classification of data through business intelligence, data mining is the next step to unlock patterns and make decisions. These patterns are identified using statistical tools and in most cases predict future possibilities based on past information (Wallace, 2015, p. 202).
For example, I once used a free program called MetaTrader 4, which can be used for data mining on stocks. You copy the date, opening price, closing price, daily high, daily low and volume traded for a particular company’s shares from the Nepal share market and paste the data into Excel in a pre-specified format. This Excel file is imported into the software, which presents the data as a candlestick chart showing how the stock has risen and fallen over the months. This is used to predict a future fall or rise of the stock. Other functions, such as moving averages and the accelerator oscillator, help you judge whether the stock price will rise.
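The moving average mentioned above is simple enough to sketch by hand. The following is a generic simple moving average in Python, not MetaTrader 4's own implementation; the closing prices are made up, and comparing the latest close with the average is only a rough trend signal, not a reliable prediction.

```python
def simple_moving_average(prices, window):
    """Average of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]

# Hypothetical daily closing prices for one share.
closes = [100, 102, 104, 103, 107, 110]
sma3 = simple_moving_average(closes, 3)

# A close above its moving average is often read as a sign of an upward trend.
trending_up = closes[-1] > sma3[-1]
print(sma3, trending_up)
```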
"Data mining supports data analysis software, and the process consists of five major operational phases: extracting, transforming and loading transactional data into the data warehouse system; storing and managing it; allowing data access; analyzing data; and presenting data in a visual form. (Mraovic, 2008)”
Another data mining tool is https://www.uclassify.com, which offers analysis options for sentiment, gender, text language, topic, age, mood and so on. You can click on any of these as needed. Take gender analysis as an example. After you click on it, there is a box where you can enter your text or the URL of the site you wish to analyze. A note above the box says that this analyzer tries to work out whether a text was written by a male or a female and is based on 11,000 blogs (5,500 written by females and 5,500 by males). I went to the Facebook page of Dalle restaurant and fed its URL into the analyzer. The following was displayed.
Next I did the same with the sentiment analyzer, which shows the positive and negative comments from the page.
From this we can see that the majority of visitors to the restaurant’s page are female and that they are not happy with Dalle’s service. Dalle needs to address these concerns and make itself more appealing to female visitors, who seem to make up most of its customers.
These are some of the ways in which you can use data mining to understand a business. There are other free sites you can use to analyze and mine data, and they provide useful indicators for making decisions.
Mraovic, B. (2008). Relevance of data mining for accounting: social implications. Social Responsibility Journal, 439-455.
Wallace, P. (2015). Information Systems in Action. In P. Wallace, Introduction to Information Systems (pp. 4-9). New Jersey: Pearson Education, Inc.
Data mining and data analytics go hand in hand: data collected and stored from various sources, such as big data, transactional databases, data warehouses, and other internal and external sources, is mined using analytic techniques to identify patterns and support decision making.
“Identifying irrelevant data from databases is a significant task” (Deepti Mishra, 2014). Thus, data mining is a systematic process of arranging raw data and recognizing hidden patterns in large data sets using mathematical and computational algorithms. Data analytics, on the other hand, is the process of extracting facts from information to answer specific questions. For example, ML-Flex, WEKA, Text Analysis, Orange, etc. are open source data mining tools that help structure data and find its hidden patterns so that it can be put to use. Walmart uses data mining to find sales trends, improve marketing campaigns, and find patterns that can be used to recommend products to customers based on their current and past buying behaviour.
Some of the approaches to Data mining and Analytics are explained below.
1. Cluster analysis
Cluster analysis divides data into groups. Data mining helps identify clusters of customers with similar age, gender, geography, etc. and segments the database accordingly. After the database is segmented, the marketing company can use the segments to set its marketing targets. For example, Johnson promotes its products for babies, while Pantene targets young women who want smooth and healthy hair.
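A minimal sketch of how such groups can be found automatically is given below, using a one-dimensional k-means on customer ages. The ages and the starting centers are invented, and k-means is only one of several clustering methods; real segmentation would use more attributes than age.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small k-means: assign each value to its nearest center, then re-average."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical ages of the people products were bought for.
ages = [1, 2, 3, 22, 25, 27]
centers, clusters = kmeans_1d(ages, centers=[0, 30])
print(centers, clusters)
```

With this toy data the ages separate into a baby segment and a young-adult segment, which mirrors the two marketing targets in the example above.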
2. Regression analysis
In this approach, we examine how a change in one factor affects another. A change in buyers’ purchasing patterns, for example, can affect the marketing of a product. Hence, this analysis helps forecast changes in customers’ buying behaviour, habits and satisfaction levels so that companies can adjust their advertising campaign costs.
3. Association mining
This is a traditional approach used to discover links within large volumes of product sales activity. According to Bansal (2017), “It’s a rule-based ML method for discovering interesting relations between variables in large databases.” This approach identifies relationships between seemingly unrelated data in a relational database. For example, if a customer buys bread, he is likely to also purchase another item, such as chicken, with it.
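Association rules are usually judged by their support and confidence. The sketch below computes both for the rule "bread implies chicken" over a handful of invented transactions; the function names are illustrative and not taken from any particular library.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also hold the consequent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Hypothetical shopping baskets.
transactions = [
    {"bread", "chicken"},
    {"bread", "chicken", "milk"},
    {"bread", "milk"},
    {"milk"},
]

bread_support = support(transactions, {"bread"})
rule_confidence = confidence(transactions, {"bread"}, {"chicken"})
print(bread_support, rule_confidence)
```

Here bread appears in three of four baskets, and two of those three also contain chicken, so the rule holds for about two thirds of bread buyers.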
4. Decision Tree
In this approach, the data are arranged in a tree structure of parents and children. Each parent represents a class and its children represent the data that fall within that class. Decision tree analysis is a suitable computer tool for organizing the various decision choices and presenting each in detail with its costs and benefits, so that management can choose the decisions most favourable to the company.
5. Markov Model
This approach is often regarded as a strong tool for identifying patterns in prediction-based applications. It is said to give results with higher accuracy and robustness than the other approaches because it works with both structured and unstructured data. This model is useful for identifying patterns across a series of data that are then used in decision-making.
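One simple form of this idea is a first-order Markov chain, in which the next state is predicted from the current state alone. The sketch below estimates transition probabilities from an observed sequence; the page-visit sequence and function names are hypothetical, and more elaborate variants such as hidden Markov models are not covered here.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next state | current state) from consecutive pairs in the sequence."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }

def most_likely_next(probabilities, state):
    """Return the most probable state to follow `state`."""
    return max(probabilities[state], key=probabilities[state].get)

# Hypothetical sequence of pages visited on a shopping site.
visits = ["home", "product", "cart", "home", "product", "product", "cart"]
probs = transition_probabilities(visits)
print(probs["product"], most_likely_next(probs, "product"))
```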
6. Data warehouse
Most data warehouses depend on relational databases, so organizations are adopting Hadoop and NoSQL to handle less structured data. For example, Sears saves time by loading terabytes of data into Hadoop, skipping the ETL process used for the data warehouse.
7. Online analytical processing (OLAP)
This approach allows users to extract and recover useful information from data after observing and analyzing it from various perspectives and drilling down into specific groups. “The software allows users to ‘slice and dice’ massive amounts of data stored in data warehouses to reveal significant patterns and trends” (Wallace, 2013).
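To give a feel for what "slice and dice" means in practice, here is a small Python sketch over an invented sales table. Real OLAP servers work on precomputed multidimensional cubes; this version simply filters and totals a list of records, and the helper names are illustrative.

```python
from collections import defaultdict

# Hypothetical fact table: one record per sale.
sales = [
    {"region": "East", "product": "soap", "quarter": "Q1", "amount": 100},
    {"region": "East", "product": "shampoo", "quarter": "Q1", "amount": 150},
    {"region": "West", "product": "soap", "quarter": "Q1", "amount": 80},
    {"region": "West", "product": "soap", "quarter": "Q2", "amount": 120},
]

def slice_rows(rows, **fixed):
    """Slice: keep only rows whose dimensions match the fixed values."""
    return [row for row in rows if all(row[key] == value for key, value in fixed.items())]

def roll_up(rows, dimension):
    """Roll up: total the sales amount for each value of one dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["amount"]
    return dict(totals)

# Look at the same data from two perspectives.
by_region_q1 = roll_up(slice_rows(sales, quarter="Q1"), "region")
by_product = roll_up(sales, "product")
print(by_region_q1, by_product)
```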
In this way, these business tools are becoming extremely useful in helping organizations make decisions, providing the correct information at the right time along with innovative ways to visualize data in graphs, pie charts and tables through the use of colour, shapes, 3D views, etc.
Bansal, M. (2017, March 28). Association rule mining deciphered. Retrieved from Medium: https://medium.com/data-science-group-iitr/association-rule-mining-deciphered-d818f1215b06
Deepti Mishra, D. S. (2014). A Comprehensive Overview of Data Mining: Approaches and Applications. International Journal of Computer Science and Information Technologies, 5(6), 7814-7816.
Wallace, P. (2013). Introduction to Information System, Second Edition. New Jersey: Pearson Education Inc.
An organization’s database contains an enormous amount of data, but the data is of no use if we fail to use it effectively for the benefit of the organization. To make business decisions, we need to look carefully through the available data, identify trends, discover patterns, establish meaningful relationships and find hidden correlations between different variables. Data mining facilitates this. It is the process of transforming data into meaningful insights after properly analyzing it (Wallace, 2015).
Some approaches to data mining are:
Market Basket Analysis: Market basket analysis is used to identify associations between the different items people buy. It tries to find combinations of items that occur frequently in a transaction, based on the assumption that people are likely to buy a certain set of items along with the items they have already chosen. For example, people who buy toothpaste are more likely to buy a toothbrush. Similarly, people buying notebooks and pencils will also buy erasers and sharpeners.
Decision Tree: In this approach, we try to analyze all the possible alternatives related to a decision. A decision tree starts with a question or problem that has multiple answers or solutions, and each answer or solution leads to a new set of questions or conditions that affect our decision (Zentut, n.d.). For example, if the profit margin of a company is decreasing, it might be due to a decrease in sales or an increase in production costs. If the reason is a decrease in sales, it might be due to a change in weather, the introduction of a substitute product or a decrease in outlets. Similarly, if profit has decreased due to an increase in production costs, it may be because of an increase in the cost of raw materials or higher labor wages. Each of these conditions also has multiple possible reasons. We keep building possible scenarios until we reach a conclusion on which to base a decision.
Clustering: In clustering, we group data together based on their similarities. For example, we may produce different products for different age groups, as the needs and preferences of people in the same group will be similar (Alton, 2017).
In addition to these, regression, classification, association, outlier detection, sequential patterns and prediction are also used for data mining.
Data analytics is the set of skills, practices, technologies and applications used by decision makers to examine the available data and draw conclusions from it. In data analytics, we look at past performance to gain insight for future planning. Data mining helps find the relationships between variables, while data analytics tries to find the reasons for certain events in order to make strategic decisions. It is used for fact-based decision making, quantitative analysis, exploratory modelling and prediction (O’Brien & Marakas, 2013).
The approaches for data analytics are:
What-if Analysis: In what-if analysis, we analyze the effect on other variables of a change in one variable. For example, if we are currently selling a product at Rs.20 and decide to change its price to Rs.25, this will affect our sales, income, tax and profit. The what-if analysis asks what happens if the price is increased by Rs.5, and we analyze the data to find the effect of the price hike.
Sensitivity Analysis: Sensitivity analysis is a special case of what-if analysis in which one independent variable is changed repeatedly and the resulting changes in the dependent variables are observed. For example: the sensitivity of market price to changes in the interest rate.
Goal Seeking Analysis: In goal seeking analysis, we set a target and work out whether and how the business activities can contribute to attaining it. It requires backward planning. For example, if the business has a target of increasing profit by 10%, all activities should be planned so that they contribute to that target, and this method works backward from the target to find what is needed to reach it.
Optimization Analysis: This is an extension of goal seeking analysis in which the goal of the organization is to obtain the best possible value instead of a pre-determined value.
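The four analyses above can be illustrated with one small profit model in Python. Everything numeric here is an assumption made up for the sketch: a unit cost of Rs.12, 1000 units sold at Rs.20, and 50 fewer units sold for each rupee of price increase. Only the Rs.20 and Rs.25 prices and the 10% profit target come from the text.

```python
def profit(price, unit_cost=12.0, base_units=1000.0, base_price=20.0, units_lost_per_rupee=50.0):
    """Hypothetical model: demand falls linearly as the price rises above the base price."""
    units = base_units - units_lost_per_rupee * (price - base_price)
    return (price - unit_cost) * units

# What-if analysis: what happens to profit if the price goes from Rs.20 to Rs.25?
baseline = profit(20)
what_if = profit(25)

# Sensitivity analysis: how much does profit move per rupee of price change near Rs.20?
sensitivity = profit(21) - profit(20)

def goal_seek(target, low, high, tolerance=1e-6):
    """Goal seeking by bisection; assumes profit rises with price between low and high."""
    while high - low > tolerance:
        middle = (low + high) / 2
        if profit(middle) < target:
            low = middle
        else:
            high = middle
    return (low + high) / 2

# Goal seeking analysis: which price gives a 10% higher profit than the baseline?
price_for_target = goal_seek(baseline * 1.10, 20, 25)

# Optimization analysis: which whole-rupee price gives the highest profit?
best_price = max(range(12, 41), key=profit)

print(baseline, what_if, sensitivity, round(price_for_target, 2), best_price)
```

What-if and sensitivity analysis run the model forward from a chosen input, goal seeking runs it backward from a chosen output, and optimization searches for the best output without fixing a target in advance.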
To conclude, as technology advances, the amount of data in organizations is also increasing. We therefore need to use data mining and data analytics to extract the relevant information that will help us make the best possible decision in a given scenario.
Alton, L. (2017, December 22). The 7 Most Important Data Mining Techniques. Retrieved from Data Science Central: https://www.datasciencecentral.com/profiles/blogs/the-7-most-important-data-mining-techniques
O’Brien, J. A., & Marakas, G. M. (2013). Introduction to Information Systems (16th ed.). McGraw-Hill/Irwin.
Wallace, P. (2015). Introduction to Information Systems (Second ed.). New Jersey: Pearson Education Inc.
Zentut. (n.d.). Data Mining Techniques. Retrieved from Zentut: http://www.zentut.com/data-mining/data-mining-techniques/
O’Brien & Marakas (2006), in their book Introduction to information systems, describe data mining approaches as statistical techniques used to predict future behaviour, especially to unlock the value of business intelligence for strategy. Data in data warehouses are analyzed to reveal hidden patterns and trends, for example through market-basket analysis, in order to identify new product bundles, find the root cause of quality or manufacturing problems, prevent customer attrition, acquire new customers, cross-sell to existing customers, and profile customers with more accuracy.
According to Kantardzic (2011), the goals of prediction and description are achieved by using data mining techniques for the following primary data-mining tasks:
Classification: Discovery of a predictive learning function that classifies a data item into one of several predefined classes. Here, it assigns items in a collection to target categories. For instance, when a bank customer asks for a loan, their profile should be analyzed first, and the manager identifies loan applicants as low, medium, or high credit risks.
Regression: Discovery of a predictive learning function that maps a data item to a real value prediction variable. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques.
Clustering: A common descriptive task in which one seeks to identify a finite set of categories or clusters to describe the data. This analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing.
Summarization: An additional descriptive task that involves methods for finding a compact description for a set (or subset) of data. Excel is a useful tool for summarizing data, and its formulas can help interpret the relationships in it. Summarization is applied in data analysis, data visualization and automated report generation.
Dependency Modeling: Finding a local model that describes significant dependencies between variables or between the values of a feature in a data set or in a part of a data set. Retailers use dependency modeling to analyze consumer behaviour such as purchasing habits.
Change and Deviation Detection: Discovering the most significant changes in the data set relative to previously measured or normative values.
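As an illustration of the last task, change and deviation detection, the sketch below flags new measurements that lie far from previously measured values using a z-score. The daily sales figures and the threshold of three standard deviations are assumptions chosen for the example, and a z-score is only one of many possible deviation measures.

```python
from statistics import mean, stdev

def flag_deviations(baseline, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` standard deviations from the baseline mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return [value for value in new_values if abs(value - center) / spread > threshold]

# Hypothetical daily sales: previously measured values, then new observations.
previous_sales = [100, 102, 98, 101, 99]
new_sales = [100, 103, 150]

flagged = flag_deviations(previous_sales, new_sales)
print(flagged)
```

Only the value that departs sharply from the previously measured norm is reported, while ordinary day-to-day variation is ignored.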
Kantardzic, M. (2011). Data mining: concepts, models, methods, and algorithms. John Wiley & Sons.
O’Brien, J. A., & Marakas, G. M. (2006). Management information systems (Vol. 6). McGraw-Hill Irwin.