
Saturday, March 9, 2019

Data mining

This is an accounting calculation, followed by the application of a threshold. However, predicting the profitability of a new customer would be data mining.

Dividing the customers of a company according to their profitability.
Yes, this is a data mining task because it requires data analysis to determine which customers bring the most business to the company.

Computing the total sales of the company.
No, this is not a data mining task because no analysis is involved; this information can be pulled out of any bookkeeping program.

Sorting a student database based on student ID numbers.
No, this is not a data mining activity because sorting by ID number does not involve any data mining task. It is a simple database query.

Predicting the future stock price of a company using historical records.
Yes. We would attempt to create a model that can predict the continuous value of the stock price. This is an example of the area of data mining known as predictive modeling. We could use regression for this modeling, although researchers in many fields have developed a wide variety of techniques for predicting time series.

Monitoring the heart rate of a patient for abnormalities.
Yes. We would build a model of the normal behavior of the heart rate and raise an alarm when unusual heart behavior occurred. This would involve the area of data mining known as anomaly detection. It could also be considered a classification problem if we had examples of both normal and abnormal heart behavior.

For each of the following, identify the relevant data mining task(s).
The Boston Celtics would like to approximate how many points their next opponent will score against them.
A military intelligence officer is interested in learning about the respective proportions of Sunnis and Shiites in a particular strategic region.
A NORAD defense computer must decide immediately whether a blip on the radar is a flock of geese or an incoming nuclear missile.
A political strategist is seeking the best groups to canvass for donations in a particular county.
A homeland security official would like to determine whether a certain sequence of financial and residence moves implies a tendency to terrorist acts.
A Wall Street analyst has been asked to find out the expected change in stock price for a set of companies with similar price/earnings ratios.

Question 3
For each of the following meetings, explain which phase in the CRISP-DM process is represented.

Managers want to know by next week whether deployment will take place.
Therefore, analysts meet to discuss how useful and accurate their model is.
This is the Evaluation phase in the CRISP-DM process. In the Evaluation phase, the data mining analysts determine whether the model and technique used meet the business objectives established in the first phase.

The data mining project manager meets with the data warehousing manager to discuss how the data will be collected.
This is the Data Understanding phase in the CRISP-DM process. The data warehouse is identified as a resource during the Business Understanding phase; however, the actual data collection takes place during the Data Understanding phase. In this phase, data is collected and accessed from the resources listed and identified in the Business Understanding phase.

The data mining consultant meets with the vice president for marketing, who says that he would like to move forward with customer relationship management.
This is the Business Understanding phase, in which the main business objectives are reviewed. After the meeting, it seems that the data mining consultant succeeded in convincing the VP of marketing to approve data mining on the customer relationship management system.

The data mining project manager meets with the production line supervisor to discuss implementation of changes and improvements.
This discussion, about whether specific improvements or process changes are required to ensure that all important aspects of the business are accounted for, is performed under the Evaluation phase. The meeting is held with the business objective of collecting and cleansing the data to ensure data quality.

The analysts meet to discuss whether the neural network or decision tree model should be applied.

Question 4 (10 points)
Describe the possible negative effects of proceeding directly to mine data that has not been preprocessed.

Before data mining algorithms can be used, a target data set must be assembled.
As data mining can only uncover patterns actually present in the data, the target data set must be large enough to contain these patterns while remaining concise enough to be mined within an acceptable time limit. A common source for data is a data mart or data warehouse. Preprocessing is essential to analyze the multivariate data sets before data mining. The target set is then cleaned.

Question 5 (15 points)
Which of the three methods for handling missing values do you prefer? Which method is the most conservative and probably the safest, meaning that it fabricates the least amount of data? What are some drawbacks to this method?

Methods for replacing missing field values with:
User-defined constants
Means or modes
Random draws from the distribution of the variable

Question 6
Describe the differences between the training set, test set, and validation set.

The training set is used to build the model. It contains a set of data with preclassified target and predictor variables. Typically, a hold-out dataset or test set is used to evaluate how well the model does with data outside the training set. The test set also contains the preclassified results, but they are withheld while the test set data is run through the model; only at the end are the preclassified data compared against the model results. The model is then adjusted to minimize error on the test set. Another hold-out dataset, the validation set, is used to evaluate the adjusted model in the same way: the validation set data is run against the adjusted model and the results are compared to the unused preclassified data.

Terminology varies between texts. Under the other common convention, the training set (seen data) is used to build the model (determine its parameters) and the test set (unseen data) is used to measure its performance (holding the parameters constant). Sometimes we also need a validation set to tune the model (e.g., for pruning a decision tree).
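
As a minimal sketch of the three roles described above, the following Python snippet shuffles a list of records and partitions it into training, validation, and test sets. The function name `three_way_split`, the 60/20/20 proportions, and the fixed seed are assumptions chosen for illustration; none of them come from the question. The naming follows the convention in which the validation set is used for tuning and the test set is held back as unseen data.

```python
import random

def three_way_split(records, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle records and split them into (train, validation, test) lists.

    The fractions and the seed are illustrative defaults, not prescribed values.
    """
    if train_frac <= 0 or valid_frac < 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]                   # used to fit the model
    valid = shuffled[n_train:n_train + n_valid]  # used to tune it (e.g. pruning)
    test = shuffled[n_train + n_valid:]          # touched once, for the final score
    return train, valid, test

if __name__ == "__main__":
    train, valid, test = three_way_split(range(100))
    print(len(train), len(valid), len(test))
```

Because the three lists are disjoint slices of one shuffled copy, no record can appear in more than one set, which is what keeps the test data unseen.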
The validation set cannot also be used for testing, because it is no longer unseen data.

Data Mining

Determine the benefits of data mining to businesses when employing:

1. Predictive analytics to understand the behavior of customers

Predictive analytics is business intelligence technology that produces a predictive score for each customer or other organizational element. Assigning these predictive scores is the job of a predictive model, which has, in turn, been trained over your data, learning from the experience of your organization. Predictive analytics optimizes marketing campaigns and website behavior to increase customer responses, conversions, and clicks, and to decrease churn. Each customer's predictive score informs the actions to be taken with that customer.

2. Associations discovery in products sold to customers

The way in which companies interact with their customers has changed dramatically over the past few years. A customer's continuing business is no longer guaranteed. As a result, companies have found that they need to understand their customers better and to respond quickly to their wants and needs. In addition, the time frame in which these responses need to be made has been shrinking. It is no longer possible to wait until the signs of customer dissatisfaction are obvious before taking action. To succeed, companies must be proactive and anticipate what a customer desires.

For example, in the old days, storekeepers would simply keep track of all of their customers in their heads and would know what to do when a customer walked into the store. Today, store associates face a much more complex situation: more customers, more products, more competitors, and less time to react mean that understanding your customers is now much harder to do. A number of forces are working together to increase the complexity of customer relationships, such as compressed marketing cycles, increased marketing costs, and a stream of new product offers.

There are many kinds of models, such as linear formulas and business rules. For each kind of model, there are the weights, rules, or other mechanisms that determine precisely how the predictors are combined. In fact, there are so many choices that it is practically impossible for a person to try them all and find the best one. Predictive analytics is data mining technology that uses the company's customer data to automatically build a predictive model specialized for the business. This process learns from the organization's collective experience by leveraging the existing logs of customer purchases, behavior, and demographics. The wisdom gained is encoded as the predictive model itself. Predictive modeling software has computer science at its core, undertaking a mixture of number crunching, trial, and error.

3. Web mining to discover business intelligence from Web customers

Fast business growth has placed both the business community and customers in a new situation. Because of intense competition on the one hand and the customer's option to choose from a number of alternatives on the other, the business community has realized the necessity of intelligent marketing strategies and relationship management. Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the Web access logs can help in understanding user behavior and the structure of the web site.

From the business and applications point of view, knowledge obtained from web usage patterns could be directly applied to efficiently manage activities related to e-business, e-services, and e-education. Accurate web usage information could help to attract new customers, retain current customers, improve cross-marketing and sales, improve the effectiveness of promotional campaigns, and track departing customers.
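
To make the idea of analyzing Web access logs more concrete, here is a small sketch that counts page requests per visitor. It assumes log lines in the widely used Common Log Format; the sample lines, the function name `requests_per_visitor`, and the decision to identify a visitor by client address are all illustrative assumptions rather than anything specified in the text.

```python
import re
from collections import Counter, defaultdict

# Common Log Format: host ident authuser [date] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def requests_per_visitor(lines):
    """Return {host: Counter({path: hits})} for successfully served requests."""
    usage = defaultdict(Counter)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if not match.group("status").startswith("2"):
            continue  # ignore redirects and errors
        usage[match.group("host")][match.group("path")] += 1
    return usage

# Hypothetical log excerpt, made up for this example.
SAMPLE_LOG = [
    '10.0.0.1 - - [09/Mar/2019:10:00:00 +0000] "GET /products HTTP/1.1" 200 512',
    '10.0.0.1 - - [09/Mar/2019:10:00:05 +0000] "GET /products HTTP/1.1" 200 512',
    '10.0.0.1 - - [09/Mar/2019:10:01:00 +0000] "GET /checkout HTTP/1.1" 200 128',
    '10.0.0.2 - - [09/Mar/2019:10:02:00 +0000] "GET /missing HTTP/1.1" 404 0',
    'not a log line',
]

if __name__ == "__main__":
    for host, pages in requests_per_visitor(SAMPLE_LOG).items():
        print(host, pages.most_common())
```

Per-visitor page counts like these are only a starting point; a real usage study would typically also group requests into sessions and account for shared addresses.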

The usage information can be exploited to improve the performance of Web servers by developing proper prefetching and caching strategies so as to reduce the server response time. User profiles could be built by combining users' navigation paths with other data features, such as page viewing time, hyperlink structure, and page content, according to Sonal Tiwari.

4. Clustering to find related customer information

Clustering is a typical unsupervised learning technique for grouping similar data points. A clustering algorithm assigns a large number of data points to a smaller number of groups such that data points in the same group share the same properties while data points in different groups are dissimilar. Clustering has many applications, including part family formation for group technology, image segmentation, information retrieval, web page grouping, market segmentation, and scientific and engineering analysis. Many clustering methods have been proposed, and they can be broadly classified into four categories: partitioning methods, hierarchical methods, density-based methods, and grid-based methods.

Customer clustering is one of the most important data mining methodologies used in marketing and customer relationship management (CRM). Customer clustering uses customer-purchase transaction data to track buying behavior and create strategic business initiatives. Companies want to keep high-profit, high-value, and low-risk customers. This cluster typically represents the 10 to 20 percent of customers who create 50 to 80 percent of a company's profits. A company would not want to lose these customers, and the strategic initiative for the segment is obviously retention.
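
The text does not name a specific algorithm, so the following sketch uses k-means, one well-known partitioning method, to show how customers might be grouped into segments. The two features (annual profit and purchases per year), the customer figures, and the function name `kmeans` are hypothetical and chosen only to keep the example small.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means for 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, (x, y) in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

# Hypothetical customers as (annual profit, purchases per year).
CUSTOMERS = [
    (100, 2), (120, 3), (90, 1), (110, 2),         # low-profit customers
    (900, 20), (950, 22), (1000, 25), (980, 21),   # high-profit customers
]

if __name__ == "__main__":
    centroids, labels = kmeans(CUSTOMERS, k=2)
    for customer, label in zip(CUSTOMERS, labels):
        print(customer, "-> segment", label)
```

In practice the features would usually be rescaled first, since a raw profit figure would otherwise dominate the distance calculation.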

A low-profit, high-value, and low-risk customer segment is also an attractive one, and the obvious goal here would be to increase profitability for this segment. Cross-selling (selling new products) and up-selling (selling more of what customers currently buy) to this segment are the marketing initiatives of choice.

Assess the reliability of the data mining algorithms. Decide if they can be trusted and predict the errors they are likely to produce.

Most methods for validating a data mining model do not answer business questions directly, but provide metrics that can be used to guide a business or development decision. There is no comprehensive rule that can tell you when a model is good enough, or when you have enough data.

Accuracy is a measure of how well the model correlates an outcome with the attributes in the data that has been provided. There are various measures of accuracy, but all of them depend on the data that is used. In reality, values might be missing or approximate, or the data might have been changed by multiple processes. Particularly in the phase of exploration and development, you might decide to accept a certain amount of error in the data, especially if the data is fairly uniform in its characteristics. For example, a model that predicts sales for a particular store based on past sales can be strongly correlated and very accurate, even if that store consistently used the wrong accounting method. Therefore, measurements of accuracy must be balanced by assessments of reliability.

Reliability assesses the way that a data mining model performs on different data sets. A data mining model is reliable if it generates the same type of predictions or finds the same general kinds of patterns regardless of the test data that is supplied. For example, the model that you generate for the store that used the wrong accounting method would not generalize well to other stores, and therefore would not be reliable.
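
One rough way to probe the distinction between accuracy and reliability is to score the same model on several data sets and look at how far the scores drift apart. The sketch below does this for a toy rule; the store figures, the names `accuracy`, `reliability_report`, and `store_a_rule`, and the use of the max-minus-min spread as a reliability indicator are assumptions made for illustration, not a standard definition.

```python
from statistics import mean

def accuracy(predict, dataset):
    """Fraction of (features, label) pairs that predict() labels correctly."""
    correct = sum(1 for features, label in dataset if predict(features) == label)
    return correct / len(dataset)

def reliability_report(predict, datasets):
    """Score one model on several named data sets and summarise the spread."""
    scores = {name: accuracy(predict, data) for name, data in datasets.items()}
    return {
        "scores": scores,
        "mean": mean(scores.values()),
        # A large gap between best and worst score hints at poor reliability.
        "spread": max(scores.values()) - min(scores.values()),
    }

# Hypothetical rule learned from store A: weekly sales above 500 count as "high".
def store_a_rule(weekly_sales):
    return "high" if weekly_sales > 500 else "low"

# Hypothetical data; store B books sales differently, so labels differ.
DATASETS = {
    "store_a": [(650, "high"), (480, "low"), (700, "high"), (300, "low")],
    "store_b": [(650, "low"), (480, "low"), (700, "high"), (300, "low")],
}

if __name__ == "__main__":
    report = reliability_report(store_a_rule, DATASETS)
    print(report["scores"], "spread:", report["spread"])
```

The rule is perfectly accurate on the store it was built from and noticeably worse on the other store, which mirrors the accounting-method example: high accuracy on one data set says little about how the model behaves elsewhere.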

Analyze privacy concerns raised by the collection of personal data for mining purposes.

1. Choose and describe three (3) concerns raised by consumers.

Recent surveys on privacy show a great concern about the use of personal data for purposes other than the one for which the data was collected. The handling of misinformation can cause serious and long-term damage, so individuals should be able to challenge the correctness of data about themselves, such as personal records. The last concern is granulated access to personal information, such as personal information about someone's health when applying for a job.

2. Decide if each of these concerns is valid and explain your decision for each.

These concerns are valid. The first concern led to an extreme case in 1989: after collecting over $16 million USD by selling the driver-license data of 19. million Californian residents, the Department of Motor Vehicles in California revised its data selling policy after Robert Bardo used its services to obtain the address of actress Rebecca Schaeffer and later killed her in her apartment. While it is very unlikely that Knowledge Discovery and Data Mining (KDDM) tools will directly reveal precise confidential data, exploratory KDDM tools may correlate or disclose confidential, sensitive facts about individuals, resulting in a significant reduction of possibilities.

The second concern is valid because of an incident in Washington, in which Cablevision fired an employee, James Russell Wiggings, on the basis of information obtained from Equifax, Atlanta, about Wiggings' conviction for cocaine possession. The information was actually about James Ray Wiggings, and the case ended up in court. This illustrates a serious issue in defining ownership of data containing personal records.

The third issue is granulated access to personal information. For example, employers are obliged to perform a background check when hiring a worker, but it is widely accepted that information about diet and exercise habits should not affect hiring decisions.

3. Describe how each concern is being allayed.

KDDM revitalizes some issues and poses new threats to privacy. Some of these can be directly attributed to the fact that this powerful technique may enable the correlation of separate data sets in order to significantly reduce the possible values of private information. Others can be attributed more to the interpretation, application, and actions taken from the inferences obtained with the tools. While this raises concerns, there is a body of knowledge in the field of statistical databases that could potentially be extended and adapted to develop new techniques that balance the right to privacy against the need for knowledge and analysis of large volumes of information. Some of these new privacy protection methods are emerging as the application of KDD tools moves to more controversial datasets.

Provide at least three (3) examples where businesses have used predictive analysis to gain a competitive advantage and evaluate the effectiveness of each business's strategy.

The first advantage is that analysis helps establish the validity of a product by making a distinction between the positioning of a product and its ability to satisfy customer requirements. Other important attributes include ease of use, innovation, and how well the product integrates with other technologies that customers need.

The second advantage is what the technology provides to customers. Even if a product is well designed, it must be able to help businesses achieve their business goals. Goals range from gaining insight about customers in order to be more competitive to using the technology to increase revenue. A key attribute measured in this dimension is how well the product supports companies in meeting their objectives.

The third advantage is the strength of the company's strategy. It is not enough to simply have a good vision; a company must also have a well-designed road map that can support this vision. Vision attributes also include more tactical aspects of the company's strategy, such as a technology platform that can scale, well-articulated messaging, and positioning. A key component of this dimension is clarity: it must be clear what business problem the company is solving for which customer.
