Data Mining & BI Report - Professional blog

Unit
Assessment Type	Group Assignment
Assessment Number	A3
Assessment Name	Data Mining & BI Report
Weighting	30%
Alignment with Unit and Course	ULO1, ULO2, ULO3, ULO4

Due Date and Time	Friday, Week 12
Assessment Description	In this assessment, students extend their work from assessment A2 by submitting a report on the data mining process applied to a real-world scenario. The report must detail every step the students followed.
Detailed Submission Requirements

Cover Page:
- Title
- Group members

Introduction:
- Importance of the chosen area
- Why this data set is interesting
- What has been done so far
- What can be done
- Description of the present experiment

Data preparation and Feature extraction:

Select data
Task: Select data. Decide on the data to be used for analysis. Criteria include relevance to the data mining goals, quality, and technical constraints such as limits on data volume or data types.
Output: Rationale for inclusion/exclusion. List the data to be used/excluded and the reasons for these decisions.

Clean data
Task: Clean data. Raise the data quality to the level required by the selected analysis techniques. This may involve selecting clean subsets of the data, inserting suitable defaults, or more ambitious techniques such as estimating missing data by modeling.
Output: Data cleaning report. Describe the decisions and actions that were taken to address the data quality problems reported during the Verify Data Quality Task. The report should also address which data quality issues are still outstanding if the data is to be used in the data mining exercise, and what possible effects these could have on the results.
Activities:
- Reconsider how to deal with the observed types of noise.
- Correct, remove or ignore noise.
- Decide how to deal with special values and their meaning. Special values can give rise to many strange results and should be examined carefully. They can arise, for example, from survey results where some questions were not asked or not answered, which might result in a value of ‘99’ for unknown data (e.g. 99 for marital status or political affiliation). Special values can also arise when data is truncated, e.g. ‘00’ for 100-year-old people, or all cars with 100,000 km on the clock.
- Reconsider Data Selection Criteria (see Task 2.1) in light of experiences of data cleaning (i.e. one may wish to include/exclude other sets of data).
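As an illustration of the special-values activity above, the following minimal Python sketch treats coded "unknown" answers as missing and counts them for the data cleaning report. The field names, the sample records and the choice of 99 as the only special code are assumptions made for the example; a real data set needs its own code list.

```python
# Hypothetical mapping of fields to the codes that mean "unknown".
SPECIAL_VALUES = {"marital_status": {99}, "political_affiliation": {99}}


def mask_special_values(records, special_values):
    """Return copies of the records with special codes replaced by None."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the raw data stays untouched
        for field, codes in special_values.items():
            if row.get(field) in codes:
                row[field] = None
        cleaned.append(row)
    return cleaned


def missing_counts(records, fields):
    """Count missing (None) values per field for the cleaning report."""
    return {f: sum(1 for r in records if r.get(f) is None) for f in fields}


# Illustrative survey records (made up for this sketch).
survey = [
    {"id": 1, "marital_status": 1, "political_affiliation": 99},
    {"id": 2, "marital_status": 99, "political_affiliation": 2},
    {"id": 3, "marital_status": 2, "political_affiliation": 99},
]

cleaned = mask_special_values(survey, SPECIAL_VALUES)
report = missing_counts(cleaned, ["marital_status", "political_affiliation"])
print(report)
```

Keeping the raw records unchanged and reporting the counts separately makes it easier to justify, in the report, whether the affected rows were corrected, removed or ignored.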
Construct data/feature extraction
Task: Construct data. This task includes constructive data preparation operations such as the production of derived attributes, complete new records, or transformed values for existing attributes.
Output: Derived attributes. Derived attributes are new attributes that are constructed from one or more existing attributes in the same record. An example might be area = length * width.
Why would we need to construct derived attributes during a data mining investigation? Data taken directly from databases or other sources is not the only type of data that should be used in constructing a model. Derived attributes might be constructed because:
- Background knowledge convinces us that some fact is important and ought to be represented, although we currently have no attribute to represent it.
- The modeling algorithm in use handles only certain types of data. For example, we are using linear regression and we suspect that there are certain non-linearities that will not be included in the model.
- The outcome of the modeling phase may suggest that certain facts are not being covered.
Activities (derived attributes):
- Decide if any attribute should be normalized (e.g. when using a clustering algorithm with age and income in lire, the income will dominate).
- Consider adding new information on the relative importance of attributes by adding new attributes (for example, attribute weights, weighted normalization).
- Decide how missing attributes can be constructed or imputed, and the type of construction (e.g. aggregate, average, induction).
- Add new attributes to the accessed data.
Good idea! Before adding derived attributes, try to determine if and how they ease the modeling process or facilitate the modeling algorithm. Perhaps “income per head” is a better/easier attribute to use than “income per household.” Do not derive attributes simply to reduce the number of input attributes.
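The derived-attribute and normalization activities above can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`household_income`, `household_size`) and the use of min-max scaling are assumptions, and other normalization schemes may suit a given algorithm better.

```python
def add_income_per_head(records):
    """Derive income_per_head from two existing attributes of each record."""
    derived = []
    for record in records:
        row = dict(record)
        row["income_per_head"] = record["household_income"] / record["household_size"]
        derived.append(row)
    return derived


def min_max_normalize(values):
    """Rescale values to [0, 1] so no attribute dominates by magnitude."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Illustrative households with income in lire (made up for this sketch).
households = [
    {"age": 25, "household_income": 30_000_000, "household_size": 1},
    {"age": 40, "household_income": 60_000_000, "household_size": 4},
    {"age": 65, "household_income": 45_000_000, "household_size": 2},
]

derived = add_income_per_head(households)
ages = min_max_normalize([r["age"] for r in derived])
incomes = min_max_normalize([r["income_per_head"] for r in derived])
print(ages)
print(incomes)
```

After scaling, age and income per head both lie between 0 and 1, so a distance-based clustering algorithm weighs them comparably instead of letting the lire amounts dominate.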
Another type of derived attribute is the single-attribute transformation, usually performed to fit the needs of the modeling tools.
Activities (single-attribute transformations):
- Specify necessary transformation steps in terms of available transformation facilities (for example, change the binning of a numeric attribute).
- Perform transformation steps.
Hint! Transformations may be necessary to transform ranges to symbolic fields (e.g. ages to age ranges) or symbolic fields (“definitely yes,” “yes,” “don’t know,” “no”) to numeric values. Modeling tools or algorithms often require them.
Output: Generated records. Generated records are completely new records, which add new knowledge or represent new data that is not otherwise represented. For example, having segmented the data, it may be useful to generate a record to represent the prototypical member of each segment for further processing.
Activities:
- Check for available techniques if needed (e.g. mechanisms to construct prototypes for each segment of segmented data).

2 Modeling

2.1 Select modeling technique
Task: Select modeling technique. As the first step in modeling, select the actual modeling technique that is to be used initially. If multiple techniques are applied, perform this task for each technique separately. Not all tools and techniques are applicable to every task; for certain problems, only some techniques are appropriate. Among these tools and techniques, “political requirements” and other constraints further limit the choice available to the miner. It may be that only one tool or technique is available to solve the problem in hand, and even then the tool may not be the technically best one for the problem.
Output: Modeling technique. Record the actual modeling technique that is used.
Activities:
- Decide on an appropriate technique for the exercise, bearing in mind the tool selected.
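Whichever technique is selected, the single-attribute transformations mentioned earlier (ages to age ranges, symbolic answers to numeric values) are often a prerequisite. The sketch below shows one possible way to do both in Python. The cut points, range labels and numeric scores are assumptions chosen for illustration; in particular, where to place "don't know" on a numeric scale is a modeling decision that should be justified in the report.

```python
import bisect

# Assumed cut points and labels for age ranges (illustrative only).
AGE_EDGES = [18, 35, 50, 65]
AGE_LABELS = ["<18", "18-34", "35-49", "50-64", "65+"]

# Assumed ordinal scores for the symbolic answers (illustrative only).
ANSWER_SCORES = {"definitely yes": 3, "yes": 2, "don't know": 1, "no": 0}


def age_to_range(age):
    """Map a numeric age to its symbolic age range."""
    return AGE_LABELS[bisect.bisect_right(AGE_EDGES, age)]


def answer_to_score(answer):
    """Map a symbolic answer to a numeric value; unknown text gives None."""
    return ANSWER_SCORES.get(answer.strip().lower())


for age in (17, 18, 34, 35, 65):
    print(age, age_to_range(age))

for answer in ("Definitely yes", "no", "maybe"):
    print(answer, answer_to_score(answer))
```

Returning `None` for an unrecognised answer, rather than guessing a score, keeps unexpected categories visible so they can be handled during data cleaning.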
Output: Modeling assumptions. Many modeling techniques make specific assumptions about the data, data quality or the data format.
Activities:
- Define any built-in assumptions made by the technique about the data (e.g. quality, format, distribution).
- Compare these assumptions with those in the Data Description Report.
- Make sure that these assumptions hold and step back to the Data Preparation Phase if necessary.
You can explain the data file here, even when it is pre-prepared. Draw some basic charts to understand the data relations (if any) using MS Excel or Tableau.

2.2 Generate test design
Task: Generate test design. Prior to building a model, a procedure needs to be defined to test the model’s quality and validity. For example, in supervised data mining tasks such as classification, it is common to use error rates as quality measures for data mining models. The test design therefore specifies that the dataset should be separated into a training set and a test set; the model is built on the training set and its quality is estimated on the test set.
Output: Test design. Describe the intended plan for training, testing and evaluating the models. A primary component of the plan is to decide how to divide the available dataset into training, test and validation sets.
Activities:
- Check existing test designs for each data mining goal separately.
- Decide on necessary steps (number of iterations, number of folds etc.).
- Prepare the data required for the test. (You can use 66% of records for model building and the rest for testing.)

2.3 Build model
Task: Build model. Run the modeling tool on the prepared dataset to create one or more models (using Weka, Knime or any tool of your choice).
Output: Parameter settings. With any modeling tool, there are often a large number of parameters that can be adjusted. List the parameters and their chosen values, along with the rationale for the choice.
Activities:
- Set initial parameters.
- Document reasons for choosing those values.
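The test design and parameter documentation described above can be sketched as follows. This is a minimal, tool-independent illustration in plain Python, not a substitute for Weka or Knime: the synthetic data, the fixed random seed and the majority-class baseline model are all assumptions introduced for the example.

```python
import random
from collections import Counter

# Parameter settings recorded together with the rationale for each choice.
PARAMETERS = {
    "train_fraction": 0.66,  # 66% for model building, rest for testing
    "seed": 42,              # assumed value, fixed so the split is repeatable
}


def train_test_split(records, train_fraction, seed):
    """Shuffle a copy of the records and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_majority_class(train, target):
    """Baseline 'model': always predict the most frequent training label."""
    return Counter(r[target] for r in train).most_common(1)[0][0]


def error_rate(predictions, actuals):
    """Fraction of test records that were predicted incorrectly."""
    wrong = sum(1 for p, a in zip(predictions, actuals) if p != a)
    return wrong / len(actuals)


# Synthetic records (made up for this sketch).
data = [{"x": i, "label": "yes" if i % 3 else "no"} for i in range(100)]

train, test = train_test_split(data, PARAMETERS["train_fraction"], PARAMETERS["seed"])
majority = fit_majority_class(train, "label")
test_error = error_rate([majority] * len(test), [r["label"] for r in test])
print(len(train), len(test), majority, test_error)
```

A baseline like this is useful in the report as a reference point: a real model is only worth describing if its test error is clearly lower than that of always predicting the majority class.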
Output: Models. Run the modeling tool on the prepared dataset to create one or more models.
Activities:
- Run the selected technique on the input dataset to produce the model.
- Post-process data mining results (e.g. editing rules, display trees).
Output: Model description. Describe the resultant model and assess its expected accuracy, robustness and possible shortcomings. Report on the interpretation of the models and any difficulties encountered. You can add screenshots of the various outputs you get when you run the model.
Activities:
- Describe any characteristics of the current model that may be useful for the future.
- Record parameter settings used to produce the model.
- Give a detailed description of the model and any special features.
- For rule-based models, list the rules produced plus any assessment of per-rule or overall model accuracy and coverage.
- For opaque models, list any technical information about the model (such as neural network topology) and any behavioral descriptions produced by the modeling process (such as accuracy or sensitivity).
- Describe the model’s behavior and interpretation.
- State conclusions regarding patterns in the data (if any); sometimes the model reveals important facts about the data without a separate assessment process (e.g. that the output or conclusion is duplicated in one of the inputs).

3 Evaluation
Previous evaluation steps dealt with factors such as the accuracy and generality of the model. This step assesses the degree to which the model meets the business objectives and seeks to determine if there is some business reason why this model is deficient. It compares results with the evaluation criteria defined at the start of the project.
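When results are compared against the evaluation criteria, the per-rule accuracy and coverage requested in the model description for rule-based models are convenient figures to quote. The sketch below uses one common definition: coverage is the share of records a rule applies to, and accuracy is the share of those covered records that the rule classifies correctly. The records, the attribute names and the example rule are hypothetical.

```python
def rule_stats(records, condition, predicted, target):
    """Return (coverage, accuracy) of a single rule on the given records."""
    covered = [r for r in records if condition(r)]
    coverage = len(covered) / len(records)
    correct = sum(1 for r in covered if r[target] == predicted)
    # Accuracy is undefined when the rule covers no records.
    accuracy = correct / len(covered) if covered else None
    return coverage, accuracy


# Illustrative customer records (made up for this sketch).
records = [
    {"age": 22, "income": 20, "buys": "no"},
    {"age": 30, "income": 60, "buys": "yes"},
    {"age": 45, "income": 80, "buys": "yes"},
    {"age": 52, "income": 30, "buys": "no"},
    {"age": 61, "income": 90, "buys": "no"},
]

# Hypothetical rule: IF income >= 50 THEN buys = yes
coverage, accuracy = rule_stats(
    records, lambda r: r["income"] >= 50, predicted="yes", target="buys"
)
print(coverage, accuracy)
```

Reporting both figures matters because a rule can be highly accurate while applying to very few records, or cover most of the data while being only slightly better than chance.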
A good way of defining the total outputs of a data mining project is to use the equation:

RESULTS = MODELS + FINDINGS

In this equation, the total output of the data mining project is not just the models (although they are, of course, important) but also the findings, which we define as anything (apart from the model) that is important in meeting the objectives of the business, or important in leading to new questions, lines of approach or side effects (e.g. data quality problems uncovered by the data mining exercise). Note: although the model is directly connected to the business questions, the findings need not be related to any question or objective, but are important to the initiator of the project.

3.2 Review process
Activities:
- Rank the possible actions.
- Select one of the possible actions.
- Document reasons for the choice.

Content	Marks
Cover Page, Table of contents	2
Executive Summary	2
Introduction	2
Data Pre-processing and feature extraction	3
Experiment	5
Result analysis	5
Conclusion	2
References	1
Presentation and QA	8
Misconduct	The AIH misconduct policy and procedure can be read on the AIH website (https://aih.nsw.edu.au/about-us/policies-procedures/).
Late submission	Any assessment submitted past the specified due date and time will be classified as Late. Any Late submission will be subject to a reduction of the mark allocated for the assessment item by 5% per day (or part thereof) of the total marks available for the assessment item. A ‘day’ for this purpose is defined as any day of the week including weekends. Assignments submitted later than one (1) week after the due date will not be accepted, unless special consideration is approved as per the formal process.
Special consideration	Students whose ability to submit or attend an assessment item is affected by sickness, misadventure or other circumstances beyond their control may be eligible for special consideration. No consideration is given when the condition or event is unrelated to the student’s performance in a component of the assessment, or when it is considered not to be serious. Students applying for special consideration must submit the form within 3 days of the due date of the assessment item or exam. The form can be obtained from the AIH website (https://aih.nsw.edu.au/current-students/student-forms/) or on-campus at Reception. The request form must be submitted to Student Services. Supporting evidence should be attached. For further information please refer to the Student Assessment Policy and associated Procedure available at (https://aih.nsw.edu.au/about-us/policies-procedures/).

Rubrics

Marking criteria

ULO1: Demonstrate broad understanding of data mining and business intelligence and their benefits to business practice
ULO2: Choose and apply models and key methods for classification, prediction, reduction, exploration, affinity analysis, and customer segmentation that can be applied to data mining as part of a business intelligence strategy
ULO3: Analyse appropriate models and methods for classification, prediction, reduction, exploration, affinity analysis, and customer segmentation to data mining
ULO4: Propose a data mining approach using real business cases as part of a business intelligence strategy

Report addresses all the tasks.
Report contains no mistakes or only minor mistakes.

(25-30 marks)

Report addresses all the tasks.
Report contains a few mistakes.

(20-24 marks)

Report addresses most of the contents.
Report contains a few mistakes.

(15-19 marks)

Report addresses a few of the contents.
Report contains a considerable number of mistakes.

(15 marks)

Incomplete report.
Unable to perform the experiment or data pre-processing, or to conclude results.
(0-14 marks)

The post Data Mining & BI Report appeared first on My Assignment Online.
