BIZ 4194: DMBA Take‐Home Final Exam

A national veterans’ organization wishes to develop a predictive model to improve the cost-effectiveness
of their direct marketing campaign. The organization, with its in-house database of over 13 million donors,
is one of the largest direct-mail fundraisers in the United States. According to their recent mailing records,
the overall response rate is 49.4% (=1541/3120). We take a sample of this dataset to develop
classification models that can effectively capture potential donors in their direct marketing campaign
efforts. Descriptions of variables in the dataset (“Fundraising_final_exam_sp20.csv”) are as follows:
(Adapted from p. 521 in “Data Mining for Business Analytics: Concepts, Techniques, and Applications in R”
by Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, and Kenneth C. Lichtendahl Jr. 1st
edition. Wiley, 2017.)

Run the following classification models on the data to classify the records, and report the confusion
matrix for the training and validation data.
(1) naïve Bayes
(2) classification tree
(3) k-nearest neighbors (k-NN)
(4) neural nets
Please note the following when running classification models on the data:
- Comment your source code as thoroughly as possible using annotations (#) in R, in enough
  detail that the grader can follow the code you have added.
- Unlike other assignments, you are not allowed to get help from other students for this assignment.
  However, if you have any questions for clarification, please post them on the Facebook Group page.
  I will monitor the page and answer the questions as soon as I can.
- For (1) – (4), use TARGET_B (not TARGET_D) as the outcome variable. Do not use Row.ID and
  TARGET_D as predictors.
- Preprocess the data as necessary for applying each classification model.
- For (1) – (4), partition the data into training (60%) and validation (40%) data. Use set.seed(2).
- For (1) and (2), use all variables in the dataset as predictors.
- For (3), use k = 10 and use all variables as predictors except zip codes (e.g., zipconvert_1,
  zipconvert2, etc.) and HOMEOWNER.
- For (4), use one hidden layer with five hidden nodes. Identify the 7 most important predictors from
  the variable importance plot of a random forest model, and use those 7 variables as predictors
  when fitting the neural net.
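The partitioning note can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `df` is a small made-up stand-in for the exam file, the predictor names `x1` and `x2` are hypothetical, and only `Row.ID`, `TARGET_B`, `TARGET_D`, the 60/40 split, and `set.seed(2)` come from the instructions above.

```r
# Hypothetical stand-in for the exam data (NOT the real file).
# Only Row.ID, TARGET_B and TARGET_D are names taken from the instructions;
# x1 and x2 are made-up predictors used purely for illustration.
set.seed(1)  # seed for generating the toy data only
n <- 100
df <- data.frame(
  Row.ID   = seq_len(n),
  x1       = rnorm(n),
  x2       = rnorm(n),
  TARGET_B = rbinom(n, 1, 0.5),
  TARGET_D = runif(n, 0, 20)
)

# Drop the columns that must not be used as predictors,
# and treat the outcome as a categorical variable.
model.df <- df[, !(names(df) %in% c("Row.ID", "TARGET_D"))]
model.df$TARGET_B <- factor(model.df$TARGET_B)

# 60% training / 40% validation partition, seeded as the instructions require.
set.seed(2)
train.index <- sample(seq_len(nrow(model.df)), size = round(0.6 * nrow(model.df)))
train.df <- model.df[train.index, ]
valid.df <- model.df[-train.index, ]

# Check the sizes of the two partitions.
dim(train.df)
dim(valid.df)
```

Calling `set.seed(2)` immediately before `sample()` keeps the split reproducible no matter what random draws happen earlier in the script.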
Task
(a) Create a table that reports the overall accuracy, sensitivity, and specificity of each
classification model on the training and validation data.
(b) Briefly explain whether overfitting is an issue for each of (1) – (4).
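As a reminder of how the three measures in (a) relate to a confusion matrix, here is a minimal base R sketch. The `actual` and `predicted` vectors are made-up values, not results from the exam data, and the sketch assumes that class 1 (donor) is treated as the positive class.

```r
# Hypothetical actual and predicted classes (1 = donor, 0 = non-donor).
# These values are invented for illustration only.
actual    <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 1, 0), levels = c(0, 1))
predicted <- factor(c(1, 1, 0, 1, 1, 1, 0, 0, 1, 0), levels = c(0, 1))

# Confusion matrix: rows are predicted classes, columns are actual classes.
cm <- table(Predicted = predicted, Actual = actual)

# Cell counts, assuming class "1" is the positive class.
tp <- cm["1", "1"]  # true positives
tn <- cm["0", "0"]  # true negatives
fp <- cm["1", "0"]  # false positives
fn <- cm["0", "1"]  # false negatives

metrics <- data.frame(
  accuracy    = (tp + tn) / sum(cm),  # share of all records classified correctly
  sensitivity = tp / (tp + fn),       # share of actual donors captured
  specificity = tn / (tn + fp)        # share of actual non-donors captured
)
print(metrics)  # accuracy 0.7, sensitivity 0.8, specificity 0.6
```

Computing the same three numbers on the training and validation partitions and comparing them side by side is one common way to look for overfitting: a large drop from training to validation suggests the model has fit noise in the training data.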
Deliverables
Create an MS Word report using RMarkdown for each of (1) – (4) and merge them into a single Word
file (using the “copy and paste” method or another method of your choice). Add Tasks (a) and (b)
at the end of the Word file. Save the file as “4194__final.docx” (e.g.,
4194_2000999999_final.docx) and submit it to YSCEC.
