Potential Sources for Datasets

Part 1

Throughout the semester you will incrementally create your individual final project.

1. Identify at least 3 different datasets and perform some initial exploration.

Potential Sources for Datasets

a. Kaggle

b. Open Data Network

c. American Community Survey

d. Bureau of Labor Statistics

e. Bureau of Economic Analysis

f. Open Data Cincinnati

g. Data.gov

h. Healthdata.gov

i. Amazon Web Services Datasets

j. The General Social Survey

2. Once you have narrowed your search down to 3 candidate datasets, start thinking about what types of questions you want to ask and answer with them. Answer the following questions based on the datasets you picked.

  1. Provide an introduction that explains the problem statement you are addressing. Why would someone be interested in this?
  2. Draft 5-10 research questions that focus on the problem statement.
  3. Provide a concise explanation of how you plan to address this problem statement.
  4. Discuss how your proposed approach will address (fully or partially) this problem.
  5. Do some digging on a dataset that you can use to address the issue.
     • Cite the original source where the data was obtained and, if possible, hyperlink it.
     • Explain the source data thoroughly (e.g., the original purpose of the data, when it was collected, how many variables the original had, and any peculiarities such as how missing values are recorded or how data was imputed).
  6. Identify the packages that are needed for your project.
  7. What types of plots and tables will help you to illustrate the findings to your research questions?
  8. What do you not know how to do right now that you need to learn to answer your research questions?
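
A first pass over a candidate dataset might look like the sketch below. It is only an illustration: the built-in `airquality` data frame stands in for a dataset you would download from one of the sources above, and the name `candidate` is hypothetical. It uses base R only; a real project would likely add packages such as readr, dplyr, and ggplot2.

```r
# Assumption: airquality (built into R) stands in for your own dataset.
candidate <- airquality

dim(candidate)      # number of rows and columns
str(candidate)      # variable names and types
summary(candidate)  # ranges per variable, including NA counts

# Count missing values per column to see where the gaps are
missing_per_column <- colSums(is.na(candidate))
missing_per_column
```

Checking dimensions, types, and missing values early helps you describe the source data and decide whether a candidate is worth keeping.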

Part 2

  1. Tell me why you are doing the data cleaning activities that you perform.
  2. With a clean dataset, show what the final dataset looks like. However, do not print a data frame with 200+ rows; show me the data in the most condensed form possible.
  3. What do you not know how to do right now that you need to learn to import and clean up your dataset?
  4. Discuss how you plan to uncover new information in the data that is not self-evident.
  5. What are different ways you could look at this data to answer the questions you want to answer?
  6. Do you plan to slice and dice the data in different ways, create new variables, or join separate data frames to create new summary information? Explain.
  7. How could you summarize your data to answer key questions?
  8. What types of plots and tables will help you to illustrate the findings to your questions? Ensure that all plots have axis titles, a legend if necessary, appropriate scales, appropriate geoms, etc.
  9. What do you not know how to do right now that you need to learn to answer your questions?
  10. Do you plan on incorporating any machine learning techniques to answer your research questions? Explain.
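
The cleaning and "condensed form" items above could be approached roughly as follows. This is a minimal sketch, assuming the built-in `airquality` data frame as a stand-in for your imported data; the names `raw` and `clean` are hypothetical, and the cleaning choices shown are examples rather than required steps.

```r
# Assumption: airquality (built into R) stands in for your imported data.
raw <- airquality

# Drop rows where the outcome of interest (Ozone) is missing,
# because a missing outcome cannot be summarized or plotted.
clean <- raw[!is.na(raw$Ozone), ]

# Store Month as a labeled factor so tables and plots show names, not codes.
clean$Month <- factor(clean$Month, levels = 5:9, labels = month.abb[5:9])

# Condensed views: the structure plus the first few rows,
# instead of printing the whole data frame.
str(clean)
head(clean, 5)
```

Each step carries a comment stating why it is done, which is the kind of justification item 1 asks for.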

Part 3

  1. Discuss how you plan to uncover new information in the data that is not self-evident.
  2. What are different ways you could look at this data to answer the questions you want to answer?
  3. Do you plan to slice and dice the data in different ways, create new variables, or join separate data frames to create new summary information? Explain.
  4. How could you summarize your data to answer key questions?
  5. What types of plots and tables will help you to illustrate the findings to your questions? Ensure that all plots have axis titles, a legend if necessary, appropriate scales, appropriate geoms, etc.
  6. What do you not know how to do right now that you need to learn to answer your questions?
  7. Do you plan on incorporating any machine learning techniques to answer your research questions? Explain.
  8. What features could you filter on?
  9. How could arranging your data in different ways help?
  10. Can you reduce your data by selecting only certain variables?
  11. Could creating new variables add new insights?
  12. Could summary statistics at different categorical levels tell you more?
  13. How can you incorporate the pipe (%>%) operator to make your code more readable and concise?
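
The filtering, arranging, selecting, new-variable, and grouped-summary questions above map onto the dplyr verbs, chained with the pipe. The sketch below is one possible illustration, assuming the dplyr and ggplot2 packages are installed and using the built-in `mtcars` data as a stand-in for your project data; the names `cars_summary`, `wt_kg`, and `mpg_plot` are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Assumption: mtcars (built into R) stands in for your project data.
cars_summary <- mtcars %>%
  filter(hp > 100) %>%                      # keep only rows of interest
  select(mpg, cyl, wt) %>%                  # reduce to the variables needed
  mutate(wt_kg = wt * 1000 * 0.4536) %>%    # new variable: wt is in 1000 lb, 1 lb is about 0.4536 kg
  group_by(cyl) %>%                         # summarize at a categorical level
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    mean_wt_kg = mean(wt_kg),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_mpg))                   # order groups by the summary

cars_summary

# A labeled plot of the summary: axis titles and a geom suited to grouped means
mpg_plot <- ggplot(cars_summary, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(
    x = "Number of cylinders",
    y = "Mean miles per gallon",
    title = "Mean fuel economy by cylinder count (cars over 100 hp)"
  )
mpg_plot
```

Written without the pipe, the same steps would need nested calls or several intermediate variables; the pipe lets each step read top to bottom in the order it is applied.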

Part 4

  1. Overall, write a coherent narrative that tells a story with the data as you complete this section.
  2. Summarize the problem statement you addressed.
  3. Summarize how you addressed this problem statement (the data used and the methodology employed).
  4. Summarize the interesting insights that your analysis provided.
  5. Summarize the implications to the consumer (target audience) of your analysis.
  6. Discuss the limitations of your analysis and how you, or someone else, could improve or build on it.
  7. In addition, submit your completed project as an R Markdown document, or provide a link where it can be downloaded and/or viewed.

