
SECTION A: 3 QUESTIONS, 25 MARKS

Write your answers to this section in a separate answer book and write “Section A” on the first page of the book.

  1. (10 marks) Understanding Big Data:
    • (2 marks) Using a real-world scenario, explain in 50 words what the Big Data problem is.
    • (2 marks) Explain in 50 words what the problem of organizing big data is. Justify your answer using the Four V’s (volume, variety, velocity and veracity) of Big Data.
    • (2 marks) Explain what a key-value database is and discuss how the data will be stored and queried in such databases.
    • (2 marks) Is a distributed algorithm more fault tolerant than a centralized algorithm? Provide an example to support your answer.
    • (2 marks) Compare Cluster, Grid and Cloud Computing.
  2. (10 marks) Data and Knowledge Lake:
    • (2 marks) Explain what a Data Lake is. Name four main components of a Data Lake and explain one of them in 50 words.
    • (2 marks) Compare Data Lakes and Data Warehouses.
    • (2 marks) Explain what Data ingestion is. Name two Big Data Technologies/Systems that can be used for ingesting streaming data.
    • (2 marks) Explain what a Knowledge Lake is. Name four main components of a Knowledge Lake and explain the ‘Data and Knowledge Extraction’ component in 50 words.
    • (2 marks) Explain what Big Data Summarization is and how it can facilitate analyzing the Big Data. Name four techniques for summarizing the Big Data.
  3. (5 marks) Processing Big Data:
    • (1 mark) Explain why transparency is an important property that a distributed system designer should achieve.
    • (1 mark) Explain what Apache Hadoop is. Name four components of the Hadoop Ecosystem and explain the role of the ZooKeeper component (in the Hadoop Ecosystem) in 50 words.
    • (3 marks) Assume that we have a set of Tweet documents, similar to the Tweet dataset presented in assignment 2. Provide the MapReduce algorithm pseudocode for counting the number of occurrences of each word in the text of the Tweets.
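The word-count question above follows the classic map, shuffle, reduce pattern. The sketch below simulates the three phases sequentially on one machine to show the data flow; it is illustrative only, not a model answer. It assumes each Tweet is a dict with a "text" field (the actual assignment 2 schema is not given here), and the function names map_tweet, shuffle, reduce_word and word_count are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Assumption: each Tweet is a dict with a "text" field; the real
# assignment 2 schema is not specified in this paper.
Tweet = Dict[str, str]


def map_tweet(tweet: Tweet) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in the Tweet's text."""
    # Simplifying tokenizer: lower-case, then take runs of word characters.
    for word in re.findall(r"\w+", tweet.get("text", "").lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group every emitted value by its key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_word(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts for a single word."""
    return word, sum(counts)


def word_count(tweets: Iterable[Tweet]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce in sequence (single-machine simulation)."""
    mapped = (pair for tweet in tweets for pair in map_tweet(tweet))
    grouped = shuffle(mapped)
    return dict(reduce_word(w, c) for w, c in grouped.items())


if __name__ == "__main__":
    sample = [
        {"text": "Big data is big"},
        {"text": "Data lakes store raw data"},
    ]
    print(word_count(sample))
```

In a real Hadoop job the framework performs the shuffle between mapper and reducer tasks across nodes; a combiner that pre-sums counts on each mapper is a common optimization to reduce network traffic.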

END OF SECTION A. PLEASE TAKE ANOTHER ANSWER BOOK.

SECTION B: 5 QUESTIONS, 25 MARKS

Write your answers to this section in a separate answer book and write “Section B” on the first page of the book.

  1. (5 marks) Data Analytics
    • (2 marks) A financial institution asks you to develop a system that would predict the risk of granting a loan to a potential customer. They have a record of past loan requests and whether the customer defaulted. Answer the following questions.
      1. (1 mark) Is this an example of descriptive, diagnostic, predictive, or prescriptive analytics? Justify your answer.
      2. (1 mark) Given the data that the financial institution has, would you apply supervised machine learning or unsupervised machine learning? Justify your answer.
    • (3 marks) List the overall steps of a Data Mining project. Each step should be explained in one or two sentences.
  2. (5 marks) Text Analytics
    • (3 marks) Enumerate and briefly explain three characteristics of text that make it particularly challenging for computer processing.
    • (2 marks) Explain what Named Entity Recognition is and how it could be useful for text analytics.
  3. (5 marks) Visual Analytics
    • (2 marks) Explain how one can use scatterplots to determine whether a variable can be useful as a predictor.
    • (3 marks) A company wishes to analyse the impact of a product recently introduced in the market. Explain three visual analytic techniques that could be useful for this study. For each visual analytic technique, make sure that you specify the visual analytic technique, the reason for its use, and the data source on which it would be applied.
  4. (5 marks) Stream Processing
    • (2 marks) Explain the characteristics of stream processing with reference to the four V’s of Big Data (volume, variety, velocity, and veracity).
    • (3 marks) Draw the Stream Model and explain all of its components.
  5. (5 marks) Big Data and Society
    • (2 marks) Explain the privacy issues that arose when the Netflix Prize was introduced and that led to a lawsuit.
    • (3 marks) Briefly explain two of the six critical questions for Big Data presented by Danah Boyd and Kate Crawford in their 2012 paper “Critical Questions for Big Data”.