Coursera Diploma

According to the McKinsey Global Institute, “Big data refers to data sets whose size is beyond the ability of typical data software tools to capture, store, manage and analyze.” Traditional tools such as relational databases and desktop software for statistics and visualization are no longer adequate. Instead, big data requires “massively parallel software running on tens, hundreds or even thousands of servers.”

Many technology industry vendors and other experts subscribe to Gartner’s “3V” model, which defines big data as high-volume, high-velocity and high-variety information “that requires new forms of processing to enable enhanced decision-making, insight discovery and process optimization.”

Volume. Increasing amounts of data are being generated in real time by enterprise IT and sensor systems. Sources include health information exchanges, traffic sensors and other monitoring devices, mobile networks and applications, video surveillance systems, citizen-facing Internet-based applications, enterprise resource planning (ERP) systems and tax systems.

Velocity. As the sophistication of data collection systems and sensors increases, so does the speed of data generation. Technological advances allow data to be captured (and analyzed) immediately.

Variety. The variety of data generated by governments includes financial transaction data and data from other transactional systems, sensor data, social media information, emails, photographs, video footage, audio, machine data, network and system data, and geographic/map data. With complex predictive models for weather, climate change and environmental events, scientists and researchers at the National Oceanic and Atmospheric Administration (NOAA) generate between 80 and 100 terabytes of high-resolution climate- and weather-related images, video and other data every day. This includes data from satellites, ships, aircraft and sensors that must be immediately analyzed to provide weather- and ocean-related forecasts and warnings that affect public safety and the national economy.
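To give a feel for what 80 to 100 terabytes per day means as a sustained data rate, the short calculation below converts the daily figure into megabytes per second. It is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 1,000,000 MB) and an even spread over the day, and the function name is illustrative rather than taken from any source.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def sustained_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Average throughput in MB/s, assuming decimal units (1 TB = 10**6 MB)."""
    return terabytes_per_day * 1_000_000 / SECONDS_PER_DAY


# Daily volumes quoted for NOAA in the text above
low = sustained_rate_mb_per_s(80)
high = sustained_rate_mb_per_s(100)
print(f"{low:.0f} to {high:.0f} MB/s sustained")  # prints "926 to 1157 MB/s sustained"
```

In other words, the quoted volume corresponds to roughly one gigabyte arriving every second, around the clock, which is why volume and velocity are usually discussed together.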

Some industry experts add a fourth “V,” veracity, implying that the data must be trustworthy. Still others add even more “Vs”: visualization and value, which respectively suggest the importance of data presentation and of the data's worth to the organization, and vocabulary, which refers to the metadata, or data about the data. The National Association of State CIOs (NASCIO) contributes a final “V,” variability, and throws in a “C” for complexity.

Value. Oracle introduced Value as a defining attribute of big data. Based on Oracle's definition, big data are often characterized by relatively “low value density”. That is, the data received in the original form usually has a low value relative to its volume. However, a high value can be obtained by analyzing large volumes of such data.

Big data definitions have evolved rapidly, which has caused some confusion. This is evident from an online survey of 154 C-suite global executives conducted by Harris Interactive on behalf of SAP in April 2012 (“Small and midsize companies look to make big gains with big data,” 2012). The figure shows how executives differed in their understanding of big data: some definitions focused on what it is, while others tried to answer what it does.


Gordon-Murnane, L 2012, 'BIG DATA', Online, 36, 5, pp. 30-34, Business Source Complete, EBSCOhost, viewed 18 February 2015.

'WHAT IS BIG DATA, ANYWAY?' 2013, Public CIO, 11, 1, pp. 6-7, Business Source Complete, EBSCOhost, viewed 18 February 2015.

Gandomi, A & Haider, M 2015, 'Beyond the hype: Big data concepts, methods, and analytics', International Journal of Information Management, 35, 2, pp. 137-144, viewed 18 February 2015.

Small and midsize companies look to make big gains with “big data,” according to recent poll conducted on behalf of SAP (2012, June 26) Retrieved from

Ethics in Big Data

Laws and regulations guide organizations, particularly around privacy and the use of data, defining the current “no-go” areas for an organization. However, recent advancements in analytics and big data technology have widened the gap between what is possible and what is legally allowed, changing the balance of power between individuals and the data collectors. Within this gap are new opportunities alongside the risks of public relations disasters and unintended consequences. It is within this gap that the ethical questions around what is acceptable are raised.

As an organization looks towards applying analytics and big data to enhance the way it operates, how does it know that its use of this technology is ethical? At its core, an organization is “just people,” and so are its customers and stakeholders. It will be individuals who choose what the organization does or does not do, and individuals who will judge its appropriateness. As individuals, our perspectives are formed from our experience and the opinions of those we respect. Not surprisingly, different people will have different opinions on what is an appropriate use of big data and analytics technology in particular, so who decides which is “right”? Customers and stakeholders may hold different opinions from the organization about what is ethical.

This suggests that organizations should be thoughtful in their use of this technology, consulting widely and forming policies that record the decisions and conclusions they have come to. The UK and Ireland Technical Consultancy Group (TCG) created this framework:

Context – For what purpose was the data originally surrendered? For what purpose is the data now being used? How far removed from the original context is its new use? Is this appropriate?

Consent & Choice – What are the choices given to an affected party? Do they know they are making a choice? Do they really understand what they are agreeing to? Do they really have an opportunity to decline? What alternatives are offered?

Reasonable – Are the depth and breadth of the data used and the relationships derived reasonable for the application they are used for?

Substantiated – Are the sources of data used appropriate, authoritative, complete and timely for the application?

Owned – Who owns the resulting insight? What are their responsibilities towards it in terms of its protection and the obligation to act?

Fair – How equitable are the results of the application to all parties? Is everyone properly compensated?

Considered – What are the consequences of the data collection and analysis?

Access – What access to data is given to the data subject?

Accountable – How are mistakes and unintended consequences detected and repaired? Can the interested parties check the results that affect them?

Together these facets are called the ethical awareness framework.
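One way an organization might put the facets above to work is as a simple review checklist that flags which facets have not yet been addressed for a given project. The sketch below is purely illustrative: the facet names and questions come from the framework, but the `FACETS` table, the `unreviewed_facets` function and the sample review record are hypothetical and not part of the TCG framework itself.

```python
# Facet names are from the framework above; each is paired with one of its questions.
FACETS = {
    "Context": "How far removed from the original context is the new use?",
    "Consent & Choice": "Do affected parties really have an opportunity to decline?",
    "Reasonable": "Is the depth and breadth of the data used reasonable for the application?",
    "Substantiated": "Are the data sources appropriate, authoritative, complete and timely?",
    "Owned": "Who owns the resulting insight?",
    "Fair": "How equitable are the results of the application to all parties?",
    "Considered": "What are the consequences of the data collection and analysis?",
    "Access": "What access to data is given to the data subject?",
    "Accountable": "How are mistakes and unintended consequences detected and repaired?",
}


def unreviewed_facets(answers: dict) -> list:
    """Return the facets with no recorded (non-blank) answer, in framework order."""
    return [name for name in FACETS if not answers.get(name, "").strip()]


# Hypothetical review record for an imagined project; a blank answer counts as missing.
review = {
    "Context": "Data was collected for billing and is now used for churn analysis.",
    "Consent & Choice": "Customers can opt out in their account settings.",
    "Access": "   ",
}

for facet in unreviewed_facets(review):
    print(f"{facet}: {FACETS[facet]}")
```

Recording answers facet by facet also produces the kind of written policy trail the text recommends, since each decision is captured alongside the question it responds to.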

Exam Questions

Question 1. What is Big Data? Just because a data set is large, that doesn't automatically make it “Big Data.”

Question 2. Give an example of Big Data use and explain why it was beneficial. The book is filled with examples, and the reader should have gained an understanding of why Big Data was used in each.

Question 3. Explain the difference between Open and Big Data. This is an important comparison between the two themes of the course.

Question 4. Explain why it is beneficial for companies and governments to provide open data. Getting companies to release their data is crucial to obtaining more open data, hence we need to be able to convince them of the future benefits of open data for society.