
Big Data Definition

For a long time, "big data" referred to amounts of data too large for a single machine to process. With increasing computational power, however, this boundary has become blurred, creating the need for a more precise definition. Gartner, for example, proposed the 3V model, which defines big data as high-volume, high-velocity and high-variety.

  • High-Volume: Today vast amounts of data are generated by users, IT systems and sensors and stored in digital form. The amount of available digital data has increased dramatically over the past years, and this growth is expected to continue.
  • High-Velocity: Refers to the increasing speed at which data is collected. As our lives become more and more digital and the number of sensors and IT systems grows, so does the speed at which data is collected.
  • High-Variety: Big data analysis combines data from different sources and formats. This is helped by the trend to digitise all kinds of data sources, for example in communication, climate and media.


Ethics in Big Data

Today our lives involve more and more digital interactions, from social media to always-connected phones to online shopping. In these interactions we leave footprints (data) that may contain personal information, and a growing number of governments and companies have started to collect and analyse such data.

Users are aware that their data is being collected. At the same time, almost nobody knows which data is collected, whether it will be sold to third parties and which analyses are performed on it. So the question arises: why do users still use these services? The answer may differ from country to country, but for European countries it can be said that a few users trust these services, whereas the majority have at least some concerns but feel they can do nothing about it.

This seems contradictory, since the European Union has strong privacy regulations. In a globalised digital world, however, borders are less visible. For example, a US company that stores data from EU users on a server in the European Union faces a dilemma: the European Union argues that EU privacy rules apply to this data, whereas the US government disagrees. Unfortunately, privacy rules vary a lot from country to country: Russia and the US, for example, offer comparatively weak privacy protection, whereas the EU has strict rules protecting the privacy of its citizens.

In my humble opinion, it is in every company's interest to provide its users with the best possible user experience. This includes not only good design and a rich set of features but also transparency about which data is collected and how it is used. Such information should not be hidden in long terms and conditions; instead, it should be presented in a graphical, easy-to-understand way. Users should also be able to opt out of such uses, and if they end their contract with the company, all their data should be deleted.

Exam questions

  1. In the second chapter [p. 26] the author states: “The technical tools for handling data have already changed dramatically, but our methods and mind-sets have been slower to adapt.” Discuss whether we as a society should adapt faster to the new possibilities offered by big data, accepting restrictions on privacy in exchange for the benefits of big data. Include at least three benefits and three risks in your discussion. The question combines the student's knowledge of big data with their critical argumentation skills.
  2. For centuries, humans have put great effort into explaining past events through reasoning, and the causes they identified were used to influence the future. In big data, however, “Predictions based on correlations lie at the heart of big data.” [p. 55] Explain what correlations are and discuss why big data will not completely replace reasoning. Correlations are a very important part of big data, but they have limitations, especially with non-linear relationships, which a linear correlation measure can miss. Reasoning will also not be eliminated entirely, because big data only shows how likely something is to happen, not why it happens.
  3. Name five reasons why companies should publish their data, and explain why it is important to publish even unpromising data. The book outlines several cases in which publishing data brought benefits, and gives one example in which unpublished data had a negative outcome for society.
  4. Discuss whether citizens should have access to government data. Citizens might be able to analyse the data themselves, but isn't it the government's job to provide them with this information? Open data can easily be misused, for example when certain information is buried in large data sets and users get lost in their sheer size. In addition, the responsibility for informing citizens is shifted to third parties.
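Question 2 notes that correlations have limitations with non-linear relationships. The short Python sketch below illustrates this, assuming the Pearson correlation coefficient is the measure meant (the question does not name one). The helper name `pearson` is hypothetical. In the example, y = x² is completely determined by x, yet over a range symmetric around zero its Pearson coefficient is 0, while a linear dependence scores 1.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (the common 1/n factors cancel out in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = list(range(-5, 6))            # -5 .. 5, symmetric around 0
linear = [2 * x + 1 for x in xs]   # perfectly linear dependence on x
quadratic = [x * x for x in xs]    # fully determined by x, but non-linear

print(round(pearson(xs, linear), 3))     # 1.0
print(round(pearson(xs, quadratic), 3))  # 0.0
```

A coefficient of 0 here does not mean "no relationship", only "no linear relationship". Conversely, a high coefficient says nothing about why two variables move together, which is why reasoning remains necessary. Python 3.10+ also ships `statistics.correlation`, which computes the same Pearson coefficient.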

Presentations

Statement of Accomplishment