
Groupwise definition for Big Data

SEE: Groupwise definition summary, big_data_definition.pptx.pdf

We define big data from two different perspectives:

1) Compared to “small data” and “small data processing”

 a.	The heterogeneity and sheer volume of big data make traditional database tools insufficient for processing it.
 b.	Big data consists of high-volume, high-velocity and high-variety information assets (the 3 V's). Two additional V's are sometimes added: veracity and value.

V's and definitions

  1. Volume
    1. large amounts of data (or small and messy)
  2. Variety
    1. the data comes in different forms: databases, images, documents, videos and complex records
  3. Velocity
    1. the content of the data is constantly changing, through the absorption of complementary data collections, through the introduction of previously archived data or legacy collections, and from streamed data arriving from multiple sources

(Jules J. Berman, Ph.D., M.D. - Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information)

2) A new way of thinking. Big data challenges our current way of thinking:

  • “Big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.” At its core, this means applying math to huge quantities of data in order to infer probabilities. [Big Data book]
    • Shift from causation to correlation. The correlation within the data matters more than knowing the reason behind it. E.g. data analysis can show that used orange cars are in better condition than cars of other colors. Knowing why may be interesting, but it is not necessary. Big data answers the question of what rather than why.
  • When the data is “all” there is, we can accept measurement errors. In massive data sets, random errors tend to average out.
  • In science, big data has the potential to change research methods. The traditional way of doing research is to take small random samples and to analyze and draw conclusions from this limited amount of information, so the exactness of measurements and results is important. With “all” data as the data set, there is no longer any need for that. In addition, as more and more data is available in datafied and digitized form, there is less need to go into the field to collect data.
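
The claim that measurement errors average out in massive data sets can be illustrated with a small simulation. This is only a sketch under stated assumptions, not something taken from the sources above: the errors are assumed to be independent, zero-mean Gaussian noise, and the names `mean_abs_error` and `average_error`, the true value 100.0 and the noise level 5.0 are all hypothetical choices made for illustration.

```python
import random
import statistics


def mean_abs_error(true_value, noise_sd, n, seed):
    """Return |sample mean - true value| for n noisy measurements.

    Assumption: each measurement is the true value plus independent,
    zero-mean Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    measurements = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return abs(statistics.fmean(measurements) - true_value)


def average_error(true_value, noise_sd, n, trials=20):
    """Average the error of the sample mean over several seeded trials."""
    return statistics.fmean(
        mean_abs_error(true_value, noise_sd, n, seed) for seed in range(trials)
    )


if __name__ == "__main__":
    # Hypothetical values: true quantity 100.0, noise standard deviation 5.0.
    for n in (10, 1_000, 10_000):
        err = average_error(100.0, 5.0, n)
        print(f"n={n:>6}: average error of the mean = {err:.4f}")
```

Under these assumptions the error of the mean shrinks roughly in proportion to 1/sqrt(n). Note that this only holds for random errors; a systematic bias in the measurements would not be reduced by adding more data.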

Known challenges for Big Data include capture, curation, storage, search, sharing, transfer, analysis, and visualization.