
Why are videos considered unstructured data in the context of Big Data?


I am trying to delve into Big Data, and a few of the terms I came across are structured and unstructured data. I understand what structured and unstructured data mean.

I am having difficulty understanding why videos and photos fall under the category of unstructured data.

Can anyone please help me understand this?


Solution

  • Most definitions of 'structured' data refer to data with a high degree of organization, usually meaning a predefined data schema. A schema generally consists of a number of fields in a specific order, each containing just one type of data, much like a classic DB table:

    userId,username,age,location,joinedOn
    12,"Polly",20,"Washington DC","2016-02-23 13:34:01"
    14,"Dan",19,"San Diego CA","2016-11-10 18:32:21"
    15,"Shania",36,"","2017-01-04 10:46:39"
    

    In this case, you have two String fields (username, location), two Integer fields (userId, age), and a Date/Time field (joinedOn). In a Big Data context, this allows for convenient querying and processing, vastly improved compression, and efficient storage, all of which can become difficult problems as data volumes grow.

    Now consider images, which can be represented in many different ways: simple bitmaps, vectors, progressive JPEGs, formats with built-in variable compression, fractals, containers of animation frames, etc. On top of that, images differ in size, color palette, and metadata, and all of this variation means you can't really treat two images with different properties as instances of one data schema (so you don't get the benefits of column-oriented storage, compression, or querying).

    As for videos, all of the above still applies, and in addition there are container formats that can hold several video (and audio) streams, each encoded with its own codec and compression scheme, adding further complexity.
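The table example above can be made concrete with a minimal sketch, assuming the sample rows are plain CSV text. Every row is parsed against the same fixed schema, so each column ends up holding values of exactly one type. The names `SCHEMA`, `parse_rows` and `to_columns` are hypothetical, chosen only for illustration.

```python
import csv
import io
from datetime import datetime

# The sample table from the answer, as CSV text.
SAMPLE = '''userId,username,age,location,joinedOn
12,"Polly",20,"Washington DC","2016-02-23 13:34:01"
14,"Dan",19,"San Diego CA","2016-11-10 18:32:21"
15,"Shania",36,"","2017-01-04 10:46:39"
'''

# Hypothetical schema: every field has a fixed position and exactly one type.
SCHEMA = {
    "userId": int,
    "username": str,
    "age": int,
    "location": str,
    "joinedOn": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
}

def parse_rows(text):
    """Parse CSV text, converting each field with the type given in SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: convert(row[name]) for name, convert in SCHEMA.items()}
            for row in reader]

def to_columns(rows):
    """Pivot rows into one list per field (a column-oriented layout)."""
    return {name: [row[name] for row in rows] for name in SCHEMA}

rows = parse_rows(SAMPLE)
columns = to_columns(rows)

# Because every value in a column has the same type, queries stay simple.
older_than_20 = [r["username"] for r in rows if r["age"] > 20]
print(older_than_20)    # ['Shania']
print(columns["age"])   # [20, 19, 36]
```

Storing each column as a homogeneous list is what lets columnar formats compress and scan data efficiently; a photo or video has no such fixed set of typed fields to split into columns.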
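The point about images above can be illustrated with a minimal sketch that only identifies an image's format from its leading "magic" bytes and reads its dimensions. The byte signatures are the standard PNG, JPEG, GIF and BMP file signatures; the function names are hypothetical. Even this tiny step needs a separate rule per format (PNG stores width and height big-endian in its IHDR chunk, GIF stores them little-endian right after its header), and a real pipeline would need many more such rules.

```python
import struct
from typing import Tuple

# Standard file signatures ("magic numbers") for a few common image formats.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]

def sniff_format(data: bytes) -> str:
    """Return a format name based on the file's leading bytes."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

def png_dimensions(data: bytes) -> Tuple[int, int]:
    """PNG: 8-byte signature, then the IHDR chunk; width/height are big-endian at bytes 16-23."""
    if sniff_format(data) != "png" or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height = struct.unpack(">II", data[16:24])
    return width, height

def gif_dimensions(data: bytes) -> Tuple[int, int]:
    """GIF: 6-byte header, then width/height as little-endian 16-bit values."""
    if sniff_format(data) != "gif":
        raise ValueError("not a GIF")
    width, height = struct.unpack("<HH", data[6:10])
    return width, height

# Synthetic headers (not complete image files), just enough to exercise the parsers.
fake_png = (b"\x89PNG\r\n\x1a\n" + struct.pack(">I", 13) + b"IHDR"
            + struct.pack(">II", 640, 480) + bytes(5))
fake_gif = b"GIF89a" + struct.pack("<HH", 32, 16)

print(sniff_format(fake_png), png_dimensions(fake_png))  # png (640, 480)
print(sniff_format(fake_gif), gif_dimensions(fake_gif))  # gif (32, 16)
```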
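The container point about videos above can be sketched for one format, assuming an MP4 (ISO Base Media File Format) file. Such a file is a sequence of "boxes", each starting with a 4-byte big-endian size and a 4-byte type such as `ftyp`, `moov` or `mdat`. The hypothetical `list_top_level_boxes` below only walks the top level. The individual tracks (`trak` boxes) and the codecs they use are described deeper inside `moov`, so this is a sketch of the outer layer, not a full MP4 parser.

```python
import struct

def list_top_level_boxes(data: bytes):
    """Yield (type, size) for each top-level box of an ISO BMFF (MP4) byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif size == 0:
            # Size 0 means the box extends to the end of the file.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"corrupt box at offset {offset}")
        yield box_type.decode("latin-1"), size
        offset += size

def box(box_type: bytes, payload: bytes) -> bytes:
    """Build a minimal box with a 32-bit size header (test helper)."""
    return struct.pack(">I", 8 + len(payload)) + box_type + payload

# A synthetic file: the payloads are placeholders, not valid MP4 contents.
fake_mp4 = (box(b"ftyp", b"isom\x00\x00\x02\x00")
            + box(b"moov", b"\x00" * 16)
            + box(b"mdat", b"\x00" * 32))

print(list(list_top_level_boxes(fake_mp4)))
# [('ftyp', 16), ('moov', 24), ('mdat', 40)]
```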