I know how to import and analyse structured and semi-structured data in Hadoop using Pig, Hive and Sqoop (with a JSON loader and a JSON SerDe), but how do I import unstructured data such as video, audio or images, and how do I analyse it further? Please explain in a simple, step-by-step way. A use case of analysing unstructured data would be a great help. Thank you!
Since Hadoop does not handle large numbers of small files well (every file and block takes up NameNode memory, and each small file usually ends up as its own map task), one approach is to group the binary files into a small number of large files.
To do so, you could convert your binary files (images, audio, video, etc.) into sequence files using custom UDFs that aggregate them and store them in HDFS.
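To make the packing idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not Hadoop's SequenceFile binary format and not a Pig UDF: `pack_files` and `iter_records` are hypothetical helpers that only mirror the SequenceFile idea of storing each small file as one key/value record (key = file name, value = raw bytes) inside a single large file. In a real pipeline you would write actual SequenceFiles (for example with Hadoop's `SequenceFile.Writer`, using the file name as the key and the file bytes as the value) and copy them into HDFS.

```python
import os
import struct
from typing import Iterable, Iterator, Tuple

# Hypothetical record layout for illustration only (NOT Hadoop's SequenceFile format):
# 4-byte big-endian key length, key bytes (UTF-8 file name),
# 8-byte big-endian value length, value bytes (raw file contents).
_KEY_LEN = struct.Struct(">I")
_VAL_LEN = struct.Struct(">Q")


def pack_files(paths: Iterable[str], out_path: str) -> int:
    """Write every file in `paths` into one container file; return the record count."""
    count = 0
    with open(out_path, "wb") as out:
        for path in paths:
            key = os.path.basename(path).encode("utf-8")
            with open(path, "rb") as f:
                value = f.read()
            out.write(_KEY_LEN.pack(len(key)))
            out.write(key)
            out.write(_VAL_LEN.pack(len(value)))
            out.write(value)
            count += 1
    return count


def iter_records(container_path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (file name, raw bytes) pairs back out of a container file."""
    with open(container_path, "rb") as f:
        while True:
            header = f.read(_KEY_LEN.size)
            if not header:
                return  # clean end of the container
            (key_len,) = _KEY_LEN.unpack(header)
            key = f.read(key_len).decode("utf-8")
            (val_len,) = _VAL_LEN.unpack(f.read(_VAL_LEN.size))
            yield key, f.read(val_len)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        # Tiny fake "image" and "audio" files standing in for real binaries.
        samples = {"cat.jpg": b"\xff\xd8\xff", "clip.wav": b"RIFF\x00\x00"}
        paths = []
        for name, data in samples.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(data)
            paths.append(path)

        container = os.path.join(tmp, "packed.bin")
        print(pack_files(paths, container))  # prints 2
        for name, data in iter_records(container):
            print(name, len(data))  # prints "cat.jpg 3", then "clip.wav 6"
```

Reading the records back gives each file's name and bytes, which is the shape a downstream job (a Pig UDF or a MapReduce mapper, for instance) would work with when processing one image or audio clip at a time.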
The book Pig Design Patterns covers design patterns for this topic (see Chapter 2):
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781783285556
Some code snippets from that chapter are available on GitHub:
https://github.com/pradeep-pasupuleti/pig-design-patterns/blob/master/Chapter2
Hope this helps!