hadoop, testing, hdfs, sqoop, bigdata

How do we test data migrated from an RDBMS to HDFS using Sqoop?


  1. How does a tester verify that the data has been moved from an RDBMS to HDFS? Please explain only from a testing perspective.

  2. What is the difference between moving data from an RDBMS to HDFS and moving data from an RDBMS to Hive? As far as I know, Hive is not a database, so why move data into Hive?


Solution

  • The topic is quite broad; I will try to answer in simple terms.

    How does a tester verify that the data has been moved from an RDBMS to HDFS? Please explain only from a testing perspective.

    This is what we did in the past. Once the migration had run, we wrote a set of test scripts that queried the RDBMS for a good number of random records, built the primary key of each record, looked those keys up in the Hive tables, and did a head-to-head comparison of the two result sets.

    What is the difference between moving data from an RDBMS to HDFS and moving data from an RDBMS to Hive? As far as I know, Hive is not a database, so why move data into Hive?

    When you move data to HDFS, you store the entire dataset as files in the file system (HDFS). Hive is essentially a SQL layer over those same files: it gives you a SQL interface to read and write the data, while keeping the table definitions (the schema) in its metastore. Hive is not an actual database, but it can be used like one.

    Consider that your underlying file is a simple CSV. When creating a Hive table, you provide the delimiter, the file location (an HDFS directory), the column information and a couple of other parameters, and Hive will present that file as if it were a table.

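    After this you can add records through the Hive table or by editing the CSV directly; since the table is only a view over the file, both routes change the same data. (Row-level UPDATE and DELETE in Hive require a transactional, ACID table, which a plain CSV-backed table is not.)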
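The sampling check described at the start of this answer (random source records, primary keys, lookup in Hive, head-to-head match) can be sketched in Python as follows. This is a minimal illustration, not the scripts we actually used: `make_key` and `compare_sample` are hypothetical helpers, and both sides are passed in as in-memory lists of dicts. In practice you would fetch the source sample from the RDBMS and the matching rows from Hive with whatever client your environment provides.

```python
import random
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def make_key(row: Row, key_cols: Sequence[str]) -> Key:
    """Build a (possibly composite) primary-key tuple from a row."""
    return tuple(row[c] for c in key_cols)


def compare_sample(
    source_rows: List[Row],
    target_rows: Iterable[Row],
    key_cols: Sequence[str],
    sample_size: int,
    seed: int = 0,
) -> Dict[str, List[Key]]:
    """Sample source (RDBMS) rows and match them head-to-head against target (Hive) rows."""
    rng = random.Random(seed)  # fixed seed so a failing run can be reproduced
    sample = rng.sample(source_rows, min(sample_size, len(source_rows)))

    # Index the target rows by primary key so each lookup is a dict access.
    target_by_key = {make_key(r, key_cols): r for r in target_rows}

    report: Dict[str, List[Key]] = {"matched": [], "missing": [], "mismatched": []}
    for src in sample:
        key = make_key(src, key_cols)
        tgt = target_by_key.get(key)
        if tgt is None:
            report["missing"].append(key)  # PK not found in Hive at all
        elif any(src[col] != tgt.get(col) for col in src):
            # Same PK but at least one column differs. Real data may need type
            # normalization first (dates, decimals, NULL markers).
            report["mismatched"].append(key)
        else:
            report["matched"].append(key)
    return report


if __name__ == "__main__":
    source = [{"id": i, "name": f"user{i}"} for i in range(1, 101)]
    target = [dict(r) for r in source if r["id"] != 42]  # simulate a lost row
    target[0]["name"] = "corrupted"                      # simulate a bad value
    report = compare_sample(source, target, ["id"], sample_size=100)
    print({k: len(v) for k, v in report.items()})
    # {'matched': 98, 'missing': 1, 'mismatched': 1}
```

Separating "missing" from "mismatched" is useful because they usually point to different problems: a missing key suggests rows were dropped during import, while a mismatch suggests a type, encoding or delimiter issue.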
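To make the "Hive presents the file as a table" idea concrete, here is a small Python sketch of schema-on-read. `create_table_ddl` renders a HiveQL `CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ... STORED AS TEXTFILE LOCATION ...` statement from the delimiter, columns and location, and `read_as_table` applies that same schema to raw delimited text only when it is read. Both helpers, the simplified type mapping and the `/data/users` path are assumptions for illustration; the NULL handling only roughly mimics Hive's default text format.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Column = Tuple[str, str]  # (column name, Hive type name)

# Simplified, assumed mapping from a few Hive primitive types to Python parsers.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "INT": int,
    "BIGINT": int,
    "DOUBLE": float,
    "STRING": str,
}


def create_table_ddl(table: str, columns: List[Column], delimiter: str, location: str) -> str:
    """Render a HiveQL statement that lays a table over delimited text files."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )


def read_as_table(raw: str, columns: List[Column], delimiter: str) -> Iterator[Dict[str, Any]]:
    """Apply the schema at read time, roughly like Hive's default text format."""
    for line in raw.splitlines():
        if not line:
            continue  # skip blank lines in this sketch
        fields = line.split(delimiter)  # plain split: no CSV quote handling
        row: Dict[str, Any] = {}
        # Fields beyond the schema are ignored, as the loop only walks the columns.
        for i, (name, typ) in enumerate(columns):
            if i >= len(fields):
                row[name] = None  # missing trailing field reads as NULL
                continue
            try:
                row[name] = PARSERS[typ](fields[i])
            except ValueError:
                row[name] = None  # unparseable value reads as NULL
        yield row


if __name__ == "__main__":
    columns = [("id", "INT"), ("name", "STRING"), ("score", "DOUBLE")]
    print(create_table_ddl("users", columns, ",", "/data/users"))  # placeholder path
    data = "1,alice,9.5\n2,bob,n/a\n"
    print(list(read_as_table(data, columns, ",")))
    data += "3,carol,7.0\n"  # "editing the CSV directly"
    print(len(list(read_as_table(data, columns, ","))))  # 3
```

Because the schema is only applied at read time, appending a line to the underlying text immediately shows up as a new row, which is why editing the CSV directly and writing through the table end up touching the same data.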