HDFS storage in PieCloudDB Database


I want to use HDFS for data storage, but I don't know how to set it up in PieCloudDB. How do I configure PieCloudDB Database to use HDFS storage?


Solution

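  • Here is an example configuration for HDFS storage in PieCloudDB Database. First, create and start a virtual data warehouse (cluster) for the tenant mytest. In this example the cluster is created with S3-compatible storage (MinIO) and is switched to HDFS in the steps below; the commands that follow use cluster number 2.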

    pdbcli cluster create -c kylin01:3333 -s 1 -m s3 --s3-endpoint <ip-address>:9000 --s3-bucket mytest --s3-user minioadmin --s3-pass minioadmin --s3-region us-east-1 --tenant mytest
    
    pdbcli cluster start -c kylin01:3333 --tenant mytest --cluster 2   ## start the cluster
    
    ps -ef | grep postgres   ## find the port the cluster listens on
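
    With several instances running, reading the port out of the ps output by eye is error-prone. The helper below is a minimal sketch, assuming each postgres process shows its port as "-p PORT" on its command line; list_postgres_ports is a hypothetical name, not part of pdbcli. If your processes do not show -p, note that the instance directory used later in this answer (/home/openpie/cn0/mytest/2/6007) ends in 6007, the same port later passed to psql.

```shell
# Hypothetical helper (not part of pdbcli): print the distinct ports of the
# running postgres processes, one per line.
# Assumption: each instance is started with "-p PORT" on its command line.
list_postgres_ports() {
    # Read `ps -ef` output from stdin, drop the grep process itself,
    # then keep only the number that follows " -p ".
    grep postgres | grep -v grep | sed -n 's/.* -p \([0-9][0-9]*\).*/\1/p' | sort -u
}

# Usage:
#   ps -ef | list_postgres_ports
```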
    

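    Next, create an HDFS client configuration file, hdfs.xml, in a fixed directory (/home/openpie/cc here). Replace "ip address" in the dfs.default.uri value with the address of your HDFS NameNode; 8020 is the usual NameNode RPC port.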

    cd /home/openpie/cc
    vim hdfs.xml   
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
            <property>
                    <name>dfs.default.uri</name>
                    <value>hdfs://ip address:8020</value>
            </property>
            <property>
                    <name>dfs.default.username</name>
                    <value>root</value>
            </property>
            <property>
                    <name>hadoop.security.authentication</name>
                    <value>simple</value>
            </property>
            <property>
                    <name>dfs.nameservices</name>
                    <value>dfs-cluster</value>
            </property>
            <property>
                    <name>dfs.default.replica</name>
                    <value>3</value>
            </property>
            <property>
                    <name>dfs.client.log.severity</name>
                    <value>INFO</value>
            </property>
            <property>
                    <name>rpc.max.idle</name>
                    <value>100</value>
            </property>
    </configuration>
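
    When the same hdfs.xml has to be recreated on several nodes, a heredoc avoids hand-editing it each time. This is a sketch that reproduces the properties above verbatim and only parameterizes the NameNode URI and the HDFS user; write_hdfs_xml is a hypothetical helper name.

```shell
# Hypothetical helper: write hdfs.xml with the same properties as the example.
# Arguments: output file, NameNode URI, HDFS user name.
# Every other value is copied unchanged from the configuration above.
write_hdfs_xml() {
    local out_file="$1" namenode_uri="$2" hdfs_user="$3"
    mkdir -p "$(dirname "$out_file")" || return 1
    cat > "$out_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
        <property>
                <name>dfs.default.uri</name>
                <value>${namenode_uri}</value>
        </property>
        <property>
                <name>dfs.default.username</name>
                <value>${hdfs_user}</value>
        </property>
        <property>
                <name>hadoop.security.authentication</name>
                <value>simple</value>
        </property>
        <property>
                <name>dfs.nameservices</name>
                <value>dfs-cluster</value>
        </property>
        <property>
                <name>dfs.default.replica</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.client.log.severity</name>
                <value>INFO</value>
        </property>
        <property>
                <name>rpc.max.idle</name>
                <value>100</value>
        </property>
</configuration>
EOF
}

# Usage (replace the address with your NameNode):
#   write_hdfs_xml /home/openpie/cc/hdfs.xml hdfs://<ip-address>:8020 root
```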
    

    Copy hdfs.xml to every compute node of the virtual data warehouse that will use HDFS. Keep the same directory path on all nodes, because hdfs_provider.conf (next step) refers to the file by its absolute path.

    scp hdfs.xml kylin02:/home/openpie/cc/
    scp hdfs.xml kylin03:/home/openpie/cc/
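
    With more than a couple of compute nodes, a loop keeps the target path identical on every host. A minimal sketch, assuming passwordless SSH for the openpie user (an assumption, not stated in the original); distribute_file and the DRY_RUN switch are hypothetical. It creates the target directory first, since scp fails when the destination directory does not exist.

```shell
# Hypothetical helper: copy one file into the same directory on every host.
# Assumes passwordless ssh/scp to each host.
# Set DRY_RUN=1 to print the commands instead of running them.
distribute_file() {
    local src="$1" dest_dir="$2" host
    shift 2
    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "ssh $host mkdir -p $dest_dir && scp $src $host:$dest_dir/"
        else
            # Create the directory first: scp does not create missing parents.
            ssh "$host" mkdir -p "$dest_dir" && scp "$src" "$host:$dest_dir/" || return 1
        fi
    done
}

# Usage (same hosts and path as above):
#   distribute_file hdfs.xml /home/openpie/cc kylin02 kylin03
```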
    

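    Next, create the HDFS storage provider configuration file, hdfs_provider.conf, under storage_provider_conf in the installation directory of every coordinator and executor of this virtual data warehouse. provider_name is the name referenced later in postgresql.conf, and hdfs.conf_file must point to the hdfs.xml created above.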

    cd /home/openpie/cn0/mytest/2/6007/storage_provider_conf     
    vim hdfs_provider.conf   
    #------------------------------------------------------------------------------
    # Storage Provider Configuration File
    # BASIC OPTIONS
    #------------------------------------------------------------------------------
    provider_name = 'hdfs-1'
    # provider type: local/nas/hdfs/aws-s3/ali-oss/tencent-cos
    provider_type = 'hdfs'
    #------------------------------------------------------------------------------
    # POSIX STORAGE OPTIONS
    #------------------------------------------------------------------------------
    #posix.base_path = '/tmp/remote'
    #------------------------------------------------------------------------------
    # HDFS STORAGE OPTIONS
    #------------------------------------------------------------------------------
    hdfs.conf_file = '/home/openpie/cc/hdfs.xml'
    #------------------------------------------------------------------------------
    # OBJECT STORAGE OPTIONS
    #------------------------------------------------------------------------------
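
    Because this file has to exist for every coordinator and executor, it can be generated instead of typed in each directory. A sketch, assuming the three uncommented settings above are all the file needs (the commented-out lines are omitted); write_hdfs_provider_conf is hypothetical, and it refuses to write into an instance directory without a storage_provider_conf subdirectory so that a mistyped path fails instead of being silently created.

```shell
# Hypothetical helper: write hdfs_provider.conf for each instance directory.
# Arguments: provider name, absolute path of hdfs.xml, then one or more
# instance directories (each must already contain storage_provider_conf/).
# Assumption: the three uncommented settings are all the file needs.
write_hdfs_provider_conf() {
    local name="$1" conf_file="$2" inst
    shift 2
    for inst in "$@"; do
        # Refuse to create directories, so a mistyped path fails loudly.
        if [ ! -d "$inst/storage_provider_conf" ]; then
            echo "missing directory: $inst/storage_provider_conf" >&2
            return 1
        fi
        cat > "$inst/storage_provider_conf/hdfs_provider.conf" <<EOF
provider_name = '$name'
provider_type = 'hdfs'
hdfs.conf_file = '$conf_file'
EOF
    done
}

# Usage (add one directory argument per coordinator and executor instance):
#   write_hdfs_provider_conf hdfs-1 /home/openpie/cc/hdfs.xml /home/openpie/cn0/mytest/2/6007
```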
    

    Then, edit postgresql.conf in the installation directory of every coordinator and executor of this virtual data warehouse, and set the default storage provider to the provider_name defined above.

    vim postgresql.conf
    ## Uncomment pdb_default_storage_provider and set it to the provider_name of the HDFS provider.
    pdb_default_storage_provider = 'hdfs-1'
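
    Editing postgresql.conf by hand on every instance is easy to get wrong. This sketch uses sed -i.bak to uncomment the existing pdb_default_storage_provider line and set its value, keeping a .bak copy, and appends the setting if the line is missing entirely. set_default_storage_provider is a hypothetical name, and the provider name must not contain "/" because it is placed inside the sed expression.

```shell
# Hypothetical helper: uncomment (if needed) and set pdb_default_storage_provider.
# Arguments: provider name, then one or more postgresql.conf paths.
# Each edited file keeps a backup copy with a .bak suffix.
set_default_storage_provider() {
    local provider="$1" conf
    shift
    for conf in "$@"; do
        if grep -q '^[#[:space:]]*pdb_default_storage_provider[[:space:]]*=' "$conf"; then
            # Replace the (possibly commented) line with an active setting.
            sed -i.bak "s/^[#[:space:]]*pdb_default_storage_provider[[:space:]]*=.*/pdb_default_storage_provider = '$provider'/" "$conf" || return 1
        else
            # No such line at all: back up, then append the setting.
            cp "$conf" "$conf.bak" && printf "pdb_default_storage_provider = '%s'\n" "$provider" >> "$conf" || return 1
        fi
    done
}

# Usage (one path per coordinator and executor instance):
#   set_default_storage_provider hdfs-1 /home/openpie/cn0/mytest/2/6007/postgresql.conf
```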
    

    As the openpie user, restart the virtual data warehouse cluster from the PieCloudDB (PDB) coordinator node so that the new configuration takes effect.

    pdbcli cluster stop -c kylin01:3333 --tenant mytest --cluster 2   ##stop cluster
    pdbcli cluster start -c kylin01:3333 --tenant mytest --cluster 2   ##start cluster
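
    The two commands can be wrapped so that the start is skipped when the stop fails. A sketch that uses only the pdbcli options shown above; restart_cluster is hypothetical, and the PDBCLI variable exists only so the wrapper can be exercised with a stub such as echo.

```shell
# Hypothetical wrapper around the two pdbcli commands above:
# stop the cluster, and start it again only if the stop succeeded.
# PDBCLI defaults to pdbcli; override it (e.g. PDBCLI=echo) to dry-run.
restart_cluster() {
    local coordinator="$1" tenant="$2" cluster="$3"
    "${PDBCLI:-pdbcli}" cluster stop -c "$coordinator" --tenant "$tenant" --cluster "$cluster" || return 1
    "${PDBCLI:-pdbcli}" cluster start -c "$coordinator" --tenant "$tenant" --cluster "$cluster"
}

# Usage (as the openpie user on the coordinator node):
#   restart_cluster kylin01:3333 mytest 2
```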
    

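    Verify that data can be written to and read back from HDFS: create a table, insert 1,000,000 rows, count them, and drop the table. Port 6007 is the port found with ps earlier.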

    echo "create table t1 (c1 int); insert into t1 values (generate_series(1,1000000)); select count(*) from t1; drop table t1;" | psql -p 6007 openpie
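
    For a check that fails loudly, the same statements can be fed to psql through a heredoc with -v ON_ERROR_STOP=1, so psql stops at the first error and exits with a non-zero status. A sketch; hdfs_smoke_test and the PSQL override are hypothetical, and INSERT ... SELECT generate_series(...) is used as an equivalent of the VALUES form above.

```shell
# Hypothetical smoke test: write 1,000,000 rows, read them back, clean up.
# Arguments: port (6007 in this example) and database name (openpie).
# PSQL defaults to psql; override it to point at another client or a stub.
hdfs_smoke_test() {
    local port="$1" db="$2"
    # ON_ERROR_STOP=1 makes psql abort on the first failing statement.
    "${PSQL:-psql}" -v ON_ERROR_STOP=1 -p "$port" -d "$db" <<'SQL'
CREATE TABLE t1 (c1 int);
INSERT INTO t1 SELECT generate_series(1, 1000000);
SELECT count(*) FROM t1;
DROP TABLE t1;
SQL
}

# Usage:
#   hdfs_smoke_test 6007 openpie && echo "HDFS read/write OK"
```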
    

    Check the files written to HDFS in the file browser of the NameNode web UI: http://<ip-address>:9870/explorer.html#/
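
    The same listing is available from the command line through the WebHDFS REST API, assuming WebHDFS is enabled on the NameNode (dfs.webhdfs.enabled, on by default). A sketch that only builds the LISTSTATUS URL; webhdfs_list_url is hypothetical, and user.name is the query parameter WebHDFS uses for simple authentication, matching hadoop.security.authentication = simple and dfs.default.username = root in hdfs.xml. The directory PieCloudDB writes to is not stated here, so start from "/".

```shell
# Hypothetical helper: build a WebHDFS LISTSTATUS URL for one HDFS directory.
# Arguments: NameNode host, NameNode HTTP port (9870 above),
# absolute HDFS path (must start with "/"), HDFS user for simple auth.
webhdfs_list_url() {
    local host="$1" port="$2" path="$3" user="$4"
    printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS&user.name=%s\n' "$host" "$port" "$path" "$user"
}

# Usage (returns a JSON FileStatuses listing):
#   curl -s "$(webhdfs_list_url <ip-address> 9870 / root)"
```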
