I want to use HDFS for data storage, but I don't know how to set it up in PieCloudDB. How can I configure the PieCloudDB database to use HDFS storage?
Here is a worked example of configuring HDFS storage in PieCloudDB Database. First, create and start a cluster:
pdbcli cluster create -c kylin01:3333 -s 1 -m s3 --s3-endpoint ip address:9000 --s3-bucket mytest --s3-user minioadmin --s3-pass minioadmin --s3-region us-east-1 --tenant mytest
pdbcli cluster start -c kylin01:3333 --tenant mytest --cluster 2 ## start the cluster
ps -ef | grep postgres ## check which port the cluster listens on
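A small refinement when scanning ps output: bracketing the first character of the pattern keeps the grep process itself out of the listing, which saves the usual extra `grep -v grep` pipe.

```shell
# '[p]ostgres' still matches lines containing "postgres", but the grep
# process's own command line shows "[p]ostgres", which the pattern does
# NOT match - so grep never lists itself.
ps -ef | grep '[p]ostgres'
```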
Next, create an HDFS client configuration file and place it in a fixed directory:
cd /home/openpie/cc
vim hdfs.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.default.uri</name>
    <value>hdfs://ip address:8020</value>
  </property>
  <property>
    <name>dfs.default.username</name>
    <value>root</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>simple</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>dfs-cluster</value>
  </property>
  <property>
    <name>dfs.default.replica</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.client.log.severity</name>
    <value>INFO</value>
  </property>
  <property>
    <name>rpc.max.idle</name>
    <value>100</value>
  </property>
</configuration>
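Before wiring PieCloudDB to HDFS, it can be worth sanity-checking the file and the NameNode connection. The sketch below assumes xmllint and the Hadoop CLI are installed on the node; `<namenode-host>` is a placeholder for the NameNode address you put in hdfs.xml.

```shell
# Verify the config file is well-formed XML (assumes xmllint is installed).
xmllint --noout /home/openpie/cc/hdfs.xml && echo "hdfs.xml is well-formed"

# Verify the NameNode is reachable with the Hadoop CLI, if available.
# Replace <namenode-host> with the host used in dfs.default.uri.
hdfs dfs -ls hdfs://<namenode-host>:8020/
```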
The hdfs.xml file must be copied to every compute node of the virtual data warehouse that will use HDFS. Using the same directory path on all nodes is recommended.
scp hdfs.xml kylin02:/home/openpie/cc/
scp hdfs.xml kylin03:/home/openpie/cc/
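With more than a couple of nodes, a small loop saves typing. This is a sketch using the example hostnames from this walkthrough; it assumes passwordless SSH between the nodes, and you should substitute your own node list.

```shell
# Copy hdfs.xml to the same path on every compute node.
# NODES lists the example hosts from this walkthrough - adjust as needed.
NODES="kylin02 kylin03"
for node in $NODES; do
  scp /home/openpie/cc/hdfs.xml "${node}:/home/openpie/cc/"
done
```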
Then, create the HDFS provider configuration file in the installation directory of all coordinators and executors of this virtual data warehouse.
cd /home/openpie/cn0/mytest/2/6007/storage_provider_conf
vim hdfs_provider.conf
#------------------------------------------------------------------------------
# Storage Provider Configuration File
# BASIC OPTIONS
#------------------------------------------------------------------------------
provider_name = 'hdfs-1'
# provider type: local/nas/hdfs/aws-s3/ali-oss/tencent-cos
provider_type = 'hdfs'
#------------------------------------------------------------------------------
# POSIX STORAGE OPTIONS
#------------------------------------------------------------------------------
#posix.base_path = '/tmp/remote'
#------------------------------------------------------------------------------
# HDFS STORAGE OPTIONS
#------------------------------------------------------------------------------
hdfs.conf_file = '/home/openpie/cc/hdfs.xml'
#------------------------------------------------------------------------------
# OBJECT STORAGE OPTIONS
#------------------------------------------------------------------------------
Then, modify the postgresql.conf configuration file in the installation directory of every coordinator and executor of this virtual data warehouse.
vim postgresql.conf
## Uncomment pdb_default_storage_provider and set it to the provider_name of the HDFS provider.
pdb_default_storage_provider = 'hdfs-1'
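Since the same one-line edit is needed on every coordinator and executor, it can be scripted instead of made by hand in vim. A sketch, assuming GNU sed (for -i) and the coordinator path used in this walkthrough; run the equivalent on each node.

```shell
# Uncomment (if commented) and set pdb_default_storage_provider in
# postgresql.conf. The path is the example coordinator directory from
# this walkthrough - adjust per node.
CONF=/home/openpie/cn0/mytest/2/6007/postgresql.conf
sed -i "s/^#\{0,1\}[[:space:]]*pdb_default_storage_provider.*/pdb_default_storage_provider = 'hdfs-1'/" "$CONF"
grep '^pdb_default_storage_provider' "$CONF"
```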
Restart the virtual data warehouse cluster on the coordinator node of PDB using the openpie user.
pdbcli cluster stop -c kylin01:3333 --tenant mytest --cluster 2 ##stop cluster
pdbcli cluster start -c kylin01:3333 --tenant mytest --cluster 2 ##start cluster
Test the read and write capabilities of HDFS.
echo "create table t1 (c1 int); insert into t1 values (generate_series(1,1000000)); select count(*) from t1; drop table t1;" | psql -p 6007 openpie
Finally, view the written files in the HDFS web interface: http://ipaddress:9870/explorer.html#/
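If you prefer the command line to the browser, the same listing is available through the NameNode's WebHDFS REST API on the HTTP port from the URL above ("ipaddress" remains a placeholder for your NameNode host).

```shell
# List the HDFS root directory via WebHDFS. LISTSTATUS is the standard
# WebHDFS directory-listing operation; the response is JSON.
NAMENODE="ipaddress:9870"
HDFS_PATH="/"
curl -s "http://${NAMENODE}/webhdfs/v1${HDFS_PATH}?op=LISTSTATUS"
```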