Detailed steps for building and installing a Greenplum cluster

 

 

Table of contents

1. Installation instructions

1.1 Environment Description

1.2 Cluster Introduction

2. Installation environment preparation

2.1 Modify the names of each node

2.1.1 Modify the hosts of the master node

2.1.2 Copy to child nodes

2.1.3 Modify each node /etc/sysconfig/network file in turn

2.2 Modify the system kernel parameters (/etc/sysctl.conf)

2.3 Modify the number of processes (/etc/security/limits.d/90-nproc.conf)

2.4 Turn off firewalls and modify the /etc/selinux/config file

2.5 Copy the master node configuration to the child node

2.6 Create gpadmin user (all nodes)

3. Install Greenplum DB

3.1 Install Greenplum DB on the Master node

3.2 Create the cluster hostlist files and open up password-free access between nodes

3.2.1 Create a hostlist containing all node host names:

3.2.2 Create a seg_hosts that contains the host names of all Segment Hosts:

3.2.3 Configure ssh password-free connection:

3.3 Install Greenplum DB on the Segment node

4. Initialize the database

4.1 Create a resource directory

4.2 Environment variable configuration

4.2.1 Configuring environment variables on the master node

4.2.2 Then copy to each child node in turn:

4.2.3 Let the environment variable take effect:

4.3 NTP configuration

4.4 Check connectivity before initialization

4.5 Execute initialization

5. Database operations

5.1 Stop and start the cluster

5.2 Log in to the database

5.3 Cluster Status

6. Greenplum's hadoop environment configuration



1. Installation instructions

1.1 Environment Description

 

Name                Version
------------------  -----------------------------------
Operating system    CentOS 64bit
Greenplum           greenplum-db-5.0.0-rhel6-x86_64.rpm

 

1.2 Cluster introduction

The cluster uses 1 master and n segments. Example:

192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4

Here 192.168.0.1 is the master; the rest are segments.

2. Installation environment preparation

2.1 Modify the names of each node

2.1.1 Modify the hosts of the master node

Note: this mainly prepares the nodes so that the Greenplum hosts can address each other by name later.

[root@gp-master ~]# vi /etc/hosts
127.0.0.1   localhost  localhost4 localhost4.localdomain4
::1         localhost  localhost6 localhost6.localdomain6
192.168.0.1   gp-master gp-master
192.168.0.2   gp-sdw1  gp-sdw1  
192.168.0.3   gp-sdw2  gp-sdw2 
192.168.0.4   gp-sdw3  gp-sdw3 

2.1.2 Copy to child nodes

After configuring the file on the master node, copy it to the other child nodes:

scp /etc/hosts gp-sdw1:/etc
scp /etc/hosts gp-sdw2:/etc
scp /etc/hosts gp-sdw3:/etc

2.1.3 Modify each node /etc/sysconfig/network file in turn

Also modify the file /etc/sysconfig/network on every machine as follows (this configuration differs per node and cannot simply be copied; every machine must be edited):

[root@gp-master ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=gp-master

The HOSTNAME here must be consistent with the host name in /etc/hosts. Finally, you can use ping gp-sdw1 (the node name) to test whether the configuration is complete.
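As a quick sanity check, a small loop verifies that every node resolves and answers (a minimal sketch; host names assume the example cluster above):

for host in gp-master gp-sdw1 gp-sdw2 gp-sdw3; do
    ping -c 1 $host > /dev/null && echo "$host OK"   # one ping per node, print OK on success
done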

2.2 Modify the master node system kernel parameters (/etc/sysctl.conf)

(Note: configure this on the master node first; once it is complete, section 2.5 copies it to the other nodes.)

vi /etc/sysctl.conf

 kernel.shmmni = 4096
 kernel.shmall = 4000000000
 kernel.sem = 250 512000 100 2048
 kernel.sysrq = 1
 kernel.core_uses_pid = 1
 kernel.msgmnb = 65536
 kernel.msgmax = 65536
 kernel.msgmni = 2048
 net.ipv4.tcp_syncookies = 1
 net.ipv4.ip_forward = 0
 net.ipv4.tcp_tw_recycle = 1
 net.ipv4.tcp_max_syn_backlog = 4096
 net.ipv4.conf.default.arp_filter = 1
 net.ipv4.ip_local_port_range = 1025 65535
 net.core.netdev_max_backlog = 10000
 net.core.rmem_max = 2097152
 net.core.wmem_max = 2097152
 vm.overcommit_memory = 2 ### In a test environment remove this (set the value to 1), otherwise oracle cannot start

 

2.3 Modify the master node number of processes (/etc/security/limits.d/90-nproc.conf)

(Note: configure this on the master node first; once it is complete, section 2.5 copies it to the other nodes.)

vi /etc/security/limits.d/90-nproc.conf
*          soft    nproc     131072
root       soft    nproc     unlimited

 

2.4 Turn off all node firewalls and modify the master node /etc/selinux/config file

(Note: configure this on the master node first; once it is complete, section 2.5 copies it to the other nodes.)

Turn off the firewall:        service iptables stop
Disable the firewall at boot: chkconfig iptables off
Check the firewall status:    service iptables status

 

vi /etc/selinux/config 


# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
#     targeted - Targeted processes are protected,
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted

2.5 Copy the master node configuration to the child node

Copy to each child node in turn

scp /etc/sysctl.conf gp-sdw1:/etc
scp /etc/security/limits.d/90-nproc.conf gp-sdw1:/etc/security/limits.d/
scp /etc/selinux/config gp-sdw1:/etc/selinux

Finally, make the configuration take effect on each node:

[root@gp-master ~]# sysctl -p  (run on every node to apply the kernel settings)
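To avoid repeating the scp commands for every child node, a small loop works just as well (a minimal sketch; node names assume the example cluster):

for host in gp-sdw1 gp-sdw2 gp-sdw3; do
    scp /etc/sysctl.conf ${host}:/etc                                      # kernel parameters
    scp /etc/security/limits.d/90-nproc.conf ${host}:/etc/security/limits.d/  # process limits
    scp /etc/selinux/config ${host}:/etc/selinux                           # selinux setting
    ssh ${host} sysctl -p                                                  # apply kernel settings remotely
done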

 

2.6 Create the gpadmin user (all nodes)

groupadd -g 530 gpadmin
useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
chown -R gpadmin:gpadmin /home/gpadmin
echo "gpadmin" | passwd --stdin gpadmin

3. Install Greenplum DB

3.1 Install Greenplum DB on the master node

The installation package is in rpm format, so run the rpm install command:

rpm -ivh greenplum-db-5.0.0-rhel6-x86_64.rpm

The default installation path is /usr/local; afterwards you need to give gpadmin ownership of it:

chown -R gpadmin:gpadmin /usr/local

 

3.2 Create the cluster hostlist files and open up password-free access between nodes

3.2.1 Create a hostlist containing all node host names:

su - gpadmin
mkdir -p /home/gpadmin/conf
vi /home/gpadmin/conf/hostlist

gp-master
gp-sdw1
gp-sdw2
gp-sdw3

3.2.2 Create a seg_hosts file containing the host names of all segment hosts:

vi /home/gpadmin/conf/seg_hosts

gp-sdw1
gp-sdw2
gp-sdw3

3.2.3 Configure ssh password-free connection:

[root@gp-master ~]# su - gpadmin
 [gpadmin@gp-master ~]$ source /usr/local/greenplum-db/greenplum_path.sh
 [gpadmin@gp-master ~]$ gpssh-exkeys -f /home/gpadmin/conf/hostlist

 [STEP 1 of 5] create local ID and authorize on local host
   ... /home/gpadmin/.ssh/id_rsa file exists ... key generation skipped

 [STEP 2 of 5] keyscan all hosts and update known_hosts file

 [STEP 3 of 5] authorize current user on remote hosts
   ... send to gp-sdw1
   ... send to gp-sdw2
   ... send to gp-sdw3
#Tip: at this point you are prompted for the gpadmin user password of each child node
 [STEP 4 of 5] determine common authentication file content

 [STEP 5 of 5] copy authentication files to all remote hosts
   ... finished key exchange with gp-sdw1
   ... finished key exchange with gp-sdw2
   ... finished key exchange with gp-sdw3

 [INFO] completed successfully

Test whether the password-free connection is successful:

[gpadmin@gp-master ~]$ ssh gp-sdw1 #Log in without a password

or:

[gpadmin@gp-master ~]$ gpssh -f /home/gpadmin/conf/hostlist

=> pwd
[gp-sdw1] /home/gpadmin
[gp-sdw3] /home/gpadmin
[gp-sdw2] /home/gpadmin
[gp-master] /home/gpadmin
=> exit

If you see output like the above, the key exchange succeeded.

3.3 Install Greenplum DB on the segment nodes

On each child node, grant gpadmin ownership of the target folders:

chown -R gpadmin:gpadmin /usr/local
chown -R gpadmin:gpadmin /opt

Pack the installation package on the master node and copy it to each child node:

[gpadmin@mdw conf]$ cd /usr/local/
 Pack:
 [gpadmin@mdw local]$ tar -cf greenplum-db-5.0.0.tar greenplum-db-5.0.0/
[gpadmin@mdw local]$ gpscp -f /home/gpadmin/conf/seg_hosts greenplum-db-5.0.0.tar =:/usr/local/

If nothing went wrong, the batch copy succeeded; you can check the corresponding folder on each child node. After that the tar package needs to be decompressed, which we do on all child nodes in one batch:

[gpadmin@mdw conf]$ source /usr/local/greenplum-db/greenplum_path.sh
 [gpadmin@mdw conf]$ gpssh -f /home/gpadmin/conf/seg_hosts #Process all child nodes together

 => cd /usr/local
 [sdw3]
 [sdw1]
 [sdw2]
=> tar -xf greenplum-db-5.0.0.tar
 [sdw3]
 [sdw1]
 [sdw2]

 #Create a soft link
 => ln -s ./greenplum-db-5.0.0 greenplum-db
 [sdw3]
 [sdw1]
 [sdw2]
=> ll    (use ll to check whether the installation succeeded)
 => exit  (quit the gpssh session)

This completes the installation of all nodes.

4. Initialize the database

4.1 Create a resource directory

source /usr/local/greenplum-db/greenplum_path.sh
 gpssh -f /home/gpadmin/conf/hostlist #Process all nodes together

#Create the following series of directories under /opt/greenplum/data (adjust the number of directories to your requirements)
 => mkdir -p /opt/greenplum/data/master
 => mkdir -p /opt/greenplum/data/primary
 => mkdir -p /opt/greenplum/data/mirror
 => mkdir -p /opt/greenplum/data2/primary
 => mkdir -p /opt/greenplum/data2/mirror
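Before moving on, it is easy to confirm from the same gpssh session that the directories exist on every node (an optional check):

 => ls -ld /opt/greenplum/data/* /opt/greenplum/data2/*   (every node should list master/primary/mirror directories)
 => exit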

4.2 Environment variable configuration

4.2.1 Configuring environment variables on the master node

vi /home/gpadmin/.bash_profile and add at the end:

 source /usr/local/greenplum-db/greenplum_path.sh
 export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
 export PGPORT=5432
 export PGDATABASE=gp_sydb

4.2.2 Then copy to each child node in turn:

scp /home/gpadmin/.bash_profile gp-sdw1:/home/gpadmin/
scp /home/gpadmin/.bash_profile gp-sdw2:/home/gpadmin/
scp /home/gpadmin/.bash_profile gp-sdw3:/home/gpadmin/

4.2.3 Let the environment variables take effect:

source .bash_profile
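To verify that the variables are active (the expected values are the ones set in section 4.2.1):

echo $MASTER_DATA_DIRECTORY   # expect /opt/greenplum/data/master/gpseg-1
which gpstart                 # expect /usr/local/greenplum-db/bin/gpstart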

 

4.3 NTP Configuration

Enable ntp on the master node, then configure and enable NTP on the segment nodes:

echo "server gp-master prefer" >> /etc/ntp.conf
gpssh -f /home/gpadmin/conf/hostlist -v -e 'sudo ntpd'
gpssh -f /home/gpadmin/conf/hostlist -v -e 'sudo /etc/init.d/ntpd start && sudo chkconfig --level 35 ntpd on'

Note: this step executes slowly, and in my case the time never synchronized successfully; the cause is unknown, but it does not actually affect the database installation.
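A simple way to see whether the clocks roughly agree is to print the date on all nodes at once (an optional check):

gpssh -f /home/gpadmin/conf/hostlist -e 'date'   # the timestamps should be within a second or two of each other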

4.4 Check connectivity before initialization

Check the network connectivity between the nodes:

cd /usr/local/greenplum-db/bin
gpcheckperf -f /home/gpadmin/conf/hostlist -r N -d /tmp

--  NETPERF TEST
-------------------

====================
==  RESULT
====================
Netperf bisection bandwidth test
gp-master -> gp-sdw1 = 72.220000
gp-sdw2 -> gp-sdw3 = 21.470000
gp-sdw1 -> gp-master = 43.510000
gp-sdw3 -> gp-sdw2 = 44.200000

Summary:
sum = 181.40 MB/sec
min = 21.47 MB/sec
max = 72.22 MB/sec
avg = 45.35 MB/sec
median = 44.20 MB/sec

Output like the above shows that all nodes can reach each other.
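gpcheckperf can also exercise disk I/O and memory bandwidth before initialization (a hedged sketch: -r ds selects the disk and stream tests, -D prints per-host detail, and the test directory must be writable by gpadmin):

gpcheckperf -f /home/gpadmin/conf/hostlist -r ds -D -d /opt/greenplum/data/primary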

4.5 Perform initialization

The Greenplum initialization configuration templates are all in the /usr/local/greenplum-db/docs/cli_help/gpconfigs directory. gpinitsystem_config is the template for initializing Greenplum; in this template all Mirror Segment settings are commented out. Create a copy and modify the following settings:

cd /usr/local/greenplum-db/docs/cli_help/gpconfigs
 cp gpinitsystem_config initgp_config
 vi initgp_config

 #The following are the attribute fields to modify in the file
 #The resource directories are the ones created in section 4.1. Configure one directory entry per instance on each child node (4 to 8 instances are recommended; 6 are configured here, and the number of primary and mirror entries must correspond)
 declare -a DATA_DIRECTORY=(/opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data2/primary /opt/greenplum/data2/primary)
 declare -a MIRROR_DATA_DIRECTORY=(/opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data2/mirror /opt/greenplum/data2/mirror)

 ARRAY_NAME="gp_sydb" #The initialized database name configured in section 4.2.1
 MASTER_HOSTNAME=gp-master #Master node name
 MASTER_DIRECTORY=/opt/greenplum/data/master #The resource directory created in section 4.1
 MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1 #The same value as configured in section 4.2.1
 DATABASE_NAME=gp_sydb #The initialized database name configured in section 4.2.1
 MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts #The file created in section 3.2.2

Perform the initialization:

gpinitsystem -c initgp_config -S

If the initialization fails, delete the data resource directories under /opt and re-initialize.

If the initialization succeeds, then congratulations: the installation is complete.
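A minimal smoke test after a successful initialization (assuming the database name gp_sydb from initgp_config):

psql -d gp_sydb -c 'select version();'   # should print the Greenplum 5.0.0 version string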

5. Database operations

5.1 Stop and start the cluster

gpstop -M fast
gpstart -a
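A few other day-to-day variants that gpstop supports (see gpstop --help for the full list):

gpstop -a -M fast   #stop without prompting for confirmation
gpstop -r           #restart the whole cluster
gpstop -u           #reload configuration files such as pg_hba.conf without a full restart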

5.2 Log in to the database

$ psql -d postgres #Enter a database

 postgres=# \l # Query the database
                  List of databases
    Name | Owner | Encoding | Access privileges
 -----------+---------+----------+---------------------
  gp_sydb | gpadmin | UTF8 |
  postgres | gpadmin | UTF8 |
  template0 | gpadmin | UTF8 | =c/gpadmin
                                 : gpadmin=CTc/gpadmin
  template1 | gpadmin | UTF8 | =c/gpadmin
                                 : gpadmin=CTc/gpadmin
 (4 rows)
 postgres=# \i /home/gpadmin/test.sql #Execute a sql script (the path here is just an example)
 postgres=# copy tablename to '/tmp/tablename.csv' with csv; #Quickly export single-table data
 postgres=# copy tablename from '/tmp/tablename.csv' with csv; #Quickly import single-table data
 postgres=# \q #Exit the database
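As a tiny end-to-end example of the commands above (the table t_demo is invented for illustration):

 postgres=# create table t_demo (id int, name text) distributed by (id);
 postgres=# insert into t_demo values (1, 'a'), (2, 'b');
 postgres=# copy t_demo to '/tmp/t_demo.csv' with csv;
 postgres=# \q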

5.3 Cluster status

gpstate -e #View the status of mirror
 gpstate -f # Check the status of standby master
 gpstate -s #View the status of the entire GP cluster
 gpstate -i #View GP version
 gpstate --help #help document, you can view more usage of gpstate

So far we have covered basic database operation. By default only local connections are accepted; if you need access from other clients, you need to modify the pg_hba.conf file, which is not repeated here.

 

If you need greenplum to connect to hdfs and read files into external tables, a hadoop environment is required: see Chapter 6.

6. Greenplum's hadoop environment configuration

Note: After installing greenplum, if you need to read HDFS files in an external table, you need to perform this configuration.

  1. (All child nodes) Unzip hadoop: tar -zxvf hadoop-2.6.0-cdh5.8.0.tar.gz
  2. (All child nodes) Unzip the JDK that hadoop depends on (e.g. jdk-7u75-linux-x64.tar.gz) to /usr/java/jdk1.7.0_75
  3. (All child nodes) Modify gpadmin user parameters
    vi /home/gpadmin/.bash_profile
    
     Add in the configuration file
     export JAVA_HOME=/usr/java/jdk1.7.0_75
     export CLASSPATH=$JAVA_HOME/lib/
     export HADOOP_HOME=/home/hadoop/yarn/hadoop-2.6.0-cdh5.8.0
     PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
     export PATH

     

  4. (Only executed on the master node) Configure hadoop version information and path information
    gpconfig -c gp_hadoop_target_version -v "cdh5"
    gpconfig -c gp_hadoop_home -v "/home/hadoop/yarn/hadoop-2.6.0-cdh5.8.0"
    

     

  5. Restart gp (for example with gpstop -r) so the settings take effect; an illustrative external-table sketch follows below.
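As an illustration of what this configuration enables (a hypothetical sketch: the namenode host, port, file path, and the table ext_demo are placeholders, not from the original setup), an external table over HDFS using the gphdfs protocol looks like this:

postgres=# create external table ext_demo (id int, name text)
postgres-# location ('gphdfs://namenode-host:8020/data/demo.csv')
postgres-# format 'csv';
postgres=# select * from ext_demo limit 10;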