Table of contents
1. Installation instructions
1.1 Environment Description
1.2 Cluster Introduction
2. Installation environment preparation
2.1 Modify the names of each node
2.1.1 Modify the hosts of the master node
2.1.2 Copy to child nodes
2.1.3 Modify each node /etc/sysconfig/network file in turn
2.2 Modify the system kernel parameters in /etc/sysctl.conf
2.3 Modify the process limits in /etc/security/limits.conf
2.4 Turn off the firewall and modify the /etc/selinux/config file
2.5 Copy the master node configuration to the child node
2.6 Create gpadmin user (all nodes)
3. Install Greenplum DB
3.1 Install Greenplum DB on the Master node
3.2 Create the cluster hostlist files and open up the nodes
3.2.1 Create a hostlist containing all node host names:
3.2.2 Create a seg_hosts that contains the host names of all Segment Hosts:
3.2.3 Configure ssh password-free connection:
3.3 Install Greenplum DB on the Segment node
4. Initialize the database
4.1 Create a resource directory
4.2 Environment variable configuration
4.2.1 Configuring environment variables on the master node
4.2.2 Then copy to each child node in turn:
4.2.3 Let the environment variable take effect:
4.3 NTP configuration
4.4 Check connectivity before initialization
4.5 Execute initialization
5. Database operations
5.1 Stop and start the cluster
5.2 Log in to the database
5.3 Cluster Status
6. Greenplum's hadoop environment configuration
1. Installation instructions
1.1 Environment Description
Name | Version
Operating system | CentOS 64bit
Greenplum | greenplum-db-5.0.0-rhel6-x86_64.rpm
1.2 Cluster introduction
A cluster of 1 master and n segments is used. Example:
192.168.0.1
192.168.0.2
192.168.0.3
192.168.0.4
Here 192.168.0.1 is the master and the rest are segments.
2. Installation environment preparation
2.1 Modify the names of each node
2.1.1 Modify the hosts of the master node
Note: This is mainly to prepare for Greenplum to communicate with each other in the future.
[root@gp-master ~]# vi /etc/hosts
127.0.0.1 localhost localhost4 localhost4.localdomain4
::1 localhost localhost6 localhost6.localdomain6
192.168.0.1 gp-master gp-master
192.168.0.2 gp-sdw1 gp-sdw1
192.168.0.3 gp-sdw2 gp-sdw2
192.168.0.4 gp-sdw3 gp-sdw3
2.1.2 Copy to child nodes
After configuring the master node file, copy it to the other child nodes
scp /etc/hosts gp-sdw1:/etc
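If there are several child nodes, the copy can be looped; a minimal sketch, assuming the three segment hosts named in section 1.2:
for node in gp-sdw1 gp-sdw2 gp-sdw3; do
    scp /etc/hosts ${node}:/etc/    # prompts for each node's root password until key exchange is set up
done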
2.1.3 Modify each node /etc/sysconfig/network file in turn
Also modify the /etc/sysconfig/network file on every machine. The file is shown below (the configuration differs from node to node, so it cannot simply be copied; every machine must be edited):
[root@gp-master ~]# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=gp-master
The HOSTNAME here must match the host name in /etc/hosts. Finally, you can use ping <node name> (for example ping gp-sdw1) to test whether the configuration works.
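On CentOS 6 the new name is only picked up at boot; if a reboot is not planned, a commonly used follow-up (my own addition, not part of the original steps) is to set it for the running session and then test resolution:
hostname gp-master      # run on the master; use the matching name on each node
hostname                # prints the active host name
ping -c 1 gp-sdw1       # from the master, confirms /etc/hosts resolution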
2.2 Modify the master node system kernel parameters in /etc/sysctl.conf
(Note: configure this on the master node first; section 2.5 copies it to the other nodes.)
vi /etc/sysctl.conf
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.default.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2 ### in a test environment change this value to 1, otherwise Oracle cannot be started
2.3 Modify the master node process limits in /etc/security/limits.conf
(Note: configure this on the master node first; section 2.5 copies it to the other nodes.)
vi /etc/security/limits.conf
* soft nproc 131072
root soft nproc unlimited
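The new limit only applies to fresh login sessions. A quick way to confirm it later (for example after the gpadmin user is created in section 2.6) is:
su - gpadmin     # start a new login session so limits.conf is re-read
ulimit -u        # should print 131072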
2.4 Turn off the firewall on all nodes and modify the master node /etc/selinux/config file
(Note: configure this on the master node first; section 2.5 copies it to the other nodes.)
Stop the firewall: service iptables stop
Disable the firewall at boot: chkconfig iptables off
Check the firewall status: service iptables status
vi /etc/selinux/config
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
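SELINUX=disabled only takes effect after a reboot; if the nodes will not be rebooted right away, a common interim step (my own addition, not in the original procedure) is to drop SELinux to permissive mode for the current session:
setenforce 0     # stop enforcing immediately; the config file covers later boots
getenforce       # should report Permissive (or Disabled after a reboot)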
2.5 Copy the master node configuration to the child node
Copy to each child node in turn
scp /etc/sysctl.conf gp-sdw1:/etc
scp /etc/security/limits.conf gp-sdw1:/etc/security/
scp /etc/selinux/config gp-sdw1:/etc/selinux
Finally, apply the configuration on each node:
[root@gp-master ~]# sysctl -p    (apply the kernel settings)
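As with /etc/hosts, the copy and reload can be looped over the child nodes; a minimal sketch assuming the three segment hosts from section 1.2:
for node in gp-sdw1 gp-sdw2 gp-sdw3; do
    scp /etc/sysctl.conf ${node}:/etc
    scp /etc/security/limits.conf ${node}:/etc/security/
    scp /etc/selinux/config ${node}:/etc/selinux
    ssh ${node} 'sysctl -p'     # apply the kernel settings on that node
done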
2.6 Create the gpadmin user (all nodes)
groupadd -g 530 gpadmin
useradd -g 530 -u 530 -m -d /home/gpadmin -s /bin/bash gpadmin
chown -R gpadmin:gpadmin /home/gpadmin
echo "gpadmin" | passwd --stdin gpadmin
3. Install Greenplum DB
3.1 Install Greenplum DB on the Master node
The installation package is in rpm format, so run the rpm install command:
rpm -ivh greenplum-db-5.0.0-rhel6-x86_64.rpm
The default installation path is /usr/local; after installing, grant gpadmin ownership:
chown -R gpadmin:gpadmin /usr/local
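An optional quick check that the package landed where expected:
rpm -qa | grep greenplum-db       # the installed package name and version
ls -d /usr/local/greenplum-db*    # the versioned directory (and soft link, if the rpm created one)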
3.2 Create the cluster hostlist files and open up the nodes
3.2.1 Create a hostlist containing all node host names:
su - gpadmin
mkdir -p /home/gpadmin/conf
vi /home/gpadmin/conf/hostlist
gp-master
gp-sdw1
gp-sdw2
gp-sdw3
3.2.2 Create a seg_hosts file containing the host names of all Segment Hosts:
vi /home/gpadmin/conf/seg_hosts
gp-sdw1
gp-sdw2
gp-sdw3
3.2.3 Configure ssh password-free connection:
[root@gp-master ~]# su - gpadmin
[gpadmin@gp-master ~]$ source /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@gp-master ~]$ gpssh-exkeys -f /home/gpadmin/conf/hostlist
[STEP 1 of 5] create local ID and authorize on local host
... /home/gpadmin/.ssh/id_rsa file exists ... key generation skipped
[STEP 2 of 5] keyscan all hosts and update known_hosts file
[STEP 3 of 5] authorize current user on remote hosts
... send to gp-sdw1
... send to gp-sdw2
... send to gp-sdw3
#Tip: at this point you are prompted for the gpadmin password of each child node
[STEP 4 of 5] determine common authentication file content
[STEP 5 of 5] copy authentication files to all remote hosts
... finished key exchange with gp-sdw1
... finished key exchange with gp-sdw2
... finished key exchange with gp-sdw3
[INFO] completed successfully
Test whether the password-free connection is successful:
[gpadmin@gp-master ~]$ ssh gp-sdw1    #Logs in without a password
or:
[gpadmin@gp-master ~]$ gpssh -f /home/gpadmin/conf/hostlist
=> pwd
[gp-sdw1] /home/gpadmin
[gp-sdw3] /home/gpadmin
[gp-sdw2] /home/gpadmin
[gp-master] /home/gpadmin
=> exit
The above result is successful.
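A non-interactive variant of the same check, run once over every host:
gpssh -f /home/gpadmin/conf/hostlist -e 'hostname'    # each host should answer with its own name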
3.3 Install Greenplum DB on the Segment nodes
On each child node, grant gpadmin ownership of the target directories:
chown -R gpadmin:gpadmin /usr/local
chown -R gpadmin:gpadmin /opt
Pack the installation package on the master node and copy it to each child node:
[gpadmin@mdw conf]$ cd /usr/local/
Pack:
[gpadmin@mdw greenplum]$ tar -cf greenplum-db-5.0.0.tar greenplum-db-5.0.0/
[gpadmin@mdw greenplum]$ gpscp -f /home/gpadmin/conf/seg_hosts greenplum-db-5.0.0.tar =:/usr/local/
If all goes well, the batch copy succeeds; you can check the target directory on any child node. Next, unpack the tar archive on the child nodes in one batch:
[gpadmin@mdw conf]$ source /usr/local/greenplum-db/greenplum_path.sh
[gpadmin@mdw conf]$ gpssh -f /home/gpadmin/conf/seg_hosts    #handle all child nodes at once
=> cd /usr/local
[sdw3]
[sdw1]
[sdw2]
=> tar -xf greenplum-db-5.0.0.tar
[sdw3]
[sdw1]
[sdw2]
#Create a soft link
=> ln -s ./greenplum-db-5.0.0 greenplum-db
[sdw3]
[sdw1]
[sdw2]
=> ll    (use ll to check that everything is in place)
=> exit
This completes the installation of all nodes.
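Before initializing, it can be worth confirming from the master that every segment host ended up with the same soft link (a small optional check):
gpssh -f /home/gpadmin/conf/seg_hosts -e 'ls -ld /usr/local/greenplum-db'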
4. Initialize the database
4.1 Create a resource directory
source /usr/local/greenplum-db/greenplum_path.sh
gpssh -f /home/gpadmin/conf/hostlist    #handle all nodes at once
#Create the following directories under /opt/greenplum/data (adjust the number of data directories to your requirements)
=> mkdir -p /opt/greenplum/data/master
=> mkdir -p /opt/greenplum/data/primary
=> mkdir -p /opt/greenplum/data/mirror
=> mkdir -p /opt/greenplum/data2/primary
=> mkdir -p /opt/greenplum/data2/mirror
4.2 Environment variable configuration
4.2.1 Configuring environment variables on the master node
vi /home/gpadmin/.bash_profile and append at the end:
source /usr/local/greenplum-db/greenplum_path.sh
export MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1
export PGPORT=5432
export PGDATABASE=gp_sydb
4.2.2 Then copy to each child node in turn:
scp /home/gpadmin/.bash_profile gp-sdw1:/home/gpadmin/
(repeat for the remaining child nodes, for example gp-sdw2 and gp-sdw3)
4.2.3 Make environment variables effective:
source .bash_profile
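A quick check that the variables are in effect in the new shell:
echo $MASTER_DATA_DIRECTORY     # should print /opt/greenplum/data/master/gpseg-1
which gpinitsystem              # should resolve to /usr/local/greenplum-db/bin/gpinitsystem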
4.3 NTP Configuration
Enable NTP on the master node, then configure and enable NTP on the Segment nodes:
echo "server gp-master prefer" >> /etc/ntp.conf
gpssh -f /home/gpadmin/conf/hostlist -v -e 'sudo ntpd'
gpssh -f /home/gpadmin/conf/hostlist -v -e 'sudo /etc/init.d/ntpd start && sudo chkconfig --level 35 ntpd on'
Note: this step can be slow, and in this setup the time never synchronized successfully (cause unknown), but it does not prevent the database installation from completing.
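Regardless of how quickly ntpd converges, a rough drift check across the cluster is easy (my own addition, not part of the original steps):
gpssh -f /home/gpadmin/conf/hostlist -e 'date'    # the timestamps should be within a few seconds of each other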
4.4 Check connectivity before initialization
Check the network between the nodes:
cd /usr/local/greenplum-db/bin
gpcheckperf -f /home/gpadmin/conf/hostlist -r N -d /tmp
-- NETPERF TEST
-------------------
====================
== RESULT
====================
Netperf bisection bandwidth test
gp-master -> gp-sdw1 = 72.220000
gp-sdw2 -> gp-sdw3 = 21.470000
gp-sdw1 -> gp-master = 43.510000
gp-sdw3 -> gp-sdw2 = 44.200000
Summary:
sum = 181.40 MB/sec
min = 21.47 MB/sec
max = 72.22 MB/sec
avg = 45.35 MB/sec
median = 44.20 MB/sec
Output like the above shows that the nodes can communicate with each other.
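gpcheckperf can also exercise disk and memory bandwidth on the data directories from section 4.1; a hedged example (-r ds runs the disk and stream tests, -D prints per-host detail):
gpcheckperf -f /home/gpadmin/conf/seg_hosts -r ds -D -d /opt/greenplum/data/primary -d /opt/greenplum/data2/primary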
4.5 Perform initialization
The Greenplum initialization configuration templates are in the /usr/local/greenplum-db/docs/cli_help/gpconfigs directory. gpinitsystem_config is the initialization template; in it, all Mirror Segment settings are commented out. Create a copy and modify the following settings:
cd /usr/local/greenplum-db/docs/cli_help/gpconfigs
cp gpinitsystem_config initgp_config
vi initgp_config
#The following fields need to be modified in the file
#The directories below are the resource directories created in section 4.1. A directory is listed once for each segment instance per child node (4-8 instances are recommended; 6 are configured here, and the primary and mirror entries must correspond one to one)
declare -a DATA_DIRECTORY=(/opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data/primary /opt/greenplum/data2/primary /opt/greenplum/data2/primary)
declare -a MIRROR_DATA_DIRECTORY=(/opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data/mirror /opt/greenplum/data2/mirror /opt/greenplum/data2/mirror)
ARRAY_NAME="gp_sydb" #matches the database name configured in section 4.2.1
MASTER_HOSTNAME=gp-master #Master node name
MASTER_DIRECTORY=/opt/greenplum/data/master #The resource directory is the resource directory created in Chapter 4.1
MASTER_DATA_DIRECTORY=/opt/greenplum/data/master/gpseg-1 #The same configuration as in Chapter 4.1
DATABASE_NAME=gp_sydb #the database name configured in section 4.2.1
MACHINE_LIST_FILE=/home/gpadmin/conf/seg_hosts #is the file created in Chapter 3.2.2
Perform the initialization:
gpinitsystem -c initgp_config -S
If initialization fails, delete the data resource directories under /opt on every node and initialize again (a cleanup sketch follows below).
If initialization succeeds, congratulations, the installation is complete.
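For the failure case, the cleanup can be done from the master in one pass; a minimal sketch (double-check the paths first, since this deletes data; gpinitsystem also usually leaves a backout script under ~/gpAdminLogs that can be used instead):
gpssh -f /home/gpadmin/conf/hostlist -e 'rm -rf /opt/greenplum/data/master/* /opt/greenplum/data/primary/* /opt/greenplum/data/mirror/* /opt/greenplum/data2/primary/* /opt/greenplum/data2/mirror/*'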
5. Database operations
5.1 Stop and start the cluster
gpstop -M fast
gpstart -a
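A few other commonly used variants, for reference:
gpstop -a     # stop without the interactive confirmation prompt
gpstop -r     # stop and immediately restart the cluster
gpstop -u     # reload pg_hba.conf and runtime parameters without a restart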
5.2 Log in to the database
$ psql -d postgres #Enter a database
postgres=# \l # Query the database
List of databases
Name | Owner | Encoding | Access privileges
-----------+---------+----------+---------------------
gp_sydb | gpadmin | UTF8 |
postgres | gpadmin | UTF8 |
template0 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
template1 | gpadmin | UTF8 | =c/gpadmin
: gpadmin=CTc/gpadmin
(4 rows)
postgres=# \i <file>.sql #Execute a SQL script file
postgres=# copy <table name> to '/tmp/<file>.csv' with csv; #Quickly export a single table
postgres=# copy <table name> from '/tmp/<file>.csv' with csv; #Quickly import a single table
postgres=# \q #Exit the database
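As a small end-to-end smoke test (a sketch using a hypothetical table t1; gp_sydb is the database configured earlier), a table can be created with an explicit distribution key and checked from the shell:
psql -d gp_sydb -c "create table t1 (id int, name text) distributed by (id);"
psql -d gp_sydb -c "insert into t1 select i, 'row_' || i from generate_series(1, 1000) i;"
psql -d gp_sydb -c "select gp_segment_id, count(*) from t1 group by 1 order by 1;"   # rows should be spread across the segments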
5.3 Cluster status
gpstate -e #View the status of mirror
gpstate -f # Check the status of standby master
gpstate -s #View the status of the entire GP cluster
gpstate -i #View GP version
gpstate --help #help document, you can view more usage of gpstate
This covers the basic database operations. By default only local connections are accepted; to allow connections from other machines, modify the pg_hba.conf file (a sketch follows below), which is not covered in detail here.
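For reference, a minimal sketch of allowing password logins from the 192.168.0.0/24 network used in this guide (the address range and auth method are assumptions to adapt):
echo "host  all  gpadmin  192.168.0.0/24  md5" >> /opt/greenplum/data/master/gpseg-1/pg_hba.conf
gpstop -u     # reload pg_hba.conf without restarting the cluster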
If you need Greenplum to connect to HDFS and read files into external tables, a Hadoop environment is required: see Chapter 6.
6. Greenplum's hadoop environment configuration
Note: After installing greenplum, if you need to read HDFS files in an external table, you need to perform this configuration.
- (All child nodes) Unpack Hadoop: tar -zxvf hadoop-2.6.0-cdh5.8.0.tar.gz
- (All child nodes) Unpack the JDK that Hadoop depends on into /usr/java/jdk1.7.0_75: tar -zxvf <jdk archive>
- (All child nodes) Modify gpadmin user parameters
vi /home/gpadmin/.bash_profile and add to the configuration file:
export JAVA_HOME=/usr/java/jdk1.7.0_75
export CLASSPATH=$JAVA_HOME/lib/
export HADOOP_HOME=/home/hadoop/yarn/hadoop-2.6.0-cdh5.8.0
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
- (Only executed on the master node) Configure hadoop version information and path information
gpconfig -c gp_hadoop_target_version -v "cdh5"
gpconfig -c gp_hadoop_home -v "/home/hadoop/yarn/hadoop-2.6.0-cdh5.8.0"
- Restart Greenplum
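A sketch of the restart and a read-back of the settings:
gpstop -r                               # restart the cluster so the new settings are picked up
gpconfig -s gp_hadoop_target_version    # verify the stored values
gpconfig -s gp_hadoop_home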