/opt/rocks/sbin/cluster-fork "shutdown -h now"
or
/opt/rocks/sbin/cluster-fork poweroff (only if the kernel and BIOS support software power-off)
Compute node removal
rocks remove host compute-0-14
insert-ethers --remove=compute-0-14
insert-ethers --update
rocks sync config
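The removal steps above can be wrapped in a small shell function so the node name is typed only once. This is a minimal sketch: the function name remove_compute_node and the DRY_RUN variable are hypothetical, and the command sequence simply mirrors the list above.

```bash
# Hypothetical helper that runs the removal steps above for one node.
# Set DRY_RUN=1 to print the commands instead of running them.
remove_compute_node() {
    local host=${1:-}
    if [ -z "$host" ]; then
        echo "usage: remove_compute_node compute-X-Y" >&2
        return 1
    fi
    local cmds=(
        "rocks remove host $host"
        "insert-ethers --remove=$host"
        "insert-ethers --update"
        "rocks sync config"
    )
    local cmd
    for cmd in "${cmds[@]}"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$cmd"
        else
            # Unquoted on purpose: split the string into command + arguments.
            $cmd || return 1
        fi
    done
}
```

Run DRY_RUN=1 remove_compute_node compute-0-14 first to check the commands, then run it again without DRY_RUN. It stops at the first command that fails.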
Add/remove Nodes
Remove the node with
rocks remove host compute-0-14
followed by
rocks sync config
and then run
insert-ethers --cabinet=0 --rank=14
and then PXE boot the node.
Watch the /var/log/daemon log file for a DHCPREQUEST from the node's MAC
address. Once you see the request and the offer of an IP address,
insert-ethers should report that it found a new node. Then check
/var/log/httpd/ssl_request_log for requests from that IP address.
A freshly installing node should request kickstart.cgi.
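To cut the log down to the node being added, the DHCP lines for one MAC address can be filtered out of the stream. A minimal sketch: the function name dhcp_lines_for_mac and the example MAC address are hypothetical, the pattern assumes ISC dhcpd message wording (DHCPDISCOVER, DHCPOFFER, DHCPREQUEST or DHCPACK, followed later on the line by the MAC), and --line-buffered is a GNU grep option.

```bash
# Hypothetical helper: read syslog lines on stdin and keep only the DHCP
# messages that mention the given MAC address (case-insensitive).
# Assumes ISC dhcpd wording, e.g. "DHCPREQUEST for <ip> from <mac> via eth0".
dhcp_lines_for_mac() {
    local mac=$1
    # --line-buffered makes matches appear immediately when fed by tail -f.
    grep -i --line-buffered -E "DHCP(DISCOVER|OFFER|REQUEST|ACK).*${mac}"
}
```

Usage, with a placeholder MAC: tail -f /var/log/daemon | dhcp_lines_for_mac 00:11:22:33:44:55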
Check for DHCP requests and other boot messages
tail -f /var/log/messages
Check that the kickstart file is being generated correctly
rocks list host profile compute-0-0 > /tmp/ks.cfg
Check that the kickstart file can be downloaded
wget --no-check-certificate https://localhost/install/sbin/public/kickstart.cgi
Sync Config
rocks sync config
Set a node to boot into rescue mode or to reinstall at its next PXE boot
rocks set host pxeboot compute-x-y action=rescue
or
rocks set host pxeboot compute-x-y action=install
List all hosts on the cluster
cat /etc/hosts
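Since /etc/hosts on the head node lists every node, the compute nodes alone can be pulled out with awk. This reads the local file, whereas host queries DNS. A minimal sketch: the function name list_compute_hosts is hypothetical, and it assumes compute node names start with compute-, as in the examples in these notes.

```bash
# Hypothetical helper: print "shortname IP" for every compute node listed in
# a hosts file (default /etc/hosts), skipping comment lines.
list_compute_hosts() {
    awk '
        /^[ \t]*#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^compute-/) {
                    name = $i
                    sub(/\..*/, "", name)   # drop any domain, e.g. ".local"
                    print name, $1
                    break
                }
            }
        }
    ' "${1:-/etc/hosts}"
}
```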
IP Address for Node
host compute-0-3
New Node Install: No IP Address Received
During PXE boot a new node sometimes does not get an IP address via DHCP, and the head node's /var/log/messages shows that no leases are available. To fix this, run:
/etc/init.d/syslog restart
Problem with Ganglia Webpage
/etc/init.d/gmetad restart
/etc/init.d/gmond restart
Reinstall Node Problem
24/6/09
Then we tried to insert it:
insert-ethers --cabinet=0 --rank=14
It still failed at "choose a language".
It didn't show # symbol when
Kickstart file not loading on compute node.
ls -ld /root
Gives … drwx------ 21 root root 4096 Jun 24 12:01 /root
ls -ld /root/.my.cnf
Gives … -r--r----- 1 root apache 28 Nov 25 2008 /root/.my.cnf
The problem with downloading the kickstart file was caused by the permissions on /root.
It was fixed with chmod o+r /root and chmod o+x /root.
After these two commands the permissions on /root were:
drwx---r-x 21 root root 4096 Jun 24 12:01 /root
This cured the install problem.
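Only the execute (search) bit lets a user other than the owner reach files inside a directory; the read bit only allows listing it. A quick check is sketched below. The function name check_other_search is hypothetical, it relies on GNU stat (an assumption outside a CentOS-style base), and the idea that the apache user running kickstart.cgi is the "other" user needing access is an inference from the apache group on /root/.my.cnf, not something these notes confirm.

```bash
# Hypothetical helper: report whether "other" users (such as apache) can
# traverse a directory. Uses GNU stat's %A format, e.g. "drwx---r-x".
check_other_search() {
    local dir=${1:-/root}
    local mode
    mode=$(stat -c '%A' "$dir") || return 1
    # Character 10 (index 9) is the "other" execute bit; "t" means sticky + x.
    case ${mode:9:1} in
        x|t) echo "ok: others can traverse $dir ($mode)" ;;
        *)   echo "missing o+x on $dir ($mode)" ;;
    esac
}
```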
Install id_rsa.pub on Nodes
Copy the id_rsa.pub file from the head node to the compute node.
scp /root/.ssh/id_rsa.pub root@compute-0-45:/root/.ssh/linux.pub
Log in to the compute node.
ssh compute-0-45
Append the contents of linux.pub to the authorized_keys file.
cat /root/.ssh/linux.pub >> /root/.ssh/authorized_keys
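Repeating the cat >> step adds the same key again each time. A sketch that appends a key only if it is not already present is below; the function name append_key_once is hypothetical. OpenSSH's own ssh-copy-id covers the copy and append in one step from the head node, e.g. ssh-copy-id -i /root/.ssh/id_rsa.pub root@compute-0-45.

```bash
# Hypothetical helper: append each key in PUBKEY to AUTH_KEYS unless an
# identical line is already there, so re-running it adds no duplicates.
append_key_once() {
    local pub=$1 auth=${2:-/root/.ssh/authorized_keys}
    local line
    touch "$auth" && chmod 600 "$auth" || return 1
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue
        # -x: whole-line match, -F: fixed string (keys contain + and /).
        grep -qxF -- "$line" "$auth" || printf '%s\n' "$line" >> "$auth"
    done < "$pub"
}
```

On the compute node: append_key_once /root/.ssh/linux.pub /root/.ssh/authorized_keys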
Restart Ganglia
Sometimes the Ganglia web page on the head node shows all nodes as down, even though they can be reached with ssh and ping from the console and are clearly alive. In that case restart the Ganglia daemons:
service gmond restart
service gmetad restart
Run a command on all nodes
This runs cat on every node, collects the output on the head node, keeps only the hostname and HWADDR lines, and writes them to a text file, giving a list of hostnames and MAC addresses.
[root@blub ~]# cluster-fork cat /etc/sysconfig/network-scripts/ifcfg-eth0:0 | egrep "compute|HWADDR" > HostHWaddr.txt
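HostHWaddr.txt holds the hostname and HWADDR lines on separate lines. The sketch below joins them into one "hostname MAC" line per node. The function name pair_host_hwaddr is hypothetical, and it assumes cluster-fork prints each node's name at the start of a line (e.g. compute-0-0:) before that node's output.

```bash
# Hypothetical post-processing of HostHWaddr.txt: join each host line with
# the HWADDR line that follows it. Assumes cluster-fork prints each node's
# name at the start of a line (e.g. "compute-0-0:") before that node's output.
pair_host_hwaddr() {
    awk '
        /^compute/ { host = $1; sub(/:$/, "", host) }
        /HWADDR/ {
            mac = $0
            sub(/^[^=]*=/, "", mac)    # keep only the value after "="
            gsub(/[" \t]/, "", mac)    # strip double quotes and whitespace
            print host, mac
        }
    ' "${1:-HostHWaddr.txt}"
}
```

For example: pair_host_hwaddr HostHWaddr.txt > host_mac.txt (the output file name is only illustrative).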
Installation Debug Consoles
Console  Use               Keystroke
1        Installation      Ctrl-Alt-F1
2        Shell prompt      Ctrl-Alt-F2
3        Installation log  Ctrl-Alt-F3
4        System messages   Ctrl-Alt-F4
5        Other messages    Ctrl-Alt-F5