Server failure
This server failed during power-up with the error "E171F PCIE Fatal Err B0 D3 F0". On inspection, the 5/I PCIe SAS controller card (model no. UCS-51) was found to have a swollen electrolytic capacitor. I replaced it with one salvaged from an old PC motherboard that had the same value (1500uF) and voltage rating (6.3V). After the replacement the server completed all of its POST and began booting the OS. It was then soak tested and appeared to be operating fine. This hopefully saved an engineer call-out that would probably have cost ~£500; the machine was out of warranty. Below are pictures of the board and the repair.
Swollen Capacitor
Capacitor Repair
Supplemental Info
A further 5 Dell servers came up with the same error and were fixed by replacing the capacitors, which points to a bad batch of electrolytics.
A place where I can record my technical stuff so that it is accessible from anywhere. It is mainly technical material I find interesting, and I suppose it can be looked on as a modern lab-book.
Wednesday, 2 February 2011
Tuesday, 25 January 2011
Sheevaplug Shenanigans
I was loaned this particular Sheevaplug V1.3 with eSATA by Alex Voss; many thanks, Alex. I plan to install a large hard drive, boot the Sheevaplug from it, and install standard Debian rather than the Ubuntu it is supplied with. This seems to be relatively easy but requires a custom kernel.
Friday, 10 December 2010
Rocks Cluster Config
Shutdown Cluster
/opt/rocks/sbin/cluster-fork shutdown
or
/opt/rocks/sbin/cluster-fork poweroff (if kernel and bios agree)
Compute node removal
rocks remove host compute-0-14
insert-ethers --remove=compute-0-14
insert-ethers --update
rocks sync config
Add/remove Nodes
Remove node with
rocks remove host compute-0-14
followed by
rocks sync config
and then run
insert-ethers --cabinet=0 --rank=14
and then pxe boot it?
Watch the /var/log/daemon log file for a DHCPREQUEST from the MAC
address of that node. Once you see the request and the offer of an IP address,
insert-ethers should show that it has found the new node. Then check whether
anything appears in /var/log/httpd/ssl_request_log from that IP
address. A fresh node should ask for kickstart.cgi.
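The log check described above can be narrowed to a single node. This is only a minimal sketch and not part of Rocks: the function name dhcp_requests_for is hypothetical, and it assumes the DHCP server logs lines that contain both the word DHCPREQUEST and the node's MAC address.

```shell
#!/bin/sh
# Hypothetical helper: print only the DHCPREQUEST lines that mention a given MAC.
# Usage: dhcp_requests_for MAC [LOGFILE]   (reads stdin when LOGFILE is omitted)
dhcp_requests_for() {
    drf_mac=$1
    shift
    # -i: the MAC may be logged in upper or lower case
    # -F: treat the MAC as a fixed string, not a regular expression
    grep 'DHCPREQUEST' "$@" | grep -iF -- "$drf_mac"
}
```

For example, dhcp_requests_for 00:11:22:aa:bb:cc /var/log/messages (the MAC shown is a placeholder) lists the requests seen so far from that node.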
Check for dhcp requests etc
tail -f /var/log/messages
Check Kickstart file is being correctly generated
rocks list host profile compute-0-0 > /tmp/ks.cfg
Check you can download kickstart file
wget --no-check-certificate https://localhost/install/sbin/public/kickstart.cgi
Sync Config
rocks sync config
Set node to be OS rescued or reinstalled (action is either rescue or install)
rocks set host pxeboot compute-x-y action=rescue/install
List all hosts On Cluster
cat /etc/hosts
IP Address for Node
host compute-0-3
New Node Install No IP address received
A new node sometimes doesn't get an IP address via DHCP during PXE boot. A look in the head node's messages log shows no leases available. To fix this, restart syslog:
/etc/init.d/syslog restart
Problem with Ganglia Webpage
/etc/init.d/gmetad restart
/etc/init.d/gmond restart
Reinstall Node Problem
24/6/09
Then we tried to insert it:
insert-ethers --cabinet=0 --rank=14
It still failed at "choose a language".
It didn't show # symbol when.
Kickstart file not loading on compute node.
ls -ld /root
Gives … drwx------ 21 root root 4096 Jun 24 12:01 /root
ls -ld /root/.my.cnf
Gives … -r--r----- 1 root apache 28 Nov 25 2008 /root/.my.cnf
The problem with downloading the kickstart file was to do with the permissions on /root.
It was fixed with chmod o+r /root and chmod o+x /root.
After these two commands the permissions on /root were:
drwx---r-x 21 root root 4096 Jun 24 12:01 /root
This cured the install problem.
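The permission check can be scripted so that it is easy to repeat after a reinstall. This is a rough sketch only: the function names other_perms and others_can_enter are hypothetical, and the first one simply cuts the "other" triplet out of the same ls -ld output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the "other" permission triplet of a path, e.g. "r-x".
# Characters 8-10 of the ls -ld mode field are the permissions for "others".
other_perms() {
    ls -ld -- "$1" | cut -c8-10
}

# Succeeds only when "others" can both read and enter the directory.
# A "t" in the last position means execute plus the sticky bit.
others_can_enter() {
    case "$(other_perms "$1")" in
        r-x|rwx|r-t|rwt) return 0 ;;
        *)               return 1 ;;
    esac
}
```

Running others_can_enter /root || echo "others cannot read and enter /root" on the head node would flag the state that stopped the kickstart download here.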
Install id_rsa.pub in Nodes
Now copy id_rsa.pub file from head node to compute node.
scp /root/.ssh/id_rsa.pub root@compute-0-45:/root/.ssh/linux.pub
Now Login to the compute node.
ssh compute-0-45
Copy contents of linux.pub file and append them to the authorized_keys file.
cat /root/.ssh/linux.pub >> /root/.ssh/authorized_keys
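Running the append more than once leaves duplicate entries in authorized_keys. A possible guard is sketched below; the function name append_key_once is hypothetical and it assumes the .pub file holds a single key on one line.

```shell
#!/bin/sh
# Hypothetical helper: append a public key to an authorized_keys file only if
# that exact line is not already present.
# Usage: append_key_once PUBKEY_FILE AUTHORIZED_KEYS_FILE
append_key_once() {
    ako_key=$(cat "$1") || return 1
    # -q: quiet, -x: match the whole line, -F: fixed string rather than a regex
    if [ -f "$2" ] && grep -qxF -- "$ako_key" "$2"; then
        return 0
    fi
    printf '%s\n' "$ako_key" >> "$2"
}
```

On the compute node this would be called as append_key_once /root/.ssh/linux.pub /root/.ssh/authorized_keys.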
Restart Ganglia
Sometimes the Ganglia web page on the head node shows all nodes as down, even though they can be pinged and logged into via ssh from the console and are very much alive. In that case, restart the Ganglia daemons:
service gmond restart
service gmetad restart
Run a command on all nodes
This runs the cat command on all nodes, collects the results on the head node and redirects them to a file, giving a list of hostnames and MAC addresses in a text file.
[root@blub~]#cluster-fork cat /etc/sysconfig/network-scripts/ifcfg-eth0:0 | egrep "compute|HWADDR" > HostHWaddr.txt
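The filtered output has the hostname and the MAC address on separate lines. If one line per node is more convenient, something like the following could pair them up. It is a sketch under assumptions: that each hostname arrives on its own line starting with "compute" and ending in a colon, that the interface file contains a line of the form HWADDR=..., and that the function name pair_hosts_macs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: turn alternating "compute-x-y:" and "HWADDR=..." lines
# into "hostname,mac" lines. Reads the named files, or stdin if none are given.
pair_hosts_macs() {
    awk '
        # Remember the most recent hostname, without its trailing colon.
        /^compute/ { host = $1; sub(/:$/, "", host); next }
        # Emit one CSV line for each MAC address, paired with that hostname.
        /^HWADDR=/ { mac = $0; sub(/^HWADDR=/, "", mac); print host "," mac }
    ' "$@"
}
```

For example, pair_hosts_macs HostHWaddr.txt > HostHWaddr.csv would give a file that loads straight into a spreadsheet.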
Debug Commands Installation
Console  Use               Keystroke
1        Installation      Ctrl-Alt-F1
2        Shell prompt      Ctrl-Alt-F2
3        Installation log  Ctrl-Alt-F3
4        System messages   Ctrl-Alt-F4
5        Other messages    Ctrl-Alt-F5
Cluster Head Node Overnight Temperature
Using the Temperature sensors on the Motherboard
The command sensors-detect was used to set up the sensors. The sensors command was then run in a script, with its output piped through grep and cut to extract the board temperature. Each reading was appended to a data file, temp.txt, followed by a comma to delimit the data. Readings were taken every 5 minutes in an infinite loop that ran overnight. The data file temp.txt was then imported into a spreadsheet as a CSV (comma-separated values) file.
The cheap and nasty script used
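#!/bin/bash
# Append the board temperature to temp.txt every 5 minutes until interrupted.
while true
do
    # Take the sensors line containing "low" (excluding "Temp") and keep the text before the first "(".
    temp=$(sensors | grep low | grep -v Temp | cut -d\( -f1)
    echo $temp >> temp.txt
    # Write the comma delimiter on its own line.
    echo , >> temp.txt
    sleep 300
done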
It's not the best script, but it does what I wanted it to do, IMHO.
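A variation that might import more cleanly is to write one line per reading, with a timestamp in front. The sketch below is not the script that was run overnight: the function name format_reading is hypothetical, and it assumes the extracted reading contains no commas.

```shell
#!/bin/sh
# Hypothetical helper: build one CSV line of the form "timestamp,reading".
# Usage: format_reading TIMESTAMP READING
format_reading() {
    # Collapse runs of whitespace to single spaces and trim both ends.
    fr_reading=$(printf '%s' "$2" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
    printf '%s,%s\n' "$1" "$fr_reading"
}

# Possible use inside the logging loop (needs lm-sensors installed):
#   reading=$(sensors | grep low | grep -v Temp | cut -d\( -f1)
#   format_reading "$(date '+%Y-%m-%d %H:%M:%S')" "$reading" >> temp.csv
```

The timestamp column makes it possible to plot temperature against time without relying on the 5 minute spacing.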
Wednesday, 8 September 2010
IPTables Example
Head Node IPTables example to open port 1099 and save the rules.
Add rule for incoming traffic
iptables -A INPUT -p tcp --dport 1099 -j ACCEPT
Add rule for outgoing traffic
iptables -A OUTPUT -p tcp --dport 1099 -j ACCEPT
Save the rules so they survive a reboot
/sbin/service iptables save
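The same pair of rules is needed for any other port, so the commands can be generated rather than retyped. This is a small sketch with a hypothetical function name, port_rules; it only prints the commands and does not touch the firewall itself.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the iptables commands that open a TCP port
# in both the INPUT and OUTPUT chains.
# Usage: port_rules PORT
port_rules() {
    # Reject empty or non-numeric input before doing any arithmetic tests.
    case "$1" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 65535 ]; then
        echo "port out of range" >&2
        return 1
    fi
    printf 'iptables -A INPUT -p tcp --dport %s -j ACCEPT\n' "$1"
    printf 'iptables -A OUTPUT -p tcp --dport %s -j ACCEPT\n' "$1"
}
```

Calling port_rules 1099 prints the two rules used above; the output can be reviewed first and then run as root.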
Run a command on all nodes
This runs the cat command on all nodes, collects the results on the head node and redirects them to a file, giving a list of hostnames and MAC addresses in a text file.
[root@HOST~]#cluster-fork cat /etc/sysconfig/network-scripts/ifcfg-eth0:0 | egrep "compute|HWADDR" > HostHWaddr.txt
Monday, 30 August 2010
SSH Tunnel Example
Tunnel ssh from the local machine to a remote machine,
and from the remote machine to another machine on
the remote machine's network.
Local to remote machine with 5900 tunnel
ssh -L 5900:127.0.0.1:5900 -l username -p 22 theactualurl.net
Remote to machine on remote network with 5900 to 443 tunnel
sudo ssh -L 5900:127.0.0.1:443 -l username -p 22 remotemachine.local
This allowed someone on the local machine to connect to a web server using https on the remote machine's network. The address to use on the local machine is https://localhost:5900 or https://127.0.0.1:5900; the connection is tunnelled through port 5900, but the actual server listens on port 443.
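The three parts of the -L argument are easy to mix up, so here is a small sketch that names them. The function tunnel_cmd and its parameter names are hypothetical; it only prints the ssh command. Note that the target host in -L is resolved by the machine being logged into, so 127.0.0.1 there means that machine and not the one where ssh is typed.

```shell
#!/bin/sh
# Hypothetical helper: print the ssh command for a local port forward.
# Usage: tunnel_cmd LOCAL_PORT TARGET_HOST TARGET_PORT USER GATEWAY
#   LOCAL_PORT   port opened on the machine where ssh is run
#   TARGET_HOST  host the gateway connects to (127.0.0.1 = the gateway itself)
#   TARGET_PORT  port on TARGET_HOST
tunnel_cmd() {
    if [ "$#" -ne 5 ]; then
        echo "usage: tunnel_cmd LOCAL_PORT TARGET_HOST TARGET_PORT USER GATEWAY" >&2
        return 1
    fi
    printf 'ssh -L %s:%s:%s -l %s -p 22 %s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For instance, tunnel_cmd 5900 127.0.0.1 443 username remotemachine.local prints the second hop used above, without the sudo.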