Some notes from my experience with Linux Heartbeat
Author: 52samba

Hardware: two hpe800 machines, with one heartbeat cable for heartbeat detection over the serial port and one crossover cable for UDP communication. Each machine has two network cards: one connected to the crossover cable, the other connected to the switch.

Software: REDHAT 7.3 (also tested successfully on MANDRAKE 8.2), heartbeat-0.4.9.1-1.i386.rpm

Other preparation:
HPE800 (1): hostname CLUSTER-101-SERVER, IP address 192.9.100.101
HPE800 (2): hostname CLUSTER-102-SERVER, IP address 192.9.100.102 (assumed from the naming pattern; it must differ from the first machine's address)
Virtual host name: CLUSTER-SERVER, IP address 192.9.100.100

Download the latest Heartbeat package from http://www.linux-ha.org (the current version is heartbeat-0.4.9.1) and install it on each machine:

rpm -ivh heartbeat-0.4.9.1-1.i386.rpm

Configure the first hpe800 first. Installing the package creates the directory /etc/ha.d. Three files need to be configured: /etc/ha.d/ha.cf, /etc/ha.d/haresources and /etc/ha.d/authkeys. I set up an HA cluster for http and smb; the main settings in the three files are as follows.

/etc/ha.d/ha.cf
#   If any of debugfile, logfile and logfacility are defined then they
#   will be used. If debugfile and/or logfile are not defined and
#   logfacility is defined then the respective logging and debug
#   messages will be logged to syslog. If logfacility is not defined
#   then debugfile and logfile will be used to log messages. If
#   logfacility is not defined and debugfile and/or logfile are not
#   defined then defaults will be used for debugfile and logfile as
#   required and messages will be sent there.
#
#   File to write debug messages to
debugfile /var/log/ha-debug
#
#
#   File to write other messages to
#
logfile /var/log/ha-log
#
#
#   Facility to use for syslog()/logger
#
logfacility local0
#
#
#   keepalive: how many seconds between heartbeats
#
keepalive 2
#
#   deadtime: seconds-to-declare-host-dead
#
deadtime 10
#
#
#   Very first dead time (initdead)
#
#   On some machines/OSes, etc. the network takes a while to come up
#   and start working right after you've been rebooted. As a result
#   we have a separate dead time for when things first come up.
#   It should be at least twice the normal dead time.
#
initdead 120
#
#   hopfudge maximum hop count minus number of nodes in config
#hopfudge 1
#
#   serial serialportname ...
serial /dev/ttyS0
#
#
#   Baud rate for serial ports...
#
baud 19200
#
#   What UDP port to use for communication?
#
udpport 694
#
#   What interfaces to heartbeat over?
#
udp eth1
#
#   Set up a multicast heartbeat medium
#   mcast [dev] [mcast group] [port] [ttl] [loop]
#
#   [dev]           device to send/rcv heartbeats on
#   [mcast group]   multicast group to join (class D multicast address
#                   224.0.0.0 - 239.255.255.255)
#   [port]          udp port to sendto/rcvfrom (no real reason to differ
#                   from the port used for broadcast heartbeats)
#   [ttl]           the ttl value for outbound heartbeats. this affects
#                   how far the multicast packet will propagate. (0-255)
#   [loop]          toggles loopback for outbound multicast heartbeats.
#                   if enabled, an outbound packet will be looped back and
#                   received by the interface it was sent on. (0 or 1)
#
#
mcast eth1 225.0.0.1 694 1 1
#
#   Watchdog is the watchdog timer. If our own heart doesn't beat for
#   a minute, then our machine will reboot.
#
watchdog /dev/watchdog
#
#   "Legacy" STONITH support
#   Using this directive assumes that there is one stonith
#   device in the cluster. Parameters to this device are
#   read from a configuration file. The format of this line is:
#
#       stonith <stonith_type> <configfile>
#
#   NOTE: it is up to you to maintain this file on each node in the
#   cluster!
#
#stonith baytech /etc/ha.d/conf/stonith.baytech
#
#   STONITH support
#   You can configure multiple stonith devices using this directive.
#   The format of the line is:
#       stonith_host <hostfrom> <stonith_type> <params...>
#       <hostfrom> is the machine the stonith device is attached
#           to or * to mean it is accessible from any host.
#       <stonith_type> is the type of stonith device (a list of
#           supported drivers is in /usr/lib/stonith.)
#       <params...> are driver specific parameters. To see the
#           format for a particular device, run:
#               stonith -l -t <stonith_type>
#
#
#   Note that if you put your stonith device access information in
#   here, and you make this file publicly readable, you're asking
#   for a denial of service attack ;-)
#
#
#stonith_host * baytech 10.0.0.3 mylogin mysecretpassword
#stonith_host ken3 rps10 /dev/ttyS1 kathy 0
#stonith_host kathy rps10 /dev/ttyS1 ken3 0
#
#   Tell what machines are in the cluster
#   node nodename ... -- must match uname -n
node cluster-101-server
node cluster-102-server
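Because the node names in ha.cf must match the output of uname -n, a quick check on each machine can catch a mismatch before heartbeat is started. The following is only a sketch: check_node is a hypothetical helper name, not part of heartbeat, and it compares names exactly (including case).

```shell
#!/bin/sh
# Sketch: verify that a host name appears on an active "node" line in ha.cf.
# check_node is a hypothetical helper; the name defaults to `uname -n`.
check_node() {
    cf=$1
    name=${2:-$(uname -n)}
    # Only uncommented lines whose first word is "node" are considered.
    if awk -v n="$name" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$cf"
    then
        echo "ok: $name is listed in $cf"
    else
        echo "MISSING: $name is not listed in $cf"
        return 1
    fi
}

# Usage on each machine:
#   check_node /etc/ha.d/ha.cf
```

If the check reports MISSING, either the node line or the machine's host name needs to be changed so the two agree.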
/etc/ha.d/haresources
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
#   Assuming the administrative addresses are on the same subnet...
#   A little more complex case: One service address, default subnet
#   and netmask, and you want to start and stop http when you get
#   the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
#   A little more complex case: Three service addresses, default subnet
#   and netmask, and you want to start and stop http when you get
#   the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
#   One service address, with funny subnet and bcast addr
#   Stop and start httpd service with the subnet address
#
#just.linux-ha.org 135.9.216.3/4/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
#   An example where a shared filesystem is to be used.
#   Note that multiple arguments are passed to this script using
#   the delimiter '::' to separate each argument.
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
cluster-101-server 192.9.100.100 httpd smb
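The resource line above names the preferred node, the virtual IP address, and then the services (httpd, smb) that heartbeat starts and stops through their scripts. A rough pre-flight check that those scripts exist might look like the sketch below. check_resources is a hypothetical helper, and the default search directories (/etc/ha.d/resource.d and /etc/rc.d/init.d) are an assumption about this installation; both can be overridden.

```shell
#!/bin/sh
# Sketch: for one haresources line, report whether each named resource script exists.
# check_resources is a hypothetical helper; the default directories are assumptions.
check_resources() {
    line=$1
    resdir=${2:-/etc/ha.d/resource.d}
    initdir=${3:-/etc/rc.d/init.d}
    status=0
    set -- $line          # split the line on whitespace
    shift                 # drop the preferred node name
    for res in "$@"; do
        case $res in
            [0-9]*) continue ;;   # a bare IP address is a service address, not a script
        esac
        script=${res%%::*}        # strip any "::"-separated arguments
        if [ -x "$resdir/$script" ] || [ -x "$initdir/$script" ]; then
            echo "found: $script"
        else
            echo "missing: $script"
            status=1
        fi
    done
    return $status
}

# Usage:
#   check_resources "cluster-101-server 192.9.100.100 httpd smb"
```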
/etc/ha.d/authkeys
#   Authentication file. Must be mode 600
#
#
#   Must have exactly one auth directive at the front.
#   auth    send authentication using this method-id
#
#   Then, list the method and key that go with that method-id
#
#   Available methods: crc, sha1, md5. Crc doesn't need/want a key.
#
#   You normally only have one authentication method-id listed in this file
#
#   Put more than one to make a smooth transition when changing auth
#   methods and/or keys.
#
#
#   sha1 is believed to be the "best", md5 next best.
#
#   crc adds no security, except from packet corruption.
#   Use only on physically secure networks.
#
auth 1
1 crc
#2 sha1 HI!
#3 md5 Hello!
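One important point: make sure the configuration files on the two machines are identical, including smb.conf and any other configuration files of the clustered services. If you use shared storage there are many more issues to watch out for; I have not tested that and do not have the hardware for it.

Now testing can begin. First stop the clustered services on both machines, because heartbeat starts them itself when it starts (during testing there is a lag of a few seconds).

/etc/rc.d/init.d/httpd stop
/etc/rc.d/init.d/smb stop
/etc/rc.d/init.d/heartbeat start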
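Since the two machines must carry identical configuration files, one simple way to compare them is to print checksums on each machine and diff the results. This is only a sketch: config_sums is a hypothetical helper, md5sum is assumed to be installed, and the file list in the usage comment (in particular the smb.conf path) is an assumption that should be adapted to the actual system.

```shell
#!/bin/sh
# Sketch: print one checksum line per file so the output of both nodes can be diffed.
# config_sums is a hypothetical helper; md5sum is assumed to be available.
config_sums() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            md5sum "$f"
        else
            echo "unreadable: $f"
        fi
    done
}

# Usage on each node, then copy one output file across and diff the two:
#   config_sums /etc/ha.d/ha.cf /etc/ha.d/haresources /etc/ha.d/authkeys \
#       /etc/samba/smb.conf > /tmp/sums.$(uname -n)
```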
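OK, the configuration is done and the services should now be up. If they are not, check the messages in /var/log/messages.

Now for the test. To make it clear which machine is answering, edit the web server's main page, /var/www/html/index.html, on each machine so that the two can be told apart; for example, set the content to cluster-101-server on one and cluster-102-server on the other. From another machine, open http://192.9.100.100 (the virtual address) and you should see cluster-101-server. Then find a way to make cluster-101-server go down. After about 3-5 seconds the page changes to cluster-102-server, so the service has failed over successfully. Once cluster-101-server is back up, the page switches back to cluster-101-server with almost no delay. This improves the availability of the system.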
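Instead of reloading the page by hand, the failover can be observed from a third machine with a small polling loop. The sketch below assumes curl is installed on that machine; watch_page and the FETCH override variable are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: poll the virtual address once per second and print which node answered.
# FETCH is a hypothetical override hook; by default curl is assumed to be installed.
FETCH=${FETCH:-"curl -s --max-time 2"}

watch_page() {
    url=$1
    count=${2:-30}
    i=0
    while [ "$i" -lt "$count" ]; do
        # A failed or empty fetch is reported as "no answer".
        body=$($FETCH "$url" 2>/dev/null) || body=
        echo "$(date +%H:%M:%S) ${body:-no answer}"
        i=$((i + 1))
        sleep 1
    done
}

# Usage: start the loop, then take cluster-101-server down and watch the output change.
#   watch_page http://192.9.100.100 60
```

The timestamps give a rough measure of how long the virtual address was unreachable during the switch.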
One thing I forgot: you must change the permissions of the authkeys file, otherwise the service will not start.

chmod 600 /etc/ha.d/authkeys
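The permission requirement can be verified before heartbeat is started. A minimal sketch follows; check_mode is a hypothetical helper, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch: confirm that a file has mode 600 before starting heartbeat.
# check_mode is a hypothetical helper; `stat -c %a` assumes GNU stat.
check_mode() {
    mode=$(stat -c %a "$1") || return 2
    if [ "$mode" = "600" ]; then
        echo "ok: $1 is mode 600"
    else
        echo "bad: $1 is mode $mode, expected 600"
        return 1
    fi
}

# Usage:
#   check_mode /etc/ha.d/authkeys && /etc/rc.d/init.d/heartbeat start
```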