This post started as notes on an Acer server but gradually turned into notes on operating MegaRAID, so I changed the title.
-- noted 2020-03-07
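I bought a new Acer Altos R360 f3 server (in 2016). These notes collect the related information for future reference.
The machine originally ran vSphere ESXi. Recently I wanted to try Docker, so I reinstalled it with Gentoo Linux. I chose Gentoo because it is a rolling release, so there are no release upgrades to deal with. Also, since I compile my own kernel, I had to collect the hardware details first; otherwise the machine would not boot properly.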
Acer Altos R360 f3 specifications
CPU Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (family: 0x6, model: 0x4f, stepping: 0x1)
8 x 2.5" disks
Intel Integrated RAID Module RMS3CC080
Intel I350 Gigabit Network Connection
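Compared with buying an HP or ASUS server, the advantage of the Acer server is that all 8 disk trays are included, so you can buy whatever disks you like without paying extra.
Unfortunately, the Acer servers bought through the Central Trust of China only accept 2.5" disks. The ASUS servers I bought later take a smarter approach: the 3.5" trays have extra holes drilled so that 2.5" disks can be mounted too. So I switched to ASUS servers; even though the trays have to be bought separately, it still works out cheaper.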
Hardware reported by lspci
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
07:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)
dmesg excerpt
megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
megasas: 06.811.02.00-rc1
megaraid_sas 0000:01:00.0: FW now in Ready state
scsi host10: Avago SAS based MegaRAID driver
scsi 10:2:0:0: Direct-Access Intel RMS3CC080 4.66 PQ: 0 ANSI: 5
random: fast init done
sd 10:2:0:0: [sda] 1167966208 512-byte logical blocks: (598 GB/557 GiB)
igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
igb 0000:03:00.0: added PHC on eth0
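Since the kernel is compiled by hand, it helps to confirm that the drivers seen above are enabled before rebooting. The sketch below greps a kernel .config for the options that, as far as I know, correspond to these devices: CONFIG_MEGARAID_SAS (megaraid_sas), CONFIG_IGB (igb), CONFIG_DRM_MGAG200 (Matrox G200e) and CONFIG_BLK_DEV_SD (SCSI disk support, for /dev/sda). The function name check_kconfig and the default path /usr/src/linux/.config are assumptions; adjust them to your setup.

```shell
# Hypothetical helper: report whether each driver option this machine needs
# is set to y (built in) or m (module) in a kernel .config file.
# Returns non-zero if any option is missing.
check_kconfig() {
  local cfg=${1:-/usr/src/linux/.config} sym missing=0
  for sym in CONFIG_MEGARAID_SAS CONFIG_IGB CONFIG_DRM_MGAG200 CONFIG_BLK_DEV_SD; do
    if grep -Eq "^${sym}=(y|m)$" "$cfg"; then
      echo "ok      $sym"
    else
      echo "MISSING $sym"
      missing=1
    fi
  done
  return $missing
}

# Usage: check_kconfig /usr/src/linux/.config
```

If the root filesystem lives on the RAID volume, megaraid_sas has to be built in (=y) or loaded from an initramfs; the check above accepts =m as well, so that part is up to you.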
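With the MegaRAID card in JBOD mode, the disk is exposed directly to the system, and smartctl reads its SMART information just as it would on a directly attached SAS HBA.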
# megacli -PDList -aALL -Nolog|grep '^Firm'
Firmware state: JBOD
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
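Sometimes you need to find out which disk has failed; grepping with an OR pattern, so that each slot number appears next to its state, is quicker.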
# megacli -PDList -aALL -Nolog | grep '^Firm\|Slot'
Slot Number: 0
Firmware state: JBOD
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Failed
Slot Number: 7
Firmware state: Online, Spun Up
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Online, Spun Up
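With many disks, scanning the paired output by eye is still error-prone. The awk sketch below pairs each slot with the state that follows it and prints only the unusual ones. bad_slots is a hypothetical name, and treating only "Online, Spun Up" and "JBOD" as healthy is my assumption (a hot spare or a rebuilding disk will also be listed, which is usually what you want to see anyway).

```shell
# Hypothetical helper: pair each "Slot Number" with the "Firmware state" line
# that follows it and print only drives that are neither "Online, Spun Up"
# nor "JBOD". Reads `megacli -PDList -aALL -Nolog` text on stdin.
bad_slots() {
  awk -F': ' '
    { sub(/[ \t\r]+$/, "") }              # drop trailing blanks or CR
    /^Slot Number:/    { slot = $2 }
    /^Firmware state:/ {
      if ($2 != "Online, Spun Up" && $2 != "JBOD")
        printf "slot %s: %s\n", slot, $2
    }
  '
}

# Usage on the server:
#   megacli -PDList -aALL -Nolog | bad_slots
```

For the listing above, this would print only "slot 6: Failed".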
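Disk usage history
To save money, I built a RAID 6 from four HGST 1TB SATA disks. In 2017 a disk failed in June and another in mid-November; I am still keeping an eye on it.
Later I tried pulling the failed disk and reseating it, and it looked normal again. Running the command gave: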
# megacli -PDList -aALL -Nolog|grep '^Firm'
Firmware state: JBOD
Firmware state: Online, Spun Up
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Check the enclosure and slot numbers of the disks:
# megacli -PDList -aALL -Nolog|grep -iE 'enclosure device id|slot Number'
Enclosure Device ID: 0
Slot Number: 0
Enclosure Device ID: 0
Slot Number: 3
Enclosure Device ID: 0
Slot Number: 4
Enclosure Device ID: 0
Slot Number: 6
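The enclosure and slot pairs are exactly what the -PhysDrv [E:S] option in the later commands expects. A sketch that prints them in that form; pd_ids is an illustrative name, not a megacli feature.

```shell
# Hypothetical helper: turn "Enclosure Device ID" / "Slot Number" pairs from
# `megacli -PDList -aALL -Nolog` into E:S strings for -PhysDrv [E:S].
pd_ids() {
  awk -F': ' '
    { sub(/[ \t\r]+$/, "") }              # drop trailing blanks or CR
    /^Enclosure Device ID:/ { enc = $2 }
    /^Slot Number:/         { print enc ":" $2 }
  '
}

# Usage: megacli -PDList -aALL -Nolog | pd_ids
```

For the output above this prints 0:0, 0:3, 0:4 and 0:6, one per line.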
Status of the RAID 6 virtual drive with one failed disk:
# megacli -ldinfo -lALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name :VD_24TB
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 21.830 TB
Sector Size : 512
Is VD emulated : Yes
Parity Size : 10.915 TB
State : Partially Degraded
Strip Size : 256 KB
Number Of Drives : 6
Span Depth : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
Is VD Cached: No
Exit Code: 0x00
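For a periodic check, the State line is the one that matters: here it reads "Partially Degraded" with one disk gone. A sketch that exits non-zero when any virtual drive is not "Optimal"; the name ld_all_optimal and the assumption that a healthy VD reports exactly "Optimal" are mine.

```shell
# Hypothetical monitoring helper: read `megacli -ldinfo -lALL -aALL` output
# on stdin, print any non-Optimal VD state, and exit 1 if one was found.
ld_all_optimal() {
  awk -F' *: *' '
    /^State[ \t]*:/ {
      sub(/[ \t\r]+$/, "", $2)            # drop trailing blanks or CR
      if ($2 != "Optimal") { print "VD state: " $2; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Usage: megacli -ldinfo -lALL -aALL | ld_all_optimal || echo "RAID needs attention"
```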
Check the rebuild progress (refreshing every 30 seconds):
# watch -n 30 'megacli -PDRbld -ShowProg -PhysDrv [0:4] -aALL'
Every 30.0s: megacli -PDRbld -ShowProg -PhysDrv [0:... Mon Mar 2 10:08:42 2020
Rebuild Progress on Device at Enclosure 0, Slot 4 Completed 6% in 1 Minutes.
Exit Code: 0x00
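If you want a rough time estimate rather than just a percentage, the progress line can be parsed. This sketch assumes the line format shown above ("Completed N% in M Minutes") and that progress is roughly linear, which a rebuild competing with normal I/O may not be; rebuild_eta is a made-up name.

```shell
# Hypothetical helper: extract percent done and elapsed minutes from a
# `megacli -PDRbld -ShowProg` line and print a linear estimate of the rest.
rebuild_eta() {
  awk '
    match($0, /Completed [0-9]+% in [0-9]+ Min/) {
      # "Completed 6% in 1 Min" splits into: Completed, 6, in, 1, Min
      split(substr($0, RSTART, RLENGTH), f, /[ %]+/)
      pct = f[2] + 0; mins = f[4] + 0
      if (pct > 0)
        printf "%d%% done, ~%d min left\n", pct, mins * (100 - pct) / pct
    }
  '
}

# Usage: megacli -PDRbld -ShowProg -PhysDrv [0:4] -aALL | rebuild_eta
```

For the sample line above it prints "6% done, ~15 min left".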
Blinking a drive's LED to locate it (LED blink)
Syntax: megacli -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#>
In this example, we locate disk 0 in enclosure 252 on adapter 0:
[root@localhost MegaCli]# megacli -PdLocate -start -physdrv[252:0] -a0
Adapter: 0: Device at EnclId-252 SlotId-0 -- PD Locate Start Command was successfully sent to Firmware
Exit Code: 0x00
[root@localhost MegaCli]#
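The LED keeps blinking until it is told to stop, so a wrapper that starts the locate, waits, and then stops it is handy when walking over to the rack. blink_drive, its default 60-second duration, and the DRY_RUN switch are hypothetical; the megacli arguments follow the -PdLocate syntax shown above.

```shell
# Hypothetical wrapper: blink one drive's locate LED for a while, then stop.
# Arguments: E:S (e.g. 252:0), adapter number (default 0), seconds (default 60).
# With DRY_RUN set, the megacli commands are printed instead of run.
blink_drive() {
  local pd=$1 adapter=${2:-0} secs=${3:-60}
  local run=""
  [ -n "$DRY_RUN" ] && run=echo
  $run megacli -PdLocate -start "-physdrv[$pd]" "-a$adapter"
  [ -n "$DRY_RUN" ] || sleep "$secs"
  $run megacli -PdLocate -stop "-physdrv[$pd]" "-a$adapter"
}

# Usage: blink_drive 252:0 0 120
```

The -physdrv[...] argument is quoted so the shell never treats the brackets as a filename pattern.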
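For more information and commands, see LSI MegaRAID SAS.
On GitHub, the author of that page also published megaclisas-status, which can be used to check the state of the disk array periodically.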
Turning the alarm beep off and on
Silence the alarm during a rebuild:
# megacli -AdpSetProp AlarmSilence -a0
Adapter 0: Set alarm to Silenced success.
Exit Code: 0x00
Enable the alarm (AlarmDsbl disables it entirely):
# megacli -AdpSetProp AlarmEnbl -a0
Query the alarm setting:
# megacli -AdpGetProp AlarmDsply -a0
Adapter 0: Alarm Status is Enabled
Exit Code: 0x00
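Every so often the disk LEDs keep blinking; it turns out the controller periodically runs a Consistency Check (CC) in the background. To see the Consistency Check currently in progress: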
# megacli -LDCC -ShowProg -lall -aall
To see the next scheduled Consistency Check time (by default it runs once a week):
# megacli -AdpCcSched -Info -aALL
Adapter #0
Operation Mode: Concurrent
Execution Delay: 168
Next start time: 03/07/2020, 03:00:00
Current State: Active
Number of iterations: 21
Number of VD completed: 1
Excluded VDs : None
Exit Code: 0x00
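The "Execution Delay" value appears to be in hours: 168 h is exactly 7 days, which matches the weekly default. A tiny helper that converts it, assuming that unit; cc_interval_days is a hypothetical name.

```shell
# Hypothetical helper: read `megacli -AdpCcSched -Info -aALL` output and
# print the CC interval in days, assuming "Execution Delay" is in hours.
cc_interval_days() {
  awk -F': *' '/^Execution Delay:/ { printf "%g\n", $2 / 24 }'
}

# Usage: megacli -AdpCcSched -Info -aALL | cc_interval_days
```

For the output above it prints 7.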
Consistency Check has two modes, concurrent and sequential, and you can switch between them. Note: after changing the mode from disabled to concurrent or sequential, the next scheduled CC time becomes the year 2135, so you have to set the next scheduled run time again.
You can also change the Consistency Check delay interval and the Consistency Check rate, or start a Consistency Check manually.
Get the Consistency Check rate:
# megacli -AdpGetProp CCRate -aALL
Adapter 0: Check Consistency Rate = 30% ##default
If you get an error like "Consistency Check suspended on VD. . .", you can resume it like this:
# megacli -LDCC -resume -lall -aall
Finally, to check all adapter properties:
# megacli -AdpAllInfo -aALL
How to find a disk's serial number
Reference: How To Get Disk Serial Number in Megaraid
Here 11 is the drive's Device Id as reported by megacli -PDList (not its slot number):
# smartctl -d megaraid,11 -a /dev/sda
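To collect serial numbers for every disk rather than one at a time, the N in -d megaraid,N can be taken from each "Device Id" line of megacli -PDList. A sketch, assuming that Device Id is what smartctl expects and that /dev/sda is a block device on this controller; pd_device_ids is a made-up name.

```shell
# Hypothetical helper: list the "Device Id" values from
# `megacli -PDList -aALL -Nolog`; each is a candidate N for -d megaraid,N.
# The regex is anchored, so "Enclosure Device ID" lines are not matched.
pd_device_ids() {
  awk -F': *' '/^Device Id:/ { print $2 }'
}

# Usage on the server:
#   for n in $(megacli -PDList -aALL -Nolog | pd_device_ids); do
#     printf 'megaraid,%s: ' "$n"
#     smartctl -d megaraid,"$n" -i /dev/sda | grep -i 'serial number'
#   done
```

grep -i is used because ATA and SCSI disks spell the label differently ("Serial Number" vs "Serial number").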
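Handling a disk in the Unconfigured(bad) state
Reference: RAID: REBUILDING A FOREIGN DISK BY HAND
The likely cause: this spare disk had earlier stood in temporarily for a failed disk, so RAID metadata had been written to it. After the failed disk was replaced with a new one, this disk was kept as a spare. This time another disk failed, and when I used this spare, it showed up as Unconfigured(bad).
Steps to change the disk state to "good":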
# megacli -PDList -aALL -Nolog|grep '^Firm'
[…]
Firmware state: Online, Spun Up
Firmware state: Unconfigured(bad)
Firmware state: Online, Spun Up
[…]
# megacli -PDList -aALL -Nolog
Adapter #0
[…]
Enclosure Device ID: 0
Slot Number: 10
Enclosure position: 1
Device Id: 9
WWN: 5000cca0bbc611a5
Sequence Number: 11
[…]
Logical Sector Size: 512
Physical Sector Size: 4096
Firmware state: Unconfigured(bad)
Device Firmware Level: 0A81
Shield Counter: 0
Successful diagnostics completion on : N/A
[…]
# megacli -PDMakeGood -PhysDrv[0:10] -a0
Adapter: 0: EnclId-0 SlotId-10 state changed to Unconfigured-Good.
Exit Code: 0x00
# megacli -PDList -aALL -Nolog|grep '^Firm'
[…]
Firmware state: Online, Spun Up
Firmware state: Rebuild
Firmware state: Online, Spun Up
[…]
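It doesn't always go this smoothly. Another time, the command only changed the disk to Unconfigured(good) and no rebuild started, so a few more steps were needed. Reference: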
QRadar: Disk drive is in "Unconfigured (good)" state after replacement and is not being rebuilt automatically
# megacli -PDList -aALL -Nolog|grep '^Firm'
[…]
Firmware state: Online, Spun Up
Firmware state: Unconfigured(bad)
Firmware state: Online, Spun Up
[…]
# megacli -PDList -aALL -Nolog
Adapter #0
[…]
Enclosure Device ID: 0
Slot Number: 6
Enclosure position: 1
[…]
Logical Sector Size: 512
Physical Sector Size: 4096
Firmware state: Unconfigured(bad)
Device Firmware Level: 0A81
Shield Counter: 0
Successful diagnostics completion on : N/A
[…]
# megacli -PDMakeGood -PhysDrv[0:6] -a0
Adapter: 0: EnclId-0 SlotId-6 state changed to Unconfigured-Good.
Exit Code: 0x00
# megacli -PDList -aALL -Nolog|grep '^Firm'
[…]
Firmware state: Online, Spun Up
Firmware state: Unconfigured(good), Spun Up
Firmware state: Online, Spun Up
[…]
# megacli -pdgetmissing -aALL
Adapter 0 - Missing Physical drives
No. Array Row Size Expected
0 0 0 5722624 MB
Exit Code: 0x00
# megacli -PdReplaceMissing -PhysDrv [0:6] -Array0 -row0 -a0
Adapter: 0: Missing PD at Array 0, Row 0 is replaced.
Exit Code: 0x00
# megacli -PDList -aALL -Nolog|grep '^Firm'
[…]
Firmware state: Online, Spun Up
Firmware state: Offline
Firmware state: Online, Spun Up
[…]
# megacli -PDRbld -Start -PhysDrv [0:6] -a0
Started rebuild progress on device(Encl-0 Slot-6)
Exit Code: 0x00
# megacli -PDList -aALL -Nolog|grep '^Firm'
[…]
Firmware state: Online, Spun Up
Firmware state: Rebuild
Firmware state: Online, Spun Up
[…]
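After the replace, the disk is still Offline; another command is needed to start the rebuild. If another disk is already rebuilding, the second one seems to wait until the first has finished before it starts.
After the rebuild, the controller may keep beeping; for an explanation, see MegaRAID controller still beeps after rebuild completes and VD is optimal if Copy Back is enabled.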
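The manual recovery sequence (MakeGood, look up the missing array and row, ReplaceMissing, then start the rebuild) can be wrapped into a small script. This is only a sketch: the function names missing_rows and recover_drive and the DRY_RUN switch are my own inventions, it handles one drive at a time, and it assumes megacli returns a non-zero exit status when a step fails. Run it with DRY_RUN=1 first and compare the printed commands with the ones used above.

```shell
# Hypothetical helper: print "array row" for each row of
# `megacli -pdgetmissing -aALL` output (header and footer lines are skipped).
missing_rows() {
  awk '$1 ~ /^[0-9]+$/ && NF >= 4 { print $2, $3 }'
}

# Hypothetical wrapper for the steps above.
# Arguments: E:S, array, row, adapter (default 0).
# With DRY_RUN set, the megacli commands are printed instead of run.
recover_drive() {
  local pd=$1 array=$2 row=$3 adapter=${4:-0}
  local run=""
  [ -n "$DRY_RUN" ] && run=echo
  # Unconfigured(bad) -> Unconfigured(good)
  $run megacli -PDMakeGood "-PhysDrv[$pd]" "-a$adapter" || return
  # Put the disk into the missing array/row position
  $run megacli -PdReplaceMissing -PhysDrv "[$pd]" "-Array$array" "-row$row" "-a$adapter" || return
  # The disk is still Offline after the replace, so start the rebuild explicitly
  $run megacli -PDRbld -Start -PhysDrv "[$pd]" "-a$adapter"
}

# Usage:
#   megacli -pdgetmissing -aALL | missing_rows   # e.g. prints "0 0"
#   DRY_RUN=1 recover_drive 0:6 0 0
```

For the session above, missing_rows prints "0 0", which corresponds to recover_drive 0:6 0 0.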