Sunday, June 18, 2017

megaraid operations -- on an Acer Altos R360 f3 server

This post started out as notes on an Acer server and later turned into a summary of megaraid operations, so I changed the title.
-- Noted 2020-03-07

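I bought a new Acer Altos R360 f3 server (in 2016) and am collecting the related information here for future reference.
The machine originally ran vSphere ESXi. Recently I wanted to play with Docker, so I reinstalled it with Gentoo Linux. I chose Gentoo because it is a rolling release, so there are no version upgrades to deal with. Also, because you build the kernel yourself, you have to gather the hardware information first; otherwise the machine will not boot properly.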

Acer Altos R360 f3 specifications

CPU Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz (family: 0x6, model: 0x4f, stepping: 0x1)
8 x 2.5" disks
Intel Integrated RAID Module RMS3CC080
Intel I350 Gigabit Network Connection

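Compared with buying an HP or ASUS server, the advantage of the Acer server is that all 8 disk trays are included, so you can buy whatever disks you like without paying extra.

Unfortunately, the Acer server purchased through the Central Trust of China (中信局) procurement channel only takes 2.5" disks. The ASUS servers I bought later take a smarter approach: the 3.5" drive trays have extra holes drilled in them, so they can also hold 2.5" disks. That is why I have bought ASUS servers ever since. The trays have to be purchased separately, but it is still the better deal.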

Hardware information shown by lspci

01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS-3 3108 [Invader] (rev 02)
03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
07:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05)

dmesg excerpt

megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
megasas: 06.811.02.00-rc1
megaraid_sas 0000:01:00.0: FW now in Ready state
scsi host10: Avago SAS based MegaRAID driver
scsi 10:2:0:0: Direct-Access     Intel    RMS3CC080        4.66 PQ: 0 ANSI: 5
random: fast init done
sd 10:2:0:0: [sda] 1167966208 512-byte logical blocks: (598 GB/557 GiB)

igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
igb 0000:03:00.0: added PHC on eth0
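
The sd line above reports the same capacity twice, once in decimal GB and once in binary GiB. As a small sanity check, here is a sketch that reproduces both figures from the block count. The function name is my own, and the kernel's exact rounding rules may differ from Python's round(); for this value they agree.

```python
def capacity_from_blocks(blocks, block_size=512):
    """Return (decimal GB, binary GiB), each rounded to the nearest integer."""
    total_bytes = blocks * block_size
    gb = round(total_bytes / 10**9)   # 1 GB  = 10^9 bytes
    gib = round(total_bytes / 2**30)  # 1 GiB = 2^30 bytes
    return gb, gib


# Block count taken from the dmesg line for sda.
print(capacity_from_blocks(1167966208))  # (598, 557)
```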

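Changing megaraid card disks from RAID 0 to JBOD mode

For background, see the article "将 megaraid 卡磁盘 raid0 改为 JBOD 模式" (Changing megaraid card disks from RAID 0 to JBOD mode).
When a megaraid card uses JBOD mode, the system recognizes the disks directly, and reading SMART information with smartctl works the same way as with a directly attached SAS card.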
# megacli -PDList -aALL -Nolog|grep '^Firm'
Firmware state: JBOD
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

Sometimes you need to find out which disk has failed. Using grep with an OR condition is the quicker way.

# megacli -PDList -aALL -Nolog | grep '^Firm\|Slot'
Slot Number: 0
Firmware state: JBOD
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Failed
Slot Number: 7
Firmware state: Online, Spun Up
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Online, Spun Up
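
For a periodic check, the grep output above is easy to post-process. The following is only a sketch: the function names are my own, and it assumes a single enclosure (so slot numbers are unique) and that any state not starting with "Online" or "JBOD" deserves attention. It parses text only; feeding it the real megacli output is left to the caller.

```python
import re


def parse_pd_states(text):
    """Map slot number -> firmware state from `megacli -PDList` style output."""
    states = {}
    slot = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"Slot Number:\s*(\d+)", line)
        if m:
            slot = int(m.group(1))
            continue
        m = re.match(r"Firmware state:\s*(.+)", line)
        if m and slot is not None:
            states[slot] = m.group(1).strip()
    return states


def unhealthy_slots(states, healthy_prefixes=("Online", "JBOD")):
    """Return the sorted slots whose state does not start with a healthy prefix."""
    return sorted(s for s, st in states.items()
                  if not st.startswith(healthy_prefixes))


# A few lines copied from the output above.
SAMPLE = """\
Slot Number: 0
Firmware state: JBOD
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Failed
"""
states = parse_pd_states(SAMPLE)
print(unhealthy_slots(states))  # [6]
```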

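Disk usage log

To save money, I used four HGST 1 TB SATA disks to build a RAID 6 array (June 2017). In mid-November one of them failed; I am keeping an eye on it.
Later I tried pulling the disk and re-inserting it, and it looked normal again. Running the command gives the following result.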

# megacli -PDList -aALL -Nolog|grep '^Firm'
Firmware state: JBOD
Firmware state: Online, Spun Up
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up

List the disk numbers as follows
# megacli -PDList -aALL -Nolog|grep -iE 'enclosure device id|slot Number'
Enclosure Device ID: 0
Slot Number: 0
Enclosure Device ID: 0
Slot Number: 3
Enclosure Device ID: 0
Slot Number: 4
Enclosure Device ID: 0
Slot Number: 6
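
Other megacli commands in these notes address a disk as [enclosure:slot], for example -PhysDrv [0:4]. A minimal sketch for turning the listing above into those identifiers follows. The function name is hypothetical, and it assumes each "Slot Number" line belongs to the most recent "Enclosure Device ID" line, as in the output shown.

```python
import re


def physdrv_ids(text):
    """Return 'enclosure:slot' strings in the order the disks are listed."""
    ids = []
    enclosure = None
    for line in text.splitlines():
        m = re.match(r"\s*Enclosure Device ID:\s*(\S+)", line, re.IGNORECASE)
        if m:
            enclosure = m.group(1)
            continue
        m = re.match(r"\s*Slot Number:\s*(\d+)", line, re.IGNORECASE)
        if m and enclosure is not None:
            ids.append(f"{enclosure}:{m.group(1)}")
    return ids


LISTING = """\
Enclosure Device ID: 0
Slot Number: 0
Enclosure Device ID: 0
Slot Number: 3
Enclosure Device ID: 0
Slot Number: 4
Enclosure Device ID: 0
Slot Number: 6
"""
print(physdrv_ids(LISTING))  # ['0:0', '0:3', '0:4', '0:6']
```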

Status query for a RAID 6 array with one failed disk
# megacli -ldinfo -lALL -aALL
                                     
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name                :VD_24TB
RAID Level          : Primary-6, Secondary-0, RAID Level Qualifier-3
Size                : 21.830 TB
Sector Size         : 512
Is VD emulated      : Yes
Parity Size         : 10.915 TB
State               : Partially Degraded
Strip Size          : 256 KB
Number Of Drives    : 6
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Bad Blocks Exist: No
PI type: No PI

Is VD Cached: No

Exit Code: 0x00
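
The Size and Parity Size figures above are consistent with how RAID 6 divides space: two members' worth of capacity holds parity and the rest holds data. Here is a sketch of that arithmetic, assuming equal-size members and a single span (Span Depth 1); the function name is my own.

```python
def raid6_breakdown(usable_tb, n_drives):
    """Split a RAID 6 virtual drive into (per-member TB, parity TB)."""
    if n_drives <= 2:
        raise ValueError("RAID 6 needs more than two drives")
    data_drives = n_drives - 2             # two members' worth of space is parity
    per_member = usable_tb / data_drives   # capacity contributed by each disk
    parity = 2 * per_member
    return per_member, parity


# Size 21.830 TB and Number Of Drives 6, taken from the output above.
per_member, parity = raid6_breakdown(21.830, 6)
print(round(per_member, 4), round(parity, 3))  # 5.4575 10.915
```

The computed parity figure, 10.915 TB, matches the Parity Size line reported by the controller.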

Run this command to watch the rebuild progress
# watch -n 30 'megacli -PDRbld -ShowProg -PhysDrv [0:4] -aALL'

Every 30.0s: megacli -PDRbld -ShowProg -PhysDrv [0:...  Mon Mar  2 10:08:42 2020

Rebuild Progress on Device at Enclosure 0, Slot 4 Completed 6% in 1 Minutes.

Exit Code: 0x00
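
If you want a rough finishing time instead of watching the screen, the progress line can be parsed and extrapolated. This is only a sketch under the assumption that the rebuild proceeds at a constant rate, which it often does not, and the whole-minute granularity makes early estimates very coarse. The names are my own.

```python
import re

PROGRESS_RE = re.compile(
    r"Enclosure (\d+), Slot (\d+) Completed (\d+)% in (\d+) Minutes"
)


def parse_rebuild_progress(line):
    """Extract enclosure, slot, percent done and elapsed minutes, or None."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    enclosure, slot, percent, minutes = map(int, m.groups())
    return {"enclosure": enclosure, "slot": slot,
            "percent": percent, "minutes": minutes}


def estimate_remaining_minutes(percent, minutes):
    """Linear extrapolation; returns None while there is no progress yet."""
    if percent <= 0:
        return None
    return minutes * (100 - percent) / percent


line = ("Rebuild Progress on Device at Enclosure 0, Slot 4 "
        "Completed 6% in 1 Minutes.")
info = parse_rebuild_progress(line)
print(round(estimate_remaining_minutes(info["percent"], info["minutes"]), 1))  # 15.7
```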

Making a disk's LED blink (LED blink)

Syntax: megacli -PdLocate <-start|-stop> -physdrv[<enclosure#>:<disk#>] -a<adapter#>

In this example we will locate disk 0 on adapter 0:
[root@localhost MegaCli]# megacli -PdLocate -start -physdrv[252:0] -a0
 Adapter: 0: Device at EnclId-252 SlotId-0 -- PD Locate Start Command was successfully sent to Firmware
 Exit Code: 0x00
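
When scripting this, it helps to build the argument list from the syntax line rather than by hand, so that start and stop always target the same disk. A minimal sketch; the helper name and its defaults (adapter 0, a binary called megacli) are my own assumptions.

```python
def pd_locate_argv(action, enclosure, slot, adapter=0, binary="megacli"):
    """Build the argument list for the -PdLocate syntax shown above."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    return [binary, "-PdLocate", f"-{action}",
            f"-physdrv[{enclosure}:{slot}]", f"-a{adapter}"]


print(" ".join(pd_locate_argv("start", 252, 0)))
# megacli -PdLocate -start -physdrv[252:0] -a0
```

On a machine with megacli installed, the returned list can be handed to subprocess.run(argv, check=True).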
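For related material and commands, see LSI MegaRAID SAS.
The original author also publishes megaclisas-status on GitHub, which can be used to check the state of the disk array periodically.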

Turning the beep off/on

Silence the alarm during a rebuild
# megacli -AdpSetProp AlarmSilence -a0
  Adapter 0: Set alarm to Silenced success.
  Exit Code: 0x00

Enable the alarm
# megacli -AdpSetProp AlarmEnbl -a0

Query the alarm setting
# megacli -AdpGetProp AlarmDsply -a0
  Adapter 0: Alarm Status is Enabled
  Exit Code: 0x00

Alarm beep codes: Controller beep codes

Consistency Check 

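Every so often I see the disk LEDs blinking continuously. It turns out the controller periodically runs a Consistency Check in the background. To see the Consistency Check currently in progress: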
# megacli -LDCC -ShowProg -lall -aall 

To see the next scheduled Consistency Check time (by default it runs once a week):
# megacli -AdpCcSched -Info -aALL
                                     
Adapter #0

Operation Mode: Concurrent
Execution Delay: 168
Next start time: 03/07/2020, 03:00:00
Current State: Active
Number of iterations: 21
Number of VD completed: 1
Excluded VDs          : None
Exit Code: 0x00
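
The Execution Delay of 168 is what makes this a weekly schedule: 168 hours is 7 days. A small sketch that reads the two relevant fields and works out the run after the next one. It assumes the delay is in hours and that the date is printed as MM/DD/YYYY, which fits the dates elsewhere in these notes; the function name is my own.

```python
from datetime import datetime, timedelta


def parse_cc_schedule(text):
    """Return (interval, next_start) from `-AdpCcSched -Info` style output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    interval = timedelta(hours=int(fields["Execution Delay"]))
    next_start = datetime.strptime(fields["Next start time"],
                                   "%m/%d/%Y, %H:%M:%S")
    return interval, next_start


INFO = """\
Operation Mode: Concurrent
Execution Delay: 168
Next start time: 03/07/2020, 03:00:00
Current State: Active
"""
interval, next_start = parse_cc_schedule(INFO)
print(interval.days)           # 7
print(next_start + interval)   # 2020-03-14 03:00:00
```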
Consistency Check has two modes, concurrent and sequential, and you can switch between them. Note: after changing the mode from disabled to concurrent/sequential, the next scheduled CC time becomes the year 2135, so you have to set the next scheduled run time again.

You can also change the Consistency Check delay interval and the Consistency Check Rate, or start a Consistency Check manually.

Get the Consistency Check Rate
# megacli -AdpGetProp CCRate -aALL
Adapter 0: Check Consistency Rate = 30% ##default 
If you get an error like "Consistency Check suspended on VD. . .", you can resume like so:
# MegaCli64 -LDCC -resume -lall -aall
Finally, to check the adapter properties:
# MegaCli64 -AdpAllInfo -aALL

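2 comments:

  1. Hello, how can I get the software for monitoring the RAID card?
    I downloaded the rmc3cc080 module software from Intel's website and installed it.
    It can detect the ESXi server's IP, but the RAID card and disks all show up empty.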

    1. I'm not sure. See whether this page has what you need:
      https://www.broadcom.com/support/download-search?dk=megacli

