My weekly zpool scrub came back with this:

  pool: blackhole
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: scrub repaired 0B in 02:01:59 with 0 errors on Tue Jul 11 04:02:09 2023
config:

    NAME                                    STATE     READ WRITE CKSUM
    blackhole                               DEGRADED     0     0     0
      raidz1-0                              DEGRADED     0     0     0
        ata-WDC_WD120EDAZ-11F3RA0_5PG8DYKC  ONLINE       0     0     0
        ata-WDC_WD120EFBX-68B0EN0_5QKJ6M8B  ONLINE       0     0     0
        ata-WDC_WD120EFBX-68B0EN0_5QKJTT8B  FAULTED     51     0     0  too many errors

errors: No known data errors

I only got the drive 6 months ago, well within WD’s 3-year warranty, so I opened a support case. But do errors like this basically always mean the drive is on its way out, or is it possible to have false positives?

  • Admiral Patrick
    1 year ago

    Typically, yes. It could also be due to a flaky SATA cable, connection, or controller, so you might try moving it to a different port if you are able, clearing the error, and seeing if it recurs.

    Regardless, just make sure you have a good backup of the data or are confident in the other two disks.

    • @[email protected]
      1 year ago

      Changing the cable or reseating the SATA connector, clearing the errors, and starting a scrub is what I always do.
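The clear-then-scrub sequence suggested in these replies can be sketched as below. This is only an illustration, not anyone's actual script: the helper names `recheck_commands` and `run_recheck` are made up here, and the pool and device names are the ones from the `zpool status` output in the post. It needs root and a real pool to do anything.

```python
import subprocess


def recheck_commands(pool, device):
    """Build the commands for a clear-and-rescrub check (hypothetical helper)."""
    return [
        ["zpool", "clear", pool, device],  # reset the error counters on one device
        ["zpool", "scrub", pool],          # start a scrub; returns right away
        ["zpool", "status", pool],         # shows scrub progress and any new errors
    ]


def run_recheck(pool, device):
    """Run the commands in order, stopping at the first one that fails."""
    for command in recheck_commands(pool, device):
        subprocess.run(command, check=True)


# Example call, using the names from the post:
# run_recheck("blackhole", "ata-WDC_WD120EFBX-68B0EN0_5QKJTT8B")
```

Because the scrub runs in the background, the final `zpool status` only shows it as in progress. The interesting result is whether the READ/WRITE/CKSUM counters are still zero once the scrub has finished.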

    • Relic5646!OP
      1 year ago

      Thanks, I will start a backup now. I don’t have any extra automated backups so I guess this is my wake-up call to figure something out.

    • Relic5646!OP
      1 year ago

      I’ve been suuuuper lazy troubleshooting this so it’s been a few weeks, but I talked to WD support and they said to run a full extended S.M.A.R.T. test on the drive. It passed with no issues.
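For anyone wanting to repeat the extended S.M.A.R.T. test, a rough sketch using `smartctl` from smartmontools might look like this. The helper name `smart_long_test_commands` is made up for illustration, and the `/dev/disk/by-id/` path is an assumption based on the device name in the post, so substitute whatever path your system uses.

```python
def smart_long_test_commands(device_path):
    """Build smartctl commands for an extended self-test (hypothetical helper)."""
    return {
        "start": ["smartctl", "-t", "long", device_path],      # kick off the extended test
        "log": ["smartctl", "-l", "selftest", device_path],    # read the self-test log later
        "health": ["smartctl", "-H", device_path],             # overall health assessment
    }


# Assumed device path, derived from the device name shown in the post:
# commands = smart_long_test_commands(
#     "/dev/disk/by-id/ata-WDC_WD120EFBX-68B0EN0_5QKJTT8B")
# Run commands["start"] first, then commands["log"] once the test has finished.
```

The extended test runs inside the drive and can take many hours on a disk this size, so the log is only worth checking well after the test was started.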

      Reconnected it to my server using a different SATA cable on a different port on the motherboard, with a different power connector. It resilvered with no problems, and a zpool scrub returned no errors this time so hopefully I’m in the clear!

      I have a script that runs once a week that does a scrub then sends the output of zpool status to a Discord channel. When this first started it had read errors (as mentioned in the post), then checksum errors two weeks later. Since there were a couple of different errors before troubleshooting and none after the latest scrub, I’m hoping this means everything’s fine now.
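A weekly scrub-and-report script along these lines could be sketched as follows. This is not the script from the comment above, just a guess at the general shape. The function names, the webhook URL, and the `User-Agent` string are all placeholders, and `zpool scrub -w` (wait for the scrub to finish) is assumed to be available, which needs a reasonably recent OpenZFS.

```python
import json
import subprocess
import urllib.request

DISCORD_LIMIT = 2000  # Discord caps webhook message content at 2000 characters


def flagged_devices(status_text):
    """Return config rows that are not ONLINE or show nonzero error counters."""
    flagged = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # header row of the config table
            continue
        if not in_config:
            continue
        if not fields:
            in_config = False  # a blank line ends the table
            continue
        if len(fields) < 5:
            continue  # section labels such as "logs" or "cache"
        name, state, read, write, cksum = fields[:5]
        # Counters stay strings because zpool may abbreviate them (e.g. "1.2K").
        if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
            flagged.append((name, state, read, write, cksum))
    return flagged


def build_payload(status_text):
    """Build the JSON body for a Discord webhook, truncated to the size limit."""
    bad = flagged_devices(status_text)
    if bad:
        header = "Scrub report: check " + ", ".join(row[0] for row in bad) + "\n"
    else:
        header = "Scrub report: no flagged devices\n"
    content = (header + status_text)[:DISCORD_LIMIT]
    return json.dumps({"content": content}).encode("utf-8")


def run_weekly_report(pool, webhook_url):
    """Scrub the pool, wait for completion, then post zpool status to Discord."""
    subprocess.run(["zpool", "scrub", "-w", pool], check=True)  # assumes -w is supported
    status = subprocess.run(
        ["zpool", "status", pool], check=True, capture_output=True, text=True
    ).stdout
    request = urllib.request.Request(
        webhook_url,
        data=build_payload(status),
        # Placeholder User-Agent; set explicitly rather than relying on urllib's default.
        headers={"Content-Type": "application/json",
                 "User-Agent": "zpool-scrub-report/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        response.read()


# Example call from a weekly cron job or systemd timer (webhook URL is a placeholder):
# run_weekly_report("blackhole", "https://discord.com/api/webhooks/<id>/<token>")
```

Putting the flagged device names in the first line of the message means a problem is visible in the notification preview without opening the full status dump.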