Xsan with Active Storage acting up

huntson's picture

As if almost on cue, my Active Storage system has started acting up. I've rebuilt my SAN once already, and the following still occurs:

On occasion all R/W activity freezes, or read activity in FCP stutters and the playhead freezes every few seconds. Turning everything off and on again makes it work for a few minutes, but then the problem resurfaces. I've replaced all cables and SFPs and have been able to replicate the issue on multiple clients, so I can isolate it to the Active Storage system. No errors anywhere in the console appear to offer any clues.

Any ideas of where/what I should be looking for?

huntson's picture

I've narrowed it down to when two systems access the Active Storage volume even with the smallest amount of data - that's when things go to hell.

JSamuel's picture

huntson wrote:
I've narrowed it down to when two systems access the Active Storage volume even with the smallest amount of data - that's when things go to hell.

Have you checked the status of the Active units via Active Viewer? You say you checked cabling/SFPs; does that include the Active units as well? What does the switch have to say?

Joel Samuel.
/thirtytwo - Consultancy & Direction
Proud sponsor of Xsanity.com

All contributions are my own personal opinions - not those of any entity I represent.

abstractrude's picture

Yeah, I would narrow this down using the split-half approach.
No, I'm not being a jerk. This is the way to go. For example, remove controllers one by one, swap cables, etc.
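
The split-half idea can be sketched in a few lines of Python. This is only an illustration, assuming exactly one faulty part and a test that reproduces the fault reliably; the component names and the `fault_reproduces` callback are hypothetical stand-ins for physically swapping parts and retesting playback.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def split_half(components: Sequence[T],
               fault_reproduces: Callable[[Sequence[T]], bool]) -> T:
    """Return the one component that triggers the fault.

    Assumes exactly one faulty component and a repeatable test.
    """
    candidates = list(components)
    if not candidates:
        raise ValueError("no components to test")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # If the fault shows up with only the first half in play,
        # the culprit is there; otherwise it is in the second half.
        if fault_reproduces(first_half):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    return candidates[0]


if __name__ == "__main__":
    # Hypothetical parts list; "drive 7" plays the faulty component.
    parts = ["cable A", "cable B", "SFP 1", "SFP 2",
             "controller 1", "controller 2", "drive 7"]
    print(split_half(parts, lambda subset: "drive 7" in subset))
```

With seven suspects this needs three rounds of testing rather than up to seven one-by-one swaps. An intermittent fault breaks the single-test assumption, so each round would need to run long enough to be trusted.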


-Trevor Carlson

huntson's picture

No joke indeed. I wound up doing just that. I tried each cable separately. The only thing I didn't do was pull an active controller, but I figure that trying each cable should eliminate the controllers, since a controller is only active when there is a cable plugged in.

After doing some more testing, it appeared to have nothing really to do with multiple systems accessing the enclosure - just any type of RAID.

I wound up destroying the volume and then trying to mount it locally in a RAID set. It worked perfectly there. After putting it back together and incorporating it into the SAN, everything was initially fine, but after I renamed it I started to notice the same behavior. I'm going to reformat the whole thing and then bring it back into the SAN.

huntson's picture

Curious: on the build, one array is lagging behind by almost 20%, even though they were all started at the same time. Does this sound correct?

raji's picture

After you renamed it? Or sometime after you moved it from local back to the fabric? That sounds really curious - as in, filesystem, DNS, or zoning. You're going to have to do more specific troubleshooting. If you couldn't reproduce the issue when you direct-connect the fibre, that would point to the bits in between.

Are you running the last released firmware? What was the last publicly released FW? 1.28?

Enable the stats in the CLI and take a sample when the problem occurs and another when it doesn't. Post the results here.
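
Once you have the two samples, comparing them can be as simple as the sketch below. It assumes you have already copied the numbers into two dictionaries by hand; the counter names and values are made up for illustration and are not actual Active Storage CLI output.

```python
def compare_samples(baseline, problem, threshold=0.25):
    """Return counters whose relative change is at least `threshold`.

    Each result is (name, before, after, relative_change), sorted with
    the largest change first. Only counters present in both samples
    are compared.
    """
    changes = []
    for name in sorted(set(baseline) & set(problem)):
        before, after = baseline[name], problem[name]
        if before == 0:
            if after == 0:
                continue  # unchanged at zero, nothing to report
            rel = float("inf")  # went from zero to non-zero
        else:
            rel = (after - before) / abs(before)
        if abs(rel) >= threshold:
            changes.append((name, before, after, rel))
    changes.sort(key=lambda c: abs(c[3]), reverse=True)
    return changes


if __name__ == "__main__":
    # Hypothetical counters: one sample taken while healthy, one while stuttering.
    good = {"read_latency_ms": 8.0, "write_latency_ms": 10.0, "iops": 1000}
    bad = {"read_latency_ms": 80.0, "write_latency_ms": 11.0, "iops": 400}
    for name, before, after, rel in compare_samples(good, bad):
        print(f"{name}: {before} -> {after} ({rel:+.0%})")
```

Whatever moves the most between the good and bad sample is the first place to look.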

xsanguy's picture

Bad/marginal drive?

Sirsloth's picture

Bad / marginal Drive

We had a similar problem where FCP playback would stutter/freeze, along with general access issues. Replacing an HDD with reported bad sectors on the Active sorted the issue for us. The array with the slow RAID rebuild probably contains a drive that is faulty or marginal.
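
As a rough way to spot the odd one out, you could compare each array's rebuild progress against the median of the group. This is only a sketch: the array names, percentages, and the 10-point gap are assumptions for illustration, and the figures would have to be read off the management tool by hand.

```python
from statistics import median


def lagging_arrays(progress, max_gap=10.0):
    """Return names of arrays whose progress (in percent) trails the
    median of all arrays by more than `max_gap` percentage points."""
    mid = median(progress.values())
    return sorted(name for name, pct in progress.items()
                  if mid - pct > max_gap)


if __name__ == "__main__":
    # Hypothetical rebuild progress, all arrays started at the same time.
    rebuild = {"array1": 62.0, "array2": 61.0, "array3": 60.5, "array4": 42.0}
    print(lagging_arrays(rebuild))
```

An array flagged this way is the one whose drives are worth checking first for bad sectors.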

Hope this helps.