MDC / Xsan update methodology

elliott:

New user here!

Recently we've been having issues with the Finder crashing when copying files (copies also fail in Terminal with both cp and cvcp), and in general our Xsan environment has been a little unreliable: ghost volumes, Spotlight crashes, random failovers, etc.

We are running 10.6.4 and Xsan 2.2.1 across all clients and MDCs, so I thought now might be the time to update everything to 10.6.8v1.1 and Xsan 2.2.2 (there are added benefits too: the App Store on the clients and the latest NVIDIA drivers for our Macs).

I'm also hoping to do a clean reinstall of the MDCs, as they've been through a lot recently (we have added more storage blocks, built a new SAN and mirror to replace the old ones, and have been renaming volumes, LUNs, etc.).

My question is: what would be the correct methodology for flattening and updating the MDCs and then the clients?

Initially, I'm thinking of the following:

1. Unmount all clients and remove them as computers from Xsan, leaving just the MDCs running and connected.
2. Fail over both volumes to MDC 1 (see the cvadmin sketch after this list).
3. Remove MDC 2 as a controller and computer.
4. Disconnect Ethernet and fibre, reinstall MDC 2, update it to 10.6.8v1.1, and install Xsan 2.2.2.
5. Add it back as a controller and fail over both volumes to MDC 2.
6. Remove MDC 1 as a controller, then reinstall and update the OS and Xsan.
7. Add it back as a controller.
8. Finally, update all clients to 10.6.8v1.1 and Xsan 2.2.2 and add them back as clients.
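
For the failover steps, I'm assuming cvadmin can do it along these lines ('SanVol' and 'MirrorVol' are stand-ins for our actual volume names):

[code]
# See which MDC is currently hosting each volume
sudo cvadmin -e 'select'

# Trigger failover; the standby FSM on the other MDC takes over
sudo cvadmin -e 'fail SanVol'
sudo cvadmin -e 'fail MirrorVol'
[/code]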

The only thing that I'm concerned about here is having (if only for a short time) one MDC at 10.6.4 / 2.2.1 and the other on 10.6.8 / 2.2.2.

I will have a third unused Xserve that I could potentially use to host both volumes whilst I'm updating them.

I realise that simply updating the OS without flattening them would be much easier, but we would rather clean-install them.

Our setup consists of the following:

2 x MDCs: Xserves running 10.6.4 and Xsan 2.2.1
39 x clients running 10.6.4 and Xsan 2.2.1
192 TB active storage SAN
252 TB active storage mirror

Any suggestions would be very much appreciated!

Thanks

singlemalt:

Hi Elliott,
I see nothing wrong with your proposed method of updating. The only thing I would add is to make a backup copy of the /Library/Filesystems/Xsan/config folder on one of the MDCs, and to save the output of sudo cvlabel -c somewhere (e.g. ~/Desktop/lunlabels.txt). If you have a copy of the .cfg files, config.plist (which has all your Xsan serial numbers; see http://support.apple.com/kb/HT4391), and a cvlabels file, you can pretty much resurrect any Xsan volume. With those you can even uninstall Xsan on everything and still bring it all back, as long as nobody does something stupid like reformatting any LUNs.
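
Something like this on one of the MDCs should cover both (paths as on a stock Xsan 2 install; adjust the destination to taste):

[code]
# Copy the whole config folder (the .cfg files and config.plist live here)
sudo cp -R /Library/Filesystems/Xsan/config ~/Desktop/xsan-config-backup

# Save the current LUN labels
sudo /Library/Filesystems/Xsan/bin/cvlabel -c > ~/Desktop/lunlabels.txt
[/code]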

loccoliv:

We had this exact problem, but on 1.4.2. We found that the source was a faulty fibre connector in an Xserve RAID.

Starbytes:

And what is the correct methodology for updating to Lion?
Our setup is now two MDCs running 10.6.8 and Xsan 2.2.2.
We would like to update them to Lion 10.7 with Xsan 2.3.

Is it also something like the below?

- Back up my .cfg files, config.plist, and cvlabels
- Unmount all clients and remove them as computers from Xsan, leaving just the MDCs running and connected
- Fail over both volumes to MDC 1
- Remove MDC 2 as a controller and computer
- Disconnect Ethernet and fibre, update MDC 2 to 10.7.5, and install Xsan 2.3 (I think with Lion it's already there)
- Add it back as a controller and fail over both volumes to MDC 2
- Remove MDC 1 as a controller and update the OS
- Add it back as a controller
- Finally, update all clients to 10.7.5 and add them back as clients

wrstuden:

Is http://support.apple.com/kb/HT5285 not helpful?

These instructions do not require you to remove all the clients from the SAN, just stop them.

In general, it is OK to have MDCs at different versions during an upgrade, as, well, you have to. :-)

The guide will give recommended practices. If you play fast and loose, the place where things will go [b][color=red]BAD[/color][/b] is when a volume goes from being hosted by an updated MDC to being hosted by an older MDC.

[b]DON'T DO THAT[/b].

Starbytes:

But http://support.apple.com/kb/HT5285 in particular tells me how to replace my hardware. I don't want that. I already have two Intel controllers with 10.6 installed and only want to upgrade to 10.7.

matx:

Starbytes wrote:
And what is the correct methodology for updating to Lion?
Our setup is now two MDCs running 10.6.8 and Xsan 2.2.2.
We would like to update them to Lion 10.7 with Xsan 2.3.

Is it also something like the below?

- Back up my .cfg files, config.plist, and cvlabels
- Unmount all clients and remove them as computers from Xsan, leaving just the MDCs running and connected
- Fail over both volumes to MDC 1
- Remove MDC 2 as a controller and computer
- Disconnect Ethernet and fibre, update MDC 2 to 10.7.5, and install Xsan 2.3 (I think with Lion it's already there)
- Add it back as a controller and fail over both volumes to MDC 2
- Remove MDC 1 as a controller and update the OS
- Add it back as a controller
- Finally, update all clients to 10.7.5 and add them back as clients

Look at the Migration Guide for general principles:

http://manuals.info.apple.com/en_US/Xsan_2_Migration_Guide_3rd_Ed.pdf

Or, for migrating to 10.8 / Xsan 3 (same general idea as with 10.7):

http://help.apple.com/advancedserveradmin/mac/10.8/#apd19EE1DF0-7921-4E9...

I think you're overcomplicating the upgrade procedure.

1. Back up all data.
2. Back up all Xsan configs (config folder, cvlabel info).
3. Make sure nobody is working on the SAN volume (unmounting the volume from all clients is a good idea; if they won't unmount, remove them from the SAN).
4. Fail over the volumes to one of the two controllers.
5. Update the controller that is not hosting the volumes (reboot, then check in cvadmin and Xsan Admin that it was successful; see the sketch below).
6. Fail the volumes over to the newly updated controller.
7. Update the other controller.
8. Update the clients.
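
For steps 5 and 6, the cvadmin side looks roughly like this ('SanVol' is a placeholder volume name):

[code]
# On the freshly updated controller: confirm its FSM has registered and
# see which MDC is currently hosting each volume
sudo cvadmin -e 'select'

# Step 6: trigger failover so the standby FSM on the updated controller
# takes over hosting
sudo cvadmin -e 'fail SanVol'
[/code]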

Starbytes:

OK, thanks for the advice!

Small update:

- We've backed up all data
- Then made a backup of the Xsan configs and cvlabel info
- Unmounted all clients
- Made sure all volumes were hosted by MDC 1
- Stopped all volumes

When we tried to install Lion (from a DMG on a USB drive), it stalled after downloading the additional components, just as it tried to start the actual install. The only thing we could do was restart. It restarted, went straight back into the same initial install process, and failed again; again all we could do was restart. It went straight back into the install process once more, and this time it finished!

Hooray! MDC 2 was updated to Lion, but it started hosting the volumes automatically: no failover or anything from us. The moment MDC 2 finished upgrading and I logged in, all the volumes automatically failed over to MDC 2...
Not a bad thing, because it would have been the next step anyway, but it was a bit freaky... :)

I'll keep you guys posted, and thanks for the input!

Starbytes:

Upgrading MDC 1 went OK, with no weird restarts or anything.
But the moment MDC 1 was upgraded, all the volumes got a yellow warning sign saying: "Volume is not running on mdc1".

Why is this? And why are all the options greyed out? There's no stop volume, start volume, or remove from SAN; only force failover...?

brianwells:

Starbytes wrote:
But the moment MDC 1 was upgraded, all the volumes got a yellow warning sign saying: "Volume is not running on mdc1".

Are you able to mount any volumes on MDC 1? I found that Lion's Xsan Admin seemed to behave better if the volumes were mounted.

What do you see if you run the following command on each MDC?

[code]sudo cvadmin -e 'select'[/code]

This should show what volumes are actually up and running on each server.
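
If you want to check both controllers in one go from an admin Mac, a loop like this works (the hostnames and admin account are placeholders; it assumes Remote Login is enabled on both MDCs):

[code]
for mdc in mdc1.example.com mdc2.example.com; do
  echo "== $mdc =="
  # -t allocates a tty so sudo can prompt for a password
  ssh -t admin@"$mdc" "sudo cvadmin -e 'select'"
done
[/code]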

Starbytes:

The volumes are all up and running, and all of them are mountable.
Failover is also working, but we still have those yellow signs...

When MDC 2 is hosting, the yellow sign says "volume is not running on MDC1",
and when we fail over to MDC 1, the signs say "volume is not running on MDC2".
When we shut one of the MDCs down, all the signs disappear and everything looks OK. It looks like a DNS problem of some sort...
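
If it is DNS, I guess a quick check like the below would show up a mismatch (the names and addresses are stand-ins for our real ones):

[code]
# Forward lookups for both MDCs
host mdc1.example.com
host mdc2.example.com

# Reverse lookups on the metadata-network IPs should return the same names
host 10.0.0.1
host 10.0.0.2

# And each MDC should agree on its own hostname
hostname
scutil --get HostName
[/code]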

csanchez:

Assuming there is actually an FSM process running on both MDCs (run the cvadmin select command to check), the "volume is not running" warning may be the result of servermgrd holding onto some pre-upgrade data. These steps (exact commands below) may fix it:

1. Delete the /var/servermgrd/servermgr_xsan.lock file on both MDCs.
2. Reboot both MDCs or restart servermgrd with: sudo killall servermgrd
3. Reconnect in Xsan Admin.
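
For steps 1 and 2, that works out to the following on each MDC:

[code]
# 1. Clear the stale lock file left over from before the upgrade
sudo rm /var/servermgrd/servermgr_xsan.lock

# 2. Restart servermgrd (or simply reboot the MDC)
sudo killall servermgrd
[/code]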

brianwells:

The solution from csanchez just showed up in a technical article: http://support.apple.com/kb/TS4481