Monday, August 8, 2011

Catalyst 6509-E VSS Software Upgrade Gone Bad

My work network has a pair of Cisco Catalyst 6509-E chassis that are configured as a Virtual Switching System (VSS) to serve as the network core.  Last week we had a supervisor engine crash and were having some residual craziness with our CAM table.  TAC suggested a reboot and software upgrade, so we scheduled one for Sunday afternoon.

Usually a software upgrade on the 6509 is relatively painless, but this time it proved to be very painful.  The previous software load on the VSS pair was 12.2(33)SXI, but it was the modular version (keep this in mind; it's important).  The new software load suggested by TAC was 12.2(33)SXJ1, and as of SXJ the software is offered only in monolithic versions.

Assuming that all was well with these two versions, I started down the path of doing an enhanced Fast Software Upgrade (eFSU) of my VSS pair using the ISSU commands as listed in the Catalyst 6500 Release 12.2SX Software Configuration Guide - Virtual Switching Systems (VSS) on Cisco's website.  After issuing issu loadversion disk0:s72033-ipservicesk9_wan-mz.122-33.SXJ1.bin on the active console, I waited for the standby chassis to reload.  Unfortunately, it entered a reboot loop because the new software was not ISSU-compatible with the running image.  Here is where it got hairy.  At this point I could neither abort nor complete the upgrade on the active supervisor.  The switch wouldn't let me change the boot system variable because some saved state still said an ISSU upgrade was in progress, even after power cycling the chassis.

After 5 hours on the phone with TAC, we were able to clear this persistence and finish the upgrade, but it was a very long downtime.  The moral of the story... modular and monolithic IOS don't mix well.

4 comments:

  1. Ouch! How did you finally clear the persistent state?

  2. We ended up using a spare SUP engine, but I think that if we had run delete nvram:persistent-data we would have been good too. My TAC engineer didn't seem to have a deep knowledge of ISSU or VSS, so it took more effort than it probably should have.

  3. The issue you described has been seen before when trying to move between modular and non-modular versions of code using ISSU (see below).

    Unfortunately, neither eFSU nor FSU is supported when moving between modular and non-modular code. You will need to use the more traditional upgrade method: set the boot statement and reload the VSS. There will be downtime while both switches reload.

    This is outlined here. We have it in the Bug Tool Kit (although technically not a bug) for documentation purposes.

    http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsb07831

    As an FYI, ISSU is also not supported between crypto and non-crypto images (in either direction), or between versions of code with different feature sets (e.g., ipbase --> entservices).

    ISSU allows you to keep the supervisors in SSO even though there is an [expected] code mismatch during the upgrade, which is what allows minimal impact during ISSU. This is different from the past, where you would be in RPR mode with mismatched versions of code. However, there are specific restrictions that must be followed for it to be successful in the end.

    It doesn't look like the modular/monolithic restriction is clearly stated in the 'eFSU Restrictions and Guidelines' section of the configuration guide, so I've put in a request to see if we can get that added.

    David Kosich
    Cisco TAC-LAN Switching

  4. David,

    Thanks for the information. I appreciate the effort to clarify the documentation, because if the restriction was there it definitely needed a big bomb next to it or something. It seems to me that the IOS ISSU process should be able to check compatibility before rebooting the standby processor, similar to how the 3750 switches check the IOS image when you use the archive download-sw command to make sure it's valid for the hardware.

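The kind of pre-check suggested in the last comment can be roughly sketched offline, before ever typing issu loadversion. The Python below is only an illustration, not a Cisco tool: it compares two image filenames against the three restrictions mentioned in the comments (modular vs. monolithic, crypto vs. non-crypto, differing feature sets). It assumes the usual filename layout, where "k9" marks a crypto image and "-vz" marks a modular image while "-mz" marks a monolithic one; those conventions, the IosImage type, and the function names are assumptions made for this sketch, so check the release notes rather than trusting a filename.

```python
import re
from typing import List, NamedTuple


class IosImage(NamedTuple):
    platform: str     # e.g. "s72033"
    feature_set: str  # e.g. "ipservices"
    crypto: bool      # True when the feature set carries the "k9" tag
    modular: bool     # True for "-vz" images (assumed naming convention)


# Assumed layout: [filesystem:]<platform>-<featureset>[k9][_wan]-<mz|vz>.<version>.bin
_IMAGE_RE = re.compile(
    r"^(?:[\w-]+:)?"             # optional filesystem prefix such as "disk0:"
    r"(?P<platform>[a-z0-9]+)-"
    r"(?P<features>[a-z0-9_]+)-"
    r"(?P<kind>mz|vz)\."
)


def parse_image(name: str) -> IosImage:
    """Split an image filename into the fields the restrictions care about."""
    match = _IMAGE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized image name: {name!r}")
    features = match.group("features")
    if features.endswith("_wan"):
        features = features[: -len("_wan")]
    crypto = features.endswith("k9")
    if crypto:
        features = features[: -len("k9")]
    return IosImage(match.group("platform"), features, crypto,
                    match.group("kind") == "vz")


def issu_blockers(old_image: str, new_image: str) -> List[str]:
    """Return the reasons (possibly none) an ISSU between the images is unsupported."""
    old, new = parse_image(old_image), parse_image(new_image)
    problems = []
    if old.modular != new.modular:
        problems.append("modular/monolithic mismatch")
    if old.crypto != new.crypto:
        problems.append("crypto/non-crypto mismatch")
    if old.feature_set != new.feature_set:
        problems.append("feature set mismatch")
    return problems
```

Calling issu_blockers with a modular SXI filename (for example the hypothetical disk0:s72033-ipservicesk9_wan-vz.122-33.SXI.bin; the post does not give the old image's name) and the SXJ1 image from the post returns a single entry flagging the modular/monolithic mismatch. An empty list means only that none of these three restrictions was tripped, not that the upgrade path is supported.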