Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IOSDriver should catch "Rollback aborted" case when using configure replace #1701

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

rbcollins123
Copy link
Contributor

@rbcollins123 rbcollins123 commented Jul 12, 2022

Invalid candidate configs may silently fail installation currently because IOS-XE may attempt to rollback upon error but the rollback itself may be invalid.

Here is an example output from a Cat9300 switch on IOS-XE v17.3.3 when the intended config file attempts to make impossible changes to the running-config and reversing those changes are also impossible.

ncl-sw-03#configure replace flash:rendered.cfg list force revert trigger error timer 60

Rollback Confirmed Change: Backing up current running config to flash:ncl-sw-03-Jul-12-23-02-40.132-UTC-25

!Pass 1
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end


!Pass 2
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end


!Pass 3
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end


!Pass 4
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end


!Pass 5
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end


The rollback configlet from the last pass is listed below:
********
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end
********


Rollback aborted after 5 passes

The above should generate an exception within IOSDriver.commit_config

@rbcollins123 rbcollins123 marked this pull request as ready for review July 12, 2022 23:46
@ncsurfus
Copy link

Is anything else needed to merge this?

@ktbyers
Copy link
Contributor

ktbyers commented Aug 25, 2022

Do we need a different process here?

In this case applying the candidate config failed and the on-box rollback failed.

I think we should automatically invoke a NAPALM rollback in this case as we are now in an invalid state (as far as the config). By invalid I mean it was neither the original configuration nor the intended candidate config.

Thoughts?

@ncsurfus
Copy link

Do we need a different process here?

In this case applying the candidate config failed and the on-box rollback failed.

I think we should automatically invoke a NAPALM rollback in this case as we are now in an invalid state (as far as the config). By invalid I mean it was neither the original configuration nor the intended candidate config.

Thoughts?

Would the rollback process just end up issuing another configure replace, but back to the original configuration? How much different is that compared to what configure replace is doing during its rollback?

@ktbyers
Copy link
Contributor

ktbyers commented Aug 25, 2022

@ncsurfus Yes, it would try to do a configure replace again, but back to original configuration (which has been saved to flash).

I would expect this should work (otherwise, I would say it is an IOS/IOS-XE bug)...since the config that we are rolling back to is the running configuration before the original configure replace started.

We would have to make sure we can't get into a loop (i.e. if the NAPALM rollback also fails, then we raise an exception and tell the user). We would also need to make sure here that if we fail again that we raise a clear message to the end-user.

@ncsurfus
Copy link

I did some manual testing for when this occurs, and yeah running a configure replace bootflash:rollback_config.txt force seemed to work pretty well to get it back to the original state (at least for my scenario).

@rbcollins123
Copy link
Contributor Author

@ktbyers @ncsurfus I pulled the match for this case out into its own block and added a self.rollback() for now. This isn't something I can readily test on a device at the moment, but if you want to drop this iteration of ios.py into your lab and try it @ncsurfus, let me know if it doesn't behave as planned. It should net the same result as what you were doing by hand.

@ncsurfus
Copy link

@rbcollins123 @ktbyers I pulled in these changes and was able to test it. It worked great and I was able to get successful rollbacks across a couple different devices. I also verified with the command below and only got the expected difference.

xxxxxx#show archive config differences flash:rollback_config.txt
!Contextual Config Diffs:
+file prompt quiet

@rbcollins123
Copy link
Contributor Author

rbcollins123 commented Aug 30, 2022 via email

@rbcollins123
Copy link
Contributor Author

Bump! Anything else I can do to help get this merged?

@ncsurfus
Copy link

I'm not sure this CR should be merged. "configure replace" uses the term rollback in a confusing way. "configure replace" was intended to rollback to a previously saved running configuration, not deploy an entirely new configuration. For this reason "configure replace" uses the term rollback to indicate that it's applying the desired configuration. For instance, when "configure replace" is successful, it'll end with "Rollback Done". The terminology is confusing, but kind of makes sense if you understand the original use case.

"Rollback Aborted" also doesn't mean "configure replace" attempted to revert back to the original configuration and failed, but rather it appears to indicate that it's just aborting the deployment. The most common reason, from my testing, is that some of the desired configuration isn't showing up in the running configuration. For instance, if you include "no shutdown" on an interface, then "configure replace" will attempt to apply "no shutdown", and when it doesn't show up in the running configuration after 5 passes, then it "aborts". Some things like SNMPv3 users don't show up in the running configuration and can also cause this output. From my testing, this is generally harmless, but I'm sure there are scenarios where this could break something.

@mirceaulinic
Copy link
Member

Code-wise, this would kinda make sense to me. But I can't claim I understand what rollback aborted actually means, and based on the comment above, it sounds like the message is rather counter-intuitive.
@ktbyers would you please advise?

@ktbyers
Copy link
Contributor

ktbyers commented Mar 26, 2024

@mirceaulinic @rbcollins123 @ncsurfus I think we should leave the current solution and just close this PR (as per the comments from @ncsurfus). Basically, it sounds like we might be creating larger problems by attempting to auto-rollback on this.

I am definitely open to discussions on this, however.

@rbcollins123
Copy link
Contributor Author

@ktbyers I don't care if the napalm-level-rollback happens or not, that was only added at your request. My only concern in raising the PR originally was to ensure an exception is raised when this occurs since the config-replace has failed in this scenario. If the self.rollback() is the only point of contention, we can just revert back to the original commit on the PR to generate the exception and merge that. It would be nice to not have to locally patch our installs to add the exception at some point.

@ktbyers
Copy link
Contributor

ktbyers commented Mar 26, 2024

@rbcollins123 (@ncsurfus) is pointing out that rollback aborted is relatively normal behavior (meaning it is just a dumb string parser and it is looking for certain text in the output and if the text doesn't show up then we will get this rollback aborted). This could easily happen due to default values (like shutdown/no shutdown).

So I would have the same issue with raising an exception (i.e. we would be causing working config files to fail).

@rbcollins123 Was this your test merge config to demonstrate the issue:

********
!List of Rollback Commands:
interface GigabitEthernet0/0
 no negotiation auto
interface GigabitEthernet0/0
 shutdown
  negotiation auto
end
********

If not, can you provide a test case on IOS/IOS-XE?

@rbcollins123
Copy link
Contributor Author

rbcollins123 commented Mar 26, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants