Mender client not rolling back when there is no internet connection
Description
Attachments
- 05 Aug 2019, 01:15 PM
- 30 Jul 2019, 09:11 AM
Activity
Ole Petter Orhagen, September 12, 2019 at 11:38 AM
eystein.maloy.stenberg, August 5, 2019 at 5:43 PM
Will do, thanks!
Kristian Amlie, August 5, 2019 at 1:34 PM
@eystein.maloy.stenberg: Can we backlog this one too? It is pretty easy to hit, and I think it is more serious than https://northerntech.atlassian.net/browse/MEN-2643#icft=MEN-2643. Luckily, it's not hard to fix: we just need to tweak the retry count in the algorithm.
Kristian Amlie, August 5, 2019 at 1:30 PM
An easy fix for this is to change the retry count formula to something bounded and fairly low (I haven't looked at the exact numbers, but I'd say definitely not more than 10). This is worth considering for usability reasons as well, since the current formula is suboptimal: it will always retry for at least the entire UpdatePollIntervalSeconds period, which can be quite long.
For existing users, a workaround for this problem is to make sure that the ratio UpdatePollIntervalSeconds / RetryPollIntervalSeconds doesn't produce a high number.
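The bounded formula proposed above could look roughly like the Go sketch below (Go being the Mender client's implementation language). This is a minimal illustration, not the client's code: the function name retryCount and the cap maxRetries = 10 are hypothetical, with the cap taken from the "definitely not more than 10" suggestion rather than from any agreed value.

```go
package main

import "fmt"

// maxRetries is a hypothetical cap, taken from the "not more than 10"
// suggestion in the comment; it is not a value from the Mender client.
const maxRetries = 10

// retryCount derives the number of retries from the two poll intervals
// (UpdatePollIntervalSeconds / RetryPollIntervalSeconds), as described in
// this issue, but clamps the result to the range [1, maxRetries].
func retryCount(updatePollSecs, retryPollSecs int) int {
	if retryPollSecs <= 0 {
		return 1 // guard against division by zero on a bad configuration
	}
	n := updatePollSecs / retryPollSecs
	if n < 1 {
		n = 1
	}
	if n > maxRetries {
		n = maxRetries
	}
	return n
}

func main() {
	// Settings from the reproduction: 300 / 10 = 30, clamped to the cap.
	fmt.Println(retryCount(300, 10))
	// Settings from the original report: 1800 / 120 = 15, clamped to the cap.
	fmt.Println(retryCount(1800, 120))
}
```

With a cap in place, lengthening UpdatePollIntervalSeconds no longer lengthens the retry loop, so the number of retries stays small however the two intervals are configured.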
Kristian Amlie, August 5, 2019 at 1:26 PM
I managed to reproduce this by starting an update and then killing the backend components while the client was restarting. The settings I used were:
RetryPollIntervalSeconds = 10
UpdatePollIntervalSeconds = 300
I have uploaded the log.txt file.
The interesting line is the one that says "State transition loop detected in state update-commit: Forcefully aborting update. The system is likely to be in an inconsistent state after this". This is a protection mechanism meant to stop update modules that get stuck in a loop between the ArtifactRollbackReboot and ArtifactVerifyRollbackReboot states from hanging the client forever.
What happens here, however, is that the number of retries is calculated to be quite high, because the formula is UpdatePollIntervalSeconds / RetryPollIntervalSeconds. This number of retries exceeds the limit that the client sets on the number of state transitions. At that point, the client assumes that the update module (rootfs updates are also update modules in this context) is broken, and instead of rolling back, which is the update module's responsibility, it aborts the whole update, including the rollback. After this there is no update left to perform, so the client is stuck in the final report procedure forever.
Due to the bootloader integration, the client should still recover if the device is rebooted, but this requires manual intervention by a user.
Details
Assignee: Ole Petter Orhagen
Reporter: Adrian Real
Story Points: 3
Priority: (None)
Days in progress: 0
Backlog: yes

Mender client 2.0.0 does not roll back after an update when there is no internet connection.
The configuration used is:
$ cat /etc/mender/mender.conf
{
"ClientProtocol": "https",
"RootfsPartA": "/dev/mmcblk0p2",
"RootfsPartB": "/dev/mmcblk0p3",
"UpdatePollIntervalSeconds": 1800,
"InventoryPollIntervalSeconds": 28800,
"RetryPollIntervalSeconds": 120,
"ServerURL": "https://mender-development.XXXXX.com:YYYY",
"ArtifactVerifyKey": "/etc/mender/mender-artifact-verify-key.pem"
}
The Mender client log can be found in the attached file.
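The workaround suggested in the comments can be checked against an existing configuration by computing the ratio directly from mender.conf. The Go sketch below is hypothetical (pollSettings and impliedRetries are illustrative names) and reads only the two poll-interval fields. For the configuration above it gives 1800 / 120 = 15. This issue does not state the client's internal limit, so it is unclear whether 15 alone is enough to trip the loop detection.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pollSettings holds only the two mender.conf fields that feed the retry
// formula discussed in this issue; json.Unmarshal ignores all other keys.
type pollSettings struct {
	UpdatePollIntervalSeconds int
	RetryPollIntervalSeconds  int
}

// impliedRetries parses a mender.conf-style JSON document and returns
// UpdatePollIntervalSeconds / RetryPollIntervalSeconds.
func impliedRetries(conf []byte) (int, error) {
	var s pollSettings
	if err := json.Unmarshal(conf, &s); err != nil {
		return 0, err
	}
	if s.RetryPollIntervalSeconds <= 0 {
		return 0, fmt.Errorf("RetryPollIntervalSeconds must be positive, got %d", s.RetryPollIntervalSeconds)
	}
	return s.UpdatePollIntervalSeconds / s.RetryPollIntervalSeconds, nil
}

func main() {
	// Poll settings copied from the configuration above (other keys omitted).
	conf := []byte(`{
		"UpdatePollIntervalSeconds": 1800,
		"InventoryPollIntervalSeconds": 28800,
		"RetryPollIntervalSeconds": 120
	}`)
	n, err := impliedRetries(conf)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("implied retry count:", n)
}
```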