What to expect when the Skylark OTA hits


#1

I will be sending down an OTA (our first!) sometime next week, barring more unforeseen holdups.

This is what will happen when the OTA is received:

  1. You will start receiving a file named skyark-chip-.[some release tag].to.[some newer release tag].psop.tbz2
  2. Currently the OTA looks like it’s going to be about 300KB in size - this amounts to about half an hour of receive time.
  3. When fully received, the OTA will automatically be processed - this happens in multiple stages
  4. All-in, the entire processing takes about 5-7 minutes.
  5. Once the OTA has been successfully processed, your CHIP will reboot. Unfortunately, there is no UI indicator yet for this.
  6. Reboot should take the normal time - 30-40 seconds. All your config done from the UI should be retained - beam selection, network config, etc. If you changed passwords from the command line, changed user names, groups etc - those will be reset to stock. If you made extensive modifications manually to rxos_config - those changes will stay, but might interfere with normal operation.
  7. That’s it!
  8. After reboot, the watermark on the bottom right of the skylark desktop will have been updated. You should also be able to see the updated release info in the log viewer.

So nothing needs to be done from the user’s end.
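For a rough sense of what step 2 implies, the average payload rate can be worked out from the two figures given (about 300KB in about half an hour). This is only back-of-the-envelope arithmetic: it assumes 1 KB = 1024 bytes and exactly 30 minutes, and the function name is made up for illustration.

```python
def effective_rate_bps(size_kb: float, minutes: float) -> float:
    """Average payload rate in bits per second for a file of size_kb kilobytes."""
    size_bits = size_kb * 1024 * 8      # assumption: 1 KB = 1024 bytes
    return size_bits / (minutes * 60)   # minutes -> seconds


rate = effective_rate_bps(300, 30)
print(f"{rate:.0f} bit/s")  # prints "1365 bit/s"
```

So a larger OTA would scale roughly linearly: at the same rate, 1MB would take well over an hour and a half to arrive.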

That said - some of the users have been having power issues. In those cases, it’s possible that the reboot might not work. That is obviously a pain, but it will not interfere with the update. Just manually starting it up should allow you to use it as normal.

There is a window of ~15 seconds about 5 minutes after the OTA has been received (and about 1 minute before the forced reboot) where interrupting the OTA (say by rebooting the CHIP manually, or due to a power outage) might cause bootup issues. I am working to minimize this window.

(I will make a separate post about the features to expect from the OTA - this post is general information about the process itself)
Questions?


#2

Also, please post the satellite upload time in UTC and date so we know it actually happened and can start looking at the changes (or need to manually reboot). Ken


#3

Can the OTA only replace files, overwriting any user modifications?
Or can it also include a script that will modify files or execute other commands?
(you write that user and password mods will be overwritten, while I expect that an update, as usual
in Linux, will only add or maybe delete user entries but will not modify established passwords because
those user mods are done using adduser/deluser and addgroup/delgroup commands run from a
postinstall script)


#4

An OTA can do various things, including running scripts.

BUT - there is a reason that there is no UI component for changing passwords etc. yet. All that functionality is being worked on.

The Skylark system isn’t a generic Linux distribution prepared to handle all sorts of user mods. While the system is open to such mods, by design, there is no way OTAs can be written to adapt to each such mod.

So those expectations are unrealistic.


#5

I don’t understand it at all. I am not asking for a UI component to change password, but for
an OTA update that does not overwrite passwords. Users, Groups and Passwords live in /etc/passwd,
/etc/group, /etc/shadow which are symlinks to files in /mnt/conf/etc/ and are plain text files that can be
updated using the above mentioned tools but also using sed or whatever text processing tool callable
from a script. You can do updates to these files without overwriting user mods to other lines in the files.
When other config files are kept across the update, then why not these?
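A minimal sketch of the kind of non-destructive update meant here, assuming the usual colon-separated format with the account name in the first field (true of passwd, group and shadow files). The function name and the sample accounts are hypothetical, and this is not how the Skylark OTA actually works: it only shows that new stock entries can be added while every existing line, including changed passwords, is left alone.

```python
def merge_passwd(current: str, stock: str) -> str:
    """Return `current` plus any accounts that exist only in `stock`.

    Both arguments are the text of a passwd-style file. Lines already
    present in `current` are never modified or reordered.
    """
    current_lines = [line for line in current.splitlines() if line.strip()]
    known = {line.split(":", 1)[0] for line in current_lines}
    merged = list(current_lines)
    for line in stock.splitlines():
        if not line.strip():
            continue
        name = line.split(":", 1)[0]
        if name not in known:        # only add accounts the user does not have
            merged.append(line)
            known.add(name)
    return "\n".join(merged) + "\n"


# Hypothetical sample data: the user changed root's password hash,
# and the update ships a new service account.
current = "root:CHANGEDHASH:17000:0:99999:7:::\n"
stock = "root:STOCKHASH:17000:0:99999:7:::\nnewsvc:*:17000:0:99999:7:::\n"
merged = merge_passwd(current, stock)
```

In this example `merged` keeps the user's own `root` line and gains only the `newsvc` line.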

Rolling out systems with default passwords already is becoming a big issue these days, but resetting
already “hardened” systems back to default passwords by an OTA is probably worse.


#6

Any chance of having the update push but not automatically reboot? Maybe not this OTA update, but the next ones? I personally would prefer a prompt when you login to the GUI. Something like “An update is available, install now?” so people can choose when to patch and reboot. Alternatively have a configuration option that is on by default “Auto update” but people that don’t wish to automatically update and reboot can turn the option off and then install manually when ready.


#7

Edit: oh just realized you are only talking about reboot.

Yeah - possibly. Though again, the reboot takes less than a minute, and the system is in a transitory state after the OTA is processed and before the reboot - so delaying it is a bit of a problem. But yes, I can see a need for preventing auto reboot in certain circumstances.

Also - the biggest issue is that the chances a user is actively using the UI while the OTA is processed are very low.


#8

I’ve been on both sides of this type of situation so I completely understand the reasons to push OTA and auto-install with reboot. I was thinking more about “advanced” users that want to manually trigger the update when it is ready since they may be testing something. In my case, I’m testing how long the CHIP can stay online without reboot :slight_smile:

My original suggestion was something simple similar to how my antivirus software is currently repeatedly showing this on my screen:

This would allow reboots to trigger when the user is actually nearby to deal with any issues, like the system not coming back after reboot, or in my case pulling logs and other information before the reboot to record the status of the system for my testing.

Some people get grumpy when their system is updated and rebooted when they are not around and they come back to find it broken with no idea why. Since it is likely 99% of the people using Outernet are on this forum regularly I doubt it will be an issue now but could be in the future.

Whatever you decide, I’m looking forward to the next version!


#9

I understand, and I agree in those specific cases - those are good examples. The thing is - they are both problems in their own right - the power on issue (which is a h/w power draw problem, we are figuring a way out of it), and the logs being lost problem (which is high on my agenda to fix - I don’t want to lose the logs either). I’d rather not add OTA not being completed as another problem, as a work around for problems that I should anyway be fixing :smiley:

The problem in this specific case is that users aren’t around very much. In fact in a test rollout last year, the scenario was that an “admin” might log into the boxes maybe once a month. This is going to be true for most non-technical users. It’s important that the device self-manage.

OS updates like on Windows etc - they have the advantage that they are on bidirectional links (which helps with correcting for missing updates), and they are not in a resource limited system (which helps with being able to keep and manage multiple update sets). They can also constantly nag users. In the early Windows versions, updates weren’t this insistent, and we know how few people updated their WinXP systems.

My users can very easily ignore the nags, as they are at the end of a remote browser session - just close the browser tab. An OS alert, by contrast, can’t be ignored for very long.

Then, due to resource limits, I cannot allow a situation where a second OTA hits before the previous one was completed via a reboot. Also, not everything can be maintained in two sets - it’s the base reality of an embedded system like this. So the transitory period before the reboot cannot be avoided - it then becomes critical to minimize that period.

Now - I do want to send an alert to the UI in case a user is actively using it. “Click to delay the reboot by 30 minutes. If you do nothing, system will reboot in 2 minutes” - something like that.

But the problem with that is, right now there is no infrastructure to push any alerts from the backend to the UI. Everything is client-initiated - like all http servers. It’s not impossible to add, but it’s definitely a complex addition.

It’s also something I eventually want to add - because then I can do “new file received” kind of alerts. But it’s a longer-haul process. Right now I am at the stage of making sure the “must have” requirements are all done, then I can move to “nice to have”.
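To make the “delay by 30 minutes, otherwise reboot in 2 minutes” idea concrete, here is a minimal sketch of just the countdown logic. It is not taken from the Skylark code: the class and method names are hypothetical, the 2-minute and 30-minute values come from the example wording above, and how the UI would learn about the countdown is left out entirely.

```python
import time


class RebootTimer:
    """Countdown to a forced reboot that a user may push back."""

    def __init__(self, grace_s=120, delay_s=1800, now=time.monotonic):
        self._now = now                   # injectable clock, handy for testing
        self._delay_s = delay_s
        self.deadline = now() + grace_s   # default: reboot 2 minutes from now

    def delay(self):
        """User clicked 'delay': restart the countdown at 30 minutes from now."""
        self.deadline = self._now() + self._delay_s

    def seconds_left(self):
        return max(0.0, self.deadline - self._now())

    def due(self):
        return self._now() >= self.deadline


timer = RebootTimer()
timer.delay()   # simulate one click on the hypothetical "delay" button
```

A real version would probably also cap how many times the delay can be requested, so the transitory period cannot be extended indefinitely.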


#10

Maybe I can help with this. It is something I’ve done before on embedded systems with only a simple HTTP server on them.


#11

Or when they find that their system has reset the passwords to defaults allowing anyone with knowledge
of those passwords to login and modify the system configuration or examine further secrets like the
WiFi password…


#12

Maybe we allow the development process to run its course without trying to make the admittedly “in progress” application a production environment with all these restrictions and caveats. It’s not ready yet. Use it at your risk.


#13

I have a quick fix to the lost logs issue. The automatic file cleanup script runs once every hour or so doesn’t it? Just add a line to append the log to a copy in /mnt/download. You would maybe have to trim the log in /mnt/download once in a while just to keep the size down.
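Roughly what that append-and-trim step could look like, as a sketch only. The function name, the size limit and the file paths in the usage comment are assumptions, not what the cleanup script actually does. Note that it appends the whole source log on every run, so it would duplicate lines unless the source log is truncated or rotated between runs.

```python
from pathlib import Path


def persist_log(src: Path, dst: Path, max_bytes: int = 1_000_000) -> None:
    """Append all of src to dst, then keep only about the last max_bytes of dst."""
    data = src.read_bytes()
    with dst.open("ab") as out:          # "ab" creates dst if it is missing
        out.write(data)
    if dst.stat().st_size > max_bytes:
        tail = dst.read_bytes()[-max_bytes:]
        newline = tail.find(b"\n")
        if newline != -1:
            # Drop the first (possibly partial) line so the copy starts cleanly.
            tail = tail[newline + 1:]
        dst.write_bytes(tail)


# Hypothetical usage from an hourly job (paths are placeholders):
# persist_log(Path("/var/log/messages"), Path("/mnt/download/messages.log"))
```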

-C


#14

Resetting the password to the stock password as a side effect of an automatic OTA update is a really bad idea.


#15

And unnecessary too, as far as I understand from the thread. This should always be avoided and
the only excuse I could think of is that the OTA would only be able to replace files and an update to
the passwd and shadow files is required. But that does not appear to be the case.


#16

  1. This is an in-development system.
  2. The upcoming one is the first ever OTA.
  3. Changing passwords on Skylark systems is unsupported. That’s why there is no UI component for it (yet).
  4. It’s hardly any work to change them back again.
  5. Let’s not make a big deal out of a non-issue - this system is not a router etc. that needs to be exposed over the network.
  6. If you are using it in AP mode, the fact that someone needs to be close by and needs your wifi password to hack it protects it. Also - in that particular case, there is no use hacking it except for vandalism. It’s not like it can be used as a botnet zombie.
  7. If it’s in wifi client mode, the only way it can be accessed is if you open a port to it from outside your LAN. If you are already letting untrusted people onto your LAN so that they can hack Skylark, you have bigger problems than the Skylark password to contend with.

It has been very clear that while you can mod the system to your heart’s content - at that point it becomes your responsibility.

Either way - the whole password issue is a pointless argument, and a result of cargo cult thinking about “security”. There is no “security” impact at all - it’s a bogeyman in this scenario. Such thinking is what prevents actual security discussions from happening.

Yes, I am a bit irritated. The discussion about the OTA has been unnecessarily hijacked on a total non-issue. No amount of explanation seems to get thru.


#17

If you look at recent commits - I have already implemented support for such user scripts. :wink: I will post the details post-OTA.


#18

Absolutely!

Take a look at the repos. I already have a websocket server in there which I intend to use for this purpose. But let me know what you are thinking.

Easiest/quickest, implementation-wise, is polling - it’s just that if I can come up with a simple alternative, I’d like to avoid polling. Though we are in a RAM-constrained and CPU-constrained system, so if the alternative is resource-expensive, I’d just take the polling :smiley:
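For comparison, the polling option needs very little on the server side: one cheap endpoint that the UI asks every so often. The sketch below uses only Python's standard `http.server` and is purely illustrative. The `/status` path, the JSON field names and the `serve` helper are assumptions, and none of it is taken from the Skylark repos.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical shared state that the OTA processor would update.
STATUS = {"ota_pending": False, "reboot_in_s": None}
STATUS_LOCK = threading.Lock()


class StatusHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/status":
            self.send_error(404)
            return
        with STATUS_LOCK:
            body = json.dumps(STATUS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real server would log somewhere


def serve(port=0):
    """Start the status server on localhost in a background thread."""
    server = HTTPServer(("127.0.0.1", port), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The browser side would then fetch `/status` on a timer and show the reboot prompt whenever `ota_pending` is true. The cost is one small request per interval per open tab, which is the trade-off against keeping a websocket open.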


#19

I’d just rather move the logs themselves out of the ramfs completely. The logs-in-ramfs is an old artifact, and it is not sensible. It wastes already-constrained RAM, and results in loss of logs. Moving logs to persistent storage on an hourly basis is fine if the system is working mostly all right, but persistent logs are vastly more useful, say, if something bad happens to the system at bootup. So making sure logs are always persisted is important.

I did think about persisting them this way, periodically, but it seemed like a half-way solution. I am simply going to move them out of ramfs completely.


#20

That is because you are completely missing the point.
The security of Skylark is completely ridiculous. There is a fixed password that is not supposed to
be changed, then why have it at all? Remove the password dialog if that only confuses the issue.
At least in Librarian there was a reasonable concept: allow everyone in to browse the data but require
a password for system administration, and allow it to be set by the user. That was just fine.
When you don’t want to spend effort on security, just provide some way to remove the setup tasks
from the Web page (Tuner settings, Network settings) using a small script or by editing a file in the shell.
Then the casual user does not have to worry about it, and those who want to give access to others
but do not want them to read the WiFi password and disturb the tuner settings can do so. It is easy
to restrict access to the shell to a smaller audience than access to the web interface, using a firewall.