Has compressing the information sent via Outernet satellites been considered? Will it be standard compression algorithms, something custom developed for Outernet, or is there some reason I’m not thinking of that compression wouldn’t be good for this?
I’m not sure what algorithm is being used, but the Outernet stream is compressed and encrypted.
Also, the files prepared for broadcast (content sourced from the internet) are compressed with zip beforehand. You can see them here:
Compression used for archiving is normally just DEFLATE, so it isn’t a serious gain in space. This is done for the purposes of bundling files together rather than compression, and since files are streamed directly from zip files (not unpacked to disk), we opted for a less CPU-intensive compression method.
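To illustrate what reading directly from a zip file looks like, here is a minimal Python sketch using the standard `zipfile` module. It is not Outernet's actual code; the member name `article/index.html` and the sample content are made up for the example.

```python
import io
import zipfile

# Build a small DEFLATE-compressed archive in memory (stand-in for a broadcast file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    page = "<html><body>" + "Hello Outernet. " * 200 + "</body></html>"
    zf.writestr("article/index.html", page)

# Read a member straight from the archive, without unpacking it to disk.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("article/index.html")
    with zf.open(info) as member:
        # Data is decompressed on the fly as it is read.
        first_chunk = member.read(12)

print(first_chunk)
print(info.file_size, "bytes stored as", info.compress_size, "bytes")
```

Because `ZipFile.open` returns a file-like object, a server can hand the content to a client chunk by chunk without first extracting the archive.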
Thank you for clarifying these details. As far as I know, there are separate programs that extract and index the internet content scheduled for the Outernet broadcast stream, but I don't know how this process works. All I can see is that the downloaded Wikipedia articles contain no links pointing to outside pages.
If I understand you correctly, you are referring to ONDD, which decrypts and extracts files from the stream, and Librarian, which is the end-user interface to these files.
For now, we are stripping content of unnecessary markup (sidebars, headers, footers, ads, etc.) and stripping external links. Each piece of content is given an ID, which is an MD5 hash of the URL, and some metadata (language, license, title, etc.). This is zipped up and sent down the pipe. Once received, the metadata portion is added to the database so users can search for content, and the zip file is stored in a separate folder. This is done by Librarian.
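The steps above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the real Outernet or Librarian code: the archive layout (`info.json` and `index.html` under a folder named after the ID), the table schema, and the function names are all hypothetical.

```python
import hashlib
import io
import json
import sqlite3
import zipfile


def content_id(url: str) -> str:
    """Content ID: the MD5 hex digest of the source URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def build_bundle(url: str, html: str, meta: dict) -> tuple[str, bytes]:
    """Zip the cleaned page together with its metadata (hypothetical layout)."""
    cid = content_id(url)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{cid}/info.json", json.dumps(meta))
        zf.writestr(f"{cid}/index.html", html)
    return cid, buf.getvalue()


def ingest(db: sqlite3.Connection, cid: str, bundle: bytes) -> None:
    """Receiver side: copy only the metadata into the database for searching."""
    with zipfile.ZipFile(io.BytesIO(bundle)) as zf:
        meta = json.loads(zf.read(f"{cid}/info.json"))
    db.execute(
        "INSERT INTO content (id, title, language, license) VALUES (?, ?, ?, ?)",
        (cid, meta["title"], meta["language"], meta["license"]),
    )


# Hypothetical schema; an in-memory database keeps the example self-contained.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE content (id TEXT PRIMARY KEY, title TEXT, language TEXT, license TEXT)"
)

cid, bundle = build_bundle(
    "https://example.org/article",
    "<html><body>Hello</body></html>",
    {"title": "Example article", "language": "en", "license": "CC-BY-SA"},
)
ingest(db, cid, bundle)

# Search by title; the zip itself would stay on disk, keyed by the same ID.
rows = db.execute(
    "SELECT id FROM content WHERE title LIKE ?", ("%Example%",)
).fetchall()
print(rows)
```

Using the URL hash as the ID means the same page always maps to the same ID, so a newer broadcast of that page can replace the older copy.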
Thank you very much for the explanations.
It is valuable to understand how all the parts of the system work together.