Just wanted to start a new conversation on this. I realize it is very low priority. Also, I'm not worried about burning through my credits due to testing; I consider them a donation, and I get to play with a new feature as a thank-you.
I had some more time today to play with the Twitter feed. It looks like image files are still being treated as HTML and partially sent. The one I requested first was actually way too large (almost 1MB, oops), but the system still sent 5KB of it as HTML anyway and subtracted the appropriate credits for that size. No, I don't want my credits back; see the first paragraph.
I believe there is a 25KB (before compression) limit on requests. If the result of a URL is larger than that, shouldn't the system reject the request outright instead of sending partial results and using up credits?
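Even something as simple as a HEAD request could catch the obvious cases before any credits are spent. A rough sketch, assuming the fetcher can run curl and the server actually reports Content-Length (plenty don't, so this would only be a first filter):

LIMIT=25600
SIZE=$(curl -sIL "$URL" | grep -i '^content-length:' | tail -1 | tr -d '[:space:]' | cut -d: -f2)
if [ -n "$SIZE" ] && [ "$SIZE" -gt "$LIMIT" ]; then
  echo "reject: $SIZE bytes is over the $LIMIT byte limit"   # bail out before spending credits
  exit 1
fi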
I also requested a little page I created that I knew would be under 25K uncompressed, including image data. Unfortunately the system stripped the 16KB image and sent along only about 500 bytes of HTML. Since the image was actually an SVG file, it would have compressed very nicely. The largest part of the page was all the included JavaScript files, which would have pushed it over the 25K limit because of jQuery, so it does make sense to strip out JavaScript includes.
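As a quick way to see how much a compressible file like that SVG shrinks, gzip can show the before and after byte counts (image.svg is just a placeholder name here):

wc -c < image.svg ; gzip -c image.svg | wc -c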
This was the original request: https://twitter.com/JesseBrownFL/status/851175971784970241
My proposal would be to use something similar to the command below:
wget -q -nd -E -k -R js,woff,ttf,otf,eot,font,css -Q 20K -p http://tree.cafe ; du -s -b .
This outputs a size of 23598 bytes, just below the 25K limit, including the image on the page but leaving out everything else that is already being stripped. The flags do several things, including capping any additional files such as images at 20K total. That quota does not count the main requested file, so the du command is needed to verify the overall size before compressing the result and putting it on the carousel.
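Put together as a wrapper, I imagine it looking something like this sketch; the temp directory, the 25600 byte limit, and the tar step at the end are just my guesses at how the backend might package things:

#!/bin/sh
# sketch: fetch the page minus scripts/fonts/css, then verify the total size
LIMIT=25600
TMP=$(mktemp -d) && cd "$TMP" || exit 1
wget -q -nd -E -k -R js,woff,ttf,otf,eot,font,css -Q 20K -p "$1"
TOTAL=$(du -s -b . | cut -f1)
if [ "$TOTAL" -gt "$LIMIT" ]; then
  echo "reject: $TOTAL bytes is over the $LIMIT byte limit"
  exit 1
fi
tar -czf "$TMP.tar.gz" .   # stand-in for however the carousel actually packs files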
It may be better to process URLs the way Wikipedia URLs are pulled, so the images are inlined in the HTML file and shrunk; then the size of the resulting file can be checked. If the URL links directly to an image file, no processing should occur, and if the image is larger than the request size limit the request should be rejected.
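For the direct-image case, the check could be as simple as looking at the Content-Type and refusing to truncate. Another sketch, again assuming curl and honest headers (curl's --max-filesize only aborts when the size is known up front):

TYPE=$(curl -sIL "$URL" | grep -i '^content-type:' | tail -1)
case "$TYPE" in
  *image/*)
    # fetch the image as-is; abort instead of sending a partial file
    curl -sL --max-filesize 25600 -o image.out "$URL" || echo "reject: image exceeds the limit"
    ;;
esac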
Obviously you can't just dump whatever URL comes in from Twitter straight to a shell to call wget, but after appropriate input sanitization wget could be used for this as an easy stop-gap. It may also be good if the system does some sort of check against the URL to make sure it isn't a known inappropriate or malicious site, but that can be the next revision as more people use it.
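By sanitization I mean something at least as strict as this before the URL gets anywhere near a shell; the character allow-list is only my guess at a reasonable rule:

case "$URL" in
  http://*|https://*) ;;                    # only plain web URLs
  *) echo "reject: unsupported scheme"; exit 1 ;;
esac
printf '%s\n' "$URL" | grep -Eq '^[A-Za-z0-9._~:/?#@!$&*+,;=%-]+$' || { echo "reject: suspicious characters"; exit 1; }
wget -q -p -- "$URL"    # the -- keeps a hostile URL from being read as options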
That does bring up something I hadn't thought about: how would Outernet delete files that have already reached receivers, in case someone decided to be malicious and request a URL with adult or other inappropriate content? I would guess it could be an easy addition to the carousel: a delete command that simply gives the path and filename, so it would take up very little space and get out there quickly.
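I have no idea what the carousel metadata actually looks like, so the following is purely illustrative of how tiny such an instruction could be (the path and filename are made up):

# hypothetical one-line control file broadcast over the carousel
DELETE downloads/twitter/851175971784970241.html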