Twitter request support for small non-HTML files


#1

Just wanted to start a new conversation on this. I realize it is a very low priority. Also, I’m not worried about burning through my credits due to testing. I consider them a donation and I get to play with a new feature as a thank you :slight_smile:

I had some more time today to play with the Twitter feed. It looks like image files are still being treated as HTML and partially sent. The one I requested first was actually way too large (almost 1MB, oops) but the system still sent 5KB of it as HTML anyway and subtracted appropriate credits for that size. No, I don’t want my credits back, see first paragraph.

I believe there is a 25KB (before compression) limit to requests. If the results of the URL are larger than this, shouldn’t it reject the request outright instead of trying to send partial results and using credits?
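
A minimal sketch of the "reject outright" behaviour, assuming the limit really is 25KB before compression (that figure is only my guess above). The names `MAX_REQUEST_BYTES`, `RequestTooLarge` and `check_request_size` are made up for illustration and are not part of the actual service:

```python
# Assumed limit: the "25KB before compression" figure guessed at in this thread.
MAX_REQUEST_BYTES = 25 * 1024


class RequestTooLarge(Exception):
    """Raised when the fetched content is over the per-request limit."""


def check_request_size(body: bytes, limit: int = MAX_REQUEST_BYTES) -> int:
    """Return the size to bill for, or raise instead of sending a partial file."""
    size = len(body)
    if size > limit:
        # Reject the whole request: nothing is sent and nothing is billed.
        raise RequestTooLarge(f"{size} bytes exceeds the {limit}-byte limit")
    return size
```

With something like this in front of the sender, a request for a 1MB image would fail up front rather than going out as a 5KB fragment.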

I also requested a little page I created that I knew would be under 25K uncompressed, including image data. Unfortunately the system stripped the 16KB image and sent only about 500 bytes of HTML. Since the image was actually an SVG file, it would have compressed very nicely. The largest part of the page was the included JavaScript files, which would have exceeded the 25K limit because of jQuery. It makes sense to strip out JavaScript includes.

This was the original request: https://twitter.com/JesseBrownFL/status/851175971784970241

My proposal would be to use something similar to the below commands:
wget -q -nd -E -k -Rjs,woff,ttf,otf,eot,font,css -Q20K -p http://tree.cafe ; du -s -b .

This outputs a size of 23598 bytes, just below the 25K limit, including the image on the page but taking out everything else that is already being stripped. The parameters do several things including making sure any additional files such as images don’t exceed 20K total. It does not count the main requested file in that total so the du command is needed to verify size before compressing and sticking it on the carousel.
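
The size check after the download could look roughly like this. It is only a sketch: `total_file_bytes` and `fits_limit` are hypothetical names, the 25K default is the assumed limit from above, and unlike `du -s -b` it adds up regular files only, so the number may come out a little lower than what `du` reports.

```python
import os


def total_file_bytes(directory: str) -> int:
    """Sum the sizes of all regular files under `directory` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_limit(directory: str, limit: int = 25 * 1024) -> bool:
    """True if everything wget saved fits within the assumed request limit."""
    return total_file_bytes(directory) <= limit
```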

It may be better to process URLs similarly to how the Wikipedia URLs are pulled, so the images are inlined in the HTML file and shrunk. Then the size of the resulting file can be checked. If the URL links directly to an image file then no processing should occur, and if it is larger than the request size limit it should be rejected.
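
To illustrate the inlining idea (this is not how the Wikipedia pages are actually built, which I don't know), an image can be embedded as a base64 `data:` URI and the final size checked afterwards. The function names are hypothetical, and the plain string replace only handles the simple `src="..."` case. Keep in mind that base64 grows the image by roughly a third before compression.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a data: URI that can sit in an img src."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def inline_image(html_text: str, src: str, image_bytes: bytes, mime_type: str) -> str:
    """Swap one src="..." reference for the inlined image (naive string match)."""
    old = f'src="{src}"'
    new = f'src="{to_data_uri(image_bytes, mime_type)}"'
    return html_text.replace(old, new)
```

After inlining, `len(result.encode("utf-8"))` gives the single number to compare against the request limit.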

Obviously you can’t just pass whatever URL comes in from Twitter straight to a shell to call wget, but after appropriate input sanitization wget could be used for this as an easy stop-gap. It may also be good for the system to check the URL against known inappropriate or malicious sites, but that can wait for the next revision as more people use it.
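
One way the sanitization could be approached, as a sketch only: validate the URL, then hand wget an argument list so no shell ever parses the text. `build_wget_argv` and `fetch` are hypothetical helpers, the flags are the ones from my command above, and the 30 second timeout is an arbitrary assumption.

```python
import subprocess
from urllib.parse import urlsplit

# Same flags as the wget command earlier in this post.
WGET_FLAGS = ["-q", "-nd", "-E", "-k", "-Rjs,woff,ttf,otf,eot,font,css", "-Q20K", "-p"]


def build_wget_argv(raw_url: str) -> list:
    """Validate a requested URL and return an argv list for wget."""
    url = raw_url.strip()
    # Refuse embedded whitespace or control characters outright.
    if any(ch.isspace() or ord(ch) < 0x20 for ch in url):
        raise ValueError("URL contains whitespace or control characters")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http and https URLs are accepted")
    if not parts.hostname:
        raise ValueError("URL has no host")
    return ["wget", *WGET_FLAGS, url]


def fetch(raw_url: str, workdir: str) -> None:
    """Run wget in workdir; a list argv with no shell=True avoids shell injection."""
    subprocess.run(build_wget_argv(raw_url), cwd=workdir, timeout=30, check=True)
```

Because the URL has to start with `http` or `https`, it also cannot be mistaken for a wget option.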

That does bring up something I hadn’t thought about. How would Outernet delete files that have already reached receivers? This is in case someone decided to be malicious and request a URL with adult or other inappropriate content. I would guess it could be an easy command addition to the carousel that would simply give the path and filename as a delete command so it would take up very little space and get out there quickly.
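
Purely as an illustration of how small such a command could be, here is a sketch. The JSON message format and both function names are invented for this example; I have no idea what the carousel actually supports. The receiver side refuses any path that would land outside its download folder.

```python
import json
import os


def make_delete_command(rel_path: str) -> bytes:
    """Build a tiny delete message (hypothetical format) for the carousel."""
    message = {"op": "delete", "path": rel_path}
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def apply_delete_command(message: bytes, root: str) -> bool:
    """Receiver side: delete the named file only if it lives under `root`."""
    cmd = json.loads(message.decode("utf-8"))
    if not isinstance(cmd, dict) or cmd.get("op") != "delete":
        return False
    root_real = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root_real, str(cmd.get("path", ""))))
    # Reject the root itself and anything that escapes it (e.g. "../secret").
    if target == root_real or os.path.commonpath([root_real, target]) != root_real:
        return False
    if os.path.isfile(target):
        os.remove(target)
        return True
    return False
```

A message like this is only a few dozen bytes, so it would cost almost nothing to broadcast.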


#2

This may have potential, but it would need to be run after something like wget strips out the files we don’t want (JavaScript, included CSS (keeping existing inline CSS), font files, video files, etc).


#3

Jesse,

yes - right now I actually have the system oriented towards just sending text. I even clean up the HTML - remove CSS, JS, all embeds, images - everything. This allows most text-oriented pages to be sent, as it’s rare for the text content to be very large.
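
For readers following along, that kind of cleanup can be sketched with Python's standard `html.parser`. This is not the code the service runs, just an illustration under assumptions: the tag lists and the `TextOnlyCleaner` / `clean_html` names are made up, and doctypes and comments are simply dropped.

```python
import html
from html.parser import HTMLParser

# Assumed tag lists for illustration; the real service may use different ones.
DROP_WITH_CONTENT = {"script", "style", "iframe", "object", "video", "audio"}
DROP_VOID = {"img", "link", "embed", "source"}


class TextOnlyCleaner(HTMLParser):
    """Re-emit HTML without scripts, styles, embeds, images or inline styling."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped container element

    def _emit_start(self, tag, attrs):
        parts = [tag]
        for key, value in attrs:
            if key == "style" or key.startswith("on"):
                continue  # drop inline CSS and event handlers
            if value is None:
                parts.append(key)
            else:
                parts.append(key + '="' + html.escape(value, quote=True) + '"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth and tag not in DROP_VOID:
            self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never open a container.
        if not self.skip_depth and tag not in DROP_WITH_CONTENT | DROP_VOID:
            self._emit_start(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag not in DROP_VOID:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def clean_html(source: str) -> str:
    """Return `source` with everything but text-bearing markup removed."""
    cleaner = TextOnlyCleaner()
    cleaner.feed(source)
    cleaner.close()
    return "".join(cleaner.out)
```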

The problem with inliner is that, with all the junk a typical page includes, it’d be rare to find something that would fit into the size limit.

Just images: I actually did not have images in mind at all when I set up the system. I can see the value, though I have to think of a way to make it work considering the size limits. Downconversion of image quality is very tricky. But let me think on it more.


#4

If you like I could help you out with the images part of it. It is something I’ve done in the past for a different project where image file size was critical and needed to be processed on the fly.


#5

@kf4hzu Are you able to elaborate a bit more on that other thing you built?


#6

It was nothing special. Back in the days when Slackware was a “cool” Linux distro to use, I put together some utilities to convert web pages to smaller versions for use on AMPRNet. With only 1200 baud half-duplex, browsing normal web pages was nearly impossible. I didn’t really do anything with the HTML like @Abhishek is doing, just the images.

I’ve actually been digging through my old files since I started this thread to see if I can find the glue code for it. Reproducing it would be easier, but I am nostalgic so I wanted to find it :slight_smile:


#7

What did you find to be useful with just 1200 baud?


#8

ASCII art was popular back in the 1200 half duplex days of Packet Radio :slight_smile:
http://picascii.com/


#9

Schematics, pictures of equipment, qsl card pictures. Ham radio nerd stuff :wink: