Using Wikipedia to supplement information on world events

Each ‘view’ to a Wikipedia page could be thought of as a vote.

In the English-speaking world yesterday, most Wikipedia users were interested in the Chicago Cubs (they won a baseball thing). The day before, it was the Day of the Dead. If there is a big news story, it will tend to appear here, but there is also room for the more random or amusing articles.

This could be done with the Wikipedias in French, Turkish, Arabic, Spanish, Russian, etc. to give a good diversity of topical content in multiple languages. It would 'cost' in the region of 1.2 Mb/day to do so.
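As a rough, untested sketch of the idea, something like this would pull yesterday's most-viewed article from a handful of language editions (the language list and the crude Main_Page/Special: filter below are just placeholders):

#!/usr/bin/env bash
# Untested sketch: yesterday's top-viewed article per language edition.
# Language codes and the page filter are placeholders, not a final list.
day=$(date --date="yesterday" +"%Y/%m/%d")
for lang in en fr tr ar es ru; do
  top=$(curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/${lang}.wikipedia.org/all-access/${day}" \
    | jq -r '.items[0].articles[] | select(.article != "Main_Page" and (.article | startswith("Special:") | not)) | .article' \
    | head -n 1)
  echo "${lang}: ${top}"
done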


So running

curl -s 'https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia.org/all-access/2016/10/21' | jq -r '.items[0].articles[] | select(.views > 10000).article' | grep 2016 | grep -vE "(film|tv|TV|Film)"

on yesterday's date, and publishing the top article (with a 30-day blacklist to keep fresh content coming through), would give these articles for October 2016:

1st Oct - Deaths_in_2016
2nd Oct - 2016_Ryder_Cup
3rd Oct - Bound_for_Glory_(2016)
4th Oct - United_States_presidential_election_debates,_2016
5th Oct - No_Mercy_(2016)
6th Oct - United_States_presidential_election,_2016
7th Oct - 2016_Atlantic_hurricane_season
8th Oct - List_of_Donald_Trump_presidential_campaign_endorsements,_2016
9th Oct - 2016_Formula_One_season
10th Oct - Hell_in_a_Cell_(2016)
11th Oct - 2016_Kabaddi_World_Cup_(Indoor)
12th Oct - Survivor_Series_(2016)
13th Oct - Nationwide_opinion_polling_for_the_United_States_presidential_election,_2016
14th Oct - 2016–17_NBA_season
15th Oct - 2016–17_Premier_League
16th Oct - 2016_Summer_Olympics
17th Oct - 2016_AFC_U-19_Championship
18th Oct - 2016_Kabaddi_World_Cup_(Standard_style)
19th Oct - Battle_of_Mosul_(2016)
20th Oct - 2016_World_Series
21st Oct - October_2016_Dyn_cyberattack
22nd Oct - 2016_World_Series
23rd Oct - 2016_Eséka_train_derailment
24th Oct - 2016_Thai_League_Cup
25th Oct - 2016–17_EFL_Cup
26th Oct - Miss_International_2016
27th Oct - 2016_WTA_Finals
28th Oct - Miss_Earth_2016
29th Oct - 2016_Asian_Men's_Hockey_Champions_Trophy
30th Oct - 2016_NFL_Draft

I think it's a pretty good list of articles. @Abhishek @syed are you interested in a script that outputs those Wikipedia articles?

I also think it would be worth grabbing this page every day: Portal:Current events/2016 November 4 - Wikipedia (you'd have to script sequentially grabbing the next date).
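A rough sketch of what the daily grab might look like (untested; page names seem to follow Portal:Current_events/YYYY_Month_D, and the output filename here is just an example):

#!/usr/bin/env bash
# Untested sketch: fetch the day's "Portal:Current events" page.
# Assumes an English locale for the month name and GNU date for %-d.
portalDate=$(date +"%Y_%B_%-d")   # e.g. 2016_November_4
curl -s "https://en.wikipedia.org/wiki/Portal:Current_events/${portalDate}" -o "CurrentEvents-$(date +%F).html"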

Are you interested in getting that page to supplement the other news stuff?

Absolutely, though just a script which gives the "names" as listed above (the way they are found in the Wikipedia URLs) would be enough, as I already have a script to dump the actual pages.


I've made a start on this. It's been a bit harder than I thought, and I'm feeling a bit dispirited about it. I've got the first part working, so it creates a text file with the name of the article to broadcast in it. I'm stuck on looking in the 30-day blacklist to ensure we don't keep rebroadcasting the same article. If anyone can help, please do! Copy the file below to FILENAME.sh and run it with: bash FILENAME.sh

#!/usr/bin/env bash

# Setting dates & output file
curDate=$(date --date="yesterday" +"%Y/%m/%d")
fileDate=$(date --date="yesterday" +"%Y-%m-%d")
outputFile="Wikipedia-${fileDate}.txt"

# Copy article title to datestamped text file (works!)
curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia.org/all-access/${curDate}" | jq -r '.items[0].articles[] | select(.views > 10000).article' | grep 2016 | grep -vE "(film|tv|TV|Film)" | head -n 1 > "${outputFile}"

I had been thinking of writing all the article names to a kind of log and removing entries from the log after 30 days, so as to not re-send the same file too often, whilst at the same time still sending updated files that are popular…
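In case it helps move things along, here is an untested sketch of that idea; the blacklist filename and its one-line-per-article layout ("YYYY-MM-DD article_name") are just assumptions:

#!/usr/bin/env bash
# Untested sketch of the 30-day blacklist. blacklist.txt holds one line per
# broadcast article: "<YYYY-MM-DD> <article_name>".
blacklist="blacklist.txt"
touch "${blacklist}"

# Drop blacklist entries older than 30 days (string comparison works for ISO dates).
cutoff=$(date --date="30 days ago" +"%Y-%m-%d")
awk -v cutoff="${cutoff}" '$1 >= cutoff' "${blacklist}" > "${blacklist}.tmp" && mv "${blacklist}.tmp" "${blacklist}"

curDate=$(date --date="yesterday" +"%Y/%m/%d")
fileDate=$(date --date="yesterday" +"%Y-%m-%d")

# First candidate article that is not already blacklisted.
article=$(curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia.org/all-access/${curDate}" \
  | jq -r '.items[0].articles[] | select(.views > 10000).article' \
  | grep 2016 | grep -vE "(film|tv|TV|Film)" \
  | grep -vxFf <(cut -d' ' -f2- "${blacklist}") \
  | head -n 1)

if [ -n "${article}" ]; then
  echo "${fileDate} ${article}" >> "${blacklist}"
  echo "${article}" > "Wikipedia-${fileDate}.txt"
fi

Article names from the API use underscores rather than spaces, so a space-separated log like this should be safe.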

@Abhishek have you got a solution for this?

Thanks

Sam

Today's articles using this methodology would be:

English
Spanish
French
Portuguese
German
Polish
Chinese
Russian
Arabic
Hindi

@Abhishek I realise this isn't a priority now until post-shipping, but getting all the pages linked internally from the Wikipedia 'Current events' page, e.g. Portal:Current events/2018 January 15 - Wikipedia, would be worthwhile.

It's 'just another perspective', but if the internal links were broadcast as well, it would give a richer background than other sources of news, in my opinion.

It seems like it would be possible now that we have some additional bandwidth?
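If it helps, here is an untested sketch of pulling the internal article links out of a Current events page; the namespace filter is just a guess at what to exclude:

#!/usr/bin/env bash
# Untested sketch: list article links on the day's Current events page so each
# linked page could be fetched and broadcast alongside it.
portalDate=$(date +"%Y_%B_%-d")   # assumes an English locale, e.g. 2018_January_15
curl -s "https://en.wikipedia.org/wiki/Portal:Current_events/${portalDate}" \
  | grep -oE 'href="/wiki/[^"#?]+"' \
  | sed -e 's|href="/wiki/||' -e 's|"$||' \
  | grep -vE '^(Portal|Wikipedia|Special|File|Help|Template|Category|Talk):' \
  | sort -u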


I think grabbing the Wikipedia Current events page is great, and datacasting the articles the current events page links to would be fantastic. This provides in-depth news that many of the news services can't provide, because they have pretty much limited themselves to a flat file of 500-750 words.

–Konrad, WA4OSH


I think the main problem with this idea is 'people'… People make up these articles, and believe it or not, people are biased. :slight_smile: Plus, if someone wanted to plant some disinfo and use a botnet to pump up the views, it would be very trivial to spread disinfo to the third world this way. Not that I think you can trust most 'mainstream' news outlets any more than Wikipedia, but at least someone can't log in one day, add a bunch of disinfo, hammer it with a botnet, and watch it get stuffed onto the carousel for upload to Outernet. Just my $0.02.


You don’t even need a botnet!

These are human-curated, so you just need to make edits that are verifiable from third-party sources and meet the other criteria.

Yes, absolutely, it's a partial view of the world, but it's another perspective to go alongside DW.com, aljazeera.com, and voanews.com.

All of these sources have bias, that’s the nature of news.


I fully agree. With all the RSS news feeds Outernet has implemented, Wikipedia is a valuable addition to explain what the news might be “all about”.

We need to provide all sides of an issue, even the potentially biased ones, for readers to consume. Ken


For me the real value of the latest news page is that it provides contextual sources that link back to full Wikipedia articles.

If we script grabbing all the pages linked from the Current events page, it becomes a really rich resource for background.

To take today’s Zuma resignation

VOA is yet to cover it

But none of the above can match the contextual info given by


Again, my point was about it being an easy avenue for disinfo. I'm not saying it isn't useful, but what happens when Outernet puts up a slanderous article because it doesn't have editors or curators and just dumps whatever a script returned? I'd hate to see a project get shut down by a lawsuit over something like that; once it's pushed up, it can't be pulled back. News is another story, not easily editable; Wikipedia is ripe with disinfo. Heck, I bet that Zuma article alone has 10 or more pieces of disinfo in it that anyone with a little searching skill could uncover.

Your script can't tell that an article has 100 outstanding corrections that need to be made; it would get uploaded as-is, right or wrong, slanderous or true, and in extreme cases, illegal or legal. Free speech isn't a given, and trade agreements do prevent corporations from saying 'illegal' things in certain countries. All I'm saying is there is much more to consider here. I'm not going to argue; I just happen to know a lot about disinfo and where and how it exists and gets spread, and I'll leave it at that.

I really don't agree. Much of Wikipedia has actually been reviewed by a crowd of reviewers. Each article is rated on its degree of completeness and whether it meets Wikipedia's standards. You can see this on the "talk" page for each article. So if you are worried, only use "good" or featured articles. If someone makes a change to these articles, it's looked at with very close scrutiny.

For example, take a look at the “Peak oil” article.

“Peak oil has been listed as one of the Geography and places good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.”

In contrast, look at the “RFD-TV” article.

"This article is within the scope of WikiProject Television, a collaborative effort to develop and improve Wikipedia articles about television programs. If you would like to participate, please visit the project page where you can join the discussion. To improve this article, please refer to the style guidelines for the type of work.

??? This article has not yet received a rating on the project's quality scale.
??? This article has not yet received a rating on the project's importance scale."
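If someone wanted to script that filter, something along these lines might work; note the hidden-category names ("Category:Good articles" / "Category:Featured articles") are my assumption and would need verifying:

#!/usr/bin/env bash
# Untested sketch: skip articles that aren't rated Good or Featured.
# The category names queried here are assumptions and should be checked.
article="Peak_oil"
curl -s "https://en.wikipedia.org/w/api.php?action=query&prop=categories&clshow=hidden&cllimit=max&format=json&titles=${article}" \
  | jq -r '.query.pages[].categories[]?.title' \
  | grep -qE '^Category:(Good|Featured) articles$' \
  && echo "${article}: Good/Featured - OK to broadcast" \
  || echo "${article}: not rated Good/Featured - skip"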

My two cents… I've been editing Wikipedia articles for over 10 years.

–Konrad, WA4OSH

Outernet broadcast content faces various copyright and permission issues.

Possession of a digital file, regardless of how it was obtained, does not give one the right to share and distribute it. Outernet must ensure broadcast content is openly licensed for sharing.

When Outernet was broadcasting on L-band, Al Jazeera, Deutsche Welle, and the BBC gave their permission for Outernet to share their RSS feeds pro bono. Other information that could be broadcast was public-domain content, such as the VOA and US Government health feeds. I was personally unsuccessful in getting two other commercial news sources to allow rebroadcast of their RSS feeds without paying hefty fees Outernet could not afford.

The real question for us to investigate is what other sources of shareable information exist. Creative Commons (https://creativecommons.org) can come into play here, providing alternative sources to investigate. Wikipedia is only one of the many integral parts of Creative Commons.

Ken


What about Creative Commons content?
Creators choose a set of conditions they wish to apply to their work.

–Konrad, WA4OSH