Using Wikipedia to supplement information on world events

sam_uk · November 4, 2016, 10:11pm

Each ‘view’ to a Wikipedia page could be thought of as a vote.

In the English-speaking world yesterday, most Wikipedia users were interested in the Chicago Cubs (they won a baseball thing). The day before, it was the Day of the Dead. If there is a ‘big news story’ then it will tend to appear here, but there is also room for the more random or amusing articles.

This could be done with the Wikipedias in French, Turkish, Arabic, Spanish, Russian etc. to give a good diversity of topical content in multiple languages. It would ‘cost’ in the region of 1.2Mb/day to do so.
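As a rough sketch of the multi-language idea, the snippet below builds one request URL per language edition. It assumes the Wikimedia pageviews "top" REST endpoint that appears in the script later in this thread; the function names `top_url` and `list_top_urls` and the `languages` array are made up for illustration, and GNU `date` is assumed.

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names here are illustrative assumptions.

# Build the "top pageviews" URL for one language edition and one day.
# Usage: top_url LANG YYYY/MM/DD
top_url() {
  local lang=$1 day=$2
  printf 'https://wikimedia.org/api/rest_v1/metrics/pageviews/top/%s.wikipedia.org/all-access/%s\n' "$lang" "$day"
}

# Language codes for English plus the editions named in the post.
languages=(en fr tr ar es ru)

# Print one URL per edition for yesterday's date (GNU date assumed).
list_top_urls() {
  local day lang
  day=$(date --date="yesterday" +%Y/%m/%d)
  for lang in "${languages[@]}"; do
    top_url "$lang" "$day"
  done
}
```

Each URL could then be fetched with `curl -s` and filtered in whatever way suits that edition.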

sam_uk · November 5, 2016, 8:11pm

So running

on yesterday’s date, and publishing the top article (with a 30-day blacklist to keep fresh content coming through), would give these articles for October 2016:

1st Oct - “Deaths_in_2016”
2nd Oct - “2016_Ryder_Cup”
3rd Oct - “Bound_for_Glory_(2016)”
4th Oct - “United_States_presidential_election_debates,_2016”
5th Oct - “No_Mercy_(2016)”
6th Oct - “United_States_presidential_election,_2016”
7th Oct - “2016_Atlantic_hurricane_season”
8th Oct - “List_of_Donald_Trump_presidential_campaign_endorsements,_2016”
9th Oct - “2016_Formula_One_season”
10th Oct - “Hell_in_a_Cell_(2016)”
11th Oct - “2016_Kabaddi_World_Cup_(Indoor)”
12th Oct - “Survivor_Series_(2016)”
13th Oct - “Nationwide_opinion_polling_for_the_United_States_presidential_election,_2016”
14th Oct - “2016–17_NBA_season”
15th Oct - “2016–17_Premier_League”
16th Oct - “2016_Summer_Olympics”
17th Oct - “2016_AFC_U-19_Championship”
18th Oct - “2016_Kabaddi_World_Cup_(Standard_style)”
19th Oct - “Battle_of_Mosul_(2016)”
20th Oct - “2016_World_Series”
21st Oct - “October_2016_Dyn_cyberattack”
22nd Oct - “2016_World_Series”
23rd Oct - “2016_Eséka_train_derailment”
24th Oct - “2016_Thai_League_Cup”
25th Oct - “2016–17_EFL_Cup”
26th Oct - “Miss_International_2016”
27th Oct - “2016_WTA_Finals”
28th Oct - “Miss_Earth_2016”
29th Oct - “2016_Asian_Men’s_Hockey_Champions_Trophy”
30th Oct - “2016_NFL_Draft”

I think it’s a pretty good list of articles. @Abhishek @syed are you interested in a script that outputs those Wikipedia articles?

I also think it would be worth grabbing this page every day: Portal:Current events/2016 November 4 - Wikipedia (you’d have to script sequentially grabbing the next date).
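Stepping through the dates could look something like the sketch below. It assumes the page lives at `https://en.wikipedia.org/wiki/Portal:Current_events/YYYY_Month_D` (inferred from the page title above) and that GNU `date` is available; `current_events_url` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Sketch only: the URL pattern is inferred from the page title, and the
# function name is an illustrative assumption.

# Print the Current events portal URL for a given day (default: yesterday).
# Usage: current_events_url [DATE]   e.g. current_events_url 2016-11-04
current_events_url() {
  local day=${1:-yesterday}
  # LC_ALL=C keeps the month name in English; %-d drops the leading zero (GNU date).
  LC_ALL=C date --date="$day" +"https://en.wikipedia.org/wiki/Portal:Current_events/%Y_%B_%-d"
}
```

Run once a day from cron with no argument, this would always point at the previous day's page, so no explicit counter for "the next date" is needed.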

Are you interested in getting that page to supplement the other news stuff?

Abhishek · November 5, 2016, 10:14pm

Absolutely. Though just a script that gives the “names” as listed above (the way they appear in the Wikipedia URLs) would be enough, as I already have a script to dump the actual pages.

sam_uk · November 6, 2016, 10:39pm

I’ve made a start on this. It’s been a bit harder than I thought and I’m feeling a bit dispirited about it. I’ve got the first part working, so it creates a text file with the name of the article to broadcast in it. I’m stuck on checking the 30-day blacklist to ensure we don’t keep rebroadcasting the same article. If anyone can help, please do! Copy the script below to FILENAME.sh and run it with: bash FILENAME.sh

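#!/usr/bin/env bash

# Set dates and output file (requires GNU date)
curDate=$(date --date="yesterday" +"%Y/%m/%d")
fileDate=$(date --date="yesterday" +"%Y-%m-%d")
outputFile="Wikipedia-${fileDate}.txt"

# Copy the top article title to a date-stamped text file (this part works!):
# keep articles with more than 10000 views whose title contains "2016",
# drop film/TV titles, and take the first match.
# jq -r prints the raw title without surrounding JSON quotes.
curl -s "https://wikimedia.org/api/rest_v1/metrics/pageviews/top/en.wikipedia.org/all-access/${curDate}" | jq -r '.items[].articles[] | select(.views > 10000) | .article' | grep 2016 | grep -vE "(film|tv|TV|Film)" | head -n 1 > "${outputFile}"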

I had been thinking of writing all the files to a kind of log and removing the files from the log after 30 days so as to not re-send the file too often, whilst at the same time sending updated files that are popular…
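One way the log idea could work is sketched below. This is not a finished solution: the log format (one `DATE<TAB>TITLE` line per broadcast article) and the names `prune_log` and `pick_fresh` are assumptions for illustration, and GNU `date` is assumed for the date arithmetic.

```shell
#!/usr/bin/env bash
# Sketch of a 30-day blacklist. The log format (DATE<TAB>TITLE) and the
# function names are assumptions, not part of the script above.

# prune_log LOGFILE TODAY: drop log entries older than 30 days.
prune_log() {
  local log=$1 today=$2 cutoff
  cutoff=$(date --date="$today 30 days ago" +%Y-%m-%d)
  touch "$log"
  # ISO dates sort lexicographically, so a string comparison is enough.
  awk -F'\t' -v c="$cutoff" '$1 >= c' "$log" > "$log.tmp" && mv "$log.tmp" "$log"
}

# pick_fresh LOGFILE TODAY: read candidate titles on stdin (most viewed
# first), print the first one not already in the log, and record it.
pick_fresh() {
  local log=$1 today=$2 title
  prune_log "$log" "$today"
  while IFS= read -r title; do
    [ -n "$title" ] || continue
    if ! cut -f2- "$log" | grep -Fxq -- "$title"; then
      printf '%s\t%s\n' "$today" "$title" >> "$log"
      printf '%s\n' "$title"
      return 0
    fi
  done
  return 1   # every candidate was broadcast within the last 30 days
}
```

To use it, the `head -n 1` stage of the pipeline would be replaced by something like `pick_fresh blacklist.tsv "$(date +%Y-%m-%d)"`, so the script falls through to the next most viewed title when the top one was sent recently. An article becomes eligible again once its entry ages out of the log.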

@Abhishek have you got a solution for this?

Thanks

Sam

sam_uk · November 10, 2016, 4:56pm

Today’s articles using this methodology would be

English

Spanish

French

Portuguese

German

Polish

Chinese

zh.wikipedia.org

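2016年美國總統選舉 (2016 United States presidential election)

Barack Obama, Democratic Party; Donald Trump, Republican Party. The 2016 United States presidential election was held on November 8, 2016 (US time). It was the 58th US presidential election; all 435 seats in the House of Representatives and 33 seats in the Senate were also contested at the same time to form the 115th United States Congress. Barack Obama, the 43rd and 44th president, was due to reach the end of his term on January 20, 2017; having already been re-elected once, he could not run again under the Twenty-second Amendment to the US Constitution, which provides that “the term of the president and vice president is four years, and they may be re-elected once”. Ten tickets took part. The Republican ticket of Donald Trump and Mike Pence won 306 electoral votes (62,979,636 popular votes, 45.97%), beating the Democratic ticket of Hillary Clinton and Tim Kaine with 232 electoral votes (65,844,610 popular votes, 48.06%) and passing the 270-vote majority threshold to secure election as the 45th president of the United States. In total, 29 third-party and independent candidates ran in this presidential election. Johnson and Jill were again the presidential candidates of the Libertarian Party and the Green Party, as in the 2012 US presidential election. The Electoral College voted on December 19, 2016 and confirmed the result, and Trump formally became president-elect.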

Russian

ru.wikipedia.org

Президентские выборы в США (2016) (United States presidential election, 2016)

The 2016 United States presidential election was held on 8 November and was the 58th election of a US president, held as part of the general elections. Article Two of the US Constitution provides that, to be elected president of the United States of America, a person must be a natural-born US citizen, at least 35 years old, who has lived in the United States of America for at least 14 years. The then-incumbent US president, Barack Obama, was not eligible to be elected to a third term because of the limit on terms of office in acc...

Arabic

Hindi

hi.wikipedia.org

संयुक्त राज्य राष्ट्रपति चुनाव, 2016 (United States presidential election, 2016)

Barack Obama, Democratic Party; Donald Trump, Republican Party. The United States presidential election, 2016, is the 58th quadrennial and most recent American election to date, held on Tuesday, November 8, 2016. In this election the Republican Party candidate Donald Trump defeated the Democratic Party candidate Hillary Clinton. On January 20, 2017, Trump will be sworn in as the 45th president of America and his running mate, Indiana governor Mike Pence, as the 48th vice president. Voters ...

sam_uk · January 15, 2018, 9:20pm

@Abhishek I realise this isn’t a priority until after shipping, but how about getting all the pages linked internally from the Wikipedia ‘current events’ page, e.g. Portal:Current events/2018 January 15 - Wikipedia?

It’s ‘just another perspective’, but if the internal links were broadcast as well, it would give a richer background than other sources of news, in my opinion.

Seems like it would be possible now we have some additional bandwidth?

Konrad_Roeder · February 12, 2018, 2:20am

I think grabbing the Wikipedia Current events page is great. And datacasting the articles that the current events page links to would be fantastic. This provides in-depth news that many of the news services can’t provide, because they have pretty much limited themselves to a flat file of 500-750 words.

–Konrad, WA4OSH

unixpunk · February 14, 2018, 8:56pm

I think the main problem with this idea is ‘people’… People make up these articles, and believe it or not, people are biased. Plus, if someone wanted to plant some disinfo and use a botnet to pump up the views, it would be trivial to spread disinfo to the third world this way. Not that I think you can trust most ‘mainstream’ news outlets any more than Wikipedia, but at least with those, someone can’t log in one day, add a bunch of disinfo, hammer it with a botnet and watch it get stuffed onto the carousel for upload to Outernet. Just my $.02.

sam_uk · February 14, 2018, 9:18pm

You don’t even need a botnet!

These are human-curated, so you just need to make edits that are verifiable from third-party sources and meet the other criteria.

Yes absolutely it’s a partial view of the world, but it’s another perspective to go alongside DW.com, aljazeera.com & voanews.com

All of these sources have bias, that’s the nature of news.

kenbarbi · February 14, 2018, 9:22pm

I fully agree. With all the RSS news feeds Outernet has implemented, Wikipedia is a valuable addition to explain what the news might be “all about”.

We need to provide all sides of an issue, even the potentially biased ones, for readers to consume. Ken

sam_uk · February 14, 2018, 9:30pm

For me the real value of the latest news page is that it provides contextual sources that link back to full Wikipedia articles.

If we script grabbing all the pages linked from the current events page then it becomes a really rich resource for background.
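A rough sketch of the link-grabbing step is below. It assumes the page HTML uses relative `href="/wiki/Title"` links for internal articles, which is how Wikipedia normally renders them; `extract_wiki_links` is a made-up name. Links containing a colon are skipped so that namespaced pages (Portal:, File:, Special: and so on) drop out, at the cost of also skipping the few articles with a colon in the title.

```shell
#!/usr/bin/env bash
# Sketch only: the function name is an illustrative assumption.

# Read page HTML on stdin and print the unique article titles it links to,
# in the form they take in Wikipedia URLs.
extract_wiki_links() {
  grep -oE 'href="/wiki/[^"#]+' \
    | sed 's|^href="/wiki/||' \
    | grep -v ':' \
    | sort -u
}
```

Piping `curl -s` of a current events page URL into `extract_wiki_links` would give a list of names that an existing page-dumping script could consume.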

To take today’s Zuma resignation

VOA is yet to cover it

But none of the above can match the contextual info given by

unixpunk · February 14, 2018, 9:47pm

Again, my point was about it being an easy avenue for disinfo. I’m not saying it isn’t useful, but what happens when Outernet puts up a slanderous article because they don’t have editors or curators and just upload whatever a script returned? I’d hate to see a project get shut down by a lawsuit for something like that; once it’s pushed up, it can’t be pulled back… News is another story, as it’s not easily editable, whereas Wikipedia is RIPE with disinfo… Heck, I bet that Zuma article alone has 10 or more pieces of disinfo in it that anyone with a little searching skill could uncover… Your script can’t tell that an article has 100 outstanding corrections that need to be made, etc.; it would get uploaded as-is, right or wrong, slanderous or true, and in extreme cases, illegal or legal… Free speech isn’t a given, and trade agreements do prevent corporations from saying ‘illegal’ things in certain countries, etc. All I’m saying is there is much more to consider here. I’m not going to argue; I just happen to know a lot about disinfo and where and how it exists and gets spread, and will leave it at that.

Konrad_Roeder · February 14, 2018, 10:02pm

I really don’t agree. Much of Wikipedia has actually been reviewed by a crowd of reviewers. Each article is rated on its degree of completeness and whether it meets Wikipedia’s standards. You can see this on the “talk” page for each article. So if you are worried, only use “good” or “featured” articles. If someone makes a change to these articles, it’s looked at with very close scrutiny.

For example, take a look at the “Peak oil” article.

“Peak oil has been listed as one of the Geography and places good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it.”

In contrast, look at the “RFD-TV” article.

“This article is within the scope of WikiProject Television, a collaborative effort to develop and improve Wikipedia articles about television programs. If you would like to participate, please visit the project page where you can join the discussion. To improve this article, please refer to the style guidelines for the type of work.
??? This article has not yet received a rating on the project’s quality scale.
??? This article has not yet received a rating on the project’s importance scale.”

My two cents… I’ve been editing Wikipedia articles for over 10 years.

–Konrad, WA4OSH

kenbarbi · February 15, 2018, 2:00pm

Outernet Broadcast Content faces various copyright and permission issues.

Possession of a digital file, regardless of how it was obtained, does not give one the right to share and distribute it. Outernet must ensure broadcast content is openly licensed for sharing.

When Outernet was broadcasting on L-band, Al Jazeera, Deutsche Welle, and the BBC gave their permission for Outernet to share their RSS feeds pro bono. Other information that could be broadcast was public domain content such as VOA and US Government health feeds. I was personally unsuccessful in getting two other commercial news sources to allow rebroadcast of their RSS feeds without hefty fees that Outernet could not afford.

The real question for us to investigate is what other sources of shareable information exist. Creative Commons (https://creativecommons.org) can come into play here, providing alternative sources to investigate. Wikipedia is only one of the many sources published under Creative Commons licences.

Ken

Konrad_Roeder · February 15, 2018, 7:34pm

What about Creative Commons content?
Creators choose a set of conditions they wish to apply to their work.

–Konrad, WA4OSH