Multilingual Wikipedias - Part I
I have been working with Wikipedia data for some time. As a non-native English speaker, I am naturally interested in the impact of multilingual Wikipedians on the Wikipedia platform. As a researcher, I wanted to dig into this further and understand multilingual Wikipedians' editing behavior and the kind of impact they bring to Wikipedia. I plan to write up my results as a series of posts as time permits.
This study is separated into two main sections: Movement between Wikipedia domains and Areas of Contribution.
Areas of Contribution
Each Wikipedia is divided into multiple areas; in Wikipedia "lingo" these are called namespaces. Each namespace is a reserved name on the MediaWiki platform. For example, in the User namespace all titles begin with "User:".
As you can see, each subject namespace has a corresponding talk namespace. For example, "Main", the article namespace where all the articles live, has its own "Talk" namespace. Talk namespaces are where users discuss information related to the topic. For example, if a Wikipedia editor is working on an article in the "Main" namespace, that editor would discuss the article, its related information, and its issues with other users in "Talk" (1). You can read descriptions of all the namespaces by following the links above.
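In the MediaWiki database and API, namespaces are identified by numbers: subject namespaces have even IDs and each talk namespace is the next odd ID (0 Main / 1 Talk, 2 User / 3 User talk, 4 Wikipedia / 5 Wikipedia talk, 10 Template / 11 Template talk). The sketch below assumes edits are labeled with these numeric IDs and maps them to the labels used in this post; `namespace_label` and `talk_namespace` are hypothetical helpers, not part of any MediaWiki library.

```python
# Subject namespaces used in this post, keyed by their MediaWiki namespace ID.
# Subject namespaces have even IDs; each one's talk namespace is the next odd ID.
SUBJECT_NAMESPACES = {
    0: "Main",
    2: "User",
    4: "Wikipedia",  # the "Project" namespace, named "Wikipedia" on Wikipedia
    10: "Template",
}

def namespace_label(ns_id: int) -> str:
    """Return a label such as 'Main' or 'Main Talk' for a namespace ID."""
    if ns_id < 0:
        raise ValueError("virtual namespaces (Special, Media) hold no editable pages")
    subject_id = ns_id - (ns_id % 2)  # an odd ID is the talk page of the even ID below it
    if subject_id not in SUBJECT_NAMESPACES:
        raise KeyError(f"namespace {ns_id} is not covered by this sketch")
    name = SUBJECT_NAMESPACES[subject_id]
    return f"{name} Talk" if ns_id % 2 else name

def talk_namespace(subject_id: int) -> int:
    """Return the talk namespace ID paired with a subject namespace ID."""
    if subject_id < 0 or subject_id % 2:
        raise ValueError("expected a non-negative, even subject namespace ID")
    return subject_id + 1
```

For example, `namespace_label(1)` gives `"Main Talk"` and `talk_namespace(10)` gives `11`.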
Wikipedia is a community-driven information center. Given these namespaces, it is interesting to ask whether a contributor's language proficiency has an impact on the type of work an editor does, the type of content an editor works on, and the quality of the work that editor performs.
#### How do we identify an editor's proficiency level?
Users can publish the languages they know, and their proficiency level in each, on their user page (user profile page) with the help of the Babel template.
Users can use this template to display their language proficiency; note that it is self-reported. We collected information on users of 286 languages, along with their edit history, to compute what proportion of their edits went to each main namespace and its talk namespace.
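A minimal sketch of how self-reported levels could be pulled out of a user page's wikitext, assuming the common `{{Babel|en|fr-3|de-1}}` form (or the `{{#babel:...}}` parser function), where `xx-0` to `xx-5` give a level and `xx-N` means native. A bare code such as `en` is treated here as native. `parse_babel` is a hypothetical helper; real user pages also use standalone user boxes and other variants that this regex does not cover.

```python
import re

# Matches {{Babel|...}} or {{#babel:...}} and captures the parameter list.
BABEL_RE = re.compile(r"\{\{\s*(?:#babel:|babel\s*\|)([^}]*)\}\}", re.IGNORECASE)
LEVELS = {"0", "1", "2", "3", "4", "5", "N"}

def parse_babel(wikitext: str) -> dict[str, str]:
    """Map language code -> self-reported level ('0'-'5', or 'N' for native)."""
    proficiency = {}
    for match in BABEL_RE.finditer(wikitext):
        for box in match.group(1).split("|"):
            box = box.strip()
            if not box or "=" in box:  # skip empty and named parameters
                continue
            # Split on the last hyphen so codes like "zh-hans-2" keep their script tag.
            code, sep, level = box.rpartition("-")
            if sep and level.upper() in LEVELS:
                proficiency[code.lower()] = level.upper()
            else:
                proficiency[box.lower()] = "N"  # assumption: a bare code means native
    return proficiency
```

For example, `parse_babel("{{Babel|en|fr-3|de-1}}")` gives `{"en": "N", "fr": "3", "de": "1"}`.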
Let's take a look at the English, French, German, Norwegian, Ukrainian, and Malay Wikipedias. In the graph above, the y axis shows the average proportion of edits made by users at each language proficiency level, from 1 to N, where 1 is the lowest and N is "native"; the levels are shown on the x axis. Each line represents a namespace. For this graph I use only the Main, Main Talk, User, User Talk, Wikipedia, Wikipedia Talk, Template, and Template Talk namespaces.
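The graph's y values can be sketched as follows, assuming each user comes with a single proficiency level for the Wikipedia's language and a list of namespace labels, one per edit. Each user's edits are first turned into per-namespace proportions, and those proportions are then averaged over all users at the same level. Counting a user with no edits in a namespace as a share of 0 is an assumption of this sketch; the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Babel levels in x-axis order; "N" marks a native speaker.
LEVEL_ORDER = ["0", "1", "2", "3", "4", "5", "N"]

def namespace_shares(edit_namespaces):
    """Fraction of one user's edits that fall in each namespace."""
    counts = Counter(edit_namespaces)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ns: n / total for ns, n in counts.items()}

def mean_share_by_level(users, namespaces):
    """Average per-user namespace shares at each proficiency level.

    users: iterable of (level, edit_namespaces) pairs for one language.
    Returns {namespace: {level: mean share}}; a user with no edits in a
    namespace contributes a share of 0 to that namespace (an assumption).
    """
    sums = defaultdict(lambda: defaultdict(float))
    n_users = Counter()
    for level, edits in users:
        shares = namespace_shares(edits)
        if not shares:
            continue  # proportions are undefined for users with no edits
        n_users[level] += 1
        for ns in namespaces:
            sums[ns][level] += shares.get(ns, 0.0)
    return {
        ns: {lvl: sums[ns][lvl] / n_users[lvl] for lvl in LEVEL_ORDER if n_users[lvl]}
        for ns in namespaces
    }
```

For example, with two native users (edits `["Main", "Main", "Main Talk", "User"]` and `["Main"]`) and one level-2 user (`["Main Talk", "Main Talk"]`), `mean_share_by_level(users, ["Main", "Main Talk"])` returns `{"Main": {"2": 0.0, "N": 0.75}, "Main Talk": {"2": 1.0, "N": 0.125}}`. Averaging per-user proportions, rather than pooling all edits, keeps a few very active editors from dominating a level.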
Contributions to the Main namespace, broken down by language proficiency, show some interesting differences between languages. In English, users with lower proficiency contributed a slightly higher proportion of their edits to the Main namespace than users at "level 5" and "native" level. Is this due to characteristics of the English language? Does it mean English is an easier language in which to construct sentences? Or are English Wikipedians more welcoming towards users with lower language proficiency than other language Wikipedia domains are? French is known as a harder language to learn, and in the French graph there is a clear boost in the proportion of Main-namespace edits from French "Native" users compared with non-native users.
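The comparisons above are read off the graph. One way to put a number on a gap such as French native vs non-native Main-namespace share is to compare the two group means and run a permutation test on the level labels. This is a swapped-in technique, not the analysis behind the graph, and the input format (one `(level, main_share)` pair per user) is assumed; `native_gap` and `permutation_p_value` are hypothetical helpers.

```python
import random
from statistics import mean

def native_gap(user_shares):
    """Mean Main-namespace share of native ('N') users minus that of non-native users."""
    native = [s for level, s in user_shares if level == "N"]
    non_native = [s for level, s in user_shares if level != "N"]
    if not native or not non_native:
        raise ValueError("need at least one native and one non-native user")
    return mean(native) - mean(non_native)

def permutation_p_value(user_shares, n_rounds=10_000, seed=0):
    """Two-sided permutation test: how often does shuffling the level labels
    give a gap at least as large (in absolute value) as the observed one?"""
    rng = random.Random(seed)
    observed = abs(native_gap(user_shares))
    levels = [level for level, _ in user_shares]
    shares = [s for _, s in user_shares]
    hits = 0
    for _ in range(n_rounds):
        rng.shuffle(levels)  # group sizes stay fixed; only the labels move
        if abs(native_gap(list(zip(levels, shares)))) >= observed:
            hits += 1
    return (hits + 1) / (n_rounds + 1)  # add-one keeps the estimate above 0
```

A small p-value suggests the native vs non-native gap is larger than random relabeling of the same users would typically produce; it says nothing about why the gap exists.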
to be continued…