Archive for the ‘KiBeKi’ Category

As I also briefly mentioned in the previous post (see: Kamangir is Back), I have started working on Project Profiler (see the development log here).

In short, Profiler attempts to map the Persian blogosphere by analyzing the connections between Persian bloggers on different social networks, including Friendfeed.com, which I have focused on for the last couple of months. This project will also use the reports now being regularly published by Project Didish.

As a short preview, two preliminary graphs generated by Profiler are posted here. As of now, there are 717 entries in the database, each representing one Persian blogger. These bloggers have been discovered through friendfeed.com.

The first graph shows that of the 566 blogs registered in the database, 171 are on wordpress.com (30%), 122 are on blogspot.com (22%), and 58 are on blogfa.com (10%). Interestingly, about 186 blogs are on their own domains (33%) (also see the corresponding pie chart in the latest Didish report).
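The percentages above follow directly from the counts. As a quick check, here is a minimal Python sketch that recomputes each share from the 566 registered blogs. The variable and function names are mine and are not part of Profiler.

```python
# Counts quoted in the post; the "own domain" figure is the approximate 186.
TOTAL_BLOGS = 566
counts = {
    "wordpress.com": 171,
    "blogspot.com": 122,
    "blogfa.com": 58,
    "own domain": 186,
}


def share(count, total=TOTAL_BLOGS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


shares = {host: share(n) for host, n in counts.items()}

# Blogs that fall outside the four categories listed in the post.
remainder = TOTAL_BLOGS - sum(counts.values())

for host, pct in shares.items():
    print(f"{host}: {pct}%")
print(f"other: {remainder} blogs")
```

Taken at face value, the four categories cover 537 of the 566 blogs, which would leave roughly 29 blogs on other services.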

The second graph shows the ten services used by the most bloggers registered in the system. Red bars indicate filtered services, while green and magenta denote services which are accessible in Iran and those about which mixed reports have been given, respectively. The category “blog” is included in the mixed reports because many leading blogs are indeed filtered. More detailed analysis of this issue will be carried out in the coming phases of the project.

Profiler aims not only at producing a large, detailed map of the Persian blogosphere; it will also provide information about connections, usage statistics, and trends in this online society.

The first report of project KiBeKi (Generation 1) is available. According to this report, the ten most popular websites in the Persian blogosphere are:

  1. blogger.com: linked at 304 sites (visit).
  2. feeds.feedburner.com: linked at 199 sites (visit).
  3. radiozamaaneh.com: linked at 180 sites (visit).
  4. isna.ir: linked at 168 sites (visit).
  5. google.com: linked at 167 sites (visit).
  6. balatarin.com: linked at 157 sites (visit).
  7. s.wordpress.com: linked at 155 sites (visit).
  8. persianblog.ir: linked at 154 sites (visit).
  9. webstats4u.com: linked at 150 sites (visit).
  10. fa.wordpress.com: linked at 146 sites (visit).

Similarly, the top ten Persian blogs are:

  1. 1pezeshk.com: linked at 129 sites (visit).
  2. khabgard.com: linked at 91 sites (visit).
  3. nikahang.blogspot.com: linked at 88 sites (visit).
  4. ahmadnia.net: linked at 81 sites (visit).
  5. younesspace.blogspot.com: linked at 78 sites (visit).
  6. persian.kamangir.net: linked at 73 sites (visit).
  7. khorshidkhanoom.com: linked at 68 sites (visit).
  8. mhmazidi.wordpress.com: linked at 68 sites (visit).
  9. balootak.com: linked at 65 sites (visit).
  10. hanouz.com: linked at 63 sites (visit).

The details of the procedure can be found here. The complete report, including charts and the connection graph, is here.

By analyzing links shared by Persian bloggers, I regularly search for the hottest points in the Persian blogosphere (a brief introduction to the methodology is given here; for more information, please send me an email or leave a comment). This report is based on 171 sources and over 30,000 shared links. The Persian translation of this post can be found here.

The following graph shows the ten top-most blogs/websites in terms of the number of shared links.

didish_domains1.png

  1. bbc.co.uk
  2. radiozamaaneh.com
  3. 1pezeshk.com
  4. persian.kamangir.net
  5. updateblog.net
  6. nikahang.blogspot.com
  7. dw-world.de
  8. freekeyboard.net
  9. asriran.com
  10. oldestfashion.blogspot.com

The list is similar to the results published about ten days ago, except for the presence of the Persian blog oldestfashion. Furthermore, there is now a close-to-state website (asriran) in the list. Interestingly, no state-run source has made it to the list and Persian BBC and Radio Zamaneh are still on top.

The list of servers is identical to the one published before, and therefore will not be published here.

While the above chart considers the number of links shared from each blog/website, another measure could be the number of sources which share links from each blog/website. The following chart shows the ten top-most blogs/websites from this point of view.

didish_sourceperdomains.png

  1. 1pezeshk.com
  2. freekeyboard.net
  3. radiozamaaneh.com
  4. updateblog.net
  5. bamdadi.com
  6. bbc.co.uk
  7. persian.kamangir.net
  8. asroone.net
  9. 9blog.wordpress.com
  10. zangoole.com

This list shows that while more links from Radio Zamaneh and BBC are shared, the two Persian blogs 1pezeshk and freekeyboard seem to be followed by more users.

The last chart looks at the correlation between the two measures shown above.
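The difference between the two measures can be illustrated with a small sketch. The data below is invented for illustration, and the names (`shared`, `links_per_domain`, `sources_per_domain`) are hypothetical rather than Didish's own.

```python
from collections import Counter, defaultdict

# Hypothetical (source, domain) pairs: one pair per shared link.
shared = [
    ("source_a", "news.example"),
    ("source_a", "news.example"),
    ("source_a", "news.example"),
    ("source_a", "news.example"),
    ("source_a", "blog.example"),
    ("source_b", "blog.example"),
    ("source_c", "blog.example"),
]

# Measure 1: total number of shared links pointing at each domain.
links_per_domain = Counter(domain for _, domain in shared)

# Measure 2: number of distinct sources that shared at least one link
# from each domain.
sources_by_domain = defaultdict(set)
for source, domain in shared:
    sources_by_domain[domain].add(source)
sources_per_domain = {d: len(s) for d, s in sources_by_domain.items()}

print(links_per_domain.most_common())
print(sorted(sources_per_domain.items(), key=lambda kv: -kv[1]))
```

In this toy data `news.example` leads on link count while `blog.example` leads on the number of sources, which is the same kind of reversal seen between the two lists.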

didish_link_sources.png

As with any research, these are preliminary results and need to be verified by more extensive analysis. The author is ready to share the methodology and the results with any serious researcher, provided that the privacy of the bloggers is guaranteed.

You might know that for the past few months I have been working on a robot which crawls through the Persian blogosphere (see: Statistics of 78,000 Persian Blogs – Report on KiBeKi’s results so far). A companion to this robot is another project I have been working on recently. I have named this project “Didish?” (which means “have you seen it?” in Persian).

Didish is a feed aggregator (installed at didish.kamangir.net), which archives the links Persian bloggers share through delicious, Google Reader, or other services. As of now, I have been able to find 163 sources. This chart shows which service the sharing is carried out through.

didish_sourcess.png

Clearly, delicious is the favorite sharing service for Persian bloggers. This could be due to the fact that the sharing service in Google Reader is fairly new.

Using DidishExtractor, a custom tool I have developed in Delphi, I have been able to extract the recent links of the sources, thereby collecting a total of more than 20,000 links shared by Persian bloggers. These links are from over 4300 different sources (blogs and websites). I will skip the technical details here, but the tool goes through different stages of lookup-table-based parsing and other string-related procedures.

The first question I set out to answer was “which Persian sources are the ten most popular, according to the links shared by Persian bloggers?” The answer to this question is given below, both as a pie chart and as a list. Interestingly, these 10 sources make up about 20% of the total, indicating that there are a few “hot” sources and a huge number of “less popular” ones, as one would also expect.
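DidishExtractor itself is written in Delphi and its internals are not described here, so the following is only a Python sketch of what one lookup-table-based step might look like: reducing a shared URL to its source (the host) and its server (the main domain name). The function names, the suffix table, and the URL paths in the usage lines are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical lookup table of two-label suffixes; a real tool would
# need a much fuller list.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "ac.ir", "co.ir"}


def source_of(url):
    """Return the lower-cased host without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def server_of(url):
    """Return the main domain, e.g. 'blogspot.com' for a blogspot blog."""
    labels = source_of(url).split(".")
    if len(labels) <= 2:
        return ".".join(labels)
    # Keep three labels when the last two form a known two-label suffix.
    keep = 3 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 2
    return ".".join(labels[-keep:])


# Example URLs; the paths are made up.
print(source_of("http://www.bbc.co.uk/persian/"))
print(server_of("http://nikahang.blogspot.com/2008/01/post.html"))
```

Keeping the two levels separate makes it possible to rank individual blogs in one list and whole hosting services in another.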

didish_domains.png

  1. bbc.co.uk
  2. radiozamaaneh.com
  3. 1pezeshk.com
  4. persian.kamangir.net
  5. updateblog.net
  6. dw-world.de
  7. freekeyboard.net
  8. nikahang.blogspot.com
  9. bamdadi.com
  10. asroone.net

As shown here, BBC Persian and the Amsterdam-based Radio Zamaneh are on top of the list, closely followed by the two Persian blogs 1pezeshk.com and persian.kamangir.net (my Persian blog). Then come the Persian blog updateblog.net and DW-World Persian. Interestingly, none of the state-run news sources makes it to the top-ten list.
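The "about 20% of the total" figure mentioned earlier is a top-ten share: the links going to the ten most-shared sources divided by all links. A minimal sketch of that calculation follows. The function name `top_k_share` and the demo counts are hypothetical and are not the Didish data.

```python
from collections import Counter


def top_k_share(link_counts, k=10):
    """Fraction of all links that go to the k most-linked sources."""
    total = sum(link_counts.values())
    if total == 0:
        return 0.0
    top = sum(n for _, n in link_counts.most_common(k))
    return top / total


# Made-up counts: two "hot" sources and eight less popular ones.
demo = Counter({"hot1.example": 60, "hot2.example": 20})
demo.update({f"site{i}.example": 15 for i in range(8)})

print(f"top-2 share: {top_k_share(demo, k=2):.0%}")
```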

As the early results of KiBeKi suggested that the Persian blogging service blogfa is the dominant player in the Persian blogosphere, the main domain names of the aggregated links were analyzed in an independent analysis. Interestingly, the top-ten hottest domain names contain about half the total links.

didish_servers.png

Although three fourths of the blogs KiBeKi discovers are hosted on blogfa, this analysis shows that Persian bloggers prefer to share links from blogspot and wordpress. Note that these blogs may be in languages other than Persian. Finally, we observe that the four Persian blogs 1pezeshk.com, persian.kamangir.net, updateblog.net and freekeyboard.net are able to compete with the whole of blogfa in terms of the number of posts shared by other Persian bloggers.

These are very preliminary results and more analysis has to be carried out in order to verify them. If you have any interest in this field or would like to do joint research, please drop me a line here or send me an email at kamangirblog@yahoo.ca. This post is also available in Persian.

My Interview with Global Voices

What are the objectives of your research and how do you want to develop them?
First, I aim at finding the population of the Persian blogosphere and the connection pattern in there. This will only concern Persian blogs that exist, independent of their nature and frequency of update. Then, in the second phase, I’ll work on determining the volume of activity of blogs through reading timestamps. This will give me a better understanding of the Persian blogosphere and will help reject dead blogs. Nevertheless, more than anything else, this is a preliminary step in helping other researchers.

This is a selected part of my interview with Global Voices Online about project “KiBeKi” (the robot which analyzes the Persian blogosphere). Read the rest here.

You might know that I am currently working on a project called KiBeKi (project’s page). It is a research project I started out of curiosity, but as the databases grew I realized the value of the information it gathers. I am in touch with a few researchers who have asked to use this information for their work. I am ready to share this information with other serious researchers, provided privacy concerns are met.

As of now, the robot has discovered 170,000 Persian blogs, 78,000 of which have been thoroughly analyzed, while 94,000 others wait in the queue. Over the past two weeks, the number of fully processed blogs has increased by more than 500%, but the length of the queue has remained almost unchanged. This might indicate that the system has reached the point of convergence, i.e. the point where few new blogs are found. However, existing estimates put the number of Persian blogs at around 700,000, so I am not willing to draw any conclusion yet. The system has also discovered 50,000 email addresses on the analyzed blogs. These are email addresses used by Persian bloggers, and thus useful for analyzing their preferences.

emails.png

Clearly, Yahoo is the dominant choice for Persian bloggers, with Gmail next.
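A chart like this can be produced by counting the part of each address after the "@". The sketch below shows one way to do that. The function names are hypothetical, the sample addresses are made up, and it is not KiBeKi's actual code.

```python
from collections import Counter


def provider_of(address):
    """Return the lower-cased part after the last '@', or None if absent."""
    _, sep, domain = address.strip().rpartition("@")
    return domain.lower() if sep and domain else None


def tally_providers(addresses):
    """Count addresses per provider, skipping malformed entries."""
    providers = (provider_of(a) for a in addresses)
    return Counter(p for p in providers if p)


# Made-up addresses for illustration only.
sample = ["a@yahoo.com", "b@Yahoo.com", "c@gmail.com", "not-an-address"]
print(tally_providers(sample).most_common())
```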

scripts.png

About three fourths of the Persian blogosphere is operating on Blogfa servers, according to the information collected so far. Please note that, for technical reasons, the robot only auto-detects Persian blogs based on the service they use, i.e. Persian blogs on blogspot, wordpress, and other services are not yet included in the analysis. This, however, does not seem to invalidate the results, because experience shows that Persian bloggers generally tend to prefer Iranian services to the likes of blogspot and wordpress.

kibeki2.png

Since last month, I have been working on a project which I have titled “KiBeKi”, or “What’s up?” in English. The project aims at developing a software robot which would crawl the Persian blogosphere in order to generate the connectivity graph of this dynamic place, as well as to extract other information.

At this moment, the robot has found 47317 sources, 3408 of which have been completely analyzed and 30866 of which have been determined to be Persian blogs but still await full analysis. During this process, the robot has discovered 2945 email addresses which are used by Persian bloggers. The robot is working right now and is gathering more information by the minute.
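The split between discovered, analyzed, and queued sources is typical of a breadth-first crawl. The sketch below shows the idea on an in-memory link map instead of real pages. The function `crawl` and the toy data are hypothetical, and the robot's actual design may well differ.

```python
from collections import deque


def crawl(seeds, outlinks):
    """Breadth-first traversal of a link map.

    Returns the blogs in the order they were analyzed, the list of
    (from, to) edges of the connectivity graph, and the queue length
    recorded after each blog was processed.
    """
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    discovered = set(seeds)
    queue = deque(seeds)
    analyzed, edges, queue_lengths = [], [], []
    while queue:
        blog = queue.popleft()
        analyzed.append(blog)
        for target in outlinks.get(blog, []):
            edges.append((blog, target))
            if target not in discovered:
                discovered.add(target)
                queue.append(target)
        queue_lengths.append(len(queue))
    return analyzed, edges, queue_lengths


# Hypothetical link map standing in for fetched pages.
toy_links = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": []}
analyzed, edges, queue_lengths = crawl(["a"], toy_links)
print(analyzed, len(edges), queue_lengths)
```

Watching the queue length over time is one way to judge whether a crawl is still finding new blogs or is starting to run dry.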

It is worth mentioning that the robot does not have language recognition skills yet, and thus is only able to spot Persian blogs based on the server they are located on, plus the clues I manually give to it. Therefore, although there are many Persian blogs on blogspot and wordpress, these tentative results mainly include blogs found on Persian-language servers such as blogfa and persianblog.
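Server-based detection of this kind can be as simple as checking the host name against a list of known Persian services plus a set of manual hints. The following is a minimal sketch under that assumption. The function name, the exact host names, and the hint entry are hypothetical.

```python
from urllib.parse import urlsplit

# Services named in the post; the exact host names are assumptions.
PERSIAN_SERVICES = ("blogfa.com", "persianblog.ir")

# Stand-in for the manually supplied clues; this entry is hypothetical.
MANUAL_HINTS = {"blog.example"}


def looks_persian(url):
    """Guess from the host name alone whether a URL is a Persian blog."""
    host = (urlsplit(url).hostname or "").lower()
    if host in MANUAL_HINTS:
        return True
    # Match the service itself or any subdomain of it.
    return any(host == s or host.endswith("." + s) for s in PERSIAN_SERVICES)


print(looks_persian("http://someone.blogfa.com/"))
print(looks_persian("http://someone.blogspot.com/"))
```

A rule like this misses every Persian blog hosted on an international service, which is exactly the limitation described above.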

emails.png

This graph shows the email services used by Persian bloggers, as found so far by the robot. Clearly, for the group analyzed here, Yahoo email is by far the most popular service. Gmail and Hotmail follow, with Gmail being about six times more popular.

script.png

This graph shows the servers on which the spotted Persian blogs are located. The graph is titled as showing the “scripts” because there are many on-domain blogs which use wordpress, movable type or even blogger. Clearly, for the blogs analyzed here, blogfa.com is the most popular service.

I am ready to share this information with any serious researcher in order to carry out a joint research project which would hopefully result in the publication of the generated results and analysis. If you have any interest in the subject, or know of any similar research going on anywhere, please leave a comment or send me an email to kamangirblog@yahoo.ca.