Kamangir (Archer)

An Iranian looking at Iran as a foreigner…

Didish: Analysis of Shared Links

“Didish?” (meaning “have you seen it?” in Persian) is a project aimed at analyzing the links shared by Persian bloggers. This is a short introduction to the building blocks of the project.

1) Everything starts with Persian bloggers sharing links. They mainly use del.icio.us or Google Reader for this purpose. The shared links are then shown off as badges on the person’s blog. Didish collects the RSS feeds of these shared links.

2) The feed aggregator installed at didish.kamangir.net fetches the feeds and stores them in a database. The database is updated at least once a day.

3) The Delphi-based code DidishExtract reads the entries in the database as follows. For every source, the latest 1000 shared links are collected. If the feed contains fewer than this number of items, all available links are collected. The links then go through a preprocessing stage in which the structure of each link is corrected, if needed. This stage is especially necessary for the blogging service Persianblog and also for links posted using the corresponding Feedburner link.
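
As a rough illustration of this step (this is not the actual DidishExtract code, which is written in Delphi and not shown here), the Python sketch below keeps the newest 1000 links per source and applies a simple correction to each link. The names collect_links and clean_link are hypothetical, and the correction rules (lowercasing the host, dropping a leading "www." and the fragment) are assumptions made for illustration only; the real corrections for Persianblog and Feedburner links are not described in this post.

```python
from urllib.parse import urlsplit, urlunsplit

MAX_LINKS_PER_SOURCE = 1000  # the limit stated in the post


def clean_link(url):
    """Hypothetical correction: lowercase the host, drop 'www.' and the fragment."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return urlunsplit((parts.scheme.lower(), host, parts.path, parts.query, ""))


def collect_links(entries, limit=MAX_LINKS_PER_SOURCE):
    """entries: iterable of (source, timestamp, url) tuples.

    Returns {source: [corrected urls]}, newest first, at most `limit` per source.
    If a source has fewer than `limit` entries, all of them are kept.
    """
    by_source = {}
    for source, timestamp, url in entries:
        by_source.setdefault(source, []).append((timestamp, url))

    result = {}
    for source, items in by_source.items():
        items.sort(key=lambda item: item[0], reverse=True)  # newest first
        result[source] = [clean_link(url) for _, url in items[:limit]]
    return result
```

Calling collect_links on the rows read from the database would give one list of corrected links per source, ready for the analysis in the next step.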

4) The links are analyzed from different perspectives. A report is then generated and posted on the report page. A sample report contains these items:

  1. The topmost one hundred sources according to the number of links available in all the sources (example).
  2. List of all the sources (example).
  3. List of servers (example). This list helps compare the popularity of different blogging services, such as WordPress and Blogspot.
  4. The 10 topmost sites and their share of links (example). Note that the title of this chart shows the percentage of all links that referred to these sites. The pie chart then shows how these links are divided among the 10 topmost sites. For example, in this case, 17% of the links were to the ten topmost sites, 22% of which referred to radiozamaaneh.com, making the share of this site equal to 17% × 22% ≈ 4% of all links.
  5. The 10 topmost servers, similar to the above (example).
  6. The 10 topmost sites, according to the number of sources which had at least one link to them (example).
  7. Number of links vs. number of sources which had at least one link to the site, for the topmost sites (example).
  8. The connection graph (example). This graph is generated as follows. Among the one hundred topmost sites, the ones which also share links are extracted. Generally, this list will only include blogs, and not media sources. Then, using a color code, the connections between the sites are presented as edges of a graph. A dark link from site A to site B indicates that a large portion of the links shared on site A are to site B.
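
The quantities behind items 4, 6 and 8 can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code that produces the reports: the function names are hypothetical, the input is assumed to be a dictionary mapping each source to the list of sites it links to, and the edge weight is assumed to be the plain fraction of a source's links that go to a given site (the post does not say how the color code is computed).

```python
from collections import Counter


def top_site_shares(links_by_source, top_n=10):
    """Item 4: share of all links going to the top_n sites, plus each
    top site's share within that group (the pie chart slices)."""
    counts = Counter(site for sites in links_by_source.values() for site in sites)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    top = counts.most_common(top_n)
    top_total = sum(n for _, n in top)
    return top_total / total, {site: n / top_total for site, n in top}


def sources_per_site(links_by_source):
    """Item 6: number of sources with at least one link to each site."""
    counts = Counter()
    for sites in links_by_source.values():
        counts.update(set(sites))  # each source counts once per site
    return counts


def edge_weights(links_by_source):
    """Item 8 (assumed weighting): fraction of source A's links pointing to site B."""
    weights = {}
    for source, sites in links_by_source.items():
        if not sites:
            continue
        for site, n in Counter(sites).items():
            weights[(source, site)] = n / len(sites)
    return weights
```

Multiplying the first value returned by top_site_shares by one site's slice gives that site's share of all links. With the numbers quoted in item 4, 0.17 × 0.22 ≈ 0.037, which rounds to the 4% mentioned there.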

It is important to emphasize that Didish is not concerned with links referred to in posts. The sole aim of Didish is the analysis of shared links. The more extensive analysis of all the links in blogs is carried out through the sister project KiBeKi.

For more information you can follow the related posts at this address or send me an email at arash@kamangir.net.

While I do my best to carry out extensive checks before I publish any result, this project is still in its early days and therefore has to undergo more tests of validity and integrity.

Acknowledgment:

The original idea of Didish is from Sara (Avayemoj). I have benefited from many constructive discussions with Vahid about this project.

This post is also available in Persian.

Last update: 4 March 2008