Visualising the blog network of #edc3100 students

The following describes the process and results of using Gephi to generate some visualisations of the inter-connections between the blogs of students in the course I’m teaching. The process is heavily informed by the work of Tony Hirst.

The result

The following represents the student blogs that have connected with each other. Size of the node is weighted towards the number of connections coming in. You can see a couple in the bottom right hand corner who have linked to themselves. The figure also suggests that there are 6 or 7 communities within these.


There are actually 300+ blogs in the data set. However, a significant number of those are not yet connected to another blog, hence the small number in the above image. There are two possible explanations for this:

  1. Many of the students haven’t yet taken seriously the need to connect with each other.
  2. There’s a bug in the code producing the file.

Will need to follow up on this. Will also need to spend a bit more time exploring what Gephi is able to do. Not to mention exploring why 0.8.2 of Gephi wouldn’t run for me.

The process

The process essentially seems to be

  1. Generate a data file summarising the network you want to visualise.
  2. Manipulate that data file in Gephi.

The rest contains a bit of a working diary of implementing the above two steps.

Generating the file

The format of the “GDF” file used in Tony’s post appears to be

  • A text file.
  • Two main sections
    1. Define some user/node information.

      The format is shown below. The key seems to be the “name”, which is a unique identifier used in the next section.

      [code lang="bash"]
      nodedef> name VARCHAR,label VARCHAR, totFriends INT,totFollowers INT, location VARCHAR, description VARCHAR
      67332054,jimhillwrites,105,282,"Herne Hill, London","WIRED UK Product Editor."
      [/code]

    2. Define the connections

      Essentially a long list of id pairs representing a user and their friends. I’m assuming this means the user connects to the friend.

      [code lang="bash"]
      edgedef> user VARCHAR,friend VARCHAR
      [/code]

More on the GDF format available here. It mentions a minimal GDF file and also mentions that the edge thickness can be specified. This seems useful for this experiment i.e. edge thickness == number of links from one student blog to another.

So the file format I’ll use will come straight from the minimal spec, i.e.
[code lang="bash"]
nodedef>name VARCHAR,label VARCHAR
s1,Site number 1
s2,Site number 2
s3,Site number 3
edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE
s3,s2, 2.34
s3,s1, 0.871
[/code]

Thinking I’ll use the “hostname” for the student’s blog as the “site number”. Maybe just the first four letters of it. Just to keep student anonymity.
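A minimal sketch of that idea in Python, assuming each blog is known by its URL. The function name `node_ids`, the `length` parameter and the example URLs are all hypothetical. Cutting a hostname down to four letters can make two blogs collide, so the sketch appends a counter when that happens, and it strips a leading `www.` so those blogs don't all end up with the same ID.

```python
from urllib.parse import urlparse

def node_ids(blog_urls, length=4):
    """Map each blog URL to a short ID taken from the start of its hostname."""
    ids = {}
    seen = {}  # how many times each prefix has been handed out so far
    for url in blog_urls:
        host = urlparse(url).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        short = host[:length]
        seen[short] = seen.get(short, 0) + 1
        # First use keeps the bare prefix; later uses get a numeric suffix
        ids[url] = short if seen[short] == 1 else f"{short}{seen[short]}"
    return ids

# Made-up URLs, purely for illustration
example = node_ids([
    "http://janesmith.wordpress.com/",
    "http://janedoe.wordpress.com/",
    "http://www.bobsblog.example.org/",
])
```

Whether four letters is enough for anonymity is a separate question. A short prefix of a hostname can still be recognisable to classmates.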

Questions for later

  1. Can I modify the file format to include with each “friend” connection a date?

    The idea is that the date will represent when the post was made. Using this I might be able to generate a visualisation of the connections over time.

  2. Is there value in also mapping the connections to the other links within the students’ posts?

    Could provide some idea of what they are linking to and help identify any interesting clusters.

The data

The database I’m maintaining contains

  • All the URLs for the students’ blogs.
  • All the posts to the students’ blogs.

I already have a script that extracts links from each of the student blogs. I just need to modify it to count the number of connections between student blogs... a bit easier than I thought it might be.
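The counting step could be sketched along the following lines in Python. This is an illustration only, not the script mentioned above. It assumes the posts are available as `(source_host, html)` pairs and that the student blogs are identified by hostname; `LinkCollector` and `count_connections` are hypothetical names. Links from a blog to itself are deliberately kept.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the hostname of every <a href="..."> in a chunk of HTML."""
    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            host = urlparse(href).hostname if href else None
            if host:
                self.hosts.append(host)

def count_connections(posts, student_hosts):
    """Count links from each student blog to each other student blog."""
    counts = Counter()
    for source_host, html in posts:
        collector = LinkCollector()
        collector.feed(html)
        for target_host in collector.hosts:
            # Ignore links that leave the set of student blogs
            if target_host in student_hosts:
                counts[(source_host, target_host)] += 1
    return counts

# Made-up posts, purely for illustration
student_hosts = {"alice.example.com", "bob.example.com"}
posts = [
    ("alice.example.com",
     '<p><a href="http://bob.example.com/post1">Bob</a> and '
     '<a href="http://bob.example.com/post2">Bob again</a></p>'),
    ("bob.example.com",
     '<a href="http://alice.example.com/">Alice</a> '
     '<a href="http://elsewhere.example.net/">someone else</a>'),
]
counts = count_connections(posts, student_hosts)
```

The resulting counts map directly onto the edge weight in the GDF format: one edge per `(source, target)` pair, weighted by the number of links.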

Now a quick script to generate the GDF file. Done.
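For anyone wanting to do the same, a minimal sketch of such a script in Python follows. It is not the script used for this post. It assumes the nodes are `(name, label)` pairs and the edges are a mapping from `(source, target)` to a link count; `write_gdf` and `gdf_value` are hypothetical names. The header lines are copied from the minimal GDF example. Wrapping values that contain commas in double quotes is an assumption based on the quoted fields in Tony's example file.

```python
import io

def gdf_value(value):
    """Quote a value if it contains a comma (assumed from the quoted example fields)."""
    text = str(value)
    if "," in text:
        return '"' + text.replace('"', "'") + '"'
    return text

def write_gdf(nodes, edges, out):
    """Write a node section and a weighted edge section to the file-like object `out`."""
    out.write("nodedef>name VARCHAR,label VARCHAR\n")
    for name, label in nodes:
        out.write(f"{gdf_value(name)},{gdf_value(label)}\n")
    out.write("edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE\n")
    for (source, target), weight in edges.items():
        out.write(f"{gdf_value(source)},{gdf_value(target)},{weight}\n")

# Made-up data, purely for illustration
nodes = [("s1", "Site number 1"), ("s2", "Site number 2"), ("s3", "Herne Hill, London")]
edges = {("s3", "s2"): 2, ("s3", "s1"): 1}

buffer = io.StringIO()
write_gdf(nodes, edges, buffer)
gdf_text = buffer.getvalue()
```

To produce a file for Gephi, pass an open file instead of the `StringIO` buffer, e.g. `with open("blogs.gdf", "w") as out: write_gdf(nodes, edges, out)`.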

Using Gephi

This is where I’m hoping Tony Hirst’s instructions work with a minimum of hassle.

That’s a bugger. Empty menus on Gephi. It’s not working. Is it wrong of me to suspect Java issues?

Going back to older version. That’s worked, but I haven’t installed it yet into Applications. 0.8.2 seemed to have worked previously as well. Get this done and figure it out later.

File opened. We have a network.

001 - First Graph

Tony Hirst then removes the unconnected nodes. I’d like to leave them in as it will illustrate the point that the students need to connect with others.

The Modularity algorithm doesn’t seem to be working as I’d expect (on my very quick read). It’s finding 200+ communities. Perhaps that is to be expected, given that most blogs are connected by one or two links and that I haven’t removed the unconnected nodes. Yes, it works much better once the unconnected nodes are removed.
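Tony Hirst removes the unconnected nodes inside Gephi. An alternative, sketched below in Python, is to drop them when generating the file. This is only an illustration under assumed data shapes (nodes as `(name, label)` pairs, edges keyed by `(source, target)`), and `connected_only` is a hypothetical name. A blog that only links to itself still counts as connected here.

```python
def connected_only(nodes, edges):
    """Keep only the nodes that appear at either end of at least one edge."""
    linked = set()
    for source, target in edges:
        linked.add(source)
        linked.add(target)
    return [(name, label) for name, label in nodes if name in linked]

# Made-up data, purely for illustration
nodes = [("s1", "Site number 1"), ("s2", "Site number 2"), ("s3", "Site number 3")]
edges = {("s3", "s2"): 2}

kept = connected_only(nodes, edges)
unconnected = len(nodes) - len(kept)
```

Keeping the count of dropped nodes is handy for reporting how many blogs have not yet connected with anyone.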

A bit more playing produces the final result above.

0 Replies to “Visualising the blog network of #edc3100 students”

  1. I’m impressed. Love the learning in public style. The best of it is the next set of questions. I’m re-reading Weinberger’s Too Big to Know – one of his many interesting points is that links “filter forward”. The other random thought: these maps always look neuron-like. So what do they look like over time? An animation of this might be interesting.

  2. And, purely in your interests – I hope you’ve got a few drafts of papers out of all of this. Quality and significance of the ideas so far above so much of the dross you see in ed tech research/scholarship.

    1. Thanks Chris. That’s the aim. But there’s always the threat of more interesting stuff to do and the process of transforming this into something that will get accepted tends to be boring.

  3. I agree. Long form stuff can be – but the trick might be to find places which are more amenable to different modes – blog-like, i.e. publish then get reviewed kind of thing. The sad thing for me is that I look at some acas and they routinely churn out mindless **** and they get gongs for it. Maybe you can leverage your already decent following into something that makes a bit more noise 🙂

  4. Hi David, I am trying my nerdie best to understand how to create the data set in order to do the visualisation. Any chance of posting a sample of your data set or how you created it? Thank you, Brigitte

    1. G’day Brigitte, This is where SNAPP used to be good, but it’s no longer supported/working.

      With the above, I’m at a slight advantage in that I have access to the database that contained the student blog posts, and have the ability to write programs that generate data in the form required. I used Gephi above so it wanted data in this form

      Define the different nodes (different blogs)

      nodedef>name VARCHAR,label VARCHAR

      Define the connections between nodes

      edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE

      How you might be able to do this will depend on the source of the data (which LMS – Moodle?) and which software you might use to generate the visualisations. Increasingly, most LMS have some sort of service that might do this.

      Does that help at all?

