The following describes the process and results of using Gephi to generate some visualisations of the inter-connections between the blogs of students in the course I’m teaching. The process is heavily informed by the work of Tony Hirst.
The result
The following represents the student blogs that have connected with each other. Node size is weighted by the number of incoming connections. You can see a couple in the bottom right-hand corner that have linked to themselves. The figure also suggests that there are 6 or 7 communities among these blogs.
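Gephi calculates the node sizing itself. If you want to check the numbers outside Gephi, here is a rough sketch. It assumes the links have already been reduced to hypothetical `(source, target, weight)` tuples. It is not Gephi's own code, and it is an assumption that Gephi treats self-loops the same way.

```python
from collections import Counter


def in_degrees(edges):
    """Count incoming connections per node from (source, target, weight) edges.

    Returns two Counters: the number of incoming edges per node and the summed
    weight of those edges. Self-loops are counted like any other edge here
    (an assumption; Gephi's own in-degree calculation may differ).
    """
    count = Counter()
    weighted = Counter()
    for source, target, weight in edges:
        count[target] += 1
        weighted[target] += weight
    return count, weighted


def self_loops(edges):
    """Return the nodes (blogs) that link to themselves, sorted."""
    return sorted({source for source, target, _ in edges if source == target})


# Hypothetical edges using short anonymised ids.
edges = [("abcd", "efgh", 2), ("ijkl", "efgh", 1), ("mnop", "mnop", 3)]
count, weighted = in_degrees(edges)
print(count["efgh"], weighted["efgh"])  # 2 3
print(self_loops(edges))  # ['mnop']
```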
There are actually 300+ blogs in the data set. However, a significant number of those are not yet connected to another blog, hence the small number in the above image. There are two possible explanations for this:
- Many of the students haven’t yet taken seriously the need to connect with each other.
- There’s a bug in the code producing the file.
Will need to follow up on this. Will also need to spend a bit more time exploring what Gephi is able to do, not to mention why version 0.8.2 of Gephi wouldn’t run for me.
The process
The process essentially seems to be:
- Generate a data file summarising the network you want to visualise.
- Manipulate that data file in Gephi.
The rest contains a bit of a working diary of implementing the above two steps.
Generating the file
The format of the “GDF” file used in Tony’s post appears to be:
- A text file.
- Two main sections:
- Define some user/node information.
The format is shown below. The key seems to be the “name”, which is a unique identifier used in the next section.
[code lang="bash"]
nodedef> name VARCHAR,label VARCHAR, totFriends INT,totFollowers INT, location VARCHAR, description VARCHAR
67332054,jimhillwrites,105,282,"Herne Hill, London","WIRED UK Product Editor."
[/code]
- Define the connections
Essentially a long list of id pairs representing a user and their friends. I’m assuming this means the user connects to the friend.
[code lang="bash"]
edgedef> user VARCHAR,friend VARCHAR
67332054,137703483
[/code]
More on the GDF format is available here. It describes a minimal GDF file and also mentions that edge thickness can be specified. This seems useful for this experiment, i.e. edge thickness == number of links from one student blog to another.
So the file format I’ll use will come straight from the minimal spec, i.e.:
[code lang="bash"]
nodedef>name VARCHAR,label VARCHAR
s1,Site number 1
s2,Site number 2
s3,Site number 3
edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE
s1,s2,1.2341
s2,s3,0.453
s3,s2, 2.34
s3,s1, 0.871
[/code]
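One of the possible explanations above is a bug in the code producing the file. A quick check for that is to parse a generated file back and list any edges that point at undefined nodes. The sketch below is a minimal parser for the minimal format. It assumes one `nodedef>` section followed by one `edgedef>` section, the node name in the first column, and an optional weight in the third edge column. It is not Gephi's importer, and it ignores the declared column types.

```python
import csv
import io


def parse_gdf(text):
    """Parse a minimal GDF document into (nodes, edges, dangling).

    nodes: name -> label; edges: list of (node1, node2, weight);
    dangling: sorted names used in edges but never defined as nodes.
    Quoted fields such as "Herne Hill, London" are handled by the csv module.
    """
    nodes, edges, section = {}, [], None
    # skipinitialspace tolerates rows like "s3,s2, 2.34"
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        first = row[0].strip()
        if first.startswith("nodedef>"):
            section = "nodes"
            continue
        if first.startswith("edgedef>"):
            section = "edges"
            continue
        fields = [field.strip() for field in row]
        if section == "nodes":
            nodes[fields[0]] = fields[1] if len(fields) > 1 else fields[0]
        elif section == "edges":
            weight = float(fields[2]) if len(fields) > 2 else 1.0
            edges.append((fields[0], fields[1], weight))
    # Edges that mention undefined nodes usually point to a generator bug.
    dangling = sorted({n for a, b, _ in edges for n in (a, b) if n not in nodes})
    return nodes, edges, dangling


# The minimal example above, plus one deliberately broken edge (s3 -> s4).
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR
s1,Site number 1
s2,Site number 2
s3,Site number 3
edgedef>node1 VARCHAR,node2 VARCHAR, weight DOUBLE
s1,s2,1.2341
s2,s3,0.453
s3,s2, 2.34
s3,s1, 0.871
s3,s4,1
"""

nodes, edges, dangling = parse_gdf(SAMPLE_GDF)
print(len(nodes), len(edges), dangling)  # 3 5 ['s4']
```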
Thinking I’ll use the “hostname” of each student’s blog as the “site number”, maybe just the first four letters of it, to preserve student anonymity.
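Here is a minimal sketch of that anonymising step. It assumes blog addresses are stored as full URLs. Cutting hostnames to four letters can make two students collide (for example, `janedoe.wordpress.com` and `janet.wordpress.com` both become `jane`). To handle that, the hypothetical `anonymise_all` helper appends a counter to repeats. The function names and the `www.` stripping are illustrative choices, not part of the original script.

```python
from urllib.parse import urlparse


def short_id(blog_url, length=4):
    """First `length` characters of the blog's hostname (hypothetical helper)."""
    host = (urlparse(blog_url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    return host[:length]


def anonymise_all(blog_urls, length=4):
    """Map each blog URL to a short id, adding a counter when ids collide."""
    ids, seen = {}, {}
    for url in blog_urls:
        base = short_id(url, length)
        seen[base] = seen.get(base, 0) + 1
        ids[url] = base if seen[base] == 1 else f"{base}{seen[base]}"
    return ids


# Hypothetical blog URLs.
urls = [
    "http://janedoe.wordpress.com/",
    "http://janet.wordpress.com/",
    "https://www.bobsblog.net",
]
ids = anonymise_all(urls)
print(list(ids.values()))  # ['jane', 'jane2', 'bobs']
```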
Questions for later
- Can I modify the file format to include a date with each “friend” connection?
The idea is that the date will represent when the post was made. Using this I might be able to generate a visualisation of the connections over time.
- Is there value in also mapping the connections to the other links within the students’ posts?
Could provide some idea of what they are linking to and help identify any interesting clusters.
The data
The database I’m maintaining contains:
- All the URLs for the students’ blogs.
- All the posts to the students’ blogs.
I already have a script that extracts links from each of the student blogs; I just need to modify it to count the number of connections between student blogs. That turned out to be a bit easier than I thought it might be.
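The real script and database layout aren't shown here, so this is only a sketch of the counting step. It assumes the extracted links are available as a hypothetical dictionary that maps each student blog URL to the URLs found in its posts. Links are matched on hostname, which assumes each student blog has its own hostname (e.g. its own `wordpress.com` subdomain). Blogs that share a host and differ only by path would need a different key. Links from a blog to itself are counted, which is how the self-linking blogs show up.

```python
from collections import Counter
from urllib.parse import urlparse


def host_of(url):
    """Lower-cased hostname without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_blog_links(links_by_blog):
    """Count links from one student blog to another.

    links_by_blog (hypothetical structure): student blog URL -> list of URLs
    found in its posts. Returns a Counter keyed by (source_host, target_host).
    Links to non-student sites are ignored.
    """
    student_hosts = {host_of(blog) for blog in links_by_blog}
    counts = Counter()
    for blog, links in links_by_blog.items():
        source = host_of(blog)
        for link in links:
            target = host_of(link)
            if target in student_hosts:
                counts[(source, target)] += 1
    return counts


# Hypothetical data for two blogs.
links = {
    "http://alice.wordpress.com": [
        "http://bob.wordpress.com/some-post/",
        "http://example.org/",
    ],
    "http://bob.wordpress.com": [
        "http://alice.wordpress.com/about",
        "http://alice.wordpress.com/another-post",
        "http://bob.wordpress.com/older-post",
    ],
}
counts = count_blog_links(links)
print(counts[("bob.wordpress.com", "alice.wordpress.com")])  # 2
```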
Now a quick script to generate the GDF file. Done.
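The GDF-generating script isn't shown, so what follows is a minimal sketch rather than the script itself. It assumes nodes arrive as a name → label mapping and the connections as a `(source, target) → number of links` mapping. It writes the minimal format from the spec and uses the link count as the edge weight. Every blog is written as a node, connected or not, so the unconnected blogs stay in the file. The `csv` module double-quotes labels that contain commas, as in the `"Herne Hill, London"` example. That Gephi accepts exactly this quoting is an assumption based on that example.

```python
import csv
import io


def write_gdf(nodes, edge_counts):
    """Build a minimal GDF document as a string.

    nodes: node name -> label (every blog, connected or not).
    edge_counts: (source, target) -> number of links, written as the weight.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["nodedef>name VARCHAR", "label VARCHAR"])
    for name, label in sorted(nodes.items()):
        writer.writerow([name, label])
    writer.writerow(["edgedef>node1 VARCHAR", "node2 VARCHAR", "weight DOUBLE"])
    for (source, target), count in sorted(edge_counts.items()):
        writer.writerow([source, target, float(count)])
    return out.getvalue()


# Hypothetical nodes and link counts; s3 has no connections but is kept.
gdf = write_gdf(
    {"s1": "Site number 1", "s2": "Site, two", "s3": "Site number 3"},
    {("s1", "s2"): 2, ("s2", "s1"): 1},
)
print(gdf, end="")
```

Writing the result to disk is then just `open("students.gdf", "w").write(gdf)`. The file name is hypothetical.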
Using Gephi
This is where I’m hoping Tony Hirst’s instructions work with a minimum of hassle.
That’s a bugger. Empty menus on Gephi. It’s not working. Is it wrong of me to suspect Java issues?
Going back to an older version. That worked, but I haven’t installed it into Applications yet. 0.8.2 seemed to have worked previously as well. Get this done and figure it out later.
File opened. We have a network.
Tony Hirst then removes the unconnected nodes. I’d like to leave them in as it will illustrate the point that the students need to connect with others.
The Modularity algorithm doesn’t seem to be working as I’d expect (on my very quick read). It’s finding 200+ communities. Perhaps that is to be expected: every unconnected blog ends up in a community of its own, and most of the remaining blogs are joined by only one or two links. Yes, it works much better once the unconnected nodes are removed.
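Gephi does this filtering inside the tool. Some people may prefer a second GDF file without the unconnected blogs, while keeping the full file to make the point about connecting with others. For that case, here is a minimal pre-filter. It assumes the same hypothetical name → label nodes and `(source, target, weight)` edges used elsewhere. Treating a blog that links only to itself as "connected" is a choice made here, not something Gephi prescribes.

```python
def drop_unconnected(nodes, edges):
    """Keep only nodes that appear in at least one edge.

    nodes: name -> label; edges: (source, target, weight) tuples.
    A self-loop counts as a connection, so self-linking blogs are kept.
    """
    connected = {n for source, target, _ in edges for n in (source, target)}
    return {name: label for name, label in nodes.items() if name in connected}


# Hypothetical data: "d" has no links at all.
nodes = {"a": "A", "b": "B", "c": "C", "d": "D"}
edges = [("a", "b", 1.0), ("c", "c", 1.0)]
kept = drop_unconnected(nodes, edges)
print(sorted(kept))  # ['a', 'b', 'c']
print(len(nodes) - len(kept))  # 1 unconnected blog removed
```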
A bit more playing produces the final result above.
elketeaches
Is this what you do on your weekends? :) very cool
cj13
I’m impressed. Love the learning-in-public style. The best of it is the next set of questions. I’m re-reading Weinberger’s Too Big to Know, and one of his many interesting points is that links “filter forward”. The other random thought: these maps always look neuron-like. So what do they look like over time? An animation of this might be interesting.
cj13
And, purely in your interests – I hope you’ve got a few drafts of papers out of all of this. Quality and significance of the ideas so far above so much of the dross you see in ed tech research/scholarship.
David Jones
Thanks Chris. That’s the aim. But there’s always the threat of more interesting stuff to do and the process of transforming this into something that will get accepted tends to be boring.
cj13
I agree. Long-form stuff can be, but the trick might be to find places that are more amenable to different modes, something blog-like, i.e. a publish-then-get-reviewed kind of thing. The sad thing for me is that I look at some acas and they routinely churn out mindless **** and they get gongs for it. Maybe you can leverage your already decent following into something that makes a bit more noise :)
beebrigitte
Hi David, I am trying my nerdie best to understand how to create the data set in order to do the visualisation. Any chance of posting a sample of your data set or how you created it. Thank you, Brigitte
David Jones
G’day Brigitte, This is where SNAPP used to be good, but it’s no longer supported/working.
With the above, I’m at a slight advantage in that I have access to the database that contained the student blog posts, and have the ability to write programs that generate data in the form required. I used Gephi above, so it wanted data in this form:
- Define the different nodes (different blogs)
- Define the connections between nodes
How you might be able to do this will depend on the source of the data (which LMS – Moodle?) and which software you might use to generate the visualisations. Increasingly, most LMS have some sort of service that might do this.
Does that help at all?
David.