A friend is in the final throes of her PhD and wants to visualise some of her qualitative findings using a network diagram. I’ve skirted around network visualisation without ever really doing anything but wanted to learn more. Following documents and reflects on my explorations and experiments trying to help out. The main focus being on figuring out how to produce a visualisation that meets requirements.
Previous work has led to the use of Gephi. In part because my current institution provides access to Turbo.net which provides Gephi as a container to install. Minimising the the dependency “hell” that can be the Java requirements to run Gephi.
The hardest part of all this was my old head building a conceptual model that bridged the data analysis outputs and requirements of my friend, network analysis/visualisation concepts, and Gephi’s (and other software) capabilities and operations. The biggest help in all this was various freely available tutorials (e.g. this list) developed by Gephi users.
My current understanding of the requirements, includes:
- Data is coming out in spreadsheets that need to be visualised.
- Data contains 19 categories (i.e. nodes) which might be mentioned together (i.e. links)
- The number of times a node is mentioned is available and should be part of the visualisation.
- The number of times two nodes are mentioned together is also available and should be part of the visualisation
- Those categories can also belong to different groups and these groups need to be visualised.
- There might be data from individual interviews and other combinations (i.e. multiple data sets to be visualised).
Getting the data into Gephi
Data from the analysis phase comes in a spreadsheet with the 19 categories in a 2×2 matrix. The cells represent either
- the number of times two different categories were mentioned together, or
- the number of times a category was mentioned in total.
- Nodes; and
Establish the node label, it’s ID and provide a range of additional attributes.
Specify if and how two nodes (source and target) are connected, the type of edge (directed or undirected) and optionally the weight.
- Id – manually assigned.
- Label – from the spreadsheet with the CoI labels combined.
- Mentions – taken from the cell where the row/col matches the node
- Group – whether the node belongs to context; technology-enabled design; or learning and teaching environment.
- Source & Target – the Ids for the nodes the edge connects.
- Weight – how many times nodes (categories) were co-mentioned.
- Type – undirected.
Importing and first look
With the two CSV files created, it’s a straight forward process to import and visualise. The test data gives the following visualisation out of the box. It needs some work.
Visualising node and edge weights
The first visualisation has many limitations, next step is to address the following:
- Node size isn’t being shown.
- Node labels aren’t being shown.
- It’s all black and white.
- Is the layout as good as it can get?
This tutorial provides an in context explanation of how to make these fixes.
Showing node labels is a simple switch.
Scaling node size uses the appearance dialog and allows changes to colour, node size etc based on various factors, including node attributes. In this case, select mentions.
Appearance also allows colouring of nodes based on node attributes. In this case, group will be used.
Random experimentation with layouts reveal the Fruchterman Reingold layout doing a reasonable first pass. All that combined gives the following image.
As illustrated in the previous diagram, colouring nodes based on group membership provides some representation that helps see this relationship. However, relying on colour for a this might be problematic.
Is there a layout/method that co-locates nodes of the same group?
This blog post illustrates something similar by grouping according to organisational membership. Though it appears this is done manually, perhaps with a bit of post Gephi touching up. Time to explore different layouts in more detail.
Start with this tutorial, which begins with the installation of some new layout plugins (and shows some important Gephi interface actions e.g. dragging).
The circular layout generates the following image. Which may be getting close. Would be good if the labels didn’t overlap, but that’s for another day.
It appears that Gephi (perhaps combined with some post-visualisation manual image editing) can be used.
The next major question will be how to automate/manage the process for converting analysis data into visualisations that are then integrated into publications.
“Lantana camara plant NC3” flickr photo by Macleay Grass Man https://flickr.com/photos/macleaygrassman/48735662157 shared under a Creative Commons (BY) license