The external validity problem

Via a very roundabout method I came across an editorial from the journal “Computers in the Schools” (Maddux and Johnson, 2012) titled “External Validity and Research in Information Technology in Education”.

The problem?

The problem is that “very few educators would argue that information technology has succeeded in bringing about a paradigm shift in instruction” (Maddux and Johnson, 2012, p. 249). This links to the observation by Prof Mark Brown in the image below and to my perception of university e-learning.

Is elearning like teenage sex?

The cause?

Maddux and Johnson (2012) argue that while there are many complex reasons for this problem, one of the major contributors is limited external validity, defined as “the extent to which results of a study or a given program development project can be assumed to apply to other people in other places and at other times” (p. 250). They suggest that research – especially research around ICTs in education – tends to occur in situations that are not representative of the “average” setting.

Master teachers

They go on to argue

Specifically, we believe that we continue to design, implement, and test programs and methods that work well when used by master teachers in classrooms and schools where personnel are strongly committed to the success of these programs and methods. Then, practitioners put these programs and methods into place for use by average teachers in settings in which not everyone is highly committed to success and where some individuals or groups are apathetic toward information technology or may even be in opposition to the use of technology in education. (Maddux and Johnson, 2012)

This links to the distinction made by Goodyear (2009) between the “long arc” and “short arc” approaches in terms of nationally funded learning and teaching projects (a close cousin to ICT in education research). Most of these projects focus on the “long arc” and imagine the (master) teacher as someone with the time and insight to pro-actively plan and design their next course offering, as opposed to the more typical “short arc” approach, which is more reactive or just-in-time.

Leaders and managers as master teachers

This assumption of master teachers or the long arc gets taken across into institutional learning and teaching because the leaders and managers (many central L&T and e-learning folk fall into this category as well) of such operations will tend to see themselves as “master teachers”. They will tend to be experienced and see themselves as good teachers, and many of them will be. They will also tend to be very different from their colleagues: what works for them will not work for their colleagues.

The assumption of external validity

Added to this is the belief amongst some of them that there must be external validity: that their teaching model – or that of someone else – is applicable across the board. Subsequently, their role as a leader or manager becomes the rolling out of that model, i.e. they become like the researchers mentioned by Maddux and Johnson (2012)

Those who are charged with delivering services in grant-supported projects are almost always advocates and experts in the use of the kinds of programs they are using. Such individuals have a tendency to work tirelessly toward proving the efficacy of their programs. (pp. 250-251)

The blame the teacher approach

This discrepancy in outlook can result in the “blame the teacher” approach. Rather than value the difference that means their pet approach doesn’t work, most often leaders and managers will blame the teacher. It failed because they didn’t try hard enough. Even Maddux & Johnson (2012) lean toward this mistake when they characterise the “average teachers” above as not being committed to success, apathetic and even in opposition.

The “blame the teacher” approach goes back at least as far as Pressey in the 1920s and his teaching machines.

Built-in assumptions of external validity

The link between “blame the teacher”, external validity and educational technology is extended (or perhaps made worse) by the types of enterprise systems being adopted by universities (e.g. the LMS, the eportfolio) and the type of assumptions universities use when rolling these systems out (e.g. it’s more efficient if everyone uses it – like the principle “Decisions are made to provide maximum benefit to the enterprise as a whole”, number 1 in an Enterprise Architecture policy). i.e. the LMS evaluation process identified this as the best LMS, so it will work well for everyone. It’s the same thinking that underpins consistent website standards.

Which leads to Kaplan’s law of instrument

Earlier this week, when commenting on the Thrun/Udacity/MOOC issue, George Veletsianos mentioned the “Law of Instrument” – “if all you have is a hammer, everything looks like a nail”.

Veletsianos writes

If educational technology companies (and Centers for Teaching and Learning) are eager to improve education, rather than searching for problems to apply their solutions, they should focus on identifying problems and designing solutions to those problems.

If a University has installed Moodle (insert your favourite tool) then it is only efficient and rational if the people employed to support learning and teaching rely heavily on Moodle (or other tool) as the “solution” and then search for problems.

Is external validity a good idea?

While external validity may be appropriate for a certain type of research project, is it an appropriate concern for institutional e-learning? Veletsianos’ suggestion that the focus be on “identifying problems and designing solutions” tends to suggest a move away from external validity and a focus on context-specific requirements.


Maddux, C. D., & Johnson, D. L. (2012). External Validity and Research in Information Technology in Education. Computers in the Schools, 29(3), 249–252. doi:10.1080/07380569.2012.703605

Examining diffusion and sustainability of e-learning strategies through weblog data

The following is a summary and some thoughts on Lam et al (2010). It’s a paper from the same authors and research as an earlier paper I summarised.

The abstract for Lam et al (2010) is

The study focuses on ‘horizontal’ and ‘vertical’ adoption of e-learning strategies at The Chinese University of Hong Kong as revealed through computer log records in the centrally supported learning management systems. Horizontal diffusion refers to whether e-learning has spread to influence the practice of more teachers and students. In vertical diffusion, the authors examined whether or not teachers tend to adopt more varied online learning activities in successive years. The overall findings are that, while adoption of simple strategies is increasing, there is little evidence of horizontal and vertical diffusion of more complex strategies. Indeed, the use of some of the more complex strategies, which may relate to greater potential learning benefits, decreased. Results have led to discussions about new focuses and strategies for our institutional eLearning Service.

My thoughts

The same data is used to drive another examination of the use of the LMS at a Hong Kong University. Data suggests that e-learning usage is fairly typical, with an emphasis on content distribution. More surprising is that usage is dropping, which suggests e-learning is in more trouble. Would be interesting to see how it’s evolved since then.

Using just two years’ worth of data is limiting.

There is less contextual information provided to explain some of these trends than in the prior paper.

The recommendations for e-learning support can’t be supported from the log analysis and seem to rely more on existing conceptions held by the authors. Log analysis couldn’t give you these insights; you would have to use other means, or use log analysis to measure changes after interventions.

The recommendations are all fairly typical change management recommendations, though the “comfort zone” recommendation is a little unusual.


Adoption of innovations is hard. Links to the need for teachers to undergo the conceptual change required to find new ways of working. Draws on Lewin (1952) and then some educational researchers to arrive at a 3 stage conceptual change process

  1. Diagnosing existing conceptual frameworks and revealing them
  2. A period of conflict which creates dissatisfaction with existing conceptions
  3. A reforming/reconstruction phase of the new conceptual framework.

Raises the question of how and when these 3 stages aren’t followed or are corrupted. Or is that simply where the new conceptual framework is the old slightly modified to fit the new ways of working under old concepts. e.g. placing lecture slides and tutorial sheets into the LMS.

Brings in the J curve to explain that “things often get worse before they get better because of the expenses and challenges that occur early on in the innovation cycle”.

Not sure how this works as an introduction, but good to see the idea that things get worse early in the innovation cycle entering into the discussion of the LMS. Will be interesting to see how the authors build on this.

I wonder how this plays in the context of regular upgrades of Moodle every year. Is there sufficient change between versions to lead to a continual J curve? Is broader change within a university setting enough to contribute to this continual J curve?

Does the adoption of a new LMS really class as innovation? Conceptual change perhaps, but innovation?

Adoption of e-learning innovations

Talks about two possible ways of examining adoption of e-learning innovations

  1. Horizontal – see as diffusion of the innovation.

    Important if e-learning will have significant impact. Mentions own research showing that innovative teachers don’t effectively disseminate their practice (Why should they? Do they have to be innovative and disseminators? Also ignores the very personal/contextual nature of innovation. What works for innovators probably won’t work for others. Even other innovators.).

    Mentions reluctance of staff to spend the time required on e-learning and the removal of senior professors from innovative work of more junior staff as barriers.

    Now picks up the idea of others perhaps not likely to pick up innovative practice of others. But mentions UK research about lack of time, confidence, competence and comfort for new skills. What about the inherent difference between people and where they are?

    Brings up one of the authors’ prior research reporting on e-learning developments being behind schedule so that evaluation misses out. Of 26 projects, less than half had evaluations completed as planned.

    Mentions that evaluations need to be specific to courses which limits their usefulness in aggregation of findings to a higher level.

  2. Vertical – linked to the concept of sustainability where functionality is used in subsequent offerings of a course to benefit (benefit may be a strong word) multiple cohorts.

    The costs of moving online can provide benefit through reuse and the smoothing out of procedures. The idea of “threshold obstacles”.

    Sustainability focuses on the reuse of new teaching and learning designs and strategies. Mentions no change in approach given an LMS.

Factors impinging on diffusion and sustainability

Starts with Rogers’ 5 characteristics of an innovation. But again – like so many others – fails to mention the other parts of Rogers’ theory, though rate of adoption gets a mention. Of course, doesn’t mention critiques of Rogers and diffusion.

But argues that the adoption of different e-learning strategies can be explained by difference in the above characteristics.

What this misses, however, is that the innovation characteristics are meant to be personal perceptions. Suggesting that someone from a constructivist background (e.g. a faculty of education instructor) would see great advantage in a discussion forum or the use of blogs and hence should be more likely to adopt that practice.

Mentions the over focus of LMS use as a content distribution mechanism.

Proposes the idea of a “mutual comfort zone” for all stakeholders (teachers, students, technical and pedagogical support staff) as a requirement for an e-learning project to be successful and thus reused. The idea is that this zone is quite small and that explains why you don’t see many complex, successful and sustainable e-learning projects.

Ahh, now goes on to cite one of the authors’ earlier research about contextual factors and a model of drivers that influence the growth of blended learning, and identifies the following as most relevant

  • senior management commitment

    I always wonder how much of this is “compliance”. i.e. if senior management think something is important, is it possible for it to be a failure? And other logical flaws.

  • allocation of time
  • positive cost-benefit decision by teachers of a pay off from investment

LMS logs

Talks about logs and that they don’t reveal everything. Some similarities with similar section in prior paper.

No totally online courses in this work.


  • two academic years 2007/2008 and 2008/2009.
  • Term 1 and 2 data included
  • Course was unit of analysis, if at least one session had an active website
  • Both undergraduate and postgraduate included
  • Excludes classes with fewer than 10 students
  • At least one student had to access the course site during term
  • Same again for other features
  • Only online services on central servers counted
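
The inclusion rules above could be applied to log summaries along the following lines. This is only a minimal sketch in Python: the CourseSite record, its field names and the sample data are assumptions for illustration, not the schema or software the authors used.

```python
from dataclasses import dataclass, field

MIN_CLASS_SIZE = 10  # classes with fewer than 10 students are excluded


@dataclass
class CourseSite:
    """Hypothetical per-course summary derived from LMS logs."""
    course_id: str
    enrolled: int
    on_central_server: bool
    students_accessing: int  # distinct students who opened the site in term
    # Distinct students who used each feature, e.g. {"forum": 3}
    students_using: dict = field(default_factory=dict)


def in_scope(site):
    """Only centrally hosted sites for classes of 10 or more are counted."""
    return site.on_central_server and site.enrolled >= MIN_CLASS_SIZE


def is_active_site(site):
    """A site is active if at least one student accessed it during term."""
    return site.students_accessing >= 1


def active_features(site):
    """A feature is active if at least one student used it."""
    return {name for name, count in site.students_using.items() if count >= 1}


# Made-up sample data, not figures from the paper.
sites = [
    CourseSite("A101", 45, True, 40, {"content": 38, "forum": 0}),
    CourseSite("B202", 8, True, 8, {"content": 8}),
    CourseSite("C303", 30, False, 25, {"quiz": 20}),
    CourseSite("D404", 60, True, 0),
]

counted = [s for s in sites if in_scope(s)]
active = [s for s in counted if is_active_site(s)]
```

Here B202 is dropped for class size, C303 for not being on a central server, and D404 is counted but not active because no student ever accessed it.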



Compared and contrasted active course sites between years

  • 48.8% to 53.3% with active websites
  • Independent t-tests reveal statistically significant increases
  • Active content – 97.3 down to 96.6
  • Active discussion – 20.8 to 18.1 (numbers are hard to read on scanned document, so may be slightly out).
  • Active assignment – 21 to 19
  • Active quiz – 39? to 8



  1. Identify courses run in two consecutive years
  2. If a site in first year, was there a site in 2nd year?
  3. If yes, compare e-learning strategies
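
The three steps above can be sketched in Python as a comparison of two mappings from course code to the set of active features. The function name, the data layout and the reading of “fewer active features” as a simple count are assumptions for illustration; the paper does not describe its implementation.

```python
def compare_years(year1, year2):
    """Compare active features per course across two consecutive years.

    year1 and year2 map a course code to the set of features that were
    active in that year. A course without a site is simply absent.
    """
    both = year1.keys() & year2.keys()
    summary = {
        "site_in_both": len(both),
        "ceased": len(year1.keys() - year2.keys()),
        "started": len(year2.keys() - year1.keys()),
        "fewer_features": 0,
        "more_features": 0,
        "dropped": {},  # feature name -> number of courses that dropped it
    }
    for course in both:
        before, after = year1[course], year2[course]
        if len(after) < len(before):
            summary["fewer_features"] += 1
        elif len(after) > len(before):
            summary["more_features"] += 1
        for feature in before - after:
            summary["dropped"][feature] = summary["dropped"].get(feature, 0) + 1
    return summary


# Made-up sample data, not figures from the paper.
first = {
    "A101": {"content", "forum"},
    "B202": {"content"},
    "C303": {"content", "quiz"},
}
second = {
    "A101": {"content"},
    "B202": {"content", "assignment"},
    "D404": {"content"},
}
result = compare_years(first, second)
```

In this sample A101 drops its forum, B202 adds assignment submission, C303 ceases to have a site and D404 starts one.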


  • 790 courses with a website in both
  • 82 courses ceased using the LMS in 2nd year.
  • 158 courses started in 2nd year.
  • 31.1% of the 790 courses had fewer active features in the second year – suggested teachers had stopped using some strategies.
  • Chi-square tests reveal statistical significance (p ≤ 0.001).
  • Mostly it was discussion forums that were dropped.
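
Some quick arithmetic on the counts above, as a Python sketch. It assumes the three counts all refer to the same set of courses run in both years, which the summary implies but does not state outright.

```python
both_years = 790  # courses with a website in both years
ceased = 82       # had a site in the first year only
started = 158     # had a site in the second year only

sites_year1 = both_years + ceased   # sites among repeated courses, year one
sites_year2 = both_years + started  # sites among repeated courses, year two

# Share of first-year sites that still existed in the second year.
retention = both_years / sites_year1

# 31.1% of the 790 continuing sites had fewer active features.
fewer_features = round(0.311 * both_years)

print(sites_year1, sites_year2, round(retention * 100, 1), fewer_features)
```

On those assumptions roughly nine in ten sites carried over, while about 246 of the continuing sites used fewer features than before.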


Talks about multiple J curves to explain. With content, most are comfortable, but others are “still near the bottom of their J-curves”.

Suggests that e-learning support should focus on the areas where there is difficulty.

With the factors that influence adoption in mind, goes on to suggest

  • Acknowledge the need for appropriate motivation for teachers.

    Online can be both draining (for most) but also rewarding.

  • Support the departmental context.

    A collection of approaches – promotion, technical staff, peer groups, money.

  • Use an evidence-based and pragmatic approach

    If you hit the comfort zone, then features are more likely to be diffused and sustained. And a willingness to evaluate and respond to the evaluation. No real mention made here of the institution perhaps having to evaluate its move to e-learning and change what it’s doing.

  • Support projects that are most likely to succeed

    This should be the focus at the institutional level. Characteristics of such projects are identified as

    • Start with the intention to sustain and diffuse
    • Projects headed by the “right” people
    • In the “right” context (collegial, supportive)
    • In the comfort zone
    • Commitment to evaluation


Lam, P., Lo, J., Yeung, A., & McNaught, C. (2010). Examining diffusion and sustainability of e-learning strategies through weblog data. International Journal of E-Adoption, 2(3), 39–52.

Approaches for literature analysis

One of the on-going research projects we have underway (really just starting up) is an analysis of the learning analytics literature. The following is an ad hoc record of a search into the literature around different approaches to literature analysis. The aim is to further inform the work. Essentially a summary of some readings.

Origins in Information Systems

I’m from the IS discipline originally so I’m aware of some of this type of work there.

Arnott and Pervan (2005) analysed the Decision-Support Systems literature. Their approach was the content analysis of 1000+ papers by reading and applying a data collection protocol. Two authors and a research assistant using the same protocol. Results entered into SPSS.

As the use of SPSS suggests, the “Article Coding Protocol” was a series of questions with numeric answers that characterised an article. Questions covered topics such as

  • Research type including research stage, epistemology, article type.
  • DSS factors.
  • Judgement and decision making factors.
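
A coding protocol of this kind (questions with numeric answers) maps naturally onto a small code book plus one record per article per coder. The Python sketch below is an assumption about how that might look for our own analysis; the question names and answer labels are hypothetical and are not taken from Arnott and Pervan’s protocol.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical code book: each question maps numeric answers to labels.
CODE_BOOK = {
    "article_type": {1: "conceptual", 2: "empirical"},
    "research_stage": {1: "theory building", 2: "theory testing"},
}


@dataclass
class CodedArticle:
    article_id: str
    coder: str
    answers: dict  # question name -> numeric answer


def validate(record):
    """Raise ValueError if an answer is not defined in the code book."""
    for question, answer in record.answers.items():
        if answer not in CODE_BOOK[question]:
            raise ValueError(f"{record.article_id}: bad answer for {question}")


def tally(records, question):
    """Count how often each labelled answer was given to one question."""
    labels = CODE_BOOK[question]
    return Counter(labels[r.answers[question]] for r in records)


# Made-up sample records.
records = [
    CodedArticle("p001", "coder1", {"article_type": 2, "research_stage": 2}),
    CodedArticle("p002", "coder1", {"article_type": 1, "research_stage": 1}),
    CodedArticle("p003", "coder2", {"article_type": 2, "research_stage": 1}),
]
for record in records:
    validate(record)

type_counts = tally(records, "article_type")
```

Keeping the answers numeric mirrors the SPSS-style data entry, while the code book keeps the labels in one place for reporting.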

Arnott and Pervan (2005) use the phrase “literature analysis” for their work, but mention other terms from the IS literature: ‘review and assessment of research’ (Robey, Boudreau and Rose, 2000), ‘literature review and analysis’ (Alavi and Leidner, 2001), ‘survey’ (Malone and Crowston, 1994). But none of those provide any interesting pointers to further literature.

Content analysis

Hsieh & Shannon (2005) look at qualitative analysis, but start with the development of content analysis. Initially used as a quantitative method “with text data coded into explicit categories and then described using statistics” (p. 1278).

They define qualitative content analysis as

Qualitative content analysis goes beyond merely counting words to examining language intensely for the purpose of classifying large amounts of text into an efficient number of categories that represent similar meanings (Weber, 1990). (p. 1278)

And another nice quote “The goal of content analysis is “to provide knowledge and understanding of the phenomenon under study” (Downe-Wamboldt, 1992, p. 314)” leading to their definition

qualitative content analysis is defined as a research method for the subjective interpretation of the content of text data through the systematic classification process of coding and identifying themes or patterns.

They identify three distinct approaches – conventional, directed or summative – which differ on coding schemes, origins of codes, and threats to trustworthiness.

Directed content analysis seems relevant as it draws on existing theory for the initial coding scheme and its goal is “to validate or extend conceptually a theoretical framework or theory”. Identifies limitations and suggests strategies.

Seven classic steps of the analytical process underpinning all qualitative content analysis

  1. Formulating the research questions to be answered;
  2. Selecting the sample to be analysed;
  3. Defining the categories to be applied;
  4. Outlining the coding process and the coder training;
  5. Implementing the coding process;
    “The basic coding process in content analysis is to organize large quantities of text into much fewer content categories” (p. 1285).
  6. Determining trustworthiness;
  7. Analysing the results of the coding process.

Julien (2008, n.p.) defines content analysis as

the intellectual process of categorizing qualitative textual data into clusters of similar entities, or conceptual categories, to identify consistent patterns and relationships between variables or themes.

Apparently a method “independent of theoretical perspective or framework” but originates as a quantitative method. Julien (2008) suggests quantitative helps in answering “what” questions while qualitative content analysis helps in answering “why” questions and analysing perceptions.

Multiple coders a common method to improve trustworthiness. 60% agreement between coders is apparently considered acceptable (Julien, 2008).

Krippendorff (2010) suggests content analysis is “a scientific tool” that “can provide new kinds of understanding of social phenomena or inform decisions or pertinent actions” and that it is teachable.

Krippendorff (2010) also suggests that content analysis uses abduction rather than induction/deduction as used by observation methods. And three criteria for judging results

  1. reliability – can the process be replicated. Human coding is the most unreliable aspect of the process. There are agreement coefficients e.g. Scott’s and Krippendorff’s.
  2. plausibility – of the path taken from texts to results. Apparently a dig at computer coders who think they’ve solved reliability. Can’t hide behind obscure algorithms.
  3. validity – various forms outlined.
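
To make the reliability criterion concrete, here is a minimal Python sketch of two measures for two coders assigning nominal categories: raw percent agreement (the kind of figure behind the 60% threshold mentioned earlier) and Scott’s pi, which corrects that figure for chance agreement using the pooled category proportions. The function names and sample codes are assumptions for illustration; Krippendorff’s alpha is more general and is not shown.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items that both coders placed in the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty code lists")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def scotts_pi(coder_a, coder_b):
    """Scott's pi for two coders: (observed - expected) / (1 - expected)."""
    observed = percent_agreement(coder_a, coder_b)
    # Expected agreement uses category proportions pooled over both coders.
    pooled = Counter(coder_a) + Counter(coder_b)
    total = 2 * len(coder_a)
    expected = sum((count / total) ** 2 for count in pooled.values())
    if expected == 1.0:
        raise ValueError("only one category was used; pi is undefined")
    return (observed - expected) / (1 - expected)


# Made-up codes for ten papers: E = empirical, T = theory, R = review.
coder_a = ["E", "E", "E", "E", "E", "T", "T", "T", "R", "R"]
coder_b = ["E", "E", "E", "T", "T", "T", "T", "E", "R", "R"]

agreement = percent_agreement(coder_a, coder_b)
pi = scotts_pi(coder_a, coder_b)
print(round(agreement, 2), round(pi, 3))
```

In this sample the coders agree on 7 of 10 papers, but once chance agreement is removed the coefficient is noticeably lower than the raw 70%, which is why a raw percentage on its own can flatter the coding process.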


Arnott, D., & Pervan, G. (2005). A critical analysis of decision support systems research. Journal of Information Technology, 20(2), 67–87. doi:10.1057/palgrave.jit.2000035

Hsieh, H.-F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative health research, 15(9), 1277–88. doi:10.1177/1049732305276687

Julien, H. (2008). Content Analysis. In L. M. Given (Ed.), The SAGE Encyclopedia of Qualitative Research Methods (pp. 121–123).

Krippendorff, K. (2010). Content Analysis. In N. J. Salkind (Ed.), Encyclopedia of Research Design (pp. 234–239). Thousand Oaks, CA: Sage Publications.

Evaluations of online learning activities based on LMS logs

The following is a summary and some thoughts on Lam et al (2012). The abstract from the chapter is

Effective record-keeping, and extraction and interpretation of activity logs recorded in learning management systems (LMS), can reveal valuable information to facilitate eLearning design, development and support. In universities with centralized Web-based teaching and learning systems, monitoring the logs can be accomplished because most LMS have inbuilt mechanisms to track and record a certain amount of information about online activities. Starting in 2006, we began to examine the logs of eLearning activities in LMS maintained centrally in our University (The Chinese University of Hong Kong) in order to provide a relatively easy method for the evaluation of the richness of eLearning resources and interactions. In this chapter, we: 1) explain how the system works; 2) use empirical evidence recorded from 2007 to 2010 to show how the data can be analyzed; and 3) discuss how the more detailed understanding of online activities have informed decisions in our University.

It’s a chapter in an IGI book, which means gaining access was not easy.


Well, it is an example of a longitudinal examination of LMS logs from one university – 2007, 2008 and 2009. Details some of the considerations in doing a cross-LMS comparison. Finds some interesting outcomes (e.g. use of LMS functionality in courses dropped significantly as time went by) but without further research it is unclear what the reasons are.

It remains a surprise to me that the two Universities I’m most familiar with don’t have something like this in place already to guide what they are doing in support.


Starts with a definition of an LMS.

Moves onto discussion of log analysis. Including some older references.

Difficulties of doing institutional analysis as web-log tracking tools aimed at the individual teacher (and we know how effective those are). Made even more difficult with the version of WebCT they were using. Leading to a need to investigate the database and develop their own software.

Their focus more institutional, hence not a close monitoring of student activity, but does include both LMS.

Earlier work reported on data interpretation, focus here on automation of interpretation and reporting. Earlier work suggests log data can provide information on

  1. Popularity – yes/no indication per course whether any eLearning activities are recorded in the logs.
  2. Nature of functions/strategies – what facilities are used.
  3. Engagement of teachers and students – how involved folk are in the activities.

Frame these three as steps. 1) popularity indicates whether there is a course website, 2) nature reveals what is there, and 3) engagement shows how it is used/engaged with.
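
The three steps can be sketched as a single pass over log events. This Python sketch assumes each event is a (course, role, feature) tuple and measures engagement as a count of actions per role; both are assumptions for illustration rather than the format or measures used in the chapter.

```python
from collections import defaultdict


def summarise(events, all_courses):
    """Report popularity, nature and engagement for every course."""
    features = defaultdict(set)
    actions = defaultdict(lambda: defaultdict(int))
    for course, role, feature in events:
        features[course].add(feature)
        actions[course][role] += 1

    report = {}
    for course in all_courses:
        report[course] = {
            # 1) popularity: is any eLearning activity recorded at all?
            "popularity": course in features,
            # 2) nature: which functions or strategies were used?
            "nature": sorted(features.get(course, ())),
            # 3) engagement: how many actions did each role perform?
            "engagement": dict(actions.get(course, {})),
        }
    return report


# Made-up sample events.
events = [
    ("A101", "student", "content"),
    ("A101", "student", "forum"),
    ("A101", "teacher", "forum"),
    ("B202", "student", "content"),
]
report = summarise(events, ["A101", "B202", "C303"])
```

Passing in the full course list matters: C303 never appears in the logs, so it can only be reported as having no recorded activity if the list of offered courses comes from somewhere else.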

Claimed that the data to some extent fits the requirements for a naturalistic research paradigm. Recognises the need for comprehensive evaluation studies to consider other forms of evidence and sources.

Lists benefits of the log data approach

  • relative ease of access;
  • non-intrusive
  • Repeated measures enable longitudinal comparisons
  • with automation can enable an institutional system.

Lists limitations of the log data approach

  • monitors only use of the LMS
  • bias on quantity rather than quality
  • the activities are fairly abstract (e.g. discussion forum, content file etc) and not institutional specific (e.g. is the content file a course outline?)
  • The picture from logs is partial

How the system works

Starts with the measures used to refine and standardize the data. Especially due to the two LMS.

  • Not all websites on the LMS are active (made available to students).
  • Other considerations for “active” use of other strategies – had at least one student who
    • accessed any forum.
    • attempted any quiz.
    • submitted an assignment.
    • downloaded a content file.
  • Only included classes with at least 10 students.

Describes the actual system which appears to have been a web-based report/query system. Choose various variables and generate comparisons against popularity, nature and engagement. Levels of analysis include

  • institution-wide overview of eLearning activities
  • differentiate faculty or department level eLearning practices
  • popularity of the different LMS
  • How students and teachers are engaged in various activities

Some sample findings

These samples are provided to be illustrative. Usually at faculty level and being reported across the 3 years (2007-2009)

  • Popularity of LMS

    More courses are using the web. Both LMS percentage increased. Moodle increased quickly from introduction in 2007.

  • Comparison of two functions

    Main use was for content delivery.

    • 90% of active websites contained content (reducing from 96.6% (1432) in 2007 to 91.9% (1891) in 2009).
    • 23.8% (2007), 7.1% (2008), 6.3% (2009) had active quizzes.
    • 21.5%, 20.9% and 14.8% had discussions.
    • 27.6%, 24% and 14.6% used online assignment submission.

    No discussion/explanation about why the reduction in percentage.

  • Engagement in four areas

    90% of students access sites. 30% accessed the LMS more than 20 times during the year. Aside: an incredibly low figure, perhaps the on-campus only factor?

    Only 1995 students out of 15,000 wrote anything in a forum. Most only wrote one or two. Same with assignment submission.

    Different with quizzes. 4742 used quizzes, one-third attempted 11-20 quizzes. 10% made 20 or more attempts.

  • Use of LMS in four faculties

    Faculties used the LMS differently. Experience not the same across disciplines.

    All faculties increased course websites. But overall percentage of courses with websites varied between faculties.

  • Use of three functions in four faculties

    All used content. Different use of other functions was observed. One faculty had 50% of courses with online forums in 2007 dropping to 5% in 2009. Similar observations made.

    Some explanations given arising from personal communications. e.g. new faculty started strong with e-learning but as lots of new staff arrived and teaching loads increased e-learning suffered.

Refining our elearning strategies

Makes claims about the value of this type of analysis

Some more general discussion. e.g. the content focus with some references about the limitations of technologies to change learning and teaching.

Highlights the content centric nature of the LMS. Aside: I wonder how that gels with the Moodle socio-constructivist design philosophy?

Bringing up the modern educational trend – learner centered – teacher as facilitator etc.

findings thus suggest that institutional eLearning support should not merely focus on having a web presence in courses or using the Web for courseware delivery. Attention also needs to be on the diffusion and sustained use of interactive online learning activities

Raises concerns about sustainability of e-learning, given the decrease in feature usage observed. Uncertain about what the cause is. Suggest it might be an LMS problem given observations of teachers using Web 2.0 strategies. Raise some questions about perceived usefulness etc.

Might also be the staff rejecting online. Does mention that the university is face-to-face.

Mentions difficulties with engaging staff and the institution reviewing its support measures.


Lam, P., Lo, J., Lee, J., & McNaught, C. (2012). Evaluations of Online Learning Activities Based on LMS Logs. In R. Babo & A. Azevedo (Eds.), Higher Education Institutions and Learning Management Systems: Adoption and Standardization (pp. 75–93). Hershey, PA: IGI Global.

On the limitations of learning design for improving learning and teaching

A quick followup to some comments/replies on @marksmithers’ post “Because academic freedom does not include the freedom to create a poor learning experience”. In particular, on Mark’s suggestion

I prefer a model (incidentally supported by Clayton Christensen’s thoughts on adapting to disruptive innovation) whereby a semi autonomous organisation with responsibility to provide course development is tasked with providing learning design support (amongst other things). Course development is prioritised and scheduled over the five year life of most programs.

While there are some things to like with this suggestion, I think there are some limitations.

Ignores “maintenance”

In a comment on Mark’s post @KateMFD mentions some concern about learning design. I’d like to expand on it a bit; it’s a hobby horse.

I often quote Glass (2001) on software engineering and the suggestion that 40-80% of the cost of a software system will be on maintenance, i.e. making changes to the software while it’s being used. The trouble is that most of software engineering teaching and almost the entire focus of organisations in purchasing software is on the selection or design of software. They tend to ignore what is likely to be the larger costs involved in keeping the software in use. This causes all sorts of problems.

Increasingly, I believe a similar problem exists with university approaches to learning and teaching. All the L&T support resources (what little there is) are focused on design and bugger all on the actual act of learning and teaching. This has all sorts of negative ramifications. Perhaps the largest of which is that central L&T have almost no idea about what happens during learning and teaching which impacts decision making.

To some extent some of this connects with Goodyear’s (2009) idea of “long arc” and “short arc” approaches. He suggests that the OLT is well set up for the “long arc” where a teacher is imagined as someone with time to think about the redesign of next year’s course, as opposed to imagining the teacher as more time-pressed and somewhat more reactive. The focus on learning design relies on the “long arc” view, which I think is unrealistic in the current Australian Higher Education context.

Related to this is the reframing of design for learning from Goodyear and Dimitriadis (2013) and in particular the suggestion that the idea of design needs to be extended to include

  1. design for configuration – what actors do to customise/modify the design to suit specific needs.
  2. design for orchestration – provide support for the teacher’s work at learn time.
  3. design for reflection – ensure that actionable data is gathered at learn time to inform system evaluation
  4. design for redesign – making it easier to modify.

Which to me means recognising the need to move beyond just design into maintenance.

In particular, this links to the idea of “orchestration” which is getting some traction. Roschelle et al (2013, p. 523) offer this definition

Orchestration is an approach to Technology Enhanced Learning that emphasizes attention to the challenges of classroom use of technology, with a particular focus on supporting teachers’ roles.

Ignores the distributed nature of knowledge

Effective learning and teaching with technology requires the right knowledge. Almost all of the attempts to improve the quality of learning at universities have relied on the idea that this knowledge must reside in someone’s head. For example, we’ll get better learning and teaching by forcing academic staff to have formal qualifications in learning and teaching. Or, in terms of learning design, we’ll get better learning and teaching by requiring academic staff to work with a learning designer who has the knowledge. Of course there are problems with both of these.

Going back to Goodyear (2009, p. 6)

tools and resources that support educational design activity can be carriers of good ideas: research-based evidence and the fruits of successful teaching experience can be embodied in the resources that teachers use at design time

The idea is that the knowledge doesn’t have to live in the heads of people, it can be distributed. After all, this is one of the fundamental principles of connectivism.

Beyond simply having knowledge embedded into the tools we use, the tools, processes and policies of institutional learning and teaching could be re-designed by drawing on some of the principles of connectivism and other social learning theories. For example, make it easier for me to see who at my institution has used LMS feature X and how they used it. Make it easy for staff to approach others who have tried something previously. Dave Snowden’s 7 principles of knowledge management are applicable here.

I have some hypotheses why we don’t see more of this idea, including

  1. Changing the current tools (the LMS) is really hard, both technically and organisationally.
  2. Learning designers typically don’t have the knowledge to see how these changes could be made.
  3. Information technology people typically don’t have the pedagogical knowledge.
  4. Due to the “ignorance of maintenance” and the general pre-dominance of the techno-rational approach to problem solving, none of them realise that these changes should be made.

Ignores the broader higher ed environment

i.e. academics aren’t promoted on the quality of their teaching. It’s on their research that this will happen.

Goodyear (2009, pp 12-13) again

the sustainability of established teaching practices is in doubt because (1) more students, with increasingly diverse needs, are entering higher education; (2) we need to improve the quality of the education we provide; the social, environmental, political and economic challenges of the 21st Century will place extraordinary demands on our graduates; (3) the pace of technological change is accelerating; technology is not a solved problem and it is not going to go away; (4) the demands on university teachers are intensifying; good teachers are burning out; the workforce is ageing fast; it will get harder to recruit and retain good teachers as global competition for talent heats up

Ignores task corruption

The solution to the reluctance of academics to engage in quality learning and teaching is typically standards, policy and requirements. This is related to Mark’s suggestion that

Course development is prioritised and scheduled over the five year life of most programs

This can work, but it can also cause task corruption as Dilbert illustrates.

Ignores the university as a complex system

For me, all of this is summed up with the ignorance of the nature of complex systems. As we argued (Beer et al, 2012), Universities are complex systems and

Complex systems are not causal, patterns are emergent and there exists no single correct solution. Managing complex systems requires an evolutionary approach as small changes can have disproportionate and non-linear consequences

Universities are currently being managed as simple systems. This will never work.


Beer, C., Jones, D., & Clark, D. (2012). Analytics and complexity : Learning and leading for the future. In M. Brown, M. Hartnett, & T. Stewart (Eds.), Future Challenges, Sustainable Futures. Proceedings of ascilite Wellington 2012 (pp. 78–87). Wellington, NZ.

Goodyear, P. (2009). Teaching, technology and educational design: The architecture of productive learning environments (pp. 1–37). Sydney.

Goodyear, P., & Dimitriadis, Y. (2013). In medias res: reframing design for learning. Research in Learning Technology, 21, 1–13.

Exploring current institutional e-learning usage

The following is a summary of an exploration of the recent literature analysing University LMS usage and some thinking about further research.

In summary, thinking there’s some interesting work to be done using analysis of LMS databases to analyse the evolution (or not) of student and staff usage of the LMS over a long period of time. Use a range of different indicators and supplement this with participant observation, surveys, interviews etc.


Way back in 2009 we wrote

Extant literature illustrates that LMS system logs, along with other IT systems data, can be used to inform decision-making. It also suggests that very few institutions are using this data to inform their decisions… When it comes to Learning Management Systems (LMS) within higher education it appears to be a question of everyone having one, but not really knowing what is going on. (Beer et al, 2009, p. 60)

That paper was the first published work from the Indicators Project, which was designed to increase awareness of what is being done with institutional LMS and consequently help address questions such as what can and does influence the quantity and quality of LMS usage by students and staff.

Along with many others we thought that “LMS system logs, along with other IT systems data, can be used to inform decision-making” (Beer et al, 2009, p. 60). Since then I’ve seen very little evidence of institutions making use of this data to inform decision making. At the same time the literature has suggested that it is possible. Macfadyen and Dawson (2012, p. 149)

Learning analytics offers higher education valuable insights that can inform strategic decision-making regarding resource allocation for educational excellence.

but even they encountered the socio-technical difficulties that can get in the way of this.

It’s now four years later, I’m at a different university and there remains little evident use of learning analytics to understand what is going on with the LMS and why. It appears time to revisit this work, see what others have done in the meantime and think about what we can do to contribute further to this research. The following outlines an initial exploration and thinking about how and what we might do.

Findings and ideas

In Beer et al (2009) we described the project as

intended to extend prior work and investigate how insights from this data can be identified, distributed and used to improve learning and teaching by students, support staff, academic staff, management and organizations.

We’re probably still at the identification stage, figuring out what can be derived from the available data. The exploration and testing of interesting indicators.

Little whole institution, longitudinal analysis

Much of the published work I’ve seen has focused on snapshots or short time frames. A semester, or perhaps a whole year. Some isn’t even at the institutional level, but just a handful of courses.

Question: Has anyone seen any research comparing LMS usage over 4/5 years?

Why hasn’t this happened? Perhaps some of these help explain

  1. Over a 5 year period most institutions will have changed LMS. Cross-LMS comparisons are difficult.
  2. No-one’s kept the data. Most IT divisions are looking to save disk space (after all it’s such an expensive resource these days) and have purged the data from a few years ago. Or at least, have it backed up in ways that make it a bit more difficult to get to it.
  3. Over a 5 year period, there’s probably been or about to be an organisational restructure brought on by a change of leadership (or other difficulties) that focuses people’s attentions away from looking at what’s happened in the past.
  4. Looking at the data might highlight the less than stellar success of some strategies.

What path does adoption take and why?

It appears that there is a gradual increase in usage of the LMS over time (“usage” is an interesting term to define). I wonder how well this pattern applies? If it is impacted by various institutional factors? Is the “technology dip” visible?

Mapping the adoption trend over time and exploring factors behind its change could be interesting.

Student usage of features – adoption measure?

Macfadyen and Dawson (2012, p. 157)

A more detailed understanding of what, exactly, is occupying student time in LMS-supported course sites provides a more meaningful representation of how an LMS is being used, and therefore the degree to which LMS use complements effective pedagogical strategies.

Perhaps adoption of LMS features should be measured by the percentage of student time spent using that feature?

Might open up some interesting comparisons between teacher expectations and student practice.

This could be an interesting adaptation of MAV’s heatmaps

Malm and DeFranco (2011) suggest logins divided by enrolment.

Other indicators

  • Average user time online. (Macfadyen and Dawson, 2012)
  • Student usage of LMS tools by minutes of use time per student enrolled (Macfadyen and Dawson, 2012)
  • percentage of content by type. (Macfadyen and Dawson, 2012)
  • distribution of average student time per “learning activity category” (Macfadyen and Dawson, 2012) based on earlier four categories of activities where LMS tools are allocated to activities
  • correlation between student achievement and tool use frequency (Macfadyen and Dawson, 2012)
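A minimal sketch of how two of these indicators (minutes of use per student enrolled, and the share of student time per tool) might be derived from LMS log data. The record layout, the tool names and the enrolment figure are all hypothetical; a real LMS database would need its own extraction step before anything like this could run.

```python
from collections import defaultdict

# Hypothetical log records: (student id, LMS tool, minutes of use).
# These rows are invented purely for illustration.
log = [
    ("s1", "forum", 30),
    ("s1", "content", 90),
    ("s2", "content", 60),
    ("s2", "quiz", 20),
]
enrolled = 4  # hypothetical enrolment, including students who never logged in


def minutes_by_tool(records):
    """Total recorded minutes for each tool."""
    totals = defaultdict(float)
    for _student, tool, minutes in records:
        totals[tool] += minutes
    return dict(totals)


def minutes_per_enrolled_student(records, enrolment):
    """Minutes of use of each tool divided by the number of students enrolled."""
    return {tool: total / enrolment for tool, total in minutes_by_tool(records).items()}


def share_of_student_time(records):
    """Percentage of all recorded student time spent in each tool."""
    totals = minutes_by_tool(records)
    overall = sum(totals.values())
    return {tool: 100 * total / overall for tool, total in totals.items()}


per_student = minutes_per_enrolled_student(log, enrolled)
share = share_of_student_time(log)
```

Dividing by enrolment rather than by active users keeps students who never log in inside the measure, which seems closer to the "per student enrolled" wording above.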

Various bits and pieces found

The planned process and the summary below, goes something like this

  • Explore citations of Beer et al (2009).
  • Explore citations of Malikowski et al

    Some of the inspiration for our work.

  • Explore existing literature I’ve saved.
  • Do a broader search.
  • Stuff that just came up in the above.

Citations of Beer et al

According to Google Scholar, cited by 16 and only 4 or so of those are our own publications.

  • Agudo-Peregrina et al (In Press)
    Defines 3 classifications (by agent, frequency, mode) of interactions and evaluates the relation to academic performance across VLE supported ftf and online learning. Empirical study with data from 6 online and two VLE supported courses. Relationship to performance found only with online courses, not VLE supported.

    Beer et al (2009) mentioned as part of the literature focusing on the relationship between interactions and students’ performance; also mentions the Indicators project. Identifies 6 main areas for future research

    1. moderating factors of interactions in online courses e.g. user experience in the use of the VLE;
    2. capture “PLE” and other non-LMS data
    3. analysis of interactions based on semantic load
    4. inclusion of static/semi-static user data to allow customization
    5. complementary use of data visualisation to help explain and steer the learning process
    6. development of recommender systems
  • Goldsworthy & Rankine (2010)
    Analysis of 72 sites to identify learning design strategies which promote effective collaboration. ASCILITE short paper. Links to work at UWS exploring usage of the LMS. Beer et al (2009) referenced for the three choices: surveys, mine LMS data, manually review sites. They reviewed sites.
  • Hartnett (2011)
    Analysis of 2 cases (different courses) to explore relationships between motivation, online participation and achievement. Used a variety of measures including surveys for motivation, analytics etc. “The mixed results point to complex relationships between motivation, online participation, and achievement that are sensitive to situational influences” (Hartnett, 2011, p. 37)
  • Pascual-Miguel et al (2010) (a closed off paper).
    Exploration of whether interaction is an indicator of performance and whether it differs with mode. Results show partial or no evidence of a link
  • Greenland (2011)
    Log analysis for 10 courses that differ based on learning activity design. The design has substantial impact on levels of student interaction. Highlights some challenges.

Malikowski citations

  • Alhazmi & Rahman (2012)
    Aims to identify why the LMS has failed. Mentions a related journal article to identify 5 failure aspects

    1. Content management – LMS used as content container
    2. Feature utilization – interactive features left unused
    3. Teaching and learning methods – one way delivery of information, passive learner
    4. Learners’ engagement – low level
    5. Assessment management – inflexible, difficult to use and no aligning between assessment and ILOs
  • Lonn et al (2007)
    Relationship between course ratings and LMS use. Found students do not rate courses more highly when instructors use the LMS, but shows students value the LMS for different reasons. Combined survey data with analysis of course sites.
  • Lust et al (2013)
    Uses CMS to mean Content Management System but refers to Blackboard/WebCT as examples. Looks at how students regulate their tool use throughout the course by considering the moment tools are used – a temporal dimension missing from earlier studies. “More insight into students’ tool-use is particularly important from an instructional design perspective since research has repeatedly revealed that a learning environment’s effectiveness depends heavily on students’ adaptive tool-use.” 179 students. Only a minority of students used tools in line with the course requirements.

    Draws on 3 phases of learning – novices/disconnected knowledge; organised into meaningful structures; structures are highly integrated and function in an autonomous way. Done in a single course.

  • Naveh et al (2012)
    Through surveys and interviews proposes 5 critical success factors for increasing student satisfaction with the LMS: content completeness; content currency, easy to navigate, easy to access, and course staff responsiveness.
    Interestingly draws on institutional theory and the idea of environmental legitimacy outweighing efficiency.
    Developed survey 8000+ (13%) responses. Semi-structured interview of students of top and bottom ranking courses.

Existing literature

  • Romero et al (2013)
    “This paper compares different data mining techniques for classifying students (predicting final marks obtained in the course) based on student usage data in a Moodle course” (p. 136)
  • Malm and DeFranco (2011)
    “This article describes a student-centered measure of LMS utilization, average number of student logins per student, as a primary tool for policymakers” and illustrates how it can be used in several ways.

    “…most commonly used adoption metrics are faculty-focused and binary in nature” (p. 405) Binary as used or not.

    Suggesting $\mathrm{ALPS}_i = \text{Total student logins}_i / \text{Total enrolled students}_i$ as the solution (where $i$ is the class section). The advantages are meant to be

    • student focused;
    • based on easily available system data and simple to calculate
    • simple measure of intensity of use that can be useful for analysing and discussing the role of the LMS on campus.
    • section based.

    Used the figure in various ways, including intensity of site usage based on age of faculty member (digital natives are apparently under 35). This was confirmed by a t-test.
    But did not find a change over time.

  • A couple of papers by Lam and McNaught (including one referenced in Malm & DeFranco above) look very interesting, but sadly IGI stuff is inaccessible. Some of it is summarised here
    “overall findings are that, while adoption of simple strategies (information-based) is increasing, there is little evidence of horizontal and vertical diffusion of the more complex strategies that engage students in interaction.” also found in terms of four elearning functions (provision of content, online discussion, assignment submission and online quiz) found a “slight decline in the use of diverse online strategies”. “Use of some more complex strategies actually decreased”.
  • University of Kentucky EAD results from Malm and DeFranco (2011) a concerted effort to promote adoption of the LMS.
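A small sketch of the ALPS measure from Malm and DeFranco (2011) described above: total student logins for a class section divided by total enrolled students for that section. The section names and numbers are hypothetical, and the function name `alps` is only for illustration.

```python
def alps(total_student_logins, total_enrolled_students):
    """Average logins per student (ALPS) for one class section."""
    if total_enrolled_students <= 0:
        raise ValueError("a section needs at least one enrolled student")
    return total_student_logins / total_enrolled_students


# Hypothetical sections: name -> (total student logins, total enrolled students)
sections = {"A": (1200, 40), "B": (300, 25), "C": (0, 10)}

alps_by_section = {
    name: alps(logins, enrolled) for name, (logins, enrolled) in sections.items()
}
```

Unlike a binary "used or not" metric, section C (a site nobody logs into) shows up as 0 rather than as an adopted course site.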

Broader search

Stuff that came up

Classifications of LMS feature usage

  • Malikowski et al (2007).
  • Dawson et al (2008).
  • Macfadyen & Dawson (2012)


Agudo-Peregrina, Á. F., Iglesias-Pradas, S., Conde-González, M. Á., & Hernández-García, Á. (2013). Can we predict success from log data in VLEs? Classification of interactions for learning analytics and their relation with performance in VLE-supported F2F and online learning. Computers in Human Behavior. doi:10.1016/j.chb.2013.05.031

Alhazmi, A. K., & Rahman, A. A. (2012). Why LMS failed to support student learning in higher education institutions. 2012 IEEE Symposium on E-Learning, E-Management and E-Services, 1–5. doi:10.1109/IS3e.2012.6414943

Beer, C., Jones, D., & Clark, K. (2009). The indicators project identifying effective learning, adoption, activity, grades and external factors. In Same places, different spaces. Proceedings ascilite Auckland 2009 (pp. 60–70). Auckland, New Zealand.

Greenland, S. (2011). Using log data to investigate the impact of (a) synchronous learning tools on LMS interaction. In G. Williams, P. Statham, N. Brown, & B. Cleland (Eds.), Changing Demands, Changing Directions. Proceedings ascilite Hobart 2011 (pp. 469–474). Hobart, Australia.

Goldsworthy, K., & Rankine, L. (2010). Learning design strategies for online collaboration : An LMS analysis. In C. H. Steel, M. J. Keppell, G. P, & H. S (Eds.), Curriculum, technology and transformation for an unknown future. Proceedings of ASCILITE Sydney 2010 (pp. 382–386). Sydney.

Hartnett, M. (2011). Relationships Between Online Motivation , Participation , and Achievement : More Complex than You Might Think. Journal of Open, Flexible and Distance Learning, 16(1), 28–41.

Lonn, S., Teasley, S., & Hemphill, L. (2007). What Happens to the Scores? The Effects of Learning Management Systems Use on Students’ Course Evaluations. In Annual Meeting of the American Educational Research Association (pp. 1–15). Chicago.

Lust, G., Elen, J., & Clarebout, G. (2013). Regulation of tool-use within a blended course: Student differences and performance effects. Computers & Education, 60(1), 385–395. doi:10.1016/j.compedu.2012.09.001

Malikowski, S. (2010). A Three Year Analysis of CMS Use in Resident University Courses. Journal of Educational Technology Systems, 39(1), 65–85.

Macfadyen, L., & Dawson, S. (2012). Numbers Are Not Enough. Why e-Learning Analytics Failed to Inform an Institutional Strategic Plan. Educational Technology & Society, 15(3), 149–163.

Naveh, G., Tubin, D., & Pliskin, N. (2012). Student satisfaction with learning management systems: a lens of critical success factors. Technology, Pedagogy and Education, 21(3), 337–350. doi:10.1080/1475939X.2012.720413

Pascual-Miguel, F., Chaparro-Peláez, J., Hernández-García, Á., & Iglesias-Pradas, S. (2010). A Comparative Study on the Influence between Interaction and Performance in Postgraduate In-Class and Distance Learning Courses Based on the Analysis of LMS Logs. In M. D. Lytras, P. Ordonex De Pablos, D. Avison, J. Sipior, & Q. Jin (Eds.), Technology Enhanced Learning. Quality of Teaching and Educational Reform (pp. 308–315). Springer.

Romero, C., Espejo, P. G., Zafra, A., Romero, J. R., & Ventura, S. (2013). Web usage mining for predicting final marks of students that use Moodle courses. Computer Applications in Engineering Education, 21(1), 135–146. doi:10.1002/cae.20456

BIM for Moodle 2.5

Earlier this week @sthcrft asked

Talk about good timing. My shiny new Mac laptop arrived the same day and I’d been waiting on its arrival to explore whether or not BIM was Moodle “2.5ish happy”. It turns out that there are a few tweaks required and some improvements made possible. The following records those tweaks.

Current status

BIM seems to be working on Moodle 2.5.

I have made a minor change so that there is now a branch of BIM specific to Moodle 2.5. Will probably become the master branch in coming days.

Tested the changes with my current course’s use of BIM – about 100 students – but have yet to add this to the Moodle plugin database.

Crashing on tabs

It was looking quite good for BIM on Moodle 2.5. Installed without a problem and appeared to be basically working. Some of the interface tweaks helped the unmodified BIM look a bit nicer.

But then I tried to “Find a student”. At which stage it appears to crash/stall/hang. It sits there never completing (or at least not for a very long time).

A bit of exploration of what’s happening suggests that the problem is with print_tab which appears to be deprecated from Moodle 2.5 onwards. A quick translate to the new alternative still left the same problem. The tabs work for all of the pages, but not on the submission of “Find Student”.

And back to this on the next day.

After a lot of wasted time – you idiot – I haven’t set up the http proxy on my server and that’s causing the delay. And again, you idiot.

Other tests

Tests as other users all seemed to work fine.

Layout issues

Some of the more “busy” pages for the coordinator (some overlap with the marker) don’t display very well. Never have really, but the current default theme emphasises those problems. Let’s change to another theme and see.

  • The text editor for comments on MarkPost overlaps a bit

These are minor issues and after a quick look, can’t see any quick way to solve it beyond a broader re-working of the interface.

Nested tabs

The move to tab tree apparently gives scope for nested tabs, which could solve one of the (many) uglies in BIM, i.e. the coordinator’s ability under “Your students” to view details and mark posts. Implementing these as nested tabs could be useful. An exploration.

That seems to work surprisingly easily. Now to remove the old kludge.

Big data in education – part 2

And now onto Week 2 of the Coursera MOOC “Big Data in Education”. Focusing on the evaluation of models – is it any good?

Detector confidence

Sadly, the audio still has the first week’s problem with buzzing.

Classification – predicting a categorical label.

Value of knowing the certainty of a model’s prediction – confidence matters.

Uses of detector confidence

  • Gradated intervention – cost/benefit can be used to judge (assuming you know things like – how much learning in a minute)
  • Discovery with models analyses

Not all classifiers provide confidence. Some provide pseudo-confidence. Some straight forward.

The confidence provided is based on the initial data.

Diagnostic Metrics

Metrics for classifiers.

  • Accuracy (aka agreement in inter-rater reliability)

    Not a good measure. e.g. not unusual to say 92% of students pass Kindergarten – if the detector says PASS always will get accuracy of 92%.

  • Kappa

    Percentage of progress from expected agreement to perfect.

    Interpreting Kappa is not easy. A negative value of Kappa suggests your model is worse than chance. Seen commonly with cross-validation. Your model is junk.

    Between 0 and 1 harder to judge. Typically 0.3-0.5 is considered good enough to call the model better than chance and publishable

    Some ed journals want 0.9

    Why no standard? 0.8 is sometimes used as a magic number. Kappa is scaled by the proportion of each category – the data set influences outcomes.

    Comparing Kappa values between data sets is not great. If the proportions of each category are similar across the data sets, it can be okay, informally.
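A minimal sketch of the Kindergarten example above, assuming the Kappa in the lecture is Cohen’s Kappa. The 92/8 split of labels is invented to match the 92% pass rate; the function names are only for illustration. A detector that always says PASS scores 92% accuracy, yet its Kappa is 0 because its agreement is exactly what chance would give.

```python
def accuracy(truth, predicted):
    """Proportion of cases where the detector agrees with the data."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def kappa(truth, predicted):
    """(agreement - expected agreement) / (1 - expected agreement)."""
    n = len(truth)
    agreement = accuracy(truth, predicted)
    labels = set(truth) | set(predicted)
    # Expected agreement: for each label, the proportion of that label in the
    # data multiplied by its proportion in the predictions, summed over labels.
    expected = sum((truth.count(l) / n) * (predicted.count(l) / n) for l in labels)
    return (agreement - expected) / (1 - expected)


# Invented data: 92 of 100 students pass.
truth = ["PASS"] * 92 + ["FAIL"] * 8
always_pass = ["PASS"] * 100

acc = accuracy(truth, always_pass)  # 0.92
k = kappa(truth, always_pass)       # 0.0, no better than chance
```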

And now a quiz to calculate Kappa. I didn’t really pay close attention to the formula – it made sense. So, let’s jump to the lecture PDF – too painful to do it in the video. So Kappa is

Update: After struggling through the following I’m not confident that the slides give a good grounding to do this calculation. I got the Kappa calculation correct on the final week quiz by working through the example on the Wikipedia page for Cohen’s Kappa.

$$\kappa = \frac{\text{Agreement} - \text{Expected Agreement}}{1 - \text{Expected Agreement}}$$

| | Detector – Insult during collaboration | Detector – No insult |
|---|---|---|
| Data – Insult | 16 | 7 |
| Data – No insult | 8 | 19 |

Agreement = 16 + 19 = 35%

Data

  • Expected frequency for insult = 23
  • Expected frequency for no insult = 27

Detector

  • Expected frequency for insult = 24
  • Expected frequency for no insult = 26

Expected no insult agreement = 0.27 * 0.26 = 0.0702
Expected insult agreement = 0.23 * 0.24 = 0.0552

Expected agreement = 0.0702 + 0.0552 = 0.1254

$$\kappa = \frac{\text{Agreement} - \text{Expected Agreement}}{1 - \text{Expected Agreement}} = \frac{0.35 - 0.1254}{1 - 0.1254}$$

0.2568… which is not the answer, but I guessed the closest option (0.398) and got it right. Have checked but can’t see what’s wrong, other than the percentage calculation: the working above treats the counts as if they were percentages, yet the table only holds 50 observations (35 of 50 is 70%, not 35%). This is something that isn’t clear or apparent in the lecture.

Oh joy, another one. Let’s try the percentage

| | Detector – suspension | Detector – no suspension |
|---|---|---|
| Data – suspension | 1 | 2 |
| Data – no suspension | 4 | 141 |

n = 1 + 2 + 4 + 141 = 148
Agreement = (1 + 141) * (100/148) = 95.95%

Data

  • Expected frequency for suspension = 3 = 2.02%
  • Expected frequency for no suspension = 145 = 97.97%

Detector

  • Expected frequency for suspension = 5 = 3.37%
  • Expected frequency for no suspension = 143 = 96.62%

Expected no suspension agreement = 0.9797 * 0.9662 = 0.947
Expected suspension agreement = 0.0202 * 0.0337 = 0.00068074

Expected agreement = 0.947 + 0.00068074 = 0.94768074

$$\kappa = \frac{\text{Agreement} - \text{Expected Agreement}}{1 - \text{Expected Agreement}} = \frac{0.9595 - 0.94768074}{1 - 0.94768074}$$


0.226 … again, not exactly one of the options, but the closest one gets a correct answer. Could be a rounding error, small enough difference.

So well done, I have no confidence at all that I know how to calculate Kappa. Good thing RapidMiner (and I assume other tools) will do it for me!!!
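A minimal sketch of the calculation for the two tables above, assuming the lecture’s Kappa is Cohen’s Kappa as worked through on the Wikipedia page. The function name and argument order are only for illustration. The key step is dividing every count by n before multiplying, so the row and column totals become proportions rather than raw counts.

```python
def kappa_from_counts(both_yes, data_yes_det_no, data_no_det_yes, both_no):
    """Cohen's Kappa from the four cells of a 2 by 2 data/detector table."""
    n = both_yes + data_yes_det_no + data_no_det_yes + both_no
    agreement = (both_yes + both_no) / n
    data_yes = (both_yes + data_yes_det_no) / n      # row total as a proportion
    detector_yes = (both_yes + data_no_det_yes) / n  # column total as a proportion
    expected = data_yes * detector_yes + (1 - data_yes) * (1 - detector_yes)
    return (agreement - expected) / (1 - expected)


# Insult table: n = 50, agreement = 35/50 = 0.70
insult = kappa_from_counts(16, 7, 8, 19)

# Suspension table: n = 148
suspension = kappa_from_counts(1, 2, 4, 141)

print(round(insult, 3))      # 0.398
print(round(suspension, 4))  # 0.2305
```

With the counts converted to proportions, the insult table comes out at about 0.398, the quiz option that was marked correct. The suspension table comes out at about 0.2305 with full precision, so the 0.226 above does look like the effect of rounding the intermediate values.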

More metrics for classifiers

  • Receiver-Operating Characteristics (ROC) Curve
    Predicting something which has two values (Yes/No etc). The prediction model outputs a probability – how good is the prediction?
    Have a threshold – any prediction above it is taken as 1, anything under as 0. This is then compared against the truth. Changing the threshold changes which of four possibilities a case falls into (true positive, false positive, true negative, false negative).
    X axis = percent false positives versus true negatives – false positives to the right
    Y axis = percent true positives versus false negatives – true positives up
    Want the curve to be above chance – the diagonal.
  • A′ (A prime) – close to ROC
    Probability that if a model is given an example from each category, it will accurately identify which is which. Mathematically equivalent to the Wilcoxon statistic, which enables computation of statistical tests. There are Z tests for comparisons to other sets and to chance.
    There is no good way to compute A prime for 3 or more categories.
    A prime assumes independence, so can’t use it directly for multiple students. Need to compute A prime for each student and integrate across students.
    A prime approximates the area under the ROC curve (AUC).
    Implementations of AUC are buggy in all major statistical packages.
    A prime versus Kappa – A prime is more difficult to compute, only works for two categories, is invariant across data sets and makes it easy to compute statistical tests.
  • precision – when the model says something is true, how often is it right? TP / (TP + FP)
  • recall – of cases that are 1, what percentage of these are captured? TP / (TP + FN)
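A minimal sketch of the threshold, precision, recall and A prime ideas above. The data is invented, and treating a score exactly equal to the threshold as a 1 is an assumption (the notes only say "above" and "under"). A prime is computed directly from its definition: take every (positive, negative) pair and count how often the positive example gets the higher score, with ties counting as half.

```python
def confusion_counts(truth, scores, threshold=0.5):
    """Return (TP, FP, TN, FN) after turning scores into 0/1 predictions."""
    tp = fp = tn = fn = 0
    for actual, score in zip(truth, scores):
        predicted = 1 if score >= threshold else 0  # assumption: ties count as 1
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def a_prime(truth, scores):
    """Probability that a positive example is scored above a negative one."""
    positives = [s for t, s in zip(truth, scores) if t == 1]
    negatives = [s for t, s in zip(truth, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = the behaviour really happened, scores = model confidence.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

tp, fp, tn, fn = confusion_counts(truth, scores, threshold=0.5)
precision = tp / (tp + fp)  # 3 / 4
recall = tp / (tp + fn)     # 3 / 4
a = a_prime(truth, scores)  # 14 of 16 pairs ordered correctly
```

Moving `threshold` up or down changes the four counts (and so precision and recall), while `a_prime` does not depend on any threshold.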

Metrics for regressors

  • Linear (Pearson’s) correlation
    Ranges from 1 through 0 to -1.
    What’s good depends on the field – in physics 0.8 is weak, in education 0.3 is good. Correlation is vulnerable to outliers.
  • Mean Absolute Deviation (MAD) – average of the absolute value of (actual minus predicted), i.e. the average amount by which predictions deviate from actual values.
  • Root mean squared error (RMSE) – a close cousin based on the squared difference. Can be interpreted the same way (mostly), but large deviations are penalised more.

Low RMSE/MAD is good – high correlation is good
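A minimal sketch of the three regressor metrics above using only the standard library. The actual and predicted values are invented; the point is just to show that RMSE comes out larger than MAD when one prediction is well off.

```python
import math


def pearson(actual, predicted):
    """Linear (Pearson's) correlation between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def mad(actual, predicted):
    """Mean absolute deviation: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Invented values; the last prediction is off by a full point.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 5.0]

r = pearson(actual, predicted)
mad_value = mad(actual, predicted)    # 0.5
rmse_value = rmse(actual, predicted)  # about 0.61, pulled up by the large miss
```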

Information criteria – Bayesian Information Criterion (BIC) – trade-off between goodness of fit and flexibility of fit.

BIC prime

  • values over 0 – worse than expected given number of variables
  • values under 0 – better than expected given number of variables
  • Can be used to understand significance of difference between models

AIC – alternative to BIC – slightly different trade-off between goodness of fit and flexibility of fit.

No single best method. Understand the data, use multiple metrics.

Cross validation and over-fitting

Over-fitting – fit to the noise as well as the signal. Can’t get rid of it.

Reducing over-fitting

  • Use simple models – fewer variables (BIC, AIC, Occam’s Razor); less complex functions (minimum description length)

Questions are: how bad and what do we fit.

Assess generalizability

Getting into training set and test set. But uses data unevenly.

Cross validation

  • split data points into N equal-size groups
  • train on all groups but one, test on last group
  • For each possible combination
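The three steps above can be sketched as a small generator. This is a flat split (each point has an equal chance of landing in each fold); the function name, the fixed seed and the toy data are assumptions for illustration, and group sizes differ by at most one when the data doesn’t divide evenly.

```python
import random


def k_fold_splits(data, k, seed=0):
    """Yield (train, test) pairs, holding out each of the k groups once."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)       # flat: random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
    for held_out in range(k):
        test = [data[i] for i in folds[held_out]]
        train = [
            data[i]
            for j, fold in enumerate(folds)
            if j != held_out
            for i in fold
        ]
        yield train, test


# Toy data: ten data points, five folds.
splits = list(k_fold_splits(list(range(10)), k=5))
```

A model would be trained on each `train` list and evaluated on the matching `test` list, with the metric averaged over the folds.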

How many groups

  • K-fold
    Split into K groups; quicker, some theoreticians prefer
  • leave-one-out
    Every data point is a fold; more stable; avoids issue of how to select folds (stratification issues).

Cross-validation variants

  • Flat cross-validation – each point has equal chance of being in each fold
  • stratified cross-validation – biases fold selection so that some variable is equally represented – e.g. the variable you’re trying to predict.
  • Student-level cross validation – ensure no student’s data is represented in two folds; allows testing of model generalizability to new students. As opposed to testing generalizability to new data from the same students.

    This is the minimum cross-validation needed for the EDM conference. Papers that don’t do it usually get rejected. Okay to explicitly choose something else and discuss the choice.
    Weka doesn’t support this. Batch X-Validation in RapidMiner supports this.
    Can apply this to other levels – lesson, school, demographic (new)

Where do you want to be able to use your model? Cross-validate at that level.
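A minimal sketch of student-level fold assignment, where the fold is chosen per student rather than per data point. The row layout (student id first), the names and the round-robin assignment are assumptions for illustration; this is not how RapidMiner’s Batch X-Validation is implemented, just the idea behind it. The same function would work at other levels by putting the lesson or school id first.

```python
def student_level_folds(rows, k):
    """Split rows into k folds so that each student appears in exactly one fold."""
    students = sorted({row[0] for row in rows})
    # Assign whole students to folds in round-robin order.
    fold_of = {student: i % k for i, student in enumerate(students)}
    folds = [[] for _ in range(k)]
    for row in rows:
        folds[fold_of[row[0]]].append(row)
    return folds


# Invented rows: (student id, observation number)
rows = [
    ("amy", 1), ("amy", 2),
    ("bob", 1),
    ("cat", 1), ("cat", 2),
    ("dan", 1),
]

folds = student_level_folds(rows, k=2)
```

Testing on a held-out fold then measures how the model does on students it has never seen, which is the level the model would usually be used at.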

Types of validity

Generalizability – does your model remain predictive with a new data set

Knowing the context the model will be used in drives what kind of generalisation you should study.

Ecological validity – do findings apply to real-life situations outside research (lab) settings? Could also be that a model works in schools that are easy to do research in (e.g. middle class suburban schools) but fails in other types of schools.

Construct validity – does your model actually measure what it was intended to measure – e.g. does your model fit the training data. But is the training data correct? e.g. which students will end up at another school based on disciplinary records – but “other school” also includes students moved to a special school for other reasons.

Predictive validity

Substantive validity – do your results matter? Are you modelling a construct that matters? If you model X, what impacts will it drive?

e.g. boredom correlates with many important factors (good), but visual/verbal preferences for learning materials don’t predict learning better.

Content validity – does the test cover the full domain. A model of gaming that only captures one approach (systematic guessing) but not another (hint abuse) has lower content validity

Conclusion validity – are your conclusions justified

Many dimensions, all must be addressed.

That’s the lectures for week 2 – is there a quiz? (Not really looking forward to that).

The quiz

Applying a set of metrics. Our choice of tool. Mmm. One question appears to require compiled code for A prime – but told we can ignore that question.

Pearson correlation straight off the bat. Wonder if RapidMiner helps with this. Would appear so – operator called Correlation Matrix. Check the result with Excel. In business.

RMSE – back to the PDF to get the formula and into Excel (with a side visit to Google and the web)

MAD – done
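
For the record, a rough Python sketch of the three metrics so far. The formulas are the standard textbook ones, not copied from the course PDF, and I’m assuming MAD here means the mean absolute deviation between predicted and actual values. The numbers are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, actual):
    """Root mean squared error: big misses are penalised more heavily."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mad(predicted, actual):
    """Mean absolute deviation (assumed meaning of MAD in the quiz)."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Made-up values: three perfect predictions and one that is off by 2.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(pearson(predicted, actual), rmse(predicted, actual), mad(predicted, actual))
```

The single miss of 2 gives an RMSE of 1.0 but a MAD of only 0.5, which is the practical difference between the two.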

Now onto accuracy of a model – apparently easy to do in Excel. But I’ve probably exhausted my knowledge of Excel. And it appears from the forums that I am not alone. The forums also suggest there are some folk having difficulties with the formula – but I don’t think that’s my problem.

This does appear to be a problem with this MOOC. An assumption of a fair bit of outside knowledge not covered in the lecture to answer the quiz.

I’m switching to Perl. Done, I think. Some of these questions appear to be a stretch from what I remember from the lecture.

And so Cohen’s Kappa – is this the same as the Kappa that is mentioned in the lectures? It doesn’t appear “Cohen” gets a mention in the lecture. Just little things like this which can cause uncertainty.

Need to construct a detector/data 2 by 2 in Perl. That’s easy. Getting the calculation of Kappa to align with what they expect in the quiz system (no MCQ this time). Mm, try that.

And now precision. Easy enough to calculate (I think) – but am unsure that I’m getting to know the value of using the measures in a real bit of research.

Leaving out question 11 – I got 6 out of 10 for this one. Not great by any stretch of the imagination. Where did I go wrong? Looks like everything I did in Perl is wrong. The questions were

  • Cohen’s Kappa – given problems earlier with Kappa, I wasn’t expecting this to be right. But I included 3 decimal places, it wanted 2!!!
  • Precision – wrong – looks like this was the recall
    Looks like the problem here is a silly mistake around definition of true positive etc.
  • Recall – this was wrong
  • Question about applicability of detector based on precision and recall. This I just guessed. Too late and the leap too large.
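
To pin down the definitions I tripped over, here is a small sketch from a detector/data 2 by 2. It’s in Python rather than the Perl I actually used, the counts are invented, and the formulas are the standard ones rather than anything taken from the quiz.

```python
def precision(tp, fp):
    """Of everything the detector flagged, how much was really there?"""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything that was really there, how much did the detector flag?"""
    return tp / (tp + fn)


def cohens_kappa(tp, fp, fn, tn):
    """Agreement between detector and data, corrected for chance agreement."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: both say "yes" by chance plus both say "no" by chance.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented counts. Rows are the detector, columns are the data:
#                 data: yes   data: no
# detector: yes     tp = 20    fp = 10
# detector: no      fn = 5     tn = 65
tp, fp, fn, tn = 20, 10, 5, 65
print(precision(tp, fp))  # 20 / 30
print(recall(tp, fn))     # 20 / 25
print(cohens_kappa(tp, fp, fn, tn))
```

Swapping fp and fn is all it takes to report recall as precision, which looks a lot like the mistake above.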

Try again. 8 out of 10/11. Precision is still wrong. Cohen’s Kappa is correct after fixing the rounding and the judgement about application is wrong as well – but given that precision is wrong, that’s not surprising.

Don’t have the energy to solve this tonight.

Big data in education – part 1

Yet another MOOC to start. This time “Big data in education” by Ryan Baker – one of the names that springs up often in the Educational Data Mining field. A Coursera MOOC.

Week 1

Oh dear, the audio from the video is not great. Wonder how much of that is my local set up.

Setting up the importance of EDM/LA with a quote from the editor of the Journal of Educational Psychology that EDM/LA is

escalating the speed of research on many problems in education

Source of methods is data mining, machine learning, psychometrics and trad statistics.

“Educational data is big, but it’s not google-big”

Neural nets not used that much in this field because of context and “smallness” of data.

How big is big? Big data in education is larger than classical education research. Big enough for differences in r² of 0.0019 to come up routinely as statistically significant.
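
A back-of-the-envelope sketch of why size matters here. This is my own simplification, not something from the lecture: it treats 0.0019 as the r² of a single correlation (the lecture talks about differences in r²) and uses the usual test statistic t = r·√(n − 2)/√(1 − r²) with 1.96 as an approximate two-sided 5% cut-off.

```python
import math


def smallest_significant_n(r_squared, critical=1.96):
    """Smallest sample size at which a correlation with this r^2 reaches the cut-off.

    Solves r * sqrt(n - 2) / sqrt(1 - r^2) >= critical for n.
    The 1.96 cut-off is a normal approximation, which is fine for large n.
    """
    return math.ceil(critical ** 2 * (1 - r_squared) / r_squared + 2)


n = smallest_significant_n(0.0019)
print(n)  # a little over two thousand rows
```

A couple of thousand data points is huge for a classroom study, but routine for log data from an online learning system.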

Oh, the prev/next buttons on the video aren’t to go between slides in the slideshare, but to videos.

Key types of methods

  • Prediction
    Develop a model that infers a single aspect (predicted variable) from a combination of other aspects of the data (predictor variables).
  • Structure discovery
    Find structure and patterns “naturally” in the data.
  • Relationship mining
    Discover relationships between variables.
  • Discovery with models
    Take a model and use it as a component of another analysis.

Some coverage of why EDM has been evolving recently.

RapidMiner 5.3 is the tool for this course.


Regressor – a model that predicts a number.

Training label – data set where you know the answer.

Each label has a set of features/other variables which are used to predict the label

Linear regression – variables of different scales requiring transformation. Different types of transformation (e.g. Unitization)

Regression trees


Classifier – the label is categorical, not a number.

Specific algorithms work better for specific domains. Useful algorithms in EDM:

  • Step regression;
    Used for binary classification. Fit a linear regression function with arbitrary cut-off.
  • Logistic regression;
    Binary classification. Logistic function generates the frequency/odds of the predicted variable.
  • J48/C4.5 decision trees,
    Explicitly deal with interaction effects.
  • JRip decision rules,
  • K* instance-based classifiers
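
A rough sketch of how I understand the first two in the list. The weights, intercept and the 0.5 cut-off are hypothetical, and neither function fits anything: they only show how an already fitted linear function gets turned into a binary prediction (step regression) or a probability (logistic regression).

```python
import math


def linear_score(features, weights, intercept):
    """The linear function both approaches start from."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def step_regression_predict(features, weights, intercept, cutoff=0.5):
    """Step regression: linear score, then an arbitrary cut-off gives 0 or 1."""
    return 1 if linear_score(features, weights, intercept) >= cutoff else 0


def logistic_regression_probability(features, weights, intercept):
    """Logistic regression: squash the linear score into a probability."""
    m = linear_score(features, weights, intercept)
    return 1.0 / (1.0 + math.exp(-m))


# Hypothetical fitted model with two predictor variables.
weights, intercept = [0.1, 0.2], 0.2
print(step_regression_predict([1.0, 2.0], weights, intercept))
print(logistic_regression_probability([1.0, 2.0], weights, intercept))
```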

As advised, RapidMiner 5.3 installed. Aside: like the start “page” for RapidMiner. Links to videos etc with an interface intended to help folk get started. Exactly the sort of thing BIM needs.

Interesting to note that they are using screenshots of what to expect, rather than actually using the software. The data mining with Weka approach – using the software – was better. Especially given the speed at which it’s been skimmed through, and especially given that the data set provided doesn’t appear to match the one being used. Ahh, that’s because I’ve got the wrong data set, but where is the data set?

Oh, so we do need to install the Weka extension to do this. Would have been nice to have been told this up front. In fact, much of that wasn’t that useful. At least not for in-depth understanding. Built vague understanding of driving RapidMiner, but that’s about all.

More “theory” – more classifiers

Decision rules – sets of if-then

Now onto a case study. This has some good value. Gives broader context.

Given the “we know you’ve had troubles” messages from the course staff it appears that the brevity of the RapidMiner introduction has hit quite a few people.

The quiz

So now there’s an assignment. Be interesting to see what they ask and how it connects with the lectures.

The quiz comes with the very American notion of the honor code.

So using RapidMiner to analyse data sets. Interesting because the lectures treated this process as an afterthought. They focused on the concepts of data mining and only quickly mentioned in passing the RapidMiner techniques. Not a big leap, but not one that was telegraphed in the lectures as something you should be taking note of (see the above for the absence). So a bit of revision is called for (it’s two days since I listened to the above lectures).

Thankfully they have PDFs of the slides. Still a bit of trouble in figuring out the interface, the meaning and purpose of all the operators. Also assumed the need to connect the “Read CSV” operator up to the input port – but not necessary. Let’s see if the answer is correct?

Ahh, of course it doesn’t appear that I get feedback on the first question until I’ve completed the whole quiz. Given the uncertainty around the use of RapidMiner this is problematic.

So the operators I’m using

  • Read CSV – nice CSV import of the data.
  • Set Role – to identify which column in the CSV is the label, i.e. what we’re trying to develop a model to predict.
  • W-J48 – the Weka extension for the J48 algorithm to generate the decision tree (the particular type of model we’re using).
  • Apply Model – takes an already developed model and applies it to a data set.

    W-J48 just develops the decision tree (the model) using the CSV file we inputted as the training set. Apply Model applies that model to a data set to do some prediction, or in this case to test the performance.

    Note: I’ve gotten this mostly from reading the RapidMiner provided synopsis, not from the lecture (or at least what I remember of it).

    In this case, we’re currently running Apply Model on the same data (I think). This is not good practice.

  • Binomial Performance – generates a range of statistics about the quality of the model. Its performance.

Now they want us to exclude one of the columns (an attribute?). Import CSV seems a good place to do that – yes, you can deselect a column.

That ran a lot slower. Wonder why? In all of this we’re reporting the Kappa value, though we’re yet (based on my memory) to have a good explanation of what this is telling us.

Now exclude some more columns. Most of these appear not to be specific to the model – e.g. the studentid is not going to be something impacting learning.

Since all of these questions are building on earlier questions, the lack of immediate feedback on the earlier questions is becoming troubling. If I got the first one wrong, then all of these could be wrong as well.

Okay, let’s use a different classifier. And another. And another. And another. Interesting: one of these classifiers ran past 40 seconds. 1m20s in total.

And now onto cross-validation with the same example.

And then a question about the changes in the value of kappa due to the use (or not) of cross-validation. An important point, but not one that I thought was made all that explicit in the lecture.
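
A toy illustration of that point, sketched in Python rather than RapidMiner. Everything here is invented: the data is random noise, and the “model” is a nearest-neighbour lookup that simply memorises its training data (a stand-in for an overfit decision tree, not J48 itself).

```python
import random


def kappa(predicted, actual):
    """Cohen's Kappa between two lists of labels."""
    n = len(actual)
    observed = sum(p == a for p, a in zip(predicted, actual)) / n
    classes = set(actual) | set(predicted)
    expected = sum((predicted.count(c) / n) * (actual.count(c) / n) for c in classes)
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0


def nearest_neighbour_predict(train, x):
    """Return the label of the training row whose feature is closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


rng = random.Random(0)
# The feature (a row number) carries no information about the label.
data = [(i, rng.randint(0, 1)) for i in range(200)]
labels = [label for _, label in data]

# Apply the model to the same data it was built from.
train_predictions = [nearest_neighbour_predict(data, x) for x, _ in data]
train_kappa = kappa(train_predictions, labels)

# 5-fold cross-validation: each row is predicted by a model that never saw it.
k = 5
cv_predictions = []
for i, (x, _) in enumerate(data):
    training_rows = [row for j, row in enumerate(data) if j % k != i % k]
    cv_predictions.append(nearest_neighbour_predict(training_rows, x))
cv_kappa = kappa(cv_predictions, labels)

print(train_kappa, cv_kappa)
```

On its own training data the memoriser gets a perfect Kappa. Under cross-validation it should fall to somewhere around zero, since there was never anything to learn.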

All good – 11 out of 11. A little hit of dopamine from the success and the slightly unexpected nature of it. And so onto week 2.

The “interactive” parts of these lectures are so far focused on simple sums. Not really addressing the big picture.

BIM and broken moodle capabilities

The following is a long overdue attempt to identify and solve an issue with BIM.

The problem

BIM provides three different interfaces depending on the type of user. These are

  1. coordinator;

    The name is a hangover from a past institution, but essentially this is the teacher in charge. Can do anything and everything, including the management of the marking being done by other staff.

  2. marker;

    Another staff role, mostly focused on marking/looking after a specific group of students.

  3. student.

    What each student sees.

The problem is that the code that distinguishes between the different types of users is not working.

For example, a user who should be a coordinator, BIM thinks is potentially all three.

The method

The method I use (and which was used in BIM 1 and has worked fine) is based on capabilities, essentially a few ifs
[sourcecode lang="php"]
if ( has_capability( 'mod/bim:marker', $context )) {
    # do marker stuff
}
if ( has_capability( 'mod/bim:student', $context )) {
    # do student stuff
}
if ( has_capability( 'mod/bim:coordinator', $context )) {
    # do coordinator stuff
}
[/sourcecode]

These are then defined in db/access.php via the publicised means

What’s happening

To get to the bottom of this, I’m going to create/configure three users who fit each of the BIM user types and see how BIM labels them.

  1. coordinator user – BIM thinks can be marker, student or coordinator.
  2. marker user – is a marker
  3. student user – is a student and a coordinator

The above was tested within BIM itself. There’s a capability overview report in Moodle that shows “what permissions that capability has in the definition of every role”.

For coordinator, it’s showing “Allow” for “Student” and not set for everything else. Not even the manager. Suggesting that there is a mismatch between the BIM code and what Moodle knows. Suggesting that an upgrade of the BIM module is called for.

So, let’s update the version number, visit the admin page and do an upgrade. Success. Now check the capability overview report.

The capability overview report is reporting no change. This appears to be where the bug is. What’s in the db/access.php file is not being used to update the database.

Seem to have it working.

Clean test

Need to do a test on a clean Moodle instance.

  1. Coordinator – CHECK
  2. Teacher – CHECK
  3. Student – CHECK

Glad that’s out of the way. More work on BIM in the coming weeks.