MOOCs exploded into the academic consciousness in summer 2011, when a free ‘Artificial Intelligence’ course was offered by Stanford University, California.

The course attracted 160,000 learners from around the world – 23,000 of whom completed it. Of those learners, 410 outperformed the top student at Stanford. It was clear that brick-and-mortar campuses were unlikely to keep up with the demand for advanced education.

However, distance learning has faced one major problem since the nineteenth century – the low rate of completion. MOOCs are no exception here; the completion rate has rarely risen above 15%.

MOOC providers must strive to increase the completion rates. One way to do that is to analyse the available learner data to understand, predict and modify learner behaviour. 

We conduct MOOC data analyses to optimise learning, benchmark learning environments, adjust the quality of hosted courses, and control the high dropout ratio.

In this blog post, we focus on the process involved in predicting learner dropouts.

Data Collection

Data on learner behaviour can be collected through:

  • Logging clicks
  • Browser/platform tracking logs
  • Login frequency
  • Time spent on tasks
  • Course calendar details: start date, end date, release date and assessment deadlines
  • Video interactivity
  • Forum activity
  • Quiz performance
  • Course reviews & ratings
  • Learning analytics

Event Data Analysis

Event logs record learner activity for all courses on a daily basis. With a few assumptions, we can define the dropout prediction problem precisely in terms of four concepts: time slice, dropout, lead, and lag.

Time slice: Predicting a future event requires explanatory variables measured along a time axis, such as assessment questions answered correctly on the first attempt, number of forum posts, and video watch time. The week is used as the time slice.

Dropout: The time slice (week) in which a learner last interacts with the course. Learners are labelled 1 (dropout) or 0 (non-dropout).

Lead: How many weeks in advance to predict dropout.

Lag: How many weeks of historical data are used to classify.
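
To make these definitions concrete, here is a minimal sketch in Python. It assumes a hypothetical event log of (learner, date) pairs and a known course start date; the names week_of, last_active_week and dropout_label are illustrative and do not come from any particular MOOC platform. It also assumes that a learner counts as a dropout at a target week if their last interaction falls before that week.

```python
from collections import defaultdict
from datetime import date

def week_of(event_day: date, course_start: date) -> int:
    """Return the 1-based course week (time slice) that a day falls in."""
    return (event_day - course_start).days // 7 + 1

def last_active_week(events, course_start: date) -> dict:
    """Map each learner to the last week in which they interacted."""
    last = defaultdict(int)
    for learner, day in events:
        last[learner] = max(last[learner], week_of(day, course_start))
    return dict(last)

def dropout_label(last_week: int, target_week: int) -> int:
    """Assumed rule: 1 (dropout) if the last interaction precedes target_week."""
    return 1 if last_week < target_week else 0

# Hypothetical log: learner "a" is active in weeks 1 and 3, learner "b" only in week 1.
course_start = date(2018, 1, 1)
events = [
    ("a", date(2018, 1, 2)),
    ("a", date(2018, 1, 20)),
    ("b", date(2018, 1, 3)),
]
last_weeks = last_active_week(events, course_start)
labels_week_3 = {who: dropout_label(wk, 3) for who, wk in last_weeks.items()}
```

With this toy log, learner "b" is labelled a dropout for week 3 and learner "a" is not. A real pipeline would compute the same labels from the platform's own event schema.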


Learner Dropout Prediction

If a course team wants to predict learner dropout between week (i+1) and the final week of the course (week 6, say), it can do so using data from week 1 to week i. If the data covers n weeks of course material, then for any given week i there are n-i prediction problems, one for each remaining week.
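
The count of n-i problems can be checked with a short enumeration. This is only a sketch: the function name prediction_problems is hypothetical, and it assumes that all weeks from 1 to i are used as history, so the lag equals i.

```python
def prediction_problems(current_week: int, total_weeks: int) -> list:
    """List one prediction problem per week that lies after current_week."""
    if not 1 <= current_week <= total_weeks:
        raise ValueError("current_week must lie between 1 and total_weeks")
    return [
        {
            "lag": current_week,              # weeks of history used (assumed: all of them)
            "lead": target - current_week,    # weeks ahead being predicted
            "target_week": target,
        }
        for target in range(current_week + 1, total_weeks + 1)
    ]

# Standing at week 2 of a six-week course leaves 6 - 2 = 4 problems.
problems = prediction_problems(2, 6)
```

Each entry pairs a lag with a lead, which is the information a model needs in order to know which weeks supply features and which week supplies the label.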

Each prediction problem becomes an independent modelling problem which requires a discriminative model. We can build many discriminative models that can be used every week to predict the number of learners who may drop out in the subsequent weeks.
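
As an illustration of one such discriminative model, the sketch below fits a logistic regression by plain gradient descent using only the Python standard library. Logistic regression is our choice for the example, not necessarily the model used in practice, and the two features (videos watched and forum posts in the lag window) and the four training rows are invented toy data.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_dropout(w, b, x) -> float:
    """Estimated probability that a learner is a dropout (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data (hypothetical): [videos watched, forum posts] during the lag window.
X = [[5, 2], [4, 1], [0, 0], [1, 0]]
y = [0, 0, 1, 1]  # 1 = dropout, 0 = non-dropout
w, b = train_logistic(X, y)
p_active = predict_dropout(w, b, [5, 2])
p_inactive = predict_dropout(w, b, [0, 0])
```

On this toy data the model assigns a higher dropout probability to the inactive learner than to the active one. In practice one such model would be trained per (lag, lead) pair, and a tested library implementation would normally replace the hand-written training loop.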

We recommend you attend our Future of Learning Conference (dates to be announced) to understand the modelling part in more detail.

This information can be used to keep encouraging learners to participate in the course. With data-driven insights, we will be able to provide a better learning experience and prompt an overall behavioural change.

The next post on MOOC data will deal with benchmarking the learning environment. In other words, it will deal with how we can gather feedback to enhance learner experience. How can we identify learning difficulties and weak points in the course, or stalling segments in video lectures?

Read here for more on where MOOC-based research is headed.

An Overview of MOOC Design and Curriculum Based Research can be seen below.

 [This blog post was written by Satya, Ranjitha, and Sushma – Insights Leads, IIMBx] 