In work funded by the National Science Foundation in 2014, Assistant Professor of Educational Psychology Matt Bernacki has worked with the university’s IT team to track student interactions with the learning management system (LMS): how often students check the syllabus, view course documents and take other key actions. They’re applying machine learning and predictive analytics to that information to forecast student outcomes.

 “We can identify students who are going to do poorly before they take their first test. With 80 percent accuracy in math class and 70 percent accuracy in biology, we can tell you whether a student is going to get a C or worse,” Bernacki said.

 In addition to the IT support, Bernacki has also been collaborating with professors to incorporate their subject matter expertise and course-specific knowledge into his algorithms.

 The LMS has proven a useful source of data insofar as it provides insight into a broad range of student behaviors. “We see students accessing documents and tools. They access the syllabus, download course notes, take practice exams and self-assessment quizzes. They can also monitor their performance by checking their grades,” Bernacki said.

 All the activity adds up to a mountain of server-level data. Researchers so far have looked at some 10 million transactions generated by about 3,000 students. They’ve focused on activities in the early weeks of a course, in order to create an opportunity for early intervention.

 The art here lies in developing algorithms that crunch data impartially while also recognizing the educational nuance behind certain actions. It helps to know, for example, not just what a student is looking at in the LMS, but also when. “If a student is accessing a syllabus in the first week of a new unit, that would suggest they have planned their activities and they are thinking about the content of the course and how they are going to learn it,” he said.

 By the same token, a student who taps the syllabus for the first time in week four may be sending a very different signal. “It means you are a bit behind in your preparation, and that can be a predictor of poor outcomes,” Bernacki said.
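The timing signal Bernacki describes can be sketched in code. The snippet below is a hypothetical illustration, not the researchers’ actual pipeline: the event field names, action labels and course start date are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical sketch: derive a timing feature ("week of first syllabus
# access") from raw LMS event logs. Field names and the course start
# date are invented for illustration.

COURSE_START = datetime(2014, 8, 25)

def week_of(timestamp):
    """Convert an event timestamp to a 1-indexed course week."""
    return (timestamp - COURSE_START).days // 7 + 1

def first_syllabus_week(events):
    """Return the week a student first opened the syllabus, or None."""
    weeks = [week_of(e["time"]) for e in events
             if e["action"] == "view_syllabus"]
    return min(weeks) if weeks else None

# A student whose first syllabus view lands in week four would be the
# "very different signal" described above.
events = [
    {"action": "view_notes", "time": COURSE_START + timedelta(days=2)},
    {"action": "view_syllabus", "time": COURSE_START + timedelta(days=23)},
]
print(first_syllabus_week(events))  # 4
```

A real system would compute many such features per student from the server-level transaction logs; this shows only the shape of one of them.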

 That sort of distinction helps to explain some of the difficulty surrounding this kind of analytic work, which needs to strike a balance between pure data science and learning science. “You have to be able to create an algorithm in such a way as to figure out which are the true predictors. It’s not fully clear what a student does when clicking on a diagram: We don’t have enough information all the time. But we can still know that the student who clicks on that diagram does see better outcomes,” he said.

 If this kind of research were to bear fruit, it could open up new avenues of intervention for educators.

 “It means we can reach out with new learning materials or recommendations,” Bernacki said. “If we are good at knowing who is going to do poorly in week three, before they get that first poor grade, that gives us an opportunity. When we know who to target with extra support, we can leverage university resources, offering advice and guidance and support.”

 Looking ahead, the researchers are seeking ways to make their methodology more widely applicable. Right now the algorithm is tailored course by course, “and we are working on scaling this by getting instructors to do a better job of labeling content, to make it easier for us to trace learning activities,” Bernacki said. “We are also building out our data capture to include data from online textbook use. That will make the data feeds more complete.”