
4 Big Data Challenges that Universities Face

on April 9, 2014

The rise of big data and big science has given universities an opportunity to work together on the technology needed to manage worldwide research projects.

Misfolded proteins that cause Alzheimer's disease and the mysteries of dark matter and dark energy are just a few of the research subjects that generate massive amounts of data. To unlock the answers that the data holds, researchers will have to work together across disciplines and countries.

"In this era of big data and big science, universities must serve as a crossroads for collaboration more than they ever have," said Shirley Ann Jackson, president of Rensselaer Polytechnic Institute, during a general session at the 2014 Internet2 Global Summit in Denver on Tuesday, April 8.

This crossroads for collaboration doesn't just mean that researchers should talk to each other. It also means collaboration between researchers and IT in a way that doesn't always happen, said Michael McRobbie, president of Indiana University. Researchers need the support of IT leaders who take the time to understand what they need technologically and can then provide it.

Ten years ago, IT leaders weren't always sure what researchers wanted, so McRobbie decided that his campus would find out. Brad Wheeler, Indiana University's CIO and vice president of IT, pulled together a community of about 15 people and asked what they wanted. Ultimately the group was looking for the ability to store and preserve data. So that's what IT gave them.

"It is absolutely essential to ask and continually ask the researchers what it is that they want," McRobbie said. 

As university leaders support their campuses' missions, they face four major challenges on the road to unlocking the potential of big data and science.

1. Volume

The sheer amount of data coming out of big research projects is staggering. The research and education network from Internet2 allows researchers to share large amounts of data, and they're doing so to the tune of nearly 50 petabytes a month. While the Internet2 network is lightning fast, the explosion of data has made it challenging to keep up, Jackson said.

That leads to another challenge: networks and the computers attached to them don't have matching capacity for these large volumes of data. For example, the Internet2 network provides 100 gigabit Ethernet, but servers may allow applications to use only 1 gigabit per second, according to SURFsara, which supports researchers in the Netherlands.

So while the network may be fast, the applications can't keep up, which is why researchers are sending their data via snail mail on discs, Jackson said. She suggested that cognitive computing systems could help address this problem by collecting and interpreting data for researchers.
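To see why shipping discs can beat the network, consider a back-of-the-envelope calculation (the figures are illustrative assumptions, not from the presentation): at the 1 gigabit per second an application may actually get, moving a single petabyte takes roughly three months, while a full 100 gigabit path would do it in about a day. The short Python sketch below works out those numbers.

```python
# Back-of-the-envelope transfer times for the bandwidth mismatch described
# above. The 1 PB payload and the two link speeds are illustrative assumptions.

def transfer_days(petabytes: float, link_gbps: float) -> float:
    """Days needed to move `petabytes` of data over a link running at `link_gbps`,
    assuming the link is fully and continuously utilized."""
    bits = petabytes * 1e15 * 8               # decimal petabytes -> bits
    seconds = bits / (link_gbps * 1e9)        # gigabits per second -> bits per second
    return seconds / 86400                    # seconds -> days

print(f"1 PB at   1 Gbps: {transfer_days(1, 1):5.1f} days")    # ~92.6 days
print(f"1 PB at 100 Gbps: {transfer_days(1, 100):5.1f} days")  # ~0.9 days
```

The two-orders-of-magnitude gap between those figures is the gap between what the backbone offers and what applications can actually push through it.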

2. Velocity

With real-time data and an abundance of information coming at them quickly, researchers must figure out how to handle data both at rest and in motion. One open question is whether IT leaders can embed more artificial intelligence inside networks so that the networks themselves can decide what data to move and how to move it.

3. Variety

Along with volume and velocity, a variety of data from numerous, if not unlimited, sources and geographic locations poses a research challenge. With researchers collaborating around the world, not everyone knows who has the data or the tools to work with it. Internet2 is working on this worldwide collaboration problem by partnering with the National Knowledge Network in India to improve research and education, among other things.

Another way to deal with this problem is a "Yellow Pages" for data, powered by a tool such as Watson, IBM's cognitive computing technology. At Rensselaer Polytechnic Institute, researchers are teaching Watson to be a data adviser that can guide them to treasure troves of relevant information in the 1 million open government data sets available around the world.
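The "Yellow Pages for data" idea can be pictured as a searchable index of dataset descriptions that points a researcher toward likely sources. The sketch below is a deliberately minimal, hypothetical illustration of that concept; it is not how Watson works, and the catalog entries are invented.

```python
# Minimal, hypothetical sketch of a "Yellow Pages for data": rank dataset
# descriptions by overlap with a researcher's query. The entries are invented
# examples; a cognitive system like Watson would use far richer matching.

CATALOG = [
    {"name": "noaa_storm_events",   "description": "severe weather storm events united states"},
    {"name": "nyc_taxi_trips",      "description": "taxi trip records new york city transportation"},
    {"name": "nih_clinical_trials", "description": "clinical trials health medicine alzheimer disease"},
]

def find_datasets(query: str, catalog=CATALOG):
    """Return catalog entry names whose descriptions share words with the query,
    best matches first."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d["description"].split())), d["name"]) for d in catalog]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

print(find_datasets("alzheimer disease clinical data"))  # -> ['nih_clinical_trials']
```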

4. Veracity

As they bring together data from different sources, researchers have to determine which information to trust and use. Jackson suggested that artificial intelligence could help in this area as well. But whatever universities do, they need to shore up their networks so that these additional connections and sources don't compromise research.

"We are connected by our exposures, and we are exposed by our connections," Jackson said. "Therefore it is of importance that greater resilience be built into our networks, both for the security of Internet of Things, as well as for avoiding disruption of important collaborative research efforts."

The importance of people networks

Ultimately, connections between researchers, university leaders and IT staff around the world prove the most challenging. But they are also the most valuable.

"The most important networks in discovery and innovation are human," Jackson said. "But unlocking human potential depends not only on the technology we put in place, but on how we are able to use them."





Tanya Roscorla

Tanya Roscorla covers education technology in the classroom, behind the scenes and on the legislative agenda. Likes: Experimenting in the kitchen, cooking up cool crafts, reading good books.

E-mail: troscorla@centerdigitaled.com
Twitter: twitter.com/reportertanya
Google+: Gplus.to/reportertanya

Comments


on Apr 10, 2014
Tanya, very nice article on big data. With the explosion of big data, companies face data challenges in three different areas. First, you know the type of results you want from your data, but they are computationally difficult to obtain. Second, you know the questions to ask but struggle with the answers and need data mining to help find them. Third is data exploration, where you need to reveal the unknowns and comb the data for patterns and hidden relationships. The open source HPCC Systems big data processing platform can help companies with these challenges by making it quick and simple to derive insights from massive data sets. Designed by data scientists, it is a complete, integrated solution from data ingestion and processing to data delivery. Its built-in Machine Learning Library and matrix processing algorithms can assist with business intelligence and predictive analytics. More at http://hpccsystems.com