Christiaan Huygens was the Neil deGrasse Tyson of the 17th century. Huygens like Tyson was a prolific writer and a popular scientist particularly in the fields of astronomy and physics. He also invented the pendulum clock and studied the rings of Saturn, but he is probably most remembered for his wave theory of light, which helped to improve the design of optics and telescopes.
Huygens was also Dutch, which is why an award recognizing researchers from Dutch universities was named in his honor in 1998. The award, for which IBM is a sponsor, is granted each year to a different scientific field, this year going to a researcher in Information and Communication Technology (ICT). To qualify the research needs to have clear social and scientific relevance.
Young researcher, Dr. Bart Jansen, thinks he can help in this area, which is why he is being awarded the 2014 Christiaan Huygens Science Prize for his research The Power of Data Reduction: Kernels for Fundamental Graph Problems.
In his thesis he examines how to efficiently reduce the size of a dataset without changing the answer to the question at hand. Analyzing the reduced dataset rather than the original input does not only save on processing time, but also on power consumption.
Jansen accepted his award last night at a gala ceremony in Voorburg, the Netherlands. A Smarter Planet caught up with him to get his insights on the technology and his achievement.
Smarter Planet: What inspired your research?
Bart Jansen: It is already known that the time needed to solve a computational problem can often be reduced by incorporating a preprocessing step where the dataset is simplified as much as possible. However, there was no mathematical explanation of why this would often allow the answer to be found so quickly. In my research, I propose a way to quantify the complexity of a dataset and I give reduction rules to simplify large data sets of relatively low complexity. This methodology changes the process of data reduction from an art, driven by intuition and heuristics, into a science, supported by mathematical proof.
SP: Your approach makes it possible to speed up complex computations. What does this mean in practice?
BJ: Imagine planning a package delivery route: you have to find a tour that visits all delivery addresses while minimizing the total fuel consumption along the way. Once the order in which packages are delivered has been decided, it is easy to find a shortest tour that uses this order by scanning through the road network once. However, the number of different orders is very large compared to the number of packages. If you have to scan through the entire road network for every possible delivery order, this takes a lot of computing time and power. The computation can be accelerated significantly by using a preprocessing step in which roads are eliminated from the network when a mathematical argument proves there is a shortest tour that avoids these roads. The quality of a delivery order can be tested in the smaller reduced network much more efficiently than in the original network. This means that preprocessing can significantly save on computing time and power.
More importantly, current planning systems often do not search for the shortest tour because this would take too much time; fuel is therefore wasted. By speeding up the search for the shortest tour, preprocessing can make it feasible to find the optimal solution and therefore save resources.
SP: Is this the only method available to accelerate the analysis of Big Data sets?
BJ: There are other methods to accelerate calculations. In optimization problems such as delivery route planning, a common approach is to use rules of thumb to find a reasonable route rather than completely analyzing the road network to find the best solution. This saves computing time while sacrificing some of the quality of the solution.
Another scenario where my reduction rules are useful is in analyzing protein interaction networks. To understand the cell biology of an organism, networks are created in which each node is a protein. Connections between nodes are made to represent interactions among proteins. To analyze the global behavior of the cell, researchers attempt to analyze the longest reaction chains in such networks. However, finding the longest reaction chain is a time consuming process. Therefore, some researchers apply rules of thumb to find reasonably long reaction chains, rather than the longest chains. These might not give a complete picture of the behavior within the cell. Using data reduction we can speed up the search for the longest reaction chains and therefore obtain a better understanding of the cell biology. The insights obtained may be used to develop better medicine.
SP: What is the impact of your work on the future of ICT?
BJ: For the long term, it means that the development and design of algorithms is more important than technological advancement. Although Moore’s law suggests that computing power doubles every few years, a better algorithm can instantly give a 1000x acceleration. This is especially important since the rate of improvement to chip design will soon flatten out; we are reaching the limits of what is physically possible.
The application of data reduction is still in its infancy, but the possibilities are endless. My research is therefore more a beginning than an end point.