Computer Science and Mathematics Projects
Analysis on the Wire (AoW)
In the era of Big Data, in an ever-increasing number of cases in science and industry, more data can be found in transit at any given moment than in storage media ranging from memory to tape. This raises the question of whether it is feasible to extract value from this data while it is still in transit. There is a strong incentive for on-the-wire processing: it can provide near-real-time information to speed up decision making, optimize and prioritize information routing and correlation, and offer additional processing cycles that free up data center resources. A related concept comes from the world of cybersecurity, where packet streams passing through firewalls are inspected in an attempt to detect patterns of intrusion and avert cyber attacks. Suitably designed algorithms can exploit available computing power and perform specific forms of computation on streaming data while it is “on the wire”, i.e., being transported in the network. This requires taking the concept of processing on the wire beyond the common cybersecurity applications and investigating equipment and methods that enable generic, statically or even dynamically programmable analysis and/or transformation of data streams as they pass through network devices.
The basic concept for analyzing data on the wire is that one or more network devices can be programmed to recognize specific data flows and transparently apply a certain type of computation to the data of a flow before forwarding it to its destination. Depending on the case, the recipient may receive either the original data or data transformed in some expected way, while analysis information gathered from the data may be sent to a different recipient.
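The concept above can be illustrated with a minimal Python sketch. It is not the AoW design: the names Packet, FlowRule, and process, and the choice to identify a flow by destination address and port, are assumptions made purely for illustration. A rule may transform the payload, gather analysis information for a separate recipient, or both.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet: real devices see full protocol headers.
    src: str
    dst: str
    dst_port: int
    payload: bytes


@dataclass
class FlowRule:
    # Hypothetical rule: a flow is identified here by destination and port only.
    dst: str
    dst_port: int
    transform: Optional[Callable[[bytes], bytes]] = None  # None: forward original data
    analyze: Optional[Callable[[bytes], dict]] = None     # None: gather nothing
    analysis_dst: Optional[str] = None                     # recipient of analysis records

    def matches(self, pkt: Packet) -> bool:
        return pkt.dst == self.dst and pkt.dst_port == self.dst_port


def process(pkt: Packet, rules: List[FlowRule]) -> Tuple[Packet, List[Tuple[str, dict]]]:
    """Return the packet to forward and any (recipient, record) analysis output."""
    side_channel: List[Tuple[str, dict]] = []
    for rule in rules:
        if not rule.matches(pkt):
            continue
        # Analysis runs on the original payload, before any transformation.
        if rule.analyze is not None and rule.analysis_dst is not None:
            side_channel.append((rule.analysis_dst, rule.analyze(pkt.payload)))
        if rule.transform is not None:
            pkt = replace(pkt, payload=rule.transform(pkt.payload))
        break  # first matching rule wins in this sketch
    return pkt, side_channel
```

In this sketch, packets that match no rule are forwarded unchanged, which mirrors the requirement that processing be transparent to flows the device was not programmed to recognize.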
The AoW project examines the feasibility of processing data on the wire by investigating three main directions: potential use cases, processing capabilities of networking infrastructure, and suitable algorithms. There are many prime candidate cases for AoW: real-time processing of a data stream before that data arrives at a data center for decision-making purposes; sensor network data analysis, for example distributed solar irradiance prediction, security sensor networks such as DARPA SIGMA, or Phasor Measurement Unit (PMU) and Smart Meter data reduction and state estimation in the Smart Grid; and, in the general case, processing of data in the context of the Internet of Things (IoT). From the networking perspective, Software Defined Networking (SDN) mechanisms for Network Function Virtualization (NFV) and Service Function Chaining (SFC) could be utilized, after potential modifications, to support streaming data analysis. Vendor equipment mainly targeted at enabling sophisticated cybersecurity functions, i.e., Deep Packet Inspection (DPI) and Deep Packet Processing (DPP), could also potentially be adapted to support analysis algorithms. Even traditional networking devices could be used in conjunction with external computing systems to perform analysis tasks. Finally, there are many streaming algorithms that could be adapted for processing on the wire: for example, classical online algorithms for streaming outlier detection, for approximate summary statistics, such as for billing, or for lightweight dimensionality reduction using problem characteristics; batch supervised and unsupervised learning algorithms; and adaptive supervised and unsupervised learning algorithms.
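As one concrete instance of the classical online algorithms mentioned above, the following Python sketch combines Welford's single-pass mean and variance update with a simple z-score test for streaming outlier detection. It is an illustrative assumption about what such an algorithm could look like, not a method taken from the project; the class name and the threshold and warmup parameters are hypothetical.

```python
import math


class StreamingOutlierDetector:
    """Single-pass outlier flagging with constant memory per stream."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold  # assumed cutoff, in standard deviations
        self.warmup = warmup        # samples to observe before flagging anything
        self.n = 0                  # number of samples seen
        self.mean = 0.0             # running mean
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Consume one value; return True if it looks like an outlier."""
        is_outlier = False
        if self.n >= self.warmup and self.n > 1:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford update; flagged values are still folded into the statistics.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_outlier
```

Because the detector keeps only three numbers per stream and touches each value once, it fits the constraint that data on the wire cannot be stored and revisited. A deployment might prefer to exclude flagged values from the running statistics; that is a design choice this sketch does not make.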