August 6th, 2009

Very briefly: this example from a longer post by Jeff Jonas about predictive systems struck me as relevant to the topics we cover here on the blog, particularly the challenges of building analytics across disparate data sets:

Take a hypothetical biosurveillance system on the West Coast of the US, which is supposed to observe the trends of a future influenza outbreak – say a new swine flu mutation.  Such a system might, for example, use newsfeeds and other available data like blogs to count incidents and locations over time.  How accurate could a system make predictions if San Francisco, San Fran, SF and the Bay Area were tallied as discrete regions?  If the system cannot tally geographically, there might appear to be five cities each with mild volumes – when in fact, it is one dense region with moderate volumes.

Counting like entities (Semantic Reconciliation) is fundamental to the measurement of trajectory and velocity.
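To make the counting problem concrete, here is a minimal Python sketch of semantic reconciliation in its crudest form: a hand-built alias table maps each surface form of a place name to one canonical region before anything is tallied. The alias table (`REGION_ALIASES`), the canonical label "SF Bay Area", the function names, and the sample reports are all hypothetical and exist only for illustration. A real system would need much more than exact lookups (fuzzy matching, geocoding, context), so treat this as a sketch of the idea, not as Jonas's method.

```python
from collections import Counter
from typing import Iterable

# Hypothetical alias table: maps lowercased surface forms seen in feeds
# to a single canonical region name.
REGION_ALIASES = {
    "san francisco": "SF Bay Area",
    "san fran": "SF Bay Area",
    "sf": "SF Bay Area",
    "bay area": "SF Bay Area",
}


def normalize(mention: str) -> str:
    """Map a raw location mention to its canonical region, if the alias is known."""
    return REGION_ALIASES.get(mention.strip().lower(), mention.strip())


def tally(mentions: Iterable[str], reconcile: bool = True) -> Counter:
    """Count reports per location, optionally reconciling aliases first."""
    if reconcile:
        return Counter(normalize(m) for m in mentions)
    return Counter(m.strip() for m in mentions)  # naive: every spelling is its own "city"


def weekly_series(reports, region: str, weeks) -> list:
    """Reconciled number of reports for `region` in each of the given weeks."""
    counts = Counter(week for week, mention in reports if normalize(mention) == region)
    return [counts[w] for w in weeks]


def velocity(series: list) -> list:
    """Week-over-week change in counts, a crude measure of trajectory."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical (week, location mention) pairs extracted from newsfeeds and blogs.
reports = [
    (1, "SF"),
    (2, "San Francisco"), (2, "Bay Area"),
    (3, "San Fran"), (3, "SF"), (3, "san francisco"), (3, "Bay Area"),
    (3, "Portland"),
]

mentions = [m for _, m in reports]
print("naive:", tally(mentions, reconcile=False))
print("reconciled:", tally(mentions))
series = weekly_series(reports, "SF Bay Area", [1, 2, 3])
print("weekly series:", series, "velocity:", velocity(series))
```

Tallied naively, the reports split into several small buckets (no single spelling exceeds two reports), which is exactly the "five cities each with mild volumes" illusion. Once reconciled, the same data collapses into one region with seven reports and a weekly series of 1, 2, 4, whose rising velocity is the signal a biosurveillance system would actually care about.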

How often are efforts to build truly smarter systems, ones that collect and analyze data from many connected sources, blocked by artificial boundaries between data sets? All of this underscores the need for software that can recognize those boundaries and then connect and associate the appropriate data across them.

Smart systems, prediction systems, sensemaking systems, situational awareness systems, incremental learning systems … whatever one calls these things … must first be able to form an opinion (aka make assertions) about context (aka count and associate) … if they are to be relevant.

