Georgia Institute of Technology CEISMC

Visualizing Data Part Two: The Need for Visualizing Data in a Post 9-11 World
November 2008

In part two of our interview with John Stasko (read part one here), we move beyond general discussions of how humans perceive the world and into the specifics of Jigsaw, a program John and his team have created that allows users to visually interpret the sort of data that pours into Homeland Security agencies and police departments. Will these developments in crime prevention and crime solving make for a safer world, or are they stumbling steps towards some scary, Minority Report-esque world of the future?

Q - So your Jigsaw program seems to be something like social network analysis combined with on-the-fly graphing and the use of text-sources--

--and search. Yeah it's really kind of a high end viz multimedia document analysis tool in a way.

Q - How did Jigsaw come about?

I got involved with this new kind of project effort called visual analytics that some folks at Pacific Northwest National Labs [PNL] developed. This visual analytics area bubbled up about four or five years ago, and some people behind it gathered together probably about 25 researchers from around the country who worked on things like this. We met in a couple of two-day meetings to hammer out a research agenda and a definition for the area. There was a book, Illuminating the Path [this is free to download here], that is really the product of those two meetings. [The Department of Homeland Security] is a sponsor of it. But it's not just homeland security; it's drug discovery and other kinds of things.

In these two days of meetings we did a variety of activities. Since it was Homeland Security they brought in someone who works with the border patrol. And he talked about how a truck rolls in and we have 32 seconds to make a determination whether to let that thing go across. They brought in some guys who do emergency management, wildfire patrol and stuff.

And they brought in an interesting professor from the Joint Military Intelligence College where they train analysts. To help us learn more about the analytic process he had us do exercises. He gives us a series of documents, kind of snippets—think of them as case reports, intel organization case reports. And you get a collection of them; I think in the one exercise we had about 25 and in the other we had about 125.

They're like three to four paragraphs each. They may say "On June 12th we intercepted an email from so-and-so, there was a phone call from here, and there were ammunitions stolen from Fort Benning"—little things like that. They're each a little snippet of information that is all kind of disconnected. But if you connect the dots a theme—a story—emerges. And in these exercises our goal as the analysts was to find the story.

It was fun! But it was also hard; harder than you'd think. And this was just with a hundred documents. What if it was a hundred thousand?

I went back to my hotel room that night thinking, "There's gotta be a better way. And I'm a viz guy. I think if we visualized the documents, and particularly the entities within them, visualization is a natural way to show that." So we had some initial ideas and we started building some of the Jigsaw views, and then I started to talk to people at PNL about whether this would work or not. We entered the VAST [Visual Analytics Science and Technology] contest, which we won one year, so that was a huge, nice little bit of PR for us.

There's a police force I've given it to—this is not on any of the web pages yet because it's still behind the scenes—there's a police force that's starting to use it to look at some interesting data they have. When I was on sabbatical in the spring at PNL I was able to use Jigsaw to work on an active cold case for a local police department. I can't tell you anything about it, but it was fascinating to look at. They had about a thousand documents relevant to this case.

Suppose there's a cold case and a new detective comes on. They'll drop a four-foot stack of documents on his desk and say, "Make sense of it." That's kind of the role that I took on. And by using Jigsaw I was able to come up to speed very quickly and learn who some of the key players were and what was going on. And after not too much effort I went up to the people at the labs I was working with and said, "You know I think there's three main hypotheses about the crime, and it's A, B, and C," and they said to me, "You're exactly right." And I said, "You know, if I had to guess one, I would go for C," and they said, "That's what the police are leaning to now."

In many ways it was interesting. Jigsaw was only so useful. It helped in some things, but there were some aspects of it that I think Jigsaw didn't do well, or it didn't have some capabilities that would have been nice. It was fantastically useful for me in terms of learning more about Jigsaw and what it should do, in terms of being a driver of the technology.

It's a challenge in academia to get access to the kind of data that we really need. These exercises that this professor from the Military Intelligence College prepared have been incredibly valuable for us. There is a group at the labs I was at that has a research project about generating synthetic data sets, like these document collections, because it's just so hard to get to the real ones because they're top secret. So we want something that's kind of synthetic but good enough and realistic enough to really push the tools so that we can then evolve the tools and give them to the agencies and hopefully they're worthwhile.

Q - STAB [a program developed by Professor Ashok Goel that pieces together possible scenarios to explain how a crime might have taken place] seems to overlap with the idea a little bit.

STAB is more of an artificial intelligence systems approach of...I can from the atoms construct the molecule. Here's all the facts that we know; now we put together a hypothesis about what the crime really was and why it occurred. In Jigsaw it's more human-centered and human-directed. The analyst is still primarily making all of those judgements and the visual tool is this microscope that helps them. STAB is more about can we do that inferencing?

It can't make something out of nothing. So it needs to have some library of...to rob a bank you want a motive, and then you want availability, so you need transportation. So you can think about it working top-down and bottom-up. So at some level we need a top-down description of why would someone blackmail another person? Why would they commit fraud? STAB is about now we start to get some of the pieces, and we kind of infer bottom-up what the top-level one is.
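[Editor's sketch: the top-down and bottom-up idea described above might look roughly like the following. This is a minimal illustration and not how STAB is actually implemented. The schema library, the element names other than motive, availability, and transportation, and the function `rank_hypotheses` are all hypothetical.]

```python
# Hypothetical top-down library: each crime type lists the elements it requires.
# Only the bank robbery elements come from the interview; the rest are invented.
SCHEMAS = {
    "bank robbery": {"motive", "availability", "transportation"},
    "blackmail": {"motive", "compromising information", "demand"},
    "fraud": {"motive", "false statement", "financial gain"},
}


def rank_hypotheses(observed):
    """Bottom-up step: score each schema by the share of its elements observed.

    Returns a list of (score, crime, missing_elements), best match first.
    Ties are broken alphabetically by crime name.
    """
    results = []
    for crime, required in SCHEMAS.items():
        matched = required & observed
        missing = sorted(required - observed)
        results.append((len(matched) / len(required), crime, missing))
    results.sort(key=lambda item: (-item[0], item[1]))
    return results


# Pieces of evidence gathered so far (made up for illustration).
observed = {"motive", "compromising information", "demand"}
ranked = rank_hypotheses(observed)

for score, crime, missing in ranked:
    print(f"{crime}: {score:.2f} matched, missing {missing}")
```

[The ranked list is only a set of candidate hypotheses. As the interview goes on to stress, a human still has to look at each one and decide whether it makes sense.]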

Q - When considering dystopian views of the future, perhaps like those seen in movies like Minority Report and Brazil, a paranoid person might be uneasy about turning over all of this inferencing to the machines.

It's not at all like that. The inferencing is very much, if A then B, and we're looking for an A, and then we can say to B...a lot of, much of the concern on the Homeland Security side has been, "Where is the data coming from?" "How is data accessed?" And that's one thing we're very careful about in our projects. We're not doing data-gathering. We're working with these synthetic data sets that are made up, that are fictional, hypothetical.

Q - And even with real data the integrity of data is obviously the responsibility of law enforcement.

Law enforcement, intelligence, they have to support privacy in whatever they do, there's a lot of laws they have to follow, and that is not our side of the fence; that's the other side of the fence.

Even in STAB, the system is starting to put together this story and it will come up with a hypothesis, but there has to be a human that then looks at this hypothesis and says, "Does that make sense?" And a lot of this is just that, for law enforcement or intelligence people, the data's getting so big that humans can't put it all together, so these systems help them with that. But the humans are never out of the loop. There's somebody making a judgement.

Q - Was 9-11 the major wake-up call for this area of research? I remember reading at the time complaints about the lack of communication across agencies, failure to pass along information or to connect dots, all things Jigsaw seems to address...

I don't want to speak to some of that stuff because I don't know it really well. I do think 9-11 amplified the need for better tools. I believe in the 9-11 report that there was discussion--right?--that we knew some of the facts. There were the little bits of the story around, and tools like Jigsaw are really to try to help someone put those pieces together in a way.

Obviously 9-11 was a landmark event for the intelligence community, and a lot of this visual analytics work rolled out a year or two after that.

Q - Seems Social Network Analysis took off after 9-11 as well.

Yeah, absolutely, that's a huge booming area in computing. There are some researchers who do strictly algorithmic computation, so they have a logical network of people and a social network and they run algorithms: find the shortest path between these two people, or who's the middle man between them? That area is really very hot. And then again with things like Facebook there's just so many more social networks now.
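[Editor's sketch: the shortest-path question mentioned above is commonly answered with breadth-first search when every connection counts equally. The example below is a minimal illustration under that assumption. The people, the `network` dictionary, and the function `shortest_path` are hypothetical and are not taken from Jigsaw or any real data set.]

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest chain of people from start to goal, or None.

    graph maps each person to the list of people they are directly linked to.
    Breadth-first search explores the network one hop at a time, so the first
    time it reaches the goal it has used the fewest possible hops.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # who we came from when first reaching each person
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = person
            if neighbor == goal:
                # Walk the back-pointers from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None  # the two people are not connected


# A made-up network of contacts.
network = {
    "Ana": ["Ben"],
    "Ben": ["Ana", "Cara"],
    "Cara": ["Ben", "Dev"],
    "Dev": ["Cara"],
    "Eli": [],
}

path = shortest_path(network, "Ana", "Dev")
middle_men = path[1:-1]  # everyone on the chain between the two endpoints
print(path, middle_men)
```

[Here the people strictly between the two endpoints stand in for the "middle man" question. Research tools typically use richer measures, such as betweenness centrality computed over all pairs of people.]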

Next: What Jigsaw does, how it works, and its potential impact on intelligence gathering.