Differentiating a port from a shipyard is a new kind of problem for AI
It’s well known that satellites and other intelligence, surveillance and reconnaissance platforms collect more data than humans could ever analyze.
To tackle this problem, the Intelligence Advanced Research Projects Activity, or IARPA, conducted the Functional Map of the World (fMoW) TopCoder challenge from July 2017 through February 2018, inviting researchers in industry and academia to develop deep learning algorithms capable of scanning and identifying different classes of objects in satellite imagery. IARPA curated a dataset of 1 million annotated, high-resolution satellite images, aggregated using automated algorithms and crowdsourced images, for competitors to train their algorithms to classify objects into 63 classes, such as airports, schools, oil wells, shipyards and ports.
Researchers powered their deep learning algorithms by pairing large neural networks, known as convolutional neural networks (CNNs), with computers that have large amounts of processing power. The result was a network that, when fed massive amounts of training data, can learn to identify and classify various objects in satellite imagery. By combining a number of these networks into what is called an ensemble, the algorithm can weigh the results from each CNN to produce a final result that is more robust than any single CNN could deliver.
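For readers who want a concrete picture of how an ensemble works, here is a minimal sketch in Python using TensorFlow/Keras. It is not the Lockheed Martin team’s actual code; the network architecture, image size and function names are illustrative assumptions. The idea is simply to average the class probabilities produced by several CNNs and pick the most likely of the 63 fMoW classes.

    # Minimal sketch of CNN ensembling; not the competition code.
    # Assumes preprocessed satellite image chips of shape (224, 224, 3).
    import numpy as np
    import tensorflow as tf

    NUM_CLASSES = 63  # fMoW object classes (airports, shipyards, ports, ...)

    def build_cnn(input_shape=(224, 224, 3)):
        """One member of the ensemble: a small convolutional classifier."""
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Conv2D(64, 3, activation="relu"),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
        ])

    def ensemble_predict(models, images):
        """Average the softmax outputs of each CNN and take the most likely class."""
        probs = np.mean([m.predict(images, verbose=0) for m in models], axis=0)
        return probs.argmax(axis=1)  # one predicted class index per image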
This is how a team from Lockheed Martin, led by Mark Pritt, designed its deep learning algorithm for the challenge. Pritt explained to C4ISRNET that he and his team developed their CNN using machine learning software and frameworks from open source software libraries, such as TensorFlow. Earning a top-five finish, the algorithm designed by Pritt’s team achieved a total accuracy of 83 percent and was able to classify 100 objects per second. Pritt said that with a fully functioning algorithm, this software could take an image recognition task that takes a human an hour to complete and reduce the process to a few seconds.
The team’s algorithm excelled at identifying classes with distinctive features, successfully matching nuclear power plants, tunnel openings, runways, toll booths and wind farms with accuracies greater than 95 percent, but it struggled with less distinctive classes such as shipyards and ports, hospitals, office buildings and police stations.
“Usually when you develop an algorithm it’s nice to see where it succeeds, but you actually learn the most when you look at where the algorithm fails or doesn’t do well,” Pritt said. In trying to decipher why the algorithms struggled, Pritt said the competitors suggested that some objects simply don’t have any distinguishing features, from the point of view of a satellite image, for the algorithms to recognize.
“Maybe the most important ingredient you need for these new types of algorithm to work is the dataset because these algorithms require a great amount of data to train on,” Pritt explained. “It’s kind of analogous to the way a human will learn in childhood how to recognize things. You need lots of examples of what those things are and then you can start to generalize and make your own judgments,” he said.
But even with large amounts of correctly labeled training data, it is possible that today’s deep learning technology cannot reach the higher levels of intelligence needed to recognize nuanced differences. For example, Lockheed Martin’s algorithm confused shipyards and ports 56 percent of the time. Pritt said that people “look at an image and they can tell that it’s a port or a shipyard, they are usually looking at very subtle things such as if there is a ship in dry dock or if there is a certain type of crane present. They are looking for details in the image that are maybe higher level or more complicated than what these deep learning algorithms can do right now.”
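As an illustration of what a figure like that means in practice, the short Python sketch below computes a pairwise confusion rate from a model’s predictions. The class labels and numbers here are made up for the example; this is not the team’s evaluation code.

    # Toy illustration of a pairwise confusion rate, e.g. how often true
    # shipyards were labeled as ports. Data below is entirely hypothetical.
    import numpy as np

    def confusion_rate(y_true, y_pred, class_a, class_b):
        """Fraction of true class_a examples that the model labeled class_b."""
        mask = (y_true == class_a)
        return float(np.mean(y_pred[mask] == class_b)) if mask.any() else 0.0

    # Hypothetical example: class 0 = shipyard, class 1 = port
    y_true = np.array([0, 0, 0, 0, 1, 1])
    y_pred = np.array([1, 1, 0, 1, 1, 0])
    print(confusion_rate(y_true, y_pred, class_a=0, class_b=1))  # 0.75 on this toy data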
However, the fact that these algorithms cannot do everything should not overshadow the significant contribution they could make to the defense and intelligence communities.
Hakjae Kim, IARPA’s program manager for the fMoW challenge, said the benefits of this technology could extend far beyond faster image processing. “I want to look at it more in the perspective that we can do things we weren’t able to do before,” Kim said. “Because [of this] technology we are now able to do x, y and z, there are more applications you can create, because with human power [alone] it was just impossible to do before.”
Both Kim and Pritt stressed the importance of managing expectations for CNN-based artificial intelligence.
“This is a real technology that will work, but it also has limitations. I don’t want to express this technology as a magic box that will just solve everything magically,” Kim said. “I don’t want the users in the field to get disappointed by the initial delivery of this technology and say, ‘Oh, this is another technology that was oversold and this is not something we can use,’” he added.
Part of managing expectations for AI is recognizing that, although intelligence is in the name, this technology does not think and reason like humans. “A lot of the time, because we use the term AI, we tend to think these algorithms are like us, that they are intelligent like us,” Pritt said. “And in some ways they seem to mimic our intelligence, but when they fail we realize, ‘Oh, this algorithm doesn’t really know anything, [it] doesn’t have any common sense.’”
So how are IARPA and Lockheed Martin working to improve their algorithms? For IARPA, Kim’s team is working on updating and maintaining the dataset to ensure algorithms have the most up-to-date information to train on, ultimately making the CNN-based algorithms easier to trust. “[S]ubtle changes in the area mess up the brains of the system, and that system will give you a totally wrong answer,” Kim explained. “So we have planned to continuously look over the area and make sure the algorithm we are developing and reassessing for the government to test and use [is] robust enough for their application,” he added.
Work is also underway at American universities. Kim described how a team of researchers at Boston University is using the fMoW dataset and the algorithms tested in the challenge to create heat maps that visualize which parts of an image the algorithms rely on to classify objects. They have found that sometimes it is not the object itself, but clues surrounding the object, that aid most in classification. For example, a “windmill that actually shows a shadow gives a really good indicator of what that object is,” Kim said. “Shadows show a better view of the object. A shadow is casting the side view of the object over on the ground, so [BU’s heat map algorithm] actually points out the shadow is really important and the key feature to make the object identified as a windmill.”
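One simple way to build such a heat map, sketched below in Python, is occlusion analysis: slide a gray patch across the image and record how much the model’s confidence in the predicted class drops at each position. This is only an illustrative sketch under assumed parameters, not the Boston University team’s actual method.

    # Illustrative occlusion heat map; not BU's actual technique.
    # Regions whose occlusion causes a large confidence drop (for instance a
    # windmill's shadow) are the ones the CNN relies on for its classification.
    import numpy as np

    def occlusion_heatmap(model, image, class_idx, patch=32, stride=16):
        h, w, _ = image.shape
        base = model.predict(image[None], verbose=0)[0, class_idx]
        heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
        for i, y in enumerate(range(0, h - patch + 1, stride)):
            for j, x in enumerate(range(0, w - patch + 1, stride)):
                occluded = image.copy()
                occluded[y:y + patch, x:x + patch] = 0.5  # neutral gray patch
                prob = model.predict(occluded[None], verbose=0)[0, class_idx]
                heat[i, j] = base - prob  # big drop => region mattered
        return heat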
But don’t expect these algorithms to take away the jobs of analysts any time soon. “I think you still need a human doing the important judgments and kind of higher level thinking,” Pritt said. “I don’t think AI will take away our jobs and replace humans, but I think what we have to do is figure out how to use them as a tool and how to use them efficiently, and that of course requires understanding what they do well and what they do poorly,” he concluded.