24 Hours of Data
May 22, 2017 11:56 pm
Recently, Virtualitics had a flashback to our college years thanks to an all-nighter, pizza delivery, Red Bull and Twizzlers (well, maybe not the Twizzlers). We put together two teams to participate in the OpenWERX data science hackathon, which was sponsored by the United States Special Operations Command (SOCOM) and open to teams all over the world. We ended up winning Challenge 1 “Bounty Hunter” and placing second in Challenge 2 “Master Maven.”
The first challenge required us to locate a ship based on incomplete AIS and satellite data while taking into account environmental conditions in our predictions.
We combined a probability heat map of the ship's likely position with an algorithm that determined which satellites would pass over a given region at a given time, and used the result to query the Planet API for satellite images. These images were sent through a series of image processing steps: background subtraction, median filtering, Canny edge detection, segmentation, and drawing bounding boxes around the ships. When we located the ship in question, we recomputed the probabilistic trajectory approximation by adding the most recent sighting to the initial data and iterating again.
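To make the image processing steps more concrete, here is a minimal sketch in plain Python. It is not the code we ran at the hackathon: it works on a tiny grayscale grid instead of real satellite imagery, it leaves out Canny edge detection, and every function name, the global-median background estimate, and the threshold value are assumptions chosen for illustration.

```python
from collections import deque
from statistics import median


def median_filter(img, radius=1):
    """Replace each pixel with the median of its neighborhood (clipped at the edges)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = median(window)
    return out


def subtract_background(img):
    """Assume open water dominates the image: use the global median as background."""
    background = median(v for row in img for v in row)
    return [[abs(v - background) for v in row] for row in img]


def bounding_boxes(img, threshold):
    """Group above-threshold pixels into 4-connected blobs.

    Returns one (x_min, y_min, x_max, y_max) tuple per blob.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or img[y][x] <= threshold:
                continue
            queue = deque([(x, y)])
            seen[y][x] = True
            x0, y0, x1, y1 = x, y, x, y
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and not seen[ny][nx] and img[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


def detect_ships(img, threshold=50):
    """Denoise, remove the water background, then box whatever is left."""
    denoised = median_filter(img)
    foreground = subtract_background(denoised)
    return bounding_boxes(foreground, threshold)


# Toy 8x8 "ocean": water is 10, a 3x3 ship is 200, and one pixel is sensor noise.
ocean = [[10] * 8 for _ in range(8)]
for row in range(2, 5):
    for col in range(3, 6):
        ocean[row][col] = 200
ocean[0][0] = 255  # isolated noise that the median filter should remove

print(detect_ships(ocean))  # [(3, 2, 5, 4)]
```

The median filter runs first so that isolated bright pixels do not survive as false detections. It also rounds off the corners of the toy ship, but the bounding box still covers the full hull.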
Now, can we solve the mystery of the Bermuda Triangle? Maybe not. But the judges were impressed with our processes and accuracy and we placed first in this challenge!
The second challenge was to find the world’s foremost expert in materials science with a specialization in exoskeleton armor.
We viewed this as an unstructured data problem and chose to solve it using a semantic knowledge graph. What exactly is that, you ask? Well, a semantic knowledge graph is a collection of nodes, each of which represents an unstructured document, linked together by the strength of the semantic relationship between the entities in each document. We built our knowledge graph on Apache Solr, an open-source document search technology.
We used Solr to ingest over 1.4 million documents that we scraped from journal publications, patent filings, LinkedIn profiles, and Wikipedia pages. After feeding all of these documents to our knowledge graph, we were able to query the graph for candidates most related to terms like “armor”, “polymer layers”, and “materials science.” We then developed a scoring function to determine the relatedness between any two entities in the graph. Using this function, we found that Gareth McKinley was the armor expert most relevant to the challenge’s specifications. McKinley runs the Non-Newtonian fluids group at MIT and has been involved with the development of armor for the Department of Defense in the past. Perhaps the most exciting aspect of our solution is that it generalizes: by changing the query terms, we can search for the top expert in any domain very efficiently.
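For readers curious what a relatedness score between two entities might look like, here is a small sketch in plain Python. It is an illustrative stand-in and not the scoring function we built on Solr: it assumes a z-score that compares how often a candidate appears in documents matching a query term against how often chance alone would predict. The toy corpus, the `author_*` names, and all function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_index(docs):
    """Map each entity (term or author) to the set of document ids mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for entity in entities:
            index[entity].add(doc_id)
    return index


def relatedness(index, n_docs, query, candidate):
    """z-score: does `candidate` co-occur with `query` more often than chance?"""
    foreground = index.get(query, set())      # documents matching the query term
    candidate_docs = index.get(candidate, set())
    p = len(candidate_docs) / n_docs          # background rate of the candidate
    expected = len(foreground) * p
    variance = len(foreground) * p * (1 - p)
    if variance == 0:
        return 0.0                            # no signal either way
    observed = len(foreground & candidate_docs)
    return (observed - expected) / sqrt(variance)


def rank_experts(docs, authors, query_terms):
    """Sum each author's relatedness over all query terms and sort, best first."""
    index = build_index(docs)
    scores = {author: sum(relatedness(index, len(docs), term, author)
                          for term in query_terms)
              for author in authors}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical five-document corpus; each document is reduced to its entities.
docs = {
    1: {"armor", "polymer layers", "author_a"},
    2: {"armor", "materials science", "author_a"},
    3: {"materials science", "author_b"},
    4: {"robotics", "author_b"},
    5: {"robotics", "author_c"},
}
authors = ["author_a", "author_b", "author_c"]

print(rank_experts(docs, authors, ["armor", "materials science"]))
# ['author_a', 'author_b', 'author_c']
```

Because the query terms are just arguments, the same ranking function can be pointed at any domain by swapping in different terms, which is the generality described above.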
Taking home the first and second place prizes respectively in these two challenges was exhilarating, and we were thrilled to share the news with the rest of the team (the ones who actually got some shut-eye that night). It’s certainly validating to come away with the win, but spending a night working on complex data problems with a team of really smart people was a great time in and of itself. We can’t wait for the next opportunity to do it again!
Sarthak Sahu is Head of Machine Learning at Virtualitics. He holds a Bachelor of Science in Computer Science with a focus in Machine Learning from Caltech.
The other brave Virtualitics participants in the hackathon were: Michael Amori, Ciro Donalek, Richard Zhu, Anshul Ramachandran, Aakash Indurkhya, Nand Kishore, Siddharth Murching, Kshitij Grover and Chris Sanchez.
This post was written by Sarthak Sahu