AI-Ready: Making petabytes of data more discoverable and usable
NASA embraces open science. IMPACT works to enable open data for NASA tools such as Worldview, which gives users access to over 450 terabytes of satellite imagery. Open data is critical to research. Before embarking on a scientific study of a particular phenomenon, such as wildfires, scientists need to collect numerous examples of that phenomenon. Locating these examples means searching through 197 million square miles of satellite imagery for each day in a record spanning more than 20 years. Such an effort can produce a valuable trove of data, but searching the data manually is cumbersome and laborious. Making large amounts of data more discoverable and usable for specific parameter extraction is a hard problem.
A question such as “Can we use new techniques, such as self-supervised learning, to tackle our data discovery problem?” contains a number of hidden questions:
• Can we find a needle in a haystack?
• Can we teach a machine to search fine-grained data without labels?
• Can we get artificial intelligence (AI) to present examples to a human when it gets confused?
• Can we scale up the search from gigabytes to terabytes to petabytes?
• Can we learn to represent rare events?
• Can we create tools that make it simple to ingest the data?
• Can we teach AI to focus on the interesting parts?
• Can we search several years of data covering the entire planet in under a second?
To tackle these questions, IMPACT embraced an open science approach and partnered with the SpaceML initiative, an international AI accelerator for citizen scientists and a branch of Frontier Development Lab in partnership with NASA, the SETI Institute, and Trillium Technologies Inc. SpaceML engages early-career research engineers and connects them with mentors who are senior machine learning and software engineering experts. Current participants range from high school graduates and graduate students to industry professionals, and include contributors from non-traditional computer science academic backgrounds, among them two high school teachers transitioning to careers in data science.