Foundation Models for Science
Foundation models for science represent a potential paradigm shift in machine learning: from single-purpose, supervised pipelines to general-purpose, self-supervised models trained at enormous scale.
Foundation models can:
- act as a queryable knowledge store over large volumes of data;
- democratize the use of AI: enabling faster roll-out of capabilities such as segmentation, classification, object detection, change detection, and regression, accessible even to people without coding skills (see the sketch after this list);
- generalize well: maintaining high performance across distinct geographic areas and between time periods; and
- be generative: producing predictions and synthetic data, and modelling phenomena.
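A minimal sketch of what that faster roll-out can look like in practice is a "linear probe": a small classifier trained on frozen, precomputed embeddings, so a new downstream task needs neither GPUs nor deep-learning expertise. The embeddings, labels, and dimensions below are random stand-ins, not outputs of any Trillium model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for embeddings precomputed by a frozen foundation model:
# 1,000 samples, each a 256-dimensional vector, with binary labels.
embeddings = rng.normal(size=(1000, 256))
labels = rng.integers(0, 2, size=1000)

# Train a lightweight classifier on the embeddings; the foundation
# model itself is never retrained.
probe = LogisticRegression(max_iter=1000).fit(embeddings[:800], labels[:800])
print("held-out accuracy:", probe.score(embeddings[800:], labels[800:]))
```

Because only the small probe is trained, adding a new classification or regression task becomes a matter of minutes on a laptop rather than a full model-training effort.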
Foundation models translate the important information from multiple sensors and simulations into lower-dimensional 'embeddings' that are much easier to query, to use for generating new insight-level data, or to answer questions directly.
However, the journey from raw data to embeddings involves trade-offs, so building these models requires science-led thinking about how to make them genuinely useful tools for discovery and analysis.
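As a concrete illustration of why embeddings are easier to query, the sketch below reduces "search the archive" to a cosine-similarity lookup over embedding vectors. The archive, query, and dimensions are random placeholders; in a real pipeline they would come from the model's encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for an archive of 10,000 observations already
# encoded as 256-dimensional, L2-normalized embedding vectors.
archive = rng.normal(size=(10_000, 256))
archive /= np.linalg.norm(archive, axis=1, keepdims=True)

# A query observation, encoded the same way.
query = rng.normal(size=256)
query /= np.linalg.norm(query)

# Cosine similarity against the whole archive; report the top 5 matches.
scores = archive @ query
top5 = np.argsort(scores)[::-1][:5]
print("most similar archive items:", top5)
```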
Trillium is currently developing two foundation models:
SDO-FM (in partnership with NASA and Google Cloud) is a multi-modal foundation model of our Sun that aggregates solar magnetic-field data and multi-band images of the Sun's atmosphere into a shared embedding space.
SDO-FM uses the SDOML v2 analysis-ready data product. A live inference stream can be viewed here: http://sdomldemo.org/
An overview of SDOML V2 can be viewed here:
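SDO-FM's actual architecture is not described here, so the following is only a generic sketch of the idea of aggregating two modalities into one embedding space: each modality gets its own toy encoder, and the resulting vectors are fused. All shapes, weights, and the encode() helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, proj):
    """Toy 'encoder': flatten the input and linearly project it."""
    z = x.reshape(-1) @ proj
    return z / np.linalg.norm(z)

d = 128                                    # joint embedding dimension
magnetogram = rng.normal(size=(64, 64))    # stand-in magnetic field map
euv_stack = rng.normal(size=(8, 64, 64))   # stand-in multi-band image stack

w_mag = rng.normal(size=(64 * 64, d))      # per-modality projection weights
w_euv = rng.normal(size=(8 * 64 * 64, d))

# Fuse the per-modality embeddings into a single joint vector.
embedding = (encode(magnetogram, w_mag) + encode(euv_stack, w_euv)) / 2
print(embedding.shape)  # (128,)
```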
SAR-FM (in partnership with the LSA, LuxProvide, and ESA) is a foundation model trained on synthetic aperture radar (SAR) data to generalize across geographic regions for a broad range of downstream applications.
SAR-FM won the best paper award at the Climate Change AI workshop at NeurIPS 2023. The paper can be viewed here: https://www.climatechange.ai/papers/neurips2023/76