Earth Embeddings and Geospatial Representation Learning

Learning general-purpose representations that capture spatial, temporal, and contextual signals across Earth.

Embeddings, explained simply

An embedding is a short list of numbers that acts like a fingerprint for something complex (an image, text, or a place). Similar fingerprints mean similar content, which enables fast search, grouping, and prediction.
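
To make "similar fingerprints mean similar content" concrete, here is a minimal sketch of similarity search over embeddings using cosine similarity. The place names and the 4-dimensional vectors are invented for illustration; real Earth embeddings come from a trained model and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, catalog):
    """Return catalog names ranked from most to least similar to the query vector."""
    return sorted(
        catalog,
        key=lambda name: cosine_similarity(query, catalog[name]),
        reverse=True,
    )


# Hypothetical embeddings, made up for this example only.
places = {
    "rainforest_a": [0.9, 0.1, 0.0, 0.2],
    "rainforest_b": [0.8, 0.2, 0.1, 0.3],
    "desert": [0.1, 0.9, 0.3, 0.0],
    "city": [0.0, 0.2, 0.9, 0.4],
}

# "Find places like rainforest_a": the query itself ranks first, its look-alike second.
ranking = most_similar(places["rainforest_a"], places)
print(ranking)
```

The same ranking step underlies grouping (cluster vectors that are close together) and prediction (fit a small model on top of the vectors).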

Key papers and main takeaways

Earth Embeddings: Towards AI-centric Representations of our Planet

Introduces Earth embeddings as an AI-native representation layer for geospatial data (a reusable “location representation” across tasks).
Frames embeddings as a bridge between databases (retrieval/indexing) and models (generalization/interpolation) across modalities and scales.
Outlines a community roadmap: standardized embedding products, evaluation, and tooling to make geospatial ML more reusable and comparable.

SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Proposes contrastive pretraining that matches satellite image features with their geographic coordinates to learn a location encoder.
Produces general-purpose location embeddings that transfer across many downstream tasks and improve geographic generalization.
Shows that geolocalized EO imagery can act as scalable supervision for learning “place representations” without dense labels.
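
The contrastive idea above can be sketched as a CLIP-style symmetric loss: each image embedding should score highest against the embedding of its own location and lower against every other location in the batch, and vice versa. This is a simplified illustration, not the paper's implementation; the function names, the temperature value, and the 2-dimensional toy vectors are assumptions, and the encoders that would produce the embeddings are omitted.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_prob_of_match(logits, i):
    """Log-softmax probability that row i selects its own column i."""
    row = logits[i]
    m = max(row)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return row[i] - log_sum


def contrastive_loss(image_emb, location_emb, temperature=0.1):
    """Symmetric cross-entropy over the image-by-location similarity matrix.

    Row i of image_emb and row i of location_emb are assumed to describe the same place.
    """
    imgs = [normalize(v) for v in image_emb]
    locs = [normalize(v) for v in location_emb]
    n = len(imgs)
    logits = [
        [sum(a * b for a, b in zip(imgs[i], locs[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    transposed = [list(col) for col in zip(*logits)]
    image_to_location = -sum(log_prob_of_match(logits, i) for i in range(n)) / n
    location_to_image = -sum(log_prob_of_match(transposed, i) for i in range(n)) / n
    return 0.5 * (image_to_location + location_to_image)


# Toy batch of three places (values invented for illustration).
image_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
matching_locations = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Rotating the list pairs every image with the wrong location.
mismatched_locations = matching_locations[1:] + matching_locations[:1]

loss_matching = contrastive_loss(image_embeddings, matching_locations)
loss_mismatched = contrastive_loss(image_embeddings, mismatched_locations)
print(loss_matching, loss_mismatched)
```

Training would adjust both encoders to push the loss toward the low value seen for correctly paired rows; no dense labels are needed because the pairing itself is the supervision.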

Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks

Highlights why global location encoding is tricky: naïve coordinate embeddings can create artifacts on the sphere (notably near the poles).
Introduces a principled global encoder combining spherical harmonics (sphere-native basis) with sinusoidal representation networks (SIREN).
Demonstrates strong performance across benchmarks, motivating INRs/location encoders as a foundation for global Earth representations.
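
A small sketch of the artifact and of why a sphere-native basis avoids it. Rescaled latitude/longitude treats the north pole at two different longitudes as two distinct inputs, and splits neighbors across the ±180° meridian. The degree-0 and degree-1 real spherical harmonics, which are a constant plus scaled x, y, z coordinates on the unit sphere, do not. The function names are hypothetical, only the lowest degrees are shown, and the SIREN network that would consume these features is omitted.

```python
import math


def naive_encoding(lat_deg, lon_deg):
    """Rescale latitude and longitude to [-1, 1] and use them directly as features."""
    return [lat_deg / 90.0, lon_deg / 180.0]


def low_degree_harmonics(lat_deg, lon_deg):
    """Real spherical harmonics of degree 0 and 1 evaluated at a point on the sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Cartesian coordinates of the point on the unit sphere.
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    c0 = 0.5 / math.sqrt(math.pi)          # normalization of the degree-0 harmonic
    c1 = math.sqrt(3.0 / (4.0 * math.pi))  # normalization of the degree-1 harmonics
    return [c0, c1 * y, c1 * z, c1 * x]


# The north pole is one physical point regardless of longitude.
pole_naive = math.dist(naive_encoding(90.0, 0.0), naive_encoding(90.0, 120.0))
pole_harmonic = math.dist(low_degree_harmonics(90.0, 0.0), low_degree_harmonics(90.0, 120.0))

# Two equatorial points 0.2 degrees apart, on either side of the antimeridian.
seam_naive = math.dist(naive_encoding(0.0, 179.9), naive_encoding(0.0, -179.9))
seam_harmonic = math.dist(low_degree_harmonics(0.0, 179.9), low_degree_harmonics(0.0, -179.9))

print(pole_naive, pole_harmonic)
print(seam_naive, seam_harmonic)
```

In both cases the naive features place nearly identical locations far apart, while the harmonic features keep them close. Higher-degree harmonics add finer spatial detail in the same sphere-consistent way.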

Measuring the Intrinsic Dimension of Earth Representations

Studies intrinsic dimension as a label-free lens on “how much information” Earth representations actually use (vs. their ambient vector size).
Finds intrinsic dimension is often much smaller than the embedding size and varies with resolution and training modality.
Shows intrinsic dimension can correlate with downstream performance and reveal spatial artifacts, supporting diagnostics and model selection.
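
To illustrate the gap between ambient size and intrinsic dimension, the sketch below applies the maximum-likelihood form of the TwoNN estimator, which uses only the ratio of each point's second- to first-nearest-neighbor distance. The choice of TwoNN is an assumption made for illustration and is not necessarily the estimator used in the paper; the synthetic "embeddings" are 8-dimensional vectors generated from just two free parameters.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """Estimate intrinsic dimension from second-to-first nearest-neighbor distance ratios.

    Under the TwoNN model, log(r2 / r1) is exponentially distributed with rate d,
    so the maximum-likelihood estimate is d = N / sum(log(r2 / r1)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, which would give an undefined ratio
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def toy_embedding(u, v):
    """Map two free parameters to an 8-dimensional vector (a flat 2-D sheet in 8-D)."""
    return [u, v, u + v, u - v, 2.0 * u, 0.5 * v, u + 2.0 * v, 0.0]


rng = random.Random(0)  # fixed seed so the run is reproducible
points = [toy_embedding(rng.random(), rng.random()) for _ in range(300)]

ambient_dimension = len(points[0])
estimated_dimension = two_nn_intrinsic_dimension(points)
print(ambient_dimension, estimated_dimension)
```

The estimate should land near 2 even though each vector has 8 entries, and it requires no labels. The brute-force neighbor search here is quadratic in the number of points, so large embedding sets would need a spatial index or subsampling.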

Where we are heading

We build Earth embeddings to enable:

global retrieval (“find places like this”),
robust transfer across regions and sensors,
multimodal fusion (EO, climate, maps, text),
and interpretable representations with diagnostics that support scientific trust and adoption.