Learning a Dynamic Map of Visual Appearance

[Figure: CNN architecture]


The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Every day, billions of images capture this complex relationship, and many of them carry precise time and location metadata. We propose to use these images to construct a global-scale, dynamic map of visual appearance attributes. Such a map enables fine-grained understanding of the expected appearance at any geographic location and time. Our approach integrates dense overhead imagery with location and time metadata into a general framework capable of mapping a wide variety of visual attributes. A key feature of our approach is that it requires no manual data annotation. We demonstrate how this approach can support various applications, including image-driven mapping, image geolocalization, and metadata verification.
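As a rough illustration of what "location and time metadata" could look like as model input, the sketch below turns a latitude, longitude, and capture time into a small numeric vector. The function name `encode_context`, the scaling of coordinates, and the sine/cosine encoding of hour and month are all assumptions made for this example; the text above does not say how the framework actually encodes its metadata.

```python
import math
from datetime import datetime


def encode_context(lat_deg: float, lon_deg: float, when: datetime) -> list[float]:
    """Encode location and capture time as a fixed-length feature vector.

    Hypothetical encoding, not taken from the paper:
    - latitude and longitude are scaled to [-1, 1];
    - hour of day and month of year are mapped to (sin, cos) pairs so that
      23:00 and 00:00, or December and January, end up close together.
    """
    hour_fraction = (when.hour + when.minute / 60.0) / 24.0
    month_fraction = (when.month - 1) / 12.0
    hour_angle = 2.0 * math.pi * hour_fraction
    month_angle = 2.0 * math.pi * month_fraction
    return [
        lat_deg / 90.0,
        lon_deg / 180.0,
        math.sin(hour_angle),
        math.cos(hour_angle),
        math.sin(month_angle),
        math.cos(month_angle),
    ]


if __name__ == "__main__":
    # Example: a photo taken at 06:00 on 1 January at 45 N, 90 W.
    print(encode_context(45.0, -90.0, datetime(2020, 1, 1, 6, 0)))
```

A vector like this could be concatenated with features extracted from the overhead image, which is one plausible way to condition a prediction on both place and time.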



Related Papers

Cross-View Time (CVT) Dataset:

[Figure: CVT dataset]

The dataset contains 305,011 ground-level images, each with its capture time and geolocation. Every ground-level image is paired with an orthorectified overhead image centered on its geographic location.
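To make the pairing concrete, here is a minimal sketch of how one dataset record might be represented in code. The class name `CVTSample`, its field names, and the file paths are hypothetical; the actual file layout and metadata format of the released dataset are not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CVTSample:
    """One hypothetical record: a ground-level photo and its overhead image."""

    ground_image_path: str    # ground-level photo (hypothetical path field)
    overhead_image_path: str  # orthorectified overhead image centered on the photo location
    latitude: float           # degrees, positive north
    longitude: float          # degrees, positive east
    captured_at: datetime     # capture time of the ground-level photo

    def __post_init__(self) -> None:
        # Basic sanity checks on the geolocation.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


if __name__ == "__main__":
    sample = CVTSample(
        ground_image_path="ground/000001.jpg",      # illustrative path only
        overhead_image_path="overhead/000001.jpg",  # illustrative path only
        latitude=38.03,
        longitude=-84.50,
        captured_at=datetime(2015, 7, 4, 14, 30, tzinfo=timezone.utc),
    )
    print(sample)
```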

Please contact us by email to receive access to the dataset.