Local Intelligence

Architecture2Music | Music2Architecture, a multimodal AI project

Unveiling Tacit Knowledge through AI across Architecture and Music

Local Intelligence is a research project that frames locality not merely as a geographical reference but as a carrier of tacit, culturally embedded knowledge. It explores the possibility that such knowledge—dispersed across domains like architecture and music—can be learned, interpreted, and rearticulated through multimodal artificial intelligence systems.

Why This Project?

As contemporary design workflows increasingly lean on standardized algorithms and global aesthetics, the idiosyncratic patterns of local cultures risk being flattened or overlooked. Yet, music, architecture, language, and other cultural artifacts preserve embodied memories, affective rhythms, and situated know-how. This project reclaims AI not as a neutral tool of automation, but as a medium for exploring and revealing latent cultural intelligence.

The Core Hypothesis

We hypothesize that vernacular architecture and local folklore music, when emerging from the same community and historical context, share a set of implicit structures. These may manifest in patterns of rhythm, proportion, texture, or repetition. By training AI models to translate between these domains, we investigate whether machines can learn these cross-modal resonances—not to imitate culture, but to surface its hidden logics.

Conceptual Framework

Local Intelligence conceptualizes a distributed network of local knowledge nodes—architecture, music, language, cuisine, and more. These nodes are not organized hierarchically, but rhizomatically: they are interconnected across spatial, social, and temporal dimensions. We define “local intelligence” as the capacity of these nodes to form emergent patterns when interpreted through machine learning.

In this sense, AI is positioned not only as a pattern-recognition engine but as a partner in speculative interpretation, capable of surfacing connections that exceed human perceptual limits.

The project challenges linear knowledge hierarchies (data → information → knowledge → wisdom) and instead proposes a networked epistemology mediated by machines.



The following table situates Local Intelligence within the classic layers of knowledge, comparing each concept by its process and by how it is stored and transferred in explicit versus tacit form:

Concept                 | Process     | Stored & transferred: explicit | Stored & transferred: tacit
Wisdom                  | execution   | reasoning                      | intuition
Understanding           | recognition | patterns                       | concepts
Local Intelligence      | modeling    | local presence                 | heritage
Artificial Intelligence |             | neural network                 | latent space
Knowledge               | cognition   | instruction                    | experience
Information             | description | sorting data                   | NA
Data                    | observation | sensory                        | symbolic

The Questions We Ask

  • Can locality function as a scale of analysis to uncover patterns in architectural and musical creation?

  • Is there a shared latent structure between built form and musical rhythm within the same cultural-historical milieu?

  • Can machine learning models detect and recreate this structure across domains?

  • Can we generate architectural imagery from music (music2architecture), or musical spectrograms from architectural facades (architecture2music)?

How We Did It

The Local Intelligence project employed a multimodal machine learning pipeline designed to translate between audio and visual domains, using two paired datasets from the Aegean region:

  • A dataset of vernacular architectural facades scraped with Octoparse and curated through manual filtering.

  • A corresponding dataset of folk music samples, with a focus on Rebetiko traditions, converted into mel-spectrograms via the Librosa library in Python (a minimal conversion sketch follows this list).
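
For reference, the sketch below shows one way to produce such mel-spectrogram images with Librosa. The file names, sample rate, and mel-band count are illustrative assumptions, not the project's documented settings.

```python
# Minimal sketch: converting an audio clip into a mel-spectrogram image with Librosa.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("rebetiko_sample.wav", sr=22050)       # load and resample the audio
S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # 128-band mel-spectrogram
S_db = librosa.power_to_db(S, ref=np.max)                   # power -> decibel scale

fig, ax = plt.subplots(figsize=(4, 4))
librosa.display.specshow(S_db, sr=sr, ax=ax)                # draw without axes, for use as a GAN image
ax.set_axis_off()
fig.savefig("rebetiko_sample.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```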


These two datasets formed the foundation for training image-to-image translation models based on Generative Adversarial Networks (GANs). Two different GAN architectures were used:

  1. Pix2Pix (Isola et al., 2017) – a conditional GAN requiring paired training data. It was used to translate spectrograms into architectural images and vice versa (a sketch of its training objective follows this list).

  2. CycleGAN (Zhu et al., 2017) – an unpaired GAN model which enabled more flexible domain translation, especially where direct data pairing was not available.
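
For orientation, here is a minimal PyTorch sketch of the Pix2Pix objective described by Isola et al. (2017): a conditional adversarial loss plus an L1 reconstruction term. The generator G, discriminator D, and optimizers are assumed to be defined elsewhere (e.g., a U-Net generator and a PatchGAN discriminator); this is not the project's exact training code.

```python
import torch
import torch.nn as nn

def pix2pix_step(G, D, opt_G, opt_D, spec, facade, lambda_l1=100.0):
    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    fake = G(spec)                                           # spectrogram -> facade

    # Discriminator update: real (spec, facade) pairs vs. generated pairs.
    d_real = D(torch.cat([spec, facade], dim=1))
    d_fake = D(torch.cat([spec, fake.detach()], dim=1))
    loss_D = 0.5 * (bce(d_real, torch.ones_like(d_real)) +
                    bce(d_fake, torch.zeros_like(d_fake)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Generator update: fool D while staying close to the ground truth (L1).
    d_fake = D(torch.cat([spec, fake], dim=1))
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, facade)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()
```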


To increase model diversity and representation accuracy, we ran comparative tests with different activation functions—notably the traditional Leaky ReLU versus the Mish activation function. While Leaky ReLU provided better image realism, Mish delivered more recognizable local features (e.g., exedras, overhangs).
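
As an illustration, a downsampling block in which the two activations can be swapped might look like the following. PyTorch ships both activations; Mish(x) = x · tanh(softplus(x)). The 4×4 stride-2 convolution and the 0.2 negative slope are common GAN defaults, assumed here rather than taken from the project.

```python
import torch.nn as nn

def down_block(in_ch, out_ch, use_mish=False):
    act = nn.Mish() if use_mish else nn.LeakyReLU(0.2)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        act,
    )
```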

To evaluate the generated outputs, we implemented two key image similarity metrics:

  • DeepAI’s perceptual similarity score, comparing learned feature vectors across images.

  • Perceptual Hashing (pHash), measuring how visually close the generated images were to the ground-truth dataset based on facade features (see the pHash sketch after this list).
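
A minimal pHash comparison, assuming the widely used imagehash library (the project's exact implementation is not specified), could look like this:

```python
from PIL import Image
import imagehash

generated = imagehash.phash(Image.open("generated_facade.png"))
reference = imagehash.phash(Image.open("ground_truth_facade.png"))

distance = generated - reference                 # Hamming distance between 64-bit hashes
similarity = 1 - distance / generated.hash.size  # normalize to [0, 1]; 1 = visually identical
print(f"pHash distance: {distance}, similarity: {similarity:.2f}")
```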


In the music2architecture direction, generated architectural images displayed structural cues aligned with the rhythm and harmonic complexity of the music input. Conversely, in the architecture2music task, spectrogram outputs retained tonal envelopes comparable to their architectural counterparts—though with reduced fidelity due to the nature of inverse translation.

Additionally, a diffusion model was employed to post-process the GAN outputs and enhance the legibility of facade features. The prompt “Aegean architecture” was used consistently to avoid semantic bias.
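
Such a post-processing pass could be implemented, for example, with an image-to-image diffusion pipeline. The sketch below uses the Hugging Face diffusers library, a Stable Diffusion checkpoint, and a low denoising strength as assumptions; the project does not specify its diffusion setup.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

gan_output = Image.open("gan_facade.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="Aegean architecture",  # the fixed prompt used to avoid semantic bias
    image=gan_output,
    strength=0.4,                  # low strength: refine details without overwriting the GAN layout
).images[0]
result.save("refined_facade.png")
```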

Publications

Get in touch to transform artificial intelligence methods and techniques into real-life applications.

United Methods of Artificial Intelligence Lab
