Category: Combinatorial Magic

Comparison between Google S2R, Pribor’s Combinatorial Magic and Pribor’s CHE (Contextual Hyper-Embedding)

This document presents the characteristics, divergences and synergies between three approaches: Google S2R, Pribor’s Combinatorial Magic and Pribor’s CHE (Contextual Hyper-Embedding).

1. Google’s S2R *

“S2R” stands for Speech-to-Retrieval. It is a recent voice-search architecture, now being deployed by Google, that skips the explicit speech-to-text transcription step and instead tries to match the spoken audio directly to the information sought. The model relies on a dual encoder: one encoder processes the audio, the other processes the candidate texts, and their vector representations are brought close together in a shared semantic space.
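The retrieval step can be roughly illustrated with the sketch below. It assumes the two encoders have already produced embeddings in the shared space, and it simply ranks candidate texts by cosine similarity to the audio embedding. The vectors, the candidate strings and the helper names `cosine` and `retrieve` are hypothetical. The actual S2R encoders, embedding sizes and scoring function are not described here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(audio_vec, text_vecs):
    """Rank candidate texts by similarity to the audio embedding; return the best one."""
    scores = {text: cosine(audio_vec, vec) for text, vec in text_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical encoder outputs in a toy 3-D shared space.
audio_embedding = [0.9, 0.1, 0.0]  # audio encoder: the spoken query
text_embeddings = {                # text encoder: candidate documents
    "opening hours of the Louvre": [0.8, 0.2, 0.1],
    "weather forecast for Lyon": [0.0, 0.3, 0.9],
}

best, scores = retrieve(audio_embedding, text_embeddings)
print(best)  # opening hours of the Louvre
```

No transcript is produced anywhere in this flow. The query audio and the candidate texts meet only as vectors.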

2. Pribor’s Combinatorial Magic

Combinatorial Magic is a bijective, lossless, fixed-dimension encoding of simple sentences into 4D or 5D vectors: three symbolic components (Subject, Verb, Object) plus an 8- or 16-bit “meta” register. It is distinguished by O(1) complexity, the complete absence of information loss, and full interpretability.
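Below is a minimal sketch of the bijection. It assumes each symbolic component is an index into a fixed symbol table, as the comparative table's “symbolic indices + meta uint8” entry suggests, and that the meta register is 8 bits wide. The tables `SUBJECTS`, `VERBS` and `OBJECTS` and the class `CMVector` are hypothetical. Only sentences built from known symbols can be encoded.

```python
from dataclasses import dataclass

# Hypothetical symbol tables; the real vocabularies are not given in the text.
SUBJECTS = ["Alice", "Bob"]
VERBS = ["give", "read"]
OBJECTS = ["book", "letter"]

# Reverse tables (symbol -> index), so each lookup is O(1).
SUBJECT_IDX = {w: i for i, w in enumerate(SUBJECTS)}
VERB_IDX = {w: i for i, w in enumerate(VERBS)}
OBJECT_IDX = {w: i for i, w in enumerate(OBJECTS)}


@dataclass(frozen=True)
class CMVector:
    subject: int
    verb: int
    obj: int
    meta: int  # 8-bit "meta" register, 0..255


def encode(subject, verb, obj, meta=0):
    """Map an (S, V, O, meta) proposition to a 4-D integer vector."""
    if not 0 <= meta <= 0xFF:
        raise ValueError("meta must fit in 8 bits")
    return CMVector(SUBJECT_IDX[subject], VERB_IDX[verb], OBJECT_IDX[obj], meta)


def decode(vec):
    """Inverse of encode: recover the exact symbols and meta bits."""
    return SUBJECTS[vec.subject], VERBS[vec.verb], OBJECTS[vec.obj], vec.meta


v = encode("Alice", "give", "book", meta=0b00110001)
print(v)          # CMVector(subject=0, verb=0, obj=0, meta=49)
print(decode(v))  # ('Alice', 'give', 'book', 49)
```

Both directions are constant-time table lookups. Within the vocabulary, `decode(encode(...))` returns exactly what went in.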

3. CHE (Contextual Hyper-Embedding uint8)

CHE is an extremely economical contextual encoding approach that represents each token by a single uint8 integer. Unlike the floating-point attention of Transformers, it uses no matrices and no softmax, reducing energy consumption by a factor of up to 5,000.

4. Comparative Table

Feature	S2R (Google)	Combinatorial Magic	CHE
Data type	float16 / float32	symbolic indices + meta uint8	uint8
Dimension	512–4096D	4D / 5D	1 byte/token
Complexity	O(n²)	O(1)	O(n) linear
Information loss	lossy	none	bounded / quantized
Energy efficiency	low	extreme	extreme (×500–5000)
Interpretability	low	total	medium

5. Synergies and Integration

The three approaches can be integrated into a hybrid architecture: S2R provides the global semantic geometry, CHE ensures contextual efficiency through uint8 quantisation, and Combinatorial Magic formalises symbolic propositions without loss. This combination gives rise to a family of S2R–CHE–CM models combining semantic generalisation, energy frugality and complete interpretability.

* Ehsan Variani and Michael Riley, Research Scientists, Google Research, “Speech-to-Retrieval (S2R): A new approach to voice search”, October 7, 2025

October 24, 2025

PRIBOR: CHE (Contextual Hyper-Embedding uint8)

CHE (Contextual Hyper-Embedding uint8) is more economical than the classic attention mechanism of LLMs. Similar processes are already in use, but they are less economical than CHE.

————————————————–

1. Memory savings

• Standard attention: float16/float32 matrices → 700 to 4,000 bits per token

• CHE uint8 → 8 bits per token

→ memory gain of ×500 to ×5,000

————————————————–

2. Similar processes already in use

• INT-FlashAttention (Peking University, 2024): fully INT8 attention, 72% faster, 82% lower quantization error

• SageAttention (OpenReview, 2024): INT8 attention with smoothing, plug-and-play

• LLM.int8() (NeurIPS 2022): 8-bit matrix multiplication, with the rare outlier features kept in 16-bit

→ 8-bit integers are already standard in quantized attention.

————————————————–

3. Compatibility with CHE

• CHE = compressed uint8 (SHA-256[0:8]) → 8 bits per token

• No 700×700 matrix, no softmax, no floats;

• Just one uint8 in the ℝ⁴ triplet;

→ More economical, and the 8-bit representation is already used in quantized attention.
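One possible reading of “SHA-256[0:8]” is the first 8 bits (one byte) of the SHA-256 digest of each token. The sketch below assumes that reading and uses a simple whitespace tokenizer. The functions `che_code` and `che_encode` are hypothetical names. With only 256 possible values, distinct tokens can end up with the same code.

```python
import hashlib


def che_code(token: str) -> int:
    """One uint8 per token: the first byte (8 bits) of SHA-256(token).

    Assumes "SHA-256[0:8]" means the first 8 bits of the digest.
    """
    return hashlib.sha256(token.encode("utf-8")).digest()[0]


def che_encode(text: str) -> bytes:
    """Encode a text as one byte per whitespace-separated token."""
    return bytes(che_code(tok) for tok in text.split())


codes = che_encode("Alice gives her book to Bob")
print(len(codes), list(codes))  # 6 tokens -> 6 bytes
```

The mapping is deterministic and needs no floating-point arithmetic. Storage is exactly one byte per token.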

Contact: pauljorion@pribor.ai

October 19, 2025

“Combinatorial Magic” Logic – Proof of Concept

Lossless 4-scalar encoding · 175× memory reduction · single-cycle decoding

Claim: any simple sentence can be encoded losslessly in 4 scalars (three UTF-8 strings of at most 16 bytes each plus one uint8) while preserving the agent / patient / possessor roles, Aristotle's 10 categories and his 4 causes.

1. Definition of the 4-D vector

Dim	Type	Max. length	Semantics
0	UTF-8 string	16 B	Agent (initiator)
1	UTF-8 string	16 B	Predicate root (action)
2	UTF-8 string	16 B	Patient (entity undergoing the action)
3	uint8	1 B	Bitmap: possessor (2 bits) + 4 causes + 2 reserved bits

Total = 128 bits (16 bytes) for the example below, so four vectors fill a 64-byte cache line exactly, with no zero-padding waste. Even at the maximum string lengths (3 × 16 + 1 = 49 bytes), a vector still fits within a single cache line.
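The layout above can be sketched with Python's `struct` module, assuming each string occupies a fixed, zero-padded 16-byte slot. Under that assumption a vector takes 3 × 16 + 1 = 49 bytes, which fits in one 64-byte cache line. Reaching the 16-byte (128-bit) total quoted above would require a tighter, variable-length layout, which is not specified here. `pack_vector` and `unpack_vector` are hypothetical helpers.

```python
import struct

SLOT = 16                              # max UTF-8 bytes per string field
LAYOUT = struct.Struct("<16s16s16sB")  # agent, predicate, patient, bitmap


def pack_vector(agent, predicate, patient, bitmap):
    """Pack the 4-D vector into fixed-size bytes (strings zero-padded to 16 B)."""
    fields = []
    for s in (agent, predicate, patient):
        raw = s.encode("utf-8")
        if len(raw) > SLOT:
            raise ValueError(f"{s!r} is longer than {SLOT} UTF-8 bytes")
        fields.append(raw)
    return LAYOUT.pack(*fields, bitmap)  # 'B' rejects values outside 0..255


def unpack_vector(blob):
    """Inverse of pack_vector; assumes the strings contain no NUL characters."""
    agent, predicate, patient, bitmap = LAYOUT.unpack(blob)
    strings = tuple(f.rstrip(b"\x00").decode("utf-8") for f in (agent, predicate, patient))
    return strings + (bitmap,)


blob = pack_vector("Alice", "give", "book", 0b00110001)
print(LAYOUT.size, unpack_vector(blob))  # 49 ('Alice', 'give', 'book', 49)
```

Strings longer than 16 UTF-8 bytes are rejected rather than truncated. As a result, unpacking restores every accepted input exactly.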

2. Bitmap layout (1 byte)

bit 0: 1 = the agent is the possessor
bit 1: 1 = the patient is the possessor
bit 2: 1 = material cause present
bit 3: 1 = formal cause present
bit 4: 1 = efficient cause present
bit 5: 1 = final cause present
bits 6-7: reserved (0)
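Setting and reading these bits is plain bit masking. The sketch below uses hypothetical helper names (`encode_bitmap`, `decode_bitmap`). It also rejects a bitmap with both possessor bits set, mirroring the unique-possessor constraint in the consistency guarantees.

```python
AGENT_POSSESSOR = 1 << 0
PATIENT_POSSESSOR = 1 << 1
CAUSE_BITS = {
    "material": 1 << 2,
    "formal": 1 << 3,
    "efficient": 1 << 4,
    "final": 1 << 5,
}
RESERVED_MASK = 0b11000000  # bits 6-7 must stay 0


def encode_bitmap(agent_possessor=False, patient_possessor=False, causes=()):
    """Build the 1-byte bitmap from possessor flags and a collection of cause names."""
    if agent_possessor and patient_possessor:
        raise ValueError("only one of agent/patient may be the possessor")
    bitmap = 0
    if agent_possessor:
        bitmap |= AGENT_POSSESSOR
    if patient_possessor:
        bitmap |= PATIENT_POSSESSOR
    for cause in causes:
        bitmap |= CAUSE_BITS[cause]  # KeyError for an unknown cause name
    return bitmap


def decode_bitmap(bitmap):
    """Read the flags back out of a bitmap byte."""
    if bitmap & RESERVED_MASK:
        raise ValueError("reserved bits 6-7 must be 0")
    return {
        "agent_possessor": bool(bitmap & AGENT_POSSESSOR),
        "patient_possessor": bool(bitmap & PATIENT_POSSESSOR),
        "causes": [name for name, bit in CAUSE_BITS.items() if bitmap & bit],
    }


b = encode_bitmap(agent_possessor=True, causes=("efficient", "final"))
print(f"{b:#010b}")  # 0b00110001
print(decode_bitmap(b))
```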

3. Worked example

Sentence: “Alice gives her book to Bob.”

Agent: Alice
Predicate: give
Patient: book
Bitmap: 0b00110001 → possessor = agent; efficient and final causes flagged.

Total payload: 3 × 5 + 1 = 16 bytes → 128 bits.

4. Memory gain compared with a 700-D float32 embedding

700-D × 4 B = 2,800 B
Combinatorial Magic = 16 B
Gain = 2,800 / 16 = ×175
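Here is a quick check of this arithmetic. The 16-byte payload is taken from the text, and `struct.calcsize` counts the bytes of 700 float32 values.

```python
import struct

DIM = 700
dense_bytes = struct.calcsize(f"<{DIM}f")  # 700 float32 values x 4 bytes
cm_bytes = 16                               # Combinatorial Magic payload, from the text
print(dense_bytes, dense_bytes / cm_bytes)  # 2800 175.0
```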

5. Consistency guarantees

Agent-patient disjointness: enforced by the schema (dim 0 ≠ dim 2).
Unique possessor: the bitmap allows only one of {agent, patient} to be marked as possessor.
10 categories: mapped onto the 3 string slots + 1 meta byte.
4 causes: encoded in the bitmap; absent = 0.
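These constraints can be checked mechanically. The hypothetical `check_vector` below tests the string-length limit, agent-patient disjointness, the unique-possessor rule and the reserved bits. The text does not specify how the 10 categories are mapped, so that mapping is not checked.

```python
MAX_BYTES = 16
BOTH_POSSESSORS = 0b00000011  # bits 0 and 1
RESERVED_MASK = 0b11000000    # bits 6 and 7


def check_vector(agent, predicate, patient, bitmap):
    """Return the list of violated constraints (an empty list means consistent)."""
    errors = []
    for name, s in (("agent", agent), ("predicate", predicate), ("patient", patient)):
        if len(s.encode("utf-8")) > MAX_BYTES:
            errors.append(f"{name} exceeds {MAX_BYTES} UTF-8 bytes")
    if agent == patient:
        errors.append("agent and patient must differ (dim 0 != dim 2)")
    if not 0 <= bitmap <= 0xFF:
        errors.append("bitmap must fit in one uint8")
    else:
        if (bitmap & BOTH_POSSESSORS) == BOTH_POSSESSORS:
            errors.append("only one of agent/patient may be the possessor")
        if bitmap & RESERVED_MASK:
            errors.append("reserved bits 6-7 must be 0")
    return errors


print(check_vector("Alice", "give", "book", 0b00110001))   # []
print(check_vector("Alice", "give", "Alice", 0b00000011))  # two violations
```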

6. Reversibility test

Given the 4-D vector above, the surface form of the original sentence can be regenerated deterministically with the template:

{Agent} {predicate}s {patient} [possessor flag → “his”/“her”].

✓ Exact reconstruction → lossless.
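The sketch below fills the template. It assumes regular English verbs, handled with a naive “-s” suffix. It also assumes the caller supplies the possessive pronoun, since the 4-D vector as defined does not record gender. `realize` is a hypothetical helper, and it emits only the fields stored in the vector.

```python
POSSESSOR_BITS = 0b00000011  # bit 0: agent is possessor, bit 1: patient is possessor


def realize(agent, predicate, patient, bitmap, pronoun="her"):
    """Fill '{Agent} {predicate}s {patient}', adding a possessive if a possessor bit is set.

    `pronoun` is supplied by the caller because the 4-D vector stores no gender.
    Only regular verbs are handled (naive '-s' suffix).
    """
    obj = f"{pronoun} {patient}" if bitmap & POSSESSOR_BITS else patient
    return f"{agent} {predicate}s {obj}"


print(realize("Alice", "give", "book", 0b00110001))  # Alice gives her book
```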

7. References

Aristotle, Categories & Metaphysics Δ
Dowty, D. 1991. “Thematic Proto-Roles and Argument Selection”

October 3, 2025
