* Dump before implementation of proper cavity var optimization
* Dirty but more or less works
* Update README
* Update README #2
* Clean up tests folder
* Saturday dump
* Remove dirty example
* Clean up solvers
* Clean solvers and utils
* Update setup.py
* Update README and tests
* Clean up deq
* Clean modules
* Update README
* Simplify single layer example and remove plot from debug in deq

Diff of `README.md` (21 additions, 7 deletions):

```diff
@@ -2,15 +2,27 @@
 
 ## Deep Implicit Attention
 
-_The return of the Boltzmann machine_
+Experimental implementation of deep implicit attention in PyTorch.
 
----
+**Summary:** Using deep equilibrium networks to implicitly solve a set of self-consistent mean-field equations of a random Ising model implements attention as a collective response 🤗 and provides insight into the transformer architecture, connecting it to mean-field theory, message-passing algorithms, and Boltzmann machines.
 
-Experimental implementation of deep implicit attention in PyTorch.
+**Blog post (in preparation): _Deep Implicit Attention: A Mean-Field Theory Perspective on Attention Mechanisms_**
+
+## To-do
 
-**Key idea:** Use deep equilibrium networks to implicitly solve a set of self-consistent mean-field equations of a random Ising model: attention as a collective response 🤗.
+### Modules
+- [x] Add a `GeneralizedIsingGaussianAdaTAP` module implementing the adaptive TAP mean-field equations for an Ising-like vector model with standard multivariate Gaussian priors over spins
+- [ ] Figure out the analytical Gibbs free energy for `GeneralizedIsingGaussianAdaTAP` and implement it to be able to use it as a stand-alone loss function
+- [ ] Look into making the parameters of the multivariate Gaussian priors in `GeneralizedIsingGaussianAdaTAP` trainable
+- [ ] Add a `VanillaSoftmaxAttention` module which reproduces vanilla softmax attention, i.e. implementing coupling weights between spins which depend solely on linear transformations of the external sources (queries/keys) and replacing the self-correction term with a parametrized position-wise feed-forward network
 
-**Blog post (in preparation):** <a href="https://mcbal.github.io/">Deep Implicit Attention: A Mean-Field Theory Perspective on Attention Mechanisms</a>
+### Models
+- [ ] Add a `DeepImplicitAttentionTransformer` model
```
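
To make the **Summary** in the diff above concrete, here is a minimal, hypothetical sketch of attention as a collective mean-field response, using scalar spins and naive mean-field theory. It omits the adaptive-TAP self-correction term that `GeneralizedIsingGaussianAdaTAP` implements, and it uses plain forward iteration where a deep equilibrium network would root-find and differentiate implicitly; the class name and every hyperparameter below are illustrative assumptions, not the repository's API.

```python
import torch
import torch.nn as nn


class NaiveMeanFieldAttention(nn.Module):
    """Toy sketch: inputs act as external fields h on an Ising-like system
    with learnable symmetric couplings J; the output is the magnetization
    vector m solving the naive mean-field equations m = tanh(J m + h)."""

    def __init__(self, num_spins: int, max_iter: int = 50, tol: float = 1e-5):
        super().__init__()
        weight = torch.randn(num_spins, num_spins) / num_spins ** 0.5
        weight = 0.5 * (weight + weight.T)  # symmetric couplings J_ij = J_ji
        weight.fill_diagonal_(0.0)          # no self-coupling
        self.J = nn.Parameter(weight)
        self.max_iter = max_iter
        self.tol = tol

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: external fields of shape (batch, num_spins). Every output
        # component depends on every input through J: a collective response.
        m = torch.zeros_like(h)
        for _ in range(self.max_iter):
            m_next = torch.tanh(h + m @ self.J)
            if (m_next - m).abs().max() < self.tol:
                return m_next
            m = m_next
        return m


x = torch.randn(8, 32)  # batch of external fields (e.g. token embeddings)
out = NaiveMeanFieldAttention(num_spins=32)(x)  # equilibrium magnetizations
```

Note that the couplings here are static parameters; the `VanillaSoftmaxAttention` to-do item above corresponds to instead computing such couplings from linear transformations of the inputs (queries/keys).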

```diff
@@ -32,5 +44,7 @@ See `tests` for now until `examples` folder is populated.
 
-## Selection of references
+## References
+
+### Selection of literature
 
 On variational inference, iterative approximation algorithms, expectation propagation, mean-field methods and belief propagation:
 
 - [Expectation Propagation](https://arxiv.org/abs/1409.6179) (2014) by Jack Raymond, Andre Manoel, Manfred Opper
```

```diff
@@ -48,7 +62,7 @@ On deep equilibrium networks:
 - [Chapter 4: Deep Equilibrium Models](https://implicit-layers-tutorial.org/deep_equilibrium_models/) of the [Deep Implicit Layers - Neural ODEs, Deep Equilibrium Models, and Beyond](http://implicit-layers-tutorial.org/) tutorial, created by Zico Kolter, David Duvenaud, and Matt Johnson
```
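
Since the README points to `tests` until an `examples` folder exists, end-to-end usage presumably looks something like the sketch below. The import path, constructor arguments, and tensor shapes are guesses for illustration only; check `tests` for the actual signatures.

```python
import torch

# Hypothetical import path and arguments -- not the verified API.
from deep_implicit_attention import GeneralizedIsingGaussianAdaTAP

model = GeneralizedIsingGaussianAdaTAP(num_spins=16, dim=64)  # args assumed

x = torch.randn(1, 16, 64)  # external sources driving 16 vector spins
out = model(x)              # self-consistent mean-field response
```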