Welcome to the AI Guild! We're a group of self-taught students at Florida International University who bridge curiosity with opportunity in Machine Learning and AI. Our main focus is using workshops and projects to bring people closer to the cutting edge of research. Feel free to take a look at our curriculum for Spring 2025 below!

Presentations

AI Guild Kick-off

AI Guild: Gradient Descent

AI Guild: Intro to Model Interpretability

Early Fusion and MoE

AI Guild: Towards Monosemanticity

AI Guild: PyTorch

AI Guild: Variational Auto-Encoders

AI Guild: Reinforcement Learning in LLMs & TRL

AI Guild: Intro to Agents with OpenAI-SDK

AI Guild: Transformers and Attention

AI Guild: Post-Training for your LLM

AI Guild: HuggingFace & Ollama

AI Guild: Neural Networks in JAX

AI Guild: DeepSeek-R1


Projects

GPT-2 (124M) From Scratch

machine-learning/GPT2_(+KV_Cache_&_Kernel_Fusions).ipynb at main · ghubnerr/machine-learning


Results after 1 hour of distributed training on 8 A100s

**Generated Text:**
Once upon a time, I got asked how many legs a dog had. And I said: ative is a huge amount of people being thought to make anyone think that would be a high school in the world.

The whole family known as "ScAttling," said one of the largest people in the world are considering a large number of people to the world's largest and 40,000 people and four of them to the country.

There are a lot of people who live here on music or even the country and there are so many people who are looking for a dog who actually really loved and have their children looking for it, but one of them is so as many people want to buy.

"We are both a lot of people who are looking to find out and look at them," said one of the biggest causes of the world.
...

Language Models are Unsupervised Multitask Learners

language_models_are_unsupervised_multitask_learners.pdf

Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.

KV-Cached Generation Scaling Time

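The speedup in the plot comes from caching keys and values, so each newly generated token attends to the prefix without recomputing it. A minimal single-head sketch of that decoding step (the function and weight names here are illustrative, not the notebook's):

import jax
import jax.numpy as jnp

def decode_step(x_t, w_q, w_k, w_v, cache_k, cache_v):
    # Project only the newest token; previous keys/values are reused from the cache.
    q = x_t @ w_q
    cache_k = jnp.concatenate([cache_k, (x_t @ w_k)[None, :]], axis=0)   # (T+1, d)
    cache_v = jnp.concatenate([cache_v, (x_t @ w_v)[None, :]], axis=0)   # (T+1, d)
    # Attend the single new query against all cached positions: O(T) work per step
    # instead of recomputing full T x T attention over the prefix.
    scores = (q @ cache_k.T) / jnp.sqrt(q.shape[-1])
    weights = jax.nn.softmax(scores, axis=-1)
    out = weights @ cache_v
    return out, cache_k, cache_v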


Variational Auto-Encoders for MNIST Digit Recognition

vae_mnist_jax.ipynb

Lecture Link

https://drive.google.com/file/d/1A1dFYaKPxC6o_nOTnpSCR1-x3DQuBAto/view?usp=drive_link


Auto-Encoding Variational Bayes

1312.6114v11.pdf

Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
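As a rough JAX sketch of the paper's core idea: the reparameterization trick keeps the sampling step differentiable, and the Gaussian KL term of the ELBO has a closed form (the function names below are illustrative, not from the notebook):

import jax
import jax.numpy as jnp

def reparameterize(rng, mean, logvar):
    # z = mean + sigma * eps with eps ~ N(0, I), so gradients flow through mean and logvar.
    eps = jax.random.normal(rng, mean.shape)
    return mean + jnp.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mean, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder.
    return -0.5 * jnp.sum(1.0 + logvar - mean**2 - jnp.exp(logvar), axis=-1)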


Reinforcement Learning in JAX

Google Colab


Playing Atari with Deep Reinforcement Learning

1312.5602v1.pdf


Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
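The paper trains a Q-network against a temporal-difference target. A minimal JAX sketch of that loss (`q_net` and the batch field names are placeholders, not taken from the Colab):

import jax
import jax.numpy as jnp

def td_loss(params, target_params, q_net, batch, gamma=0.99):
    # Bellman target: r + gamma * max_a' Q_target(s', a'), with bootstrapping cut at terminal states.
    q_next = q_net(target_params, batch["next_obs"])                       # (B, num_actions)
    target = batch["reward"] + gamma * (1.0 - batch["done"]) * q_next.max(axis=-1)
    q_pred = q_net(params, batch["obs"])                                   # (B, num_actions)
    q_taken = jnp.take_along_axis(q_pred, batch["action"][:, None], axis=-1)[:, 0]
    return jnp.mean((q_taken - jax.lax.stop_gradient(target)) ** 2)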


Building Basic Neural Networks w/ Flax (RNNs, LSTMs, CNNs)

machine-learning/jax/flax_cnns_rnns_lstms.ipynb at main · ghubnerr/machine-learning

MLPs

import flax.linen as nn

class MLP(nn.Module):
    """Multi-layer perceptron: a stack of Dense + ReLU layers followed by a linear output layer."""
    hidden_sizes: list[int]
    output_size: int

    @nn.compact
    def __call__(self, x):
        # Hidden layers: Dense -> ReLU, with Xavier/Glorot-initialized weights.
        for size in self.hidden_sizes:
            x = nn.relu(nn.Dense(size, kernel_init=nn.initializers.xavier_uniform())(x))
        # Final linear layer produces the raw outputs (logits).
        x = nn.Dense(self.output_size, kernel_init=nn.initializers.xavier_uniform())(x)
        return x
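For reference, initializing and applying the module might look like this (the input width 784 and the batch sizes are placeholders, not values from the notebook):

import jax
import jax.numpy as jnp

model = MLP(hidden_sizes=[128, 64], output_size=10)
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 784)))   # shapes are inferred lazily at init
logits = model.apply(params, jnp.ones((32, 784)))                 # -> (32, 10)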


CNNs

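A minimal Flax CNN in the same style as the MLP above (assuming NHWC image batches, e.g. 28×28×1 MNIST digits; the layer sizes are illustrative, not the notebook's):

import flax.linen as nn

class CNN(nn.Module):
    num_classes: int

    @nn.compact
    def __call__(self, x):
        # Two conv + pool stages, then a small dense classifier head.
        x = nn.relu(nn.Conv(features=32, kernel_size=(3, 3))(x))
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.relu(nn.Conv(features=64, kernel_size=(3, 3))(x))
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))          # flatten spatial dimensions
        x = nn.relu(nn.Dense(128)(x))
        return nn.Dense(self.num_classes)(x)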

RNNs

**Generated Text:**
Once upon a time, urne , a father who can also discuss the EPA's administration in opposition from the revolt . Before levelling when the wealthiest senior year .The British withdrawal of the SSI sponsors division owned had gone , the first years after her decision went to a whole . 
...
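The sample above is output from a small recurrent language model. A minimal vanilla-RNN module in the same Flax style might look like this (the sizes and the unrolled Python loop are illustrative, not the notebook's implementation):

import jax.numpy as jnp
import flax.linen as nn

class VanillaRNN(nn.Module):
    vocab_size: int
    hidden_size: int = 256

    @nn.compact
    def __call__(self, tokens):
        # tokens: (batch, time) integer ids -> logits over the vocabulary at each step.
        embed = nn.Embed(num_embeddings=self.vocab_size, features=self.hidden_size)
        w_x = nn.Dense(self.hidden_size)
        w_h = nn.Dense(self.hidden_size, use_bias=False)
        head = nn.Dense(self.vocab_size)

        x = embed(tokens)                                    # (batch, time, hidden)
        h = jnp.zeros((tokens.shape[0], self.hidden_size))   # initial hidden state
        logits = []
        for t in range(tokens.shape[1]):                     # recurrence: h_t = tanh(W_x x_t + W_h h_{t-1})
            h = jnp.tanh(w_x(x[:, t]) + w_h(h))
            logits.append(head(h))
        return jnp.stack(logits, axis=1)                     # (batch, time, vocab)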

LSTMs

$$ \begin{aligned}f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad &\text{(forget gate)}\\i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad &\text{(input gate)}\\o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad &\text{(output gate)}\\\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \quad &\text{(cell candidate)}\\c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad &\text{(cell state update)}\\h_t &= o_t \odot \tanh(c_t) \quad &\text{(hidden state update)}\end{aligned} $$
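A direct JAX transcription of these gate equations, as a sketch of a single time step (parameter names mirror the symbols above; shapes and initialization are left out):

import jax
import jax.numpy as jnp

def lstm_step(params, carry, x_t):
    # One step of the recurrence above; carry holds (h_{t-1}, c_{t-1}).
    h_prev, c_prev = carry
    z = jnp.concatenate([h_prev, x_t], axis=-1)              # [h_{t-1}, x_t]
    f = jax.nn.sigmoid(z @ params["W_f"] + params["b_f"])    # forget gate
    i = jax.nn.sigmoid(z @ params["W_i"] + params["b_i"])    # input gate
    o = jax.nn.sigmoid(z @ params["W_o"] + params["b_o"])    # output gate
    c_tilde = jnp.tanh(z @ params["W_c"] + params["b_c"])    # cell candidate
    c = f * c_prev + i * c_tilde                             # cell state update
    h = o * jnp.tanh(c)                                      # hidden state update
    return (h, c), h                                         # new carry, plus h_t as the step output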


Tensor Autograd (Miniature Auto-Differentiation Engine)

intro-to-gradient-descent.ipynb

Example Backward Propagation Across a 1-Layer Linear Regression NN

$$ L(w, b) = \frac{1}{n} \sum_{i=1}^n \left( \left( w x_i + b \right) - y_i \right)^2\\ \frac{\partial L}{\partial w}= \frac{2}{n} \sum_{i=1}^{n} x_i \left( \left( w x_i + b \right) - y_i \right)\\ \frac{\partial L}{\partial b}= \frac{2}{n} \sum_{i=1}^{n} \left( \left( w x_i + b \right) - y_i \right) $$
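The miniature autograd engine recovers these derivatives by walking the computation graph; as a quick sanity check, the closed-form gradients above agree with `jax.grad` (the toy data below is made up for illustration):

import jax
import jax.numpy as jnp

def loss(w, b, x, y):
    return jnp.mean(((w * x + b) - y) ** 2)

x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([2.0, 4.0, 6.0])
w, b = 0.5, 0.0

# Analytic gradients from the formulas above.
dw = 2.0 * jnp.mean(x * ((w * x + b) - y))
db = 2.0 * jnp.mean((w * x + b) - y)

# They match automatic differentiation.
dw_ad, db_ad = jax.grad(loss, argnums=(0, 1))(w, b, x, y)
assert jnp.allclose(dw, dw_ad) and jnp.allclose(db, db_ad)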

Computation Graph with Gradients
