Welcome to the AI Guild! We're a group of self-taught students at Florida International University who bridge curiosity with opportunity in Machine Learning and AI. Our main focus is using workshops and projects to bring people closer to the cutting edge of research. Feel free to take a look at our curriculum for Spring 2025 below!

Presentations

AI Guild Kick-off

AI Guild: Gradient Descent

AI Guild: Intro to Model Interpretability

Early Fusion and MoE

AI Guild: Towards Monosemanticity

AI Guild: PyTorch

AI Guild: Variational Auto-Encoders

AI Guild: Reinforcement Learning in LLMs & TRL

AI Guild: Intro to Agents with OpenAI-SDK

AI Guild: Transformers and Attention

AI Guild: Post-Training for your LLM

AI Guild: HuggingFace & Ollama

AI Guild: Neural Networks in JAX

AI Guild: DeepSeek-R1


Projects

GPT-2 (124M) From Scratch

machine-learning/GPT2_(+KV_Cache_&_Kernel_Fusions).ipynb at main · ghubnerr/machine-learning


Results after 1 hour of distributed training on 8 A100s

**Generated Text:**
Once upon a time, I got asked how many legs a dog had. And I said: ative is a huge amount of people being thought to make anyone think that would be a high school in the world.

The whole family known as "ScAttling," said one of the largest people in the world are considering a large number of people to the world's largest and 40,000 people and four of them to the country.

There are a lot of people who live here on music or even the country and there are so many people who are looking for a dog who actually really loved and have their children looking for it, but one of them is so as many people want to buy.

"We are both a lot of people who are looking to find out and look at them," said one of the biggest causes of the world.
...

Language Models are Unsupervised Multitask Learners

language_models_are_unsupervised_multitask_learners.pdf

Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.

KV-Cached Generation Scaling Time

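The speedup in the plot comes from caching keys and values, so each newly generated token attends to the prefix without recomputing it. A minimal single-head sketch of that decoding step (the function and weight names here are illustrative, not the notebook's):

import jax
import jax.numpy as jnp

def decode_step(x_t, w_q, w_k, w_v, cache_k, cache_v):
    # Project only the newest token; previous keys/values are reused from the cache.
    q = x_t @ w_q
    cache_k = jnp.concatenate([cache_k, (x_t @ w_k)[None, :]], axis=0)   # (T+1, d)
    cache_v = jnp.concatenate([cache_v, (x_t @ w_v)[None, :]], axis=0)   # (T+1, d)
    # Attend the single new query against all cached positions: O(T) work per step
    # instead of recomputing full T x T attention over the prefix.
    scores = (q @ cache_k.T) / jnp.sqrt(q.shape[-1])
    weights = jax.nn.softmax(scores, axis=-1)
    out = weights @ cache_v
    return out, cache_k, cache_v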


Variational Auto-Encoders for MNIST Digit Recognition

vae_mnist_jax.ipynb

Lecture Link

https://drive.google.com/file/d/1A1dFYaKPxC6o_nOTnpSCR1-x3DQuBAto/view?usp=drive_link


Auto-Encoding Variational Bayes

1312.6114v11.pdf

Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
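As a rough JAX sketch of the paper's core idea: the reparameterization trick keeps the sampling step differentiable, and the Gaussian KL term of the ELBO has a closed form (the function names below are illustrative, not from the notebook):

import jax
import jax.numpy as jnp

def reparameterize(rng, mean, logvar):
    # z = mean + sigma * eps with eps ~ N(0, I), so gradients flow through mean and logvar.
    eps = jax.random.normal(rng, mean.shape)
    return mean + jnp.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mean, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian encoder.
    return -0.5 * jnp.sum(1.0 + logvar - mean**2 - jnp.exp(logvar), axis=-1)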


Reinforcement Learning in JAX

Google Colab


Playing Atari with Deep Reinforcement Learning

1312.5602v1.pdf


Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
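The paper trains a Q-network against a temporal-difference target. A minimal JAX sketch of that loss (`q_net` and the batch field names are placeholders, not taken from the Colab):

import jax
import jax.numpy as jnp

def td_loss(params, target_params, q_net, batch, gamma=0.99):
    # Bellman target: r + gamma * max_a' Q_target(s', a'), with bootstrapping cut at terminal states.
    q_next = q_net(target_params, batch["next_obs"])                       # (B, num_actions)
    target = batch["reward"] + gamma * (1.0 - batch["done"]) * q_next.max(axis=-1)
    q_pred = q_net(params, batch["obs"])                                   # (B, num_actions)
    q_taken = jnp.take_along_axis(q_pred, batch["action"][:, None], axis=-1)[:, 0]
    return jnp.mean((q_taken - jax.lax.stop_gradient(target)) ** 2)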


Building Basic Neural Networks w/ Flax (RNNs, LSTMs, CNNs)

machine-learning/jax/flax_cnns_rnns_lstms.ipynb at main · ghubnerr/machine-learning

MLPs

import flax.linen as nn

class MLP(nn.Module):
    """Multi-layer perceptron: a stack of Dense + ReLU layers followed by a linear output layer."""
    hidden_sizes: list[int]
    output_size: int

    @nn.compact
    def __call__(self, x):
        # Hidden layers: Dense -> ReLU, with Xavier/Glorot-initialized weights.
        for size in self.hidden_sizes:
            x = nn.relu(nn.Dense(size, kernel_init=nn.initializers.xavier_uniform())(x))
        # Final linear layer produces the raw outputs (logits).
        x = nn.Dense(self.output_size, kernel_init=nn.initializers.xavier_uniform())(x)
        return x
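For reference, initializing and applying the module might look like this (the input width 784 and the batch sizes are placeholders, not values from the notebook):

import jax
import jax.numpy as jnp

model = MLP(hidden_sizes=[128, 64], output_size=10)
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 784)))   # shapes are inferred lazily at init
logits = model.apply(params, jnp.ones((32, 784)))                 # -> (32, 10)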


CNNs

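A minimal Flax CNN in the same style as the MLP above (assuming NHWC image batches, e.g. 28×28×1 MNIST digits; the layer sizes are illustrative, not the notebook's):

import flax.linen as nn

class CNN(nn.Module):
    num_classes: int

    @nn.compact
    def __call__(self, x):
        # Two conv + pool stages, then a small dense classifier head.
        x = nn.relu(nn.Conv(features=32, kernel_size=(3, 3))(x))
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = nn.relu(nn.Conv(features=64, kernel_size=(3, 3))(x))
        x = nn.avg_pool(x, window_shape=(2, 2), strides=(2, 2))
        x = x.reshape((x.shape[0], -1))          # flatten spatial dimensions
        x = nn.relu(nn.Dense(128)(x))
        return nn.Dense(self.num_classes)(x)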

RNNs

**Generated Text:**
Once upon a time, urne , a father who can also discuss the EPA's administration in opposition from the revolt . Before levelling when the wealthiest senior year .The British withdrawal of the SSI sponsors division owned had gone , the first years after her decision went to a whole . 
...
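The sample above is output from a small recurrent language model. A minimal vanilla-RNN module in the same Flax style might look like this (the sizes and the unrolled Python loop are illustrative, not the notebook's implementation):

import jax.numpy as jnp
import flax.linen as nn

class VanillaRNN(nn.Module):
    vocab_size: int
    hidden_size: int = 256

    @nn.compact
    def __call__(self, tokens):
        # tokens: (batch, time) integer ids -> logits over the vocabulary at each step.
        embed = nn.Embed(num_embeddings=self.vocab_size, features=self.hidden_size)
        w_x = nn.Dense(self.hidden_size)
        w_h = nn.Dense(self.hidden_size, use_bias=False)
        head = nn.Dense(self.vocab_size)

        x = embed(tokens)                                    # (batch, time, hidden)
        h = jnp.zeros((tokens.shape[0], self.hidden_size))   # initial hidden state
        logits = []
        for t in range(tokens.shape[1]):                     # recurrence: h_t = tanh(W_x x_t + W_h h_{t-1})
            h = jnp.tanh(w_x(x[:, t]) + w_h(h))
            logits.append(head(h))
        return jnp.stack(logits, axis=1)                     # (batch, time, vocab)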

LSTMs

$$ \begin{aligned}f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad &\text{(forget gate)}\\i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad &\text{(input gate)}\\o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad &\text{(output gate)}\\\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \quad &\text{(cell candidate)}\\c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad &\text{(cell state update)}\\h_t &= o_t \odot \tanh(c_t) \quad &\text{(hidden state update)}\end{aligned} $$
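A direct JAX transcription of these gate equations, as a sketch of a single time step (parameter names mirror the symbols above; shapes and initialization are left out):

import jax
import jax.numpy as jnp

def lstm_step(params, carry, x_t):
    # One step of the recurrence above; carry holds (h_{t-1}, c_{t-1}).
    h_prev, c_prev = carry
    z = jnp.concatenate([h_prev, x_t], axis=-1)              # [h_{t-1}, x_t]
    f = jax.nn.sigmoid(z @ params["W_f"] + params["b_f"])    # forget gate
    i = jax.nn.sigmoid(z @ params["W_i"] + params["b_i"])    # input gate
    o = jax.nn.sigmoid(z @ params["W_o"] + params["b_o"])    # output gate
    c_tilde = jnp.tanh(z @ params["W_c"] + params["b_c"])    # cell candidate
    c = f * c_prev + i * c_tilde                             # cell state update
    h = o * jnp.tanh(c)                                      # hidden state update
    return (h, c), h                                         # new carry, plus h_t as the step output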


Tensor Autograd (Miniature Auto-Differentiation Engine)

intro-to-gradient-descent.ipynb

Example Backward Propagation Across a 1-Layer Linear Regression NN

$$ L(w, b) = \frac{1}{n} \sum_{i=1}^n \left( \left( w x_i + b \right) - y_i \right)^2\\ \frac{\partial L}{\partial w}= \frac{2}{n} \sum_{i=1}^{n} x_i \left( \left( w x_i + b \right) - y_i \right)\\ \frac{\partial L}{\partial b}= \frac{2}{n} \sum_{i=1}^{n} \left( \left( w x_i + b \right) - y_i \right) $$
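The miniature autograd engine recovers these derivatives by walking the computation graph; as a quick sanity check, the closed-form gradients above agree with `jax.grad` (the toy data below is made up for illustration):

import jax
import jax.numpy as jnp

def loss(w, b, x, y):
    return jnp.mean(((w * x + b) - y) ** 2)

x = jnp.array([1.0, 2.0, 3.0])
y = jnp.array([2.0, 4.0, 6.0])
w, b = 0.5, 0.0

# Analytic gradients from the formulas above.
dw = 2.0 * jnp.mean(x * ((w * x + b) - y))
db = 2.0 * jnp.mean((w * x + b) - y)

# They match automatic differentiation.
dw_ad, db_ad = jax.grad(loss, argnums=(0, 1))(w, b, x, y)
assert jnp.allclose(dw, dw_ad) and jnp.allclose(db, db_ad)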

Computation Graph with Gradients
