Welcome aboard the AI Guild! We're a group of self-taught students from Florida International University who bridge curiosity with opportunity in Machine Learning and AI. Our main focus is leveraging workshops and hands-on projects to bring people closer to the cutting edge of academic research. Feel free to take a look at our curriculum for Spring 2025 below!
AI Guild: Intro to Model Interpretability
AI Guild: Towards Monosemanticity
AI Guild: Variational Auto-Encoders
AI Guild: Reinforcement Learning in LLMs & TRL
AI Guild: Intro to Agents with OpenAI-SDK
AI Guild: Transformers and Attention
AI Guild: Post-Training for your LLM
AI Guild: HuggingFace & Ollama
AI Guild: Neural Networks in JAX
machine-learning/GPT2_(+KV_Cache_&_Kernel_Fusions).ipynb at main · ghubnerr/machine-learning

Results after 1 hour of distributed training on 8 A100s
**Generated Text:**
Once upon a time, I got asked how many legs a dog had. And I said: ative is a huge amount of people being thought to make anyone think that would be a high school in the world.
The whole family known as "ScAttling," said one of the largest people in the world are considering a large number of people to the world's largest and 40,000 people and four of them to the country.
There are a lot of people who live here on music or even the country and there are so many people who are looking for a dog who actually really loved and have their children looking for it, but one of them is so as many people want to buy.
"We are both a lot of people who are looking to find out and look at them," said one of the biggest causes of the world.
...
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
KV-Cached Generation Scaling Time
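The idea behind the KV cache: during autoregressive decoding, the keys and values of past tokens never change, so they can be stored and reused instead of recomputed at every step, turning each decode step from O(t²) attention work into O(t). Below is a minimal single-head sketch in plain JAX; the projection matrices `Wq`, `Wk`, `Wv` and all shapes are illustrative, not the notebook's actual implementation.

```python
import jax
import jax.numpy as jnp

def attend(q, k, v):
    # q: (d,), k/v: (t, d) -> single-head scaled dot-product attention
    scores = k @ q / jnp.sqrt(q.shape[-1])          # (t,)
    return jax.nn.softmax(scores) @ v               # (d,)

def decode_step(cache, x_t, Wq, Wk, Wv):
    # Project only the newest token; reuse cached keys/values for the prefix.
    q, k, v = x_t @ Wq, x_t @ Wk, x_t @ Wv
    keys = jnp.concatenate([cache["k"], k[None]], axis=0)
    values = jnp.concatenate([cache["v"], v[None]], axis=0)
    return {"k": keys, "v": values}, attend(q, keys, values)

# Toy usage with random projections and an initially empty cache
d_in, d = 16, 8
key = jax.random.PRNGKey(0)
Wq, Wk, Wv = (jax.random.normal(k, (d_in, d)) for k in jax.random.split(key, 3))
cache = {"k": jnp.zeros((0, d)), "v": jnp.zeros((0, d))}
for t in range(4):  # decode four tokens
    x_t = jax.random.normal(jax.random.fold_in(key, t), (d_in,))
    cache, out = decode_step(cache, x_t, Wq, Wk, Wv)
```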

Lecture Link
https://drive.google.com/file/d/1A1dFYaKPxC6o_nOTnpSCR1-x3DQuBAto/view?usp=drive_link

Kingma, Diederik P., and Max Welling. "Auto-encoding variational Bayes." arXiv preprint arXiv:1312.6114 (2013).
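The paper's central objective is the evidence lower bound (ELBO), which the encoder $q_\phi(z \mid x)$ and decoder $p_\theta(x \mid z)$ are jointly trained to maximize:

$$
\log p_\theta(x) \;\geq\; \mathbb{E}_{q_\phi(z \mid x)}\left[ \log p_\theta(x \mid z) \right] - D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
$$

The reparameterization trick ($z = \mu + \sigma \odot \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$) is what makes the expectation differentiable with respect to $\phi$.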



Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
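The paper trains a network $Q(s, a; \theta)$ by minimizing the temporal-difference error against a bootstrapped target:

$$
L_i(\theta_i) = \mathbb{E}_{(s, a, r, s')}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta_{i-1}) - Q(s, a; \theta_i) \right)^2 \right]
$$

with transitions $(s, a, r, s')$ sampled from a replay buffer to decorrelate consecutive updates.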
machine-learning/jax/flax_cnns_rnns_lstms.ipynb at main · ghubnerr/machine-learning
import flax.linen as nn

class MLP(nn.Module):
    """A simple multi-layer perceptron with Xavier-initialized Dense layers."""
    hidden_sizes: list[int]
    output_size: int

    @nn.compact
    def __call__(self, x):
        # Hidden layers: Dense -> ReLU for each requested width
        for size in self.hidden_sizes:
            x = nn.relu(nn.Dense(size, kernel_init=nn.initializers.xavier_uniform())(x))
        # Output layer: no activation, so the caller gets raw logits
        x = nn.Dense(self.output_size, kernel_init=nn.initializers.xavier_uniform())(x)
        return x
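A quick sanity check of the module; the input width 784 and the batch size are arbitrary here:

```python
import jax
import jax.numpy as jnp

model = MLP(hidden_sizes=[64, 32], output_size=10)
x = jnp.ones((8, 784))                         # dummy batch of flattened inputs
params = model.init(jax.random.PRNGKey(0), x)  # builds the parameter pytree
logits = model.apply(params, x)                # forward pass -> shape (8, 10)
```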


**Generated Text:**
Once upon a time, urne , a father who can also discuss the EPA's administration in opposition from the revolt . Before levelling when the wealthiest senior year .The British withdrawal of the SSI sponsors division owned had gone , the first years after her decision went to a whole .
...
$$
\begin{aligned}
f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \quad &\text{(forget gate)}\\
i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \quad &\text{(input gate)}\\
o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \quad &\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \quad &\text{(cell candidate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad &\text{(cell state update)}\\
h_t &= o_t \odot \tanh(c_t) \quad &\text{(hidden state update)}
\end{aligned}
$$
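Written out in plain JAX, one step of the cell is a direct transcription of the six equations above; the parameter names and shapes are illustrative, and Flax's `nn.LSTMCell` provides an equivalent prepackaged building block:

```python
import jax
import jax.numpy as jnp

def lstm_cell(params, carry, x_t):
    """One LSTM step, transcribing the gate equations above."""
    c_prev, h_prev = carry
    z = jnp.concatenate([h_prev, x_t])                    # [h_{t-1}, x_t]
    f = jax.nn.sigmoid(params["Wf"] @ z + params["bf"])   # forget gate
    i = jax.nn.sigmoid(params["Wi"] @ z + params["bi"])   # input gate
    o = jax.nn.sigmoid(params["Wo"] @ z + params["bo"])   # output gate
    c_tilde = jnp.tanh(params["Wc"] @ z + params["bc"])   # cell candidate
    c = f * c_prev + i * c_tilde                          # cell state update
    h = o * jnp.tanh(c)                                   # hidden state update
    return (c, h), h

# Illustrative shapes: hidden size 4, input size 3
h_dim, x_dim = 4, 3
keys = jax.random.split(jax.random.PRNGKey(0), 4)
params = {
    f"W{g}": jax.random.normal(k, (h_dim, h_dim + x_dim)) * 0.1
    for g, k in zip("fioc", keys)
} | {f"b{g}": jnp.zeros(h_dim) for g in "fioc"}
carry = (jnp.zeros(h_dim), jnp.zeros(h_dim))
(carry, h) = lstm_cell(params, carry, jnp.ones(x_dim))
```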
intro-to-gradient-descent.ipynb
Example Backward Propagation Across a 1-Layer Linear Regression NN
$$
\begin{aligned}
L(w, b) &= \frac{1}{n} \sum_{i=1}^{n} \left( (w x_i + b) - y_i \right)^2 \\
\frac{\partial L}{\partial w} &= \frac{2}{n} \sum_{i=1}^{n} x_i \left( (w x_i + b) - y_i \right) \\
\frac{\partial L}{\partial b} &= \frac{2}{n} \sum_{i=1}^{n} \left( (w x_i + b) - y_i \right)
\end{aligned}
$$
Computation Graph with Gradients
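The same partial derivatives fall out of `jax.grad` without deriving them by hand. A minimal sketch with synthetic data; the learning rate, step count, and dataset here are arbitrary:

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    w, b = params
    return jnp.mean((w * x + b - y) ** 2)   # L(w, b) from above

# Synthetic noise-free targets: y = 3x + 1
x = jnp.linspace(0.0, 1.0, 32)
y = 3.0 * x + 1.0

params = (0.0, 0.0)
grad_fn = jax.grad(loss)                    # returns (dL/dw, dL/db) via autodiff
for _ in range(500):
    gw, gb = grad_fn(params, x, y)
    params = (params[0] - 0.1 * gw, params[1] - 0.1 * gb)
```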
