Metal parallelization of llm.c
Metal is Apple’s low-level API for GPU programming and llm.c is Andrej Karpathy’s plain C and CUDA implementation of GPT-2. The C version leverages OpenMP to parallelize the layer functions on the CPU cores. The CUDA version is highly optimized for multi-node multi-accelerator parallelization on NVIDIA GPUs using Open MPI and NCCL.
I once ported the C version to Swift, using Grand Central Dispatch (GCD) for CPU parallelization; the Xcode project is in llm.swift. Despite the -Ounchecked Swift compiler option, which generates fast code without bounds checks, the C version compiled with clang runs about six times faster than the Swift port.
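GCD's `concurrentPerform` splits loop iterations across CPU cores, much like OpenMP does in the C version. As a rough illustrative sketch (not code from llm.swift or llm.c), here is the same fork-join idea in Java, applied to a hypothetical GELU forward pass over a flat array:

```java
import java.util.stream.IntStream;

public class GeluSketch {
    // Hypothetical GELU forward pass (tanh approximation, as used in GPT-2).
    // The element loop is embarrassingly parallel, so it can be split across
    // CPU cores: via OpenMP in C, GCD's concurrentPerform in Swift,
    // or a parallel stream in Java.
    static void geluForward(float[] out, float[] inp, int n) {
        final float s = (float) Math.sqrt(2.0 / Math.PI);
        IntStream.range(0, n).parallel().forEach(i -> {
            float x = inp[i];
            float cube = 0.044715f * x * x * x;
            out[i] = 0.5f * x * (1.0f + (float) Math.tanh(s * (x + cube)));
        });
    }

    public static void main(String[] args) {
        float[] inp = {0f, 1f};
        float[] out = new float[2];
        geluForward(out, inp, 2);
        // GELU(0) = 0; GELU(1) is roughly 0.84 with the tanh approximation.
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Each iteration touches only its own index, so no synchronization is needed; the runtime (fork-join pool, GCD, or OpenMP) decides how to carve the range across cores.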
Run llm.c in TornadoVM
TornadoVM lets Java programs execute on accelerators. llm.c is a plain C implementation of OpenAI’s GPT-2, a predecessor of the model behind the first ChatGPT. Released in fall ’22, ChatGPT sparked an AI hype that still lasts. The two are not a perfect fit at first glance, but a Java version of llm.c could make them friends, so I tried to bring them together.
Although there was already a Java port of llm.c, I made my own to get (back) into the groove of Java. I defined some obvious classes, turned C functions into Java methods, replaced pointers with array indices, used Java Streams instead of OpenMP to parallelize for-loops, and leveraged the Java Vector API for matrix multiplication (the latter taken from llama2.java, thx for sharing @TheMukel).
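To make the porting pattern concrete, here is a hedged sketch (not the actual port's code) of llm.c's matmul forward pass in that style: C pointers become offsets into flat float arrays, and the outer batch loop is parallelized with a parallel IntStream instead of an OpenMP pragma. All names are illustrative:

```java
import java.util.stream.IntStream;

public class MatmulSketch {
    // out[b][o] = sum_i inp[b][i] * weight[o][i] + bias[o]
    // All tensors are flat float[] arrays addressed via computed offsets,
    // mirroring how a Java port replaces C pointer arithmetic with indices.
    static void matmulForward(float[] out, float[] inp, float[] weight,
                              float[] bias, int B, int C, int OC) {
        // Parallel stream over the batch dimension, standing in for
        // "#pragma omp parallel for" in llm.c.
        IntStream.range(0, B).parallel().forEach(b -> {
            int inpOff = b * C;
            for (int o = 0; o < OC; o++) {
                float val = (bias != null) ? bias[o] : 0.0f;
                int wOff = o * C;
                for (int i = 0; i < C; i++) {
                    val += inp[inpOff + i] * weight[wOff + i];
                }
                out[b * OC + o] = val;
            }
        });
    }

    public static void main(String[] args) {
        // Tiny smoke test: 1x2 input times a 2x2 identity weight, plus bias.
        float[] inp = {1f, 2f};
        float[] weight = {1f, 0f, 0f, 1f}; // rows are output channels
        float[] bias = {0.5f, 0.5f};
        float[] out = new float[2];
        matmulForward(out, inp, weight, bias, 1, 2, 2);
        System.out.println(out[0] + " " + out[1]); // 1.5 2.5
    }
}
```

The Vector API version replaces the scalar inner loop with `FloatVector` lanes over the same flat arrays; the indexing scheme stays identical, which is what makes the C-to-Java translation mostly mechanical.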