Metal parallelization of llm.c
Metal is Apple’s low-level API for GPU programming and llm.c is Andrej Karpathy’s plain C and CUDA implementation of GPT-2. The C version leverages OpenMP to parallelize the layer functions on the CPU cores. The CUDA version is highly optimized for multi-node multi-accelerator parallelization on NVIDIA GPUs using Open MPI and NCCL.
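As a rough illustration of what parallelizing a layer function with OpenMP looks like, here is a minimal sketch of a matrix-multiply forward pass. The function name matmul_forward_sketch, the parameter names, and the row-major memory layout are assumptions made for this example and are not copied from llm.c. Each output element is computed independently, so the two outer loops can be distributed across CPU cores with a single pragma.

```c
#include <stddef.h>

// Hypothetical layer function (illustrative, not taken from llm.c):
// out[bt, o] = bias[o] + sum_i inp[bt, i] * weight[o, i]
// inp is (BT, C), weight is (OC, C), bias is (OC) or NULL, out is (BT, OC),
// all stored row-major.
void matmul_forward_sketch(float *out, const float *inp, const float *weight,
                           const float *bias, int BT, int C, int OC) {
    // Every (bt, o) pair writes a distinct element of out, so the iterations
    // are independent. collapse(2) merges both loops into one iteration space
    // that OpenMP divides among the threads.
    #pragma omp parallel for collapse(2)
    for (int bt = 0; bt < BT; bt++) {
        for (int o = 0; o < OC; o++) {
            float val = (bias != NULL) ? bias[o] : 0.0f;
            for (int i = 0; i < C; i++) {
                val += inp[bt * C + i] * weight[o * C + i];
            }
            out[bt * OC + o] = val;
        }
    }
}
```

Compiled with an OpenMP flag such as -fopenmp the loops run on multiple threads. Without the flag the pragma is ignored and the same code runs serially, which makes it easy to compare both variants.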
I once ported the C version to Swift and used Grand Central Dispatch (GCD) for CPU parallelization. The Xcode project is in llm.swift. Despite using the -Ounchecked Swift compiler option to generate fast code without bounds checks, the C version compiled with clang runs about 6 times faster than the Swift version.