Doing AI better
Hello reader, I hope you are enjoying the summer where you are. I apologize for the delay - I was busy with finals and work. This issue focuses on some good reading in the AI space (no prompt “engineering” nonsense here).
This is a highlight of my (and my friend’s*) work. As part of our research methods class, we benchmarked popular Automatic Speech Recognition (ASR) systems delivered via API. Unfortunately, this is not a comprehensive review: Google’s new USM system was omitted for lack of access, and AWS Transcribe for lack of time. But it does include major systems like OpenAI Whisper v2, Assembly Conformer, and more.
There are some surprising results. One is that OpenAI Whisper is no longer the best-in-class ASR system - that title goes to the Assembly Conformer, with its hybrid architecture combining convolutional nets and transformers. For more like this, I recommend reading the paper. We have made the code and data available.
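For readers unfamiliar with how ASR systems are scored: the standard metric is word error rate (WER), the word-level edit distance between the system’s transcript and a reference, divided by the reference length. Here is a minimal sketch of the metric (not our actual evaluation code, which is in the linked repo):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion over six words
```

Lower is better; a WER of 0.05 means roughly one word in twenty is wrong.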
A better way to draw neural nets
Neural nets are often described using the diagram style below. The design is loosely inspired by flow diagrams from graph theory, but it violates several of their conventions and does not describe the network accurately - for example, bias terms go missing, and outputs are drawn inside neurons.
Aaron argues for a new convention to describe these networks in an elegant and easy-to-learn way. Though I maintain that code is the best way to describe large networks succinctly, this new convention is promising for smaller systems and for pedagogical purposes.
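To illustrate the point about code: a hypothetical two-layer perceptron can be written out in a few lines, with every weight, bias, and activation explicit - exactly the pieces the diagram style tends to drop. The 3-4-2 layer sizes and zero parameters below are made up purely for shape:

```python
import math

def mlp(x, W1, b1, W2, b2):
    """A two-layer perceptron with bias terms and activations spelled out,
    which the usual diagram style omits or hides."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]          # hidden layer: Wx + b, then tanh
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]       # output layer: Wh + b

# Hypothetical 3-4-2 network with all-zero parameters, just to show the shapes
W1, b1 = [[0.0] * 3 for _ in range(4)], [0.0] * 4
W2, b2 = [[0.0] * 4 for _ in range(2)], [0.0] * 2
print(mlp([1.0, 2.0, 3.0], W1, b1, W2, b2))  # [0.0, 0.0]
```

The code is unambiguous in a way the boxes-and-arrows picture is not, though I agree it scales poorly as a teaching aid.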
Napkin math for LLMs
It is not easy to estimate the compute and memory requirements of an LLM training run or of inference, and it is harder still to compare these across model families. Eleuther AI has published a set of napkin-math equations to estimate resource use from a few simple parameters. In my work with ASR and text-generation systems at Delta, memory constraints during inference were a big headache, and I found this blog helpful. I think many people working with LLMs will find it similarly useful.
A caveat here is that the space moves fast and popular models are optimized quickly (see the batching-based optimizations in Whisper JAX), but this is a good starting point.
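To give a flavor of the napkin math, two of the best-known rules of thumb from that post: training a transformer costs roughly 6 FLOPs per parameter per token, and fp16/bf16 weights take about 2 bytes per parameter at inference time (a floor - activations and the KV cache add more). A sketch, with the 7B-parameter / 1T-token model sizes below chosen only as an example:

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule of thumb: total training compute ~= 6 * parameters * tokens."""
    return 6 * params * tokens

def inference_memory_gb(params: float, bytes_per_param: int = 2) -> float:
    """fp16/bf16 weights take ~2 bytes per parameter; activations and the
    KV cache add more on top, so treat this as a lower bound."""
    return params * bytes_per_param / 1e9

# Hypothetical 7B-parameter model trained on 1T tokens
print(f"{training_flops(7e9, 1e12):.2e} training FLOPs")   # ~4.2e22
print(f"{inference_memory_gb(7e9):.1f} GB of weights")     # ~14 GB
```

Even this crude arithmetic explains why a 7B model will not fit on a 12 GB consumer GPU in half precision.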
*The friend in question here is a fellow Georgia Tech CS major - Mehul Rastogi. You can find his awesome work here: mehulrastogi.com