The rapid emergence and ongoing integration of generative AI tools throughout the communication and media environment are potentially transformative for many economic and social domains. But what new literacies do everyday users need to confidently understand, use and evaluate these technologies, and how can those literacies be learned? At present, users are left to experiment with different prompts and compare outputs to build an understanding of how generative AI actually operates. Literacy projects driven by tech corporations, meanwhile, double as tools to promote their products, and ultimately serve corporate ends rather than those of users.
In this project, we are exploring how complex generative AI concepts can be communicated and demystified for non-experts by co-designing visual tools that employ open-source models – repackaged with visual explainers and interactive parameters – and focusing on the generation of text, images, and audio. Covering social, technical and environmental dimensions, our approach proposes an AI literacy program with the interests of everyday users at its core.
As we iteratively develop more resources over the course of this research project, we will add to the collection below. These explainers assume a basic level of familiarity with Python scripting and Google Colab notebooks, but no background knowledge of machine learning or AI is required.
If you refer to these materials in a talk, poster, seminar, workshop, grant application, etc., please use the recommended citation below (authors listed alphabetically):
- Burgess, J., He, W., Snoswell, A. J., & Witzenberger, K. (2024). Unboxing GenAI: Building capacities for public understanding of Generative AI. https://ssrn.com/abstract=4920305
If you want to adapt or re-use these materials in any form for teaching or any other purpose, or if there is a topic you think we should cover – please get in touch!
A Gentle Introduction to Google Colab
These explainers use the Google Colab platform, which lets you, the reader, try out code interactively from your own device. If you’ve never used Google Colab before, though, there can be a lot to take in. Start with this short introduction to get familiar with the platform.
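To give you a feel for the platform before you dive in, here is a minimal sketch of what a first Colab cell might look like (the package and values are arbitrary placeholders, not part of the explainers themselves):

```python
# A typical first Colab cell: the leading "!" runs a shell command
# (here, installing a package) inside the notebook environment.
!pip install numpy

import numpy as np

# Cells share state, so variables defined here remain available
# in any cell you run afterwards.
rng = np.random.default_rng(seed=42)
print(rng.integers(low=0, high=10, size=5))
```

Press the play button beside a cell (or Ctrl+Enter) to run it; the output appears directly underneath.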
A Gentle Introduction to Stable Diffusion
Our Stable Diffusion mini-series covers the most prevalent approach to text-to-image generation currently on the market – latent diffusion models. This series breaks down each component of an example open-source model (Stable Diffusion v1.4), explains the reasoning behind each component’s inclusion, and openly reconstructs the model’s algorithm in an approachable, non-technical and interactive format. A minimal end-to-end code sketch follows the episode list below.
- 1401: Introduction to Latent Diffusion Models
- 1402: The CLIP text embedding model
- 1403: Variational Autoencoders for image compression
- 1404: Convolutional UNet de-noiser
- 1405: Conclusion – putting it all together
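For readers who want a concrete anchor before starting, here is a minimal sketch of running the whole Stable Diffusion v1.4 pipeline via the open-source Hugging Face diffusers library (the prompt and parameter values are illustrative only; the notebooks reconstruct these stages by hand rather than calling the packaged pipeline):

```python
# A packaged end-to-end run of Stable Diffusion v1.4 via Hugging Face
# diffusers. The notebooks in this series instead pull apart the three
# models hidden inside this single pipeline object: the CLIP text
# encoder, the VAE, and the UNet de-noiser.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision, so it fits on a free Colab GPU
)
pipe = pipe.to("cuda")

image = pipe(
    "a watercolour painting of a lighthouse at dusk",  # the text prompt
    num_inference_steps=50,  # how many de-noising steps the UNet performs
    guidance_scale=7.5,      # how strongly the text embedding steers generation
).images[0]
image.save("lighthouse.png")
```

Episodes 1402 to 1404 each correspond to one of the three models hidden inside this single `pipe` object.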
Annotated Bibliography
- Mansimov, E., Parisotto, E., Ba, J. L., & Salakhutdinov, R. (2015). Generating images from captions with attention. https://arxiv.org/abs/1511.02793
- This is the first published generative text-to-image modelling paper; it used an LSTM to generate 32×32 images for captions outside of its training dataset.
-
- This is arguably the next big iteration in the development of text-to-image models – Conditional Generative Adversarial Networks (CGANs).
- Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-resolution image synthesis with latent diffusion models. https://arxiv.org/abs/2112.10752
- The first Stable Diffusion paper – and the one covered by this tutorial series.
- Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. https://arxiv.org/abs/2204.06125
- OpenAI’s DALL-E 2 paper – a significant departure from the original DALL-E and conceptually more aligned with how Latent Diffusion Models work when generating images.
- Copet, J., Kreuk, F., Gat, I., Remez, T., Kant, D., Synnaeve, G., Adi, Y., & Défossez, A. (2023). Simple and controllable music generation. https://arxiv.org/abs/2306.05284
- An example of a text-to-music model which follows the broad text-encoder, media compressor/encoder, denoiser/generator stack: T5 is the text model, EnCodec is the music compressor/encoder, and MusicGen itself works on the “latent” space of the EnCodec codebook outputs. A key difference here is that this model is an autoregressive transformer rather than diffusion-based, though many other music generators are diffusion-based.
- Yang, Z., Teng, J., Zheng, W., et al. (2024). CogVideoX: Text-to-video diffusion models with an expert transformer. https://arxiv.org/abs/2408.06072
- An example of a text-to-video model whose model cascade follows broadly similar principles to Latent Diffusion Models. This model uses T5 as the text embedder and a novel 3D causal VAE as the video compressor/encoder, and then uses a diffusion transformer trained with v-prediction.
- Schuhmann, C., Beaumont, R., Vencu, R., et al. (2022). LAION-5B: An open large-scale dataset for training next generation image-text models. https://arxiv.org/abs/2210.08402
- The training dataset for Stable Diffusion 1.0, derived from the web Common Crawl with some cleanup.
- Radford, A., Kim, J. W., Hallacy, C., et al. (2021). Learning transferable visual models from natural language supervision. https://arxiv.org/abs/2103.00020
- OpenAI’s text-image zero-shot classifier – Stable Diffusion uses the text embedding model developed here to generate its text embeddings.
-
- A great resource on text tokenization in general.
- Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. https://arxiv.org/abs/1312.6114
- The compression/encoding (and decoding) model used for images in the Stable Diffusion model cascade. This is the paper which first introduced the concept of VAEs.
- Ronneberger, O., Fischer, P., & Brox, T. (2015). U-Net: Convolutional networks for biomedical image segmentation. https://arxiv.org/abs/1505.04597
- The denoising/generating model in Stable Diffusion. This is the paper which first introduced UNets to the field of computer vision – in this case the application was biomedical imaging, but their use has since generalised.
A Gentle Introduction to Transformer Language Models
Our Transformer Large Language Model mini-series begins by explaining the historical development and nature of transformer models, which are the backbone of today’s incredibly popular LLM-based chatbots. After this introduction, the mini-series goes on to demystify the training and development of chat-finetuned LLMs by interactively demonstrating the generative algorithms the models use to answer prompts; a minimal sketch of this token-by-token loop follows the episode list below.
- 3301: History of Language Modeling in AI – under development!
- 3302: All about tokenization – coming soon
- 3303: Decoder-only Transformers – coming soon
- 3304: From Token Prediction to Chat – coming soon
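As a preview of the loop the later episodes demystify, here is a minimal sketch of greedy next-token decoding with the small open GPT-2 model via Hugging Face transformers (the prompt and token count are arbitrary; chat-finetuned models layer further training and sampling strategies on top of this same basic loop):

```python
# A hand-rolled greedy decoding loop with GPT-2, showing that a language
# model "answers" by repeatedly predicting one next token at a time.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Tokenize a prompt into integer token ids.
input_ids = tokenizer("The history of language modelling", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):  # generate 20 new tokens
        logits = model(input_ids).logits  # scores over the whole vocabulary
        next_id = logits[0, -1].argmax()  # greedy: take the single most likely token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```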
A Gentle Introduction to Explainable AI
Generative AI systems universally rely on very large deep learning architectures. Explainable AI (XAI) methods, which generate explanations of how these opaque systems work, have become an invaluable tool for AI developers as well as a range of downstream stakeholders who need to understand what goes on inside Generative AI models. This mini-series introduces the field of Explainable AI research generally, then touches on a few illustrative XAI methods that highlight the strengths and weaknesses of XAI approaches to understanding model behaviour. One of the simplest of these methods, permutation feature importance, is sketched in code after the episode list below.
- 8801: Introduction to Explainable AI methods
- 8802: Permutation Feature Importance for explaining tabular data models
- 8803: SHapley Additive exPlanations
- 8804: The LIME method and related approaches – coming soon
- 8805: Monosemantic Sparse Autoencoders for Interpreting Transformer Knowledge Stores – coming soon
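As a taste of the simplest method in the series, here is a minimal sketch of permutation feature importance on a toy tabular model, using scikit-learn (the dataset and model are illustrative stand-ins, not those used in the notebooks):

```python
# Permutation feature importance in a few lines of scikit-learn: shuffle
# one feature at a time and measure how much the model's score drops.
# Features whose shuffling hurts the most are the ones the model relies on.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_iris(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

# In practice you would measure importance on held-out data; the
# training data is reused here purely for brevity.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, importance in zip(X.columns, result.importances_mean):
    print(f"{name}: {importance:.3f}")
```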