The Unseen Blurriness of ChatGPT

An article analyzing how much of an innovation ChatGPT really is: is it truly game-changing?

4/22/2023

In 2013, a German construction company discovered that its Xerox photocopier was producing inaccurate copies of a floor plan. The original plan specified the areas of three rooms as 14.13, 21.11, and 17.42 square meters, respectively, but the photocopied versions labeled all three rooms as having an area of 14.13 square meters. When the company contacted computer scientist David Kriesel, he discovered that the photocopier used a lossy compression format called JBIG2, which was designed for black-and-white images. The compression algorithm deemed the area labels of the three rooms similar enough to store only one of them, reusing that single label for all three rooms and producing the inaccurate copies.
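
To make the failure mode concrete, here is a toy sketch of the kind of symbol substitution a JBIG2-style encoder performs. The bitmaps, similarity threshold, and matching rule are invented for illustration and do not reflect the actual JBIG2 algorithm; the point is only that once two glyphs are judged "similar enough," a single stored glyph stands in for both of them in the output.

```python
# Toy illustration (not JBIG2's actual implementation) of lossy symbol-based
# compression: keep a dictionary of representative glyphs and replace any
# glyph within a similarity threshold with an existing dictionary entry.

def hamming_distance(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(p != q for p, q in zip(a, b))

def compress(glyphs, threshold):
    """Return the stored glyph dictionary and, for each input glyph,
    the index of the dictionary entry used to reproduce it."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, stored in enumerate(dictionary):
            if hamming_distance(glyph, stored) <= threshold:
                indices.append(i)      # reuse the "similar enough" glyph
                break
        else:
            dictionary.append(glyph)   # genuinely new glyph: store it
            indices.append(len(dictionary) - 1)
    return dictionary, indices

# Hypothetical 3x3 bitmaps standing in for digits on the floor plan;
# two of them differ by a single pixel, so an aggressive threshold merges them.
digit_a = (1, 1, 1, 0, 1, 0, 1, 1, 1)
digit_b = (1, 1, 1, 0, 1, 0, 1, 1, 0)   # nearly identical to digit_a
digit_c = (0, 0, 1, 0, 1, 0, 1, 0, 0)

dictionary, indices = compress([digit_a, digit_b, digit_c], threshold=1)
reconstructed = [dictionary[i] for i in indices]
print(reconstructed[0] == reconstructed[1])  # True: the copy shows the same "digit" twice
```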

The problem with lossy compression formats like JBIG2 isn't that they are inherently flawed, but that the artifacts they produce can be hard to recognize. If the copies had been noticeably blurry, it would have been obvious that they were not faithful reproductions of the original. Because the photocopier instead produced crisp, readable copies, the inaccuracies went unnoticed. The incident illustrates the danger of using lossy compression in contexts where the inaccuracies it introduces can have significant consequences.

Interestingly, large language models like OpenAI's ChatGPT can be thought of as lossy compression algorithms for text, much as JPEG is a lossy compression algorithm for images. These models retain much of the information on the web in compressed form, but an exact reproduction of any given passage cannot be guaranteed; what the user gets is an approximation, and because that approximation is presented as grammatical text, it is usually acceptable. The nonsensical answers, or "hallucinations," that large language models are prone to producing can be seen as compression artifacts, analogous to the way an image program reconstructs a missing pixel by looking at nearby pixels and calculating their average.
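
That pixel-averaging analogy can be sketched in a few lines. This is not how any particular image format works internally; it simply shows how a missing value can be filled in with a plausible-looking average of its neighbors, the visual counterpart of a hallucinated sentence.

```python
# Minimal sketch of the interpolation analogy: a value lost to lossy
# compression is synthesized by averaging the surrounding values. The result
# looks reasonable even though it was never actually stored.

def reconstruct_pixel(image, row, col):
    """Fill in a missing pixel by averaging its stored neighbors."""
    neighbors = []
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        r, c = row + dr, col + dc
        if 0 <= r < len(image) and 0 <= c < len(image[0]) and image[r][c] is not None:
            neighbors.append(image[r][c])
    return sum(neighbors) / len(neighbors)

# A tiny grayscale patch with one value lost in compression (None):
patch = [
    [100, 110, 120],
    [105, None, 125],
    [110, 120, 130],
]
print(reconstruct_pixel(patch, 1, 1))  # 115.0: plausible, but an invention
```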

This compression analogy offers a useful corrective to the tendency to anthropomorphize large language models, and it suggests why better text compression may matter for creating human-level artificial intelligence. The greatest degree of compression is achieved by understanding the text: to compress a huge file of arithmetic examples, for instance, the most effective strategy is to derive the principles of arithmetic and write the code for a calculator program, which can regenerate every example on demand. The same logic applies to compressing a slice of Wikipedia: the more a program genuinely knows about a topic, the more words it can discard when compressing pages about that topic, because it can reconstruct them later.
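
A rough sketch of that idea: storing arithmetic examples verbatim takes space proportional to the number of examples, while a program that has captured the underlying rule stays tiny and can regenerate every one of them exactly. The sizes below are illustrative only, not measurements of any real system.

```python
# "Compression through understanding": compare memorizing every arithmetic
# fact as text with storing a tiny program that reproduces all of them.

# Naive approach: memorize each example verbatim.
examples = [f"{a} + {b} = {a + b}" for a in range(100) for b in range(100)]
memorized_size = sum(len(line) for line in examples)   # on the order of 100 kB of text

# "Understanding" approach: a one-line calculator regenerates every example.
def add(a, b):
    return a + b

program_size = len("def add(a, b): return a + b")       # a few dozen bytes

print(memorized_size, program_size)
# The second representation is vastly smaller, yet lossless for this domain.
```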

While large language models like ChatGPT can identify statistical regularities in text and answer questions about many topics, they often fail to derive the underlying principles behind them. For instance, a model may be able to answer questions on economic theory yet struggle with arithmetic, because it has picked up surface patterns rather than the principles that generate them. The result is superficial approximations of knowledge, and it can be hard to tell when such an approximation is acceptable and when it is not. This blurriness may be useful for content mills, but it makes it harder for people to find the information they are actually looking for online.
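
One way to picture the difference is to compare a model that only recalls the sums it has seen with one that applies the rule of addition. The "statistical" model below is a deliberately crude stand-in, not a description of how ChatGPT works; it only shows how a surface-level approximation can look right on familiar inputs and fail on novel ones.

```python
# Crude contrast between memorized surface patterns and a derived rule.
# The statistical model echoes the answer to the most similar problem it has
# seen, so it fails outside its training data; the rule-based model generalizes.

training_data = {(a, b): a + b for a in range(10) for b in range(10)}

def statistical_model(a, b):
    """Return the memorized answer for the most similar seen problem."""
    nearest = min(training_data, key=lambda pair: abs(pair[0] - a) + abs(pair[1] - b))
    return training_data[nearest]

def rule_based_model(a, b):
    """Apply the actual principle of addition."""
    return a + b

print(statistical_model(3, 4))      # 7   -- seen during training, correct
print(statistical_model(123, 456))  # 18  -- a superficial approximation, wrong
print(rule_based_model(123, 456))   # 579
```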

GPT-4, OpenAI's successor to ChatGPT, is expected to exclude material generated by ChatGPT or any other large language model from its training data, precisely to avoid compounding compression artifacts. That choice underscores how much better text compression will matter if large language models are to produce more accurate and reliable results.