Summarization: The MP3 of AI

In Situ Knowledge Clues

While there are endless untapped use cases for AI, one of the original use cases remains one of the most common: summarization. Who doesn’t want AI to read long documents and provide all the answers? Not only does it save huge amounts of time, but AI can also rewrite the information in any style you like.

But there are problems with AI summaries. Lots of them. So many that I hope this post, and the research we’re doing at TrustGraph, will spark interest in a problem that will likely take years to solve. Years? Why? In so many ways, using AI to summarize text is just like the MP3.

The What?

At first, I considered that heading to be a bit facetious, but times have changed. Entire books and documentaries could be written about the history of the MP3 and its impact on music. As the internet began to explode in the late 90s, homes were dialing into the internet, literally, at a blistering 56 kbps. Music extracted from a CD was natively stored in .WAV files, roughly 10 MB per minute. A CD of music might be 600 MB, taking 23.8 hours to download at 56 kbps. As you can imagine, no one was downloading music as .WAV files. Enter the MP3. The MP3 is an audio compression format that can reduce music files to dramatically smaller sizes, often 90% smaller. Instead of taking an entire day to download an album, it might only take a few hours.
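As a quick sanity check on those numbers, here’s the back-of-the-envelope arithmetic. The only assumptions are decimal megabytes and a perfectly steady 56 kbps line:

```python
# Back-of-the-envelope check on the dial-up numbers above.
# Assumes decimal megabytes and a perfectly steady 56 kbps line.
CD_SIZE_MB = 600          # uncompressed album, ~10 MB per minute of .WAV audio
LINE_SPEED_BPS = 56_000   # dial-up modem speed in bits per second
MP3_SAVINGS = 0.90        # MP3s are "often 90% smaller"

cd_bits = CD_SIZE_MB * 1_000_000 * 8
wav_hours = cd_bits / LINE_SPEED_BPS / 3600
mp3_hours = wav_hours * (1 - MP3_SAVINGS)

print(f".WAV album: {wav_hours:.1f} hours")  # ~23.8 hours
print(f"MP3 album:  {mp3_hours:.1f} hours")  # ~2.4 hours
```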

But, there’s a catch. That file size reduction doesn’t come without a penalty: loss, a degradation in quality. The .WAV file is a lossless audio format. The MP3 is a lossy audio format. The way you get smaller file sizes is by throwing away some of the actual audio. And that’s exactly what early MP3 encoders did, often dumping all audio above 12 kHz.

MP3 spectrogram showing removed high frequencies.

If human hearing spans 20 Hz to 20 kHz, how can you get away with deleting so much audio? It turns out the human ear is tuned in a funny way. We hear certain frequencies much better than others, and not surprisingly, those frequencies happen to correspond to those of human speech. Knowing that, you can throw away some of the spectral range without much audible impact.

This Has Something to Do with AI?

Yes, but not yet. Also, when’s the last time anyone thought about MP3s? And that’s my point. During the 90s, 00s, and even into the 2010s, people still considered audio bitrates when listening to music. But now? Everyone uses streaming music services, and no one thinks about audio bitrates. Why? It’s twofold: audio encoders and audio engineering have both improved, making lossy formats viable. After more than a decade, both sides have developed sophisticated tools that give the listener a seemingly lossless listening experience.

AI Now?

Yes, now back to AI. The summarization process is compression. We’re asking a Language Model to read a text corpus and provide the key points, often at compression rates even higher than the MP3’s 90%. Asking a Language Model for a succinct explanation will inevitably remove some detail. The question becomes: is that detail important? Now we’re opening a real can of worms.
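To put a rough number on that compression, here’s a minimal sketch. The document and summary strings are placeholders; in practice the summary would come from whatever Language Model you’re using:

```python
# Minimal sketch: how much of the original text does a summary discard?
# The texts below are placeholders; a real summary would come from an LM.
def compression_ratio(document: str, summary: str) -> float:
    """Fraction of the original characters thrown away by the summary."""
    return 1.0 - len(summary) / len(document)

document = "..." * 10_000           # stand-in for a long report
summary = "The report finds X."     # stand-in for the LM's summary

print(f"{compression_ratio(document, summary):.1%} of the text discarded")
```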

Grok’s idea of Indiana Jones opening up a can of worms.

As I said earlier, there are many problems with summaries, but the biggest is probably value judgments. Let’s take a Naive Extraction process in TrustGraph. The entire point is to load a document that I haven’t read. I don’t know what topics in the document are important. But I want to know as much as possible about the document for RAG requests. A summary, however, requires making determinations about which topics are important. Again, in a Naive Extraction, I don’t know what’s important. Asking the LM to summarize is asking it to decide which topics matter. If you know the subject of the text, you know what topics are important to you. It becomes like the MP3: throwing away parts you know you don’t need. But what if you don’t?

Nascent Knowledge Compression

The Naive Extraction process, in many ways, is a nascent knowledge compression methodology. Naive Extraction processes and structures the text into RDF-style triples for loading into a knowledge graph. This structure, much like the MP3, attempts to throw away the parts that seem less useful: prepositional phrases, clauses, adverbs, articles, and other grammatical elements that don’t appear to contribute to meaning. But if those elements don’t contribute to meaning, why do we use them at all? Because they provide the structure and context in which we derive meaning. They’re knowledge clues. If we remove those clues, it’s much harder to reconstruct the full meaning of the in situ knowledge.
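To make that concrete, here’s a hand-built illustration of what an RDF-style triple keeps and what it drops. This is a toy example, not TrustGraph’s actual extraction pipeline:

```python
# Toy illustration of RDF-style triple extraction (not TrustGraph's pipeline).
sentence = (
    "In 1998, despite painfully slow dial-up connections, the MP3 format "
    "rapidly became the dominant way to share music online."
)

# What an extractor keeps: subject, predicate, object.
triples = [
    ("MP3 format", "became", "dominant way to share music online"),
]

# What gets dropped: the knowledge clues that carried the in situ context.
discarded = ["In 1998", "despite painfully slow dial-up connections", "rapidly"]

for s, p, o in triples:
    print(f"({s}) -[{p}]-> ({o})")
# The triple alone no longer says when it happened, under what conditions,
# or how quickly, which is exactly the context that's hard to reconstruct.
```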

Beyond Knowledge Compression

The intent of knowledge extraction into an RDF-style knowledge graph is not just to compress the knowledge but to enhance it. The labels, annotations, and connections made within the graph provide deeper insight into a topic than a bulleted list of key points in a summary. A knowledge graph can make links between topics that would have required time-consuming human analysis to uncover. Not only does a Naive Extraction compress knowledge, it enhances it for knowledge retrieval.
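As a sketch of what those links can look like, here’s a toy multi-hop query. networkx is used purely for illustration, not as TrustGraph’s storage layer, and the facts are invented for the example:

```python
# Toy multi-hop query over a tiny knowledge graph.
# networkx is used purely for illustration; the facts are made up.
import networkx as nx

g = nx.DiGraph()
g.add_edge("Researcher A", "Compound X", relation="synthesized")
g.add_edge("Compound X", "Protein Y", relation="inhibits")
g.add_edge("Protein Y", "Disease Z", relation="implicated in")

# One query surfaces a chain that no single document's summary states outright.
path = nx.shortest_path(g, "Researcher A", "Disease Z")
for a, b in zip(path, path[1:]):
    print(f"{a} --{g.edges[a, b]['relation']}--> {b}")
```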

Early Days

Today, our strategy with TrustGraph is to extract as many knowledge relationships as possible. This approach is far from optimized and likely records many meaningless relationships. However, we do know that asking an LM to summarize the text is a guaranteed way to intentionally discard information, and that could be exactly the information we wanted. We’re just now beginning this journey of improving knowledge extraction and already have several approaches we’re exploring. Come join us, and help make knowledge extraction so ubiquitous that summaries go the way of the MP3.