Taking AI On-Device with GraphRAG

Unlocking the power of Tiny Language Models (TLMs)

While so many have been obsessed over the last year with everything OpenAI does, a new community is emerging with a new vision - driving AI On-Device. This time last year, that aim seemed impossible. Models from OpenAI, Anthropic, and Google were showing promising performance - if you had a datacenter. So what’s changed? GraphRAG and TrustGraph’s loadable Knowledge Cores.

Most everyone is familiar with the truly big LLMs like GPT-4, Claude, and Gemini, all believed to be trillion-parameter-plus models. If those are LLMs, what are SLMs - Small Language Models? It seems that models fall into four categories, each separated by roughly an order of magnitude:

  • TLM: Tiny Language Models, ~1B parameters

  • SLM: Small Language Models, 7B-9B parameters

  • MLM: Medium Language Models, ~70B parameters

  • LLM: Large Language Models, 400B parameters and up

SLM performance from recent models like Llama 3.1:8B, Gemma2:9B, Phi3.5, Mistral7B, and others has become truly impressive. But there is a problem: these SLMs are still large enough to need a reasonable GPU with at least 8GB of memory. Most any desktop will meet that requirement, but laptops and mobile devices? Not so much.

For devices without GPUs, we’re down in the range of TLMs. If you’ve ever experimented with these models, you know they can be infuriating. At times they seem brilliant, and at other times bizarrely idiotic. Yet the potential is there - but how do we unlock it?

At first, RAG approaches aimed to solve the problem of long input contexts. The jury is still out on whether long context windows truly “work”, or even what it means for them to “work” (our results suggest they don’t). Then RAG became about LLMs and privacy, except that you’re still almost always relying on an external API. Now people are asking: can we get big-LLM performance with RAG and SLMs? Yes, yes we can.
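To make that concrete, here is a minimal sketch of “traditional” RAG paired with a locally served SLM: embed a handful of documents, retrieve by cosine similarity, and let the local model answer from the retrieved context. The endpoint URL, model tag, and sample documents are assumptions for a typical local OpenAI-compatible server (e.g. Ollama), not TrustGraph’s own pipeline.

```python
# Minimal "traditional" RAG sketch: embed documents, retrieve by cosine
# similarity, and ask a locally served small model to answer from that context.
# The endpoint URL, model tag, and documents below are assumptions for a
# typical local setup; swap in whatever you run on-device.
import numpy as np
import requests
from sentence_transformers import SentenceTransformer

EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")        # small, CPU-friendly embedder
CHAT_URL = "http://localhost:11434/v1/chat/completions"   # assumed local OpenAI-compatible endpoint
MODEL = "llama3.1:8b"                                      # assumed local SLM tag

documents = [
    "TrustGraph stores extracted knowledge graphs as portable Knowledge Cores.",
    "Knowledge Cores can be loaded back into TrustGraph in seconds.",
    "GraphRAG links related concepts through multiple hops in a graph.",
]
doc_vectors = EMBEDDER.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question (cosine similarity)."""
    q = EMBEDDER.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q   # normalized vectors, so dot product == cosine similarity
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(question: str) -> str:
    """Ask the local model to answer using only the retrieved context."""
    context = "\n".join(retrieve(question))
    resp = requests.post(CHAT_URL, json={
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    })
    return resp.json()["choices"][0]["message"]["content"]

print(answer("What is a Knowledge Core?"))
```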

Then why GraphRAG? In short, it depends. TrustGraph supports both “traditional” RAG and GraphRAG. In fact, TrustGraph’s GraphRAG is really HybridRAG, but that’s another story. For some use cases, simple RAG may be enough. So what does a knowledge graph get you over cosine similarity? (A short sketch after the list below shows the difference.)

  • Term disambiguation through context labeling

  • Linking related concepts through multiple “hops”

  • Linking context specific terms to universal terms

  • Disambiguation of concepts, entities, people, intangible objects, and any information of interest 
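The multi-hop point is the easiest to show in code. Below is a toy illustration, with invented triples, of how graph retrieval can walk a chain of related facts that a flat similarity search would have to match one sentence at a time. It is only a sketch of the idea, not TrustGraph’s graph engine.

```python
# Toy illustration of multi-hop retrieval over a knowledge graph.
# The triples are invented for illustration only.
from collections import defaultdict

triples = [
    ("TrustGraph", "supports", "GraphRAG"),
    ("GraphRAG", "uses", "knowledge graph"),
    ("knowledge graph", "stored_in", "Knowledge Core"),
    ("Knowledge Core", "loaded_on", "device"),
]

# Index subject -> list of (predicate, object) for cheap traversal
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

def hops(start: str, depth: int = 2):
    """Collect every fact reachable from `start` within `depth` hops."""
    frontier, seen = {start}, []
    for _ in range(depth):
        next_frontier = set()
        for node in frontier:
            for p, o in graph[node]:
                seen.append((node, p, o))
                next_frontier.add(o)
        frontier = next_frontier
    return seen

# Starting from "TrustGraph", the walk surfaces the Knowledge Core facts
# two hops away - facts a flat cosine search would score independently.
for fact in hops("TrustGraph"):
    print(fact)
```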

Knowledge graphs, if structured correctly, can be easily updated with new and improved knowledge relationships. TrustGraph’s extraction process can build these knowledge graphs with LLMs, MLMs, and even SLMs - but not TLMs. TLMs struggle to consistently form responses in a parsable structure like JSON. In fact, TLMs struggle to consistently structure responses even as CSV.
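To see why that matters for extraction, consider the failure modes a parser has to survive. The sample outputs below are invented to mimic typical TLM behavior (clean JSON, JSON wrapped in chatter, or no structure at all); they are not real model logs, and the parsing helper is just one defensive approach.

```python
# Sketch of why TLM-based extraction is fragile: the same prompt can come back
# as valid JSON, JSON buried in chatter, or free text with no structure at all.
# The sample outputs are invented to illustrate the failure modes.
import json

sample_outputs = [
    '[["TrustGraph", "supports", "GraphRAG"]]',                               # clean JSON
    'Sure! Here are the triples:\n[["TrustGraph", "supports", "GraphRAG"]]',  # chatter + JSON
    'TrustGraph supports GraphRAG, and also it is great',                     # no structure at all
]

def parse_triples(text: str):
    """Try to recover a list of (subject, predicate, object) triples, else None."""
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end == -1:
        return None
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    triples = [tuple(t) for t in data if isinstance(t, list) and len(t) == 3]
    return triples or None

for out in sample_outputs:
    print(parse_triples(out))   # the last output parses to None and must be retried or dropped
```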

If we can’t build a knowledge graph with a TLM - yet - how can we then leverage TLMs and GraphRAG? Knowledge Cores. A unique feature of TrustGraph is the ability to store the extracted knowledge graph and mapped vector embeddings into a portable Knowledge Core. This Knowledge Core can then be loaded back into TrustGraph in seconds. The ability to store and share Knowledge Cores makes extraction a one-time process. The needed Knowledge Cores can be built Off-Device and then loaded On-Device along with the TLM.
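The actual Knowledge Core format belongs to TrustGraph; the sketch below is only a hypothetical illustration of the idea - package the graph triples and their vector embeddings into one portable file that is built off-device and reloaded on-device without re-running any extraction model.

```python
# Hypothetical illustration of the Knowledge Core idea (not TrustGraph's format):
# bundle graph triples and their embeddings into one portable file.
import json
import numpy as np

def save_core(path, triples, embeddings: np.ndarray):
    """Write triples plus embeddings to a single portable .npz archive."""
    np.savez_compressed(
        path,
        triples=json.dumps(triples),   # graph edges as JSON text
        embeddings=embeddings,         # matching vectors, one row per triple
    )

def load_core(path):
    """Reload a saved core; fast, because no extraction model runs here."""
    core = np.load(path, allow_pickle=False)
    return json.loads(str(core["triples"])), core["embeddings"]

triples = [("TrustGraph", "supports", "GraphRAG")]
vectors = np.random.rand(len(triples), 384).astype("float32")   # placeholder embeddings
save_core("knowledge_core.npz", triples, vectors)
loaded_triples, loaded_vectors = load_core("knowledge_core.npz")
print(loaded_triples, loaded_vectors.shape)
```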

With new support for Llamafiles in TrustGraph v0.9.5, TrustGraph can deploy the entire AI infrastructure needed to make On-Device AI a reality. Come join us in driving AI On-Device.
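For a sense of how little glue is needed once a Llamafile is running locally, here is a minimal sketch of calling it from Python. A llamafile in server mode exposes an OpenAI-compatible chat endpoint, by default on localhost:8080; the port, model name, and prompt here are assumptions for a typical local setup.

```python
# Minimal sketch of talking to a locally running llamafile from Python.
# Port and model name are assumptions for a typical default setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",   # a llamafile serves a single model; the name is informational
        "messages": [
            {"role": "user", "content": "Summarize what a Knowledge Core is."},
        ],
        "temperature": 0.2,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```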