22-June-2025 By Jeffrey Cooper

From RAGs to Niches (Part 1)

tl;dr

I built “Booker,” a specialized RAG (Retrieval Augmented Generation) chatbot that can answer questions about any book (in this case, my book on 3D photography), and this article walks through the entire technical journey. RAG combines the language skills of LLMs with “book smarts” around specific knowledge collections, letting you create focused AI assistants instead of relying on general-purpose tools like ChatGPT. What surprised me was how quickly the initial prototype came together – just 1-2 hours of planning and about an hour of implementation using OpenAI’s o3 and Cursor.

The real work came in optimizing the system. I learned that simply dumping text into a “RAG bucket” produces underwhelming results, so I dove deep into text chunking strategies, switching from larger text chunks to smaller text chunks for better performance. I added what I call “connective tissue” – summaries, keyword extraction, and metadata that help the system cross-reference different parts of the book. Since my book was written in 2011 when 3D technology was booming, I also had to update outdated content with an AI-generated epilogue and implement fallback to ChatGPT’s general knowledge. If you’re curious about RAG implementation, vector databases, embedding strategies, or want to see the visual UMAP diagrams that helped me understand my data distribution, this article covers the complete technical journey from concept to deployment.

RAG (Retrieval Augmented Generation) has long been on my list to tackle. As I got deep into Word Hammer (still to be released in the very near future), I pushed other things to the side because it was both super useful to me and I was doing a lot of different things with it.

But now, I am back and made a lot of progress. As with anything deeply technical, you can make some really fast progress, and then things get interesting. That’s what happened with Word Hammer, and I can see that happening with my RAG project as well, though I will try a bit better to keep this one on track.

For example, I spent 1-2 hours shaping what I wanted my RAG chatbot to do, and it took about an hour to implement. I had OpenAI’s o3- after I shaped it in a brainstorming session, write a complex prompt for Cursor. Cursor rapidly built the app, and other than a few initial bugs, in an hour, I had a functioning chatbot.

To back up first, RAG is a process by which you can make a chatbot like ChatGPT “smarter,” or rather, more focused.

ChatGPT knows a LOT about a lot of things, but it doesn’t necessarily go into depth in certain subjects. In particular, proprietary data that belongs to a company, or research papers, or any other large and detailed body of knowledge. You get the best of both worlds with RAG- you get the amazing language skills of any of today’s LLMs combined with “book smarts” around any given collection of work.

Nota para los lectores españoles: Estoy escribiendo mis articulos en dos idiomas mientras lo aprendo. Para mas información, lea este artículo.

RAG (Retrieval Augmented Generation) he sido durante mucho tiempo en mi lista para hacer. Como trabajé más y más de mi otro proyecto Word Hammer (que se lanzará en el futuro muy próximo), empujé otros proyectos al lado porque Word Hammer fue muy útil para mi y este proyecto tenía muchas partes diferentes.

Pero, ahora, estoy de vuelta y hice mucho progreso. Como con cualquier tema profundamente técnico, pueda hacer algo progreso muy rápido, y entonces las cosas se ponen interesantes. Eso ocurrió con Word Hammer, y puedo verlo ocurrir con my proyecto de RAG también, aunque trataré un poco mas mejor para mantenerlo en el rumbo.

Por ejemplo, tardé 1-2 horas para formar lo que funciones quería que mi Chatbot hiciera, y una otra hora para implementar con Cursor. Utilicé o3 de OpenAI para formar y generar un prompt complejo para Cursor. Y entonces, Cursor rápidamente construyó la app, y excepto por unos pocos errores inicialmente, tuve una app que funcionaba en una hora.

Para regresar y explicar, RAG es un proceso por puede hacer que un Chatbot como ChatGPT más inteligente, o más especifico.

ChatGPT sabe mucho sobre muchas cosa, pero no profundiza necesariamente en algunos temas, en particular, datos privados que pertenecer a una empresa, o papeles de investigaciones, o cualquier conjunto de conocimientos. RAG tiene lo mejor de ambos mundos- habilidades asombrosas de cualquier LLM en combinación con “book smarts,” o conocimientos específicos alrededor cualquier colección de obras.

Why RAG & The Booker App

RAG-based chat applications have broad appeal and are useful in many, many domains. They are great for books, bodies of research, corporate unstructured data, social media comments and sentiments, helpdesks like ZenDesk, knowledge bases… the list goes on.

Whatever you have ingested as your data source, becomes the primary focus of the app. And in your chatbot, you specialize a generalized LLM in the area of interest, and at the same time, “box it in.”

By boxing it in, the chatbot is mainly restricted to the body of work you have given it. It will use the knowledge and information present in your database, and its own broad language skills, but in general, will restrict itself from unrelated information. For example, you can’t just ask “Who won the 1951 World Series” and expect an answer. It will reply that it is not in the body of knowledge that it is currently using.

To develop my RAG chat application, I happened to have the perfect candidate to use: My book. And I called the app Booker, though it is more general than just books.

Aplicaciones de RAG chat tienen gran atractivo y son utiles en muchos dominios. Son ideales para libros, conjuntos de conocimientos, datos corporativos no estructurados, comentarios y sentimientos de las redes sociales, mesas de ayuda, bases de conocimientos… la lista continúa.

Lo que datos sea ha ingerido se convierte en el foco principal de la app. Y en tu chatbot, especializa un LLM generalizado en un área de interés, y al mismo tiempo, encerrarlo.

Por encerrarlo, el chatbot es limitado al conjunto de obra has darlo. Utilizará el conocimiento y la información que es en tu base de datos, pero en general, lo restringirá de información no relacionada. Por ejemplo, no puedes preguntarlo “¿Quien ganó la World Series de 1951?” y espere una respuesta. Responderá que no es en un conjunto de obra lo que utilizando actual.

Para desarrollar my app de RAG chat, tuve un candidato perfecto para usar: Mi libro. Y me llamo la app Booker, aunque es más general que solo libros.

RAGging the Book

My book is about 3D Photography, has been one of the bigger hobbies in my life. I learned how to view stereoscopic images as a kid looking at 3D images from the surface of Mars. When I got my first camera, I knew I could recreate this myself (minus the Mars part 😄). By 2011, I had a website dedicated to stereo photography and had been the #1 search result on Google for 3D photography for 15 years running. I had a vibrant and successful Forum and had a side gig developing a 3D stock image site modeling after Getty Images. Avatar was top of the charts and 3D TVs were exploding (not literally…). I was in Finland for work and a publisher friend asked me to write a book for his company to publish. That process on its own was a fantastic experience, and the book was published at the end of 2011 (cover above) called The 3D Photography Book.

Fast forward- the 3D hype nose dived a year later (you can ask my chatbot why- link below) and sales were not great and my side gig also evaporated. But you learn from all this. So now, with the publishers permission, I decided this would be the perfect candidate to “chat with your book.”

I will preface this that you can easily go to ChatGPT and it will perfectly well teach you how to take 3D photos, without my book (given how my website and forum dominated 3D for so long, I expect at least some of that is in ChatGPTs training data). But, there are nuances and things in my book, and the important thing to know was if my chatbot was using MY text to get its answers. Short answer- it does.

Mi libro es sobre la fotografía en tres dimensionales, y ha sido unos de los pasatiempos más importantes de mi vida. De joven, aprendí como ver las imágenes estereoscópicas de la superficie de Marte. Cuándo compré mi primera cámara, sabía que podía recrearlo lo mismo (menos de la parte sobre Marte 😀). Para 2011, tuve un sitio de web dedicado a fotografía estereoscópica que fue el #1 resultado de la búsqueda en Google desde 15 años! Tuve un Forum exitoso y vibrante desde 2006 y un “gig” en al lado para crear un sitio para imágenes estereoscópicas de archivo que he modelado como Getty Images. Avatar era en la encima de las listas y televisiones 3D se vendían como pan caliente. Estuve en Finlandia por trabajo y un amigo publicador me pidió escribir un libro para su empresa a publicar. Ese proceso en sí mismo era una experiencia fantástica y el libro fui publicado al fin de 2011 (el frente del libro es a la izquierda) se llama The 3D Photography Book.

Avancemos rápidamente hasta el presente- el bombo publicitario de 3D se cayó rapido (tu puedes preguntarse “porque” en mi chatbot- el enlace que aparece al final de este blog) y los ventas no era buena y mi gig en al lado también evaporado. Pero se aprende de todo esto. Ahora, con la permisión del publicador, decidí este sería un candidato perfecto para “chatear con tu libro.”

Antes de nada, quiero decir que puedas ir a ChatGPT y enseñarte sobre 3D perfectamente, sin mi libro. Dado que mi sitio de web y forum han dominado 3D en Google para tanto años, imagino que eso es un parte del entrenamiento de ChatGPT). Pero, hay matices y cosas en mi libro, y la cosa más importante para saber si mi chatbot era utilizar MI texto para obtener las respuestas. La respuesta corta es que sí.

Understanding the Corpus

This is a long article for a reason. When you dive straight into a project, you learn fast. With RAG, you learn very quickly that just dumping all your text into a “RAG bucket” will be, at the very least, underwhelming, and most likely, useless. While the chatbot might find a reference to a paragraph that mentions something in your question, the response will likely be unhelpful.

The first thing that happens with a RAG is that you need to chunk the text, i.e., extract it in sections. Each chunk is then vectorized as an embedding. This is a process by which, I colloquially put it- a concept or idea is placed into a knowledge space. We will talk about embeddings in a future article, but that is the gist of it. And each embedding is an array of either 1536 or 3072 numbers, and there is one embedding per chunk. I chose to use OpenAI’s large embeddings model, which is the bigger of the two.

Different types of content will require different approaches to create a good dataset with which to chat.

A book (or collection of books) are sequences of chapters, each containing a subset of information, or if a story, a sequential set of information as the story progresses. Other than chapters of the book, there is not a lot of ancillary information and it is straightforward. This was one of two initial corpora I wanted to tackle.

The other was a series of reports I did for a client that was >1200 pages, consisting of 5 different market domains and a broad array of their products. That corpus has a structure of 5 reports, 5 presentation and an overview. None of the data is sequential, but is rather opportunity-driven, both in market opportunities and technical opportunities. And the product array was quite broad, involving both the potential for applications as well as a number of physical products, some technology-based, others not. And for all those documents is a much larger trove of files consisting of market reports, market studies, press releases, annual reports, technical studies, etc… And this demands a very different approach to the data ingestion.

Este es un artículo largo para una razón. Cuando saltas directamente en un proyecto, aprendes rápido. Con RAG, aprendes muy rápido que volcar tus datos en “cubo de RAG” será, como mínimo, decepcionante, y probable inútil. Mientras el chatbot podría encontrar una referencia al párrafo que menciona algo de tus preguntas, las respuesta no serán de ayuda.

La primera cosa que ocurre con RAG es que necesita cortar en trozos el texto, i.e., extraje en secciones. Entonces, cada trozo es vectorizado como un embedding. Esto es un proceso, por digo coloquialmente, un concepto o idea que se pone en un espacio de conocimiento. Hablaremos más sobre embeddings en un artículo futuro, pero eso es la esencia del asunto. Y cada embedding es una matriz de 1536 o 3072 números, y hay un embedding por trozo. Elegí utilizar el model de embeddings grandes de OpenAI, cuales el más grande de los dos.

Tipos diferentes de contenido requerir enfoques diferentes para crear un conjunto de data con el que conversar.

Un libro (o colección de libros) son secuencias de capítulos, cada uno conteniendo una porción de información como la historia progresa. Aparte de los capítulos del libro, no hay mucha información complementaria y es sencillo. Este fue uno de dos tipos de conjuntos de contenido inicialmente que quería abordar.

El otro conjunto de data fue una serie de informes que hice para una cliente que fue >1200 paginas, con 5 dominios mercados diferentes y un gran matriz de productos. Ese conjunto tiene una estructura de 5 informes, 5 presentaciones, y una visión general. Ninguna de los datos son secuencial, pero es más enfoque en oportunidades mercadas y técnicas. Y la matriz del productos fue grande, con ambas potencial para aplicaciones tan como muchos productos físicos, algo basado en tecnología y otros no. Y para todos los documentos hay un repositorio más grande compuesto por informes mercados, comunicados de prensa, informes anuales, estudias técnicas, etc… y estos exigen un enfoque diferente a la ingesta de los datos.

Text Chunking

Initially, I set up Booker to chunk the text in 800 character blocks, with an 80 character overlap. This means it only looks at 800 characters at a time, and for the next chunk, it slides the window forward 720 characters (800 − 80). And so forth. Each chunk is then sent off to, in my case, OpenAI’s large embeddings model to create a single vector embedding for that chunk. As mentioned above, this vector represents an array of values to capture semantic information about the content of the entire chunk. This is abstract stuff. And each chunk is represented by a vector.

When a user later, in the chatbot, types a question, this question is ALSO sent to the OpenAI large embeddings model, which returns the same. This vector is then compared against the database that contains all the vectors from the text, and the vector that is closest in the multi-dimensional vector space to the vector of your question, is then extracted (or rather, the text that vector represents), and THIS is what is then excerpted from the larger corpus and submitted to an LLM (in the case of the Booker app, GPT 4.1 mini) to generate a reply.

That’s the simple version. It is actually a little more complex than that- usually there is more than one match, and the algorithm returns a number of top k candidates, and depending on how proximal they are to each other, more than one text excerpt can be submitted.

Inicialmente, configura Booker para cortar en trozos (chunkear? 😄) de bloques de 800 caracteres con 80 caracteres superpuestos. Eso significa que solo busca a 800 caracteres en una vez, y para el próximo trozo, se desliza la ventana hacia 720 caracteres (800 − 80). Y así sucesivamente. Cada trozo entonces se envía a, en mi caso, el model de los embeddings grandes de OpenAI para crear un único vector para ese trozo. Se mencionado anteriormente, este vector representa una matriz de valores para capturar información semántica sobre el contenido del trozo completo. Esto es cosas abstractas. Y cada trozo se representado por un vector.

Hasta, cuando un usuario puede una pregunta al chatbot, esta pregunta también se envío al model de embeddings grandes de OpenAI, que devuelve el mismo. Entonces, este vector se compara contra el base de datos que tiene todos los vectores del texto y el vector, o vectores, más cerca en un espacio de vectores multi-dimensionales al vector de tu pregunta, es extraído (en realidad, el texto que se representado por el vector), y ESTE es que extraído del conjunto de textos y envío al LLM (en el caso de la app Booker, GPT-4.1-mini) para generar una respuesta.

Eso es la versión sencilla. En realidad, es un poco más complejo que eso- usualmente hay más de un resultado coincidente, y el algoritmo vuelven unos de “top k” candidatos, y depende en como proximal hay al otro, más de uno extracto puede enviar.

"Connective Tissue"

So, if you just chunk the text into a vector database, and leave it at that, you will have a very bland chatbot. You will get some very compartmentalized answers regarding some exact specifics with little ability to connect the dots between different parts of the book.

This is where you need to bring in what I call “connective tissue,” to create infrastructure around your initial body of text (or corpus) that helps find and cross-locate different blocks of text.

Así, si justo cortas el texto en trozos y ponerlo en un base de datos vectorial, y eso es todo, tendrás un chatbot muy insulsa. Tendrás unas respuestas hay muy compartimentado en relación con los específicos exactos con poca capacidad para conectar los puntos entre partes diferentes del libro.

Esto es donde necesitas añadir lo que llamo “tejido conectivo,” para crear una infraestructura alrededor tu conjunto de obra inicial que ayuda a encontrar y localizar cruzada bloques de texto diferentes.

The way this works is that the text for each chunk is, itself, sent to an LLM (OpenAI GPT 4.1 mini) to summarize the chunk in a few sentences. There is a Python package called SpaCy, a natural language processor (NLP) that I use to identify important keywords in that chunk. These are combined with Chapter and Header titles (if available) to further give a deeper context to the chunk, and all of this is added to a JSON sidecar file

The diagram above shows this process, and you can see that multiple calls to OpenAI area made, both to generate the embeddings and to summarize text. These are two different API calls to different services on OpenAI, and you don’t have to use OpenAI’s services. The left side of the diagram, #1, shows the chunking and saving to the FAISS vector database. FAISS is a commonly-used vector database framework created by Meta, but there are many others, such as Pinecone and Milvus. Following that, the chunks are summarized by a standard OpenAI LLM model and keywords extracted and saved to both a structured db (DuckDB) and JSON sidecars. You can see a snippet of one of the JSON entries for one of the chunks below.

On the right side, after ingestion, the sidecar summaries are combined and then summarized again to represent a concise summary of the body of work.

La manera como esto funciona es que el texto para cada trozo también se envía al LLM (OpenAI GPT-4.1-mini) para resumir el trozo en una o unas de frases. Hay un paquete de Python se llama SpaCy, un procesador de lenguaje natural (NLP) que utilizo para identificar palabras clave en ese trozo. Estas se combinar con los títulos capítulos y encabezados (si disponible) para dar más un contexto más profundo al trozo, y todo se añadir a un archivo sidecar de JSON.

El diagrama anteriormente muestra el proceso, y puedes ver que varias llamadas a OpenAI se hacer, para ambos generar y resumir el texto. Hay dos APIs diferentes se utilizan servicios diferentes de OpenAI. El lado izquierdo del diagrama #1 muestra el cortando y guardando al base de los datos vectorial de FAISS. FAISS es un marco de base de datos vectorial se usar común, creyó por Meta, per hay otros como Pinecone y Milvus. A continuación, los trozos se resumen por un model estándar de OpenAI y palabras claves se extraen y se guardan ambos a un base de datos estructurado y archivos sidecar de JSON. Puedes ver un porción de un sidecar de JSON siguiente.

En el derecho lado, después de la ingestión, los resúmenes de los sidecars se combinan y entonces se resumen otra vez, para representar un resumen conciso del conjunto de obra.

				
					 {
      "chunk_id": 140,
      "embedding_index": 139,
      "page_range": "370-380",
      "summary": "The excerpt discusses the impressive results achieved using a simple macro photography setup to capture the tiny size of a bee, comparable to a #2 pencil lead. It also offers tips for taking 3D aerial photographs during flights, emphasizing the importance of planning the flight route, considering terrain orientation, and timing for optimal lighting conditions.",
      "keywords": [
        "opposite",
        "fly",
        "hemisphere",
        "size",
        "fellow",
        "long",
        "shadow",
        "time",
        "lead",
        "resolution",
        "lens",
        "north",
        "tiny",
        "example",
        "window",
        "flight",
        "late",
        "rail",
        "region",
        "google",
        "aerial",
        "macro",
        "check",
        "sure",
        "airplane",
        "midday",
        "dead",
        "account",
        "use",
        "sun",
        "south",
        "book",
        "extension",
        "insect",
        "lest",
        "identify",
        "consider",
        "strong",
        "fairly",
        "leave",
        "anaglyph",
        "northern",
        "well",
        "right",
        "alps",
        "superb",
        "way",
        "come",
        "bright",
        "southern",
        "seat",
        "make",
        "planning",
        "winter",
        "pretty",
        "western",
        "terrain",
        "setup",
        "straight",
        "country",
        "sit",
        "show",
        "bee",
        "relatively",
        "think",
        "aircraft",
        "airport",
        "experience",
        "shot",
        "countless",
        "route",
        "give",
        "interesting",
        "big",
        "day",
        "morning",
        "world",
        "tube",
        "beautiful",
        "light",
        "probably",
        "line",
        "earth",
        "go",
        "fun",
        "pencil",
        "reference",
        "interfere",
        "inexpensive",
        "frequently",
        "result",
        "simple"
      ],
      "entities": [
        {
          "text": "bee",
          "label": "ORG"
        },
        {
          "text": "2",
          "label": "MONEY"
        },
        {
          "text": "Size",
          "label": "PERSON"
        },
        {
          "text": "Macro",
          "label": "ORG"
        },
        {
          "text": "the Western US",
          "label": "LOC"
        },
        {
          "text": "Alps",
          "label": "LOC"
        },
        {
          "text": "Google Earth",
          "label": "LOC"
        },
        {
          "text": "the winter",
          "label": "DATE"
        },
        {
          "text": "Morning",
          "label": "TIME"
        },
        {
          "text": "late day",
          "label": "DATE"
        }
      ],
      "heading": null,
      "heading_level": null,
      "heading_type": null,
      "source_type": "book",
      "importance_level": 0,
      "importance_name": "PRIMARY",
      "source_directory": "",
      "file_name": "Chapter 5b- Taking 3D Pictures.pdf"
    },

Depending on the corpus of work that you are ingesting, there will be different strategies with the connective tissue- the side cars, the keyword parsing, and even the embedding models you use (more in that future article on Embeddings).

Depende en el conjunto de obra que se ingesta, habrá estrategias diferentes con la tejido conectivo- los sidecars, las palabras extraídas, y incluso cuales los modelos de los embeddings se utilizas (más sobre eso en un artículo futuro sobre Embeddings).

Once all of this is done, the diagram above, in step #3, shows the construction of the JSON sidecar and saving of the data to both the actual JSONs and also folding it into what is called a Pickle File, which is essentially an index of the FAISS table and is paired with it.

Cuando terminar, el diagrama anterior, en etapa #3, muestra la construcción del sidecar de JSON y el guardando de los datos a ambos los JSONs y también ponerlo en un archivo se llama un Pickle File, cual es, esencialmente, un índice del tabla de FAISS y se emparejan con él.

The block diagram above shows what is inside of each file type.

I am going into all of these details because it is important to have a good foundation upon which the chat application can run. With this rich cross-referencing system, it enables the system to pinpoint text passages from the book, both by specific keywords, but also by semantic meaning, which is a best-of-both-worlds result. Human language is complex, and keywords alone do not always result in acceptable answers.

El diagrama anterior muestra lo que es a dentro cada tipo de archivo.

Soy darte todas de estas detalles porque es importante para tener una fundación solida en cual la app de chat puede funcionar. Con el sistema referencia cruzada muy rico, se permite el sistema para extraer pasajes de texto del libro, ambos por las palabras clave especificas, pero también por la significado semántico, y eso es un resultado que es lo mejor de ambos mundos. Lenguaje human es complejo, y solo palabras clave no siempre resulta en respuestas suficientes.

Chunking, Revisited

If you read my blog regularly, you know I am a very visual person. As I was researching RAG/vector databases, curious about those 3072-dimension vector spaces, I wanted to know what they might look like. Of course we see in 3D, not 3072D (my book only goes up to 3 dimensions!), but there had to be a way to see the data visually.

It turns out there are multiple methods, one of which is a UMAP diagram (Uniform Manifold Approximation & Projection). In a system with a lot of data, they can look like the following dramatic illustration. (borrowed from https://www.scdiscoveries.com/blog/knowledge/what-is-a-umap-plot/), and they are used for large dimensionality data sets in various areas of science- not just RAG vector databases.

Si lees mi blog regularmente, sabes que soy una persona muy visual. Mientras estaba investigando los bases de datos vectoriales/RAG, fue curioso sobre esos espacios de los 3072 dimensionales. Quería saber que se aparecer. Por supuesto, vemos en 3D, no en 3072D (¡mi libro solo discute 3 dimensiones!), per tiene que haber una manera para ver los datos visualmente.

Resulta que hay varios métodos para verlos, una de las cuales es un diagrama UMAP (Uniform Manifold Approximation & Projection). En un sistema con muchos datos, puedan aparece como el siguiente ilustración dramática (tomada de https://www.scdiscoveries.com/blog/knowledge/what-is-a-umap-plot/), y se utilizan para conjuntos de datas grandes dimensionalidades en varias areas de las ciencias, no solo justo en los bases de datos vectoriales.

This type of illustration is what I tried to capture in this blog article’s masthead image at the top, which I created myself using a series of MidJourney prompts and Photoshop.

Actual results do vary, a lot.

Este tipo de ilustración es que traté capturar en la imagen del encabezado encima de este artículo, cual creí lo mismo utilizando un series de prompts de MidJourney y Photoshop.

Resultados reales puede variar muchos.

Being visual, I was giddy with excitement when I added the Spotlight Viewer app to my project, and couldn’t wait to see my data. The image above on the left is what I got… and it was quite a bit less than exciting! This led to a deeper analysis of what I had implemented and its implications.

That image on the left has 91 dots, which correspond to the 91 chunks into which the book was chopped. That sparse image combined with my initial resuts being a bit bland and getting the occasional equivalent “I don’t know” made me want to look deeper.

The reality is, the optimum chunk sizes really depend on the type of data in your corpus and also the information density within it. So I decided to make them smaller, and changed the chunk size from 800/80 overlap to 300 characters with 50 overlap. The net results was we now were getting 254 chunks, and somewhat better clumping, shown in the image on the right side.

To be fair, this book is only 130 pages, and is very specific, so I was not expecting thousands of dots in large numbers of tight clumps.

Como una individual visual, fue muy emocionado cuando ańade el Spotlight Viewer a mi app, y no pude esperar para ver mis datos. La imagen a la izquierda anterior es lo que vi, ¡y fue bastante menos de que emocionante! Esto me llevó a investigar más profundo de que tuve y los implicaciones.

La imagen de la izquierda tiene 91 puntos, lo que corresponden a los 91 trozos en que el libro estaba cortado. Esta imagen dispersa en combinación con mis resultados iniciales que era insulsos, y también con respuestas equivalente de “Yo no sé” me causó querer ver mas profundo.

En realidad, los tamaños de los trozos más optimum dependen en el tipo de datos en tu conjunto de obra, y también la densidad de la información al dentro. Así, decidí hacerlos más pequeños, y cambié los tamaños de los trozos de 800/80 superposición a 300 caracteres con 50 caracteres superposición. Los resultados netos fue que estaba obteniendo 254 trozos, y una mejor agrupación, demuestra en la imagen en la derecha.

Para ser justos, este libro solo tiene 130 paginas, y es muy especifico, así no me lo esperaba ver miles de puntos agrupados tan juntos.

For comparison, the research report I also ingested in a separate library, is about 1250 pages and consists of 5 distinct reports. Each report is tightly focused on a subject area. The resulting diagram (above) has mulitple regions of very tight clusters. This graph also represents the same 300/50 chunk sizes.

It’s important to note that, right now, I am not satisfied with the quality of the Research Report chatbot. It will need a different strategy regarding the structuring of the corpus as this one has background information in other files and needs a different approach. I have been putting a lot of thought in how to generalize the library structures to handle a variety of different corpus types.

Para comparar, el informe de investigación que ingestó también en un biblioteca separada, es aproximadamente 1250 paginas y consiste de 5 informes distintos. Cada informe es muy enfocado en una tema. El diagrama resultante (anterior) tiene múltiples grupos de puntos muy juntos. Este grafo representa los tamaños mismos trozos de texto de 300/50.

Es importante para notar que, ahorita, no he satisfecho con la calidad del chatbot del informe de investigación. Necesitará una estrategia diferente en relación con la estructura del conjunto de obra y este biblioteca tiene mucha información de referencia en otros archivos y necesita un enfoque diferente. He eatado pensando mucho sobre como generalizar las estructuras de la biblioteca para entender una variedad de tipos diferentes de los conjuntos de obras.

The Chat App

With the smaller chunks, I was getting better results. It just needed a few more bits to make it work well.

For starters, the book was written in 2011, when 3D movies were at an all time high and 3D TVs were selling like crazy. And there were 3D cameras, 3D computer monitors, and even several 3D cellphones on the market. This all came to a screeching halt over the next couple of years. So, initial chats with the book felt very… dated. To remedy that, I asked Claude Sonnet 4 to take a look at the book, and catch it up on areas I knew that were no longer relevant or outdates. This became a final chapter of sorts- an epilogue.

I gave Claude very explicit instructions what to consider, what I wanted in it, and let it create the surrounding prose to make it a richer document. I then folded this into the FAISS and other databases.

This brought the chat app almost up to current day. There were still a few gaps, so I decided to add some additional layers into it.

First, the default way these work is the underlying LLM (ChatGPT 4.1 mini in my case) restricts its answers to the corpus you ingested. That doesn’t have to be the case. ChatGPT itself knows a lot about 3D already because it has ingested most of humanity’s output over the past century! It knows what devices are out there and it knows what happened to 3D TVs and 3D movies. So I created a fallback- if there is any gap in what the chat app was going to output, it would “pop out” to ChatGPTs general knowledge of the world- specifically, 3D imaging, and fill in those gaps. That made the experience smoother.

There is another level I added a placeholder for, but it is not implemented. It is a final layer. Should I add it in the future, I could choose a model that can also search the web, and then it would actually be able to go out and fill in gaps even further, allowing the user to drill down into specs for specific devices or software. But for now, it’s not functional.

Con los trozos más pequeños, estaba obteniendo mejores resultados, Justo necesitó un poco más trabajo para hacerlo funcionar bien.

Para empezar, el libro fue escrito en 2011, cuando las peliculas de 3D fueron muy populares y las teles 3D se venden cómo pan caliente. También fue cámaras de 3D, pantallas computadoras en 3D, y incluso celulares de 3D en el mercado. De repente, todos se detuvieron sobre un par de años. Así, mis chats iniciales con el libro sintieron muy… anticuados. Para remediar, se pidió Claude Sonnet 4 para leer mi libro, y actualizarlo en areas que supe que no más relevante o anticuados. Este se convirtió a una especie de capítulo final, un epílogo.

Yo se dí Claude instrucciones muy especificas para considerar, lo que querida al dentro el epílogo, y yo lo permití escribir la prosa alrededor para hacerlo un documento más rico. Entonces, lo incluye en el conjunto del libro y en el base de datos FAISS y otros bases.

Esto actualizó la información en app casi al día actualmente. Había unos vacíos todavía, así decidí para agregar niveles adicionales a la lógica.

Primero, la manera predeterminada que estos funcionan es el baso es el LLM (ChatGPT-4.1-mini en mi caso) se restringe sus respuestas la información que se ingirió. No es necesario para ser el caso. ChatGPT en sí mismo ya sabe mucho sobre 3D porque ¡ha entrenado la mayoría del conocimiento de humanidad sobre el ultimo siglo! Sabe cuales dispositivos son disponible, sabe lo que sucedido de teles 3D y las películas en 3D. Así agregué una alternativa- si hubo cualquier vacío en lo que el chatbot iba a generar, Le pidió que lo escalara al conocimiento general de ChatGPT del mundo- específicamente, las imagenes 3D y llene los vacíos. Eso creyó una experiencia más suave.

Hay uno más nivel que agregué como un marcador de posición, pero no está implementada. Es un nivel final. Debería que decida codificarlo en el futuro, podría elegir un model que también busque el web, y entonces, podría buscar y llene los vacíos cada vez más, y permita el usuario para profundizar en especificaciones para dispositivos o software específicos. Pero por ahora, no es implementado.

The flowchart shows what happens between when you ask a question through to when you get a reply. I am pretty happy with the results.

El diagrama de flujo muestra que ocurrir entre la pregunta y la respuesta. Soy feliz con los resultados.

App Deployment

I won’t go into a lot of detail regarding how I deployed this. In the past I have deployed right here on my Running Thoughts website. However, it has limitations as it is primarily a WordPress site and I don’t have priveleges to run more complex backends.

This system has 3 components- the Front End, or the app you see; the Back End, which processes your prompts and does all the fancy work; and the data stores themselves.

I chose to store the data on an AWS S3 bucket and can add new stores to the library as I ingest more content.

The Back End and Front End I deployed to Render, which lets me host small applications for free. There is, however, a small convenience cost to the User for the free service. When Booker is idle for more than 15 minutes, it “spins down” in that the container it is running it goes away. When a user clicks on the link, if it is not currently active, it can take up to a minute to “spin up.” That’s OK as I wanted to try out Render, and I do give the user a countdown message indicating it is spinning up if has been inactive. Please bear with this minor inconvenience. Eventually as I try out different services, I might settle on one and properly subscribe to them. For now, it is all part of the experimentation.

No voy a discutir muchos detalles relativos a implementarlos por un servidor. En el pasado, he implementado aquì en me sitio de Running Thoughts. Sin embargo, este sitio tiene limitaciones así es principalmente un sitio de WordPress y no tengo privilegios para ejecutar backends más complejos.

Este sistema tiene 3 componentes- el front end, que es la app que ver; el back end, que procesa tus prompts y hace el trabajo duro; y los almacenes de datos.

Elegí para mantener los datos en un bucket de S3 en AWS y puedo agregar nuevas bibliotecas como más contenido se ingiere.

El back end y front end he implementado a Render, que me permite alojar aplicaciones pequeñas gratis. Hay, sin entonces, un pequeño inconveniente para el usuario para el servicio gratis. Cuando Booker está inactivo para más de 15 minutos, se detiene y que el contenedor se ejecuta desaparece. Cuando un usuario hacer clic en el enlace, tarda casi un minuto para arrancar. Es OK, y que quería probar Render, y se da el usuario un mensaje de cuenta atrás se indica que arranca si he estado inactivo. Por favor, tengan paciencia con este pequeño inconveniente. Eventualmente como pruebo servicios diferentes, podría escoger en uno y se lo suscribo. Por ahora, está un todo parte de la experimentación.

📚

Launch Booker

(you may need to wait up to a minute for the application to spin up)

You can launch the app via the link above and I am making the code viewable, as is, in GitHub. Keep in mind that the code is currently set up to run either locally or on Render (I do check to see if the code is local or on a production site, but it is, for now, specific to Render). If you want to build it yourself, you can either set up a free account on Render (link) or adapt the code accordingly.

Puedes lanzar la app con el enlace posterior, y estoy permitiendo para ver el código, como si, en GitHub. Recuerda que la app se configura a ejecutar en Render (probo para ver la app ejecuta en un sitio de producción o en tu máquina local, pero ahora, está específica a Render). Si deseas implementar la app en una otra locación te mismo, puedes configurar una cuenta gratis en Render (enlace) o adaptas el código como corresponde.

https://github.com/runningthoughts/Booker

Final Thoughts, and Next Steps

So far, the Booker app is mainly optimized for a book. I need to work on it to get better results with other types of corpora.

Below I have added an example of a chat I did to test out the application. Hover to expand it and make it readable.

There will be future articles as I advance the RAG application, including some automation and possible agentification as well, to make it smarter and more adaptable. As with a lot of my projects, it is a work in progress. Stay tuned, and if you made it this far, thank you for reading the entire article!

Hasta ahora, la app Booker está optimizado principalmente para el libro. Necesito trabajar más para obtener mejores resultados con otros tipos conjuntos de obras.

A continuación, añade un ejemplo de un chat que hice para probar la aplicación. Flota sobre la imagen para magnificarla y leerla.

Habría publicaciones futuras que avanzo la app de RAG, incluyendo cierta automación y posiblemente agentificación también, para hacerla más inteligente y adaptable. Como muchos de mis proyectos, este es una obra en progresa. Mantente atentos, y has leído al fin de este artículo, gracias de leer la publicación en toda!

El contenido de estos artículos son un poco avanzado. Necesito utilizar ayuda de DeepL, per trato utilizar lo menos posible. Todavía lo estoy utilizando alrededor 15-20%, porque necesito un más vocabulario y coloquialismos también. Pere con cada publicación, estoy utilizando DeepL menos y menos. Para esta publicación, lo utilicé menos que nunca.

Running Thoughts

From RAGs to Niches (Part 1)

tl;dr

Why RAG & The Booker App

Outline of this article

RAGging the Book

Understanding the Corpus

Text Chunking

"Connective Tissue"

Chunking, Revisited

The Chat App

App Deployment

Launch Booker

Final Thoughts, and Next Steps

Like this:

Related

Recent Posts

Recent Posts

Recent Comments

Archives

From RAGs to Niches (Part 1)

tl;dr

Why RAG & The Booker App

Outline of this article

RAGging the Book

Understanding the Corpus

Text Chunking

"Connective Tissue"

Chunking, Revisited

The Chat App

App Deployment

Launch Booker

Final Thoughts, and Next Steps

Share this:

Like this:

Related

Recent Posts

Recent Posts

Recent Comments

Subscribe

Archives

Subscribe