AI-Enhanced Knowledge Hub: Improving Access to our Blog and Docs

Codesphere's info revamp merges blogs & docs, using Qdrant & AI for seamless, natural language searches. Now, users easily access comprehensive tutorials & guides directly from our docs.

January 26, 2024 5 Min Read

Alexander Voll

Product Marketing Engineer

Over the years, we have amassed a significant amount of content about Codesphere and what you can do with it.
One issue with how this content has been structured so far is that our blog and our documentation lived on separate platforms.

In this article, I will look back at how we merged our documentation with our existing blog and combined both with an LLM-powered search using Qdrant (as described in my last article), giving us a single, natural-language search across all our information.

Setting the scene

At Codesphere, we not only had rather traditional documentation, we also publish multiple blog articles per week. These include tutorials that range from simple deployment guides to more complex topics like building out an email marketing engine.

This left us with the problem that informational content about Codesphere, including our most sophisticated tutorials, was hosted outside of our documentation. It was therefore not searchable from our docs page, which is where users go when they want to know what they can do with Codesphere and how to use it.

That is why we set out to find a way to consolidate our docs and blog.

Our existing setup

Our existing setup mainly consisted of three different pillars to host our information:

  1. Gitbook: We stored all our docs in Gitbook, a standard docs application used by many tech startups around the world.
  2. Ghost.io: As we've mentioned many times before, we use Ghost.io as the content management system for our blog.
  3. Custom Blog Frontend: Our blog frontend is built with a fully custom static site generator. This gives us full flexibility in how we set it up, which we benefited from in this project.

While Gitbook comes with some quality-of-life improvements like an automatic search and a clean UI, it wasn't really possible to connect it with our blog to create a consolidated search, which is why we set out to create our own.

How we made it happen

Before we get into the different steps, let me quickly mention what should already be a given: everything we did was implemented in Codesphere or with services available through the Codesphere Marketplace.

Data migration

It became clear to us pretty early on that we'd need to say goodbye to Gitbook to realize what we had in mind.
That is why we migrated all our existing docs to Ghost.io and gave them matching tags that we could use for filtering in our frontend.

This way, we were now able to pull our docs and blog posts separately from Ghost and save them in the Node environment where our consolidated hub would run.
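
To make the tag-based separation more concrete, here is a minimal sketch of how docs and blog posts could be pulled apart by tag. It is not our production code: the base URL, the API key, the page size and the tag slug `docs` are placeholders chosen for illustration. The query parameters (`key`, `filter`, `include`, `limit`, `page`) are the ones offered by the Ghost Content API.

```typescript
// Minimal shape of a Ghost post when it is requested with `include=tags`.
interface GhostTag {
  slug: string;
}

interface GhostPost {
  slug: string;
  title: string;
  tags?: GhostTag[];
}

// Builds a Ghost Content API URL that only returns posts carrying `tagSlug`.
// `baseUrl`, `key` and `tagSlug` are placeholders, not real values.
function buildPostsUrl(baseUrl: string, key: string, tagSlug: string, page: number = 1): string {
  const root = baseUrl.replace(/\/$/, "");
  const filter = encodeURIComponent("tag:" + tagSlug);
  return (
    `${root}/ghost/api/content/posts/` +
    `?key=${encodeURIComponent(key)}&filter=${filter}&include=tags&limit=50&page=${page}`
  );
}

// Splits already-fetched posts into docs and blog posts using a marker tag.
// Anything without the marker tag is treated as a regular blog post.
function splitByTag(posts: GhostPost[], docsTag: string): { docs: GhostPost[]; blog: GhostPost[] } {
  const docs: GhostPost[] = [];
  const blog: GhostPost[] = [];
  for (const post of posts) {
    const isDoc = (post.tags ?? []).some((tag) => tag.slug === docsTag);
    (isDoc ? docs : blog).push(post);
  }
  return { docs, blog };
}
```

Either approach works on its own: `buildPostsUrl` lets Ghost do the filtering per request, while `splitByTag` sorts a single combined result set after fetching.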

Implementing Search Capabilities

We used the same technique I explained in my last blog article to create an AI-powered search using Qdrant, with Xenova/all-MiniLM-L6-v2 as the transformer model for our embeddings. We are hosting the Qdrant instance in a custom Docker image, set up through the Codesphere Marketplace.

This now allows us to search both our blog and docs simultaneously, using natural language and more traditional queries alike. The result is a high-quality semantic search that goes beyond simple keyword matching.
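
To illustrate what "semantic" means here, the sketch below ranks a handful of items by cosine similarity between a query vector and each item's vector. This is a simplified, in-memory stand-in for what a vector database like Qdrant does at scale, not our actual setup. The titles and the three-dimensional vectors are made up for illustration; real embeddings from all-MiniLM-L6-v2 have 384 dimensions.

```typescript
// One indexed item: a title, where it came from, and its embedding vector.
interface IndexedItem {
  title: string;
  source: "blog" | "docs";
  vector: number[];
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every item against the query and returns the best `limit` matches.
function search(queryVector: number[], items: IndexedItem[], limit: number = 3) {
  return items
    .map((item) => ({
      title: item.title,
      source: item.source,
      score: cosineSimilarity(queryVector, item.vector),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

// Toy data: hypothetical titles with hand-written 3-dimensional vectors.
const items: IndexedItem[] = [
  { title: "Deploying a Node app", source: "docs", vector: [0.9, 0.1, 0.0] },
  { title: "Building an email marketing engine", source: "blog", vector: [0.1, 0.9, 0.2] },
  { title: "Workspace settings", source: "docs", vector: [0.0, 0.2, 0.9] },
];

// The query vector points mostly in the same direction as the first item,
// so that item ranks first even though blog and docs share one index.
const results = search([0.8, 0.2, 0.1], items, 2);
```

Because blog posts and docs sit in the same index and only differ in their `source` field, a single query returns matches from both.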

Adjusting the frontend

Of course, all those changes needed to be reflected in our frontend. I already had some design improvements for the blog article page brewing in the background, and this was the perfect opportunity to bring them to light.

One big addition we made was a navigation similar to the one in Gitbook.

We were able to also implement this in a static way, keeping our blog fast and snappy.
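
A static navigation like this can be generated once at build time instead of on every request. The sketch below shows one way this might look, assuming each docs page carries a section name (for example derived from a tag). The `NavEntry` shape, the section names and the URL scheme are assumptions for illustration, not a description of our generator.

```typescript
// One navigation entry. `section` is assumed to come from a tag or similar.
interface NavEntry {
  title: string;
  slug: string;
  section: string;
}

// Escapes characters that would otherwise break the generated HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Groups entries by section, keeping the order in which sections first appear.
function groupBySection(entries: NavEntry[]): Map<string, NavEntry[]> {
  const sections = new Map<string, NavEntry[]>();
  for (const entry of entries) {
    const list = sections.get(entry.section) ?? [];
    list.push(entry);
    sections.set(entry.section, list);
  }
  return sections;
}

// Renders the whole sidebar as a plain HTML string at build time.
function renderNav(entries: NavEntry[]): string {
  const parts: string[] = ["<nav>"];
  groupBySection(entries).forEach((sectionEntries, section) => {
    parts.push(`<h3>${escapeHtml(section)}</h3>`, "<ul>");
    for (const entry of sectionEntries) {
      const href = `/${encodeURIComponent(entry.slug)}/`;
      parts.push(`<li><a href="${href}">${escapeHtml(entry.title)}</a></li>`);
    }
    parts.push("</ul>");
  });
  parts.push("</nav>");
  return parts.join("\n");
}
```

Since the output is just a string, it can be written into every generated page, so the browser needs no extra request or client-side rendering to show the sidebar.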

The last thing to add was the search itself. For that, we simply added a prominent search bar to the top of our page. We might move it into the navbar in the future to make it even more prominent.

Conclusion

The transition came with its own challenges and tedious steps, but at the end of the day it was a lot simpler than we (or at least I) initially expected.
We now have a new, state-of-the-art way for our users to search all of the content we have ever published, getting them to their goal quicker.

Having this database in place also opens up new possibilities: we could use it to power an LLM-based Codesphere assistant or to provide an even more tailored content experience.

The possibilities are endless and we can't wait to keep on exploring what's possible.

If you would like to set up something similar and don't know where to start, feel free to join our Discord community and ask for further assistance.

About the Author

Alexander Voll

Product Marketing Engineer

Alex brings a unique perspective through interdisciplinary experience from various corporate stops. He's responsible for most outward facing web applications, including the website and the blog.
