Run LLMs on Cloud Run with GPUs
Cloud Run recently became one of the very few serverless products to offer GPUs. It allows you to use one L4 GPU per Cloud Run instance, which as of today has NVIDIA driver version:...
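To make the GPU part concrete, here is a minimal sketch (not from the post itself) of how you might check the attached L4 and its driver version from inside a Cloud Run instance; it assumes `nvidia-smi` is available in your container image:

```python
import subprocess


def gpu_info() -> str:
    """Query the attached GPU via nvidia-smi (assumed to be on PATH in the container)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # On a GPU-enabled Cloud Run instance this should report a single NVIDIA L4
    # together with the installed driver version.
    print(gpu_info())
```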
Retrieval-Augmented Generation (RAG) is an approach in LLM-based applications that enables an LLM like Gemini to answer queries about topics it wasn't even trained on. This is done by augmen...
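As a rough illustration of that augmentation step, here is a minimal RAG sketch (not taken from the post): a toy keyword retriever picks relevant snippets and the prompt is rewritten with that context before being sent to Gemini via the google-generativeai SDK; the document list, API key, and model name are placeholders.

```python
import google.generativeai as genai

# Toy "knowledge base" standing in for a real vector store (placeholder content).
DOCUMENTS = [
    "Cloud Run now supports one NVIDIA L4 GPU per instance.",
    "Cloud Workstations provide preconfigured, managed development environments.",
]


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by naive word overlap with the query (stand-in for embedding search)."""
    query_words = set(query.lower().split())
    scored = sorted(
        docs,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]


def answer(query: str) -> str:
    context = "\n".join(retrieve(query, DOCUMENTS))
    # Augment the prompt with retrieved context so the model can answer
    # questions about material it was never trained on.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    model = genai.GenerativeModel("gemini-1.5-flash")  # model name is an assumption
    return model.generate_content(prompt).text


if __name__ == "__main__":
    genai.configure(api_key="YOUR_API_KEY")  # placeholder
    print(answer("How many GPUs can a Cloud Run instance use?"))
```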
As Gen AI becomes more popular and more integrated into corporate strategies, the use of models like Gemini, GPT-4, Llama, etc. is growing incredibly fast. However, a lot of the time the s...
I recently wrote on the official Google Cloud blog about how I helped one of Germany's biggest banks improve their developer experience using Cloud Workstations. Cloud Workstations are preconfigur...