Just the Gist: Using Langchain to invoke LLM, hosted in Snowpark Containers
Dated: Jan-2024
Snowpark Container Services (SPCS) provides a managed environment to run containerized applications natively in Snowflake. There are a myriad of videos and articles on how to host an LLM in SPCS, for example:
- Video: Powering LLMs with Llama 2 in Snowflake (PrPr)
- Article: Generating Product Descriptions with Mistral-7B-Instruct-v0.2 with vLLM Serving
LLM application framework
However, when it comes to developing LLM applications, developers tend to build their prototypes and productionize them using application frameworks.
Langchain is one of the popular frameworks for developing LLM applications. While it offers many LLM integrations, there is currently no integration specific to an LLM hosted behind a Snowpark Container Services (SPCS) API endpoint.
So I explored how to solve this and am sharing how I got it working.
LLM API endpoint
While there are multiple ways an LLM can be hosted in a container, VLLM is one of the widely used libraries for loading LLM models and serving them for inference. VLLM can download a model from HuggingFace and also serve it via an OpenAI-compatible server.
A small note here: when hosting the VLLM OpenAI server in SPCS, run it as an HTTP server. In my lab, I hosted it as follows:
python -m vllm.entrypoints.openai.api_server --port 8080 \
    --model deepseek-ai/deepseek-coder-6.7b-instruct \
    --download-dir /llm_model_stg/ \
    --max-model-len 5000
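Before wiring anything into SPCS, a quick sanity check is to hit the OpenAI-compatible /v1/models route that VLLM exposes. Below is a minimal sketch, assuming the server started above is reachable on localhost:8080 (for example, from inside the container or from a local test run before building the image):

import requests

# Sanity check: list the models served by the VLLM OpenAI-compatible server.
# Assumes the server started above is reachable on localhost:8080.
resp = requests.get('http://localhost:8080/v1/models', timeout=30)
resp.raise_for_status()
for model in resp.json().get('data', []):
    print(model['id'])  # expect: deepseek-ai/deepseek-coder-6.7b-instruct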
VLLM API
Langchain provides a VLLM integration, which can be used to communicate with the OpenAI-compatible server.
In my previous article, Just the Gist: Connect to API, hosted in Snowpark Container, I shared the approach of connecting to an API endpoint using a Snowpark session and a JWT token.
NOTE: The approach reflected here is a hack/temporary approach and might change in the future. Since there is no official documentation, there is always a possibility that Snowflake will come up with a better solution in the future.
from snowflake.snowpark.session import Session

# Connection parameters for the Snowpark session
snowflake_connection_info = {
    "url": "https://<account locator>.snowflakecomputing.com",
    "account": "<account locator>",
    "account_name": "<account identifier>, do not include the organization name",
    "organization": "<account org name>",
    "user": "XXXX",
    "password": "XXXX",
}

# Establish a Snowpark session from the connection parameters
api_sp_session = Session.builder.configs(snowflake_connection_info).create()

# Get JWT Token
api_sp_session.sql('''
    alter session set
        python_connector_query_result_format = json;''').collect()

# Get the session token, which will be used for API calls for authentication
sptoken_data = api_sp_session.connection._rest._token_request('ISSUE')
api_session_token = sptoken_data['data']['sessionToken']

# Craft the request headers for the ingress endpoint with the authorization token
api_headers = {'Authorization': f'''Snowflake Token="{api_session_token}"'''}
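Before handing things over to LangChain, the token can be verified with a direct call to the ingress endpoint. Below is a minimal sketch, assuming the public endpoint URL from SHOW ENDPOINTS (used again further down) has been pasted into api_base_url:

import requests

# Placeholder: the public ingress URL of the SPCS service, from SHOW ENDPOINTS
api_base_url = '<api endpoint url, obtained via Show endpoint command>'

# The VLLM OpenAI-compatible server exposes /v1/models; a 200 response here
# confirms that the Snowflake session token is accepted by the ingress.
resp = requests.get(f'{api_base_url}/v1/models', headers=api_headers, timeout=60)
resp.raise_for_status()
print(resp.json())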
The following steps are done after the api_sp_session has been established and the JWT token has been extracted and stored in the headers dictionary.
import requests
from langchain_community.llms.vllm import VLLMOpenAI

# Attach the initialized API headers to a requests session
session = requests.Session()
session.headers.update(api_headers)

# The model that is hosted in the VLLM server
HF_MODEL = 'deepseek-ai/deepseek-coder-6.7b-instruct'
print(f'Model: {HF_MODEL}')

# Build the OpenAI-compatible base URL from the SPCS ingress endpoint
api_base_url = '<api endpoint url, obtained via Show endpoint command>'
# Ref : https://docs.snowflake.com/en/sql-reference/sql/show-endpoints
vllm_openai_url = f'{api_base_url}/v1'
print(f'VLLM openai URL: {vllm_openai_url} ... \n ')

# Instantiate the LangChain LLM object
llm = VLLMOpenAI(
    model_name=HF_MODEL,
    openai_api_base=vllm_openai_url,
    openai_api_key='EMPTY',
    default_headers=api_headers,
    streaming=True,
    max_tokens=300,
)
Observe the use of the variable 'api_headers' in the above set of statements; it carries the Snowflake session token that authenticates the calls to the SPCS ingress endpoint.
And that's it. You can add the 'llm' object to a chain or an agent for interacting with the SPCS-hosted LLM.
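For example, here is a minimal chain (the prompt below is just an illustration, not part of the original setup):

from langchain_core.prompts import PromptTemplate

# Illustrative prompt; any prompt/agent wiring works the same way, since the
# llm object behaves like any other LangChain LLM.
prompt = PromptTemplate.from_template(
    'You are a helpful coding assistant. Write a Python function that {task}.'
)
chain = prompt | llm

# Invokes the SPCS-hosted deepseek-coder model through the authenticated endpoint
print(chain.invoke({'task': 'reverses a string'}))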
What next
For now, I am still early in my LLM journey; I have only showcased how to communicate using a LangChain LLM, and I have not explored other frameworks. I am confident that, with some tweaking, they could work too.
With what I have demonstrated, you can now develop an LLM application from your regular dev machine. The pattern is very similar to the way one would interact with SaaS offerings like OpenAI / Anthropic / AWS Bedrock, etc.
Additionally, there could be multiple SPCS services, each with its own hosted LLM, shared across the enterprise/organization by different LLM applications.
Develop and Chill on!!!