# Lamini

Official repository of the `lamini` pypi package!

Get the most recent updates via `pip install --upgrade lamini`. Full docs of the Python library and REST API are here.
## Quick Tour

Get LLMs in production in 2 minutes with Lamini!
First, get `<YOUR-LAMINI-API-KEY>` at https://app.lamini.ai/account.

Add the key as an environment variable. Or, authenticate via the Python library below.

```shell
export LAMINI_API_KEY="<YOUR-LAMINI-API-KEY>"
```
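If you prefer to stay in Python, you can also set the environment variable programmatically before using the library. A minimal sketch, equivalent to the `export` above (the placeholder value is yours to substitute):

```python
import os

# Set the API key as an environment variable from Python
# (equivalent to `export LAMINI_API_KEY=...` in the shell).
os.environ["LAMINI_API_KEY"] = "<YOUR-LAMINI-API-KEY>"
```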
Next, install the Python library.

```shell
pip install --upgrade lamini
```
Run an LLM with a few lines of code.

```python
import lamini

lamini.api_key = "<YOUR-LAMINI-API-KEY>"  # or set as an environment variable above

llm = lamini.LlamaV2Runner()
print(llm.call("How are you?"))
```
That's it! 🎉
More details in our docs.
## Better inference

Customize inference in many ways:

- Change the prompt.
- Change the model.
- Change the output type, e.g. `str`, `int`, or `float`.
- Output multiple values in structured JSON.
- High-throughput inference, e.g. 10,000 requests per call.
- Run simple applications like RAG.
You'll breeze through some of these here. You can step through all of these in the Inference Quick Tour.
Prompt-engineer the system prompt in `LlamaV2Runner`.

```python
from lamini import LlamaV2Runner

pirate_llm = LlamaV2Runner(system_prompt="You are a pirate. Say arg matey!")
print(pirate_llm.call("How are you?"))
```
Definitely check out the expected output here, because now it's a pirate :)
You can also return multiple outputs with multiple output types in one call. The output is a JSON schema that is strictly enforced.

To do this in Python, you have to drop down to a lower level. The `Lamini` class is the base class for all runners, including `LlamaV2Runner`, and wraps our REST API endpoint.

`Lamini` expects an input dictionary, and a return dictionary for the output type. You can return multiple values, e.g. an `int` and a `str` here.

```python
from lamini import Lamini

llm = Lamini(model_name="meta-llama/Llama-2-7b-chat-hf")
llm(
    "How old are you?",
    output_type={"age": "int", "units": "str"},
)
```
## Bigger inference

Batching requests is the way to get more throughput. It's easy: simply pass in a list of inputs to any of the classes and it will be handled.

You can send up to 10,000 requests per call on the Pro and Organization tiers, and up to 1,000 on the Free tier.

```python
llm(
    [
        "How old are you?",
        "What is the meaning of life?",
        "What is the hottest day of the year?",
    ],
    output_type={"response": "str", "explanation": "str"},
)
```
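If you have more prompts than your tier's per-call limit, you can split them into batches client-side and send one call per batch. A minimal sketch; the `chunk` helper is hypothetical, not part of the `lamini` library:

```python
def chunk(prompts, max_per_call):
    """Split prompts into batches no larger than the per-call limit."""
    return [prompts[i:i + max_per_call] for i in range(0, len(prompts), max_per_call)]

# e.g. 2,500 prompts on the Free tier (1,000 requests per call)
prompts = [f"Question {i}?" for i in range(2500)]
batches = chunk(prompts, 1000)
# Each batch could then be passed to llm(batch, output_type=...) in turn.
```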
## Training (ft. finetuning)

When inference with prompt-engineering and RAG is not enough for your LLM, you can train it. This is harder, but it will result in better performance, better leverage of your data, and increased knowledge and reasoning capabilities.

There are many ways to train your LLM. We'll cover the most common ones here:

- Basic training: build your own LLM for specific domain knowledge or tasks with finetuning, domain adaptation, and more
- Better training: customize your training call and evaluate your LLM
- Bigger training: pretrain your LLM on a large dataset, e.g. Wikipedia, to improve its general knowledge

For the "Bigger training" section, see the Training Quick Tour.
First, get data and put it in the format that `LlamaV2Runner` expects, which includes the `system` prompt, the `user` query, and the expected `output` response.

Sample data:

```json
{
    "user": "Are there any step-by-step tutorials or walkthroughs available in the documentation?",
    "system": "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.",
    "output": "Yes, there are step-by-step tutorials and walkthroughs available in the documentation section. Here's an example for using Lamini to get insights into any python library: https://lamini-ai.github.io/example/"
}
```
Now, load more data in that format. We recommend at least 1,000 examples to see a difference in training.

```python
data = get_data()
```
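Here, `get_data()` is a placeholder for your own loader. A minimal sketch of what it might return, assuming your examples live as question/answer pairs (in practice you would load at least 1,000 of them):

```python
def get_data():
    """Hypothetical loader returning examples in the format LlamaV2Runner expects."""
    system = (
        "You are a helpful, respectful and honest assistant. "
        "Always answer as helpfully as possible, while being safe."
    )
    qa_pairs = [
        ("Are there any step-by-step tutorials or walkthroughs available "
         "in the documentation?",
         "Yes, there are step-by-step tutorials and walkthroughs available "
         "in the documentation section."),
        # ...load your real question/answer pairs here
    ]
    return [
        {"user": question, "system": system, "output": answer}
        for question, answer in qa_pairs
    ]

data = get_data()
```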
Next, instantiate the model and load the data into it.
```python
from lamini import LlamaV2Runner

llm = LlamaV2Runner()
llm.load_data(data)
```
Train the model. Track progress at https://app.lamini.ai/train.
```python
llm.train()
```
Evaluate your model after training, which compares results to the base model.
```python
llm.evaluate()
```
After training, `llm` will use the finetuned model for inference.

```python
llm.call("What's your favorite animal?")
```