
Getting started with the C++ API of llama.cpp, an open-source large-model framework

Author: 踏莎行hyx · posted 2 years ago


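llama.cpp is a lightweight, open-source framework written in C++ for running AIGC-style large models. It can deploy and run large models locally on ordinary consumer-grade hardware, and it can also be integrated into an application as a dependency library to provide GPT-like functionality.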

The demo below is built on the llama.cpp source tree and uses its C++ API to load a local model file and generate text in GPT fashion.

Project structure

llamacpp_starter
	- llama.cpp-b1547
	- src
	  |- main.cpp
	- CMakeLists.txt

CMakeLists.txt

cmake_minimum_required(VERSION 3.15)

# llama.cpp-b1547 is built as a subdirectory; the demo links its `common` and `llama` targets

project(llamacpp_starter)

set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

add_subdirectory(llama.cpp-b1547)

include_directories(
    ${CMAKE_CURRENT_SOURCE_DIR}/llama.cpp-b1547
    ${CMAKE_CURRENT_SOURCE_DIR}/llama.cpp-b1547/common
)

file(GLOB SRC
    src/*.h
    src/*.cpp
)

add_executable(${PROJECT_NAME} ${SRC})

target_link_libraries(${PROJECT_NAME}
    common
    llama
)

main.cpp

#include <iostream>
#include <string>
#include <vector>
#include "common.h"
#include "llama.h"

int main(int argc, char** argv)
{
	bool numa_support = false;
	const std::string model_file_path = "./llama-ggml.gguf";
	const std::string prompt = "once upon a time"; // input words
	const int n_len = 32; 	// total length of the sequence including the prompt

	// set gpt params
	gpt_params params;
	params.model = model_file_path;
	params.prompt = prompt;


	// initialize the llama.cpp backend (NUMA optimizations disabled here)
	llama_backend_init(numa_support);

	// load model
	llama_model_params model_params = llama_model_default_params();
	//model_params.n_gpu_layers = 99; // offload all layers to the GPU

	llama_model* model = llama_load_model_from_file(model_file_path.c_str(), model_params);

	if (model == NULL)
	{
		std::cerr << __func__ << " load model file error" << std::endl;
		return 1;
	}

	// init context
	llama_context_params ctx_params = llama_context_default_params();

	ctx_params.seed = 1234;
	ctx_params.n_ctx = 2048;
	ctx_params.n_threads = params.n_threads;
	ctx_params.n_threads_batch = params.n_threads_batch == -1 ? params.n_threads : params.n_threads_batch;

	llama_context* ctx = llama_new_context_with_model(model, ctx_params);

	if (ctx == NULL)
	{
		std::cerr << __func__ << " failed to create the llama_context" << std::endl;
		return 1;
	}

	// tokenize the prompt
	std::vector<llama_token> tokens_list = llama_tokenize(ctx, params.prompt, true);

	const int n_ctx = llama_n_ctx(ctx);
	// the KV cache must hold the prompt plus every generated token: n_len in total,
	// or the prompt length if the prompt alone is already longer than n_len
	const int n_prompt = (int) tokens_list.size();
	const int n_kv_req = n_prompt > n_len ? n_prompt : n_len;

	// make sure the KV cache is big enough to hold all the prompt and generated tokens
	if (n_kv_req > n_ctx)
	{
		std::cerr << __func__ << " error: n_kv_req > n_ctx, the KV cache is not big enough" << std::endl;
		std::cerr << __func__ << " either reduce n_len (or shorten the prompt) or increase n_ctx" << std::endl;
		return 1;
	}

	// print the prompt token-by-token
	for (auto id : tokens_list)
		std::cout << llama_token_to_piece(ctx, id) << " ";
	std::cout << std::endl;

	// create a llama_batch with size 512
	// we use this object to submit token data for decoding
	llama_batch batch = llama_batch_init(512, 0, 1);

	// evaluate the initial prompt
	for (size_t i = 0; i < tokens_list.size(); i++)
		llama_batch_add(batch, tokens_list[i], i, { 0 }, false);

	// llama_decode will output logits only for the last token of the prompt
	batch.logits[batch.n_tokens - 1] = true;

	if (llama_decode(ctx, batch) != 0)
	{
		std::cerr << __func__ << " llama_decode failed" << std::endl;
		return 1;
	}

	// main loop to generate words
	int n_cur = batch.n_tokens;
	int n_decode = 0;

	const auto t_main_start = ggml_time_us();

	while (n_cur <= n_len)
	{
		// sample the next token
		auto n_vocab = llama_n_vocab(model);
		auto* logits = llama_get_logits_ith(ctx, batch.n_tokens - 1);

		std::vector<llama_token_data> candidates;
		candidates.reserve(n_vocab);

		for (llama_token token_id = 0; token_id < n_vocab; token_id++)
		{
			candidates.emplace_back(llama_token_data{ token_id, logits[token_id], 0.0f });
		}

		llama_token_data_array candidates_p = { candidates.data(), candidates.size(), false };

		// sample the most likely token
		const llama_token new_token_id = llama_sample_token_greedy(ctx, &candidates_p);

		// is it an end of stream?
		if (new_token_id == llama_token_eos(model) || n_cur == n_len)
		{
			std::cout << std::endl;
			break;
		}

		// token pieces already carry their own leading whitespace, so print them back-to-back
		std::cout << llama_token_to_piece(ctx, new_token_id) << std::flush;

		// prepare the next batch
		llama_batch_clear(batch);

		// push this new token for next evaluation
		llama_batch_add(batch, new_token_id, n_cur, { 0 }, true);

		n_decode += 1;

		n_cur += 1;

		// evaluate the current batch with the transformer model
		if (llama_decode(ctx, batch))
		{
			std::cerr << __func__ << " failed to eval" << std::endl;
			return 1;
		}
	}
	std::cout << std::endl;

	const auto t_main_end = ggml_time_us();

	std::cout << __func__ << " decoded " << n_decode << " tokens in " << (t_main_end - t_main_start) / 1000000.0f << " s, speed: " << n_decode / ((t_main_end - t_main_start) / 1000000.0f) << " t / s" << std::endl;

	llama_print_timings(ctx);

	llama_batch_free(batch);

	// free context
	llama_free(ctx);
	llama_free_model(model);

	// free LLM
	llama_backend_free();

	return 0;
}
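The sampling step in the loop above copies the logits into a candidate list and then takes the most likely token. As a minimal, library-free sketch of that idea: `TokenLogit`, `make_candidates` and `greedy_pick` are hypothetical names used only for illustration here, not part of llama.cpp, and the tie-breaking rule (earliest candidate wins) is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a (token id, logit) candidate entry.
struct TokenLogit {
    int   id;
    float logit;
};

// Build one candidate per vocabulary entry from a flat logits array,
// mirroring the candidates vector filled in the demo loop.
std::vector<TokenLogit> make_candidates(const std::vector<float>& logits) {
    std::vector<TokenLogit> out;
    out.reserve(logits.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        out.push_back({static_cast<int>(i), logits[i]});
    }
    return out;
}

// Greedy sampling: return the id of the candidate with the highest logit.
// In this sketch, ties resolve to the earliest candidate.
int greedy_pick(const std::vector<TokenLogit>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (candidates[i].logit > candidates[best].logit) {
            best = i;
        }
    }
    return candidates[best].id;
}
```

Greedy selection is deterministic, so the seed set on the context has no visible effect on it; it would only matter if the demo were switched to a randomized sampler.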

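Notes:

llama.cpp does not ship model files; you need to download them yourself. Already-converted GGUF files from the Hugging Face website are recommended.
llama.cpp can be built with several optional enhancements, such as CPU/GPU acceleration and numerical-computation acceleration libraries.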
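Because the demo expects a GGUF file, a quick check before handing the path to the loader can catch a wrong or incomplete download early. GGUF files begin with the four ASCII bytes `GGUF`, and the sketch below tests only that. `looks_like_gguf` is a hypothetical helper written for this illustration, not a llama.cpp function, and passing this check does not guarantee the model will load.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Returns true when the file at `path` starts with the 4-byte magic "GGUF".
// Only the magic is inspected; version, metadata and tensors are not validated.
bool looks_like_gguf(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return false; // missing or unreadable file
    }
    char magic[4] = {0, 0, 0, 0};
    in.read(magic, sizeof(magic));
    if (in.gcount() != static_cast<std::streamsize>(sizeof(magic))) {
        return false; // file shorter than the magic
    }
    return std::memcmp(magic, "GGUF", 4) == 0;
}
```

In the demo, such a check could run on `model_file_path` before the model is loaded, so that a bad path or an old-format file yields a clearer error message than a generic load failure.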

Source code

llamacpp_starter


