
[C++ Project] Boost Search Engine

2 years ago · Author: 小唐學渣 · Category: Toy blog


1. Project Background

1.1 About Boost

Boost official website: https://www.boost.org

Boost is the collective name for a set of C++ libraries that extend the C++ standard library.

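The Boost libraries are developed and maintained by the Boost community. Their goal is to give C++ programmers free, peer-reviewed, portable libraries. Boost works well alongside the C++ standard library and extends it. Boost is distributed under the Boost Software License, which permits and encourages both commercial and non-commercial use.
One of the founding goals of the Boost community was to provide reference implementations for C++ standardization; its founder, Beman Dawes, was himself a member of the C++ standards committee. Boost has achieved a great deal in this direction: ten Boost libraries became candidates for the standard library in Technical Report 1 (TR1), and still more were included in the later TR2. In a sense, Boost has become a practical quasi-standard library.
Most Boost libraries can be used simply by including the corresponding headers; a few, such as Boost.Filesystem (and, in older Boost releases, Boost.Regex), must also be linked. Many of them, such as the Graph library, are industrial-strength.
Many Boost libraries are effectively extensions of the language itself, built with very intricate techniques, so don't rush into spending time studying them. Others, such as Graph, are industrial-strength, well structured and well worth studying, and can safely be used in production code.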

1.2 Why build our own Boost search engine

Baidu, Sogou, 360 Search, the Toutiao news app: building whole-web search like theirs ourselves is impossible!
The Boost website, however, has no site search, so we can build one ourselves.

Site search: the data is more vertical (limited to one domain), and the data volume is actually much smaller.

2. How a search engine works at a high level, and the project demo

User input: keyword -> look up in the inverted index -> extract the document IDs -> look up in the forward index -> get the document content -> summarize the results as title + content (desc) + url -> build the response

2.1 Project demo


3. Tech stack and project environment

Tech stack: C/C++ (C++11), STL, the quasi-standard Boost libraries, Jsoncpp, cppjieba, cpp-httplib; optional: HTML5, CSS, JS, jQuery, Ajax
Project environment: CentOS 7 cloud server, vim/gcc (g++)/Makefile, VS2019 or VS Code

4. Forward index vs. inverted index: how the search engine works

Document 1: 雷軍買了四斤小米 (Lei Jun bought four jin of millet)
Document 2: 雷軍發布了小米手機 (Lei Jun released the Xiaomi phone)

Forward index: maps a document ID to the document content (the keywords inside the document)

Document ID	Document content
1	雷軍買了四斤小米
2	雷軍發布了小米手機

Segment the target documents into words (so that the inverted index can be built and searched):

Document 1 [雷軍買了四斤小米]: 雷軍/買/四斤/小米/四斤小米
Document 2 [雷軍發布了小米手機]: 雷軍/發布/小米/小米手機

Stop words such as 了, 的, 嗎, a and the can generally be ignored during segmentation.
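In this project segmentation is done by cppjieba. To make the idea concrete without that library, here is a minimal sketch of forward maximum matching (FMM), a much simpler dictionary-based technique than the one cppjieba uses, followed by stop-word removal. The dictionary, the stop-word list and the function names (`SplitUtf8`, `SegmentFMM`, `RemoveStopWords`) are hypothetical, and the code assumes valid UTF-8 input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Split a UTF-8 string into characters (one std::string per code point).
// Assumes valid UTF-8: the lead byte tells how many bytes the character uses.
std::vector<std::string> SplitUtf8(const std::string &s)
{
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size();)
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;                  // 0xxxxxxx: ASCII
        if ((c & 0xE0) == 0xC0) len = 2;      // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3; // 1110xxxx: most CJK characters
        else if ((c & 0xF8) == 0xF0) len = 4; // 11110xxx
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}

// Forward maximum matching: at each position take the longest dictionary word
// of at most max_len characters; if none matches, emit a single character.
std::vector<std::string> SegmentFMM(const std::string &text,
                                    const std::unordered_set<std::string> &dict,
                                    std::size_t max_len)
{
    std::vector<std::string> chars = SplitUtf8(text);
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < chars.size())
    {
        std::string match = chars[i]; // fallback: one character
        std::size_t used = 1;
        for (std::size_t n = std::min(max_len, chars.size() - i); n > 1; --n)
        {
            std::string candidate;
            for (std::size_t k = i; k < i + n; ++k)
                candidate += chars[k];
            if (dict.count(candidate))
            {
                match = candidate;
                used = n;
                break;
            }
        }
        words.push_back(match);
        i += used;
    }
    return words;
}

// Drop stop words such as 了, 的, a, the.
std::vector<std::string> RemoveStopWords(const std::vector<std::string> &words,
                                         const std::unordered_set<std::string> &stop_words)
{
    std::vector<std::string> kept;
    for (const auto &w : words)
        if (!stop_words.count(w))
            kept.push_back(w);
    return kept;
}
```

With a dictionary containing 雷軍, 四斤, 小米, 發布 and 小米手機, document 1 becomes 雷軍/買/四斤/小米 after stop-word removal. Unlike the listing above, FMM never emits overlapping words, so it cannot produce both 四斤 and 四斤小米, or both 小米手機 and 小米.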

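Inverted index: maps each keyword to the IDs of the documents that contain it

Keyword (unique)	Document IDs
雷軍	Document 1, Document 2
買	Document 1
四斤	Document 1
小米	Document 1, Document 2
四斤小米	Document 1
發布	Document 2
小米手機	Document 2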

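Simulating a lookup:
The user enters 小米 -> look up in the inverted index -> extract the document IDs (1, 2) -> look up in the forward index -> get the document content -> summarize the results as title + content (desc) + url -> build the response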
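The two tables and the lookup above can be modelled in a few lines of C++. This is a toy sketch, not the project's Index class: documents are plain strings, the keywords are supplied already segmented (taken from the listing above), there is no weighting, and IDs start at 0 rather than 1. `ToyIndex`, `Add` and `Lookup` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Forward index: doc id -> document text (the vector position is the id).
// Inverted index: keyword -> ids of the documents containing it.
struct ToyIndex
{
    std::vector<std::string> forward;
    std::unordered_map<std::string, std::vector<uint64_t>> inverted;

    // Add a document whose keywords were already produced by a segmenter.
    uint64_t Add(const std::string &text, const std::vector<std::string> &keywords)
    {
        uint64_t id = forward.size();
        forward.push_back(text);
        for (const auto &w : keywords)
        {
            std::vector<uint64_t> &ids = inverted[w];
            if (ids.empty() || ids.back() != id) // a word repeated in one doc is listed once
                ids.push_back(id);
        }
        return id;
    }

    // Query: keyword -> doc ids (inverted index) -> document texts (forward index).
    std::vector<std::string> Lookup(const std::string &keyword) const
    {
        std::vector<std::string> hits;
        auto it = inverted.find(keyword);
        if (it == inverted.end())
            return hits;
        for (uint64_t id : it->second)
            hits.push_back(forward[id]);
        return hits;
    }
};
```

Looking up 小米 returns both documents, as in the walkthrough. The real Index in section 6 additionally stores a weight per (word, document) pair so that results can be ranked.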

5. The Parser module: stripping tags and cleaning the data

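For now we only need the HTML files under boost_1_79_0/doc/html; the index is built from them. The parser below expects the contents of that directory to be copied into data/input (src_path).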


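#include <iostream>
#include <fstream>  // std::ofstream, used by SaveHtml
#include <string>
#include <utility>  // std::move
#include <vector>
#include <boost/filesystem.hpp> // must be linked, e.g. -lboost_system -lboost_filesystem
#include "Util.hpp"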

const std::string src_path = "data/input";
const std::string output = "data/raw_html/raw.txt";

typedef struct DocInfo
{
public:
    std::string title;   // document title
    std::string content; // document content, tags removed
    std::string url;     // URL of the page on boost.org
} DocInfo_t;

// Parameter conventions:
// const &  input
// *        output
// &        input/output
bool EnumFile(const std::string &src_path, std::vector<std::string> *file_list);

bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results);

bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output);

int main()
{
    std::vector<std::string> files_list;
    // Step 1: recursively collect the path of every HTML file into files_list, so they can be read one by one later
    if (!EnumFile(src_path, &files_list))
    {
        std::cerr << "EnumFile error" << std::endl;
        return 1;
    }
    // Step 2: read each file in files_list and parse its content
    std::vector<DocInfo_t> results;
    if (!ParseHtml(files_list, &results))
    {
        std::cerr << "ParseHtml error" << std::endl;
        return 2;
    }
    // Step 3: write the parsed documents to output, using '\n' to separate documents and '\3' to separate the fields within a document
    if (!SaveHtml(results, output))
    {
        std::cerr << "SaveHtml error" << std::endl;
        return 3;
    }
    return 0;
}

bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list)
{
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);
    // Check that the path exists
    if (!fs::exists(root_path))
    {
        std::cerr << src_path << " does not exist" << std::endl;
        return false;
    }
    // A default-constructed iterator marks the end of the recursion
    fs::recursive_directory_iterator end;
    for (fs::recursive_directory_iterator iter(root_path); iter != end; ++iter)
    {
        // Keep only regular files (HTML pages are regular files)
        if (!fs::is_regular_file(*iter))
        {
            continue;
        }
        // Keep only files with the .html extension
        if (iter->path().extension() != ".html")
        {
            continue;
        }
        // std::cout << "debug:" << iter->path().string() << std::endl;
        // The current path is now a regular file ending in .html;
        // save it in files_list for the text analysis that follows
        files_list->push_back(iter->path().string()); // string() returns a temporary, so it is moved in without std::move
    }
    return true;
}

static bool ParseTitle(const std::string &file, std::string *title)
{
    size_t begin = file.find("<title>");
    if (begin == std::string::npos)
    {
        return false;
    }
    begin += std::string("<title>").size(); // begin now points at the title text
    // Look for the closing tag only after the opening one
    size_t end = file.find("</title>", begin);
    if (end == std::string::npos)
    {
        return false;
    }
    *title = file.substr(begin, end - begin);
    return true;
}

static bool ParseContent(const std::string &file, std::string *content)
{
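    // Strip tags with a simple two-state machine:
    // LABEL = inside a tag, CONTENT = in the text between tags
    enum status
    {
        LABEL,
        CONTENT
    };

    enum status s = LABEL; // an HTML file starts with '<', so begin inside a tag
    for (char ch : file)
    {
        switch (s)
        {
        case LABEL:
            if (ch == '>')
            {
                s = CONTENT;
            }
            break;
        case CONTENT:
            if (ch == '<')
            {
                s = LABEL;
            }
            else
            {
                // Replace '\n' with ' ' so that each document stays on one line in raw.txt
                if (ch == '\n')
                {
                    ch = ' ';
                }
                *content += ch;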
            }
            break;
        default:
            break;
        }
    }
    return true;
}

static bool ParseUrl(const std::string &file_path, std::string *url)
{
    std::string url_head = "https://www.boost.org/doc/libs/1_79_0/doc/html";
    std::string url_tail = file_path.substr(src_path.size());
    *url = url_head + url_tail;
    return true;
}

// for debug
static void ShowDoc(DocInfo_t &doc)
{
    std::cout << doc.title << std::endl;
    std::cout << doc.content << std::endl;
    std::cout << doc.url << std::endl;
}

bool ParseHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results)
{
    for (const std::string &file : files_list)
    {
        // 1. Read the file content
        std::string result; // file content
        if (!ns_util::FileUtil::ReadFile(file, &result))
        {
            continue;
        }
        DocInfo_t doc;
        // 2. Parse the title
        if (!ParseTitle(result, &doc.title))
        {
            continue;
        }
        // 3. Parse the content (strip the tags)
        if (!ParseContent(result, &doc.content))
        {
            continue;
        }
        // 4. Build the URL
        if (!ParseUrl(file, &doc.url))
        {
            continue;
        }
        // debug doc
        // ShowDoc(doc);
        // break;
        // Done: everything extracted from this file is now in doc
        results->push_back(std::move(doc)); // move instead of copying the (large) content string
    }
    return true;
}

bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output)
{
#define SEP '\3'
    std::ofstream out(output, std::ios::out | std::ios::binary);
    if (!out.is_open())
    {
        std::cerr << "open " << output << " failed" << std::endl;
        return false;
    }
    // Write each parsed document to output: '\n' separates documents, '\3' separates the fields within a document
    for (auto &it : results)
    {
        std::string out_string;
        out_string += it.title;
        out_string += SEP;
        out_string += it.content;
        out_string += SEP;
        out_string += it.url;
        out_string += '\n';
        out.write(out_string.c_str(), out_string.size());
    }
    out.close();
    return true;
}
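A self-contained sketch of the two ideas the parser relies on: the tag-stripping state machine, and the one-line-per-document record format of raw.txt. `StripTags`, `ToRecord` and `SplitRecord` are hypothetical stand-ins (the project itself uses ParseContent, SaveHtml and ns_util::StringUtil::Split, whose implementation is not shown). Unlike ParseContent, `StripTags` starts outside a tag, so any text before the first tag is kept.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same two-state idea as ParseContent: drop everything between '<' and '>'.
std::string StripTags(const std::string &html)
{
    bool in_tag = false;
    std::string text;
    for (char ch : html)
    {
        if (in_tag)
        {
            if (ch == '>')
                in_tag = false; // tag closed, text may follow
        }
        else if (ch == '<')
        {
            in_tag = true; // a tag starts: skip everything up to '>'
        }
        else
        {
            text += (ch == '\n') ? ' ' : ch; // keep the document on one line
        }
    }
    return text;
}

// One parsed document -> one line: title \3 content \3 url \n
std::string ToRecord(const std::string &title, const std::string &content, const std::string &url)
{
    const char SEP = '\3';
    return title + SEP + content + SEP + url + '\n';
}

// Inverse of ToRecord for a line without its '\n' (std::getline strips it).
std::vector<std::string> SplitRecord(const std::string &line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0, pos;
    while ((pos = line.find('\3', start)) != std::string::npos)
    {
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;
    }
    fields.push_back(line.substr(start));
    return fields;
}
```

Replacing '\n' with a space while stripping tags is what keeps each document on a single line, so the index builder can read raw.txt back with std::getline and split each line on '\3'.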

6. The Index module: building the index

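#pragma once
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
#include <fstream>
#include <cstdint>   // uint64_t
#include <utility>   // std::move
#include <mutex>
#include <boost/algorithm/string.hpp> // boost::to_lower
#include "Util.hpp"
#include "Log.hpp"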

namespace ns_index
{
    // Forward-index entry: one parsed document
    struct DocInfo
    {
        std::string _title;   // document title
        std::string _content; // document content
        std::string _url;     // document URL
        uint64_t _doc_id;     // document ID, stored in the inverted lists
    };

    // Element of an inverted list (posting list): the document, the word, and its weight in that document
    struct InvertedElem
    {
        uint64_t _doc_id;
        std::string _word;
        int _weight;
    };
    typedef std::vector<InvertedElem> InvertedList;

    class Index
    {
    private:
        Index()
        {
        }
        Index(const Index &) = delete;
        Index &operator=(const Index &) = delete;

    public:
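        // Double-checked locking. Strictly speaking, the unlocked first read of _instance is a data
        // race under the C++11 memory model; it is harmless here only because the first call happens
        // in InitSearcher() before the server starts handling requests. Since C++11 a function-local
        // static (static Index instance; return &instance;) is a simpler thread-safe alternative.
        static Index *GetInstance()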
        {
            if (_instance == nullptr)
            {
                std::unique_lock<std::mutex> ulck(_mtx);
                if (_instance == nullptr)
                {
                    _instance = new Index;
                }
            }
            return _instance;
        }

        ~Index()
        {
        }

    public:
        DocInfo *GetForwardIndex(uint64_t doc_id)
        {
            if (doc_id >= _forward_index.size())
            {
                std::cerr << "doc_id out of range" << std::endl;
                return nullptr;
            }
            return &_forward_index[doc_id];
        }

        InvertedList *GetInvertedIndex(const std::string &word)
        {
            auto iter = _inverted_index.find(word);
            if (iter == _inverted_index.end())
            {
                std::cerr << word << " has no inverted list!" << std::endl;
                return nullptr;
            }
            return &(iter->second);
        }

        // Input: the file produced by parser.cc, e.g.
        // /home/ts/procedure_life/program/boost_sercher/data/raw_html
        bool BuildIndex(const std::string &input)
        {
            std::ifstream in(input, std::ios::in | std::ios::binary);
            std::cout << "file name: " << input << std::endl;
            if (!in.is_open())
            {
                std::cerr << "open " << input << " failed!" << std::endl;
                return false;
            }
            std::string line;
            int count = 0;
            while (std::getline(in, line))
            {
                ++count;
                // Build the forward index entry
                DocInfo *doc = BuildForwardIndex(line);
                if (nullptr == doc)
                {
                    std::cerr << "build " << line << " error" << std::endl;
                    continue;
                }
                // Build the inverted index entries
                BuildInvertedIndex(*doc);
                if (count % 50 == 0)
                {
                    LOG(NORMAL, "forward and inverted index built for documents: " + std::to_string(count));
                }
            }

            in.close();
            return true;
        }

    private:
        DocInfo *BuildForwardIndex(const std::string &line)
        {
            // 1. Split the line into its fields
            std::vector<std::string> results;
            const std::string sep = "\3"; // field separator within a line
            ns_util::StringUtil::Split(line, &results, sep);
            if (results.size() != 3)
            {
                return nullptr;
            }
            // 2. Fill a DocInfo with the fields
            DocInfo doc;
            doc._title = std::move(results[0]);
            doc._content = std::move(results[1]);
            doc._url = std::move(results[2]);
            doc._doc_id = _forward_index.size(); // the ID is the position in the forward index
            // 3. Append it to the forward index _forward_index
            _forward_index.push_back(std::move(doc));
            return &_forward_index.back();
        }

        bool BuildInvertedIndex(const DocInfo &doc)
        {
            // Input: title, content, url, id
            // Output: for each word, one element appended to word -> inverted list

            // 1. Segment the title and the content with jieba
            std::vector<std::string> title_words;
            ns_util::JiebaUtil::CutString(doc._title, &title_words);
            std::vector<std::string> content_words;
            ns_util::JiebaUtil::CutString(doc._content, &content_words);

            // 2. Count word frequencies
            struct word_cnt
            {
                int title_cnt;
                int content_cnt;

                word_cnt() : title_cnt(0), content_cnt(0) {}
            };

            std::unordered_map<std::string, word_cnt> word_map;
            // Lowercase each word so that "Hello" and "hello" count as the same word
            for (auto iter : title_words)
            {
                boost::to_lower(iter);
                word_map[iter].title_cnt++;
            }

            for (auto iter : content_words)
            {
                boost::to_lower(iter);
                word_map[iter].content_cnt++;
            }

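            // 3. Custom relevance: weight = 10 * (title count) + 1 * (content count)
            const int X = 10; // title weight (const instead of #define, so the names do not leak out of this header)
            const int Y = 1;  // content weight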
            for (auto &iter : word_map)
            {
                InvertedElem tmp;
                tmp._doc_id = doc._doc_id;
                tmp._word = iter.first;
                tmp._weight = iter.second.title_cnt * X + iter.second.content_cnt * Y;
                InvertedList &inver_list = _inverted_index[iter.first];
                inver_list.push_back(tmp);
            }
            return true;
        }

    private:
        // The forward index can simply be an array indexed by document ID
        std::vector<DocInfo> _forward_index;
        // The inverted index maps each keyword to its InvertedList (keyword -> inverted list)
        std::unordered_map<std::string, InvertedList> _inverted_index;
        static Index *_instance;
        static std::mutex _mtx;
    };
    Index *Index::_instance = nullptr;
    std::mutex Index::_mtx;
}
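The relevance rule in BuildInvertedIndex is simply weight = 10 × (occurrences in the title) + 1 × (occurrences in the content). A minimal sketch of that calculation on already-segmented words, using std::tolower in place of boost::to_lower (ASCII-only lowercasing, which leaves UTF-8 Chinese bytes unchanged in the default "C" locale). `WordWeights` is a hypothetical name. Since ParseContent strips tags from the whole file, title included, a title word in this project is typically also counted once more in the content.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <vector>

// Mirrors the scoring in BuildInvertedIndex:
// weight = 10 * (hits in title) + 1 * (hits in content), words compared in lowercase.
std::unordered_map<std::string, int> WordWeights(const std::vector<std::string> &title_words,
                                                 const std::vector<std::string> &content_words)
{
    const int kTitleWeight = 10;
    const int kContentWeight = 1;
    auto lower = [](std::string w) {
        for (char &c : w)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        return w;
    };
    std::unordered_map<std::string, int> weights;
    for (const auto &w : title_words)
        weights[lower(w)] += kTitleWeight;
    for (const auto &w : content_words)
        weights[lower(w)] += kContentWeight;
    return weights;
}
```

Adding the weight per occurrence gives the same result as counting title and content hits first and combining them afterwards, which is what the project code does with word_cnt.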

7. The Searcher module

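#pragma once
#include "Index.hpp"
#include "Util.hpp"
#include <algorithm>
#include <cctype>    // std::tolower
#include <iterator>
#include <unordered_map>
#include <boost/algorithm/string.hpp> // boost::to_lower
#include <jsoncpp/json/json.h>
#include "Log.hpp"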

namespace ns_sercher
{
    struct InvertedElemPrint
    {
        uint64_t _doc_id = 0;
        int _weight = 0;
        std::vector<std::string> _words;
    };

    class Searcher
    {
    public:
        Searcher() {}
        ~Searcher() {}

    public:
        void InitSearcher(const std::string &input)
        {
            // 1. Get (or create) the Index singleton
            _index = ns_index::Index::GetInstance();
            LOG(NORMAL, "got the Index singleton...");
            // 2. Build the forward and inverted indexes from the input file
            _index->BuildIndex(input);
            LOG(NORMAL, "forward and inverted indexes built...");

        }

        void Search(const std::string &query, std::string *json_string)
        {
            // 1. Segment the query the same way the documents were segmented
            std::vector<std::string> words;
            ns_util::JiebaUtil::CutString(query, &words);
            // 2. Look up each word in the inverted index, merging the hits per document
            //    so that a document matching several words appears only once
            std::vector<InvertedElemPrint> inverted_list_all;
            std::unordered_map<uint64_t, InvertedElemPrint> tokens_map;

            for (std::string word : words)
            {
                boost::to_lower(word);
                ns_index::InvertedList *inverted_List = _index->GetInvertedIndex(word);
                if (nullptr == inverted_List)
                {
                    continue;
                }
                //inverted_list_all.insert(inverted_list_all.begin(), (*inverted_List).begin(), (*inverted_List).end());
                for(const auto& elem : *inverted_List)
                {
                    InvertedElemPrint& item = tokens_map[elem._doc_id];
                    item._weight += elem._weight;
                    item._doc_id = elem._doc_id;
                    item._words.push_back(elem._word);
                }
            }
            
            for (auto &item : tokens_map)
            {
                // item is non-const here, so std::move really moves instead of copying
                inverted_list_all.push_back(std::move(item.second));
            }

            // 3. Sort the merged results by relevance (weight), highest first
            // std::sort(inverted_list_all.begin(), inverted_list_all.end(), 
            // [](const ns_index::InvertedElem& e1, const ns_index::InvertedElem& e2) {
            //     return e1._weight > e2._weight;
            // });
            std::sort(inverted_list_all.begin(), inverted_list_all.end(), 
            [](const InvertedElemPrint& e1, const InvertedElemPrint& e2){
                return e1._weight > e2._weight;
            });
            // 4. Build the JSON response from the sorted results
            Json::Value root;
            for(auto& iter : inverted_list_all)
            {
                ns_index::DocInfo* pdoc = _index->GetForwardIndex(iter._doc_id);
                if(nullptr == pdoc)
                {
                    continue;
                }
                Json::Value elem;
                elem["title"] = pdoc->_title;
                // Send only a summary around the first matched word, not the whole tag-stripped document
                elem["content"] = GetDes(pdoc->_content, iter._words[0]);
                // elem["content"] = pdoc->_content; // would send the entire document
                elem["url"] = pdoc->_url;
                //elem["id"] = (int)iter._doc_id;
                //elem["weight"] = iter._weight;
                root.append(elem); 
            }
            Json::FastWriter writer;
            *json_string = writer.write(root);
        }

        std::string GetDes(const std::string& html_content, const std::string& word)
        {
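            // Take up to 50 bytes before and 100 bytes after the first occurrence of word
            const size_t prev_step = 50;
            const size_t next_step = 100;
            // Find the first occurrence of word in the content, ignoring case; cast to unsigned char,
            // because passing a negative char (e.g. a UTF-8 byte) to std::tolower is undefined behavior
            auto iter = std::search(html_content.begin(),html_content.end(), word.begin(),word.end(), 
            [](char x, char y){ return std::tolower(static_cast<unsigned char>(x)) == std::tolower(static_cast<unsigned char>(y));});
            if(iter == html_content.end())
            {
                return "None1";
            }
            size_t pos = std::distance(html_content.begin(), iter);
            // Wrong approach: find() is case-sensitive, but the query words were lowercased,
            // so e.g. "asio" would not be found in a text that contains "Asio"
            // size_t pos = html_content.find(word);
            // if(pos == std::string::npos)
            // {
            //     return "None word";
            // }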

            size_t start = 0;
            size_t end = html_content.size() - 1;
            if(start + prev_step < pos)
            {
                start = pos - prev_step;
            }
            if(pos + next_step < end)
            {
                end = pos + next_step;
            }
            if(start > end)
            {
                return "None2";
            }
            // Return the substring between start and end
            return html_content.substr(start, end - start);
        }

    private:
        ns_index::Index* _index = nullptr; // the index used for lookups
    };
}
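GetDes measures its window in bytes. With Chinese text (3 bytes per character in UTF-8) the 50/100-byte edges can land in the middle of a character, which may leave invalid UTF-8 in the JSON summary. A sketch of one way to handle this, assuming UTF-8 content: compute the byte window, then move both ends onto character boundaries. `Snippet` and `IsLeadByte` are hypothetical names; unlike GetDes, this version measures the trailing context from the end of the matched word and returns an empty string when nothing matches.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if c starts a UTF-8 character, i.e. it is not a 10xxxxxx continuation byte.
bool IsLeadByte(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) != 0x80;
}

// Return roughly `before` bytes of context before and `after` bytes after the first
// case-insensitive occurrence of word, widened so that no UTF-8 character is cut.
std::string Snippet(const std::string &text, const std::string &word,
                    std::size_t before = 50, std::size_t after = 100)
{
    auto it = std::search(text.begin(), text.end(), word.begin(), word.end(),
                          [](char a, char b) {
                              return std::tolower(static_cast<unsigned char>(a)) ==
                                     std::tolower(static_cast<unsigned char>(b));
                          });
    if (it == text.end())
        return "";
    std::size_t pos = static_cast<std::size_t>(it - text.begin());
    std::size_t start = pos > before ? pos - before : 0;
    std::size_t end = std::min(text.size(), pos + word.size() + after);
    while (start > 0 && !IsLeadByte(text[start]))
        --start; // move back to the first byte of the character
    while (end < text.size() && !IsLeadByte(text[end]))
        ++end; // include the remaining bytes of the last character
    return text.substr(start, end - start);
}
```

For example, cutting 2 bytes before 手機 in 小米手機 would start in the middle of 米; the sketch backs off to the start of 米 and returns 米手機.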

8. The http_server module

#include "./cpp-httplib/httplib.h"
#include "Sercher.hpp"
#include "Log.hpp"

const std::string root_path = "./wwwroot";
const std::string input = "data/raw_html/raw.txt";

int main()
{
    httplib::Server svr;
    ns_sercher::Searcher searcher;
    searcher.InitSearcher(input);
    svr.set_base_dir(root_path.c_str());
    svr.Get("/s", [&searcher](const httplib::Request& req, httplib::Response& res){
        if(!req.has_param("word"))
        {
            res.set_content("A search keyword is required!", "text/plain; charset=utf-8");
            return;
        }
        std::string word = req.get_param_value("word");
        LOG(NORMAL, "received search keyword: " + word);
        std::string json_string;
        searcher.Search(word, &json_string);
        res.set_content(json_string, "application/json; charset=utf-8");
        //res.set_content("Hello World!", "text/plain; charset=utf-8");
    });
    LOG(NORMAL, "server started...");
    svr.listen("0.0.0.0", 8080);
    return 0;
}

9. The front-end module

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script src="https://code.jquery.com/jquery-2.1.1.min.js"></script>

    <title>Boost Search Engine</title>
    <style>
        /* Remove the default margin and padding of every element (HTML box model) */
        * {
            /* outer margin */
            margin: 0;
            /* inner padding */
            padding: 0;
        }
        /* Make the body fill the whole html page (100% height) */
        html,
        body {
            height: 100%;
        }
        /* Class selector .container */
        .container {
            /* width of the div */
            width: 800px;
            /* center horizontally with automatic left/right margins */
            margin: 0px auto;
            /* top margin: keep some distance from the top of the page */
            margin-top: 15px;
        }
        /* Compound selector: .search inside .container */
        .container .search {
            /* same width as the parent element */
            width: 100%;
            /* height of 52px */
            height: 52px;
        }
        /* Select the input element (type selector) and set its properties */
        /* The input's height does not include its border: 50px + 2 x 1px border = 52px */
        .container .search input {
            /* float left */
            float: left;
            width: 600px;
            height: 50px;
            /* border: width, style, color */
            border: 1px solid black;
            /* remove the input's right border (the button sits there) */
            border-right: none;
            /* left padding, so the text does not touch the left border */
            padding-left: 10px;
            /* color and size of the text inside the input */
            color: #CCC;
            font-size: 14px;
        }
        /* Select the button element (type selector) and set its properties */
        .container .search button {
            /* float left */
            float: left;
            width: 150px;
            height: 52px;
            /* background color of the button, #4e6ef2 */
            background-color: #4e6ef2;
            /* text color of the button */
            color: #FFF;
            /* font size */
            font-size: 19px;
            font-family:Georgia, 'Times New Roman', Times, serif;
        }
        .container .result {
            width: 100%;
        }
        .container .result .item {
            margin-top: 15px;
        }

        .container .result .item a {
            /* block element, on its own line */
            display: block;
            /* remove the link underline */
            text-decoration: none;
            /* font size of the link text */
            font-size: 20px;
            /* text color */
            color: #4e6ef2;
        }
        .container .result .item a:hover {
            text-decoration: underline;
        }
        .container .result .item p {
            margin-top: 5px;
            font-size: 16px;
            font-family:'Lucida Sans', 'Lucida Sans Regular', 'Lucida Grande', 'Lucida Sans Unicode', Geneva, Verdana, sans-serif;
        }

        .container .result .item i{
            /* block element, on its own line */
            display: block;
            /* cancel the italic style */
            font-style: normal;
            color: green;
        }
    </style>
</head>
<body>
    <div class="container">
        <div class="search">
            <input type="text" placeholder="Enter a search keyword">
            <button onclick="Search()">Search</button>
        </div>
        <div class="result">
            <!-- Results are generated dynamically; the commented-out markup below shows what a result item looks like -->
            <!-- <div class="item">
                <a href="#">This is the title</a>
                <p>This is the summary. This is the summary. This is the summary. This is the summary. This is the summary.</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">This is the title</a>
                <p>This is the summary. This is the summary. This is the summary. This is the summary. This is the summary.</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">This is the title</a>
                <p>This is the summary. This is the summary. This is the summary. This is the summary. This is the summary.</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">This is the title</a>
                <p>This is the summary. This is the summary. This is the summary. This is the summary. This is the summary.</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div>
            <div class="item">
                <a href="#">This is the title</a>
                <p>This is the summary. This is the summary. This is the summary. This is the summary. This is the summary.</p>
                <i>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib</i>
            </div> -->
        </div>
    </div>
    <script>
        function Search(){
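            // alert() shows a browser pop-up box
            // alert("hello js!");
            // 1. Read the query; $ is simply the short name of jQuery
            let query = $(".container .search input").val();
            console.log("query = " + query); // console is the browser's developer console, handy for inspecting JS data

            // 2. Send the HTTP request; ajax is jQuery's function for exchanging data with the back end
            $.ajax({
                type: "GET",
                // encode the query so that characters such as &, # or spaces reach the server intact
                url: "/s?word=" + encodeURIComponent(query),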
                success: function(data){
                    console.log(data);
                    BuildHtml(data);
                }
            });
        }

        function BuildHtml(data){
            // Get the result element of the page
            let result_lable = $(".container .result");
            // Clear the previous search results
            result_lable.empty();

            for( let elem of data){
                // console.log(elem.title);
                // console.log(elem.url);
                let a_lable = $("<a>", {
                    text: elem.title,
                    href: elem.url,
                    // open the link in a new tab
                    target: "_blank"
                });
                let p_lable = $("<p>", {
                    // the server sends the summary in the "content" field
                    text: elem.content
                });
                let i_lable = $("<i>", {
                    text: elem.url
                });
                let div_lable = $("<div>", {
                    class: "item"
                });
                a_lable.appendTo(div_lable);
                p_lable.appendTo(div_lable);
                i_lable.appendTo(div_lable);
                div_lable.appendTo(result_lable);
            }
        }
    </script>
</body>
</html>

10. Adding logging

#pragma once

#include <iostream>
#include <string>
#include <ctime>

#define NORMAL 1
#define WARNING 2
#define DEBUG 3
#define FATAL 4

#define LOG(LEVEL, MESSAGE) Log(#LEVEL, MESSAGE, __FILE__, __LINE__)

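// inline: the function is defined in a header, so inline prevents multiple-definition
// link errors if Log.hpp is included from more than one .cc file
inline void Log(const std::string& level, const std::string& message, const std::string& file, int line)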
{
    std::cout << "[level:" << level << "]" << "[time:" << time(nullptr) << "]" << "[message:" << message << "]"
    << "[file:" << file << "]" << "[line:" << line << "]" << std::endl;
}
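Log prints the time as raw seconds since the epoch. If a readable timestamp is preferred, here is a minimal sketch (the names `FormatLog` and `LOG_STR` are hypothetical) that formats it with std::strftime and returns the line as a string, which also makes it easy to test. As in the LOG macro above, `#LEVEL` stringizes the argument without macro-expanding it, so LOG(NORMAL, ...) prints "NORMAL" rather than 1. std::localtime returns a pointer to shared static storage and is not thread-safe; on POSIX systems localtime_r is the usual alternative.

```cpp
#include <cassert>
#include <ctime>
#include <sstream>
#include <string>

// Build one log line like Log() does, but with a readable local timestamp
// ("YYYY-MM-DD HH:MM:SS") instead of raw seconds since the epoch.
inline std::string FormatLog(const std::string &level, const std::string &message,
                             const std::string &file, int line, std::time_t now)
{
    char buf[32] = "";
    // std::localtime returns a pointer to shared static storage (not thread-safe).
    if (const std::tm *local = std::localtime(&now))
        std::strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", local);
    std::ostringstream os;
    os << "[level:" << level << "][time:" << buf << "][message:" << message
       << "][file:" << file << "][line:" << line << "]";
    return os.str();
}

// #LEVEL turns the macro argument into a string literal without expanding it.
#define LOG_STR(LEVEL, MESSAGE) FormatLog(#LEVEL, MESSAGE, __FILE__, __LINE__, std::time(nullptr))
```

Printing is then a matter of `std::cout << LOG_STR(NORMAL, "server started...") << std::endl;`.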

10.1 Deploying the service on Linux

mkdir -p log
nohup ./http_server > log/log.txt 2>&1 &

11. Wrap-up

Possible directions for extending the project:

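Build search over the whole Boost website
Design an online update scheme (signals, a crawler) to complete the server design
Implement the various pieces yourself instead of using third-party components (if you have the time and energy)
Add paid ranking (bidding) to the search engine (highly recommended)
Hot-word statistics and smart keyword suggestions (trie, priority queue) (recommended)
Add login and registration, introducing MySQL (recommended)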
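For the hot-word suggestion idea, a minimal sketch under simple assumptions: every full query is recorded in a byte-level trie with a counter, and the suggestions for a prefix are the k most frequent queries below that trie node, selected with a size-k min-heap (std::priority_queue with std::greater). The class `HotWords` and its methods are hypothetical; the project does not include this feature.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hot-word suggester: a byte-level trie counts how often each full
// query was searched; Suggest() returns the k most frequent queries under a prefix.
class HotWords
{
    struct Node
    {
        std::map<char, std::unique_ptr<Node>> children;
        int count = 0; // how many times this exact query was recorded
    };

    using Entry = std::pair<int, std::string>; // (count, query)
    using MinHeap = std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

    Node root_;

    // Depth-first walk of the subtree; the min-heap keeps only the k largest counts.
    static void Collect(const Node *node, std::string &path, std::size_t k, MinHeap &heap)
    {
        if (node->count > 0)
        {
            heap.emplace(node->count, path);
            if (heap.size() > k)
                heap.pop(); // evict the least frequent query
        }
        for (const auto &child : node->children)
        {
            path.push_back(child.first);
            Collect(child.second.get(), path, k, heap);
            path.pop_back();
        }
    }

public:
    // Count one search for the given query.
    void Record(const std::string &query)
    {
        Node *node = &root_;
        for (char c : query)
        {
            std::unique_ptr<Node> &next = node->children[c];
            if (!next)
                next.reset(new Node()); // C++11: no std::make_unique
            node = next.get();
        }
        ++node->count;
    }

    // Up to k recorded queries that start with prefix, most frequent first.
    std::vector<std::string> Suggest(const std::string &prefix, std::size_t k) const
    {
        const Node *node = &root_;
        for (char c : prefix)
        {
            auto it = node->children.find(c);
            if (it == node->children.end())
                return {};
            node = it->second.get();
        }
        MinHeap heap;
        std::string path = prefix;
        Collect(node, path, k, heap);
        std::vector<std::string> result;
        for (; !heap.empty(); heap.pop())
            result.push_back(heap.top().second); // least frequent first
        return std::vector<std::string>(result.rbegin(), result.rend());
    }
};
```

Because the heap never holds more than k entries, picking suggestions costs O(m log k) for m matching queries instead of sorting all of them. Queries with equal counts are not ordered in any meaningful way.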
