DB-GPT: Revolutionizing Database Interactions with Private LLM Technology

简体中文 |Discord |Documents|Wechat|Community

What is DB-GPT?

DB-GPT is an experimental open-source project that uses localized GPT large models to interact with your data and environment. With this solution, you can be assured that there is no risk of data leakage, and your data is 100% private and secure.

Demo

Run on an RTX 4090 GPU.

Chat Excel

Chat Plugin

LLM Management

FastChat && vLLM

Trace

Chat Knowledge

Chat with data, and figure charts.

Install

Usage Tutorial

Features

Currently, we have released multiple key features, which are listed below to demonstrate our current capabilities:

SQL language capabilities
- SQL generation
- SQL diagnosis
Private domain Q&A and data processing
- Knowledge Management(We currently support many document formats: txt, pdf, md, html, doc, ppt, and url.)
ChatDB
ChatExcel
ChatDashboard
Multi-Agents&Plugins
Unified vector storage/indexing of knowledge base
- Support for unstructured data
  - PDF
  - TXT
  - Markdown
  - CSV
  - DOC
  - PPT
  - WebURL
Multi LLMs Support, Supports multiple large language models, currently supporting
Support API Proxy LLMs
- ChatGPT
- Tongyi
- Wenxin
- ChatGLM
Support Datasources

DataSource	support	Notes
MySQL	Yes
PostgresSQL	Yes
Spark	Yes
DuckDB	Yes
Sqlite	Yes
MSSQL	Yes
ClickHouse	Yes
Oracle	No	TODO
Redis	No	TODO
MongoDB	No	TODO
HBase	No	TODO
Doris	No	TODO
DB2	No	TODO
Couchbase	No	TODO
Elasticsearch	No	TODO
OceanBase	No	TODO
TiDB	No	TODO
StarRocks	No	TODO

Introduction

Is the architecture of the entire DB-GPT shown in the following figure:

The core capabilities mainly consist of the following parts:

Multi-Models: Support multi-LLMs, such as LLaMA/LLaMA2、CodeLLaMA、ChatGLM, QWen、Vicuna and proxy model ChatGPT、Baichuan、tongyi、wenxin etc
Knowledge Based QA: You can perform high-quality intelligent Q&A based on local documents such as pdf, word, excel and other data.
Embedding: Unified data vector storage and indexing, Embed data as vectors and store them in vector databases, providing content similarity search.
Multi-Datasources: Used to connect different modules and data sources to achieve data flow and interaction.
Multi-Agents: Provides Agent and plugin mechanisms, allowing users to customize and enhance the system's behavior.
Privacy & Secure: You can be assured that there is no risk of data leakage, and your data is 100% private and secure.
Text2SQL: We enhance the Text-to-SQL performance by applying Supervised Fine-Tuning (SFT) on large language models

SubModule

DB-GPT-Hub Text-to-SQL performance by applying Supervised Fine-Tuning (SFT) on large language models.
DB-GPT-Plugins DB-GPT Plugins, Can run autogpt plugin directly
DB-GPT-Web ChatUI for DB-GPT

Image

🌐 AutoDL Image

Language Switching

In the .env configuration file, modify the LANGUAGE parameter to switch to different languages. The default is English (Chinese: zh, English: en, other languages to be added later).

Contribution

Please run black . before submitting the code. contributing guidelines, how to contribution

RoadMap

KBQA RAG optimization

Multi Datasource Support

Multi Datasource Support
- MySQL
- PostgresSQL
- Spark
- DuckDB
- Sqlite
- MSSQL
- ClickHouse
- Oracle
- Redis
- MongoDB
- HBase
- Doris
- DB2
- Couchbase
- Elasticsearch
- OceanBase
- TiDB
- StarRocks

Multi-Models And vLLM

Cluster Deployment
Fastchat Support
vLLM Support
Cloud-native environment and support for Ray environment
Service Registry(eg:nacos)
Compatibility with OpenAI's interfaces
Expansion and optimization of embedding models

Agents market and Plugins

multi-agents framework
custom plugin development
plugin market
Integration with CoT
Enrich plugin sample library
Support for AutoGPT protocol
Integration of multi-agents and visualization capabilities, defining LLM+Vis new standards

Cost and Observability

debugging
Observability
cost & budgets

Text2SQL Finetune

support llms
- LLaMA
- LLaMA-2
- BLOOM
- BLOOMZ
- Falcon
- Baichuan
- Baichuan2
- InternLM
- Qwen
- XVERSE
- ChatGLM2
SFT Accuracy

As of October 10, 2023, by fine-tuning an open-source model of 13 billion parameters using this project, the execution accuracy on the Spider evaluation dataset has surpassed that of GPT-4!

name	Execution Accuracy	reference
GPT-4	0.762	numbersstation-eval-res
ChatGPT	0.728	numbersstation-eval-res
CodeLlama-13b-Instruct-hf_lora	0.789	sft train by our this project,only used spider train dataset ,the same eval way in this project with lora SFT
CodeLlama-13b-Instruct-hf_qlora	0.774	sft train by our this project,only used spider train dataset ,the same eval way in this project with qlora and nf4,bit4 SFT
wizardcoder	0.610	text-to-sql-wizardcoder
CodeLlama-13b-Instruct-hf	0.556	eval in this project default param
llama2_13b_hf_lora_best	0.744	sft train by our this project,only used spider train dataset ,the same eval way in this project

More Information about Text2SQL finetune

Licence

The MIT License (MIT)

Contact Information

We are working on building a community, if you have any ideas about building the community, feel free to contact us.

Name		Name	Last commit message	Last commit date
Latest commit History 2,297 Commits
.github		.github
assets		assets
docker		docker
docs		docs
examples		examples
pilot		pilot
plugins		plugins
requirements		requirements
scripts		scripts
tests		tests
tools		tools
.dockerignore		.dockerignore
.env.template		.env.template
.gitignore		.gitignore
.plugin_env.template		.plugin_env.template
.readthedocs.yaml		.readthedocs.yaml
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
README.zh.md		README.zh.md
docker-compose.yml		docker-compose.yml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DB-GPT: Revolutionizing Database Interactions with Private LLM Technology

What is DB-GPT?

Contents