python连接clickhouse数据库

时间：08-27来源：作者：点击数：28

简介

Yandex在2016年6月15日开源了一个数据分析的数据库，名字叫做ClickHouse，这对保守俄罗斯人来说是个特大事。更让人惊讶的是，这个列式存储数据库的跑分要超过很多流行的商业MPP数据库软件，例如Vertica。如果你没有听过Vertica，那你一定听过 Michael Stonebraker，2014年图灵奖的获得者，PostgreSQL和Ingres发明者（Sybase和SQL Server都是继承Ingres而来的）, Paradigm4和SciDB的创办者。Michael Stonebraker于2005年创办Vertica公司，后来该公司被HP收购，HP Vertica成为MPP列式存储商业数据库的高性能代表，Facebook就购买了Vertica数据用于用户行为分析。简单的说，ClickHouse作为分析型数据库，有三大特点：一是跑分快，二是功能多，三是文艺范

官网地址：https://clickhouse.tech/

官方文档：https://clickhouse.tech/docs/zh/single/

clickhouse-driver

ClickHouse没有官方的Python接口，有个第三方的库，叫clickhouse-driver，GitHub地址是：mymarilyn/clickhouse-driver: ClickHouse Python Driver with native interface support

安装：

pip install clickhouse-driver

使用方法如下：

from clickhouse_driver import Client

client = Client(host='localhost', database='default', user='default', password='')
client.execute('SHOW DATABASES')

==========================================================
>>> from clickhouse_driver import connect
>>>
>>> conn = connect('clickhouse://localhost')
>>> cursor = conn.cursor()
>>>
>>> cursor.execute('SHOW TABLES')
>>> cursor.fetchall()
[('test',)]

clickhouse-sqlalchemy

安装

pip install clickhouse-sqlalchemy==0.1.4 
pip install sqlalchemy==1.3.19

使用

# -*- coding:utf-8 -*-
from clickhouse_sqlalchemy import make_session
from sqlalchemy import create_engine


conf = {
    "user": "default",
    "password": "",
    "server_host": "47.104",
    "port": "8123",
    "db": "test"
}
connection = 'clickhouse://{user}:{password}@{server_host}:{port}/{db}'.format(**conf)
engine = create_engine(connection, pool_size=100, pool_recycle=3600, pool_timeout=20)


def get_session(engine):
    return make_session(engine)

def execute(sql):
    session = get_session(engine)
    cursor = session.execute(sql)
    try:
        fields = cursor._metadata.keys
        return [dict(zip(fields, item)) for item in cursor.fetchall()]
    finally:
        cursor.close()
        session.close()

query='SHOW TABLES'
result=execute(query)
print(result)

pandahouse

github：https://github.com/kszucs/pandahouse

安装

pip install pandahouse

Writing dataframe to clickhouse

from pandahouse.core import to_clickhouse


connection = {'host': 'http://clickhouse-host:8123',
              'database': 'test','user':'user' ,'password':'password','encoding':'utf-8'}
affected_rows = to_clickhouse(df, table='name', connection=connection)

Reading arbitrary clickhouse query to pandas

from pandahouse.core import read_clickhouse

df = read_clickhouse('SELECT * FROM {db}.table', index_col='id',
                     connection=connection)

clickhouse2pandas

github：https://github.com/lee19840806/clickhouse2pandas

安装

pip install clickhouse2pandas

使用

import clickhouse2pandas as ch2pd

connection_url = 'http://user:password@clickhouse_host:8123'

query = 'select * from system.numbers limit 1000000'

df = ch2pd.select(connection_url, query)
# df is a pandas dataframe converted from ClickHouse query result

API Reference

clickhouse2pandas.select(connection_url, query = None, convert_to = 'DataFrame', settings = None)

其它阅读

ClickHouse表引擎到底怎么选：

https://www.cdsy.xyz/computer/soft/database/other_database/240827/cd63085.html

clickHouse可视化查询工具：

https://www.cdsy.xyz/computer/soft/database/other_database/240827/cd63086.html

方便获取更多学习、工作、生活信息请关注本站微信公众号 城东书院微信服务号

来顶一下

返回首页

上一篇:python多进程之间共享数据下一篇:python获取两个list交集|并集|差集

使用Python查询任意地	win10安装CUDA和cuDNN
吐血整理\| 全国招投标	高考生入学注意：这些大

首页

学习

工作

生活

兴趣组

电子

计算机

掌上机件

图库

游戏

考试与竞赛

黑板报

国学

外语

下载

故事汇

社区