安装过程(以CentOS系统为例)
一. 环境要求
几乎所有传统X86设备都支持,只需要确认二点
linux操作系统
cpu支持SSE4.2 指令集
确定cpu是否支持的方法:
grep -q sse4_2 /proc/cpuinfo && echo "SSE 4.2 supported" || echo "SSE 4.2 not supported" #得到以下输出结果 SSE 4.2 supported
即使CPU不支持SSE指令集,官方也提供了特定版本的源码包,不过需要自行编译安装,过程较为复杂。
二. 系统参数调整
- 时间修正
yes | cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
- limit调大
vi /etc/security/limits.conf #结尾增加如下配置 * soft nproc 1048576 * hard nproc 1048576 * soft nofile 1048576 * hard nofile 1048576
- 防火墙关闭(可选,不关闭需要设置进出规则)
systemctl stop firewalld.service
- SELinux关闭(可选,不关闭需设置相应规则)
# vi /etc/selinux/config SELINUX=disabled
集群设备免密登陆
ssh-keygen生成密钥即可
三. 安装方式
rpm包安装步骤
安装包下载,地址2选1即可
【首选,官方维护】https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/
包很多,选择需要的版本中的4个主要包即可
wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-common-20.3.12.112-1.el7.x86_64.rpm/download.rpm wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-server-20.3.12.112-1.el7.x86_64.rpm/download.rpm wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-common-static-20.3.12.112-1.el7.x86_64.rpm/download.rpm wget --content-disposition https://packagecloud.io/Altinity/clickhouse/packages/el/7/clickhouse-client-20.3.12.112-1.el7.x86_64.rpm/download.rpm- rpm -ivh ./*.rpm执行安装
编译安装(步骤较多较复杂,暂不展开)
- 下载源码包
- 编译环境准备
- 执行编译
四. 重要文件结构
核心目录:
- /etc/clickhouse-server:服务端的配置文件目录,包括全局配置config.xml和用户配置users.xml等
- /var/lib/clickhouse:默认的数据存储目录(通常会修改默认路径配置,将数据保存到大容量磁盘挂载的路径)
- /var/log/clickhouse-server:默认保存日志的目录(通常会修改路径配置,将日志保存到大容量磁盘挂载的路径)
程序引擎配置文件:
- /etc/security/limits.d/clickhouse.conf:文件句柄数量的配置
- /etc/cron.d/clickhouseserver:cron定时任务配置,用于恢复因异常原因中断的ClickHouse服务进程
程序执行文件:
- /usr/bin/clickhouse:主程序的可执行文件
- /usr/bin/clickhouse-client:一个指向ClickHouse可执行文件的软链接,供客户端连接使用
- /usr/bin/clickhouse-server:一个指向ClickHouse可执行文件的软链接,供服务端启动使用
- /usr/bin/clickhouse-compressor:内置提供的压缩工具,可用于数据的正压反解
五. 工互ClickHouse集群核心配置文件详解
- /etc/clickhouse-server/config.xml
<?xml version="1.0"?>
<yandex>
<!--本机节点信息-->
<listen_host>SDGJ-GYHLW-SLC17F-B09-clickhouse-01</listen_host>
<listen_host>::1</listen_host>
<listen_host>127.0.0.1</listen_host>
<include_from>/etc/clickhouse-server/metrika.xml</include_from>
<remote_url_allow_hosts></remote_url_allow_hosts>
<!--日志-->
<logger>
<level>trace</level>
<log>/data1/clickhouse-server/clickhouse-server.log</log>
<errorlog>/data1/clickhouse-server/clickhouse-server.err.log</errorlog>
<size>1000M</size>
<count>10</count>
</logger>
<!--端口-->
<http_port>8123</http_port>
<tcp_port>9000</tcp_port>
<!--SSL-->
<openSSL>
<server>
<certificateFile>/etc/clickhouse-server/server.crt</certificateFile>
<privateKeyFile>/etc/clickhouse-server/server.key</privateKeyFile>
<dhParamsFile>/etc/clickhouse-server/dhparam.pem</dhParamsFile>
<verificationMode>none</verificationMode>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
</server>
<client>
<loadDefaultCAFile>true</loadDefaultCAFile>
<cacheSessions>true</cacheSessions>
<disableProtocols>sslv2,sslv3</disableProtocols>
<preferServerCiphers>true</preferServerCiphers>
<invalidCertificateHandler>
<name>RejectCertificateHandler</name>
</invalidCertificateHandler>
</client>
</openSSL>
<!--端口-->
<interserver_http_port>9009</interserver_http_port>
<!--参数优化/存储路径-->
<max_connections>4096</max_connections>
<keep_alive_timeout>600</keep_alive_timeout>
<max_concurrent_queries>150</max_concurrent_queries>
<uncompressed_cache_size>8589934592</uncompressed_cache_size>
<mark_cache_size>5368709120</mark_cache_size>
<path>/data1/clickhouse-server/lib/clickhouse/</path>
<tmp_path>/data1/clickhouse-server/lib/clickhouse/tmp/</tmp_path>
<user_files_path>/data1/clickhouse-server/lib/clickhouse/user_files/</user_files_path>
<users_config>users.xml</users_config>
<default_profile>default</default_profile>
<default_database>default</default_database>
<mlock_executable>false</mlock_executable>
<remote_servers incl="clickhouse_remote_servers" ></remote_servers>
<zookeeper incl="zookeeper-servers" optional="true" />
<macros incl="macros" optional="true" />
<builtin_dictionaries_reload_interval>3600</builtin_dictionaries_reload_interval>
<max_session_timeout>3600</max_session_timeout>
<default_session_timeout>60</default_session_timeout>
<!--数据库名称-->
<query_log>
<database>system</database>
<table>query_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_log>
<trace_log>
<database>system</database>
<table>trace_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</trace_log>
<query_thread_log>
<database>system</database>
<table>query_thread_log</table>
<partition_by>toYYYYMM(event_date)</partition_by>
<flush_interval_milliseconds>7500</flush_interval_milliseconds>
</query_thread_log>
<dictionaries_config>*_dictionary.xml</dictionaries_config>
<compression incl="clickhouse_compression"></compression>
<distributed_ddl>
<path>/clickhouse/task_queue/ddl</path>
</distributed_ddl>
<graphite_rollup_example>
<pattern>
<regexp>click_cost</regexp>
<function>any</function>
<retention>
<age>0</age>
<precision>3600</precision>
</retention>
<retention>
<age>86400</age>
<precision>60</precision>
</retention>
</pattern>
<default>
<function>max</function>
<retention>
<age>0</age>
<precision>60</precision>
</retention>
<retention>
<age>3600</age>
<precision>300</precision>
</retention>
<retention>
<age>86400</age>
<precision>3600</precision>
</retention>
</default>
</graphite_rollup_example>
<format_schema_path>/data1/clickhouse-server/lib/clickhouse/format_schemas/</format_schema_path>
</yandex>
- /etc/clickhouse-server/users.xml
<?xml version="1.0"?>
<yandex>
<!-- 读写用户设置 -->
<profiles>
<default>
<max_bytes_before_external_group_by>220000000000</max_bytes_before_external_group_by>
<max_memory_usage>220000000000</max_memory_usage>
<max_memory_usage_for_all_queries>220000000000</max_memory_usage_for_all_queries>
<max_execution_time>600</max_execution_time>
<distributed_product_mode>local</distributed_product_mode>
<background_pool_size>48</background_pool_size>
<use_uncompressed_cache>0</use_uncompressed_cache>
<load_balancing>random</load_balancing>
<log_queries>1</log_queries>
<skip_unavailable_shards>1</skip_unavailable_shards>
</default>
<readonly>
<readonly>1</readonly>
</readonly>
</profiles>
<!-- 用户和访问控制 -->
<users>
<default>
<password></password>
<networks incl="networks" replace="replace">
<ip>::/0</ip>
</networks>
<profile>default</profile>
<quota>default</quota>
</default>
</users>
<!-- 用户和配额 -->
<quotas>
<default>
<interval>
<duration>3600</duration>
<queries>0</queries>
<errors>0</errors>
<result_rows>0</result_rows>
<read_rows>0</read_rows>
<execution_time>0</execution_time>
</interval>
</default>
</quotas>
</yandex>
- /etc/clickhouse-server/metrika.xml
<yandex>
<!-- 集群配置 -->
<clickhouse_remote_servers>
<!--eversec集群数据分片和副本;以下配置1个副本2个分片 -->
<eversec_shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-B09-clickhouse-01</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-B09-clickhouse-02</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-03</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-04</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-05</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-06</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-07</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-08</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-09</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-10</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-11</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-12</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-13</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-14</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-15</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-16</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-17</host>
<port>9000</port>
</replica>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-18</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-19</host>
<port>9000</port>
</replica>
</shard>
</eversec_shard>
<!--eversec_shard_none_rep集群数据只分片;无副本 -->
<eversec_shard_none_rep>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-B09-clickhouse-01</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-B09-clickhouse-02</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-03</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-04</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-05</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-06</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-07</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-08</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-09</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-10</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-11</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-12</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-13</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-14</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-15</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-16</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-17</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-18</host>
<port>9000</port>
</replica>
</shard>
<shard>
<internal_replication>true</internal_replication>
<replica>
<host>SDGJ-GYHLW-SLC17F-A08-clickhouse-19</host>
<port>9000</port>
</replica>
</shard>
</eversec_shard_none_rep>
</clickhouse_remote_servers>
<!-- 本节点服务器的副本名称 -->
<macros>
<cluster>eversec_shard</cluster>
<shard>1</shard>
<replica>SDGJ-GYHLW-SLC17F-B09-clickhouse-01</replica>
</macros>
<!-- ZK配置信息 -->
<zookeeper-servers>
<node index="1">
<host>SDGJ-GYHLW-SLC17F-B09-clickhouse-01</host>
<port>2181</port>
</node>
<node index="2">
<host>SDGJ-GYHLW-SLC17F-B09-clickhouse-02</host>
<port>2181</port>
</node>
<node index="3">
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-03</host>
<port>2181</port>
</node>
<node index="4">
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-04</host>
<port>2181</port>
</node>
<node index="5">
<host>SDGJ-GYHLW-SLC17F-A09-clickhouse-05</host>
<port>2181</port>
</node>
</zookeeper-servers>
<!-- 数据压缩算法 -->
<clickhouse_compression>
<case>
<min_part_size>10000000000</min_part_size>
<min_part_size_ratio>0.01</min_part_size_ratio>
<method>zstd</method>
</case>
</clickhouse_compression>
</yandex>六. 客户端访问接口
1. CLI
即命令行接口,其底层是基于TCP接口进行通信的,是通过clickhouseclient脚本运行的。它拥有两种执行模式。
交互式执行
交互式执行可以广泛用于调试、运维、开发和测试等场景,它的使用方法是直接运行clickhouse-client进行登录,然后便可以执行一问一答的交互查询,具体如下所示:
clickhouse-client ClickHouse client version 20.4.6.53 (official build). Connecting to localhost:9000 as user default. Connected to ClickHouse server version 20.4.6 revision 54434. localhost :) select 1 SELECT 1 ┌─1─┐ │ 1 │ └───┘ 1 rows in set. Elapsed: 0.040 sec.
通过交互式执行的SQL语句,相关查询结果会统一被记录到~/.clickhouse-client-history文件,该记录可以作为审计之用。执行记录的顺序是由下至上,从新到旧。
非交互式执行
非交互式模式主要用于批处理场景,诸如对数据的导入和导出等操作。在执行脚本命令时,需要追加query参数指定执行的SQL语句。在导入数据时,它可以接收操作系统的stdin标准输入作为写入的数据源。
cat /chbase/test_fetch.tsv | clickhouse-client --query "INSERT INTO test_fetch FORM ATTSV"
cat命令读取的文件流,将会作为INSERT查询的数据输入。
而在数据导出时,则可以将输出流重定向到文件:
clickhouse-client --query="SELECT * FROM test_fetch" > /chbase/test_fetch.tsv
在默认的情况下,clickhouseclient一次只能运行一条SQL语句,如果需要执行多次查询,则需要在循环中重复执行,这显然不是一种高效的方式。此时可以追加multiquery参数,它可以支持一次运行多条SQL查询,多条查询语句之间使用分号间隔,例如:
clickhouse-client -h 10.37.129.10 --multiquery --query="SELECT 1;SELECT 2;SELECT 3;"
- 一些重要参数
- –host/-h:服务端的地址,默认值为localhost
- –port:服务端的TCP端口,默认值为9000
- –user/-u:登录的用户名,默认值为default
- –password:登录的密码,默认值为空。如果在用户定义中未设置密码,则不需要填写(例如默认的default用户)
- –database/-d:登录的数据库,默认值为default
- –query/-q:只能在非交互式查询时使用,用于指定SQL语句
- –multiquery/-n:在非交互式执行时,允许一次运行多条SQL语句,多条语句之间以分号间隔。
- –time/-t:在非交互式执行时,会打印每条SQL的执行时间
2. JDBC
ClickHouse支持标准的JDBC协议,底层基于HTTP接口通信。使用下面的Maven依赖,即可为Java程序引入官方提供的数据库驱动包:
<dependency> <groupId>ru.yandex.clickhouse</groupId> <artifactId>clickhousejdbc</artifactId> <version>0.2.4</version> </dependency>
该驱动有两种使用方式
标准形式:
标准形式是我们常用的方式,通过JDK原生接口获取连接,其关键参数如下:
#JDBC Driver Class为 ru.yandex.clickhouse.ClickHouseDriver; #JDBC URL为 jdbc:clickhouse://<host>:<port>[/<database>]
高可用模式:
高可用模式允许设置多个host地址,每次会从可用的地址中随机选择一个进行连接,其URL声明格式如下:
jdbc:clickhouse://<firsthost>:<port>,<secondhost>:<port>[,…]/<database>
3. 其它
除了上面介绍的CLI(基于TCP)和JDBC(基于HTTP)。ClickHouse的访问接口还拥有原生的C++、ODBC接口及众多第三方的集成接口(Python、NodeJS、Go、PHP等),不做依依介绍了,根据使用语言自行查阅。
七. 内置的实用工具
1. clickhouse-local
clickhouse-local可以独立运行大部分SQL查询,不需要依赖任何ClickHouse的服务端程序,它可以理解成是ClickHouse服务的单机版微内核,是一个轻量级的应用程序。clickhouse-local只能够使用File表引擎(关于表引擎的更多介绍在后续章节展开),它的数据与同机运行的ClickHouse服务也是完全隔离的,相互之间并不能访问。clickhouse-local是非交互式运行的,每次执行都需要指定数据来源,例如通过stdin标准输入,以echo打印作为数据来源。
用的比较少,不做展开,这里近展示一个实例,借助操作系统的命令,实现对系统用户内存用量的查询:
ps aux | tail -n +2 | awk '{ printf("%s\t%s\n",$1,$4) }' | clickhouse-local -S "user String, memory Float64" -q "SELECT user,round(sum(memory),2) as memoryTotal FROM table GROUP BY user ORDER BY memoryTotal DESC FORMAT Pretty"2. clickhouse-benchmark
clickhousebenchmark是基准测试的小工具,它可以自动运行SQL查询,并生成相应的运行指标报告
echo "SELECT * FROM system.numbers LIMIT 100" | clickhouse-benchmark -i 5
查询执行5次,输出报告
0 条评论