Environment Setup

Single-node setup

Cluster setup: 3 shards, 1 replica

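Comparison

Write speed

Data volume: 26,910,101 rows

Option 1: single node (the full data set is inserted into one single-node table)

Option 2: cluster (data is written directly to the physical tables, in parallel on each of the 3 machines)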

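Conclusion: write speed is bound by disk I/O. Because the cluster setup spreads the writes across the disks of three machines, it writes significantly faster than the single-node setup.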
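The parallel write in option 2 could look roughly like the sketch below, run once on each shard host against its own physical table. The staging table `rd_staging.rd_baseinfo_src` is a hypothetical name, not something from the original setup, and the modulo split assumes all three shards have equal weight, so that rows land on the same shard the logical table's sharding key would pick.

```sql
-- Sketch only: run on shard 1. Use "= 1" on shard 2 and "= 2" on shard 3.
-- rd_staging.rd_baseinfo_src is a hypothetical source table with the same columns.
INSERT INTO rd_physical.rd_baseinfo_physical
SELECT *
FROM rd_staging.rd_baseinfo_src
WHERE cityHash64(accountId) % 3 = 0;
```

Writing to the physical tables directly skips the forwarding step that an insert through the logical table would add, which is one plausible reason the three writes can proceed independently.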

Query comparison (mainly GROUP BY and JOIN queries)

Creating the distributed tables

-- Physical (local) table, created on every shard
CREATE TABLE rd_physical.rd_baseinfo_physical ON CLUSTER cluster_3shards_1replicas
(
    `appId` String,
    `pvpId` String,
    `accountId` String,
    `userName` String,
    `round` Nullable(Int32),
    `event` String,
    `mode` Nullable(Int32),
    `win` Int32,
    `country` String,
    `timeStamp` String,
    `ts` DateTime,
    `ds` Date
)
ENGINE = ReplicatedMergeTree('/clickhouse/rd_physical/tables/{shard}/rd_baseinfo_physical', '{replica}')
PARTITION BY (ds)
ORDER BY (appId, accountId, pvpId)
SETTINGS index_granularity = 8192;

-- Logical (distributed) table, routes rows to shards by cityHash64(accountId)
CREATE TABLE rd_data.rd_baseinfo ON CLUSTER cluster_3shards_1replicas
(
    `appId` String,
    `pvpId` String,
    `accountId` String,
    `userName` String,
    `round` Nullable(Int32),
    `event` String,
    `mode` Nullable(Int32),
    `win` Int32,
    `country` String,
    `timeStamp` String,
    `ts` DateTime,
    `ds` Date
)
ENGINE = Distributed(cluster_3shards_1replicas, rd_physical, rd_baseinfo_physical, cityHash64(accountId));
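Once both tables exist, it can help to check that the cluster is visible and that rows are spread evenly across the shards. The queries below are a minimal sketch of such a check. The column aliases are illustrative, and the second query assumes data has already been loaded.

```sql
-- List the shards and replicas that make up the cluster.
SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'cluster_3shards_1replicas';

-- Count rows per shard host through the logical table.
-- hostName() is evaluated on each shard, so each host reports its own rows.
SELECT hostName() AS host, count() AS row_count
FROM rd_data.rd_baseinfo
GROUP BY host
ORDER BY host;
```

Roughly equal counts per host would suggest the sharding key distributes `accountId` values evenly.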

GROUP BY query

SQL: SELECT count(*), accountId, pvpId FROM rd.rd_baseinfo WHERE ds >= '2019-12-01' AND ds < '2020-01-01' GROUP BY accountId, pvpId;
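Because the logical table is sharded by `cityHash64(accountId)` and the query groups by `accountId`, all rows of any one group sit on a single shard. The sketch below runs the same aggregation against the logical table `rd_data.rd_baseinfo` defined above and enables `optimize_distributed_group_by_sharding_key`. This assumes a ClickHouse version that supports the setting. Whether the optimization actually applies to this sharding expression is not verified here and is not part of the original benchmark.

```sql
-- Sketch only: same GROUP BY query, asking ClickHouse to skip the final merge
-- on the initiator when groups are known not to span shards.
SELECT count(*) AS cnt, accountId, pvpId
FROM rd_data.rd_baseinfo
WHERE ds >= '2019-12-01' AND ds < '2020-01-01'
GROUP BY accountId, pvpId
SETTINGS optimize_distributed_group_by_sharding_key = 1;
```

If the setting is unsupported or does not apply, the query still returns the same result through the usual two-stage aggregation.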