姜鹏辉的个人博客 GreyNius

【测试】OpenLookeng-Clickhouse连接器TPC性能测试

2021-01-10

原始版本发布在InfoQ上: OpenLookeng 连接器 -Clickhouse connector 性能测试报告

测试背景

OpenLooKeng是一款开源的高性能数据虚拟化引擎,提供统一SQL接口,可以实现对多个数据库的跨源异构和跨域跨DC查询。

ClickHouse是Yandex开源的一个用于实时数据分析的基于列存储的数据库,其工作速度比传统方法快100-1000倍,性能超过了目前市场上的列式存储数据库1

ClickHouse connector2是为OpenLookeng开发的用于访问Clickhouse数据源的连接器,截止测试时最新版本更新至2020-12-24日的b2162c5,已支持对常见数据类型和函数的映射,详细支持列表可见https://gitee.com/heatao/hetu-core/blob/48650aa794c90a871df15661bf77f25225fd09c5/hetu-docs/zh/connector/clickhouse.md

测试目的

使用SSB测试基准提供的数据和SQL查询语句,对OpenLookeng的ClickHouse connector的性能进行评估。

测试环境

系统信息

IP 操作系统 内核版本 文件系统类型
192.168.40.152 CentOS Linux release 7.4.1708 3.10.0-693.el7.x86_64 xfs
192.168.40.223 CentOS Linux release 7.7.1908 3.10.0-1062.12.1.el7.x86_64 xfs

硬件信息

  192.168.40.152 192.168.40.223
CPU Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz Intel(R) Xeon(R) CPU E5-2698 v3 @2.30GHz
内存 128G 128G

部署情况

软件 位置 版本
OpenLookeng 192.168.40.152 1.0.1
Clickhouse 192.168.40.152 20.3.12.112
Clickhouse 192.168.40.223 20.3.12.112

数据集

规模

按照Clickhouse官方文档中提到的Star Schema测试集生成数据3

通过ssb-dbgen工具设置参数101000分别生成了两种规模的测试集,其规模如下,分别称为ssb-10ssb-1000

table size bytes_on_disk data_uncompressed_bytes data_compressed_bytes compress_rate rows
supplier 771.98 KiB 771.98 KiB 1.11 MiB 771.02 KiB 67.9131432 20000
part 13.79 MiB 13.79 MiB 19.59 MiB 13.76 MiB 70.2197571 800000
customer 11.50 MiB 11.50 MiB 16.89 MiB 11.49 MiB 67.9983266 300000
lineorder 1.65 GiB 1.65 GiB 2.40 GiB 1.64 GiB 68.34114 59986052
lineorder_flat 5.19 GiB 5.19 GiB 9.70 GiB 5.18 GiB 53.3959248 59986052

ssb-10的数据占用空间为5.18G,包含59986052条数据

table size bytes_on_disk data_uncompressed_bytes data_compressed_bytes compress_rate rows
supplier 75.39 MiB 75.39 MiB 110.64 MiB 75.33 MiB 68.08099618 2000000
part 34.47 MiB 34.47 MiB 48.97 MiB 34.39 MiB 70.22835711 2000000
customer 1.12 GiB 1.12 GiB 1.65 GiB 1.12 GiB 67.94630597 30000000
lineorder 123.00 GiB 123.00 GiB 176.19 GiB 122.76 GiB 69.67630518 4398761522
lineorder_flat 396.40 GiB 396.40 GiB 711.53 GiB 395.76 GiB 55.62024948 4398761522

ssb-1000占用的空间为395.76G,数据量为4398761522

测试语句

SSB测试语句总共包含13条测试语句,如下

Q1.1: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYear(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toYYYYMM(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3: SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE toISOWeek(LO_ORDERDATE) = 6 AND toYear(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = ‘MFGR#12’ AND S_REGION = ‘AMERICA’ GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q2.2: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= ‘MFGR#2221’ AND P_BRAND <= ‘MFGR#2228’ AND S_REGION = ‘ASIA’ GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q2.3: SELECT sum(LO_REVENUE) toYear(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = ‘MFGR#2239’ AND S_REGION = ‘EUROPE’ GROUP BY year P_BRAND ORDER BY year P_BRAND;

Q3.1: SELECT C_NATION S_NATION toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = ‘ASIA’ AND S_REGION = ‘ASIA’ AND year >= 1992 AND year <= 1997 GROUP BY C_NATION S_NATION year ORDER BY year ASC revenue DESC;

Q3.2: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = ‘UNITED STATES’ AND S_NATION = ‘UNITED STATES’ AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q3.3: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = ‘UNITED KI1’ OR C_CITY = ‘UNITED KI5’) AND (S_CITY = ‘UNITED KI1’ OR S_CITY = ‘UNITED KI5’) AND year >= 1992 AND year <= 1997 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q3.4: SELECT C_CITY S_CITY toYear(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = ‘UNITED KI1’ OR C_CITY = ‘UNITED KI5’) AND (S_CITY = ‘UNITED KI1’ OR S_CITY = ‘UNITED KI5’) AND toYYYYMM(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year ORDER BY year ASC revenue DESC;

Q4.1: SELECT toYear(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = ‘AMERICA’ AND S_REGION = ‘AMERICA’ AND (P_MFGR = ‘MFGR#1’ OR P_MFGR = ‘MFGR#2’) GROUP BY year C_NATION ORDER BY year ASC C_NATION ASC;

Q4.2: SELECT toYear(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = ‘AMERICA’ AND S_REGION = ‘AMERICA’ AND (year = 1997 OR year = 1998) AND (P_MFGR = ‘MFGR#1’ OR P_MFGR = ‘MFGR#2’) GROUP BY year S_NATION P_CATEGORY ORDER BY year ASC S_NATION ASC P_CATEGORY ASC;

Q4.3: SELECT toYear(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = ‘UNITED STATES’ AND (year = 1997 OR year = 1998) AND P_CATEGORY = ‘MFGR#14’ GROUP BY year S_CITY P_BRAND ORDER BY year ASC S_CITY ASC P_BRAND ASC;

根据OpenLookeng的语法,在功能不变的情况下对Clickhouse语法的sql语句进行一定的修改,主要为一下几点

  • toYear()函数统一修改为year()
  • toISOWeek()函数修改为week()
  • toYYYYMM(A)替换为 year(A)*100+month(A)
  • 将语法为select expr(A) as B group by B 等修改为select expr(A) as B group by expr(A),即去掉对AS的使用

修改后的语法为

Q1.1 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE) = 1993 AND LO_DISCOUNT BETWEEN 1 AND 3 AND LO_QUANTITY < 25;

Q1.2 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199401 AND LO_DISCOUNT BETWEEN 4 AND 6 AND LO_QUANTITY BETWEEN 26 AND 35;

Q1.3 : SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue FROM lineorder_flat WHERE week(LO_ORDERDATE) = 6 AND year(LO_ORDERDATE) = 1994 AND LO_DISCOUNT BETWEEN 5 AND 7 AND LO_QUANTITY BETWEEN 26 AND 35;

Q2.1 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_CATEGORY = ‘MFGR#12’ AND S_REGION = ‘AMERICA’ GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q2.2 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND >= ‘MFGR#2221’ AND P_BRAND <= ‘MFGR#2228’ AND S_REGION = ‘ASIA’ GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q2.3 : SELECT sum(LO_REVENUE) year(LO_ORDERDATE) AS year P_BRAND FROM lineorder_flat WHERE P_BRAND = ‘MFGR#2239’ AND S_REGION = ‘EUROPE’ GROUP BY year(LO_ORDERDATE) P_BRAND ORDER BY year(LO_ORDERDATE) P_BRAND;

Q3.1 : SELECT C_NATION S_NATION year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_REGION = ‘ASIA’ AND S_REGION = ‘ASIA’ AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_NATION S_NATION year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.2 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE C_NATION = ‘UNITED STATES’ AND S_NATION = ‘UNITED STATES’ AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.3 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = ‘UNITED KI1’ OR C_CITY = ‘UNITED KI5’) AND (S_CITY = ‘UNITED KI1’ OR S_CITY = ‘UNITED KI5’) AND year(LO_ORDERDATE) >= 1992 AND year(LO_ORDERDATE) <= 1997 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q3.4 : SELECT C_CITY S_CITY year(LO_ORDERDATE) AS year sum(LO_REVENUE) AS revenue FROM lineorder_flat WHERE (C_CITY = ‘UNITED KI1’ OR C_CITY = ‘UNITED KI5’) AND (S_CITY = ‘UNITED KI1’ OR S_CITY = ‘UNITED KI5’) AND year(LO_ORDERDATE)*100+month(LO_ORDERDATE) = 199712 GROUP BY C_CITY S_CITY year(LO_ORDERDATE) ORDER BY year(LO_ORDERDATE) ASC revenue DESC;

Q4.1 : SELECT year(LO_ORDERDATE) AS year C_NATION sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = ‘AMERICA’ AND S_REGION = ‘AMERICA’ AND (P_MFGR = ‘MFGR#1’ OR P_MFGR = ‘MFGR#2’) GROUP BY year(LO_ORDERDATE) C_NATION ORDER BY year(LO_ORDERDATE) ASC C_NATION ASC;

Q4.2 : SELECT year(LO_ORDERDATE) AS year S_NATION P_CATEGORY sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE C_REGION = ‘AMERICA’ AND S_REGION = ‘AMERICA’ AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND (P_MFGR = ‘MFGR#1’ OR P_MFGR = ‘MFGR#2’) GROUP BY year(LO_ORDERDATE) S_NATION P_CATEGORY ORDER BY year(LO_ORDERDATE) ASC S_NATION ASC P_CATEGORY ASC;

Q4.3 : SELECT year(LO_ORDERDATE) AS year S_CITY P_BRAND sum(LO_REVENUE - LO_SUPPLYCOST) AS profit FROM lineorder_flat WHERE S_NATION = ‘UNITED STATES’ AND (year(LO_ORDERDATE) = 1997 OR year(LO_ORDERDATE) = 1998) AND P_CATEGORY = ‘MFGR#14’ GROUP BY year(LO_ORDERDATE) S_CITY P_BRAND ORDER BY year(LO_ORDERDATE) ASC S_CITY ASC P_BRAND ASC;

测试方法

ssb-10数据到192.168.40.152节点上,将ssb-1000数据导入到192.168.40.223节点上

使用python脚本分别在Clickhouse和OpenLookeng上执行脚本,每条SQL语句执行间隔为10秒,多次执行取平均值

其中Clickhouse的驱动使用的是开源库clickhouse-driver4,OpenLookeng的驱动来源于对web端API的调用,代码已发布于github5

测试结果

ssb-1000

在OpenLookeng上的执行结果

- Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3 总计
1 16.480s 2.610s 0.636s 242.400s 207.000s 204.600s 373.800 246.600s 166.800s 1.930s 346.800s 85.200s 29.890s 1924.746s
2 15.990s 2.440s 0.689s 226.200s 191.400s 189.000s 372.000s 226.800s 164.400s 2.160s 351.000s 85.200s 18.390s 1845.669s
3 16.270s 2.520s 0.621s 231.000s 199.200s 193.800s 369.600s 241.800s 169.800s 2.450s 346.800s 74.400s 27.110s 1875.371s

在Clickhouse上的执行结果

- Q1.1 Q1.2 Q1.3 Q2.1 Q2.2 Q2.3 Q3.1 Q3.2 Q3.3 Q3.4 Q4.1 Q4.2 Q4.3 Clickhouse total_use
1 14.036s 1.491s 0.388s 119.684s 81.551s 74.901s 103.358s 111.153s 67.420s 0.564s 167.159s 51.108s 60.336s 853.149s
2 16.413s 2.771s 0.399s 111.475s 74.204s 72.538s 98.875s 108.742s 61.698s 0.539s 149.521s 48.743s 49.649s 795.568
3 17.785s 1.591s 0.524s 117.368s 79.059s 70.913s 100.280s 103.644s 57.612s 0.255s 154.977s 43.169s 16.046s 763.222s

结果分析

ssb-1000

序号 Clickhouse OpenLookeng 差异 百分比
Q1.1 16.08 16.25 -0.17 -1.05%
Q1.2 1.95 2.52 -0.57 -29.34%
Q1.3 0.44 0.65 -0.21 -48.44%
Q2.1 116.18 233.20 -117.02 -100.73%
Q2.2 78.27 199.20 -120.93 -154.50%
Q2.3 72.78 195.80 -123.02 -169.02%
Q3.1 100.84 371.80 -270.96 -268.71%
Q3.2 107.85 238.40 -130.55 -121.06%
Q3.3 62.24 167.00 -104.76 -168.30%
Q3.4 0.45 2.18 -1.73 -381.59%
Q4.1 157.22 348.20 -190.98 -121.47%
Q4.2 47.67 81.60 -33.93 -71.16%
Q4.3 42.01 25.13 16.88 40.18%
总计 803.98 1881.93 -1077.95 -134.08%

image-20210120002928398

执行情况分析

根据查询clickhouse中system.query_log表,对后台实际执行的语句进行了分析,发现每在OpenLookeng执行一条sql语句,会在clickhouse中产生2-3条相关查询(忽略获取表时区的查询)

例如Q3.3会对应三次查询,第一次和第三次功能几乎一致,分别如下

--- 第一次
(SELECT *
FROM (
	SELECT LO_ORDERDATE AS lo_orderdate, LO_REVENUE AS lo_revenue, C_CITY AS c_city, S_CITY AS s_city
	FROM ssb.lineorder_flat
)
WHERE (c_city = CAST('UNITED KI1' AS varchar)
		OR c_city = CAST('UNITED KI5' AS varchar))
	AND (s_city = CAST('UNITED KI1' AS varchar)
		OR s_city = CAST('UNITED KI5' AS varchar))
	AND (toYear(lo_orderdate) >= 1992
		AND toYear(lo_orderdate) <= 1997))
FORMAT TabSeparatedWithNamesAndTypes;


--- 第二次
(SELECT year, c_city, s_city, CAST(sum(lo_revenue) AS bigint) AS sum
FROM (
	SELECT c_city, s_city, toYear(lo_orderdate) AS year, lo_revenue
	FROM (
		SELECT *
		FROM (
			SELECT LO_ORDERDATE AS lo_orderdate, LO_REVENUE AS lo_revenue, C_CITY AS c_city, S_CITY AS s_city
			FROM ssb.lineorder_flat
		)
		WHERE (c_city = CAST('UNITED KI1' AS varchar)
				OR c_city = CAST('UNITED KI5' AS varchar))
			AND (s_city = CAST('UNITED KI1' AS varchar)
				OR s_city = CAST('UNITED KI5' AS varchar))
			AND (toYear(lo_orderdate) >= 1992
				AND toYear(lo_orderdate) <= 1997)
	)
)
GROUP BY year, c_city, s_city)
FORMAT TabSeparatedWithNamesAndTypes;

--- 第三次
SELECT lo_orderdate, lo_revenue, c_city, s_city
FROM (
	SELECT *
	FROM (
		SELECT LO_ORDERDATE AS lo_orderdate, LO_REVENUE AS lo_revenue, C_CITY AS c_city, S_CITY AS s_city
		FROM ssb.lineorder_flat
	)
	WHERE (c_city = CAST('UNITED KI1' AS varchar)
			OR c_city = CAST('UNITED KI5' AS varchar))
		AND (s_city = CAST('UNITED KI1' AS varchar)
			OR s_city = CAST('UNITED KI5' AS varchar))
		AND (toYear(lo_orderdate) >= 1992
			AND toYear(lo_orderdate) <= 1997)
)
FORMAT TabSeparatedWithNamesAndTypes;

具体详情见OpenLookeng-clickhouse connector测试

结论

在千万量级的数据下执行sql查询,OpenLookeng相对Clickhouse的查询时间平均慢193.07%,最好情况下慢85%,最差情况下慢778%。

在十亿量级下执行sql查询,平均慢134.08%,最好的情况优于clickhouse 40%,最差的情况比clickhouse慢381.59%

  1. https://clickhouse.tech/benchmark/dbms/ 

  2. https://gitee.com/heatao/hetu-core 

  3. https://github.com/vadimtk/ssb-dbgen 

  4. https://github.com/mymarilyn/clickhouse-driver 

  5. https://github.com/jiangph1001/OpenLookeng-driver 


Comments

Content