Druid 是一个开源的,分布式的,列存储的,适用于实时数据分析的存储系统,能够快速聚合、灵活过滤、毫秒级查询、和低延迟数据导入。

Installation

​ 由于Druid提供了Rest API,所以不需要任何安装。对于想要了解Druid Rest API的用户请查阅 官方文档

Configuration

Druid接入XSQL的配置继承了Configurations中针对特定数据源的type、version、schemas配置。Druid接入XSQL还有一些特有的配置,下表将对他们进行介绍。

Property Name Default Meaning
spark.xsql.datasource.$dataSource.coordinator.uri None 配置coordinator uri

这里给出一个Druid接入XSQL的配置示例:

xsql.conf文件:
spark.xsql.datasource.mydruid.type   DRUID
spark.xsql.datasource.mydruid.uri    http://druidhostname:8082
spark.xsql.datasource.mydruid.coordinator.uri    druidhostname:8081
spark.xsql.datasource.mydruid.user   xxxx
spark.xsql.datasource.mydruid.password   xxxx
spark.xsql.datasource.mydruid.version   0.10.1
spark.xsql.datasource.mydruid.whitelist druid-whitelist.conf

druid-whitelist.conf文件:
数据库名称必须为:druid
lineitem10,xsql为Druid的DataSource
{
  "druid": {
    "includes": [
        "lineitem10",
        "xsql"
    ],
    "excludes": []
   }
}

Execution

1、Select Queries

select * from xx where __time >= '' and __time <= '' and xxx or xxx 

Examples

> select * 
> from mydruid.druid.lineitem10 
> where __time >='1991-01-01' and __time <'1999-01-01' 
> and l_commitdate='1997-02-08'
> limit 10;
结果:
1996-11-11T00:00:00.000+08:00   1       ic deposits. quickly    1997-02-08      0.07999999821186066     59096.3984375   3       O       36001798        1222868    33      1996-11-23      N       COLLECT COD     TRUCK   72893   0.07000000029802322     NULL
1996-11-11T00:00:00.000+08:00   1       ns. requests cajole blith       1997-02-08      0.05000000074505806     28436.0703125   2       O       43300226  57715    17      1996-11-12      N       COLLECT COD     MAIL    57716   0.05999999865889549     NULL
1996-11-11T00:00:00.000+08:00   1       lar sentiments. quickly pen     1997-02-08      0.07999999821186066     5510.1298828125 4       O       45456578  979757   3       1996-11-18      N       NONE    SHIP    29776   0.03999999910593033     NULL
1996-11-11T00:00:00.000+08:00   1        serve carefully along th       1997-02-08      0.019999999552965164    45402.4609375   2       O       47689120  455032   46      1996-11-16      N       COLLECT COD     MAIL    80037   0.07999999821186066     NULL
1996-11-11T00:00:00.000+08:00   1       ar requests wa  1997-02-08      0.03999999910593033     40127.4296875   1       O       47689120        934876  211996-12-05       N       NONE    AIR     34877   0.019999999552965164    NULL
1996-11-11T00:00:00.000+08:00   1        never final accounts wake carefully qu 1997-02-08      0.0     3999.659912109375       4       O       58656996  971262   3       1996-11-12      N       NONE    RAIL    46290   0.05000000074505806     NULL
1996-11-12T00:00:00.000+08:00   1       al requests wake. blithely unusual dep  1997-02-08      0.07999999821186066     68074.078125    4       O       31170243   1046893 37      1996-11-23      N       DELIVER IN PERSON       MAIL    46894   0.029999999329447746    NULL
1996-11-12T00:00:00.000+08:00   1       uctions affix carefully! furio  1997-02-08      0.029999999329447746    21567.69921875  2       O       35882630  1417623  14      1996-12-01      N       COLLECT COD     AIR     17624   0.029999999329447746    NULL
1996-11-12T00:00:00.000+08:00   1       carefully alo   1997-02-08      0.03999999910593033     9488.580078125  4       O       44366240        1192489 6 1996-11-20       N       TAKE BACK RETURN        FOB     42512   0.05000000074505806     NULL
1996-11-12T00:00:00.000+08:00   1       tes. fluffily ironic deposits   1997-02-08      0.05999999865889549     54193.6015625   1       O       46005638  1758635  32      1996-12-02      N       COLLECT COD     RAIL    33687   0.05999999865889549     NULL

实现方式 等效查询
xsql select * from mydruid.druid.lineitem10 limit 3;
druid json {"queryType":"select","dataSource":"lineitem10","granularity":"all"...}
1、__time:必写项 转换成Druid intervals
2、granularity:"all", "none", "second","minute", "fifteen_minute", "thirty_minute","hour", "day", "week", "month", "quarter", "year" 默认"all"
3、where 后面的条件除了__time,granularity 其他条件转化Druid的filter </font>

2、GroupBy Queries

select l_shipinstruct,sum(l_quantity),sum(l_discount) 
from mydruid.druid.lineitem10 
where __time >='1991-01-01' and __time <'1999-01-03' and l_commitdate='1997-02-08' 
group by l_shipinstruct limit 10;

Examples

> select l_shipinstruct,sum(l_quantity) as l_quantity,sum(l_discount) 
> from mydruid.druid.lineitem10 
> where __time >='1991-01-01' and __time <'1999-01-03' and l_commitdate='1997-02-08' 
> group by l_shipinstruct 
> limit 10;
结果:
COLLECT COD         158972  311.4700002372265
DELIVER IN PERSON   158778  305.7599999830127
NONE                160785  313.9799999296665
TAKE BACK RETURN    160337  314.15999987721443
--也支持granularity时间粒度的查询
select l_shipinstruct,sum(l_quantity) as l_quantity,sum(l_discount),count(__time) 
from mydruid.druid.lineitem10 
where __time >='1991-01-01' and __time <'1999-01-01' and granularity = 'year' 
group by l_shipinstruct order by l_quantity asc limit 10;
结果:
DELIVER IN PERSON       48373277        94782.64012855478       1895995
TAKE BACK RETURN        48412356        94971.76013177447       1899524
NONE    48446820        94990.60012911633       1899514
COLLECT COD     48472785        95076.87012936734       1900477
TAKE BACK RETURN        58044094        113911.56015220843      2277249
DELIVER IN PERSON       58048477        113781.76015352085      2276162
带有granularity时间粒度时,如果没有group by __time 不显示时间,也可以如下操作
该操作其实是druid TopN query
带时间的group by
select __time,l_shipinstruct,sum(l_quantity) as l_quantity,sum(l_discount),count(__time) 
from mydruid.druid.lineitem10 
where __time >='1991-01-01' and __time <'1999-01-01' and granularity = 'year' 
group by l_shipinstruct,__time order by l_quantity asc limit 10;
结果:
1992-01-01T00:00:00.000+08:00   DELIVER IN PERSON       48373277        94782.63984715939       1895995
1992-01-01T00:00:00.000+08:00   TAKE BACK RETURN        48412356        94971.75982236862       1899524
1992-01-01T00:00:00.000+08:00   NONE    48446820        94990.59980535507       1899514
1992-01-01T00:00:00.000+08:00   COLLECT COD     48472785        95076.869743824 1900477
1993-01-01T00:00:00.000+08:00   TAKE BACK RETURN        58044094        113911.56018066406      2277249
1993-01-01T00:00:00.000+08:00   DELIVER IN PERSON       58048477        113781.75991821289      2276162
1993-01-01T00:00:00.000+08:00   COLLECT COD     58057693        113913.849609375        2276807
1993-01-01T00:00:00.000+08:00   NONE    58137066        113969.76986694336      2279423
1994-01-01T00:00:00.000+08:00   NONE    57931103        113608.79022216797      2272749
1994-01-01T00:00:00.000+08:00   TAKE BACK RETURN        58047015        113828.31967163086      2275702
该SQL与上面的SQL不用,该操作是Druid的group by query
实现方式 等效查询
xsql select l_shipinstruct,sum(l_quantity),sum(l_discount)...;
druid json {"queryType":"groupBy","dataSource":"lineitem10"...}
1、支持的聚合函数:sum,max,min,count,count(distinct)
2、count:对应Druid  { "type" : "count", "name" : <output_name> }
3、count(distinct a,b):支持多个字段,会根据a,b数据类型采用不同的算法
    hyperUnique:{ "type":"hyperUnique","name":,"fieldName":<metric_name>}
    cardinality:{"type": "cardinality","name": "xxx","fields": xx,"byRow" : true}
    cardinality:{"type": "cardinality","name": "xxx","fields": xx,"byRow" : true}
    多个字段时转换成 {"type": "cardinality","name": "distinct_people","fields": [ "first_name", "last_name" ],"byRow" : true},cardinality只支持"byRow" : true
4、order by 目前只支持数字类型的asc,desc 字符类型的支持asc (druid中的lexicographic)</font>

3、Timeseries queries

select sum(l_extendedprice), sum(l_discount) 
from mydruid.druid.lineitem10 
where __time >='1991-01-01' and __time <'1999-01-03';

Examples

> select sum(l_extendedprice), sum(l_discount) 
> from mydruid.druid.lineitem10 
> where __time >='1991-01-01' and __time <'1999-01-03';
结果:
2.29381315672004E12     2999373.2440630347
当有聚合函数但是没有group by时,转换成druid timeseries查询
select __time,sum(l_quantity) as l_quantity,sum(l_discount),count(__time)
from mydruid.druid.lineitem10
where __time >='1991-01-01' and __time <'1997-01-04' and  granularity = 'year' 
group by __time limit 10;
实现方式 等效查询
xsql select sum(l_extendedprice), sum(l_discount) where xxx
druid json {"queryType":"timeseries","dataSource":"lineitem10"...}
> select __time,sum(l_quantity) as l_quantity,sum(l_discount),count(__time) 
> from mydruid.druid.lineitem10 
> where __time >='1991-01-01' and __time <'1997-01-04' and  granularity = 'year' 
> group by __time limit 10;
结果:
1992-01-01T00:00:00.000+08:00   193705238       379821.8705188129       7595510
1993-01-01T00:00:00.000+08:00   232287330       455576.94061456993      9109641
1994-01-01T00:00:00.000+08:00   232113470       454986.6906189304       9099165
1995-01-01T00:00:00.000+08:00   232236548       455190.7406086065       9106637
1996-01-01T00:00:00.000+08:00   232654414       456373.2906226516       9123688
1997-01-01T00:00:00.000+08:00   1904679         3737.5100051425397      74869
实现方式 等效查询
xsql select __time,sum(l_quantity) granularity = 'year' group by __time
druid json {"queryType":"timeseries","dataSource":"lineitem10"...}
当group by 只有一个字段且为__time时,也会转换成druid timeseries查询
granularity指的是时间粒度也就是对某个时间段的统计

4、TopN queries

select l_shipinstruct,sum(l_quantity) as l_quantity,sum(l_discount),count(__time)
from mydruid.druid.lineitem10 
where __time >='1991-01-01' and __time <'1999-01-01'
group by l_shipinstruct 
order by l_quantity asc 
limit 10;
实现方式 等效查询
xsql select sum(l_extendedprice), sum(l_discount) group by oder by
druid json {"queryType":"topN","dataSource":"lineitem10"...}
> select l_shipinstruct,sum(l_quantity) as l_quantity,sum(l_discount),count(__time)
> from mydruid.druid.lineitem10 
> where __time >='1991-01-01' and __time <'1999-01-01' 
> group by l_shipinstruct 
> order by l_quantity asc 
> limit 10;

结果:
TAKE BACK RETURN        382377564       749815.1810177844       14995638
COLLECT COD     382410465       749926.601011185        14995241
DELIVER IN PERSON       382437327       749621.3410196826       14994611
NONE    382512680       750010.1210143827       15000562