Elasticsearch compared with a relational database:

Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields

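In Elasticsearch, Indices correspond to Databases, Types to Tables, Documents to Rows, and Fields to Columns.

Interacting with Elasticsearch

There are two ways to interact with Elasticsearch through the Java API:

Node client

The node client joins the cluster as a non-data node. In other words, it stores no data itself, but it knows where the data lives in the cluster and can forward requests directly to the relevant node.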

import static org.elasticsearch.node.NodeBuilder.*;

// on startup
Node node = nodeBuilder().client(true).node();
Client client = node.client();

// on shutdown
node.close();

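When you start a node, it joins an Elasticsearch cluster. To join a different cluster, either set cluster.name or call the clusterName method explicitly.

The first option is to place an elasticsearch.yml file on the classpath that sets cluster.name: yourclustername; the second is to specify the name in Java, as follows.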

Node node = nodeBuilder().clusterName("yourclustername").node();
Client client = node.client();

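The benefit of using this Client is that operations are routed automatically to the nodes where they need to be executed, without a double hop. For example, an index operation is executed directly on the shard where the document will end up.

Transport client

TransportClient connects remotely to an Elasticsearch cluster through the transport module. It does not join the cluster; it simply takes one or more initial transport addresses and communicates with them in round-robin fashion.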

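import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// on startup
Client client = new TransportClient()
        .addTransportAddress(new InetSocketTransportAddress("host1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("host2", 9300));

// on shutdown
client.close();

// Alternative startup (use instead of the no-argument constructor above):
// name the cluster explicitly when it is not the default "elasticsearch"
Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "myClusterName").build(); // name of the cluster to connect to
Client client = new TransportClient(settings);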
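
The round-robin behaviour mentioned above can be illustrated with a small standalone sketch. This is not how TransportClient is actually implemented; the class RoundRobinAddresses and its methods are hypothetical, use only the JDK (Java 10 or later is assumed for List.copyOf), and treat each address as a plain host:port string.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the Elasticsearch API: cycles through a
// fixed list of transport addresses in round-robin order.
public class RoundRobinAddresses {
    private final List<String> addresses;
    private final AtomicInteger counter = new AtomicInteger();

    public RoundRobinAddresses(List<String> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one address is required");
        }
        this.addresses = List.copyOf(addresses);
    }

    // Returns the next address and wraps around after the last one.
    // floorMod keeps the index non-negative even if the counter overflows.
    public String next() {
        int index = Math.floorMod(counter.getAndIncrement(), addresses.size());
        return addresses.get(index);
    }

    public static void main(String[] args) {
        RoundRobinAddresses pool =
                new RoundRobinAddresses(List.of("host1:9300", "host2:9300"));
        // Alternates between the two addresses: host1, host2, host1, host2
        for (int i = 0; i < 4; i++) {
            System.out.println(pool.next());
        }
    }
}
```

Each call to next() hands out the following address, so requests are spread evenly over the configured hosts.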

Clusters spanning multiple network segments

discovery.zen.ping.multicast.enabled: false                          # disable the default multicast discovery
discovery.zen.ping.unicast.hosts: ["10.168.202.3", "10.168.205.81"]  # addresses of node1 and node2

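Nodes on the same network segment that share a cluster name can form a cluster automatically. If they are on different segments, list the addresses of the cluster machines as shown above.

Configuring the cluster with explicit machine addresses is recommended: it avoids both accidentally forming a cluster with unintended nodes and losing track of nodes after a network segment changes.

Elasticsearch RESTful operations

The examples use httpie to issue HTTP requests from the terminal.

List all indices: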

http get :9200/_cat/indices

yellow open index 5 1 2 0 7.8kb 7.8kb
yellow open gb 5 1 7 0 20.4kb 20.4kb
yellow open us 5 1 7 1 19.6kb 19.6kb
yellow open my_store 5 1 4 0 13.5kb 13.5kb

View information about the nodes in the cluster:

http :9200/_cluster/state/nodes

{
"cluster_name": "elasticsearch",
"nodes": {
"LTuVe3LzT2aKjynEaUPkCQ": {
"attributes": {},
"name": "Kiwi Black",
"transport_address": "127.0.0.1:9300"
}
}
}

Create an index:

http put :9200/testindex

{
"acknowledged": true
}

Create a document:

http post :9200/testindex/user name="rcx" age="19"

{
"_id": "AVGuBqaBqxzGIsNTwSSo",
"_index": "testindex",
"_shards": {
"failed": 0,
"successful": 1,
"total": 2
},
"_type": "user",
"_version": 1,
"created": true
}

No id was specified when creating the document above, so ES generated one automatically. We can also create a document with an explicit id:

http post :9200/testindex/user/2 name="rcx2" age="29"

{
"_id": "2",
"_index": "testindex",
"_shards": {
"failed": 0,
"successful": 1,
"total": 2
},
"_type": "user",
"_version": 1,
"created": true
}
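
For readers working from Java rather than the terminal, the two create requests above can be sketched with the JDK's java.net.http API (Java 11 or later). This only builds the requests and does not send them. The class name CreateDocumentRequests, the BASE URL (a node on localhost:9200) and the hand-written JSON bodies are assumptions for illustration, and the /index/type/id URL shape follows the Elasticsearch version used in this article.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class CreateDocumentRequests {
    // Assumed base URL: a node listening on localhost:9200.
    static final String BASE = "http://localhost:9200";

    // Builds a POST to /index/type when id is null (Elasticsearch then
    // generates the id) or to /index/type/id when an explicit id is given.
    public static HttpRequest createDocument(String index, String type, String id, String json) {
        String path = "/" + index + "/" + type + (id == null ? "" : "/" + id);
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest autoId = createDocument("testindex", "user", null,
                "{\"name\":\"rcx\",\"age\":\"19\"}");
        HttpRequest fixedId = createDocument("testindex", "user", "2",
                "{\"name\":\"rcx2\",\"age\":\"29\"}");
        // Print the method and target URL of each request.
        System.out.println(autoId.method() + " " + autoId.uri());
        System.out.println(fixedId.method() + " " + fixedId.uri());
    }
}
```

To actually send one of these requests, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).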

Retrieve a document:

http get :9200/testindex/user/2

{
"_id": "2",
"_index": "testindex",
"_source": {
"age": "29",
"name": "rcx2"
},
"_type": "user",
"_version": 1,
"found": true
}

Delete a document:

http delete :9200/testindex/user/AVGuBqaBqxzGIsNTwSSo

{
"_id": "AVGuBqaBqxzGIsNTwSSo",
"_index": "testindex",
"_shards": {
"failed": 0,
"successful": 1,
"total": 2
},
"_type": "user",
"_version": 2,
"found": true
}

Update a document (a PUT to an existing id replaces the whole document and increments its version):

http put :9200/testindex/user/2 name="name2"

{
"_id": "2",
"_index": "testindex",
"_shards": {
"failed": 0,
"successful": 1,
"total": 2
},
"_type": "user",
"_version": 2, // the version is now 2
"created": false
}

Partial update: append _update to the document URL and put the fields to change inside doc.

http post :9200/testindex/user/2/_update doc:='{"name":"upname","title":"a title"}'

{
"_id": "2",
"_index": "testindex",
"_shards": {
"failed": 0,
"successful": 1,
"total": 2
},
"_type": "user",
"_version": 5
}

http get :9200/testindex/user/2

{
"_id": "2",
"_index": "testindex",
"_source": {
"age": "25",
"name": "upname",
"title": "a title"
},
"_type": "user",
"_version": 5,
"found": true
}
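
The effect of a partial update on the stored document can be sketched as a map merge: fields present in doc overwrite or extend the existing _source, and all other fields are left alone. The sketch below is a simplification that assumes flat documents (Elasticsearch also merges nested objects recursively); the class PartialUpdateSketch and the pre-update field values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialUpdateSketch {
    // Returns a new map holding the existing source with the fields of doc
    // overwritten or added. The input maps are not modified.
    public static Map<String, Object> applyPartialUpdate(
            Map<String, Object> source, Map<String, Object> doc) {
        Map<String, Object> merged = new LinkedHashMap<>(source);
        merged.putAll(doc);
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative state of the document before the update.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("age", "25");
        source.put("name", "name2");

        // The fields sent inside "doc".
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "upname");
        doc.put("title", "a title");

        // prints {age=25, name=upname, title=a title}
        System.out.println(applyPartialUpdate(source, doc));
    }
}
```

The age field survives because it is absent from doc, which is the difference from a full PUT of the document.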

Retrieve multiple documents:

http get :9200/testindex/user/_mget docs:='[{"_id":1},{"_id":2}]'

{
"docs": [
{
"_id": "1",
"_index": "testindex",
"_source": {
"age": "19",
"name": "rcx1",
"title": "安师大"
},
"_type": "user",
"_version": 1,
"found": true
},
{
"_id": "2",
"_index": "testindex",
"_source": {
"age": "25",
"name": "upname",
"title": "a title"
},
"_type": "user",
"_version": 5,
"found": true
}
]
}

Searching with a URI

All Elasticsearch queries are sent to the _search endpoint.

http :9200/testindex/_search

{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "2",
"_index": "testindex",
"_score": 1.0,
"_source": {
"age": "25",
"name": "upname",
"title": "a title"
},
"_type": "user"
},
{
"_id": "1",
"_index": "testindex",
"_score": 1.0,
"_source": {
"age": "19",
"name": "rcx1",
"title": "安师大"
},
"_type": "user"
}
],
"max_score": 1.0,
"total": 2
},
"timed_out": false,
"took": 31
}

A search can also be scoped to a type:

http :9200/us/user/_search

{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "1",
"_index": "us",
"_score": 1.0,
"_source": {
"email": "john@smith.com",
"name": "John Smith",
"username": "@john"
},
"_type": "user"
}
],
"max_score": 1.0,
"total": 1
},
"timed_out": false,
"took": 9
}

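Query-string parameters for URI searches

  1. Query: the q parameter. A simple example: q=title:hello
  2. Default field: the df parameter, e.g. df=title. With this set, q can contain just the term, e.g. q=hello
  3. Analyzer: the analyzer parameter selects which analyzer is applied to the query string.
  4. Default operator: default_operator can be set to OR or AND and determines the default boolean operator applied between query terms. The default is OR.
  5. Explain: the explain parameter can be set to true; it is similar to EXPLAIN in MySQL.
  6. Returned fields: the fields parameter takes a comma-separated list of field names.
  7. Sorting: the sort parameter, e.g. sort=date:asc. By default, results are sorted by score in descending order. When a custom sort is specified, the score of each document is not computed. To keep per-document scores alongside a custom sort, add track_scores=true.
  8. Search timeout: the timeout parameter, e.g. timeout=5s
  9. Pagination: size=5&from=10. size defaults to 10.
  10. Search type: the search_type parameter. The default search type is query_then_fetch. It accepts the following six values:
    • dfs_query_then_fetch
    • dfs_query_and_fetch
    • query_then_fetch
    • query_and_fetch
    • count
    • scan
  11. Lowercasing expanded terms: the lowercase_expanded_terms parameter controls whether expanded terms are converted to lowercase. It defaults to true.

http :9200/us/tweet/_search q=='Elasticsearch' df=='tweet' sort=='date:asc' timeout=='3s' size=='5' from=='0' search_type=='query_then_fetch'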

{
"_shards": {
"failed": 0,
"successful": 5,
"total": 5
},
"hits": {
"hits": [
{
"_id": "6",
"_index": "us",
"_score": null,
"_source": {
"date": "2014-09-16",
"name": "John Smith",
"tweet": "The Elasticsearch API is really easy to use",
"user_id": 1
},
"_type": "tweet",
"sort": [
1410825600000
]
},
{
"_id": "10",
"_index": "us",
"_score": null,
"_source": {
"date": "2014-09-20",
"name": "John Smith",
"tweet": "Elasticsearch surely is one of the hottest new NoSQL products",
"user_id": 1
},
"_type": "tweet",
"sort": [
1411171200000
]
}
],
"max_score": null,
"total": 2
},
"timed_out": false,
"took": 1
}
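
The same kind of URI search can be assembled programmatically. The following is a minimal sketch using only the JDK (Java 10 or later is assumed for URLEncoder.encode with a Charset); the class UriSearchSketch, the method buildSearchUrl and the base URL are hypothetical. It also shows how from can be derived from a zero-based page number, so that size=5 with page 2 gives from=10.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class UriSearchSketch {
    // Builds "<base>/_search?k1=v1&k2=v2" with URL-encoded keys and values.
    public static String buildSearchUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "/_search" + (params.isEmpty() ? "" : "?" + query);
    }

    public static void main(String[] args) {
        int size = 5;
        int page = 2;            // zero-based page number
        int from = page * size;  // offset of the first hit on that page

        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "Elasticsearch");
        params.put("df", "tweet");
        params.put("sort", "date:asc");
        params.put("size", String.valueOf(size));
        params.put("from", String.valueOf(from));

        // Assumed base URL: index "us", type "tweet" on a local node.
        System.out.println(buildSearchUrl("http://localhost:9200/us/tweet", params));
    }
}
```

Note that URLEncoder percent-encodes the colon in date:asc as %3A, which the server decodes back before parsing the parameter.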
