Elasticsearch Operations

ES cluster upgrades and maintenance

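When Elasticsearch was upgraded from 2.0.0 to 2.1.1, existing clients stopped working entirely because the interface had changed, so the clients must be upgraded together with the cluster.
Before restarting the whole Elasticsearch cluster, disable replicas; once the cluster has started successfully, enable them again.
Restarting a single data node requires the following steps, although this procedure only applies to cold indices:
- Pause the programs that write data
- Disable cluster shard allocation
- Manually run POST /_flush/synced
- Restart the node
- Re-enable cluster shard allocation
- Wait for recovery to finish and the cluster health status to turn green
- Resume the programs that write data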
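
The Elasticsearch side of the steps above can be sketched as an ordered list of REST calls. This is a minimal illustration, not a tested runbook: the helper name `node_restart_plan` and the `timeout=30m` value are assumptions, and pausing or resuming the writers and restarting the node itself happen outside these calls.

```python
import json

def node_restart_plan():
    """Return the REST calls around a single data-node restart, in order."""
    def allocation(mode):
        # Transient cluster setting that switches shard allocation on or off.
        return {"transient": {"cluster.routing.allocation.enable": mode}}

    return [
        ("PUT", "/_cluster/settings", allocation("none")),
        ("POST", "/_flush/synced", None),
        # ... the node is restarted by hand at this point ...
        ("PUT", "/_cluster/settings", allocation("all")),
        # Blocks until the cluster is green or the (assumed) timeout expires.
        ("GET", "/_cluster/health?wait_for_status=green&timeout=30m", None),
    ]

for method, path, body in node_restart_plan():
    print(method, path, json.dumps(body) if body else "")
```

Keeping the plan as data makes it easy to review the order before sending anything to a live cluster.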

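Deleting the data on disk resets the shard count to the default value, because the shard count is only set during initialization. After deleting the data, re-initialize it manually:

- Stop all ES nodes
- Stop client writes
- Start all ES nodes
- Initialize the shard count
- Start client writes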

Manually initialize the template, for example through kopf at http://ip/_plugin/kopf/#!/indexTemplates:

    {
      "template": "trace*",
      "settings": {
        "index": {
          "number_of_shards": "3",
          "number_of_replicas": "1"
        }
      },
      "mappings": {
        "test": {
          "_source": {
            "includes": ["traceId"]
          },
          "properties": {
            "traceId": { "type": "string" },
            "binaryAnnotations": {
              "search_analyzer": "linieAnalzyer",
              "analyzer": "linieAnalzyer",
              "index": "analyzed",
              "type": "string"
            },
            "app": { "index": "not_analyzed", "type": "string" },
            "duration": { "type": "long" },
            "suc": { "type": "string" },
            "bizAnnotations": { "index": "not_analyzed", "type": "string" },
            "host": { "index": "not_analyzed", "type": "string" },
            "id": { "type": "string" },
            "serviceId": { "index": "not_analyzed", "type": "string" },
            "spanName": { "type": "string" },
            "timestamp": { "type": "long" }
          }
        }
      },
      "aliases": {}
    }
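
As an alternative to a UI, the same template could be registered through the index template REST endpoint. The sketch below only builds the request; the template name `trace_template` and the helper `template_request` are hypothetical, the mappings are left out for brevity, and the `template` key assumes an Elasticsearch version that still uses it for the index pattern.

```python
import json

def template_request(name, pattern, shards, replicas):
    """Build the PUT /_template/<name> call that fixes shard and replica counts."""
    body = {
        "template": pattern,
        "settings": {
            "index": {
                "number_of_shards": str(shards),
                "number_of_replicas": str(replicas),
            }
        },
    }
    return "PUT", "/_template/" + name, body

# "trace_template" is a made-up name; the pattern and counts come from the notes.
method, path, body = template_request("trace_template", "trace*", 3, 1)
print(method, path)
print(json.dumps(body, indent=2))
```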

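Shard recovery time after stopping data nodes, with roughly 1.26 TB of data in the cluster:
- 1 node stopped: recovery takes about 20 minutes
- 2 nodes stopped: recovery takes about 55 minutes
- 3 nodes stopped: recovery takes about 100 minutes
- 5 nodes stopped: recovery takes about 120 minutes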

Elasticsearch usage tips

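1. Keep ES_HEAP_SIZE at or below half of the physical memory, and preferably no larger than 30.5 GB.
2. Set vm.swappiness = 1.
3. Set bootstrap.mlockall: true in elasticsearch.yml.
4. The officially recommended garbage collector for ES is CMS.
5. For the search threadpool, cores * 3 is the recommended size; the other thread pools should match the number of cores.
6. Disk I/O threads are handled by Lucene, not by ES.
7. Set vm.max_map_count=262144 to provide enough virtual memory for mmapped files.
8. Change settings through the API rather than by editing the configuration files. The API offers two kinds of change: transient and persistent. Transient changes are lost after a full cluster restart; persistent changes survive restarts and override the static configuration.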
```
curl -XPUT ip:9200/_cluster/settings -d '{
    "transient": {
        "logger.discovery": "DEBUG"
    },
    "persistent": {
        "discovery.zen.minimum_master_nodes": 2
    }
}'
```
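9. Setting slowlog thresholds helps locate problems; slowlogs can be configured for queries, fetches, and indexing.
10. If the cluster is indexing-heavy and search does not need millisecond-level latency, some settings can be tuned to trade search speed for indexing throughput.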
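
The slowlog thresholds mentioned above are index-level settings. The sketch below builds a settings body for the query, fetch, and indexing slowlogs; the threshold values and the index name are illustrative assumptions, not recommendations from these notes.

```python
import json

def slowlog_settings(query_warn="10s", fetch_warn="1s", index_warn="10s"):
    """Build an index settings body with warn-level slowlog thresholds."""
    return {
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
        "index.indexing.slowlog.threshold.index.warn": index_warn,
    }

# Sent as PUT /<index>/_settings; "trace-2016.01.01" is a made-up index name.
print("PUT /trace-2016.01.01/_settings")
print(json.dumps(slowlog_settings(), indent=2))
```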

11. Bulk size in bytes matters more than the number of documents; aim for roughly 5-15 MB per bulk request.
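
One way to size bulk requests by bytes rather than by document count is to accumulate the newline-delimited payload until a byte budget is reached. This is a minimal sketch: `bulk_batches`, the index name, and the type name are hypothetical, and the 10 MB default is simply a value inside the 5-15 MB range.

```python
import json

def bulk_batches(docs, index, doc_type, max_bytes=10 * 1024 * 1024):
    """Yield newline-delimited _bulk bodies, each at most about max_bytes."""
    lines, size = [], 0
    action = json.dumps({"index": {"_index": index, "_type": doc_type}})
    for doc in docs:
        # Each document contributes an action line plus a source line.
        pair = action + "\n" + json.dumps(doc) + "\n"
        pair_size = len(pair.encode("utf-8"))
        # Flush the current batch before it would exceed the byte budget.
        if lines and size + pair_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(pair)
        size += pair_size
    if lines:
        yield "".join(lines)

docs = [{"traceId": str(i), "duration": i} for i in range(1000)]
batches = list(bulk_batches(docs, "trace-test", "test", max_bytes=4096))
print(len(batches), "batches")
```

A single document larger than the budget still goes out alone in its own batch, so the limit is a target rather than a hard cap.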

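12. For disks, prefer SSDs; without SSDs, RAID0 can be used.

13. Configure multiple paths in path.data.

14. Keep segment merging from falling behind: when merging cannot keep up, ES automatically throttles indexing requests to a single thread. On SSDs the default limit of 20 MB/s is too low and can be raised to 100-200 MB/s.

15. For a large bulk import, set index.number_of_replicas: 0 and re-enable replicas once the import has finished.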
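
The replica toggle around a large import can be sketched as two index settings updates wrapped around the bulk load. The helper names and the index name `trace-import` are hypothetical, and the final replica count of 1 is an assumption.

```python
import json

def replica_settings_request(index, replicas):
    """Build the PUT /<index>/_settings call that sets the replica count."""
    body = {"index": {"number_of_replicas": replicas}}
    return "PUT", "/" + index + "/_settings", body

def bulk_import_plan(index, final_replicas=1):
    """Order the calls around a large import: drop replicas, load, restore."""
    return [
        replica_settings_request(index, 0),
        ("POST", "/_bulk", "<bulk payloads>"),  # placeholder for the actual import
        replica_settings_request(index, final_replicas),
    ]

for method, path, body in bulk_import_plan("trace-import"):
    print(method, path, json.dumps(body))
```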

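16. When an upgrade or other maintenance requires restarting servers, keep the ES master nodes available and restart the data nodes one at a time. It is best to disable shard allocation through the API first, and to stop indexing new data if possible (although that is usually not possible).

1. Disable allocation:

    PUT /_cluster/settings
    {
        "transient": {
            "cluster.routing.allocation.enable": "none"
        }
    }

2. Stop a node with the shutdown API:
```
POST /_cluster/nodes/_local/_shutdown
```
3. Perform the upgrade or maintenance.
4. Restart the node and confirm that it has joined the cluster.

5. Re-enable shard allocation:

    PUT /_cluster/settings
    {
        "transient": {
            "cluster.routing.allocation.enable": "all"
        }
    }

6. Repeat steps 1 to 5 for the other nodes.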
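
Confirming that a restarted node has rejoined, and that the cluster has recovered before moving on to the next node, can be done by polling the cluster health endpoint. The sketch below assumes a caller-supplied `fetch_health` callable that returns the parsed JSON of GET /_cluster/health; the function name, attempt count, and delay are illustrative, and stubbed responses stand in for a real cluster.

```python
import time

def wait_for_cluster(fetch_health, expected_nodes, want_status="green",
                     attempts=60, delay=5.0, sleep=time.sleep):
    """Poll cluster health until the node count and status match.

    fetch_health is any callable returning a dict with the
    "number_of_nodes" and "status" fields of GET /_cluster/health.
    """
    for _ in range(attempts):
        health = fetch_health()
        if (health.get("number_of_nodes") == expected_nodes
                and health.get("status") == want_status):
            return True
        sleep(delay)
    return False

# Stubbed responses: node missing, node back but recovering, fully recovered.
responses = iter([
    {"number_of_nodes": 4, "status": "yellow"},
    {"number_of_nodes": 5, "status": "yellow"},
    {"number_of_nodes": 5, "status": "green"},
])
print(wait_for_cluster(lambda: next(responses), 5, sleep=lambda s: None))
```

Passing `want_status="yellow"` fits the check right after the node restarts, while allocation is still disabled; the default `green` fits the check after allocation has been re-enabled.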

17. Use the snapshot API for backups, preferably storing them in HDFS, and use the _restore API to restore a backup.
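
The backup and restore flow can be sketched as three snapshot API calls: register a repository, take a snapshot, and restore it. This assumes the HDFS repository plugin is installed and that it accepts `uri` and `path` settings; the repository name, snapshot name, and HDFS location below are all made up for illustration.

```python
import json

def snapshot_plan(repo, snapshot, hdfs_uri, hdfs_path):
    """Build the register, snapshot, and restore calls for one backup."""
    # Repository definition; "hdfs" type is assumed to come from the HDFS plugin.
    register = {"type": "hdfs", "settings": {"uri": hdfs_uri, "path": hdfs_path}}
    return [
        ("PUT", "/_snapshot/" + repo, register),
        ("PUT", "/_snapshot/%s/%s?wait_for_completion=true" % (repo, snapshot), None),
        ("POST", "/_snapshot/%s/%s/_restore" % (repo, snapshot), None),
    ]

# All names and the HDFS address are placeholders.
plan = snapshot_plan("hdfs_backup", "snapshot_1", "hdfs://namenode:8020/", "es/backups")
for method, path, body in plan:
    print(method, path, json.dumps(body) if body else "")
```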

18. Set action.disable_delete_all_indices: true to prevent all indices from being deleted through the API.

19. Set indices.fielddata.cache.size: 25%, and on data nodes set indices.cluster.send_refresh_mapping: false.

20. Keep cluster.routing.allocation.cluster_concurrent_rebalance: 2 low so that rebalancing does not interfere with indexing.

21. cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: .97
cluster.routing.allocation.disk.watermark.high: .99

22. Recovery: cluster.routing.allocation.node_concurrent_recoveries: 4
cluster.routing.allocation.node_initial_primaries_recoveries: 18
indices.recovery.concurrent_streams: 4
indices.recovery.max_bytes_per_sec: 40mb

23. To avoid data loss and _bulk retries, set threadpool.bulk.queue_size: 3000.