Big Little Ant

Learning Elasticsearch mappings

Posted on 2016-12-04 Edited on 2020-09-16 In linux , elastic

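A mapping is comparable to a data type declaration in a statically typed language: once a variable is declared as a String, it can only ever hold String data. In the same way, a mapping field of a numeric type can only store numeric data.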

How to think about the terms Elasticsearch uses

index -> database
type -> table
mapping -> table schema

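By default you do not need to define a mapping yourself. When a new type or field is introduced, Elasticsearch automatically creates and registers a mapping with sensible defaults; you only have to supply a mapping definition when you want to override those defaults.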

Mapping fields

Custom mapping

Below is a simple mapping definition:

# number_of_shards / number_of_replicas: shard and replica counts for the index.
# "user" is the type name; "properties" sets the mapping type or attributes of each field.
# JSON has no comment syntax, so these notes sit outside the request body.
curl -XPUT 'http://localhost:9200/test-index' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "user": {
      "_all":       { "enabled": false  },
      "properties": {
        "title":    { "type": "string"  },
        "body":     { "type": "string"  },
        "user_id":  {
          "type":   "string",
          "index":  "not_analyzed"
        },
        "created":  {
          "type":   "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'

Here user is the type (comparable to a table in a relational database), and it defines four fields (columns): title, body, user_id, and created.
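One way to avoid hand-written JSON mistakes (stray comments, missing braces) is to build the request body as a data structure and serialize it. The sketch below mirrors the example mapping; the function name `build_index_body` is made up for illustration and is not part of any Elasticsearch client.

```python
import json


def build_index_body(shards=3, replicas=1):
    """Return the create-index body from the example as a plain dict."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "user": {  # the type name
                "_all": {"enabled": False},
                "properties": {
                    "title": {"type": "string"},
                    "body": {"type": "string"},
                    "user_id": {"type": "string", "index": "not_analyzed"},
                    "created": {
                        "type": "date",
                        "format": "strict_date_optional_time||epoch_millis",
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # json.dumps always emits well-formed JSON, ready to pass to curl -d.
    print(json.dumps(build_index_body(), indent=2))
```

The printed text can be saved to a file and sent with `curl -d @file`.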

The type property

The type property sets the data type of a field.
Elasticsearch supports the following data types:

Text: string
Integers: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date

The index property

The index property controls how a string is indexed. It takes one of three values:

analyzed: First analyze the string, then index it. In other words, index this field as full text.
not_analyzed: Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it.
no: Don't index this field at all. This field will not be searchable.

For string fields, index defaults to analyzed. For fields such as URLs that should not be tokenized, set it to not_analyzed.
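The difference between the three values can be illustrated with a toy model of which terms end up in the index. This is only a rough approximation: the real standard analyzer uses Unicode text segmentation, while the sketch simply lower-cases runs of word characters. The function name `index_terms` is hypothetical.

```python
import re


def index_terms(value, index="analyzed"):
    """Return the terms a string field would contribute to the index (toy model)."""
    if index == "no":
        return []  # not indexed, so not searchable
    if index == "not_analyzed":
        return [value]  # the exact value is the single term
    if index == "analyzed":
        # crude stand-in for an analyzer: split on non-word characters, lower-case
        return [token.lower() for token in re.findall(r"\w+", value)]
    raise ValueError(f"unknown index option: {index}")


if __name__ == "__main__":
    url = "http://example.com/Some-Page"
    print(index_terms(url))                  # split into several lower-cased terms
    print(index_terms(url, "not_analyzed"))  # kept as one exact term
```

With `not_analyzed`, a URL can only be found by its exact value, which is usually what is wanted for identifiers.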

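The analyzer property

The analyzer property specifies which analyzer tokenizes the field.
The default analyzer in Elasticsearch is the standard analyzer.
Other built-in analyzers include whitespace, simple, and english.
You can also install an analyzer of your own and reference it by name.

{ "name": { "type": "string", "analyzer": "ik" } }
The snippet above tells Elasticsearch to tokenize the name field with the ik analyzer.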

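Adding a mapping

Save the mapping definition to a file such as mapping.json (the name is arbitrary).

curl -XPOST 'http://192.168.56.12:9200/test-index' -d @mapping.json

Updating a mapping

Save the new field definitions for the type to a file such as mapping-1.json (the name is arbitrary). Because the index already exists, send the file to the _mapping endpoint of the type rather than to the index itself.

curl -XPUT 'http://192.168.56.12:9200/test-index/_mapping/user' -d @mapping-1.json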

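Viewing a mapping

curl -XGET 'http://192.168.56.12:9200/test-index/user/_mapping?pretty'
## What each part means
- XGET: the HTTP method.
- http://192.168.56.12:9200: the Elasticsearch HTTP endpoint.
- test-index: the index name.
- user: the type within the index.
- _mapping?pretty: the mapping endpoint; pretty formats the response for human readers.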

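More on mappings

Changes that are allowed:

Adding a new type definition
Adding a new field
Adding a new analyzer

Changes that are not allowed:

Changing a field's type (for example, from text to number)
Changing a stored field to not stored, or the reverse
Changing the value of the index property
Changing the analyzer of documents that are already indexed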
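The rules above can be turned into a small pre-flight check that compares the current field definitions with a proposed update before anything is sent to the cluster. This is a sketch under the assumption that only the `type`, `index`, `store`, and `analyzer` keys matter; the function name and return shape are invented for illustration.

```python
# Properties that, per the list above, cannot be changed on an existing field.
IMMUTABLE_KEYS = ("type", "index", "store", "analyzer")


def check_mapping_update(old_props, new_props):
    """Return (added_fields, conflicts) for a proposed "properties" update.

    conflicts is a list of (field, key) pairs where an existing field would
    have one of the immutable properties changed.
    """
    added = sorted(name for name in new_props if name not in old_props)
    conflicts = []
    for name in sorted(set(old_props) & set(new_props)):
        old, new = old_props[name], new_props[name]
        for key in IMMUTABLE_KEYS:
            # Only keys the update states explicitly are compared.
            if key in new and new[key] != old.get(key):
                conflicts.append((name, key))
    return added, conflicts


if __name__ == "__main__":
    current = {"title": {"type": "string"}}
    proposed = {"title": {"type": "long"}, "tags": {"type": "string"}}
    print(check_mapping_update(current, proposed))
```

New fields appear in the first list and are safe to add; any entry in the second list means the data has to be reindexed into a new index instead.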

A mapping used in production

This is a Logstash index template with its mappings:

{
    "logstash": {
        "order": 0,
        "template": "logstash-*-*",
        "settings": {  // index-level settings
            "index.refresh_interval": "30s",
            "index.translog.flush_threshold_ops": "50000"
        },
        "mappings": { // required section
            "_default_": { // defaults applied to every type
                "dynamic_templates": [  // dynamic field definitions
                    {
                        "message_field": {
                            "mapping": {
                                "index": "analyzed",
                                "omit_norms": true,
                                "type": "string",
                                "fields": {
                                    "raw": {
                                        "ignore_above": 256, // values longer than 256 characters are not indexed
                                        "index": "not_analyzed", // do not tokenize
                                        "type": "string",  // field type: string
                                        "doc_values": true
                                    }
                                }
                            },
                            "match_mapping_type": "string",  // detected data type to match
                            "match": "message" // field name to match
                        }
                    },
                    {
                        "string_fields": {
                            "mapping": {
                                "index": "analyzed",
                                "omit_norms": true,
                                "type": "string",
                                "fields": {
                                    "raw": {
                                        "ignore_above": 256,
                                        "index": "not_analyzed",
                                        "type": "string",
                                        "doc_values": true
                                    }
                                }
                            },
                            "match_mapping_type": "string",
                            "match": "*"
                        }
                    }
                ],
                "_all": {
                    "enabled": false
                },
                "properties": {
                    "geoip": {
                        "path": "full",
                        "dynamic": true,
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "geo_point"
                            }
                        }
                    },
                    "@timestamp": {
                        "format": "dateOptionalTime",
                        "index": "not_analyzed",
                        "type": "date",
                        "doc_values": true
                    },
                    "@version": {
                        "index": "not_analyzed",
                        "type": "string"
                    }
                }
            }
        },
        "aliases": {}
    }
}

Dynamic fields

The format is as follows:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [ // dynamic templates for this type
        {
          "integers": { // template name
            "match_mapping_type": "long", // matching rule
            "mapping": {
              "type": "integer"}}},
        {
          "strings": { // template name
            "match_mapping_type": "string", // matching rule
            "mapping": {
              "type": "string",
              "fields": {  // define a sub-field
                "raw": { // named raw: a field called name is then available as both name and name.raw; any other sub-field name can be used instead of raw
                  "type":  "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }}}}}]}}}

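match, unmatch, and match_mapping_type

Fields whose names match the match pattern are selected.
Fields whose names match the unmatch pattern are skipped.
match_mapping_type matches on the detected data type.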

"match_mapping_type": "string",
"match":   "long_*",
"unmatch": "*_text",

match_pattern

match_pattern switches match to regular-expression matching.

"match_pattern": "regex",
"match": "^profit_\d+$"

path_match and path_unmatch

path_match matches against the full dotted path of a field.

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "full_name": {
            "path_match":   "name.*",
            "path_unmatch": "*.middle",
            "mapping": {
              "type":       "string",
              "copy_to":    "full_name"
                }}}]}}}
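To see which fields of a document the full_name template would pick up, the dotted paths can be enumerated and filtered. The sketch assumes `*` is the only wildcard and uses an invented sample document; the helper names are hypothetical.

```python
import re


def iter_paths(doc, prefix=""):
    """Yield the full dotted path of every leaf field in a nested document."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from iter_paths(value, path + ".")
        else:
            yield path


def _wildcard(pattern, text):
    """Match text against a simple pattern where * stands for any characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text) is not None


def paths_selected(doc, path_match="name.*", path_unmatch="*.middle"):
    """Return the leaf paths that pass path_match and are not hit by path_unmatch."""
    return [
        path
        for path in iter_paths(doc)
        if _wildcard(path_match, path) and not _wildcard(path_unmatch, path)
    ]


if __name__ == "__main__":
    sample = {"name": {"first": "Alice", "middle": "Mary", "last": "White"}, "title": "Dr"}
    print(paths_selected(sample))
```

For the sample document, name.first and name.last are selected (and would be copied to full_name), while name.middle and title are left out.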

{name} and {dynamic_type}

Placeholders that can be used in a mapping:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "named_analyzers": {
            "match_mapping_type": "string",
            "match": "*",
            "mapping": {
              "type": "string",
              "analyzer": "{name}"
            }
          }
        },
        {
          "no_doc_values": {
            "match_mapping_type":"*",
            "mapping": {
              "type": "{dynamic_type}",
              "doc_values": false
                }}}]}}}
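The placeholder substitution can be pictured as a plain string replacement applied to every string value in the template's mapping. This is an illustrative sketch only; `render_mapping` is a made-up name, and the sample field names are assumptions.

```python
def render_mapping(mapping, name, dynamic_type):
    """Replace {name} and {dynamic_type} in every string value of a mapping."""

    def render(value):
        if isinstance(value, str):
            return value.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
        if isinstance(value, dict):
            return {key: render(item) for key, item in value.items()}
        if isinstance(value, list):
            return [render(item) for item in value]
        return value  # numbers, booleans, None are left untouched

    return render(mapping)


if __name__ == "__main__":
    named_analyzers = {"type": "string", "analyzer": "{name}"}
    no_doc_values = {"type": "{dynamic_type}", "doc_values": False}
    # A string field called "english" gets the english analyzer.
    print(render_mapping(named_analyzers, "english", "string"))
    # A field detected as long keeps its type but turns doc_values off.
    print(render_mapping(no_doc_values, "count", "long"))
```

The first template therefore only makes sense for fields named after an analyzer that actually exists.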

elasticsearch-migration: an online import/export tool

Posted on 2016-12-03 Edited on 2020-09-16 In linux , elastic

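A tool for importing and exporting Elasticsearch data.
Supports export and import over HTTP.
Supports saving to a file.

Installation

npm install elasticdump -g

Basic usage

Copying analyzer settings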

elasticdump \
--input=http://192.168.56.13:9200/thread \
--output=http://192.168.56.12:9200/thread1 \
--type=analyzer

Copying the mapping

elasticdump \
--input=http://192.168.56.13:9200/thread \
--output=http://192.168.56.12:9200/thread1 \
--type=mapping

Copying the data

elasticdump \
--input=http://192.168.56.13:9200/thread \
--output=http://192.168.56.12:9200/thread1 \
--type=data \
--limit=1000

limit sets how many objects are copied per batch; the default is 100.
The mapping is copied along with the data.
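When several indices have to be migrated, the three commands above can be generated rather than typed. The sketch below only assembles argument lists from the flags already shown (`--input`, `--output`, `--type`, `--limit`) and does not run anything. Running analyzer, then mapping, then data in that order is an assumption taken from the order of the examples, and the function names are made up.

```python
import shlex


def elasticdump_command(source, target, dump_type="data", limit=None):
    """Build the argument list for one elasticdump run."""
    argv = [
        "elasticdump",
        f"--input={source}",
        f"--output={target}",
        f"--type={dump_type}",
    ]
    if limit is not None:
        argv.append(f"--limit={limit}")
    return argv


def copy_plan(source, target, limit=1000):
    """Return the three runs in order: analyzer, mapping, then data."""
    return [
        elasticdump_command(source, target, kind, limit if kind == "data" else None)
        for kind in ("analyzer", "mapping", "data")
    ]


if __name__ == "__main__":
    plan = copy_plan("http://192.168.56.13:9200/thread", "http://192.168.56.12:9200/thread1")
    for argv in plan:
        # shlex.quote keeps each argument safe to paste into a shell
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` once the printed commands look right.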

Saving data to local files

elasticdump \
--input=http://192.168.56.13:9200/thread \
--output=thread.json \
--type=mapping

elasticdump \
--input=http://192.168.56.13:9200/thread \
--output=thread-data.json \
--type=data

Saving to a compressed file

# Back up an index to a gzip file using stdout:
elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=$ \
  | gzip > /data/my_index.json.gz

Saving the documents that match a query

elasticdump \
  --input=http://production.es.com:9200/my_index \
  --output=query.json \
  --searchBody '{"query":{"term":{"username": "admin"}}}'

Copying a single type when the index is passed separately from the URL

elasticdump \
  --input=http://es.com:9200/api/search \
  --input-index=my_index/my_type \
  --output=http://es.com:9200/api/search \
  --output-index=my_index \
  --type=mapping

Documentation

elastic-dump-github

Using curator 4.1

Posted on 2016-12-02 Edited on 2020-09-16 In linux , elastic

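Older versions of curator took options such as --host on the command line to reach Elasticsearch. Newer versions read their settings from YAML configuration files.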

Installation

yum install python-pip  -y
pip install --upgrade pip
pip install elasticsearch-curator

Command format:

# curator --help
Usage: curator [OPTIONS] ACTION_FILE
  Curator for Elasticsearch indices.

Options:
  --config PATH  Path to configuration file. Default: ~/.curator/curator.yml
  --dry-run      Do not perform any changes.
  --version      Show the version and exit.
  --help         Show this message and exit

  ### Example
  ## Write curator.yml (the client configuration file)
  ## Write action.yml (the actions to run)
   /bin/curator --config curator.yml action.yml

Writing the client configuration file

vim curator.yml

client: ## connection settings for the Elasticsearch hosts
  hosts:
    - 192.168.56.11
    - 192.168.56.12
  port: 9200
  url_prefix:
  use_ssl: False
  certificate:
  client_cert:
  client_key:
  ssl_no_validate: False
  http_auth:
  timeout: 30
  master_only: False

logging: ## logging settings
  loglevel: INFO
  logfile:
  logformat: default
  blacklist: ['elasticsearch', 'urllib3']

Examples

Deleting logs older than five days

actions:
  1:
    action: delete_indices
    description: >-
      Delete indices older than 5 days (based on index name), skipping
      chuye-adcounter- prefixed indices. Ignore the error if the filter does not
      result in an actionable list of indices (ignore_empty_list) and exit cleanly.
    options:
      ignore_empty_list: True
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern
      kind: prefix
      value: chuye-adcounter-
      exclude: True ## defaults to False; True excludes the indices that match this pattern.
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 5
      exclude:

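    ### Walkthrough
    Step 1: name the action to perform (delete_indices deletes indices).
    Step 2: use filters to narrow down which indices are deleted.
    Step 3: the pattern filter matches on index name; exclude: True removes the matching indices from the list.
    Step 4: the age filter matches on time, keeping only indices older than the given number of days.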
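The effect of the two filters can be checked offline before the action file is trusted with real indices. The following is a rough model of the filter chain, not curator's implementation: it assumes the date is embedded in the index name as %Y.%m.%d, and boundary handling may differ slightly from curator's. All names are hypothetical.

```python
import re
from datetime import datetime, timedelta

TIMESTRING = "%Y.%m.%d"
DATE_RE = re.compile(r"\d{4}\.\d{2}\.\d{2}")  # regex equivalent of TIMESTRING


def indices_to_delete(names, now, unit_count=5, excluded_prefix="chuye-adcounter-"):
    """Return the index names the delete_indices action would act on."""
    cutoff = now - timedelta(days=unit_count)
    selected = []
    for name in names:
        if name.startswith(excluded_prefix):
            continue  # pattern filter with exclude: True
        match = DATE_RE.search(name)
        if match is None:
            continue  # no date in the name, so the age filter cannot match
        stamp = datetime.strptime(match.group(0), TIMESTRING)
        if stamp < cutoff:  # age filter, direction: older
            selected.append(name)
    return selected


if __name__ == "__main__":
    names = [
        "logstash-2016.12.01",
        "logstash-2016.12.08",
        "chuye-adcounter-2016.11.01",
        "kibana",
    ]
    print(indices_to_delete(names, now=datetime(2016, 12, 10)))
```

Curator's own --dry-run option, listed in the help output above, gives the authoritative answer against a live cluster.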

Creating a snapshot backup

actions:
  1:
    action: snapshot
    description: >-
      Snapshot logstash- prefixed indices older than 1 day (based on index
      creation_date) with the default snapshot name pattern of
      'curator-%Y%m%d%H%M%S'.  Wait for the snapshot to complete.  Do not skip
      the repository filesystem access check.  Use the other options to create
      the snapshot.
    options:
      repository: search-backup ## repository to back up to
      name: search-%Y%m%d%H%M%S ## snapshot name, built from the server time
      ignore_unavailable: False
      include_global_state: True
      partial: False
      wait_for_completion: True
      skip_repo_fs_check: False
      timeout_override:
      continue_if_exception: False
      disable_action: False
    filters:
    - filtertype: pattern   ## which indices to back up
      kind: regex
      value: .*
      exclude:
    - filtertype: age ## age range of the indices to back up
      source: creation_date
      direction: older
      unit: days
      unit_count: 1
      exclude:

Documentation

Official documentation

Elasticsearch production configuration

Posted on 2016-12-01 Edited on 2020-09-16 In linux , elastic

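About Elasticsearch

Elasticsearch is an open-source, distributed, RESTful search engine built on Lucene. It is designed for cloud environments and delivers near-real-time search that is stable, reliable, and fast.

Some Elasticsearch concepts:

Cluster

In a distributed system, several running Elasticsearch instances form a cluster. One node in the cluster acts as the master. Elasticsearch is decentralized, so the master is elected dynamically and there is no single point of failure.

Within the same subnet, you only need to give every node the same cluster name and Elasticsearch automatically groups those nodes into one cluster. Communication between nodes, as well as the distribution and balancing of data across them, is managed by Elasticsearch automatically.
From the outside, the cluster looks like a single system.

Node

Each running instance is a node. Instances can run on the same machine or on different machines. A running instance is simply a server process. In a test environment you can run several server processes on one server; in production, run one process per server.

Index

"Index" here is a noun, not a verb. Elasticsearch supports multiple indices, much as a relational database server supports multiple databases. Each index in turn supports multiple types, much as a database holds multiple tables. The two models differ substantially underneath, but the analogy is good enough for now.

Shards

An index is split into several smaller indices, each of which is called a shard. Once split, the shards can be allocated to different nodes.

Replicas

Each shard can have zero or more replicas. Every replica is a complete copy of its shard. Replicas can speed up searches and also improve fault tolerance: if the data on one node is damaged, another node can take over.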
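The arithmetic behind shards and replicas is easy to check by hand. The sketch below counts the shard copies an index creates and tests whether a cluster has enough nodes to hold every copy, relying on the fact that Elasticsearch never places two copies of the same shard on one node. The function names are made up for illustration.

```python
def total_shard_copies(primaries, replicas):
    """Each primary shard plus each of its replicas is one shard copy."""
    return primaries * (1 + replicas)


def can_assign_all(replicas, nodes):
    """Every copy of a shard needs its own node, so replicas + 1 nodes are required."""
    return nodes >= replicas + 1


if __name__ == "__main__":
    # 5 primaries with 1 replica each on a two-node cluster
    print(total_shard_copies(5, 1), can_assign_all(1, 2))
    # the same replica setting on a single node leaves replicas unassigned
    print(can_assign_all(1, 1))
```

This is why a single-node setup is normally configured with zero replicas.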

Lab environment

Environment plan

Hostname	IP address	Operating system	Role
linux-node1.example.com	192.168.56.11	CentOS 7	elastic-master, nfs-server
linux-node2.example.com	192.168.56.12	CentOS 7	elastic-slave

System version

[root@linux-node1 mount10:20:25]#uname -r
3.10.0-229.el7.x86_64
[root@linux-node1 mount12:07:36]#uname -m
x86_64
[root@linux-node1 mount12:07:41]#cat  /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)

Installing Elasticsearch

Installing the software

Install Java first

Java was installed quickly with SaltStack.

Then install Elasticsearch

cd /usr/local/src/
curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.1.1/elasticsearch-2.1.1.tar.gz
tar -zxf elasticsearch-2.1.1.tar.gz
mv elasticsearch-2.1.1 /usr/local/
cd ../
ln -s elasticsearch-2.1.1/ elastic

Run the steps above on both linux-node1 and linux-node2.

Installing Elasticsearch with yum

rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
cat >/etc/yum.repos.d/elasticsearch.repo <<EOF
[elasticsearch-2.x]
name=Elasticsearch repository for 2.x packages
baseurl=https://packages.elastic.co/elasticsearch/2.x/centos
gpgcheck=1
gpgkey=https://packages.elastic.co/GPG-KEY-elasticsearch
enabled=1
EOF
yum install elasticsearch

Editing the Elasticsearch configuration file

Complete configuration files

Master node configuration

grep '^[a-z]' /usr/local/elastic/config/elasticsearch.yml
cluster.name: biglittleant
node.name: "linux-node1"
index.number_of_shards: 5
index.number_of_replicas: 1
path.conf: /usr/local/elastic/config
path.data: /usr/local/elastic/data
path.work: /usr/local/elastic/work
path.logs:  /usr/local/elastic/logs
path.plugins: /usr/local/elastic/plugins
bootstrap.mlockall: true
transport.tcp.port: 9300
http.port: 9200
network.host: 192.168.56.11
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.56.11", "192.168.56.12"]
path.repo: ["/data/mount/"]

Second node configuration

cluster.name: biglittleant
node.name: "linux-node2"
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.56.11", "192.168.56.12"]
path.repo: ["/data/mount/"]

Single-node configuration

cluster.name: biglittleant
node.name: "linux-node1"
index.number_of_shards: 1
index.number_of_replicas: 0  ## a single node must use 0 replicas
path.conf: /usr/local/elastic/config
path.data: /usr/local/elastic/data
path.work: /usr/local/elastic/work
path.logs:  /usr/local/elastic/logs
path.plugins: /usr/local/elastic/plugins
bootstrap.mlockall: true
http.port: 9200
network.host: 192.168.56.11
path.repo: ["/data/mount/"]

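The configuration file explained

cluster.name: biglittleant  ## name of the cluster; do not change it once set.
node.name: "linux-node1"  # name of this node
index.number_of_shards: 5  ## number of shards per index.
index.number_of_replicas: 1  ## number of replicas per shard.
path.conf: /usr/local/elastic/config  ## location of the configuration files
path.data: /usr/local/elastic/data  ## location of the data
#path.data: /path/to/data1,/path/to/data2  ### several paths can be listed.
path.work: /usr/local/elastic/work  ## location of temporary files
path.logs:  /usr/local/elastic/logs  ## location of the logs
path.plugins: /usr/local/elastic/plugins  ## location of the plugins
bootstrap.mlockall: true  # lock the process memory
#transport.tcp.port: 9300  ### port for node-to-node traffic.
http.port: 9200  ## port for client (HTTP) traffic
network.host: 192.168.56.11  # address to listen on; defaults to 127.0.0.1 if unset
discovery.zen.ping.multicast.enabled: false  ## disable multicast discovery
discovery.zen.ping.unicast.hosts: ["192.168.56.11", "192.168.56.12"]  # IP list of the cluster nodes
path.repo: ["/data/mount/"]  # backup repository location for the cluster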

Installing management plugins

Use the head plugin to browse index data

/usr/local/elastic/bin/plugin install mobz/elasticsearch-head
# then open http://192.168.56.11:9200/_plugin/head/

Use kopf to administer the cluster nodes

/usr/local/elastic/bin/plugin install lmenezes/elasticsearch-kopf
# then open http://192.168.56.11:9200/_plugin/kopf/

Use bigdesk to watch cluster performance

/usr/local/elastic/bin/plugin install hlstudio/bigdesk
# then open http://192.168.56.11:9200/_plugin/bigdesk/

Installing the Chinese analysis plugin

Step 1: build the plugin:

yum install maven
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
mvn package
# Build output:
Downloaded: http://repo.maven.apache.org/maven2/org/codehaus/plexus/plexus-utils/2.0.1/plexus-utils-2.0.1.jar (217 KB at 43.5 KB/sec)
[INFO] Reading assembly descriptor: /usr/local/src/elasticsearch-analysis-ik-1.10.0/src/main/assemblies/plugin.xml
[INFO] Building zip: /usr/local/src/elasticsearch-analysis-ik-1.10.0/target/releases/elasticsearch-analysis-ik-1.10.0.zip
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:26.127s
[INFO] Finished at: Mon Oct 10 16:58:59 CST 2016
[INFO] Final Memory: 25M/61M
[INFO] ------------------------------------------------------------------------

## Step 2: copy the built IK plugin into the plugins directory (adjust the target to your path.plugins).
cd target/releases/
unzip elasticsearch-analysis-ik-1.10.0.zip -d ik
mv ik /data/app/elastic/plugins/ik

## Then edit the Elasticsearch configuration: open config/elasticsearch.yml and append this setting

index.analysis.analyzer.default.type: ik

Step 3: restart Elasticsearch and check the log

systemctl restart elasticsearch
# After the restart, the log lists the loaded plugins:
[2016-03-14 23:47:33,184][INFO ][plugins                  ] [Lightspeed] modules [lang-expression, lang-groovy], plugins [elasticsearch-analysis-ik], sites []

Installing the analysis-pinyin plugin (optional; the steps mirror the IK plugin)

git clone https://github.com/medcl/elasticsearch-analysis-pinyin.git
cd elasticsearch-analysis-pinyin
mvn clean install -Dmaven.test.skip
# Copy the *-pinyin.zip from target/releases and unpack it into elasticsearch/plugins/

index.analysis.analyzer.default.type: keyword

Testing the analyzer

http://192.168.0.211:9200/_analyze?analyzer=ik&pretty=true&text=helloworld,欢迎你

For a detailed explanation of analyzers, see the article on Elasticsearch analyzers referenced at the end of this post.

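Tuning before starting the server

Common tuning parameters

Indexing performance and speed can be improved in the following ways:

Increase the refresh interval: index.engine.robin.refresh_interval: 10s (default 1s).
Enlarge the indexing buffer: indices.memory.index_buffer_size: 20% (default 10% of the heap).
Raise the translog flush threshold: index.translog.flush_threshold: 10000 (default 5000).
Give Elasticsearch more memory; the default is 1g.
Reduce the number of replicas. Set it to 0 while bulk indexing and restore the desired value afterwards.
Add more machines.
Set index.merge.policy.use_compound_file to false. This reduces merge work (make sure the open file limit is high enough).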

Configuration summary

## Part 1
index.analysis.analyzer.default.type: ik
index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 5m

## Part 2
index.cache.query.enable: true
indices.cache.query.size: 5%

## Part 3
index.search.slowlog.level: TRACE

## Part 4
index.store.compress.stored: true
index.store.compress.tv: true

## Part 5
#indices.store.throttle.type: none
indices.store.throttle.max_bytes_per_sec: 100mb
#index.routing.allocation.total_shards_per_node: 2
#script.disable_dynamic: false

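Configuration explained

Part 1

1. index.cache.field.type: soft makes the field cache use Java soft references, which lets the cache use as much memory as is available. Softly referenced objects are only reclaimed when memory runs short, so they normally survive while memory is plentiful, and the JVM guarantees they are cleared before it throws an OutOfMemoryError. That makes soft references a good fit for caches (an image cache is the classic example): memory is used to the fullest without triggering an OutOfMemoryError.
2. index.cache.field.max_size: 50000 ## maximum number of entries in the field cache.
3. index.cache.field.expire: 5m ## cache entries expire after 5 minutes.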

Part 2

index.cache.query.enable: true ## the default is false.

indices.cache.query.size: 2%  ## the default is 1%.

Part 3

index.search.slowlog.level: TRACE

The slow-log level. TRACE is the most verbose level; it can also be set to INFO.

Part 4: index storage

index.store.compress.stored: true

index.store.compress.tv: true

Setting these two properties in elasticsearch.yml compresses the data files and greatly reduces their size.

Part 5: disk write throttling

#indices.store.throttle.type: none

indices.store.throttle.max_bytes_per_sec: 100mb

#index.routing.allocation.total_shards_per_node: 2

#script.disable_dynamic: false

The disk write rate limit. The default of 20 MB/s suits spinning disks; 100 to 200 MB/s suits SSDs. See the reference documentation.

Configuring NFS to store Elasticsearch backups

# Run on linux-node1
yum install nfs-utils rpcbind -y
cat >> /etc/exports<<EOF
/data/backup 192.168.56.0/24(rw,sync,all_squash)
EOF
mkdir /data/{backup,mount}  -p
chown -R nfsnobody.nfsnobody /data/backup
systemctl start rpcbind
systemctl start nfs
mount.nfs 192.168.56.11:/data/backup /data/mount/
# Run on linux-node2
yum install nfs-utils rpcbind -y
mkdir /data/mount -p
mount.nfs 192.168.56.11:/data/backup /data/mount/

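JVM tuning

Keep the heap at no more than half of the system's available memory, and no more than 32 GB. As for other JVM parameters, beginners do not need to adjust anything special: follow the official advice and set Xms and Xmx to the same value so the heap is never resized dynamically. Targeted JVM tuning can yield small GC gains, but when a "bad" workload shows up these tweaks will not magically relieve heap pressure, and they may even make matters worse.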

vim /usr/local/elastic/bin/elasticsearch.in.sh
if [ "x$ES_MIN_MEM" = "x" ]; then
    ES_MIN_MEM=256m
fi
if [ "x$ES_MAX_MEM" = "x" ]; then
    ES_MAX_MEM=256m
fi
# This is a virtual machine, so only 256 MB is configured; on physical hardware, size the heap according to the available memory.
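The sizing rule (half of the memory, capped at 32 GB, with the minimum and maximum set equal) reduces to a one-line calculation. The sketch below is a convenience helper rather than anything shipped with Elasticsearch; the function names are invented, and the 512 MB figure in the usage example is an assumed VM size.

```python
def recommended_heap_mb(total_ram_mb, cap_mb=32 * 1024):
    """Half of the machine's memory, but never more than the 32 GB cap."""
    return min(total_ram_mb // 2, cap_mb)


def heap_env(total_ram_mb):
    """Return equal ES_MIN_MEM / ES_MAX_MEM values so the heap is never resized."""
    heap = recommended_heap_mb(total_ram_mb)
    return {"ES_MIN_MEM": f"{heap}m", "ES_MAX_MEM": f"{heap}m"}


if __name__ == "__main__":
    print(heap_env(512))         # a small VM, assumed to have 512 MB
    print(heap_env(16 * 1024))   # a 16 GB server
    print(heap_env(128 * 1024))  # a large server hits the cap
```

The resulting values can be exported in the environment before starting the service, since elasticsearch.in.sh only applies its defaults when the variables are empty.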

Enabling JMX monitoring

vim /usr/local/elastic/bin/elasticsearch.in.sh
JMX_PORT=9305
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.ssl=false"
JAVA_OPTS="$JAVA_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
JAVA_OPTS="$JAVA_OPTS -Djava.rmi.server.hostname=192.168.56.11"

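Server tuning

sysctl -w vm.max_map_count=262144  ## raise the limit on memory-mapped areas; always set this in production.
mkdir /usr/local/elastic/{data,logs,work,plugins} -p  ## create the required directories
useradd elastic  ## create the user that runs the service
chown -R elastic.elastic /usr/local/elasticsearch-2.1.1/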

Starting Elasticsearch

su -c '/usr/local/elastic/bin/elasticsearch -d ' elastic
## The command above starts the service.
## Check that the ports are listening
ss -lntup |grep 9300
tcp    LISTEN     0      50                    :::9300                 :::*      users:(("java",9757,56))
# ss -lntup |grep 9200
tcp    LISTEN     0      50                    :::9200                 :::*      users:(("java",9757,94))

## Check the result with curl:
curl http://192.168.56.11:9200
{
  "status" : 200,
  "name" : "linux-node1",
  "cluster_name" : "biglittleant",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}

Managing cluster settings

Viewing cluster settings

curl -XGET http://10.10.160.129:9200/_cluster/settings

Disabling shard allocation

curl -XPUT http://10.10.160.129:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'

Re-enabling shard allocation

curl -XPUT http://10.10.160.129:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'
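The two curl calls differ only in one value, so a script that toggles allocation around a restart can build both from one helper. The sketch uses only the standard library and constructs the request without sending it. The function name is hypothetical, and the helper deliberately accepts only the two values shown above.

```python
import json
from urllib.request import Request


def allocation_request(base_url, enable):
    """Build the PUT request that switches shard allocation on or off."""
    if enable not in ("all", "none"):
        raise ValueError("enable must be 'all' or 'none'")
    body = {"transient": {"cluster.routing.allocation.enable": enable}}
    return Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    request = allocation_request("http://10.10.160.129:9200", "none")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Calling the helper with "none" before a restart and "all" afterwards reproduces the pair of commands above.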

Backing up Elasticsearch data

First load some sample data to back up

curl -XPOST 'http://192.168.56.11:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
curl -XPOST 'http://192.168.56.11:9200/shakespeare/_bulk?pretty' --data-binary @shakespeare.json
curl -XPOST 'http://192.168.56.11:9200/_bulk?pretty' --data-binary @logs.jsonl

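Creating a snapshot repository through the API

curl -XPOST http://192.168.56.11:9200/_snapshot/my_backup -d '
{
    "type": "fs",
    "settings": {
        "location": "/data/mount",
        "compress":  true
    }
}'
## Explanation:
Repository name: my_backup
Repository type: fs. Other types such as url and hdfs are also supported.
Repository location: /data/mount. This path must be declared in the configuration file (path.repo).
Compression: compress: true enables compression.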

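Checking the configuration before a backup

The directory used for backups must be declared in the configuration file (path.repo); otherwise the request fails with the following error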

{
  "error": {
    "root_cause": [
      {
        "type": "repository_exception",
        "reason": "[test-bakcup] failed to create repository"
      }
    ],
    "type": "repository_exception",
    "reason": "[test-bakcup] failed to create repository",
    "caused_by": {
      "type": "creation_exception",
      "reason": "Guice creation errors:\n\n1) Error injecting constructor, RepositoryException[[test-bakcup] location [/data/mount] doesn't match any of the locations specified by path.repo because this setting is empty]\n  at org.elasticsearch.repositories.fs.FsRepository.<init>(Unknown Source)\n  while locating org.elasticsearch.repositories.fs.FsRepository\n  while locating org.elasticsearch.repositories.Repository\n\n1 error",
      "caused_by": {
        "type": "repository_exception",
        "reason": "[test-bakcup] location [/data/mount] doesn't match any of the locations specified by path.repo because this setting is empty"
      }
    }
  },
  "status": 500
}

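Creating a snapshot

## Create a snapshot in the background
curl -XPUT  http://192.168.56.20:9200/_snapshot/my_backup/snapshot_1
## Or run it in the foreground and wait for it to finish.
curl -XPUT  'http://192.168.56.11:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'
## Either command creates a snapshot named snapshot_1 in the my_backup repository.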

Backing up selected indices

curl -XPUT  http://192.168.56.20:9200/_snapshot/my_backup/snapshot_2 -d '
{
    "indices": "bank,logstash-2015.05.18"
}'
## Explanation:
Creates a snapshot named snapshot_2 that contains only the bank and logstash-2015.05.18 indices.

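Checking snapshot status

While a snapshot is running, its progress can be checked with:

curl -XGET http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_status

The main states are:

INITIALIZING: the cluster state is being checked to see whether a snapshot can be taken; this is usually very fast
STARTED: data is being transferred to the repository
FINALIZING: data transfer is complete and metadata is being transferred
DONE: finished
FAILED: the snapshot failed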
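A backup script typically polls the status until one of the two final states is reached. The sketch below shows that loop with the state lookup injected as a function, so it can be tried without a cluster. The state names are taken from the list above; how `fetch_state` extracts the state from the _status response is left open, and all names are hypothetical.

```python
import time

IN_PROGRESS_STATES = {"INITIALIZING", "STARTED", "FINALIZING"}
TERMINAL_STATES = {"DONE", "FAILED"}


def wait_for_snapshot(fetch_state, max_polls=60, delay_seconds=5.0, sleep=time.sleep):
    """Poll fetch_state() until the snapshot is DONE or FAILED.

    Returns (final_state, states_seen). Raises TimeoutError if no terminal
    state shows up within max_polls calls.
    """
    seen = []
    for _ in range(max_polls):
        state = fetch_state()
        seen.append(state)
        if state in TERMINAL_STATES:
            return state, seen
        if state not in IN_PROGRESS_STATES:
            raise ValueError(f"unexpected snapshot state: {state}")
        sleep(delay_seconds)
    raise TimeoutError("snapshot did not finish within max_polls")


if __name__ == "__main__":
    # Simulated status sequence standing in for real HTTP calls.
    fake = iter(["INITIALIZING", "STARTED", "STARTED", "FINALIZING", "DONE"])
    print(wait_for_snapshot(lambda: next(fake), delay_seconds=0.0))
```

In real use, `fetch_state` would call the _status URL shown above and return the state string from the JSON response.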

Cancelling a snapshot

curl -XDELETE http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812

Listing all snapshots

curl -XGET http://192.168.56.20:9200/_snapshot/my_backup/_all |python -mjson.tool
## Explanation
Lists every snapshot in the my_backup repository.

Deleting a snapshot manually

curl -XDELETE http://192.168.56.20:9200/_snapshot/my_backup/snapshot_2
## Explanation
Deletes the snapshot snapshot_2 from the my_backup repository.

Restoring from a snapshot

Restoring a snapshot

curl -XPOST http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_restore

As with snapshots, wait_for_completion=true makes the call wait for the restore result

curl -XPOST 'http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_restore?wait_for_completion=true'

By default every index in the snapshot is restored. Parameters can select which indices to restore and rename them on the way in, which avoids overwriting existing data.

curl -XPOST http://192.168.0.1:9200/_snapshot/my_backup/snapshot_20150812/_restore -d '
{
    "indices": "index_1",
    "rename_pattern": "index_(.+)",
    "rename_replacement": "restored_index_$1"
}'
indices: restores only the index index_1.
rename_pattern: selects the indices to rename, here those whose names start with index_.
rename_replacement: renames the matching indices to restored_index_xxx; for example, index_1 becomes restored_index_1.
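The rename is an ordinary regular-expression substitution with a capture group, which can be previewed locally. Python writes group references as \g<1> rather than $1, so the sketch converts the replacement first. It approximates the behaviour with Python's re module, and the function name is made up.

```python
import re


def restored_name(index, rename_pattern, rename_replacement):
    """Preview the name an index would get after the restore rename."""
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... group syntax.
    python_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, python_replacement, index)


if __name__ == "__main__":
    for name in ("index_1", "index_logs", "bank"):
        print(name, "->", restored_name(name, "index_(.+)", "restored_index_$1"))
```

Names that do not match the pattern, such as bank in the example, come back unchanged.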

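Checking the restore progress of all indices

curl -XGET http://192.168.0.1:9200/_recovery/

Checking the restore progress of the index restored_index_1

curl -XGET http://192.168.0.1:9200/_recovery/restored_index_1

Cancelling a restore

Deleting the index that is being restored cancels the restore

curl -XDELETE http://192.168.0.1:9200/restored_index_1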

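Changing the number of replica shards at runtime

The number of replicas can be changed on a running cluster, so capacity can be scaled up or down as needed.

For example, to change the replica count to 3:

curl -XPUT  http://192.168.56.12:9200/shakespeare/_settings -d '
{
   "number_of_replicas" : 3
}'
Response:
{
    "acknowledged": true
}

After this the shards are redistributed: the primary shards sit on es-node1, es-node3, and es-node4, and the replica shards on es-node2, es-node3, and es-node4.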

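Operations

Restarting a single node

If writes are stopped before the node is restarted, shard synchronization after startup is fast.
If writes continue while the node is restarted, shard synchronization takes much longer.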

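Elasticsearch documentation

Elasticsearch tuning reference

Elasticsearch monitoring

Mastering Elasticsearch (Chinese edition)

ELK: The Definitive Guide

Elasticsearch: The Definitive Guide

ELK part 2: advanced use of Elasticsearch and Logstash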

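Problems encountered in production

Out-of-memory errors

By default the field data cache is unbounded. Queries load field values into memory, and facet queries are especially demanding: they hold all results in memory and then sort them, consuming memory until none is left, at which point an out-of-memory error can occur.

How the fix works

Set the cache type to soft references with index.cache.field.type: soft in the Elasticsearch configuration file. Softly referenced objects are only reclaimed when memory runs short, so they normally survive while memory is plentiful, and the JVM guarantees they are cleared before it throws an OutOfMemoryError. That makes soft references a good fit for caches: memory is used to the fullest without triggering an OutOfMemoryError.
Also limit the number of cached entries and their lifetime: index.cache.field.max_size: 50000 caps the field cache at 50000 entries, and index.cache.field.expire: 5m expires entries after 5 minutes.

Solution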

index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 5m

pip: switching to a mirror inside China

Posted on 2016-10-10 Edited on 2020-09-18 In python

mkdir ~/.pip
vim ~/.pip/pip.conf
[global]
index-url = http://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host = mirrors.aliyun.com

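Notes:

The simple directory in http://mirrors.aliyun.com/pypi/simple/ is required.
--no-cache-dir downloads packages again instead of using cached copies.
The trusted-host = mirrors.aliyun.com line is required; without it pip reports an error.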
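When the same settings have to be rolled out to many machines, the file can be generated instead of edited by hand. A minimal sketch using the standard library's configparser follows; the function name is made up, and writing the result to ~/.pip/pip.conf is left to the caller.

```python
import configparser
import io


def render_pip_conf(index_url, trusted_host):
    """Return the text of a pip.conf with a mirror index and its trusted host."""
    parser = configparser.ConfigParser()
    parser["global"] = {"index-url": index_url}
    parser["install"] = {"trusted-host": trusted_host}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_pip_conf("http://mirrors.aliyun.com/pypi/simple/", "mirrors.aliyun.com"))
```

Keeping the mirror URL and the trusted host as the two arguments makes it harder to update one and forget the other.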
pip mirrors

Aliyun http://mirrors.aliyun.com/pypi/simple/
University of Science and Technology of China https://pypi.mirrors.ustc.edu.cn/simple/
Douban http://pypi.douban.com/simple
Python official index https://pypi.python.org/simple/
v2ex http://pypi.v2ex.com/simple/
Chinese Academy of Sciences http://pypi.mirrors.opencas.cn/simple/
Tsinghua University https://pypi.tuna.tsinghua.edu.cn/simple/

Errors

Error output

pip install mysql-python
Collecting mysql-python
  The repository located at pypi.douban.com is not a trusted or secure host and is being ignored. If this repository is available via HTTPS it is recommended to use HTTPS instead, otherwise you may silence this warning and allow it anyways with '--trusted-host pypi.douban.com'.
  Could not find a version that satisfies the requirement mysql-python (from versions: )
No matching distribution found for mysql-python

Solution

Edit ~/.pip/pip.conf and add the trusted host for the mirror in use:

[install]
trusted-host = mirrors.aliyun.com