Learning Elasticsearch mapping

Posted on 2016-12-04 Edited on 2020-09-16 In linux, elastic

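A mapping is similar to a data type declaration in a statically typed language: once a variable is declared as a String, it can only ever hold String data. In the same way, a mapping field declared with a numeric type can only store numeric data.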

How the Elasticsearch concepts relate to a relational database

index → database
type → table
mapping → table schema

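By default you do not need to define a mapping yourself. When a new type or field is introduced, Elasticsearch automatically creates and registers a mapping with sensible defaults. You only have to supply a mapping definition when you want to override those defaults.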
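To make the idea of automatic mapping concrete, here is a minimal sketch of how a field type could be guessed from a JSON value. It is only an approximation of dynamic mapping, not Elasticsearch's actual code: the function name `guess_type` and the loose date regex are assumptions made for illustration.

```python
import re
from typing import Any

# Loose ISO-8601 check, e.g. "2016-12-04" or "2016-12-04T10:00:00Z".
# This is a simplification of real date detection.
_ISO_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$"
)


def guess_type(value: Any) -> str:
    """Return the field type dynamic mapping would roughly pick for a JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "date" if _ISO_DATE.match(value) else "string"
    if isinstance(value, dict):
        return "object"
    raise ValueError(f"unsupported value: {value!r}")
```

Note that a whole number is guessed as long rather than integer, which is one common reason to override the defaults with an explicit mapping.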

Mapping fields

Custom mapping

Below is a simple mapping definition:

# "settings" sets the number of shards and replicas.
# "user" is the type name; "properties" sets the type and options of each field.
# JSON does not allow comments, so the notes are kept up here.
curl -XPUT 'http://localhost:9200/test-index' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  },
  "mappings": {
    "user": {
      "_all":       { "enabled": false },
      "properties": {
        "title":    { "type": "string" },
        "body":     { "type": "string" },
        "user_id":  {
          "type":   "string",
          "index":  "not_analyzed"
        },
        "created":  {
          "type":   "date",
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}'

Here user is the type (the equivalent of a table in a relational database). Inside user we defined four columns: title, body, user_id, and created.
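Hand-written request bodies are easy to get wrong (a missing brace, a stray comment). As a hedged alternative, the same definition can be built as a Python dictionary and serialized, which always yields valid JSON. The variable names `index_body` and `payload` are illustrative only.

```python
import json

# Same settings and mapping as the curl example, expressed as plain Python data.
index_body = {
    "settings": {
        "number_of_shards": 3,    # number of primary shards
        "number_of_replicas": 1,  # number of replica copies
    },
    "mappings": {
        "user": {  # the type name
            "_all": {"enabled": False},
            "properties": {  # type and options of each field
                "title": {"type": "string"},
                "body": {"type": "string"},
                "user_id": {"type": "string", "index": "not_analyzed"},
                "created": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis",
                },
            },
        }
    },
}

# Serialize to the JSON text that would be sent as the request body.
payload = json.dumps(index_body, indent=2)
```

The resulting `payload` string can be written to a file and passed to curl with `-d @file`.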

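The type parameter

The type parameter specifies the data type of a field.
Elasticsearch supports the following data types:

Text: string
Whole numbers: byte, short, integer, long
Floating point: float, double
Boolean: boolean
Date: date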
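The four whole-number types differ only in width (8, 16, 32, and 64-bit signed integers). The sketch below, with the hypothetical helpers `fits` and `smallest_type`, shows how one might check which type a value needs.

```python
# Bit width of each signed whole-number type, from narrowest to widest.
INTEGER_BITS = {"byte": 8, "short": 16, "integer": 32, "long": 64}


def fits(field_type: str, value: int) -> bool:
    """Return True if value is inside the signed range of field_type."""
    bits = INTEGER_BITS[field_type]
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


def smallest_type(value: int) -> str:
    """Return the narrowest whole-number type that can hold value."""
    for field_type in INTEGER_BITS:  # dicts keep insertion order
        if fits(field_type, value):
            return field_type
    raise OverflowError(f"{value} does not fit in a long")
```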

The index parameter

The index parameter controls how a string is indexed. It has three possible values:

analyzed: First analyze the string, then index it. In other words, index this field as full text.
not_analyzed: Index this field, so it is searchable, but index the value exactly as specified. Do not analyze it.
no: Don't index this field at all. This field will not be searchable.

For string fields the default value of index is analyzed. For fields that should not be tokenized, such as URLs, set it to not_analyzed.
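The difference between the three values is easiest to see on a URL. The following is a rough simulation, assuming a crude tokenizer (lowercased runs of word characters) as a stand-in for the real standard analyzer. The function `indexed_terms` is hypothetical.

```python
import re
from typing import List


def indexed_terms(value: str, index: str = "analyzed") -> List[str]:
    """Return the terms that would roughly end up in the index."""
    if index == "analyzed":
        # Crude stand-in for the standard analyzer: lowercase word tokens.
        return re.findall(r"\w+", value.lower())
    if index == "not_analyzed":
        # The exact value is indexed as a single term.
        return [value]
    if index == "no":
        # Nothing is indexed, so the field is not searchable.
        return []
    raise ValueError(f"unknown index option: {index!r}")
```

With analyzed, `http://example.com/Docs` is broken into pieces such as `http` and `docs`, so an exact-match lookup on the full URL no longer works. With not_analyzed it does.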

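The analyzer parameter

The analyzer parameter specifies which analyzer is used to tokenize the field.
The default analyzer in Elasticsearch is the standard analyzer.
Other built-in analyzers include whitespace, simple, and english.
You can also install a third-party analyzer and reference it by name.

For example, the following tells Elasticsearch to tokenize the name field with the ik analyzer:

{ "name": { "type": "string", "analyzer": "ik" } }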
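To show how the choice of analyzer changes the resulting tokens, here is a small approximation of two built-in analyzers: whitespace (split on whitespace, keep case) and simple (split on non-letters, lowercase). These functions are sketches written for this post, not the real implementations, and plugin analyzers such as ik are not covered.

```python
import re
from typing import List


def whitespace_analyzer(text: str) -> List[str]:
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text: str) -> List[str]:
    """Split on anything that is not a letter, then lowercase."""
    # [^\W\d_] matches a word character that is neither a digit nor "_",
    # which leaves letters only.
    return re.findall(r"[^\W\d_]+", text.lower())
```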

Adding a mapping

Save the mapping definition to a file such as mapping.json (any name will do).

curl -XPOST 'http://192.168.56.12:9200/test-index' -d @mapping.json

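Updating a mapping

Save the new field definitions to a file such as mapping-1.json (any name will do). Because the index already exists, send the file to the _mapping endpoint of the type rather than to the index itself. The body has the form { "properties": { ... } }.

curl -XPUT 'http://192.168.56.12:9200/test-index/user/_mapping' -d @mapping-1.json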
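The same update can be prepared from a script. The sketch below only builds the HTTP request with the standard library and does not send it. It assumes the 2.x-style `/{index}/{type}/_mapping` endpoint, and the `nickname` field and the `build_request` helper are made up for illustration.

```python
import json
import urllib.request


def build_request(url: str, body: dict, method: str = "PUT") -> urllib.request.Request:
    """Build (but do not send) a JSON request for Elasticsearch."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical new field to add to the existing "user" type.
new_field = {"properties": {"nickname": {"type": "string", "index": "not_analyzed"}}}

request = build_request(
    "http://192.168.56.12:9200/test-index/user/_mapping", new_field
)

# To actually send it (requires a reachable Elasticsearch node):
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```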

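Viewing a mapping

curl -XGET 'http://192.168.56.12:9200/test-index/user/_mapping?pretty'

Explanation of each part:
- -XGET: the HTTP method.
- http://192.168.56.12:9200: the address and HTTP port of Elasticsearch.
- test-index: the index name.
- user: the type within the index.
- _mapping?pretty: requests the mapping; pretty prints the response in a human-readable form.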

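More on mappings

What can be changed:

Adding a new type definition
Adding a new field
Adding a new analyzer

What cannot be changed:

Changing a field's type (for example from text to number)
Changing a field from stored to not stored, or the reverse
Changing the value of the index property
Changing the analyzer of documents that are already indexed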
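These rules can be checked before sending an update. The sketch below compares a proposed set of field definitions against the current ones and reports the changes that fall into the "cannot be changed" list. It is a simplified illustration that only compares explicitly written settings. `find_conflicts` and `IMMUTABLE_KEYS` are names invented here.

```python
from typing import Dict, List

# Settings of an existing field that may not be changed afterwards.
IMMUTABLE_KEYS = ("type", "index", "store", "analyzer")


def find_conflicts(current: Dict[str, dict], update: Dict[str, dict]) -> List[str]:
    """List the changes in update that would be rejected."""
    conflicts = []
    for field, new_def in update.items():
        old_def = current.get(field)
        if old_def is None:
            continue  # a brand-new field is always allowed
        for key in IMMUTABLE_KEYS:
            if key in new_def and new_def[key] != old_def.get(key):
                conflicts.append(
                    f"{field}.{key}: {old_def.get(key)!r} -> {new_def[key]!r}"
                )
    return conflicts
```

When a conflicting change is really needed, the usual route is to create a new index with the new mapping and reindex the data into it.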

A mapping used in production

This is a Logstash mapping template:

{
    "logstash": {
        "order": 0,
        "template": "logstash-*-*",
        "settings": {  // index-level settings
            "index.refresh_interval": "30s",
            "index.translog.flush_threshold_ops": "50000"
        },
        "mappings": { // required
            "_default_": { // defaults applied to every type
                "dynamic_templates": [  // dynamic field definitions
                    {
                        "message_field": {
                            "mapping": {
                                "index": "analyzed",
                                "omit_norms": true,
                                "type": "string",
                                "fields": {
                                    "raw": {
                                        "ignore_above": 256, // strings longer than 256 characters are not indexed
                                        "index": "not_analyzed", // do not tokenize
                                        "type": "string",  // field type: string
                                        "doc_values": true
                                    }
                                }
                            },
                            "match_mapping_type": "string",  // apply to fields detected as this type
                            "match": "message" // apply to fields with this name
                        }
                    },
                    {
                        "string_fields": {
                            "mapping": {
                                "index": "analyzed",
                                "omit_norms": true,
                                "type": "string",
                                "fields": {
                                    "raw": {
                                        "ignore_above": 256,
                                        "index": "not_analyzed",
                                        "type": "string",
                                        "doc_values": true
                                    }
                                }
                            },
                            "match_mapping_type": "string",
                            "match": "*"
                        }
                    }
                ],
                "_all": {
                    "enabled": false
                },
                "properties": {
                    "geoip": {
                        "path": "full",
                        "dynamic": true,
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "geo_point"
                            }
                        }
                    },
                    "@timestamp": {
                        "format": "dateOptionalTime",
                        "index": "not_analyzed",
                        "type": "date",
                        "doc_values": true
                    },
                    "@version": {
                        "index": "not_analyzed",
                        "type": "string"
                    }
                }
            }
        },
        "aliases": {}
    }
}
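The template value `logstash-*-*` decides which newly created indices receive these settings and mappings. As a rough illustration, the wildcard can be emulated with Python's `fnmatch`. This is an assumption-laden sketch: `fnmatch` also gives special meaning to `?` and `[]`, and the index names used below are hypothetical.

```python
from fnmatch import fnmatchcase

# The "template" pattern from the Logstash template.
TEMPLATE_PATTERN = "logstash-*-*"


def template_applies(index_name: str, pattern: str = TEMPLATE_PATTERN) -> bool:
    """Return True if a new index with this name would pick up the template."""
    return fnmatchcase(index_name, pattern)
```

The pattern needs two hyphens after the prefix, so a name like `logstash-nginx-2016.12.04` matches while `logstash-2016.12.04` does not.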

Dynamic fields

The format is as follows:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [ // declares the dynamic templates
        {
          "integers": { // template name
            "match_mapping_type": "long", // matching rule
            "mapping": {
              "type": "integer"}}},
        {
          "strings": { // template name
            "match_mapping_type": "string", // matching rule
            "mapping": {
              "type": "string",
              "fields": {  // define a sub-field
                "raw": { // named raw, so a field such as name is indexed as both name and name.raw; raw can be replaced with any other name
                  "type":  "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }}}}}]}}}
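The effect of the strings template is that every string field is indexed twice: once analyzed under its own name and once verbatim under the raw sub-field, unless the value is longer than ignore_above. A minimal simulation, assuming a crude lowercase word tokenizer in place of the real analyzer (`index_string` is a hypothetical helper):

```python
import re
from typing import Dict, List


def index_string(name: str, value: str, ignore_above: int = 256) -> Dict[str, List[str]]:
    """Return the terms indexed for a string field and its .raw sub-field."""
    terms = {
        # Main field: analyzed (crude stand-in for the standard analyzer).
        name: re.findall(r"\w+", value.lower()),
    }
    # Sub-field: the exact value, skipped when it exceeds ignore_above.
    terms[name + ".raw"] = [value] if len(value) <= ignore_above else []
    return terms
```

The analyzed field serves full-text search, while the raw sub-field is the one to use for exact matching, sorting, and aggregations.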

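match, unmatch, and match_mapping_type

match: the template applies to fields whose name matches this pattern.
unmatch: fields whose name matches this pattern are excluded.
match_mapping_type: the template applies only to fields detected as this data type.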

"match_mapping_type": "string",
"match":   "long_*",
"unmatch": "*_text",
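How the three conditions combine can be sketched as a single predicate: the detected type must agree, the name must match `match`, and it must not match `unmatch`. The function below is an illustration using `fnmatch` as an approximation of the wildcard syntax, not Elasticsearch's own code.

```python
from fnmatch import fnmatchcase
from typing import Optional


def template_matches(
    field: str,
    detected_type: str,
    match: Optional[str] = None,
    unmatch: Optional[str] = None,
    match_mapping_type: Optional[str] = None,
) -> bool:
    """Return True if a dynamic template with these conditions applies to the field."""
    if match_mapping_type not in (None, "*") and detected_type != match_mapping_type:
        return False  # wrong detected data type
    if match is not None and not fnmatchcase(field, match):
        return False  # name does not match the include pattern
    if unmatch is not None and fnmatchcase(field, unmatch):
        return False  # name matches the exclude pattern
    return True


# The conditions from the snippet above.
rule = {"match_mapping_type": "string", "match": "long_*", "unmatch": "*_text"}
```

With this rule a string field named `long_title` is selected, while `long_text` is excluded by unmatch.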

match_pattern

match_pattern switches the match parameter from simple wildcards to full regular expression matching.

"match_pattern": "regex",
"match": "^profit_\d+$"
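The expression can be tried out locally before it goes into a template. Elasticsearch uses Java regular expressions, but this particular pattern behaves the same way in Python's `re` module. The field names tested are made up.

```python
import re

# Same pattern as in the template: "profit_" followed by one or more digits.
PROFIT_FIELD = re.compile(r"^profit_\d+$")


def matches_pattern(field: str) -> bool:
    """Return True if the field name satisfies the regex."""
    return PROFIT_FIELD.search(field) is not None
```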

path_match and path_unmatch

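path_match and path_unmatch work like match and unmatch, but they are tested against the full dotted path of the field (for example name.first) rather than only the final field name.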

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "full_name": {
            "path_match":   "name.*",
            "path_unmatch": "*.middle",
            "mapping": {
              "type":       "string",
              "copy_to":    "full_name"
                }}}]}}}
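To see which fields this template picks up, the sketch below flattens a nested document into dotted paths and collects the values that would be copied to full_name. The sample document and the helpers `flatten` and `collect_full_name` are assumptions made for illustration, and `fnmatch` only approximates the wildcard syntax.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Iterator, List, Tuple


def flatten(doc: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) pairs for every leaf field in the document."""
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def collect_full_name(doc: Dict[str, Any]) -> List[Any]:
    """Collect the values of fields under name.* except those ending in .middle."""
    return [
        value
        for path, value in flatten(doc)
        if fnmatchcase(path, "name.*") and not fnmatchcase(path, "*.middle")
    ]
```

For a document with name.first, name.middle, and name.last, only the first and last names end up in full_name.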

{name} and {dynamic_type}

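The placeholders {name} and {dynamic_type} can be used inside the mapping of a dynamic template. They are replaced with the field name and the detected data type, respectively.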

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "named_analyzers": {
            "match_mapping_type": "string",
            "match": "*",
            "mapping": {
              "type": "string",
              "analyzer": "{name}"
            }
          }
        },
        {
          "no_doc_values": {
            "match_mapping_type":"*",
            "mapping": {
              "type": "{dynamic_type}",
              "doc_values": false
                }}}]}}}
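The substitution itself is plain string replacement applied to every string value in the mapping. A minimal sketch of the idea (the `render` function is hypothetical and not part of any Elasticsearch client):

```python
from typing import Any


def render(mapping: Any, name: str, dynamic_type: str) -> Any:
    """Return a copy of mapping with {name} and {dynamic_type} filled in."""
    if isinstance(mapping, dict):
        return {key: render(value, name, dynamic_type) for key, value in mapping.items()}
    if isinstance(mapping, list):
        return [render(item, name, dynamic_type) for item in mapping]
    if isinstance(mapping, str):
        return mapping.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return mapping  # numbers, booleans, and None are left untouched
```

With the named_analyzers template, a string field called english would therefore be analyzed by the english analyzer. With no_doc_values, a field detected as long is mapped as long with doc_values turned off.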