Generate and load ElasticSearch indexes based on [Table Schema](http://specs.frictionlessdata.io/table-schema/) descriptors.

## Features

- implements `tableschema.Storage` interface

## Getting Started

### Installation

The package uses semantic versioning, which means that major versions could include breaking changes. It's highly recommended to specify a `package` version range in your `setup/requirements` file, e.g. `package>=1.0,<2.0`.

```bash
pip install tableschema-elasticsearch
```

### Examples

The code examples in this README require a Python 3.3+ interpreter. You can find more examples in the [examples](https://github.com/frictionlessdata/tableschema-spss-py/tree/master/examples) directory.

```python
import elasticsearch
import jsontableschema_es

INDEX_NAME = 'testing_index'

# Connect to Elasticsearch instance running on localhost
es = elasticsearch.Elasticsearch()
storage = jsontableschema_es.Storage(es)

# List all indexes
print(list(storage.buckets))

# Create a new index (named INDEX_NAME so that the writes below target it)
storage.create(INDEX_NAME, [
    ('numbers',
     {
         'fields': [
             {
                 'name': 'num',
                 'type': 'number'
             }
         ]
     })
])

# Write data to index; ['num'] is the primary key
l = list(storage.write(INDEX_NAME, 'numbers', ({'num': i} for i in range(1000)), ['num']))
print(len(l))
print(l[:10], '...')

# Write a second, overlapping batch of rows
l = list(storage.write(INDEX_NAME, 'numbers', ({'num': i} for i in range(500, 1500)), ['num']))
print(len(l))
print(l[:10], '...')

# Read all data from index
storage = jsontableschema_es.Storage(es)
print(list(storage.buckets))
l = list(storage.read(INDEX_NAME))
print(len(l))
print(l[:10])
```

## Documentation

The whole public API of this package is described here and follows semantic versioning rules. Everything outside of this README is private API and could be changed without any notification in any new version.

```python
# primary_key is a list of field names which will be used to generate document ids
```
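As an illustration of how a primary key can determine document ids, here is a minimal sketch. The helper `make_doc_id` and the `/` separator are hypothetical and are not taken from this package, whose actual id format may differ.

```python
def make_doc_id(row, primary_key, sep='/'):
    """Build a document id from the primary-key fields of a row.

    Hypothetical helper: joins the string form of each key field, in order.
    """
    missing = [name for name in primary_key if name not in row]
    if missing:
        raise KeyError('row is missing primary key fields: %s' % ', '.join(missing))
    return sep.join(str(row[name]) for name in primary_key)


rows = [{'num': 1, 'lang': 'en'}, {'num': 1, 'lang': 'fr'}]
ids = [make_doc_id(row, ['num', 'lang']) for row in rows]
print(ids)
```

Under a scheme like this, two rows with the same key values map to the same id, so indexing the second one would normally replace the first document in Elasticsearch instead of adding a new one.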
When creating indexes, we always create an index with a semi-random name and a matching alias that points to it. This allows us to decide whether to re-index documents whenever we're re-creating an index, or to discard the existing records.
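The index-plus-alias pattern can be sketched roughly as follows. The helper names and the naming scheme (alias, underscore, random hex suffix) are assumptions made for illustration, not this package's implementation. The returned dictionary has the shape accepted by the Elasticsearch `_aliases` endpoint.

```python
import uuid


def random_index_name(alias):
    """Return a semi-random concrete index name for an alias (hypothetical scheme)."""
    return '{}_{}'.format(alias, uuid.uuid4().hex[:8])


def alias_switch_actions(alias, new_index, old_indexes=()):
    """Build an `_aliases` request body that points `alias` at `new_index`.

    Previously aliased indexes are detached in the same request.
    """
    actions = [{'remove': {'index': old, 'alias': alias}} for old in old_indexes]
    actions.append({'add': {'index': new_index, 'alias': alias}})
    return {'actions': actions}


first = random_index_name('testing_index')
second = random_index_name('testing_index')
body = alias_switch_actions('testing_index', second, old_indexes=[first])
print(body)
# With a live client this could be sent with es.indices.update_aliases(body=body),
# assuming a client version that accepts a `body` argument.
```

With this layout, re-creating an index amounts to creating a fresh concrete index, optionally copying the old documents into it, and then switching the alias. Callers keep using the alias name throughout.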

### Mappings

When creating indexes, the tableschema types are converted to ES types and a mapping is generated for the index.

Example:

```
{
  "fields": [
    {
      "name": "my-number",
      "type": "number"
    },
    {
      "name": "my-array-of-dates",
      "type": "array",
      "es:itemType": "date"
    },
    {
      "name": "my-person-object",
      "type": "object",
      "es:schema": {
        "fields": [
          ...
      }
    },
    {
      "name": "my-library",
      "type": "array",
      "es:itemType": "object",
      "es:schema": {
        ...
      }
    },
    {
      "name": "my-user-provded-object",
      "type": "object",
      "es:enabled": false
    }
  ]
}
```
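To make the conversion more concrete, here is a minimal sketch of how a descriptor like the one in the example could be turned into an Elasticsearch mapping. The `TYPE_MAP` table, the `keyword` fallback, and the `field_to_property` and `schema_to_mapping` helpers are all hypothetical. The package's own generator may choose different ES types and handle more cases. The sub-fields of `my-person-object` below are invented for the demonstration.

```python
# Hypothetical Table Schema -> Elasticsearch type table (an assumption, not the package's).
TYPE_MAP = {
    'string': 'text',
    'integer': 'long',
    'number': 'double',
    'boolean': 'boolean',
    'date': 'date',
}


def field_to_property(field):
    """Convert one field descriptor into an ES mapping property."""
    ftype = field.get('type', 'string')
    if ftype == 'array':
        # ES has no array type: a field simply holds several values of its item type.
        ftype = field.get('es:itemType', 'string')
    if ftype == 'object':
        if field.get('es:enabled') is False:
            # Kept in _source, but neither parsed nor indexed.
            return {'type': 'object', 'enabled': False}
        sub_schema = field.get('es:schema', {'fields': []})
        return {'type': 'object',
                'properties': schema_to_mapping(sub_schema)['properties']}
    # Unknown types fall back to 'keyword' (an arbitrary choice for this sketch).
    return {'type': TYPE_MAP.get(ftype, 'keyword')}


def schema_to_mapping(schema):
    """Convert a Table Schema descriptor into an ES mapping body."""
    return {'properties': {f['name']: field_to_property(f) for f in schema['fields']}}


schema = {
    'fields': [
        {'name': 'my-number', 'type': 'number'},
        {'name': 'my-array-of-dates', 'type': 'array', 'es:itemType': 'date'},
        {'name': 'my-person-object', 'type': 'object',
         'es:schema': {'fields': [{'name': 'name', 'type': 'string'}]}},
        {'name': 'my-user-provded-object', 'type': 'object', 'es:enabled': False},
    ]
}
mapping = schema_to_mapping(schema)
print(mapping)
```

The `es:`-prefixed keys carry the information that plain Table Schema types cannot express: the item type of an array, the nested schema of an object, and whether an object should be indexed at all.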

#### Custom mappings

By providing a custom mapping generator class (via `mapping_generator_cls`), inheriting from the MappingGenerator class you should be able

### Drivers

`elasticsearch-py` is used to access the ElasticSearch interface - [docs](https://elasticsearch-py.readthedocs.io/en/master/).

## Contributing

The project follows the [Open Knowledge International coding standards](https://github.com/okfn/coding-standards).

The recommended way to get started is to create and activate a project virtual environment.
To install the package and development dependencies into the active environment:

For testing, `tox` configured in `tox.ini` is used.
It's already installed into your environment and can be used separately with more fine-grained control, as described in the documentation: https://testrun.org/tox/latest/.

For example, the following checks a subset of tests against the Python 2 environment with increased verbosity.
All positional arguments and options after `--` will be passed to `py.test`:

```bash
tox -e py27 -- -v tests/<path>
```

Under the hood, `tox` uses the `pytest` (configured in `pytest.ini`), `coverage`, and `mock` packages. These packages are available only in tox environments.

## Changelog

Only breaking and the most important changes are described here. The full changelog and documentation for all released versions can be found in the nicely formatted [commit history](https://github.com/frictionlessdata/tableschema-elasticsearch-py/commits/master).