Elasticsearch real-time search and analytics natively integrated with Hadoop.
Supports [Map/Reduce](#mapreduce), [Apache Hive](#apache-hive), and [Apache Spark](#apache-spark).

See [project page](https://www.elastic.co/elasticsearch/hadoop/) and [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for detailed information.

## Requirements
Elasticsearch cluster accessible through [REST][]. That's it!
Significant effort has been invested to create a small, self-contained jar that can be downloaded and put to use without any dependencies. Simply make it available to your job classpath and you're set.
For library-specific requirements, see the dedicated [chapter](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/requirements.html).

While an effort has been made to keep ES-Hadoop backwards compatible with older versions of Elasticsearch, it is best
to use the version of ES-Hadoop that is the same as the Elasticsearch version. See the
[product compatibility support matrix](https://www.elastic.co/support/matrix#matrix_compatibility) for more information.

## Installation

### Stable Release (`9.0.0` used in the examples below)
Support for Hadoop is available through any Maven-compatible tool:

```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-hadoop</artifactId>
  <version>9.0.0</version>
</dependency>
```
or as a stand-alone [ZIP](http://www.elastic.co/downloads/hadoop).

Spark support depends on the versions of Spark and Scala your cluster uses. For Scala 2.12 and Spark 3.0, 3.1, 3.2, 3.3, or 3.4, use:
```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-30_2.12</artifactId>
  <version>9.0.0</version>
</dependency>
```
For Scala 2.13 and Spark 3.2, 3.3, or 3.4, use:
```xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-30_2.13</artifactId>
  <version>9.0.0</version>
</dependency>
```

### Supported Hadoop Versions

ES-Hadoop is developed for and tested against Hadoop 2.x and 3.x on YARN.
More information in this [section](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html).

### Supported Spark Versions

Spark 3.0 through 3.4 are supported. Only Scala 2.12 is supported for Spark 3.0 and 3.1. Both Scala 2.12 and 2.13
are supported for Spark 3.2 and higher.

## Feedback / Q&A
We're interested in your feedback! You can find us on the [Elastic forum](https://discuss.elastic.co/).


## Online Documentation

## [Map/Reduce][]

For basic, low-level or performance-sensitive environments, ES-Hadoop provides dedicated `InputFormat` and `OutputFormat` that read and write data to Elasticsearch. To use them, add the `es-hadoop` jar to your job classpath
(either by bundling the library along - it's ~300kB and there are no dependencies), using the [DistributedCache][] or by provisioning the cluster manually.
See the [documentation](http://www.elastic.co/guide/en/elasticsearch/hadoop/current/index.html) for more information.

Note that es-hadoop supports the Hadoop API through its `EsInputFormat` and `EsOutputFormat` classes.

### Reading
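As a minimal, indicative sketch of wiring `EsInputFormat` into a job with the `org.apache.hadoop.mapreduce` API (the resource name and query below are placeholders; see the documentation for the authoritative example):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.elasticsearch.hadoop.mr.EsInputFormat;

Configuration conf = new Configuration();
conf.set("es.resource", "radio/artists");      // target index (placeholder)
conf.set("es.query", "?q=me*");                // replace with the relevant query
Job job = Job.getInstance(conf, "es-read");
job.setInputFormatClass(EsInputFormat.class);  // keys are document ids, values are the documents
// ... configure the mapper/reducer as usual ...
job.waitForCompletion(true);
```

Writing follows the same template, with `EsOutputFormat` set as the job's output format.
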
## [Apache Spark][]
ES-Hadoop provides native (Java and Scala) integration with Spark: for reading, a dedicated `RDD`; for writing, methods that work on any `RDD`. Spark SQL is also supported.

### Reading
To read data from ES, create a dedicated `RDD` and specify the query as an argument:

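A minimal, indicative sketch using the Java API (`JavaEsSpark`); the node address, resource name, and query are placeholder assumptions:

```java
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

SparkConf conf = new SparkConf().setAppName("es-read")
        .set("es.nodes", "localhost:9200");            // placeholder Elasticsearch node
JavaSparkContext jsc = new JavaSparkContext(conf);

// Each element is a (document id, document source) pair matching the query.
JavaPairRDD<String, Map<String, Object>> esRDD =
        JavaEsSpark.esRDD(jsc, "radio/artists", "?q=me*");
```

In Scala, the same functionality is exposed as an `esRDD` extension on `SparkContext` via `import org.elasticsearch.spark._`.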