Commit 6a4d4af

Update ws api (#7)
1 parent 1c6ddc0 commit 6a4d4af

11 files changed

+761
-28
lines changed

docs/img/additive-tumbling.png

11.9 KB

docs/img/bullet-icons-line.png

46.7 KB

docs/img/reactive.png

9.22 KB

docs/img/time-based-tumbling.png

11.8 KB

docs/index.md

Lines changed: 14 additions & 12 deletions
@@ -1,26 +1,28 @@
1-
# Overview
1+
![Bullet Icons](../img/bullet-icons-line.png)
22

3-
## Bullet ...
3+
# Bullet:
44

5-
* Is a real-time query engine that lets you run queries on very large data streams
5+
* **Is a real-time query engine for very large data streams**
66

7-
* Does not use a **a persistence layer**. This makes it **light-weight, cheap and fast**
7+
* **Has NO persistence layer**
88

9-
* Is a **look-forward** query system. Queries are submitted first and they operate on data that arrive after the query is submitted
9+
* **Is light-weight, cheap and fast**
1010

11-
* Supports rich queries for filtering and getting **Raw data, Counting Distincts, Distincts, Grouping (Sum, Count, Min, Max, Avg), Distributions, and Top K**
11+
* **Is multi-tenant**
1212

13-
* Is **multi-tenant** and can scale for more queries and/or for more data
13+
* **Is pluggable to any data source**
1414

15-
* Provides a **UI and Web Service** that are also pluggable for a full end-to-end solution to your querying needs
15+
* **Provides a UI and Web Service**
1616

17-
* Has an implementation on [Storm](http://storm.apache.org) currently. There are plans to implement it on other Stream Processors.
17+
* **Can filter raw data or aggregate data**
1818

19-
* Is **pluggable**. Any data source that can be read from Storm can be converted into a standard data container letting you query that data. Data is **typed**
19+
* **Can be run on Storm or Spark Streaming**
2020

21-
* Is used at scale and in production at Yahoo with running 500+ queries simultaneously on 200,000 rps (records per second) and tested up to 2,000,000 rps
21+
* **Is a look-forward query system** - operates on data that arrive after the query is submitted
2222

23-
## How is this useful
23+
* **Is big-data scale-tested** - used in production at Yahoo and tested running 500+ queries simultaneously on up to 2,000,000 rps
24+
25+
# How is this useful
2426

2527
How Bullet is used is largely determined by the data source it consumes. Depending on what kind of data you put Bullet on, the types of queries you run on it and your use-cases will change. As a look-forward query system with no persistence, you will not be able to repeat your queries on the same data. The next time you run your query, it will operate on the different data that arrives after that submission. If this usage pattern is what you need and you are looking for a light-weight system that can tap into your streaming data, then Bullet is for you!
2628

docs/pubsub/kafka.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
# Kafka PubSub
22

3-
The Kafka implemented of the Bullet PubSub can be used on any Backend and Web Service. It uses [Apache Kafka](https://kafka.apache.org) as the backing PubSub queue and works on all Backends.
3+
The Kafka implementation of the Bullet PubSub can be used on any Backend and Web Service. It uses [Apache Kafka](https://kafka.apache.org) as the backing PubSub queue and works on all Backends.
44

55
## How does it work?
66

docs/pubsub/rest.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
# REST PubSub
2+
3+
The REST PubSub implementation is included in bullet-core and can be launched along with the Web Service. If it is enabled, the Web Service will expose two additional REST endpoints: one for reading/writing Bullet queries, and one
4+
for reading/writing results.
5+
6+
## How does it work?
7+
8+
When the Web Service receives a query from a user, it will create a PubSubMessage and write the message to the "query" RESTPubSub endpoint. This PubSubMessage will contain not only the query, but also some metadata, including the
9+
appropriate host/port to which the response should be sent (this is done to allow for multiple Web Services running simultaneously). The query is then stored in memory until the backend does a GET from this endpoint, at which
10+
time the query will be served to the backend, and dropped from the queue in memory.
11+
12+
Once the backend has generated the results of the query, it will wrap those results in a PubSubMessage. The backend extracts the URL to send the results to from the metadata and writes the results PubSubMessage to the
13+
"results" REST endpoint with a POST. This result will then be stored in memory until the Web Service does a GET to that endpoint, at which time the Web Service will have the results of the query to send back to the user.
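
The flow above can be modeled with a small in-memory sketch. This is only an illustration of the store-on-POST, serve-and-drop-on-GET behavior; the class name `RestEndpointQueue`, its methods, and the message field names are assumptions made for this example and are not Bullet's actual classes or wire format.

```javascript
// Hypothetical in-memory model of the two REST PubSub endpoints.
// RestEndpointQueue, post and get are illustrative names, not Bullet's API.
class RestEndpointQueue {
  constructor() {
    this.messages = [];
  }

  // A POST stores the message in memory.
  post(message) {
    this.messages.push(message);
  }

  // A GET serves the oldest message and drops it from the queue.
  get() {
    return this.messages.length > 0 ? this.messages.shift() : null;
  }
}

const queryEndpoint = new RestEndpointQueue();
const resultEndpoint = new RestEndpointQueue();

// Web Service side: wrap the query with metadata naming where results should go.
queryEndpoint.post({
  id: "q1",
  content: "<query JSON>",
  metadata: { resultUrl: "http://localhost:9901/api/bullet/pubsub/result" }
});

// Backend side: read the query, compute a (placeholder) result, write it back.
const query = queryEndpoint.get();
resultEndpoint.post({ id: query.id, content: "<results JSON>", metadata: query.metadata });

// Web Service side: read the result so it can be returned to the user.
const result = resultEndpoint.get();
```

Once a message has been read it is gone from the queue, which matches the "dropped from the queue in memory" behavior described above.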
14+
15+
## Setup
16+
17+
To enable the RESTPubSub and expose the two additional necessary REST endpoints, you must enable the setting:
18+
19+
```yaml
20+
bullet.pubsub.builtin.rest.enabled: true
21+
```
22+
23+
...in the Web Service Application.yaml file. This can also be done from the command line when launching the Web Service jar file by adding the command-line option:
24+
25+
```bash
26+
--bullet.pubsub.builtin.rest.enabled=true
27+
```
28+
29+
This will enable the two necessary REST endpoints, the paths for which can be configured in the Application.yaml file with the settings:
30+
31+
```yaml
32+
bullet.pubsub.builtin.rest.query.path: /pubsub/query
33+
bullet.pubsub.builtin.rest.result.path: /pubsub/result
34+
```
35+
36+
### Plug into the Backend
37+
38+
Configure the backend to use the REST PubSub:
39+
40+
```yaml
41+
bullet.pubsub.context.name: "QUERY_PROCESSING"
42+
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
43+
44+
bullet.pubsub.rest.connect.timeout.ms: 5000
45+
bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100
46+
bullet.pubsub.rest.result.subscriber.min.wait.ms: 10
47+
bullet.pubsub.rest.query.subscriber.min.wait.ms: 10
48+
bullet.pubsub.rest.query.urls:
49+
- "http://webServiceHostNameA:9901/api/bullet/pubsub/query"
50+
- "http://webServiceHostNameB:9902/api/bullet/pubsub/query"
51+
```
52+
53+
* __bullet.pubsub.context.name: "QUERY_PROCESSING"__ - tells the PubSub that it is running in the backend
54+
* __bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"__ - tells Bullet to use this class for its PubSub
55+
* __bullet.pubsub.rest.connect.timeout.ms: 5000__ - sets the HTTP connect timeout to 5 seconds (5000 ms)
56+
* __bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100__ - this is the maximum number of uncommitted messages allowed before blocking
57+
* __bullet.pubsub.rest.query.subscriber.min.wait.ms: 10__ - this setting is used to avoid making an http request too rapidly and overloading the http endpoint. It will force the backend to poll the query endpoint at most once every 10ms.
58+
* __bullet.pubsub.rest.query.urls__ - this should be a list of all the query REST endpoint URLs. If you are only running one Web Service this will only contain one URL (the URL of your Web Service followed by the full path of the query endpoint).
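
The effect of a minimum wait setting such as `bullet.pubsub.rest.query.subscriber.min.wait.ms` can be pictured with the sketch below. It is not how Bullet implements polling; `makePollGate` and the fake clock are hypothetical names used only to show the "at most once every 10ms" rule.

```javascript
// Hypothetical rate gate mirroring the "min.wait.ms" idea: a poll is only
// allowed if at least minWaitMs has passed since the last allowed poll.
function makePollGate(minWaitMs, now = () => Date.now()) {
  let lastPoll = -Infinity;
  return function shouldPoll() {
    const t = now();
    if (t - lastPoll < minWaitMs) {
      return false; // too soon, skip this HTTP request
    }
    lastPoll = t;
    return true;
  };
}

// Usage with a fake clock so the behavior is deterministic.
let fakeTime = 0;
const shouldPoll = makePollGate(10, () => fakeTime);
const decisions = [];
for (fakeTime = 0; fakeTime <= 25; fakeTime += 5) {
  decisions.push(shouldPoll());
}
// decisions is [true, false, true, false, true, false]
```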
59+
60+
### Plug into the Web Service
61+
62+
Configure the Web Service to use the REST PubSub:
63+
64+
```yaml
65+
bullet.pubsub.context.name: "QUERY_SUBMISSION"
66+
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
67+
68+
bullet.pubsub.rest.connect.timeout.ms: 5000
69+
bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100
70+
bullet.pubsub.rest.result.subscriber.min.wait.ms: 10
71+
bullet.pubsub.rest.query.subscriber.min.wait.ms: 10
72+
bullet.pubsub.rest.result.url: "http://localhost:9901/api/bullet/pubsub/result"
73+
bullet.pubsub.rest.query.urls:
74+
- "http://localhost:9901/api/bullet/pubsub/query"
75+
```
76+
77+
* __bullet.pubsub.context.name: "QUERY_SUBMISSION"__ - tells the PubSub that it is running in the Web Service
78+
* __bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"__ - tells Bullet to use this class for its PubSub
79+
* __bullet.pubsub.rest.connect.timeout.ms: 5000__ - sets the HTTP connect timeout to 5 seconds (5000 ms)
80+
* __bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100__ - this is the maximum number of uncommitted messages allowed before blocking
81+
* __bullet.pubsub.rest.query.subscriber.min.wait.ms: 10__ - this setting is used to avoid making an http request too rapidly and overloading the http endpoint. It will force the backend to poll the query endpoint at most once every 10ms.
82+
* __bullet.pubsub.rest.result.url: "http://localhost:9901/api/bullet/pubsub/result"__ - this is the endpoint from which the Web Service should read results. It should generally use the hostname of the machine the Web Service is running on (or "localhost").
83+
* __bullet.pubsub.rest.query.urls__ - in the Web Service this setting should contain __exactly one__ URL, the URL to which queries should be written. It should generally use the hostname of the machine the Web Service is running on (or "localhost").
84+

docs/ws/api.md

Lines changed: 115 additions & 8 deletions
@@ -1,21 +1,36 @@
11
# API
22

3-
See the [UI Usage section](../ui/usage.md) for using the UI to build Bullet queries. This section deals with examples of the JSON query format that the API currently exposes (and the UI uses underneath).
3+
This section gives a comprehensive overview of the Web Service API for launching Bullet queries.
44

5-
Bullet queries allow you to filter, project and aggregate data. It lets you fetch raw and aggregated data. Fields inside maps can be accessed using the '.' notation in queries. For example, myMap.key will access the key field inside the myMap map. There is no support for accessing fields inside Lists or inside nested Maps as of yet. Only the entire object can be operated on for now.
5+
* For info on how to use the UI, see the [UI Usage section](../ui/usage.md)
6+
* For examples of specific queries see the [Examples](examples.md) section
67

7-
The three main sections of a Bullet query are:
8+
The main constituents of a Bullet query are:
9+
10+
* __filters__, which determine which records will be consumed by your query
11+
* __projection__, which determines which fields will be projected in the resulting output from Bullet
12+
* __aggregation__, which allows users to aggregate data and perform aggregation operations
13+
* __window__, which can be used to return incremental results on "windowed" data
14+
* __duration__, which determines the maximum duration of the query in milliseconds
15+
16+
Fields inside maps can be accessed using the '.' notation in queries. For example,
17+
18+
`myMap.key`
19+
20+
will access the "key" field inside the "myMap" map. There is no support for accessing fields inside Lists or inside nested Maps as of yet. Only the entire object can be operated on for now.
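
As a rough illustration of the '.' notation and its one-level limit, the sketch below resolves a field name against a plain JavaScript object standing in for a record. The helper `getField` and the sample record are hypothetical and only mirror the rule stated above; they are not part of Bullet.

```javascript
// Hypothetical helper: resolve a Bullet-style field name against a record.
// Only "field" and "map.key" are handled, matching the one-level limit above.
function getField(record, name) {
  const dot = name.indexOf(".");
  if (dot === -1) {
    return record[name]; // whole object, no sub-field access
  }
  const map = record[name.slice(0, dot)];
  const key = name.slice(dot + 1);
  // Sub-field access is only defined for maps, not for lists or scalars.
  if (map === null || typeof map !== "object" || Array.isArray(map)) {
    return undefined;
  }
  return map[key];
}

const record = { id: 7, myMap: { key: "value" }, tags: ["a", "b"] };
const viaDot = getField(record, "myMap.key"); // "value"
const whole = getField(record, "tags");       // the entire list
const intoList = getField(record, "tags.0");  // undefined: no access inside lists
```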
21+
22+
The main constituents of a Bullet query listed above create the top level fields of the Bullet query:
823
```javascript
924
{
10-
"filters": {},
25+
"filters": [{}, {}, ...],
1126
"projection": {},
1227
"aggregation": {},
28+
"window": {},
1329
"duration": 20000
1430
}
1531
```
16-
The duration represents how long the query runs for (a window from when you submit it to that many milliseconds into the future).
1732

18-
See the [Filters](#filters), [Projections](#projections) and [Aggregation](#aggregations) sections for their respective specifications. Each of those sections are objects and you will need to be place the entire object in the respective sections above.
33+
We will describe how to specify each of these top-level fields below:
1934

2035
## Filters
2136

@@ -36,7 +51,7 @@ The current logical operators allowed in filters are:
3651
| OR | Any filter must be true. The first true filter evaluated left to right will short-circuit the computation. |
3752
| NOT | Negates the value of the first filter clause. The filter is satisfied iff the value is true. |
3853

39-
The format for a Logical filter is:
54+
The format for a __single__ Logical filter is:
4055

4156
```javascript
4257
{
@@ -52,6 +67,8 @@ The format for a Logical filter is:
5267

5368
Any other type of filter may be provided as a clause in clauses.
5469

70+
Note that the "filters" field in the query is a __list__ that can contain as many filters as you'd like.
71+
5572
### Relational Filters
5673

5774
Relational filters allow you to specify conditions on a field, using a comparison operator and a list of values.
@@ -68,7 +85,7 @@ The current comparisons allowed in filters are:
6885
| > | Greater than any value in values |
6986
| RLIKE | Matches using [Java Regex notation](http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html), any Regex value in values |
7087

71-
These operators are all typed based on the type of the left hand side from the Bullet record. If the elements on the right hand side cannot be
88+
Note: These operators are all typed based on the type of the __left hand side__ from the Bullet record. If the elements on the right hand side cannot be
7289
cast to the types on the LHS, those items will be ignored for the comparison.
7390

7491
The format for a Relational filter is:
@@ -263,6 +280,96 @@ The following attributes are supported for ```TOP K```:
263280

264281
Note that the ```K``` in ```TOP K``` is specified using the ```size``` field in the ```aggregation``` object.
265282

283+
## Window
284+
285+
The "window" field is **optional** and allows you to instruct Bullet to return incremental results. For example, you might want to compute the COUNT of a field and have that count returned every 2 seconds.
286+
287+
If "window" is omitted, Bullet will emit only a single result at the very end of the query.
288+
289+
An example window might look like this:
290+
291+
```javascript
292+
"window": { "emit": { "type": "TIME/RECORD", "every": 5000 },
293+
"include": { "type": "TIME/RECORD/ALL", "first": 5000 } },
294+
```
295+
296+
* The __emit__ field is used to specify when a window should be emitted and the current results sent back to the user
297+
* The __type__ subfield for "emit" can have two values:
298+
* __"TIME"__ specifies that the window will emit after a specific number of milliseconds
299+
* __"RECORD"__ specifies that the window will emit after consuming a specific number of records
300+
* The __every__ subfield for "emit" specifies how many records/milliseconds (depending on "type") will be counted before the window is emitted
301+
* The __include__ field is used to specify what will be included in the emitted window
302+
* The __type__ subfield for "include" can have three values:
303+
* __"TIME"__ specifies that the window will include all records seen in a certain time period in the window
304+
* e.g. All records seen in the first 2 seconds of a 10 second window
305+
* __"RECORD"__ specifies that the window will include the first n records, where n is specified in the "first" field below
306+
* __"ALL"__ specifies that the window will include ALL results accumulated since the very beginning of the __query__ (not just this window)
307+
* The __first__ subfield for "include" specifies the number of records/milliseconds at the beginning of this window to include in the emitted result. It should be omitted if "type" is "ALL".
308+
309+
**NOTE: Not all windowing types are supported at this time.**
310+
311+
### **Currently Bullet supports the following window types**:
312+
313+
* Time-Based Tumbling Windows
314+
* Additive Tumbling Windows
315+
* Reactive Record-Based Windows
316+
* No Window
317+
318+
Support for more windows will be added in the future.
319+
320+
Each currently supported window type will be described below:
321+
322+
#### **Time-Based Tumbling Windows**
323+
324+
Currently time-based tumbling windows **must** have emit == include. In other words, only the entire window can be emitted, and windows must be adjacent.
325+
326+
![Time-Based Tumbling Windows](../img/time-based-tumbling.png)
327+
328+
The above example windowing would be specified with the window:
329+
330+
```javascript
331+
"window": { "emit": { "type": "TIME", "every": 3000 },
332+
"include": { "type": "TIME", "first": 3000 } },
333+
```
334+
335+
Any aggregation can be done in each window, or the raw records themselves can be returned as specified in the "aggregation" object.
336+
337+
In this example the first window would include 3 records, the second would include 4 records, the third would include 3 records and the fourth would include 2 records.
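
To make the per-window counts concrete, here is a minimal sketch that assigns record arrival times to adjacent 3000 ms windows. The arrival times are made up for this example (they are not read from the figure) and were chosen so that the counts match the 3, 4, 3 and 2 records described above; `tumblingCounts` is a hypothetical helper, not a Bullet function.

```javascript
// Count records per tumbling window. Windows are adjacent and everyMs long,
// so a record arriving at time t belongs to window floor(t / everyMs).
function tumblingCounts(arrivalsMs, everyMs) {
  const counts = [];
  for (const t of arrivalsMs) {
    const index = Math.floor(t / everyMs);
    while (counts.length <= index) {
      counts.push(0); // make room for empty windows too
    }
    counts[index] += 1;
  }
  return counts;
}

// Assumed arrival times in milliseconds since the query started.
const arrivals = [200, 1100, 2500, 3100, 3900, 4700, 5800, 6400, 7300, 8600, 9500, 11000];
const perWindow = tumblingCounts(arrivals, 3000);
// perWindow is [3, 4, 3, 2]
```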
338+
339+
#### **Additive Tumbling Windows**
340+
341+
Additive tumbling windows emit with the same logic as time-based tumbling windows, but include ALL results from the beginning of the query:
342+
343+
![Additive Tumbling Windows](../img/additive-tumbling.png)
344+
345+
The above example would be specified with the window:
346+
347+
```javascript
348+
"window": { "emit": { "type": "TIME", "every": 3000 },
349+
"include": { "type": "ALL" } },
350+
```
351+
352+
In this example the first window would include 3 records, the second would include 7 records, the third would include 10 records and the fourth would include 12 records.
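
The additive counts are simply running totals of the records that arrived in each emit interval. The sketch below assumes the same per-interval arrivals as the time-based example (3, 4, 3 and 2 records); `additiveCounts` is a hypothetical helper used only for illustration.

```javascript
// Turn per-interval record counts into additive (cumulative) window sizes.
function additiveCounts(perIntervalCounts) {
  const totals = [];
  let running = 0;
  for (const count of perIntervalCounts) {
    running += count; // everything since the start of the query
    totals.push(running);
  }
  return totals;
}

const totals = additiveCounts([3, 4, 3, 2]);
// totals is [3, 7, 10, 12]
```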
353+
354+
#### **Sliding "Reactive" Windows**
355+
356+
Sliding windows emit based on the arrival of an event, rather than after a certain period of time. In general sliding windows often do some aggregation on the previous X records, or on all records that arrived in the last X seconds.
357+
Bullet will support this functionality in the future. At this time Bullet only supports **Sliding Windows of size 1**, often referred to as "reactive" windows; it does not support sliding windows with an aggregation.
358+
Effectively this query will simply return every event that matches the filters instantly to the user.
359+
360+
![Reactive Windows](../img/reactive.png)
361+
362+
The above example would be specified with the window:
363+
364+
```javascript
365+
"window": { "emit": { "type": "RECORD", "every": 1 },
366+
"include": { "type": "RECORD", "last": 1 } },
367+
```
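
Conceptually, a reactive window behaves like the sketch below: each record that passes the filters is emitted on its own, immediately. The function `reactiveWindow`, the filter predicate and the sample records are all hypothetical and only illustrate the behavior, not Bullet's implementation.

```javascript
// Hypothetical model of a reactive window: every matching record is emitted
// at once as a window containing just that record.
function reactiveWindow(matches, emit) {
  return function onRecord(record) {
    if (matches(record)) {
      emit([record]); // window of size 1
    }
  };
}

const emitted = [];
const onRecord = reactiveWindow(
  (record) => record.status === "error", // stand-in for the query's filters
  (windowRecords) => emitted.push(windowRecords)
);

const stream = [
  { id: 1, status: "ok" },
  { id: 2, status: "error" },
  { id: 3, status: "error" }
];
for (const record of stream) {
  onRecord(record);
}
// emitted now holds two windows, one per matching record
```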
368+
369+
#### **No Window**
370+
371+
The "window" field is optional. If it is omitted, the query will emit only once, when the entire query is finished.
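
The four supported cases can be told apart from the shape of the "window" object alone. The sketch below is one possible client-side check, assuming the example window objects shown in this section; `classifyWindow` and its return labels are hypothetical and are not part of the Bullet API.

```javascript
// Hypothetical classifier for the "window" object, based on the four
// supported cases described in this section.
function classifyWindow(window) {
  if (window === undefined || window === null) {
    return "NO_WINDOW";
  }
  const { emit, include } = window;
  if (!emit || !include) {
    return "UNSUPPORTED";
  }
  if (emit.type === "TIME" && include.type === "ALL") {
    return "ADDITIVE_TUMBLING";
  }
  // Time-based tumbling windows must have emit == include.
  if (emit.type === "TIME" && include.type === "TIME" && emit.every === include.first) {
    return "TIME_TUMBLING";
  }
  // Only sliding windows of size 1 ("reactive") are supported.
  if (emit.type === "RECORD" && emit.every === 1 &&
      include.type === "RECORD" && include.last === 1) {
    return "REACTIVE";
  }
  return "UNSUPPORTED";
}

const tumbling = classifyWindow({
  emit: { type: "TIME", every: 3000 },
  include: { type: "TIME", first: 3000 }
});
const additive = classifyWindow({
  emit: { type: "TIME", every: 3000 },
  include: { type: "ALL" }
});
const reactive = classifyWindow({
  emit: { type: "RECORD", every: 1 },
  include: { type: "RECORD", last: 1 }
});
const none = classifyWindow(undefined);
```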
372+
266373
## Results
267374

268375
Bullet results are JSON objects with two fields:
