Commit 0eb8b19

Merge pull request #342 from src-d/doc-gitbook
New documentation
2 parents f69fe09 + c8926fc commit 0eb8b19

14 files changed

+511
-182
lines changed

.gitbook.yaml

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,3 @@
1+
structure:
2+
readme: README.md
3+
summary: docs/README.md

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ FROM ubuntu:16.04
1515
COPY --from=builder /go/bin/gitbase /bin
1616
RUN mkdir -p /opt/repos
1717

18-
ENV GITBASE_USER=gitbase
18+
ENV GITBASE_USER=root
1919
ENV GITBASE_PASSWORD=""
2020
ENV GITBASE_REPOS=/opt/repos
2121
EXPOSE 3306

README.md

Lines changed: 10 additions & 179 deletions
@@ -1,4 +1,4 @@
1-
# gitbase [![GitHub version](https://badge.fury.io/gh/src-d%2Fgitbase.svg)](https://github.com/mcuadros/ofelia/releases) [![Build Status](https://travis-ci.org/src-d/gitbase.svg?branch=master)](https://travis-ci.org/src-d/gitbase) [![codecov](https://codecov.io/gh/src-d/gitbase/branch/master/graph/badge.svg)](https://codecov.io/gh/src-d/gitbase) [![GoDoc](https://godoc.org/gopkg.in/src-d/gitbase.v0?status.svg)](https://godoc.org/gopkg.in/src-d/gitbase.v0) [![Go Report Card](https://goreportcard.com/badge/github.com/src-d/gitbase)](https://goreportcard.com/report/github.com/src-d/gitbase)
1+
# gitbase [![GitHub version](https://badge.fury.io/gh/src-d%2Fgitbase.svg)](https://github.com/src-d/gitbase/releases) [![Build Status](https://travis-ci.org/src-d/gitbase.svg?branch=master)](https://travis-ci.org/src-d/gitbase) [![codecov](https://codecov.io/gh/src-d/gitbase/branch/master/graph/badge.svg)](https://codecov.io/gh/src-d/gitbase) [![GoDoc](https://godoc.org/gopkg.in/src-d/gitbase.v0?status.svg)](https://godoc.org/gopkg.in/src-d/gitbase.v0) [![Go Report Card](https://goreportcard.com/badge/github.com/src-d/gitbase)](https://goreportcard.com/report/github.com/src-d/gitbase)
22

33
**gitbase**, is a SQL database interface to Git repositories.
44

@@ -8,196 +8,27 @@ about the [Universal AST](https://doc.bblf.sh/) of the code itself. gitbase is b
88
gitbase implements the *MySQL* wire protocol, it can be accessed using any MySQL
99
client or library from any language.
1010

11+
[src-d/go-mysql-server](https://github.com/src-d/go-mysql-server) is the SQL engine implementation used by `gitbase`.
12+
1113
## Status
1214

1315
The project is currently in **alpha** stage, meaning it's still lacking performance in a number of cases but we are working hard on getting a performant system able to process thousands of repositories in a single node. Stay tuned!
1416

1517
## Examples
1618

17-
To see the SQL subset currently supported take a look at [this list](https://github.com/src-d/go-mysql-server/blob/0084abf48137b4d72c6f948abfde91a00f3f77f0/SUPPORTED.md) from [src-d/go-mysql-server](https://github.com/src-d/go-mysql-server).
18-
19-
[src-d/go-mysql-server](https://github.com/src-d/go-mysql-server) is the project where the SQL engine used by ***gitbase*** is implemented.
20-
21-
#### Get all the HEAD references from all the repositories
22-
23-
```sql
24-
SELECT * FROM refs WHERE ref_name = 'HEAD'
25-
```
26-
27-
#### Commits that appears in more than one reference
28-
29-
```sql
30-
SELECT * FROM (
31-
SELECT COUNT(c.commit_hash) AS num, c.commit_hash
32-
FROM ref_commits r
33-
INNER JOIN commits c
34-
ON r.commit_hash = c.commit_hash
35-
GROUP BY c.commit_hash
36-
) t WHERE num > 1
37-
```
38-
39-
#### Get the number of blobs per HEAD commit
40-
41-
```sql
42-
SELECT COUNT(c.commit_hash), c.commit_hash
43-
FROM ref_commits as r
44-
INNER JOIN commits c
45-
ON r.ref_name = 'HEAD' AND r.commit_hash = c.commit_hash
46-
INNER JOIN commit_blobs cb
47-
ON cb.commit_hash = c.commit_hash
48-
GROUP BY c.commit_hash
49-
```
50-
51-
#### Get commits per commiter, per month in 2015
52-
53-
```sql
54-
SELECT COUNT(*) as num_commits, month, repo_id, committer_email
55-
FROM (
56-
SELECT
57-
MONTH(committer_when) as month,
58-
r.repository_id as repo_id,
59-
committer_email
60-
FROM ref_commits r
61-
INNER JOIN commits c
62-
ON YEAR(c.committer_when) = 2015 AND r.commit_hash = c.commit_hash
63-
WHERE r.ref_name = 'HEAD'
64-
) as t
65-
GROUP BY committer_email, month, repo_id
66-
```
67-
68-
## Installation
69-
70-
### Prerequisites
19+
You can see some [query examples](/docs/using-gitbase/examples.md) in [gitbase documentation](/docs).
7120

72-
**gitbase** has two optional dependencies that should be running on your system if you're planning on using certain functionality.
21+
## Motivation and scope
7322

74-
- [bblfsh](https://github.com/bblfsh/bblfshd) >= 2.5.0 (only if you're planning to use the `UAST` functionality provided in gitbase).
75-
- [pilosa](https://github.com/pilosa/pilosa) 0.9.0 (only if you're planning on using indexes).
76-
77-
### Installing from binaries
78-
79-
Check the [Release](https://github.com/src-d/gitbase/releases) page to download the gitbase binary.
80-
81-
### Installing from source
82-
83-
Because gitbase uses [bblfsh's client-go](https://github.com/bblfsh/client-go), which uses cgo, you need to install some dependencies by hand instead of just using `go get`.
84-
85-
_Note_: we use `go get -d` so the code is not compiled yet, as it would
86-
fail before `make dependencies` is executed successfully.
87-
88-
```
89-
go get -d github.com/src-d/gitbase
90-
cd $GOPATH/src/github.com/src-d/gitbase
91-
make dependencies
92-
```
93-
94-
## Usage
95-
96-
### Local
97-
```bash
98-
Usage:
99-
gitbase [OPTIONS] <server | version>
100-
101-
Help Options:
102-
-h, --help Show this help message
103-
104-
Available commands:
105-
server Start SQL server.
106-
version Show the version information.
107-
```
108-
109-
You can start a server providing a path which contains multiple git repositories `/path/to/repositories` with this command:
110-
111-
```
112-
$ gitbase server -v -g /path/to/repositories -u gitbase
113-
```
114-
115-
### Docker
116-
117-
You can use the official image from [docker hub](https://hub.docker.com/r/srcd/gitbase/tags/) to quickly run gitbase:
118-
```
119-
docker run --rm --name gitbase -p 3306:3306 -v /my/git/repos:/opt/repos srcd/gitbase:latest
120-
```
121-
122-
If you want to speedup gitbase using indexes, you must run a pilosa container:
123-
```
124-
docker run -it --rm --name pilosa -p 10101:10101 pilosa/pilosa:v0.9.0
125-
```
126-
127-
Then link the gitbase container to the pilosa one:
128-
```
129-
docker run --rm --name gitbase -p 3306:3306 --link pilosa:pilosa -e PILOSA_ENDPOINT="http://pilosa:10101" -v /my/git/repos:/opt/repos srcd/gitbase:latest
130-
```
131-
132-
### Client
133-
A MySQL client is needed to connect to the server. For example:
134-
135-
```bash
136-
$ mysql -q -u root -h 127.0.0.1
137-
MySQL [(none)]> SELECT commit_hash, commit_author_email, commit_author_name FROM commits LIMIT 2;
138-
SELECT commit_hash, commit_author_email, commit_author_name FROM commits LIMIT 2;
139-
+------------------------------------------+---------------------+-----------------------+
140-
| commit_hash | commit_author_email | commit_author_name |
141-
+------------------------------------------+---------------------+-----------------------+
142-
| 003dc36e0067b25333cb5d3a5ccc31fd028a1c83 | [email protected] | Santiago M. Mola |
143-
| 01ace9e4d144aaeb50eb630fed993375609bcf55 | [email protected] | Antonio Navarro Perez |
144-
+------------------------------------------+---------------------+-----------------------+
145-
2 rows in set (0.01 sec)
146-
```
147-
148-
If gitbase is running in a container from the official image, you must use `gitbase` as user:
149-
```
150-
mysql -q -u gitbase -h 127.0.0.1
151-
```
152-
153-
### Environment variables
154-
155-
| Name | Description |
156-
|:---------------------------------|:----------------------------------------------------|
157-
| `BBLFSH_ENDPOINT` | bblfshd endpoint, default "127.0.0.1:9432" |
158-
| `PILOSA_ENDPOINT` | pilosa endpoint, default "http://localhost:10101" |
159-
| `GITBASE_BLOBS_MAX_SIZE` | maximum blob size to return in MiB, default 5 MiB |
160-
| `GITBASE_BLOBS_ALLOW_BINARY` | enable retrieval of binary blobs, default `false` |
161-
| `GITBASE_UNSTABLE_SQUASH_ENABLE` | **UNSTABLE** check *Unstable features* |
162-
| `GITBASE_SKIP_GIT_ERRORS` | do not stop queries on git errors, default disabled |
163-
| `GITBASE_INDEX_DIR` | directory where indexes will be persisted |
23+
gitbase was born to ease the analysis of git repositories and its source code.
16424

165-
## Tables
25+
Also, making it MySQL compatible, we provide the maximum compatibility between languages and existing tools.
16626

167-
You can execute the `SHOW TABLES` statement to get a list of the available tables.
168-
To get all the columns and types of a specific table, you can write `DESCRIBE TABLE [tablename]`.
27+
As a single binary allows use it as a standalone service. The service is able to process local repositories or integrate with existing tools and frameworks (e.g. spark) to make source code analysis on the large scale.
16928

170-
gitbase exposes the following tables:
29+
## Further reading
17130

172-
| Name | Columns |
173-
|:-------------|:------------------------------------------------------------------------------------------------------------------|
174-
| repositories | repository_id |
175-
| remotes | repository_id, remote_name, remote_push_url, remote_fetch_url, remote_push_refspec, remote_fetch_refspec |
176-
| commits | repository_id, commit_hash, commit_author_name, commit_author_email, commit_author_when, committer_name, committer_email, committer_when, commit_message, tree_hash |
177-
| blobs | repository_id, blob_hash, blob_size, blob_content |
178-
| refs | repository_id, ref_name, commit_hash |
179-
| ref_commits | repository_id, ref_name, commit_hash, index |
180-
| tree_entries | repository_id, tree_hash, blob_hash, tree_entry_mode, tree_entry_name |
181-
| references | repository_id, ref_name, commit_hash |
182-
| commit_trees | repository_id, commit_hash, tree_hash |
183-
| commit_blobs | repository_id, commit_hash, blob_hash |
184-
| files | repository_id, file_path, blob_hash, tree_hash, tree_entry_mode, blob_content, blob_size |
185-
186-
## Functions
187-
188-
To make some common tasks easier for the user, there are some functions to interact with the previous mentioned tables:
189-
190-
| Name | Description |
191-
|:-------------|:----------------------------------------------------------------------------------------------------|
192-
|is_remote(reference_name)bool| check if the given reference name is from a remote one |
193-
|is_tag(reference_name)bool| check if the given reference name is a tag |
194-
|language(path, [blob])text| gets the language of a file given its path and the optional content of the file |
195-
|uast(blob, [lang, [xpath]])json_blob| returns an array of UAST nodes as blobs |
196-
|uast_xpath(json_blob, xpath)| performs an XPath query over the given UAST nodes |
197-
198-
## Unstable features
199-
200-
- **Table squashing:** there is an optimization that collects inner joins between tables with a set of supported conditions and converts them into a single node that retrieves the data in chained steps (getting first the commits and then the blobs of every commit instead of joining all commits and all blobs, for example). It can be enabled with the environment variable `GITBASE_UNSTABLE_SQUASH_ENABLE`.
31+
From here, you can directly go to [getting started](/docs/using-gitbase/getting-started.md).
20132

20233
## License
20334

cmd/gitbase/command/server.go

Lines changed: 2 additions & 2 deletions
@@ -23,9 +23,9 @@ const (
2323
ServerDescription = "Starts a gitbase server instance"
2424
ServerHelp = ServerDescription + "\n\n" +
2525
"The squashing tables and pushing down join conditions is still a\n" +
26-
"work in progress and unstable,disable by default can be enabled\n" +
26+
"work in progress and unstable, disabled by default. It can be enabled\n" +
2727
"using a not empty value at GITBASE_UNSTABLE_SQUASH_ENABLE env variable.\n\n" +
28-
"By default when gitbase encounters and error in a repository it\n" +
28+
"By default when gitbase encounters an error in a repository it\n" +
2929
"stops the query. With GITBASE_SKIP_GIT_ERRORS variable it won't\n" +
3030
"complain and just skip those rows or repositories."
3131
)

docs/README.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
# gitbase
2+
3+
* [Join the community](join-the-community.md)
4+
5+
### Using gitbase
6+
7+
* [Getting started](using-gitbase/getting-started.md)
8+
* [Configuration](using-gitbase/configuration.md)
9+
* [Schema](using-gitbase/schema.md)
10+
* [Supported syntax](using-gitbase/supported-syntax.md)
11+
* [Functions](using-gitbase/functions.md)
12+
* [Examples](using-gitbase/examples.md)

docs/assets/gitbase-db-diagram.png

142 KB

docs/join-the-community.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
# Join the community
2+
3+
## Chat
4+
5+
If you need support, want to contribute or just want to say hi, join us at the [source{d} community Slack](https://join.slack.com/t/sourced-community/shared_invite/enQtMjc4Njk5MzEyNzM2LTFjNzY4NjEwZGEwMzRiNTM4MzRlMzQ4MmIzZjkwZmZlM2NjODUxZmJjNDI1OTcxNDAyMmZlNmFjODZlNTg0YWM). We hang out in the #general channel.
6+
7+
## Contributing
8+
9+
You can start contributing in many ways:
10+
11+
* [Report bugs](/docs/join-the-community.md#reporting-bugs)
12+
* [Request a feature](https://github.com/src-d/gitbase/issues)
13+
* Improve the [documentation](https://github.com/src-d/gitbase/docs)
14+
* Contribute code to [gitbase](https://github.com/src-d/gitbase)
15+
16+
## Reporting bugs
17+
18+
Bugs should be reported through [GitHub Issues](https://github.com/src-d/gitbase/issues).

docs/using-gitbase/configuration.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
# Configuration
2+
3+
## Environment variables
4+
5+
| Name | Description |
6+
|:---------------------------------|:----------------------------------------------------|
7+
| `BBLFSH_ENDPOINT` | bblfshd endpoint, default "127.0.0.1:9432" |
8+
| `PILOSA_ENDPOINT` | pilosa endpoint, default "http://localhost:10101" |
9+
| `GITBASE_BLOBS_MAX_SIZE` | maximum blob size to return in MiB, default 5 MiB |
10+
| `GITBASE_BLOBS_ALLOW_BINARY` | enable retrieval of binary blobs, default `false` |
11+
| `GITBASE_UNSTABLE_SQUASH_ENABLE` | enable join squash rule to improve query performance **experimental**. This optimization collects inner joins between tables with a set of supported conditions and converts them into a single node that retrieves the data in chained steps (getting first the commits and then the blobs of every commit instead of joining all commits and all blobs, for example).|
12+
| `GITBASE_SKIP_GIT_ERRORS` | do not stop queries on git errors, default disabled |
13+
14+
## Command line arguments
15+
16+
```bash
17+
Please specify one command of: server or version
18+
Usage:
19+
gitbase [OPTIONS] <server | version>
20+
21+
Help Options:
22+
-h, --help Show this help message
23+
24+
Available commands:
25+
server Starts a gitbase server instance
26+
version Show the version information
27+
```
28+
29+
`server` command contains the following options:
30+
31+
```bash
32+
Usage:
33+
gitbase [OPTIONS] server [server-OPTIONS]
34+
35+
Starts a gitbase server instance
36+
37+
The squashing tables and pushing down join conditions is still a
38+
work in progress and unstable, disable by default. It can be enabled
39+
using a not empty value at GITBASE_UNSTABLE_SQUASH_ENABLE env variable.
40+
41+
By default when gitbase encounters an error in a repository it
42+
stops the query. With GITBASE_SKIP_GIT_ERRORS variable it won't
43+
complain and just skip those rows or repositories.
44+
45+
Help Options:
46+
-h, --help Show this help message
47+
48+
[server command options]
49+
-v Activates the verbose mode
50+
-g, --git= Path where the git repositories are located, multiple directories can be defined. Accepts globs.
51+
--siva= Path where the siva repositories are located, multiple directories can be defined. Accepts globs.
52+
-h, --host= Host where the server is going to listen (default: localhost)
53+
-p, --port= Port where the server is going to listen (default: 3306)
54+
-u, --user= User name used for connection (default: root)
55+
-P, --password= Password used for connection
56+
--pilosa= URL to your pilosa server (default: http://localhost:10101)
57+
-i, --index= Directory where the gitbase indexes information will be persisted. (default: /var/lib/gitbase/index)
58+
```
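
The squashing described in the help text, and in the `GITBASE_UNSTABLE_SQUASH_ENABLE` row of the environment variables table, replaces a join over all commits and all blobs with chained steps: get the commits first, then the blobs of each commit. The sketch below illustrates that difference in plain Go over in-memory data. It is not how gitbase implements the rule. The `Commit` and `CommitBlob` types, the `naiveJoin` and `chained` functions, and the `blobsOf` lookup are all hypothetical.

```go
package main

import "fmt"

// Commit and CommitBlob are simplified, hypothetical stand-ins for rows of
// the commits and commit_blobs tables.
type Commit struct{ Hash string }
type CommitBlob struct{ CommitHash, BlobHash string }

// naiveJoin compares every commit with every commit_blobs row and keeps the
// matches. It returns the matched pairs and the number of comparisons made.
func naiveJoin(commits []Commit, blobs []CommitBlob) ([][2]string, int) {
	var out [][2]string
	compared := 0
	for _, c := range commits {
		for _, b := range blobs {
			compared++
			if b.CommitHash == c.Hash {
				out = append(out, [2]string{c.Hash, b.BlobHash})
			}
		}
	}
	return out, compared
}

// chained walks the commits first and then asks only for the blobs of each
// commit. blobsOf is a hypothetical per-commit lookup. The second return
// value is the number of blob rows visited.
func chained(commits []Commit, blobsOf func(hash string) []string) ([][2]string, int) {
	var out [][2]string
	visited := 0
	for _, c := range commits {
		for _, b := range blobsOf(c.Hash) {
			visited++
			out = append(out, [2]string{c.Hash, b})
		}
	}
	return out, visited
}

func main() {
	commits := []Commit{{"c1"}, {"c2"}}
	rows := []CommitBlob{{"c1", "b1"}, {"c1", "b2"}, {"c2", "b3"}}

	// Build the per-commit lookup used by the chained variant.
	index := map[string][]string{}
	for _, r := range rows {
		index[r.CommitHash] = append(index[r.CommitHash], r.BlobHash)
	}
	lookup := func(h string) []string { return index[h] }

	joined, compared := naiveJoin(commits, rows)
	stepped, visited := chained(commits, lookup)
	fmt.Println(len(joined), compared, len(stepped), visited)
}
```

Both variants return the same pairs. The chained one touches only rows that belong to a commit it has already selected, which is the motivation the option description gives.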

docs/using-gitbase/examples.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
# Examples
2+
3+
## Get all the repositories where a specific user contributes on HEAD reference
4+
5+
```sql
6+
SELECT refs.repository_id
7+
FROM refs
8+
NATURAL JOIN commits
9+
WHERE commits.commit_author_name = 'Javi Fontan' AND refs.ref_name='HEAD';
10+
```
11+
12+
## Get all the HEAD references from all the repositories
13+
14+
```sql
15+
SELECT * FROM refs WHERE ref_name = 'HEAD'
16+
```
17+
18+
## Commits that appear in more than one reference
19+
20+
```sql
21+
SELECT * FROM (
22+
SELECT COUNT(c.commit_hash) AS num, c.commit_hash
23+
FROM ref_commits r
24+
INNER JOIN commits c
25+
ON r.commit_hash = c.commit_hash
26+
GROUP BY c.commit_hash
27+
) t WHERE num > 1
28+
```
29+
30+
## Get the number of blobs per HEAD commit
31+
32+
```sql
33+
SELECT COUNT(c.commit_hash), c.commit_hash
34+
FROM ref_commits as r
35+
INNER JOIN commits c
36+
ON r.ref_name = 'HEAD' AND r.commit_hash = c.commit_hash
37+
INNER JOIN commit_blobs cb
38+
ON cb.commit_hash = c.commit_hash
39+
GROUP BY c.commit_hash
40+
```
41+
42+
## Get commits per committer, per month in 2015
43+
44+
```sql
45+
SELECT COUNT(*) as num_commits, month, repo_id, committer_email
46+
FROM (
47+
SELECT
48+
MONTH(committer_when) as month,
49+
r.repository_id as repo_id,
50+
committer_email
51+
FROM ref_commits r
52+
INNER JOIN commits c
53+
ON YEAR(c.committer_when) = 2015 AND r.commit_hash = c.commit_hash
54+
WHERE r.ref_name = 'HEAD'
55+
) as t
56+
GROUP BY committer_email, month, repo_id
57+
```
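
To make the grouping in "Commits that appear in more than one reference" concrete, here is a small in-memory sketch of what that query computes: count rows per `commit_hash`, then keep the hashes whose count is greater than 1. It is an illustration only. The `RefCommit` type and the `sharedCommits` function are hypothetical, and the join with `commits` is omitted on the assumption that every hash in `ref_commits` matches exactly one row in `commits`.

```go
package main

import (
	"fmt"
	"sort"
)

// RefCommit is a simplified, hypothetical stand-in for a row of ref_commits.
type RefCommit struct{ RefName, CommitHash string }

// sharedCommits counts rows per commit hash (the GROUP BY and COUNT of the
// inner query) and keeps only hashes with a count above 1 (the outer WHERE).
func sharedCommits(rows []RefCommit) map[string]int {
	counts := map[string]int{}
	for _, r := range rows {
		counts[r.CommitHash]++
	}
	for hash, n := range counts {
		if n <= 1 {
			delete(counts, hash) // deleting during range is safe in Go
		}
	}
	return counts
}

func main() {
	rows := []RefCommit{
		{"HEAD", "aaa"},
		{"refs/heads/master", "aaa"},
		{"refs/heads/feature", "bbb"},
	}
	shared := sharedCommits(rows)

	// Sort the hashes so the printed order is stable.
	hashes := make([]string, 0, len(shared))
	for h := range shared {
		hashes = append(hashes, h)
	}
	sort.Strings(hashes)
	for _, h := range hashes {
		fmt.Println(shared[h], h)
	}
}
```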
