Commit 3e13a15: Create sql.md (parent f0f51a0)

docs/sql.md (+39 lines)
# sql Function

Submits *Structured Query Language* (SQL), *Data Manipulation Language* (DML), and *Data Definition Language* (DDL) statements to Apache Spark.

# Arguments

- `session::SparkSession`: the SparkSession. See the SparkSession help for instructions on how to create one in Julia.
- `sqlText::String`: the DDL, DML, or SQL statement to execute.
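For context, a minimal session setup sketch follows. The builder calls (`appName`, `master`, `getOrCreate`) are assumptions based on typical Spark.jl usage and may differ in your installed version:

```julia
using Spark

# Assumed Spark.jl-style builder API; adjust to your installed version.
session = SparkSession.builder.appName("sql-demo").master("local").getOrCreate()

# Submit a statement through the sql function documented here.
stmt = sql(session, "SELECT 1 AS one;")
```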
# Supported DDL Formats

- File formats: CSV, JSON, Arrow, Parquet
- Hive tables, including the ORC and Avro storage formats
- Data lakehouses: Delta Lake, Apache Iceberg
- Cloud object stores: S3, Azure Blob Storage, OpenStack Swift
# Examples

## CSV file example

Comma-Separated Values (CSV) format:

```julia
stmt = sql(session, "SELECT * FROM CSV.`/pathToFile/fileName.csv`;")
```
## Parquet file example

Apache Parquet format:

```julia
stmt = sql(session, "SELECT * FROM PARQUET.`/pathToFile/fileName.parquet`;")
```
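The same pattern applies to the other file formats listed above, swapping the format keyword and file extension; for example, JSON (the path is a placeholder):

```julia
stmt = sql(session, "SELECT * FROM JSON.`/pathToFile/fileName.json`;")
```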
## Delta Lake example

Delta Lake is an open-source storage layer for Spark. Delta Lake offers:

- ACID transactions on Spark: serializable isolation levels ensure that readers never see inconsistent data.
- Scalable metadata handling: leverages Spark's distributed processing power to handle the metadata of petabyte-scale tables with billions of files with ease.

To use Delta Lake, you must add the Delta Lake JAR to your Spark `jars` folder.

The following example shows create table (DDL), insert (DML), and select (SQL) statements using Delta Lake and Spark SQL:
```julia
sql(session, "CREATE DATABASE demo;")
sql(session, "USE demo;")
sql(session, "CREATE TABLE tb(col STRING) USING DELTA;")
# Illustrative DML and SQL statements completing the steps described above:
sql(session, "INSERT INTO tb VALUES ('example');")
stmt = sql(session, "SELECT * FROM tb;")
```
