# sql Function
Submits *Structured Query Language* (SQL), *Data Manipulation Language* (DML), and *Data Definition Language* (DDL) statements to Apache Spark.

# Arguments
- `session::SparkSession`: the SparkSession. See the SparkSession help for instructions on creating one in Julia.
- `sqlText::String`: the SQL, DML, or DDL statement to execute.

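As a sketch of end-to-end usage, a session can be created and a statement submitted as below. This assumes the Spark.jl package and its builder-style session API; the app name, master URL, and query are placeholders, not part of this function's contract:

```
using Spark

# Initialise the JVM backend (required once per process in Spark.jl).
Spark.init()

# Create (or reuse) a local SparkSession; appName and master are placeholders.
session = SparkSession.builder.appName("demo").master("local").getOrCreate()

# Submit a statement and capture the resulting dataset.
stmt = sql(session, "SELECT 1 AS one;")
```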
# Supported formats
- File formats: CSV, JSON, Arrow, Parquet, ORC, Avro
- Data warehouses: Apache Hive
- Data lakehouses: Delta Lake, Apache Iceberg
- Cloud object stores: Amazon S3, Azure Blob Storage, OpenStack Swift

# Examples

## CSV file example
Comma-Separated Values (CSV) format.
```
stmt = sql(session, "SELECT * FROM CSV.`/pathToFile/fileName.csv`;")
```
## Parquet file example
Apache Parquet format.
```
stmt = sql(session, "SELECT * FROM PARQUET.`/pathToFile/fileName.parquet`;")
```
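## Cloud object store example
The same table-over-file syntax can point at a cloud object store. A hedged sketch, assuming the cluster already has the `hadoop-aws` jar on its classpath and S3 credentials configured; the bucket and path are placeholders:

```
# Read a CSV file directly from Amazon S3 via the s3a:// scheme
# (bucket name and path below are hypothetical).
stmt = sql(session, "SELECT * FROM CSV.`s3a://my-bucket/pathToFile/fileName.csv`;")
```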
## Delta Lake example
Delta Lake is an open-source storage layer for Spark. Delta Lake offers:

- ACID transactions on Spark: serializable isolation ensures that readers never see inconsistent data.
- Scalable metadata handling: leverages Spark's distributed processing power to handle the metadata of petabyte-scale tables with billions of files with ease.

To use Delta Lake, add the Delta Lake jar to your Spark jars folder.

The example below shows create table (DDL), insert (DML), and select (SQL) statements using Delta Lake and Spark SQL:
```
sql(session, "CREATE DATABASE demo;")
sql(session, "USE demo;")
sql(session, "CREATE TABLE tb(col STRING) USING DELTA;")
sql(session, "INSERT INTO tb VALUES ('hello');")
stmt = sql(session, "SELECT * FROM tb;")
```