From 1b042456f3fdc92efac662378be1d4ad93e9e70d Mon Sep 17 00:00:00 2001
From: Andi Skrgat
Date: Mon, 27 Oct 2025 08:50:20 +0100
Subject: [PATCH 1/6] docs: Describe LOAD PARQUET clause usage

---
 pages/data-migration/parquet.mdx | 240 +++++++++++++++++++++++++++++++
 1 file changed, 240 insertions(+)
 create mode 100644 pages/data-migration/parquet.mdx

diff --git a/pages/data-migration/parquet.mdx b/pages/data-migration/parquet.mdx
new file mode 100644
index 000000000..080612979
--- /dev/null
+++ b/pages/data-migration/parquet.mdx
@@ -0,0 +1,240 @@
---
title: Import data from Parquet files
description: Leverage Parquet files in Memgraph operations. Our detailed guide simplifies the process for an enhanced graph computing journey.
---

import { Callout } from 'nextra/components'
import { Steps } from 'nextra/components'
import { Tabs } from 'nextra/components'

# Import data from Parquet files

The data from Parquet files can be imported using the [`LOAD PARQUET` Cypher clause](#load-parquet-cypher-clause) from the local disk
and from S3.

## `LOAD PARQUET` Cypher clause

The `LOAD PARQUET` clause uses a background thread that reads column batches, assembles batches of 64K rows, and puts them on a queue from
which the main thread pulls the data. The main thread then reads row by row from the queue, binds the contents of the parsed row to the
specified variable, and either populates an empty database or appends new data to an existing dataset.

### `LOAD PARQUET` clause syntax

The syntax of the `LOAD PARQUET` clause is:

```cypher
LOAD PARQUET FROM <file-location> AS <variable-name>
```

- TODO: (andi) Disk?
- TODO: (andi) Measure the effect of IN_MEMORY_ANALYTICAL
- TODO: (andi) Measure and/or describe USING PERIODIC COMMIT IMPROVEMENTS
- TODO: (andi) S3 authentication
- TODO: (andi) Describe how are null values handled
- TODO: (andi) Here describe whether it is s3 or not
- `<file-location>` is a string of the location of the Parquet file.
  Without an `s3://` prefix, it refers to a path on the local file system; with an `s3://` prefix, it pulls the file with the specified URI from S3.
  There are no restrictions on where in
  your file system the file can be located, as long as the path is valid (i.e.,
  the file exists). If you are using Docker to run Memgraph, you will need to
  [copy the files from your local directory into the Docker
  container](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container)
  where Memgraph can access them.

* `<variable-name>` is a symbolic name representing the variable to which the
  contents of the parsed row will be bound, enabling access to the row
  contents later in the query. The variable doesn't have to be used in any
  subsequent clause.

### `LOAD PARQUET` clause specificities

When using the `LOAD PARQUET` clause please keep in mind:

- The parser parses values into their appropriate types, so you get the same types as in the Parquet file. Types `BOOL`, `INT8`, `INT16`, `INT32`, `INT64`, `UINT8`, `UINT16`, `UINT32`, `UINT64`,
  `HALF_FLOAT`, `FLOAT`, `DOUBLE`, `STRING`, `LARGE_STRING`, `STRING_VIEW`, `DATE32`, `DATE64`, `TIME32`, `TIME64`, `TIMESTAMP`, `DURATION`, `DECIMAL128`, `DECIMAL256`, `BINARY`, `LARGE_BINARY`, `FIXED_SIZE_BINARY`,
  `LIST` and `MAP` are supported. Unsupported types will be saved as strings in Memgraph.

- **The `LOAD PARQUET` clause is not a standalone clause**, meaning a valid query
  must contain at least one more clause, for example:

  ```cypher
  LOAD PARQUET FROM "/people.parquet" AS row
  CREATE (p:People) SET p += row;
  ```

  In this regard, the following query will throw an exception:

  ```cypher
  LOAD PARQUET FROM "/file.parquet" AS row;
  ```

  **Adding a `MATCH` or `MERGE` clause before `LOAD PARQUET`** allows you to match certain
  entities in the graph before running `LOAD PARQUET`, optimizing the process as
  matched entities do not need to be searched for every row in the Parquet file.

  However, the `MATCH` or `MERGE` clause can be used prior to the `LOAD PARQUET` clause only
  if it returns a single row. Returning multiple rows before calling the
  `LOAD PARQUET` clause will cause a Memgraph runtime error.

- **The `LOAD PARQUET` clause can be used at most once per query**, so queries like
  the one below will throw an exception:

  ```cypher
  LOAD PARQUET FROM "/x.parquet" AS x
  LOAD PARQUET FROM "/y.parquet" AS y
  CREATE (n:A {p1 : x, p2 : y});
  ```

### Increase import speed

The `LOAD PARQUET` clause will create relationships much faster, and consequently
speed up data import, if you [create indexes](/fundamentals/indexes) on nodes or
node properties once you import them:

```cypher
CREATE INDEX ON :Node(id);
```

If the LOAD PARQUET clause is merging data instead of creating it, create indexes
before running the LOAD PARQUET clause.

TODO: (andi) Check that
You can also speed up import if you switch Memgraph to [**analytical storage
mode**](/fundamentals/storage-memory-usage#storage-modes). In the analytical
storage mode there are no ACID guarantees besides manually created snapshots but
it does **increase the import speed up to 6 times with 6 times less memory
consumption**. After import you can switch the storage mode back to
transactional and enable ACID guarantees.

You can switch between modes within the session using the following query:

```cypher
STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
```

If you use `IN_MEMORY_ANALYTICAL` mode and have nodes and relationships stored in
separate PARQUET files, you can run multiple concurrent `LOAD PARQUET` queries to import data even faster.
In order to achieve the best import performance, split your nodes and relationships
files into smaller files and run multiple `LOAD PARQUET` queries in parallel.
The key is to run all `LOAD PARQUET` queries, which create nodes first. After that, run
all `LOAD PARQUET` queries that create relationships.
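To make the ordering concrete, here is a minimal sketch of that pattern, assuming a hypothetical split of the data into two node files and one relationship file, and hypothetical column names (`id`, `from_id`, `to_id`); each query would be issued from its own client session while the instance is in `IN_MEMORY_ANALYTICAL` mode, and all node queries would be allowed to finish before the relationship query starts:

```cypher
// Session 1: create the first half of the nodes (hypothetical file and columns).
LOAD PARQUET FROM "/import/nodes_part_1.parquet" AS row
CREATE (n:Node {id: row.id});

// Session 2, running concurrently: create the second half of the nodes.
LOAD PARQUET FROM "/import/nodes_part_2.parquet" AS row
CREATE (n:Node {id: row.id});

// Only after all node queries have finished, create the relationships.
LOAD PARQUET FROM "/import/relationships.parquet" AS row
MATCH (a:Node {id: row.from_id}), (b:Node {id: row.to_id})
CREATE (a)-[:CONNECTED_TO]->(b);
```

An index on `:Node(id)`, created before the relationship query, keeps the two `MATCH` lookups fast.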
+ + +### Import multiple Parquet files with distinct graph objects + +In this example, the data is split across four files, each file contains nodes +of a single label or relationships of a single type. + + + + {

Download the files

} + + - [`people_nodes.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/people_nodes.parquet) is used to create nodes labeled `:Person`.
The file contains the following data: + ```parquet + id,name,age,city + 100,Daniel,30,London + 101,Alex,15,Paris + 102,Sarah,17,London + 103,Mia,25,Zagreb + 104,Lucy,21,Paris + ``` +- [`restaurants_nodes.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/restaurants_nodes.parquet) is used to create nodes labeled `:Restaurants`.
The file contains the following data: + ```parquet + id,name,menu + 200,Mc Donalds,Fries;BigMac;McChicken;Apple Pie + 201,KFC,Fried Chicken;Fries;Chicken Bucket + 202,Subway,Ham Sandwich;Turkey Sandwich;Foot-long + 203,Dominos,Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust + ``` + +- [`people_relationships.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/people_relationships.parquet) is used to connect people with the `:IS_FRIENDS_WITH` relationship.
The file contains the following data: + ```parquet + first_person,second_person,met_in + 100,102,2014 + 103,101,2021 + 102,103,2005 + 101,104,2005 + 104,100,2018 + 101,102,2017 + 100,103,2001 + ``` +- [`restaurants_relationships.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/restaurants_relationships.parquet) is used to connect people with restaurants using the `:ATE_AT` relationship.
The file contains the following data: + ```parquet + PERSON_ID,REST_ID,liked + 100,200,true + 103,201,false + 104,200,true + 101,202,false + 101,203,false + 101,200,true + 102,201,true + ``` + + {

Check the location of the Parquet files

} + If you are working with Docker, [copy the files from your local directory into + the Docker container](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container) + so that Memgraph can access them. + + {

Import nodes

} + + Each row will be parsed as a map, and the + fields can be accessed using the property lookup syntax (e.g. `id: row.id`). + + The following query will load row by row from the file, and create a new node + for each row with properties based on the parsed row values: + + ```cypher + LOAD PARQUET FROM "/path-to/people_nodes_wh.parquet" AS row + CREATE (n:Person {id: row.id, name: row.name, age: row.age, city: row.city}); + ``` + + In the same manner, the following query will create new nodes for each restaurant: + + ```cypher + LOAD PARQUET FROM "/path-to/restaurants_nodes.parquet" AS row + CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu}); + ``` + + {

Create indexes

} + + Creating an [index](/fundamentals/indexes) on a property used to connect nodes + with relationships, in this case, the `id` property of the `:Person` nodes, + will speed up the import of relationships, especially with large datasets: + + ```cypher + CREATE INDEX ON :Person(id); + ``` + + {

Import relationships

} + The following query will create relationships between the people nodes: + + ```cypher + LOAD PARQUET FROM "/path-to/people_relationships.parquet" AS row + MATCH (p1:Person {id: row.first_person}) + MATCH (p2:Person {id: row.second_person}) + CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2) + SET f.met_in = row.met_in; + ``` + + The following query will create relationships between people and restaurants where they ate: + + ```cypher + LOAD PARQUET FROM "/path-to/restaurants_relationships.parquet" AS row + MATCH (p1:Person {id: row.PERSON_ID}) + MATCH (re:Restaurant {id: row.REST_ID}) + CREATE (p1)-[ate:ATE_AT]->(re) + SET ate.liked = ToBoolean(row.liked); + ``` + + {

Final result

} + Run the following query to see how the imported data looks as a graph: + + ``` + MATCH p=()-[]-() RETURN p; + ``` + + ![](/pages/data-migration/csv/load_csv_restaurants_relationships.png) + +
From b901e76e543f1e103f19af6c4ac8f2a67f3ca7c4 Mon Sep 17 00:00:00 2001 From: Andi Skrgat Date: Mon, 27 Oct 2025 09:10:49 +0100 Subject: [PATCH 2/6] docs: Add more details --- pages/data-migration.mdx | 9 +++++++-- pages/data-migration/_meta.ts | 1 + .../role-based-access-control.mdx | 2 +- pages/help-center/faq.mdx | 4 ++-- pages/index.mdx | 6 +++++- 5 files changed, 16 insertions(+), 6 deletions(-) diff --git a/pages/data-migration.mdx b/pages/data-migration.mdx index c3478a1b4..911a2102f 100644 --- a/pages/data-migration.mdx +++ b/pages/data-migration.mdx @@ -15,7 +15,7 @@ instance. Whether your data is structured in files, relational databases, or other graph databases, Memgraph provides the flexibility to integrate and analyze your data efficiently. -Memgraph supports file system imports like CSV files, offering efficient and +Memgraph supports file system imports like Parquet and CSV files, offering efficient and structured data ingestion. **However, if you want to migrate directly from another data source, you can use the [`migrate` module](/advanced-algorithms/available-algorithms/migrate)** from Memgraph MAGE @@ -31,6 +31,11 @@ In order to learn all the pre-requisites for importing data into Memgraph, check ## File types +### Parquet files + +Parquet files can be imported efficiently from the local disk and from s3:// using the +[LOAD PARQUET clause](/querying/claused/load-parquet). + ### CSV files CSV files provide a simple and efficient way to import tabular data into Memgraph @@ -262,4 +267,4 @@ nonsense or sales pitch, just tech. /> - \ No newline at end of file + diff --git a/pages/data-migration/_meta.ts b/pages/data-migration/_meta.ts index 454863979..fc73210b5 100644 --- a/pages/data-migration/_meta.ts +++ b/pages/data-migration/_meta.ts @@ -1,6 +1,7 @@ export default { "best-practices": "Best practices", "csv": "CSV", + "parquet": "PARQUET", "json": "JSON", "cypherl": "CYPHERL", "migrate-from-neo4j": "Migrate from Neo4j", diff --git a/pages/database-management/authentication-and-authorization/role-based-access-control.mdx b/pages/database-management/authentication-and-authorization/role-based-access-control.mdx index e83499792..c4519e211 100644 --- a/pages/database-management/authentication-and-authorization/role-based-access-control.mdx +++ b/pages/database-management/authentication-and-authorization/role-based-access-control.mdx @@ -159,7 +159,7 @@ of the following commands: | Privilege to enforce [constraints](/fundamentals/constraints). | `CONSTRAINT` | | Privilege to [dump the database](/configuration/data-durability-and-backup#database-dump).| `DUMP` | | Privilege to use [replication](/clustering/replication) queries. | `REPLICATION` | -| Privilege to access files in queries, for example, when using `LOAD CSV` clause. | `READ_FILE` | +| Privilege to access files in queries, for example, when using `LOAD CSV` and `LOAD PARQUET` clauses. | `READ_FILE` | | Privilege to manage [durability files](/configuration/data-durability-and-backup#database-dump). | `DURABILITY` | | Privilege to try and [free memory](/fundamentals/storage-memory-usage#deallocating-memory). | `FREE_MEMORY` | | Privilege to use [trigger queries](/fundamentals/triggers). | `TRIGGER` | diff --git a/pages/help-center/faq.mdx b/pages/help-center/faq.mdx index 943163c43..70e676799 100644 --- a/pages/help-center/faq.mdx +++ b/pages/help-center/faq.mdx @@ -226,11 +226,11 @@ You can migrate from [MySQL](/data-migration/migrate-from-rdbms) or ### What file formats does Memgraph support for import? 
-You can import data from [CSV](/data-migration/csv), +You can import data from [CSV](/data-migration/csv), [PARQUET](/data-migration/parquet) [JSON](/data-migration/json) or [CYPHERL](/data-migration/cypherl) files. CSV files can be imported in on-premise instances using the [LOAD CSV -clause](/data-migration/csv), and JSON files can be imported using a +clause](/data-migration/csv), PARQUET files can be imported using the [LOAD PARQUET](/data-migration/parquet) and JSON files can be imported using a [json_util](/advanced-algorithms/available-algorithms/json_util) module from the MAGE library. On a Cloud instance, data from CSV and JSON files can be imported only from a remote address. diff --git a/pages/index.mdx b/pages/index.mdx index f01b5d965..d04ef148d 100644 --- a/pages/index.mdx +++ b/pages/index.mdx @@ -165,6 +165,10 @@ JSON files, and import data using queries within a CYPHERL file. title="JSON" href="/data-migration/json" /> + - \ No newline at end of file + From cf21b1b5f1fec05332cec1abd362bd8645089297 Mon Sep 17 00:00:00 2001 From: Andi Skrgat Date: Mon, 27 Oct 2025 14:42:04 +0100 Subject: [PATCH 3/6] docs: Add details about LoadParquet clause --- pages/data-migration/best-practices.mdx | 2 +- pages/data-migration/parquet.mdx | 37 ++++++++++++++++++------- pages/help-center/faq.mdx | 6 ++-- pages/querying/query-plan.mdx | 1 + 4 files changed, 32 insertions(+), 14 deletions(-) diff --git a/pages/data-migration/best-practices.mdx b/pages/data-migration/best-practices.mdx index 2d8e742a9..c5384b82a 100644 --- a/pages/data-migration/best-practices.mdx +++ b/pages/data-migration/best-practices.mdx @@ -572,4 +572,4 @@ For more information about `Delta` objects, check the information on the [IN_MEMORY_TRANSACTIONAL storage mode](/fundamentals/storage-memory-usage#in-memory-transactional-storage-mode-default). - \ No newline at end of file + diff --git a/pages/data-migration/parquet.mdx b/pages/data-migration/parquet.mdx index 080612979..be20816cd 100644 --- a/pages/data-migration/parquet.mdx +++ b/pages/data-migration/parquet.mdx @@ -25,17 +25,16 @@ specified variable, populates the database if it is empty or appends new data to The syntax of the `LOAD PARQUET` clause is: ```cypher -LOAD PARQUET FROM AS +LOAD PARQUET FROM ( WITH CONFIG configs=configMap ) ? AS ``` +- TODO: (andi) config_map = {'aws_region': 'region', 'aws_access_key': 'acc_key', 'aws_secret_key': 'secret_key', 'aws_endpoint_url': 'endpoint_url'} - TODO: (andi) Disk? - TODO: (andi) Measure the effect of IN_MEMORY_ANALYTICAL - TODO: (andi) Measure and/or describe USING PERIODIC COMMIT IMPROVEMENTS -- TODO: (andi) S3 authentication - TODO: (andi) Describe how are null values handled -- TODO: (andi) Here describe whether it is s3 or not - `` is a string of the location of the Parquet file.
  Without an `s3://` prefix, it refers to a path on the local file system; with an `s3://` prefix, it pulls the file with the specified URI from the S3-compatible storage.
  There are no restrictions on where in
  your file system the file can be located, as long as the path is valid (i.e.,
  the file exists). If you are using Docker to run Memgraph, you will need to
  [copy the files from your local directory into the Docker
  container](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container)
  where Memgraph can access them.
-* `<variable-name>` is a symbolic name representing the variable to which the
+- `<configMap>` represents an optional configuration map through which you can specify the configuration options `aws_region`, `aws_access_key`, `aws_secret_key` and `aws_endpoint_url`:
+  - `aws_region`: The region in which your S3 service is located.
+  - `aws_access_key`: Access key used to connect to the S3 service.
+  - `aws_secret_key`: Secret key used to connect to the S3 service.
+  - `aws_endpoint_url`: Optional configuration parameter. Can be used to set the URL of an S3-compatible storage.
+- `<variable-name>` is a symbolic name representing the variable to which the
  contents of the parsed row will be bound, enabling access to the row
  contents later in the query. The variable doesn't have to be used in any
  subsequent clause.
@@ -56,6 +60,10 @@ When using the `LOAD PARQUET` clause please keep in mind:
  `HALF_FLOAT`, `FLOAT`, `DOUBLE`, `STRING`, `LARGE_STRING`, `STRING_VIEW`, `DATE32`, `DATE64`, `TIME32`, `TIME64`, `TIMESTAMP`, `DURATION`, `DECIMAL128`, `DECIMAL256`, `BINARY`, `LARGE_BINARY`, `FIXED_SIZE_BINARY`,
  `LIST` and `MAP` are supported. Unsupported types will be saved as strings in Memgraph.

+- Authentication parameters (`aws_region`, `aws_access_key`, `aws_secret_key` and `aws_endpoint_url`) can be provided in the `LOAD PARQUET` query using the `WITH CONFIG` construct, through environment variables
+  (`AWS_REGION`, `AWS_ACCESS_KEY`, `AWS_SECRET_KEY` and `AWS_ENDPOINT_URL`), or through run-time database settings. To set authentication parameters through run-time settings, use the `SET DATABASE SETTING <setting> TO <value>;`
+  query. The keys of these authentication parameters are `aws.access_key`, `aws.region`, `aws.secret_key` and `aws.endpoint_url` (see the example below).
+
- **The `LOAD PARQUET` clause is not a standalone clause**, meaning a valid query
  must contain at least one more clause, for example:

  ```cypher
  LOAD PARQUET FROM "/people.parquet" AS row
  CREATE (p:People) SET p += row;
  ```

  In this regard, the following query will throw an exception:

  ```cypher
  LOAD PARQUET FROM "/file.parquet" AS row;
  ```
@@ -100,12 +108,21 @@ node properties once you import them:
If the LOAD PARQUET clause is merging data instead of creating it, create indexes
before running the LOAD PARQUET clause.
-TODO: (andi) Check that
+
+The construct `USING PERIODIC COMMIT <number-of-rows>` also improves the import speed because
+it optimizes some of the memory allocation patterns. In our benchmarks, this construct
+speeds up the execution by 25% to 35%.
+
+```cypher
+USING PERIODIC COMMIT 1024 LOAD PARQUET FROM "/x.parquet" AS x
+CREATE (n:A) SET n += x;
+```
+
You can also speed up import if you switch Memgraph to [**analytical storage
mode**](/fundamentals/storage-memory-usage#storage-modes). In the analytical
-storage mode there are no ACID guarantees besides manually created snapshots but
-it does **increase the import speed up to 6 times with 6 times less memory
-consumption**. After import you can switch the storage mode back to
+storage mode there are no ACID guarantees besides manually created snapshots.
+After import you can switch the storage mode back to
transactional and enable ACID guarantees.

You can switch between modes within the session using the following query:

```cypher
STORAGE MODE IN_MEMORY_{TRANSACTIONAL|ANALYTICAL};
```

If you use `IN_MEMORY_ANALYTICAL` mode and have nodes and relationships stored in
separate PARQUET files, you can run multiple concurrent `LOAD PARQUET` queries to import data even faster.
In order to achieve the best import performance, split your nodes and relationships
files into smaller files and run multiple `LOAD PARQUET` queries in parallel.
-The key is to run all `LOAD PARQUET` queries, which create nodes first. After that, run
+The key is to run all `LOAD PARQUET` queries that create nodes first. After that, run
all `LOAD PARQUET` queries that create relationships.
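As an illustration of the authentication options described in this patch, credentials can be supplied either per query or once per instance. The bucket name, region, and key values below are placeholders, and the exact map-literal form accepted by `WITH CONFIG` is an assumption based on the `configs=<configMap>` grammar above rather than a verbatim example from the release:

```cypher
// Per query, through the WITH CONFIG construct (placeholder values).
LOAD PARQUET FROM "s3://my-bucket/people_nodes.parquet"
WITH CONFIG configs={aws_region: "eu-west-1", aws_access_key: "MY_ACCESS_KEY", aws_secret_key: "MY_SECRET_KEY"}
AS row
CREATE (n:Person {id: row.id});

// Or once per instance, through run-time database settings.
SET DATABASE SETTING "aws.region" TO "eu-west-1";
SET DATABASE SETTING "aws.access_key" TO "MY_ACCESS_KEY";
SET DATABASE SETTING "aws.secret_key" TO "MY_SECRET_KEY";
```

With the run-time settings (or the corresponding environment variables) in place, the `WITH CONFIG` part can be omitted, since the grammar marks it as optional.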
diff --git a/pages/help-center/faq.mdx b/pages/help-center/faq.mdx index 70e676799..a7674c480 100644 --- a/pages/help-center/faq.mdx +++ b/pages/help-center/faq.mdx @@ -212,11 +212,11 @@ us](https://memgraph.com/enterprise-trial) for more information. ### What is the fastest way to import data into Memgraph? -Currently, the fastest way to import data is from a CSV file with a [LOAD CSV -clause](/data-migration/csv). Check out the [best practices for importing +Currently, the fastest way to import data is from a Parquet file with a [LOAD PARQUET +clause](/data-migration/parquet). Check out the [best practices for importing data](/data-migration/best-practices). -[Other import methods](/data-migration) include importing data from JSON and CYPHERL files, +[Other import methods](/data-migration) include importing data from CSV, JSON and CYPHERL files, migrating from relational databases, or connecting to a data stream. ### How to import data from MySQL or PostgreSQL? diff --git a/pages/querying/query-plan.mdx b/pages/querying/query-plan.mdx index 532867e67..9ad3ae3a2 100644 --- a/pages/querying/query-plan.mdx +++ b/pages/querying/query-plan.mdx @@ -241,6 +241,7 @@ The following table lists all the operators currently supported by Memgraph: | `IndexedJoin` | Performs an indexed join of the input from its two input branches. | | `Limit` | Limits certain rows from the pull chain. | | `LoadCsv` | Loads CSV file in order to import files into the database. | +| `LoadParquet` | Loads Parqet file in order to import files into the database. | | `Merge` | Applies merge on the input it received. | | `Once` | Forms the beginning of an operator chain with "only once" semantics. The operator will return false on subsequent pulls. | | `Optional` | Performs optional matching. | From dd6a267b9a4def5353fc7c30bf6e59e159761db7 Mon Sep 17 00:00:00 2001 From: Andi Skrgat Date: Tue, 28 Oct 2025 09:31:42 +0100 Subject: [PATCH 4/6] docs: Remove TODOs --- pages/data-migration/parquet.mdx | 5 ----- 1 file changed, 5 deletions(-) diff --git a/pages/data-migration/parquet.mdx b/pages/data-migration/parquet.mdx index be20816cd..1d17d41bc 100644 --- a/pages/data-migration/parquet.mdx +++ b/pages/data-migration/parquet.mdx @@ -28,11 +28,6 @@ The syntax of the `LOAD PARQUET` clause is: LOAD PARQUET FROM ( WITH CONFIG configs=configMap ) ? AS ``` -- TODO: (andi) config_map = {'aws_region': 'region', 'aws_access_key': 'acc_key', 'aws_secret_key': 'secret_key', 'aws_endpoint_url': 'endpoint_url'} -- TODO: (andi) Disk? -- TODO: (andi) Measure the effect of IN_MEMORY_ANALYTICAL -- TODO: (andi) Measure and/or describe USING PERIODIC COMMIT IMPROVEMENTS -- TODO: (andi) Describe how are null values handled - `` is a string of the location of the Parquet file.
Without a s3:// prefix, it refers to a path on the local and with s3:// prefix, it pulls the file with specified URI from the S3-compatible storage. There are no restrictions on where in From a8d4894f0c96df7b31b69f6fdf614179ddff3e9d Mon Sep 17 00:00:00 2001 From: Andi Skrgat Date: Wed, 29 Oct 2025 10:06:29 +0100 Subject: [PATCH 5/6] docs: Add command-line args and runtime flags --- pages/database-management/configuration.mdx | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/pages/database-management/configuration.mdx b/pages/database-management/configuration.mdx index 236ed476c..74accb95b 100644 --- a/pages/database-management/configuration.mdx +++ b/pages/database-management/configuration.mdx @@ -318,6 +318,10 @@ fallback to the value of the command-line argument. | hops_limit_partial_results | If set to `true`, partial results are returned when the hops limit is reached. If set to `false`, an exception is thrown when the hops limit is reached. The default value is `true`. | yes | | timezone | IANA timezone identifier string setting the instance's timezone. | yes | | storage.snapshot.interval | Define periodic snapshot schedule via cron expression ([crontab](https://crontab.guru/) format, an [Enterprise feature](/database-management/enabling-memgraph-enterprise)) or as a period in seconds. Set to empty string to disable. | no | +| aws.region | AWS region in which your S3 service is located. | yes | +| aws.access_key | Access key used to READ the file from S3. | yes | +| aws.secret_key | Secret key used to READ the file from S3. | yes | +| aws.endpoint_url | URL on which S3 can be accessed (if using some other S3-compatible storage). | yes | All settings can be fetched by calling the following query: @@ -481,6 +485,19 @@ connections in Memgraph. | `--stream-transaction-retry-interval=500` | The interval to wait (measured in milliseconds) before retrying to execute again a conflicting transaction. | `[uint32]` | +### AWS + +This section contains the list of flags that are used when connecting to S3-compatible storage. + + +| Flag | Description | Type | +|--------------------------------------------|-------------------------------------------------------------------------------------------------------------|------------| +| `--aws-region` | AWS region in which your S3 service is located. | `[string]` | +| `--aws-access-key` | Access key used to READ the file from S3. | `[string]` | +| `--aws-secret-key` | Secret key used to READ the file from S3. | `[string]` | +| `--aws-endpoint-url` | URL on which S3 can be accessed (if using some other S3-compatible storage). | `[string]` | + + ### Other This section contains the list of all other relevant flags used within Memgraph. From 40ab6644f7113aa5cb86faa48961d2cb2c34f2cc Mon Sep 17 00:00:00 2001 From: Andi Skrgat Date: Thu, 30 Oct 2025 09:45:01 +0100 Subject: [PATCH 6/6] docs: Add Parquet example --- pages/data-migration/parquet.mdx | 25 ++++++++++--------------- 1 file changed, 10 insertions(+), 15 deletions(-) diff --git a/pages/data-migration/parquet.mdx b/pages/data-migration/parquet.mdx index 1d17d41bc..39d0b39c4 100644 --- a/pages/data-migration/parquet.mdx +++ b/pages/data-migration/parquet.mdx @@ -141,9 +141,9 @@ of a single label or relationships of a single type. - {

Download the files

} + {

Parquet files

} - - [`people_nodes.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/people_nodes.parquet) is used to create nodes labeled `:Person`.
The file contains the following data: + - [`people_nodes.parquet`](s3://download.memgraph.com/asset/docs/people_nodes.parquet) is used to create nodes labeled `:Person`.
The file contains the following data: ```parquet id,name,age,city 100,Daniel,30,London @@ -152,7 +152,7 @@ of a single label or relationships of a single type. 103,Mia,25,Zagreb 104,Lucy,21,Paris ``` -- [`restaurants_nodes.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/restaurants_nodes.parquet) is used to create nodes labeled `:Restaurants`.
The file contains the following data: +- [`restaurants_nodes.parquet`](s3://download.memgraph.com/asset/docs/restaurants_nodes.parquet) is used to create nodes labeled `:Restaurants`.
The file contains the following data: ```parquet id,name,menu 200,Mc Donalds,Fries;BigMac;McChicken;Apple Pie @@ -161,7 +161,7 @@ of a single label or relationships of a single type. 203,Dominos,Pepperoni Pizza;Double Dish Pizza;Cheese filled Crust ``` -- [`people_relationships.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/people_relationships.parquet) is used to connect people with the `:IS_FRIENDS_WITH` relationship.
The file contains the following data: +- [`people_relationships.parquet`](s3://download.memgraph.com/asset/docs/people_relationships.parquet) is used to connect people with the `:IS_FRIENDS_WITH` relationship.
The file contains the following data: ```parquet first_person,second_person,met_in 100,102,2014 @@ -172,7 +172,7 @@ of a single label or relationships of a single type. 101,102,2017 100,103,2001 ``` -- [`restaurants_relationships.parquet`](https://public-assets.memgraph.com/import-data/load-csv-cypher/multiple-types-nodes/restaurants_relationships.parquet) is used to connect people with restaurants using the `:ATE_AT` relationship.
The file contains the following data: +- [`restaurants_relationships.parquet`](s3://download.memgraph.com/asset/docs/restaurants_relationships.parquet) is used to connect people with restaurants using the `:ATE_AT` relationship.
The file contains the following data: ```parquet PERSON_ID,REST_ID,liked 100,200,true @@ -184,28 +184,23 @@ of a single label or relationships of a single type. 102,201,true ``` - {

Check the location of the Parquet files

} - If you are working with Docker, [copy the files from your local directory into - the Docker container](/getting-started/first-steps-with-docker#copy-files-from-and-to-a-docker-container) - so that Memgraph can access them. - {

Import nodes

} Each row will be parsed as a map, and the - fields can be accessed using the property lookup syntax (e.g. `id: row.id`). + fields can be accessed using the property lookup syntax (e.g. `id: row.id`). Files can be imported directly from s3 or can be downloaded and then accessed from the local disk. The following query will load row by row from the file, and create a new node for each row with properties based on the parsed row values: ```cypher - LOAD PARQUET FROM "/path-to/people_nodes_wh.parquet" AS row + LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/people_nodes.parquet" AS row CREATE (n:Person {id: row.id, name: row.name, age: row.age, city: row.city}); ``` In the same manner, the following query will create new nodes for each restaurant: ```cypher - LOAD PARQUET FROM "/path-to/restaurants_nodes.parquet" AS row + LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/restaurants_nodes.parquet" AS row CREATE (n:Restaurant {id: row.id, name: row.name, menu: row.menu}); ``` @@ -223,7 +218,7 @@ of a single label or relationships of a single type. The following query will create relationships between the people nodes: ```cypher - LOAD PARQUET FROM "/path-to/people_relationships.parquet" AS row + LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/people_relationships.parquet" AS row MATCH (p1:Person {id: row.first_person}) MATCH (p2:Person {id: row.second_person}) CREATE (p1)-[f:IS_FRIENDS_WITH]->(p2) @@ -233,7 +228,7 @@ of a single label or relationships of a single type. The following query will create relationships between people and restaurants where they ate: ```cypher - LOAD PARQUET FROM "/path-to/restaurants_relationships.parquet" AS row + LOAD PARQUET FROM "s3://download.memgraph.com/asset/docs/restaurants_relationships.parquet" AS row MATCH (p1:Person {id: row.PERSON_ID}) MATCH (re:Restaurant {id: row.REST_ID}) CREATE (p1)-[ate:ATE_AT]->(re)