21.4.14 ndb_import — Import CSV Data Into NDB
ndb_import imports CSV-formatted data, such as that produced by mysqldump --tab
, directly into NDB
using the NDB API. ndb_import requires a connection to an NDB management server (ndb_mgmd) to function; it does not require a connection to a MySQL Server.
Usage
ndb_import db_name file_name options
ndb_import requires two arguments. db_name
is the name of the database where the table into which to import the data is found; file_name
is the name of the CSV file from which to read the data; this must include the path to this file if it is not in the current directory. The name of the file must match that of the table; the file's extension, if any, is not taken into consideration. Options supported by ndb_import include those for specifying field separators, escapes, and line terminators, and are described later in this section. ndb_import must be able to connect to an NDB Cluster management server; for this reason, there must be an unused [api]
slot in the cluster config.ini
file.
To duplicate an existing table that uses a different storage engine, such as InnoDB
, as an NDB
table, use the mysql client to perform a SELECT INTO OUTFILE
statement to export the existing table to a CSV file, then to execute a CREATE TABLE LIKE
statement to create a new table having the same structure as the existing table, then perform ALTER TABLE ... ENGINE=NDB
on the new table; after this, from the system shell, invoke ndb_import to load the data into the new NDB
table. For example, an existing InnoDB
table named myinnodb_table
in a database named myinnodb
can be exported into an NDB
table named myndb_table
in a database named myndb
as shown here, assuming that you are already logged in as a MySQL user with the appropriate privileges:
In the mysql client:
mysql> USE myinnodb; mysql> SELECT * INTO OUTFILE '/tmp/myndb_table.csv' > FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' > LINES TERMINATED BY '\n' > FROM myinnodbtable; mysql> CREATE DATABASE myndb; mysql> USE myndb; mysql> CREATE TABLE myndb_table LIKE myinnodb.myinnodb_table; mysql> ALTER TABLE myndb_table ENGINE=NDB; mysql> EXIT; Bye shell>
Once the target database and table have been created, a running mysqld is no longer required. You can stop it using mysqladmin shutdown or another method before proceeding, if you wish.
In the system shell:
# if you are not already in the MySQL bin directory: shell> cd path-to-mysql-bin-dir shell> ndb_import myndb /tmp/myndb_table.csv --fields-optionally-enclosed-by='"' \ --fields-terminated-by="," --fields-escaped-by='\\'
The output should resemble what is shown here:
job-1 import myndb.myndb_table from /tmp/myndb_table.csv job-1 [running] import myndb.myndb_table from /tmp/myndb_table.csv job-1 [success] import myndb.myndb_table from /tmp/myndb_table.csv job-1 imported 19984 rows in 0h0m9s at 2277 rows/s jobs summary: defined: 1 run: 1 with success: 1 with failure: 0 shell>
The following table includes options that are specific to ndb_import. Additional descriptions follow the table. For options common to most NDB Cluster programs (including ndb_import), see Section 21.4.32, “Options Common to NDB Cluster Programs — Options Common to NDB Cluster Programs”.
Table 21.250 Command-line options for the ndb_import program
Format | Description | Added, Deprecated, or Removed |
---|---|---|
Dump core on any fatal error; used for debugging | ADDED: NDB 7.6.2 |
|
For table with hidden PK, specify autoincrement increment. See mysqld | ADDED: NDB 7.6.2 |
|
For table with hidden PK, specify autoincrement offset. See mysqld | ADDED: NDB 7.6.2 |
|
For table with hidden PK, specify number of autoincrement values that are prefetched. See mysqld | ADDED: NDB 7.6.2 |
|
Number of cluster connections to create | ADDED: NDB 7.6.2 |
|
When job fails, continue to next job | ADDED: NDB 7.6.2 |
|
Number of threads, per data node, executing database operations | ADDED: NDB 7.6.2 |
|
Error insert type, for testing purposes; use "list" to obtain all possible values | ADDED: NDB 7.6.2 |
|
Error insert delay in milliseconds; random variation is added | ADDED: NDB 7.6.2 |
|
Same as FIELDS ENCLOSED BY option for LOAD DATA statements. For CSV input this is same as using --fields-optionally-enclosed-by | ADDED: NDB 7.6.2 |
|
Same as FIELDS ESCAPED BY option for LOAD DATA statements | ADDED: NDB 7.6.2 |
|
Same as FIELDS OPTIONALLY ENCLOSED BY option for LOAD DATA statements | ADDED: NDB 7.6.2 |
|
Same as FIELDS TERMINATED BY option for LOAD DATA statements | ADDED: NDB 7.6.2 |
|
Number of milliseconds to sleep waiting for more to do | ADDED: NDB 7.6.2 |
|
Number of times to re-try before idlesleep | ADDED: NDB 7.6.2 |
|
Ignore first # lines in input file. Used to skip a non-data header | ADDED: NDB 7.6.2 |
|
Input type: random or csv | ADDED: NDB 7.6.2 |
|
Number of threads processing input. Must be 2 or more if --input-type is csv | ADDED: NDB 7.6.2 |
|
State files (except non-empty *.rej files) are normally removed on job completion. Using this option causes all state files to be preserved instead | ADDED: NDB 7.6.4 |
|
Same as LINES TERMINATED BY option for LOAD DATA statements | ADDED: NDB 7.6.2 |
|
Import only this number of input data rows; default is 0, which imports all rows | ADDED: NDB 7.6.2 |
|
Periodically print status of running job if something has changed (status, rejected rows, temporary errors). Value 0 disables. Value 1 prints any change seen. Higher values reduce status printing exponentially up to some pre-defined limit | ADDED: NDB 7.6.2 |
|
Run database operations as batches, in single transactions | ADDED: NDB 7.6.2 |
|
Do not use distribution key hint to select data node (TC) | ADDED: NDB 7.6.2 |
|
A db execution batch is a set of transactions and operations sent to NDB kernel. This option limits NDB operations (including blob operations) in a db execution batch. Therefore it also limits number of asynch transactions. Value 0 is not valid | ADDED: NDB 7.6.2 |
|
Limit bytes in execution batch (default 0 = no limit) | ADDED: NDB 7.6.2 |
|
Output type: ndb is default, null used for testing | ADDED: NDB 7.6.2 |
|
Number of threads processing output or relaying database operations | ADDED: NDB 7.6.2 |
|
Align I/O buffers to given size | ADDED: NDB 7.6.2 |
|
Size of I/O buffers as multiple of page size. CSV input worker allocates a double-sized buffer | ADDED: NDB 7.6.2 |
|
Timeout per poll for completed asynchonous transactions; polling continues until all polls are completed, or error occurs | ADDED: NDB 7.6.2 |
|
Limit number of rejected rows (rows with permanent error) in data load. Default is 0 which means that any rejected row causes a fatal error. The row exceeding the limit is also added to *.rej | ADDED: NDB 7.6.2 |
|
If job aborted (temporary error, user interrupt), resume with rows not yet processed | ADDED: NDB 7.6.2 |
|
Limit rows in row queues (default 0 = no limit); must be 1 or more if --input-type is random | ADDED: NDB 7.6.2 |
|
Limit bytes in row queues (0 = no limit) | ADDED: NDB 7.6.2 |
|
Where to write state files; currect directory is default | ADDED: NDB 7.6.2 |
|
Save performance related options and internal statistics in *.sto and *.stt files. These files are kept on successful completion even if --keep-state is not used | ADDED: NDB 7.6.4 |
|
Number of milliseconds to sleep between temporary errors | ADDED: NDB 7.6.2 |
|
Number of times a transaction can fail due to a temporary error, per execution batch; 0 means any temporary error is fatal. Such errors do not cause any rows to be written to .rej file | ADDED: NDB 7.6.2 |
|
Enable verbose output | ADDED: NDB 7.6.2 |
-
Property Value Command-Line Format --abort-on-error
Introduced 5.7.18-ndb-7.6.2 Type Boolean Default Value FALSE
Dump core on any fatal error; used for debugging only.
-
Property Value Command-Line Format --ai-increment=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1
Minimum Value 1
Maximum Value 4294967295
For a table with a hidden primary key, specify the autoincrement increment, like the
auto_increment_increment
system variable does in the MySQL Server. -
Property Value Command-Line Format --ai-offset=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1
Minimum Value 1
Maximum Value 4294967295
For a table with hidden primary key, specify the autoincrement offset. Similar to the
auto_increment_offset
system variable. -
Property Value Command-Line Format --ai-prefetch-sz=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1024
Minimum Value 1
Maximum Value 4294967295
For a table with a hidden primary key, specify the number of autoincrement values that are prefetched. Behaves like the
ndb_autoincrement_prefetch_sz
system variable does in the MySQL Server. -
Property Value Command-Line Format --connections=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1
Minimum Value 1
Maximum Value 4294967295
Number of cluster connections to create.
-
Property Value Command-Line Format --continue
Introduced 5.7.18-ndb-7.6.2 Type Boolean Default Value FALSE
When a job fails, continue to the next job.
-
Property Value Command-Line Format --db-workers=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value (≥ 5.7.20-ndb-7.6.4) 4
Default Value (≥ 5.7.18-ndb-7.6.2, ≤ 5.7.18-ndb-7.6.3) 1
Minimum Value 1
Maximum Value 4294967295
Number of threads, per data node, executing database operations.
-
Property Value Command-Line Format --errins-type=name
Introduced 5.7.18-ndb-7.6.2 Type Enumeration Default Value [none]
Valid Values stopjob
stopall
sighup
sigint
list
Error insert type; use
list
as thename
value to obtain all possible values. This option is used for testing purposes only. -
Property Value Command-Line Format --errins-delay=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1000
Minimum Value 0
Maximum Value 4294967295
Unit ms
Error insert delay in milliseconds; random variation is added. This option is used for testing purposes only.
-
Property Value Command-Line Format --fields-enclosed-by=char
Introduced 5.7.18-ndb-7.6.2 Type String Default Value [none]
This works in the same way as the
FIELDS ENCLOSED BY
option does for theLOAD DATA
statement, specifying a character to be interpeted as quoting field values. For CSV input, this is the same as--fields-optionally-enclosed-by
. -
Property Value Command-Line Format --fields-escaped-by=name
Introduced 5.7.18-ndb-7.6.2 Type String Default Value \
Specify an escape character in the same way as the
FIELDS ESCAPED BY
option does for the SQLLOAD DATA
statement. --fields-optionally-enclosed-by
=char
Property Value Command-Line Format --fields-optionally-enclosed-by=char
Introduced 5.7.18-ndb-7.6.2 Type String Default Value [none]
This works in the same way as the
FIELDS OPTIONALLY ENCLOSED BY
option does for theLOAD DATA
statement, specifying a character to be interpeted as optionally quoting field values. For CSV input, this is the same as--fields-enclosed-by
.-
Property Value Command-Line Format --fields-terminated-by=char
Introduced 5.7.18-ndb-7.6.2 Type String Default Value \t
This works in the same way as the
FIELDS TERMINATED BY
option does for theLOAD DATA
statement, specifying a character to be interpeted as the field separator. -
Property Value Command-Line Format --idlesleep=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1
Minimum Value 1
Maximum Value 4294967295
Unit ms
Number of milliseconds to sleep waiting for more work to perform.
-
Property Value Command-Line Format --idlespin=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 0
Minimum Value 0
Maximum Value 4294967295
Number of times to retry before sleeping.
-
Property Value Command-Line Format --ignore-lines=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 0
Minimum Value 0
Maximum Value 4294967295
Cause ndb_import to ignore the first
#
lines of the input file. This can be employed to skip a file header that does not contain any data. -
Property Value Command-Line Format --input-type=name
Introduced 5.7.18-ndb-7.6.2 Type Enumeration Default Value csv
Valid Values random
csv
Set the type of input type. The default is
csv
;random
is intended for testing purposes only. . -
Property Value Command-Line Format --input-workers=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value (≥ 5.7.20-ndb-7.6.4) 4
Default Value (≥ 5.7.18-ndb-7.6.2, ≤ 5.7.18-ndb-7.6.3) 2
Minimum Value 1
Maximum Value 4294967295
Set the number of threads processing input.
-
Property Value Command-Line Format --keep-state
Introduced 5.7.20-ndb-7.6.4 Type Boolean Default Value false
By default, ndb_import removes all state files (except non-empty
*.rej
files) when it completes a job. Specify this option (nor argument is required) to force the program to retain all state files instead. -
Property Value Command-Line Format --lines-terminated-by=name
Introduced 5.7.18-ndb-7.6.2 Type String Default Value \n
This works in the same way as the
LINES TERMINATED BY
option does for theLOAD DATA
statement, specifying a character to be interpeted as end-of-line. -
Property Value Command-Line Format --log-level=#
Type Integer Default Value 0
Minimum Value 0
Maximum Value 2
Performs internal logging at the given level. This option is intended primarily for internal and development use.
In debug builds of NDB only, the logging level can be set using this option to a maximum of 4.
-
Property Value Command-Line Format --max-rows=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 0
Minimum Value 0
Maximum Value 4294967295
Unit bytes
Import only this number of input data rows; the default is 0, which imports all rows.
-
Property Value Command-Line Format --monitor=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 2
Minimum Value 0
Maximum Value 4294967295
Unit bytes
Periodically print the status of a running job if something has changed (status, rejected rows, temporary errors). Set to 0 to disable this reporting. Setting to 1 prints any change that is seen. Higher values reduce the frequency of this status reporting.
-
Property Value Command-Line Format --no-asynch
Introduced 5.7.18-ndb-7.6.2 Type Boolean Default Value FALSE
Run database operations as batches, in single transactions.
-
Property Value Command-Line Format --no-hint
Introduced 5.7.18-ndb-7.6.2 Type Boolean Default Value FALSE
Do not use distribution key hinting to select a data node.
-
Property Value Command-Line Format --opbatch=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 256
Minimum Value 1
Maximum Value 4294967295
Unit bytes
Set a limit on the number of operations (including blob operations), and thus the number of asynchronous transactions, per execution batch.
-
Property Value Command-Line Format --opbytes=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 0
Minimum Value 0
Maximum Value 4294967295
Unit bytes
Set a limit on the number of bytes per execution batch. Use 0 for no limit.
-
Property Value Command-Line Format --output-type=name
Introduced 5.7.18-ndb-7.6.2 Type Enumeration Default Value ndb
Valid Values null
Set the output type.
ndb
is the default.null
is used only for testing. -
Property Value Command-Line Format --output-workers=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 2
Minimum Value 1
Maximum Value 4294967295
Set the number of threads processing output or relaying database operations.
-
Property Value Command-Line Format --pagesize=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 4096
Minimum Value 1
Maximum Value 4294967295
Unit bytes
Align I/O buffers to the given size.
-
Property Value Command-Line Format --pagecnt=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 64
Minimum Value 1
Maximum Value 4294967295
Set the size of I/O buffers as multiple of page size. The CSV input worker allocates buffer that is doubled in size.
-
Property Value Command-Line Format --polltimeout=#
Introduced 5.7.18-ndb-7.6.2 Type Integer Default Value 1000
Minimum Value 1
Maximum Value 4294967295