MySQL数据库 Load Data 多种用法

目录
  •  mysql load data 的多种用法
    • 一、load 基本背景
    • 二、load 基础参数
    • 三、load 示例数据及示例表结构
    • 四、load 场景示例
    • 五、load 总结

 mysql load data 的多种用法

一、load 基本背景

我们在数据库运维过程中难免会涉及到需要对文本数据进行处理,并导入到数据库中,本文整理了一些导入导出时常见的场景进行示例演示。

二、load 基础参数

文章后续示例均使用以下命令导出的 csv 格式样例数据(以 , 逗号做分隔符,以 ” 双引号作为界定符)

-- 导出基础参数
select * into outfile '/data/mysql/3306/tmp/employees.txt'
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
from employees.employees limit 10;

-- 导入基础参数
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
...

三、load 示例数据及示例表结构

以下为示例数据,表结构及对应关系信息

-- 导出的文件数据内容
[root@10-186-61-162 tmp]# cat employees.txt
"10001","1953-09-02","georgi","facello","m","1986-06-26"
"10002","1964-06-02","bezalel","simmel","f","1985-11-21"
"10003","1959-12-03","parto","bamford","m","1986-08-28"
"10004","1954-05-01","chirstian","koblick","m","1986-12-01"
"10005","1955-01-21","kyoichi","maliniak","m","1989-09-12"
"10006","1953-04-20","anneke","preusig","f","1989-06-02"
"10007","1957-05-23","tzvetan","zielinski","f","1989-02-10"
"10008","1958-02-19","saniya","kalloufi","m","1994-09-15"
"10009","1952-04-19","sumant","peac","f","1985-02-18"
"10010","1963-06-01","duangkaew","piveteau","f","1989-08-24"

-- 示例表结构
sql > desc demo.emp;
+-------------+---------------+------+-----+---------+-------+
| field       | type          | null | key | default | extra |
+-------------+---------------+------+-----+---------+-------+
| emp_no      | int           | no   | pri | null    |       |
| birth_date  | date          | no   |     | null    |       |
| first_name  | varchar(16)   | no   |     | null    |       |
| last_name   | varchar(16)   | no   |     | null    |       |
| fullname    | varchar(32)   | yes  |     | null    |       | -- 表新增字段,导出数据文件中不存在
| gender      | enum('m','f') | no   |     | null    |       |
| hire_date   | date          | no   |     | null    |       |
| modify_date | datetime      | yes  |     | null    |       | -- 表新增字段,导出数据文件中不存在
| delete_flag | char(1)       | yes  |     | null    |       | -- 表新增字段,导出数据文件中不存在
+-------------+---------------+------+-----+---------+-------+

-- 导出的数据与字段对应关系
emp_no  birth_date    first_name   last_name    gender  hire_date
"10001"  "1953-09-02"  "georgi"     "facello"    "m"    "1986-06-26"
"10002"  "1964-06-02"  "bezalel"    "simmel"     "f"    "1985-11-21"
"10003"  "1959-12-03"  "parto"      "bamford"    "m"    "1986-08-28"
"10004"  "1954-05-01"  "chirstian"  "koblick"    "m"    "1986-12-01"
"10005"  "1955-01-21"  "kyoichi"    "maliniak"   "m"    "1989-09-12"
"10006"  "1953-04-20"  "anneke"     "preusig"    "f"    "1989-06-02"
"10007"  "1957-05-23"  "tzvetan"    "zielinski"  "f"    "1989-02-10"
"10008"  "1958-02-19"  "saniya"     "kalloufi"   "m"    "1994-09-15"
"10009"  "1952-04-19"  "sumant"     "peac"       "f"    "1985-02-18"
"10010"  "1963-06-01"  "duangkaew"  "piveteau"   "f"    "1989-08-24"

四、load 场景示例

场景1. load 文件中的字段比数据表中的字段多

只需要文本文件中部分数据导入到数据表中

-- 临时创建2个字段的表结构
sql > create table emp_tmp select emp_no,hire_date from emp;
sql > desc emp_tmp;
+-----------+------+------+-----+---------+-------+
| field     | type | null | key | default | extra |
+-----------+------+------+-----+---------+-------+
| emp_no    | int  | no   |     | null    |       |
| hire_date | date | no   |     | null    |       |
+-----------+------+------+-----+---------+-------+

-- 导入数据语句
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp_tmp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@c1,@c2,@c3,@c4,@c5,@c6) -- 该部分对应employees.txt文件中6列数据
-- 只对导出数据中指定的2个列与表中字段做匹配,mapping关系指定的顺序不影响导入结果
set hire_date=@c6,
    emp_no=@c1; 

-- 导入数据结果示例
sql > select * from emp_tmp;
+--------+------------+
| emp_no | hire_date  |
+--------+------------+
|  10001 | 1986-06-26 |
|  10002 | 1985-11-21 |
|  10003 | 1986-08-28 |
|  10004 | 1986-12-01 |
|  10005 | 1989-09-12 |
|  10006 | 1989-06-02 |
|  10007 | 1989-02-10 |
|  10008 | 1994-09-15 |
|  10009 | 1985-02-18 |
|  10010 | 1989-08-24 |
+--------+------------+
10 rows in set (0.0016 sec)

场景 2. load 文件中的字段比数据表中的字段少

表字段不仅包含文本文件中所有数据,还包含了额外的字段

-- 导入数据语句
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@c1,@c2,@c3,@c4,@c5,@c6) -- 该部分对应employees.txt文件中6列数据
-- 将文件中的字段与表中字段做mapping对应,表中多出的字段不做处理
set emp_no=@c1,
   birth_date=@c2,
   first_name=@c3,
   last_name=@c4,
   gender=@c5,
   hire_date=@c6;

 

场景3. load 生成自定义字段数据

从场景 2 的验证可以看到,emp 表中新增的字段 fullname,modify_date,delete_flag 字段在导入时并未做处理,被置为了 null 值,如果需要对其进行处理,可在 load 时通过 mysql支持的函数 或给定 固定值 自行定义数据,对于文件中存在的字段也可做函数处理,结合导入导出,实现简单的 etl 功能,如下所示:

-- 导入数据语句
load data infile '/data/mysql/3306/tmp/employees.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@c1,@c2,@c3,@c4,@c5,@c6)              -- 该部分对应employees.txt文件中6列数据

-- 以下部分明确对表中字段与数据文件中的字段做mapping关系,不存在的数据通过函数处理生成(也可设置为固定值)
set emp_no=@c1,
   birth_date=@c2,
   first_name=upper(@c3),              -- 将导入的数据转为大写
   last_name=lower(@c4),               -- 将导入的数据转为小写
   fullname=concat(first_name,' ',last_name),    -- 对first_name和last_name做拼接
   gender=@c5,
   hire_date=@c6 ,
   modify_date=now(),                 -- 生成当前时间数据
   delete_flag=if(hire_date<'1988-01-01','y','n'); -- 对需要生成的值基于某一列做条件运算

场景4. load 定长数据

定长数据的特点如下所示,可以使用函数取出字符串中固定长度来生成指定列数据

sql > select 
    c1 as sample_data,
    substr(c1,1,3)  as c1,
    substr(c1,4,3)  as c2,
    substr(c1,7,2)  as c3,
    substr(c1,9,5)  as c4,
    substr(c1,14,3) as c5,
    substr(c1,17,3) as c6 from t1
    
*************************** 1. row ***************************
sample_data: abc余振兴cdmysqlefg数据库
         c1: abc
         c2: 余振兴
         c3: cd
         c4: mysql
         c5: efg
         c6: 数据库

定长数据导入需要明确每列数据占用的字符个数,以下直接使用 rpad 对现有的表数据填充空格的方式生成定长数据用作示例使用

-- 生成定长数据
sql > select 
    concat(rpad(emp_no,10,' '),
          rpad(birth_date,19,' '),
          rpad(first_name,14,' '),
          rpad(last_name,16,' '),
          rpad(gender,2,' '),
          rpad(hire_date,19,' ')) as fixed_length_data 
      from employees.employees limit 10;

+----------------------------------------------------------------------------------+
| fixed_length_data                                                                |
+----------------------------------------------------------------------------------+
| 10001     1953-09-02         georgi        facello         m 1986-06-26          |
| 10002     1964-06-02         bezalel       simmel          f 1985-11-21          |
| 10003     1959-12-03         parto         bamford         m 1986-08-28          |
| 10004     1954-05-01         chirstian     koblick         m 1986-12-01          |
| 10005     1955-01-21         kyoichi       maliniak        m 1989-09-12          |
| 10006     1953-04-20         anneke        preusig         f 1989-06-02          |
| 10007     1957-05-23         tzvetan       zielinski       f 1989-02-10          |
| 10008     1958-02-19         saniya        kalloufi        m 1994-09-15          |
| 10009     1952-04-19         sumant        peac            f 1985-02-18          |
| 10010     1963-06-01         duangkaew     piveteau        f 1989-08-24          |
+----------------------------------------------------------------------------------+

-- 导出定长数据
select 
    concat(rpad(emp_no,10,' '),
          rpad(birth_date,19,' '),
          rpad(first_name,14,' '),
          rpad(last_name,16,' '),
          rpad(gender,2,' '),
          rpad(hire_date,19,' ')) as fixed_length_data 
into outfile '/data/mysql/3306/tmp/employees_fixed.txt'
character set utf8mb4
lines terminated by '\n'
from employees.employees limit 10;

-- 导出数据示例
[root@10-186-61-162 tmp]# cat employees_fixed.txt
10001     1953-09-02         georgi        facello         m 1986-06-26
10002     1964-06-02         bezalel       simmel          f 1985-11-21
10003     1959-12-03         parto         bamford         m 1986-08-28
10004     1954-05-01         chirstian     koblick         m 1986-12-01
10005     1955-01-21         kyoichi       maliniak        m 1989-09-12
10006     1953-04-20         anneke        preusig         f 1989-06-02
10007     1957-05-23         tzvetan       zielinski       f 1989-02-10
10008     1958-02-19         saniya        kalloufi        m 1994-09-15
10009     1952-04-19         sumant        peac            f 1985-02-18
10010     1963-06-01         duangkaew     piveteau        f 1989-08-24

-- 导入定长数据
load data infile '/data/mysql/3306/tmp/employees_fixed.txt'
replace into table demo.emp
character set utf8mb4
fields terminated by ','
enclosed by '"'
lines terminated by '\n'
(@row)  -- 对一行数据定义为一个整体
set emp_no   = trim(substr(@row,1,10)),-- 使用substr取前10个字符,并去除头尾空格数据
   birth_date = trim(substr(@row,11,19)),-- 后续字段以此类推
   first_name = trim(substr(@row,30,14)),
   last_name  = trim(substr(@row,44,16)),
   fullname  = concat(first_name,' ',last_name),  -- 对first_name和last_name做拼接
   gender   = trim(substr(@row,60,2)),
   hire_date  = trim(substr(@row,62,19)),
   modify_date = now(),
   delete_flag = if(hire_date<'1988-01-01','y','n'); -- 对需要生成的值基于某一列做条件运算

五、load 总结

1.默认情况下导入的顺序以文本文件 列-从左到右,行-从上到下 的顺序导入

2.如果表结构和文本数据不一致,建议将文本文件中的各列依次顺序编号并与表中字段建立 mapping 关系,以防数据导入到错误的字段

3.对于待导入的文本文件较大的场景,建议将文件 按行拆分 为多个小文件,如用 split 拆分

4.对文件导入后建议执行以下语句验证导入的数据是否有 warning,error 以及导入的数据量

  • get diagnostics @p1=number,@p2=row_count;
  • select @p1 as error_count,@p2 as row_count;

5.文本文件数据与表结构存在过大的差异或数据需要做清洗转换,建议还是用专业的 etl 工具或先粗略导入 mysql 中再进行加工转换处理

以上就是 mysql load data 数据的多种用法的详细内容,更多关于mysql load data 的用法的资料请关注www.887551.com其它相关文章!,希望大家以后多多支持www.887551.com!

(0)
上一篇 2022年3月21日
下一篇 2022年3月21日

相关推荐