web123456

Practical tips for preventing duplication when inserting data in batches

Scene

In business, we often encounter scenarios of batch inserting data, such as uploading Excel or syncing data (save dozens or hundreds of records at a time). Need to be guaranteed when storingIdepotency, for example, there cannot be multiple data with the same order number.

  • The simple way to handle it is to add a query before each save, and insert it if it exists, otherwise the data will be updated or not processed.
  • Add a unique index of a certain field. If it is repeated during insertion, a MySQL exception will be explicitly thrown, and the exception needs to be handled by yourself.
    Both solutions have obvious disadvantages, that is, it is acceptable if the data volume is small, and when the data volume is large, it is acceptable if the data volume is large.Very slow, and the database interactions are many times, occupying database connections.

Solution

1. Useinsertignore into statement

SQL example
insert ignore into table (field1, field2, ...) values (property1, property2, ...), (property1, property2, ...);
There is no difference from ordinary insert statements, it just ignores the existing records and judges based on the primary key or unique key.
tips: If you deduplicate based on the business field, you need to first build the corresponding unique index.

2. Use insert into … on duplicate key update statement

Example
insert into table (field1, field2, ...) values (property1, property2, ...), (property1, property2, ...) on duplicate key update field1=values(field1), field2=values(field2), ...;

Similar to the insert ignore into statement above, except that the function of this statement is as follows

  1. If this record exists in the table, the corresponding field value will be updated according to the content specified after update;
  2. This record does not exist in the table, then insert it directly.

The basis for judging the duplication of records is still the primary key or the unique key. A special point to note is that if the primary key is self-incremented, when inserting using this statement, the auto-increment start value will be +1 regardless of whether the new record is added.

XML dynamics when using Mybatismapperas follows:

<insert id="batchSaveOrUpdate" parameterType="list">
    insert into user (name, sex, email)
    values
    <foreach collection="list" item="item" index="index" separator=",">
        (
            #{},
            #{},
            #{}
        )
    </foreach>
    ON duplicate KEY UPDATE email=values(email)
</insert>
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9
  • 10
  • 11
  • 12

tips: If you deduplicate based on the business field, you need to first build the corresponding unique index.