[Large-Scale Migration] Choosing an ItemWriter (JPA vs JDBC vs MyBatis)

흰둥아 2025. 5. 7. 20:36

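The Reader depends on the file format, so I set that aside and focused on which Writer to use.

The only one I had used so far was MyBatisBatchItemWriter. At work I used it because table names sometimes changed by period and I needed dynamic mapping with if-based branching.

This time I don't need dynamic mapping, only plain inserts, so I wondered whether MyBatis was really necessary and started considering JdbcBatchItemWriter. I ruled out JpaItemWriter for speed reasons after reading [Spring Batch ItemWriter performance comparison].

But!

I was curious whether the speed difference is really that large, so I decided to test it.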

Setup

Common setup (Entity)

// BookRating.java
@NoArgsConstructor
@AllArgsConstructor
@Data
@Entity
public class BookRating {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String userId;
    private String bookId;
    private int rating;

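    // Note: @CreatedDate is only populated when JPA auditing is enabled
    // (@EnableJpaAuditing plus @EntityListeners(AuditingEntityListener.class) on the entity).
    @CreatedDate
    private Date createdAt;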
}


// BookRatingRepository.java
public interface BookRatingRepository extends JpaRepository<BookRating, Long> {
}

JPA

# application.yml
spring:
  application:
    name: batch
  datasource:
    url: jdbc:mariadb://localhost:3306/batch
    username: user
    password: user
    driver-class-name: org.mariadb.jdbc.Driver
  jpa:
    hibernate:
      ddl-auto: create
    properties:
      hibernate:
        format_sql: true
    show-sql: false

MyBatis

// RatingMapper.java
@Mapper
public interface RatingMapper {
    void insertBookRatings(@Param("bookRatings") List<BookRating> bookRatings);
}

# mapper/RatingMapper.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="io.github.haeun.batch.mapper.RatingMapper">
    <insert id="insertBookRatings">
        INSERT INTO book_rating (user_id, book_id, rating, created_at)
        VALUES
        <foreach collection="bookRatings" item="r" separator=",">
            (#{r.userId}, #{r.bookId}, #{r.rating}, now())
        </foreach>
    </insert>
</mapper>

# application.yml (addition)
mybatis:
  mapper-locations: classpath:mapper/*.xml

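If MyBatis fails at runtime with "Invalid bound statement (not found)", see [this blog post].

In my case the error occurred because I had nested the mybatis settings under the spring key instead of placing them at the top level.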

JDBC

Implemented directly in the test logic

Writing the test code

@Slf4j
@Transactional
@Service
public class InsertTester implements CommandLineRunner {
    @Autowired
    JdbcTemplate jdbcTemplate;
    @PersistenceContext
    EntityManager em;
    @Autowired
    RatingMapper ratingMapper;

    @Override
    public void run(String... args) {
        List<BookRating> dummyList = IntStream.range(0, 100_000)
                .mapToObj(i -> new BookRating(null, "user" + i, "book" + i, i % 5 + 1, null))
                .collect(Collectors.toList());

        insertWithJdbc(dummyList);
        insertWithJpa(dummyList);
        insertWithMyBatis(dummyList);
    }

    public void insertWithJdbc(List<BookRating> list) {
        long start = System.currentTimeMillis();

        String sql = "INSERT INTO book_rating (user_id, book_id, rating, created_at) VALUES (?, ?, ?, ?)";

        jdbcTemplate.batchUpdate(sql, new BatchPreparedStatementSetter() {
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                BookRating r = list.get(i);
                ps.setString(1, r.getUserId());
                ps.setString(2, r.getBookId());
                ps.setInt(3, r.getRating());
                ps.setTimestamp(4, new Timestamp(new Date().getTime()));
            }

            public int getBatchSize() {
                return list.size();
            }
        });

        log.info("[ JDBC ] size: {}, {}ms", list.size(), System.currentTimeMillis() - start);
    }

    public void insertWithJpa(List<BookRating> list) {
        long start = System.currentTimeMillis();
        int batchSize = 1000;
        for (int i = 0; i < list.size(); i++) {
            em.persist(list.get(i));
            if (i % batchSize == 0 && i > 0) {
                em.flush();
                em.clear();
            }
        }
        em.flush();
        em.clear();
        log.info("[ JPA ] size: {}, {}ms", list.size(), System.currentTimeMillis() - start);
    }

    public void insertWithMyBatis(List<BookRating> list) {
        long start = System.currentTimeMillis();
        ratingMapper.insertBookRatings(list);
        log.info("[ MyBatis ] size: {}, {}ms", list.size(), System.currentTimeMillis() - start);
    }
}

Annotation

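Annotation	Description
@Slf4j	Log output
@Transactional	Required in order to call em.persist() with JPA
@Service	Normally marks a service-layer class that holds business logic (used here so that the class is picked up by component scanning)

implements

implements	Description
CommandLineRunner	Used to run the code right after Spring Boot starts, with the transaction applied (I first tried @PostConstruct, but transactions are not applied there, so I switched)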

insertWithJpa

for (int i = 0; i < list.size(); i++) {
    em.persist(list.get(i));
    if (i % batchSize == 0 && i > 0) {
        em.flush();
        em.clear();
    }
}

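Each entity is registered one at a time with EntityManager.persist(), and flush() pushes the pending changes to the DB.

Calling flush() and clear() every 1,000 rows writes to the DB incrementally and keeps the persistence context from using too much memory.

(Because the class is annotated with @Transactional, the data is actually committed when run finishes.)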
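
As a side check on the flush condition, here is a minimal pure-Java sketch that simulates the loop without JPA. The class and method names (FlushBoundaries, originalCondition, alignedCondition) are made up for illustration. It records how many rows are pending at each flush: with `i % batchSize == 0 && i > 0` the first flush happens after 1,001 rows and the last one after 999, while `(i + 1) % batchSize == 0` gives even chunks of 1,000. The difference is unlikely to matter for the timings, but the aligned variant matches the intended "every 1,000 rows".

```java
import java.util.ArrayList;
import java.util.List;

// Pure-Java simulation of the flush pattern; no JPA involved.
class FlushBoundaries {

    // Flush sizes for the condition used in the post: i % batchSize == 0 && i > 0,
    // followed by one final flush after the loop.
    static List<Integer> originalCondition(int total, int batchSize) {
        List<Integer> flushSizes = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < total; i++) {
            pending++;                       // stands in for em.persist(...)
            if (i % batchSize == 0 && i > 0) {
                flushSizes.add(pending);     // stands in for em.flush(); em.clear();
                pending = 0;
            }
        }
        flushSizes.add(pending);             // final flush after the loop
        return flushSizes;
    }

    // Variant that flushes after every full batch: (i + 1) % batchSize == 0.
    static List<Integer> alignedCondition(int total, int batchSize) {
        List<Integer> flushSizes = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < total; i++) {
            pending++;
            if ((i + 1) % batchSize == 0) {
                flushSizes.add(pending);
                pending = 0;
            }
        }
        if (pending > 0) {
            flushSizes.add(pending);         // leftover rows, if any
        }
        return flushSizes;
    }

    public static void main(String[] args) {
        List<Integer> original = originalCondition(100_000, 1000);
        List<Integer> aligned = alignedCondition(100_000, 1000);
        System.out.println("original: flushes=" + original.size()
                + ", first=" + original.get(0)
                + ", last=" + original.get(original.size() - 1));
        System.out.println("aligned:  flushes=" + aligned.size()
                + ", first=" + aligned.get(0)
                + ", last=" + aligned.get(aligned.size() - 1));
    }
}
```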

Test results

Method	Run 1	Run 2
JDBC	501ms	394ms
JPA	60664ms	57702ms
MyBatis	1729ms	1614ms

Across the two runs, the order from fastest to slowest was JDBC, MyBatis, JPA. JPA was so slow that calling it "fast" in any sense would be embarrassing.
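
To put the gap into numbers, the small sketch below averages the two runs from the table and divides by the JDBC average. The class name ResultRatios is made up for illustration, and the inputs are just the measured values above. On these numbers JPA comes out at roughly 130 times the JDBC time and MyBatis at a bit under 4 times.

```java
// Averages the two measured runs per method and compares them to JDBC.
class ResultRatios {

    static double average(long run1Ms, long run2Ms) {
        return (run1Ms + run2Ms) / 2.0;
    }

    static double ratio(double slowerMs, double fasterMs) {
        return slowerMs / fasterMs;
    }

    public static void main(String[] args) {
        double jdbc = average(501, 394);        // values taken from the results table
        double jpa = average(60664, 57702);
        double myBatis = average(1729, 1614);

        System.out.printf("JDBC avg:    %.1f ms%n", jdbc);
        System.out.printf("JPA avg:     %.1f ms (x%.1f of JDBC)%n", jpa, ratio(jpa, jdbc));
        System.out.printf("MyBatis avg: %.1f ms (x%.1f of JDBC)%n", myBatis, ratio(myBatis, jdbc));
    }
}
```

Two runs are a small sample, so the ratios should be read as an order of magnitude rather than a precise figure.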

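Why do we get this result?

Start with JPA, the slowest. As in my [coupon performance test], each persist() call produces its own INSERT statement, so 100,000 rows mean 100,000 statements, and flushing every 1,000 rows does not merge them. The entity also uses GenerationType.IDENTITY, and Hibernate disables JDBC insert batching for identity-generated IDs, so these inserts cannot be grouped into batches. On top of that, JPA has to do entity mapping and persistence-context bookkeeping, and this test adds the cost of 100 flush cycles, so it is bound to be slow.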

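JDBC and MyBatis both insert the same 100,000 rows through a VALUES clause in a single call: JDBC batches the single-row statement, and MyBatis expands <foreach> into one multi-row INSERT. So I looked into why JDBC is still so much faster.

MyBatis internally performs object conversion, SQL parsing, and reflection calls (the Java feature for inspecting and manipulating classes and methods at runtime), so it does extra work proportional to the number of rows.

JDBC is low-level: it sends the request to the DB with almost no overhead and only executes the query, which is why it is fast.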
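
To make the two statement shapes concrete, here is a rough pure-Java sketch. The class name InsertShapes and its helpers are made up for illustration, and the multi-row string only approximates what the <foreach> mapper expands to (the real statement keeps the whitespace from the XML). It shows that the JDBC path reuses one short statement with 4 placeholders, while the MyBatis path has to assemble a single statement whose text and bind parameters grow with the row count.

```java
import java.util.Collections;

// Compares the SQL text used by the JDBC batch with the SQL that <foreach> builds.
class InsertShapes {

    // The single-row statement from the post; the JDBC batch reuses it for every row.
    static final String SINGLE_ROW =
            "INSERT INTO book_rating (user_id, book_id, rating, created_at) VALUES (?, ?, ?, ?)";

    // Approximation of the statement the <foreach> mapper expands to for `rows` rows:
    // each #{...} becomes a '?' placeholder and now() stays as plain SQL.
    static String multiRow(int rows) {
        String tuple = "(?, ?, ?, now())";
        return "INSERT INTO book_rating (user_id, book_id, rating, created_at) VALUES "
                + String.join(", ", Collections.nCopies(rows, tuple));
    }

    static long countPlaceholders(String sql) {
        return sql.chars().filter(c -> c == '?').count();
    }

    public static void main(String[] args) {
        String big = multiRow(100_000);
        System.out.println("single-row: length=" + SINGLE_ROW.length()
                + ", placeholders=" + countPlaceholders(SINGLE_ROW));
        System.out.println("multi-row:  length=" + big.length()
                + ", placeholders=" + countPlaceholders(big));
    }
}
```

A statement this large may also run into server-side packet size limits, depending on the database configuration, which is one more reason to chunk the list before handing it to the mapper.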

Method	Internal processing	Performance	Overhead
JDBC	addBatch() → executeBatch() (1 round trip)	🟢 Fast	Very low
JPA	persist() × 100,000 + flush × 100	🔴 Very slow	Mapping, persistence context, transaction
MyBatis	foreach → one SQL execution	🟡 Medium	SQL assembly + mapping

Overhead
The extra time, memory, and resources consumed in order to carry out a task

If speed matters more than entity management, JDBC seems like the right choice.

It bothers me that the INSERT cannot be generated dynamically, but it should be possible if I build it from a Map instead of a DTO.
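
The Map idea could look roughly like the sketch below. This is only an illustration under my own assumptions: DynamicInsert, buildSql, bindValues, and the ALLOWED_COLUMNS whitelist are hypothetical names, not something from the project. The column list and the placeholders are derived from the map keys, and a LinkedHashMap keeps keys and values in the same order so they can be bound positionally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Builds an INSERT statement from the keys of a Map (hypothetical helper).
class DynamicInsert {

    // Hypothetical whitelist: column names are concatenated into the SQL text,
    // so only known columns are accepted to avoid SQL injection.
    static final Set<String> ALLOWED_COLUMNS =
            Set.of("user_id", "book_id", "rating", "created_at");

    // `table` is also concatenated, so it must come from trusted code, never from user input.
    static String buildSql(String table, Map<String, Object> row) {
        if (row.isEmpty()) {
            throw new IllegalArgumentException("row has no columns");
        }
        List<String> columns = new ArrayList<>();
        List<String> placeholders = new ArrayList<>();
        for (String column : row.keySet()) {
            if (!ALLOWED_COLUMNS.contains(column)) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
            columns.add(column);
            placeholders.add("?");
        }
        return "INSERT INTO " + table
                + " (" + String.join(", ", columns) + ")"
                + " VALUES (" + String.join(", ", placeholders) + ")";
    }

    // Values in the same iteration order as the columns above.
    static Object[] bindValues(Map<String, Object> row) {
        return row.values().toArray();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("user_id", "user1");
        row.put("rating", 5);
        System.out.println(buildSql("book_rating", row));
        System.out.println(bindValues(row).length + " bind values");
    }
}
```

The resulting SQL and values could then be passed to something like jdbcTemplate.update(sql, args). One caveat: rows with different key sets produce different SQL strings, so they would need to be grouped by column set before batching.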

See the [related Git commit] for the full test code.
