前言
動(dòng)態(tài)數(shù)據(jù)導(dǎo)出是一般項(xiàng)目都會(huì)涉及到的功能。它的基本實(shí)現(xiàn)邏輯就是從mysql查詢數(shù)據(jù),加載到內(nèi)存,然后從內(nèi)存創(chuàng)建excel或者csv,以流的形式響應(yīng)給前端。
參考:https://grokonez.com/spring-framework/spring-boot/excel-file-download-from-springboot-restapi-apache-poi-mysql。
SpringBoot下載excel基本都是這么干。
雖然這是個(gè)可行的方案,然而一旦mysql數(shù)據(jù)量太大,達(dá)到十萬級(jí),百萬級(jí),千萬級(jí),大規(guī)模數(shù)據(jù)加載到內(nèi)存必然會(huì)引起OutofMemoryError
。
要考慮如何避免OOM,一般有兩個(gè)方面的思路。
一方面就是盡量不做唄,先懟產(chǎn)品下面幾個(gè)問題?。?/p>
-
我們?yōu)槭裁匆獙?dǎo)出這么多數(shù)據(jù)呢?誰傻到去看這么大的數(shù)據(jù)啊,這個(gè)設(shè)計(jì)是不是合理的呢?
-
怎么做好權(quán)限控制?百萬級(jí)數(shù)據(jù)導(dǎo)出你確定不會(huì)泄露商業(yè)機(jī)密?
-
如果要導(dǎo)出百萬級(jí)數(shù)據(jù),那為什么不直接找大數(shù)據(jù)或者DBA來干呢?然后以郵件形式傳遞不行嗎?
-
為什么要通過后端的邏輯來實(shí)現(xiàn),不考慮時(shí)間成本,流量成本嗎?
-
如果通過分頁導(dǎo)出,每次點(diǎn)擊按鈕只導(dǎo)2萬條,分批導(dǎo)出難道不能滿足業(yè)務(wù)需求嗎?
如果產(chǎn)品說 “甲方是爸爸,你去和甲方說啊”,“客戶說這個(gè)做出來,才考慮付尾款!”,如果客戶的確缺根筋要讓你這樣搞, 那就只能從技術(shù)上考慮如何實(shí)現(xiàn)了。
從技術(shù)上講,為了避免OOM,我們一定要注意一個(gè)原則:
不能將全量數(shù)據(jù)一次性加載到內(nèi)存之中。
全量加載不可行,那我們的目標(biāo)就是如何實(shí)現(xiàn)數(shù)據(jù)的分批加載了。實(shí)事上,Mysql本身支持Stream查詢,我們可以通過Stream流獲取數(shù)據(jù),然后將數(shù)據(jù)逐條刷入到文件中,每次刷入文件后再從內(nèi)存中移除這條數(shù)據(jù),從而避免OOM。
由于采用了數(shù)據(jù)逐條刷入文件,而且數(shù)據(jù)量達(dá)到百萬級(jí),所以文件格式就不要采用excel了,excel2007最大才支持104萬行的數(shù)據(jù)。這里推薦:
以csv代替excel。
考慮到當(dāng)前SpringBoot持久層框架通常為JPA和mybatis,我們可以分別從這兩個(gè)框架實(shí)現(xiàn)百萬級(jí)數(shù)據(jù)導(dǎo)出的方案。
JPA實(shí)現(xiàn)百萬級(jí)數(shù)據(jù)導(dǎo)出
具體方案不妨參考:http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html。
實(shí)現(xiàn)項(xiàng)目對(duì)應(yīng):
https://github.com/knes1/todo
核心注解如下,需要加入到具體的Repository
之上。方法的返回類型定義成Stream。Integer.MIN_VALUE
告訴jdbc driver
逐條返回?cái)?shù)據(jù)。
@QueryHints(value?=?@QueryHint(name?=?HINT_FETCH_SIZE,?value?=?""?+?Integer.MIN_VALUE))
@Query(value?=?"select?t?from?Todo?t")
Stream<Todo>?streamAll();
此外還需要在Stream處理數(shù)據(jù)的方法之上添加@Transactional(readOnly = true)
,保證事物是只讀的。
同時(shí)需要注入javax.persistence.EntityManager
,通過detach從內(nèi)存中移除已經(jīng)使用后的對(duì)象。
@RequestMapping(value?=?"/todos.csv",?method?=?RequestMethod.GET)
@Transactional(readOnly?=?true)
public?void?exportTodosCSV(HttpServletResponse?response)?{
?response.addHeader("Content-Type",?"application/csv");
?response.addHeader("Content-Disposition",?"attachment;?filename=todos.csv");
?response.setCharacterEncoding("UTF-8");
?try(Stream<Todo>?todoStream?=?todoRepository.streamAll())?{
??PrintWriter?out?=?response.getWriter();
??todoStream.forEach(rethrowConsumer(todo?->?{
???String?line?=?todoToCSV(todo);
???out.write(line);
???out.write("\n");
???entityManager.detach(todo);
??}));
??out.flush();
?}?catch?(IOException?e)?{
??log.info("Exception?occurred?"?+?e.getMessage(),?e);
??throw?new?RuntimeException("Exception?occurred?while?exporting?results",?e);
?}
}
MyBatis實(shí)現(xiàn)百萬級(jí)數(shù)據(jù)導(dǎo)出
MyBatis實(shí)現(xiàn)逐條獲取數(shù)據(jù),必須要自定義ResultHandler
,然后在mapper.xml文件中,對(duì)應(yīng)的select語句中添加fetchSize="-2147483648"
。
最后將自定義的ResultHandler傳給SqlSession來執(zhí)行查詢,并將返回的結(jié)果進(jìn)行處理。
MyBatis實(shí)現(xiàn)百萬級(jí)數(shù)據(jù)導(dǎo)出的具體實(shí)例
以下是基于MyBatis Stream
導(dǎo)出的完整的工程樣例,我們將通過對(duì)比Stream文件導(dǎo)出和傳統(tǒng)方式導(dǎo)出的內(nèi)存占用率的差異,來驗(yàn)證Stream文件導(dǎo)出的有效性。
我們先定義一個(gè)工具類DownloadProcessor
,它內(nèi)部封裝一個(gè)HttpServletResponse
對(duì)象,用來將對(duì)象寫入到csv。
public?class?DownloadProcessor?{
????private?final?HttpServletResponse?response;
?????
????public?DownloadProcessor(HttpServletResponse?response)?{
????????this.response?=?response;
????????String?fileName?=?System.currentTimeMillis()?+?".csv";
????????this.response.addHeader("Content-Type",?"application/csv");
????????this.response.addHeader("Content-Disposition",?"attachment;?filename="+fileName);
????????this.response.setCharacterEncoding("UTF-8");
????}
?????
????public?<E>?void?processData(E?record)?{
????????try?{
????????????response.getWriter().write(record.toString());?//如果是要寫入csv,需要重寫toString,屬性通過","分割
????????????response.getWriter().write("\n");
????????}catch?(IOException?e){
????????????e.printStackTrace();
????????}
????}
}
然后通過實(shí)現(xiàn)org.apache.ibatis.session.ResultHandler
,自定義我們的ResultHandler
,它用于獲取java對(duì)象,然后傳遞給上面的DownloadProcessor
處理類進(jìn)行寫文件操作:
public?class?CustomResultHandler?implements?ResultHandler?{
????private?final?DownloadProcessor?downloadProcessor;
?????
????public?CustomResultHandler(
????????????DownloadProcessor?downloadProcessor)?{
????????super();
????????this.downloadProcessor?=?downloadProcessor;
????}
?????
????@Override
????public?void?handleResult(ResultContext?resultContext)?{
????????Authors?authors?=?(Authors)resultContext.getResultObject();
????????downloadProcessor.processData(authors);
????}
}
實(shí)體類:
public?class?Authors?{
????private?Integer?id;
????private?String?firstName;
?????
????private?String?lastName;
?????
????private?String?email;
?????
????private?Date?birthdate;
?????
????private?Date?added;
?????
????public?Integer?getId()?{
????????return?id;
????}
?????
????public?void?setId(Integer?id)?{
????????this.id?=?id;
????}
?????
????public?String?getFirstName()?{
????????return?firstName;
????}
?????
????public?void?setFirstName(String?firstName)?{
????????this.firstName?=?firstName?==?null???null?:?firstName.trim();
????}
?????
????public?String?getLastName()?{
????????return?lastName;
????}
?????
????public?void?setLastName(String?lastName)?{
????????this.lastName?=?lastName?==?null???null?:?lastName.trim();
????}
?????
????public?String?getEmail()?{
????????return?email;
????}
?????
????public?void?setEmail(String?email)?{
????????this.email?=?email?==?null???null?:?email.trim();
????}
?????
????public?Date?getBirthdate()?{
????????return?birthdate;
????}
?????
????public?void?setBirthdate(Date?birthdate)?{
????????this.birthdate?=?birthdate;
????}
?????
????public?Date?getAdded()?{
????????return?added;
????}
?????
????public?void?setAdded(Date?added)?{
????????this.added?=?added;
????}
?????
????@Override
????public?String?toString()?{
????????return?this.id?+?","?+?this.firstName?+?","?+?this.lastName?+?","?+?this.email?+?","?+?this.birthdate?+?","?+?this.added;
????}
}
Mapper接口:
public?interface?AuthorsMapper?{
???List<Authors>?selectByExample(AuthorsExample?example);
????
???List<Authors>?streamByExample(AuthorsExample?example);?//以stream形式從mysql獲取數(shù)據(jù)
}
Mapper xml文件核心片段,以下兩條select的唯一差異就是在stream獲取數(shù)據(jù)的方式中多了一條屬性:fetchSize="-2147483648"
<select?id="selectByExample"?parameterType="com.alphathur.mysqlstreamingexport.domain.AuthorsExample"?resultMap="BaseResultMap">
????select
????<if?test="distinct">
??????distinct
????</if>
????'false'?as?QUERYID,
????<include?refid="Base_Column_List"?/>
????from?authors
????<if?test="_parameter?!=?null">
??????<include?refid="Example_Where_Clause"?/>
????</if>
????<if?test="orderByClause?!=?null">
??????order?by?${orderByClause}
????</if>
??</select>
??<select?id="streamByExample"?fetchSize="-2147483648"?parameterType="com.alphathur.mysqlstreamingexport.domain.AuthorsExample"?resultMap="BaseResultMap">
????select
????<if?test="distinct">
??????distinct
????</if>
????'false'?as?QUERYID,
????<include?refid="Base_Column_List"?/>
????from?authors
????<if?test="_parameter?!=?null">
??????<include?refid="Example_Where_Clause"?/>
????</if>
????<if?test="orderByClause?!=?null">
??????order?by?${orderByClause}
????</if>
??</select>
獲取數(shù)據(jù)的核心service如下,由于只做個(gè)簡(jiǎn)單演示,就懶得寫成接口了。其中?streamDownload
?方法即為stream取數(shù)據(jù)寫文件的實(shí)現(xiàn),它將以很低的內(nèi)存占用從MySQL獲取數(shù)據(jù);此外還提供traditionDownload
方法,它是一種傳統(tǒng)的下載方式,批量獲取全部數(shù)據(jù),然后將每個(gè)對(duì)象寫入文件。
@Service
public?class?AuthorsService?{
????private?final?SqlSessionTemplate?sqlSessionTemplate;
????private?final?AuthorsMapper?authorsMapper;
????public?AuthorsService(SqlSessionTemplate?sqlSessionTemplate,?AuthorsMapper?authorsMapper)?{
????????this.sqlSessionTemplate?=?sqlSessionTemplate;
????????this.authorsMapper?=?authorsMapper;
????}
????/**
?????*?stream讀數(shù)據(jù)寫文件方式
?????*?@param?httpServletResponse
?????*?@throws?IOException
?????*/
????public?void?streamDownload(HttpServletResponse?httpServletResponse)
????????????throws?IOException?{
????????AuthorsExample?authorsExample?=?new?AuthorsExample();
????????authorsExample.createCriteria();
????????HashMap<String,?Object>?param?=?new?HashMap<>();
????????param.put("oredCriteria",?authorsExample.getOredCriteria());
????????param.put("orderByClause",?authorsExample.getOrderByClause());
????????CustomResultHandler?customResultHandler?=?new?CustomResultHandler(new?DownloadProcessor?(httpServletResponse));
????????sqlSessionTemplate.select(
????????????????"com.alphathur.mysqlstreamingexport.mapper.AuthorsMapper.streamByExample",?param,?customResultHandler);
????????httpServletResponse.getWriter().flush();
????????httpServletResponse.getWriter().close();
????}
????/**
?????*?傳統(tǒng)下載方式
?????*?@param?httpServletResponse
?????*?@throws?IOException
?????*/
????public?void?traditionDownload(HttpServletResponse?httpServletResponse)
????????????throws?IOException?{
????????AuthorsExample?authorsExample?=?new?AuthorsExample();
????????authorsExample.createCriteria();
????????List<Authors>?authors?=?authorsMapper.selectByExample?(authorsExample);
????????DownloadProcessor?downloadProcessor?=?new?DownloadProcessor?(httpServletResponse);
????????authors.forEach?(downloadProcessor::processData);
????????httpServletResponse.getWriter().flush();
????????httpServletResponse.getWriter().close();
????}
}
下載的入口controller:
@RestController
@RequestMapping("download")
public?class?HelloController?{
????private?final?AuthorsService?authorsService;
????public?HelloController(AuthorsService?authorsService)?{
????????this.authorsService?=?authorsService;
????}
????@GetMapping("streamDownload")
????public?void?streamDownload(HttpServletResponse?response)
????????????throws?IOException?{
????????authorsService.streamDownload(response);
????}
????@GetMapping("traditionDownload")
????public?void?traditionDownload(HttpServletResponse?response)
????????????throws?IOException?{
????????authorsService.traditionDownload?(response);
????}
}???
實(shí)體類對(duì)應(yīng)的表結(jié)構(gòu)創(chuàng)建語句:
CREATE?TABLE?`authors`?(
??`id`?int(11)?NOT?NULL?AUTO_INCREMENT,
??`first_name`?varchar(50)?CHARACTER?SET?utf8?COLLATE?utf8_unicode_ci?NOT?NULL,
??`last_name`?varchar(50)?CHARACTER?SET?utf8?COLLATE?utf8_unicode_ci?NOT?NULL,
??`email`?varchar(100)?CHARACTER?SET?utf8?COLLATE?utf8_unicode_ci?NOT?NULL,
??`birthdate`?date?NOT?NULL,
??`added`?timestamp?NOT?NULL?DEFAULT?CURRENT_TIMESTAMP,
??PRIMARY?KEY?(`id`)
)?ENGINE=InnoDB?AUTO_INCREMENT=10095?DEFAULT?CHARSET=utf8?COLLATE=utf8_unicode_ci;
這里有個(gè)問題:如何短時(shí)間內(nèi)創(chuàng)建大批量測(cè)試數(shù)據(jù)到MySQL呢?一種方式是使用存儲(chǔ)過程 + 大殺器 select insert 語句
!不太懂?
沒關(guān)系,且看我另一篇文章 MySQL如何生成大批量測(cè)試數(shù)據(jù) 你就會(huì)明白了。如果你懶得看,我這里已經(jīng)將生成的270多萬條測(cè)試數(shù)據(jù)上傳到網(wǎng)盤,你直接下載然后通過navicat導(dǎo)入就好了。
鏈接:https://pan.baidu.com/s/1hqnWU2JKlL4Tb9nWtJl4sw
提取碼:nrp0
有了測(cè)試數(shù)據(jù),我們就可以直接測(cè)試了。先啟動(dòng)項(xiàng)目,然后打開jdk bin目錄下的?jconsole.exe
首先我們測(cè)試傳統(tǒng)方式下載文件的內(nèi)存占用,直接瀏覽器訪問:http://localhost:8080/download/traditionDownload
。
可以看出,下載開始前內(nèi)存占用大概為幾十M,下載開始后內(nèi)存占用急速上升,峰值達(dá)到接近2.5G,即使是下載完成,堆內(nèi)存也維持一個(gè)較高的占用,這實(shí)在是太可怕了,如果生產(chǎn)環(huán)境敢這么搞,不出意外肯定內(nèi)存溢出。
接著我們測(cè)試stream方式文件下載的內(nèi)存占用,瀏覽器訪問:http://localhost:8080/download/streamDownload
,當(dāng)下載開始后,內(nèi)存占用也會(huì)有一個(gè)明顯的上升,但是峰值才到500M。對(duì)比于上面的方式,內(nèi)存占用率足足降低了80%!怎么樣,興奮了嗎!
文章來源:http://www.zghlxwxcb.cn/news/detail-412171.html
我們?cè)偻ㄟ^記事本打開下載后的兩個(gè)文件,發(fā)現(xiàn)內(nèi)容沒有缺斤少兩,都是2727127行,完美!文章來源地址http://www.zghlxwxcb.cn/news/detail-412171.html
到了這里,關(guān)于CTO:給我一個(gè)SpringBoot實(shí)現(xiàn)MySQL百萬級(jí)數(shù)據(jù)量導(dǎo)出并避免OOM的解決方案的文章就介紹完了。如果您還想了解更多內(nèi)容,請(qǐng)?jiān)谟疑辖撬阉鱐OY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!