2016年6月20日

如何使用 Apache POI 處理 Excel 檔案

Apache POI 是 Poor Obfuscation Implementation 的縮寫,其目的是建立與讀取 Office Open XML(OOXML)標準和微軟的OLE 2復合文檔格式(OLE2)的Java API。

主要的元件有:

  • Excel (SS=HSSF+XSSF)
  • Word (HWPF+XWPF)
  • PowerPoint (HSLF+XSLF)
  • OpenXML4J (OOXML)
  • OLE2 Filesystem (POIFS): OLE 2 Compound Document format 的Java Implementation
  • OLE2 Document Props (HPSF): Open Packaging Conventions (OPC) 的 Java Implementation
  • Outlook (HSMF)
  • Visio (HDGF+XDGF)
  • TNEF (HMEF): Microsoft's TNEF (Transport Neutral Encoding Format),也就是 winmail.dat,用在 Outlook 跟 Exchange -Publisher (HPBF): Publisher file format

怎麼會命名成 Poor Obfuscation Implementation

POI 套件從 2001 年就開始了初始專案,由於 MS 的封閉特性,Office 檔案格式並沒有開放,作者就戲稱這個檔案格式是非常難以被理解,很混亂的一種檔案,就用了 Poor Obfuscation 這兩個字,當然還是成功地被 reverse-engineered,成就了這個專案,除了 MS Office 軟體之外,我們現在也可以用程式產生 Office 檔案。

後來在 2008 年,MS終於向 Sourcesense 提交了 ISO/IEC 29500:2008 Office Open XML file formats,等於是開放了 OOXML 的 Office 檔案的標準,POI 就以這個規格,實作了接下來的函式庫,支援這個 OOXML 的標準規格。

HSSF, XSSF

Excel 分為兩種檔案格式,比較舊的是 HSSFWorkbook,檔案格式為 Excel 1997-2003 版的Excel,副檔名是 xls,XSSFWorkbook 是 Excel 2007-10 的版本,副檔名是 xlsx。

基本的階層概念為一個 xlsx 檔案 XSSFWorkbook,裡面有多個工作表 XSSFSheet,每一個 Sheet 下面有一個表格,裡面有多列資料 XSSFRow,每列資料中有多欄資料儲存格 XSSFCell。

文件的資料階層是

XSSFWorkbook -> XSSFSheet -> XSSFRow -> XSSFCell

Read/Write XLSX file in Java

以下的 Java 程式會產生一個 Test.xlsx 檔案,包含兩個工作表。如果要處理的是 xls 檔案,只要把所有 XSSF 開頭的 class 都換成 HSSF 就可以了。

import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.hssf.util.HSSFColor;
import org.apache.poi.ss.usermodel.CellStyle;
import org.apache.poi.ss.usermodel.Font;
import org.apache.poi.xssf.usermodel.XSSFCell;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

public class PoiTest {
    public static void readXLSXFile() throws IOException {
        InputStream ExcelFileToRead = new FileInputStream("Test.xlsx");
        XSSFWorkbook wb = new XSSFWorkbook(ExcelFileToRead);

        XSSFSheet sheet = wb.getSheetAt(0);
        XSSFRow row;
        XSSFCell cell;

        Iterator rows = sheet.rowIterator();

        while (rows.hasNext()) {
            row = (XSSFRow) rows.next();
            Iterator cells = row.cellIterator();
            while (cells.hasNext()) {
                cell = (XSSFCell) cells.next();

                if (cell.getCellType() == XSSFCell.CELL_TYPE_STRING) {
                    System.out.print(cell.getStringCellValue() + " ");
                } else if (cell.getCellType() == XSSFCell.CELL_TYPE_NUMERIC) {
                    System.out.print(cell.getNumericCellValue() + " ");
                } else {
                    //U Can Handel Boolean, Formula, Errors
                }
            }
            System.out.println();
        }

    }

    public static void writeXLSXFile() throws IOException {

        String excelFileName = "Test.xlsx";//name of excel file

        XSSFWorkbook wb = new XSSFWorkbook();

        Font titlefont = wb.createFont();
        titlefont.setColor(HSSFColor.BLACK.index);//顏色
        titlefont.setBoldweight(Font.BOLDWEIGHT_BOLD); //粗體

        CellStyle styleRow1 = wb.createCellStyle();
        styleRow1.setFillForegroundColor(HSSFColor.LIGHT_GREEN.index);//填滿顏色
        styleRow1.setFillPattern(HSSFCellStyle.SOLID_FOREGROUND);
        styleRow1.setFont(titlefont);//設定字體
        styleRow1.setAlignment(HSSFCellStyle.ALIGN_CENTER);//水平置中
        styleRow1.setVerticalAlignment(HSSFCellStyle.VERTICAL_CENTER);//垂直置中

        //設定框線
        styleRow1.setBorderBottom((short)1);
        styleRow1.setBorderTop((short)1);
        styleRow1.setBorderLeft((short)1);
        styleRow1.setBorderRight((short)1);
        styleRow1.setWrapText(true);//自動換行

        CellStyle styleRow2 = wb.createCellStyle();
        styleRow2.setAlignment(HSSFCellStyle.ALIGN_CENTER);//水平置中
        styleRow2.setVerticalAlignment(HSSFCellStyle.VERTICAL_CENTER);//垂直置中
        styleRow2.setBorderBottom((short)1);
        styleRow2.setBorderTop((short)1);
        styleRow2.setBorderLeft((short)1);
        styleRow2.setBorderRight((short)1);
        styleRow2.setWrapText(true);//自動換行


        String sheetName2 = "工作表2";//name of sheet
        XSSFSheet sheet2 = wb.createSheet(sheetName2);

        XSSFRow titlerow = sheet2.createRow(0);
        for (int c = 0; c < 6; c++) {
            XSSFCell cell = titlerow.createCell(c);
            cell.setCellStyle(styleRow1);
            cell.setCellValue("標題 Cell 0 " + c);
            sheet2.autoSizeColumn(c);   //自動調整欄位寬度
        }

        for (int r = 1; r < 10; r++) {
            XSSFRow row = sheet2.createRow(r);

            for (int c = 0; c < 5; c++) {
                XSSFCell cell = row.createCell(c);
                cell.setCellStyle(styleRow2);

                cell.setCellValue("中文 Cell " + r + " " + c);

                sheet2.autoSizeColumn(c);   //自動調整欄位寬度
            }

            XSSFCell cell = row.createCell(5);
            cell.setCellValue(100);
            sheet2.autoSizeColumn(5);
        }

        String sheetName = "工作表";//name of sheet

        XSSFSheet sheet = wb.createSheet(sheetName);

        for (int r = 0; r < 5; r++) {
            XSSFRow row = sheet.createRow(r);

            for (int c = 0; c < 5; c++) {
                XSSFCell cell = row.createCell(c);

                if( r==0 ) {
                    cell.setCellStyle(styleRow1);
                } else {
                    cell.setCellStyle(styleRow2);
                }

                cell.setCellValue("中文 title " + r + " " + c);

                sheet.autoSizeColumn(c);   //自動調整欄位寬度
            }
        }

        FileOutputStream fileOut = new FileOutputStream(excelFileName);

        wb.write(fileOut);
        fileOut.flush();
        fileOut.close();
    }

    public static void main(String[] args) throws IOException {
        writeXLSXFile();
        readXLSXFile();
    }

}

Read/Write XLSX file in Scala

將上面的 Java 版本的程式翻譯成 Scala 的寫法,其中有個部分要注意,在 Java 程式中,有一段 XSSFCell.CELLTYPESTRING,使用了 XSSFCell 實作的 Cell 介面,在 Cell 介面中定義的常數 CELLTYPESTRING。

雖然 Scala 跟 Java 相容,但是在 Scala 沒辦法直接使用 Java 介面裡面定義的常數,所以我們只有再用 Scala object 再定義一次常數。

import java.io.{FileInputStream, FileOutputStream, IOException, InputStream}
import java.util.Iterator

import org.apache.poi.hssf.util.HSSFColor
import org.apache.poi.ss.usermodel.{CellStyle, Font}
import org.apache.poi.xssf.usermodel.{XSSFCell, XSSFRow, XSSFSheet, XSSFWorkbook}

object Cell {
  val CELL_TYPE_NUMERIC: Int = 0
  val CELL_TYPE_STRING: Int = 1
  val CELL_TYPE_FORMULA: Int = 2
  val CELL_TYPE_BLANK: Int = 3
  val CELL_TYPE_BOOLEAN: Int = 4
  val CELL_TYPE_ERROR: Int = 5
}

object CellStyle {
  val ALIGN_GENERAL:Short = 0
  val ALIGN_LEFT:Short = 1
  val ALIGN_CENTER:Short = 2
  val ALIGN_RIGHT:Short = 3
  val ALIGN_FILL:Short = 4
  val ALIGN_JUSTIFY:Short = 5
  val ALIGN_CENTER_SELECTION:Short = 6
  
  val VERTICAL_TOP:Short = 0
  val VERTICAL_CENTER:Short = 1
  val VERTICAL_BOTTOM:Short = 2
  val VERTICAL_JUSTIFY:Short = 3
  
  val BORDER_NONE:Short = 0
  val BORDER_THIN:Short = 1
  val BORDER_MEDIUM:Short = 2
  val BORDER_DASHED:Short = 3
  val BORDER_HAIR:Short = 7
  val BORDER_THICK:Short = 5
  val BORDER_DOUBLE:Short = 6
  val BORDER_DOTTED:Short = 4
  val BORDER_MEDIUM_DASHED:Short = 8
  val BORDER_DASH_DOT:Short = 9
  val BORDER_MEDIUM_DASH_DOT:Short = 10
  val BORDER_DASH_DOT_DOT:Short = 11
  val BORDER_MEDIUM_DASH_DOT_DOT:Short = 12
  val BORDER_SLANTED_DASH_DOT:Short = 13
  
  val NO_FILL:Short = 0
  val SOLID_FOREGROUND:Short = 1
  val FINE_DOTS:Short = 2
  val ALT_BARS:Short = 3
  val SPARSE_DOTS:Short = 4
  val THICK_HORZ_BANDS:Short = 5
  val THICK_VERT_BANDS:Short = 6
  val THICK_BACKWARD_DIAG:Short = 7
  val THICK_FORWARD_DIAG:Short = 8
  val BIG_SPOTS:Short = 9
  val BRICKS:Short = 10
  val THIN_HORZ_BANDS:Short = 11
  val THIN_VERT_BANDS:Short = 12
  val THIN_BACKWARD_DIAG:Short = 13
  val THIN_FORWARD_DIAG:Short = 14
  val SQUARES:Short = 15
  val DIAMONDS:Short = 16
  val LESS_DOTS:Short = 17
  val LEAST_DOTS:Short = 18
}

object ScalaPoiTest {

  @throws(classOf[IOException])
  def readXLSXFile {
    val ExcelFileToRead: InputStream = new FileInputStream("Test.xlsx")
    val wb: XSSFWorkbook = new XSSFWorkbook(ExcelFileToRead)

    val sheet: XSSFSheet = wb.getSheetAt(0)
    var row: XSSFRow = null
    var cell: XSSFCell = null

    val rows: Iterator[_] = sheet.rowIterator
    while (rows.hasNext) {
      row = rows.next.asInstanceOf[XSSFRow]
      val cells: Iterator[_] = row.cellIterator
      while (cells.hasNext) {
        cell = cells.next.asInstanceOf[XSSFCell]

        if (cell.getCellType == Cell.CELL_TYPE_STRING) {
          System.out.print(cell.getStringCellValue + " ")
        }
        else if (cell.getCellType == Cell.CELL_TYPE_NUMERIC) {
          System.out.print(cell.getNumericCellValue + " ")
        }
        else {
        }
      }
      System.out.println
    }
  }

  @throws(classOf[IOException])
  def writeXLSXFile {
    val excelFileName: String = "Test.xlsx"

    val wb: XSSFWorkbook = new XSSFWorkbook

    val titlefont: Font = wb.createFont
    titlefont.setColor(HSSFColor.BLACK.index)
    titlefont.setBoldweight(Font.BOLDWEIGHT_BOLD)

    val styleRow1: CellStyle = wb.createCellStyle
    styleRow1.setFillForegroundColor(HSSFColor.LIGHT_GREEN.index)
    styleRow1.setFillPattern(CellStyle.SOLID_FOREGROUND)
    styleRow1.setFont(titlefont)
    styleRow1.setAlignment(CellStyle.ALIGN_CENTER)
    styleRow1.setVerticalAlignment(CellStyle.VERTICAL_CENTER)
    styleRow1.setBorderBottom(1)
    styleRow1.setBorderTop(1)
    styleRow1.setBorderLeft(1)
    styleRow1.setBorderRight(1)
    styleRow1.setWrapText(true)

    val styleRow2: CellStyle = wb.createCellStyle
    styleRow2.setAlignment(CellStyle.ALIGN_CENTER)
    styleRow2.setVerticalAlignment(CellStyle.VERTICAL_CENTER)
    styleRow2.setBorderBottom(1)
    styleRow2.setBorderTop(1)
    styleRow2.setBorderLeft(1)
    styleRow2.setBorderRight(1)
    styleRow2.setWrapText(true)


    val sheetName2: String = "Sheet2"
    val sheet2: XSSFSheet = wb.createSheet(sheetName2)
    val titlerow: XSSFRow = sheet2.createRow(0)
      for (c <- 0 to 6) {
          val cell: XSSFCell = titlerow.createCell(c)
          cell.setCellStyle(styleRow1)
          cell.setCellValue("標題 Cell 0 " + c)
          sheet2.autoSizeColumn(c)
      }

    for (r <- 1 to 5) {
      val row: XSSFRow = sheet2.createRow(r)

      for (c <- 0 to 5) {
        val cell: XSSFCell = row.createCell(c)
        cell.setCellValue("中文 Cell " + r + " " + c)
        sheet2.autoSizeColumn(c)
      }

      val cell: XSSFCell = row.createCell(6)
      cell.setCellValue(100)
      sheet2.autoSizeColumn(6)
    }

    val sheetName: String = "Sheet"
    val sheet: XSSFSheet = wb.createSheet(sheetName)

    for (r <- 0 to 4) {
      val row: XSSFRow = sheet.createRow(r)

      for (c <- 0 to 4) {
        val cell: XSSFCell = row.createCell(c)
        cell.setCellValue("中文 Cell " + r + " " + c)
        sheet.autoSizeColumn(c)
      }
    }

    val fileOut: FileOutputStream = new FileOutputStream(excelFileName)
    wb.write(fileOut)
    fileOut.flush
    fileOut.close
  }

  @throws(classOf[IOException])
  def main(args: Array[String]) {
    writeXLSXFile
    readXLSXFile

  }
}

Scala 不能使用 Java Interface constant members

以下這個 stackoverflow 討論中,有談到剛剛在 Scala 版本中遇到的問題:Scala can not resolve inherited Java interface constant members

基本上沒有什麼特殊的解決方案,在 Scala 就是不支援這樣子的寫法。

References

madan712/ReadWriteExcelFile.java

HSSF and XSSF Examples