Class CSVParser
- All Implemented Interfaces:
Closeable, AutoCloseable, Iterable<CSVRecord>
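Parses CSV files according to the specified format. Because CSV appears in many different dialects, the parser supports many formats by allowing the specification of a CSVFormat.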
The parser works record-wise. It is not possible to go back once a record has been parsed from the input stream.
Creating instances
There are several static factory methods that can be used to create instances for various types of resources:
- parse(java.io.File, Charset, CSVFormat)
- parse(String, CSVFormat)
- parse(java.net.URL, java.nio.charset.Charset, CSVFormat)
Alternatively, parsers can also be created by passing a Reader directly to a constructor.
For those who like fluent APIs, parsers can be created using CSVFormat.parse(java.io.Reader) as a shortcut:
for(CSVRecord record : CSVFormat.EXCEL.parse(in)) {
...
}
Parsing record-wise
To parse a CSV input from a file, you write:
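File csvData = new File("/path/to/csv");
// parse(File, Charset, CSVFormat) requires a Charset; UTF-8 (java.nio.charset.StandardCharsets) is assumed here.
CSVParser parser = CSVParser.parse(csvData, StandardCharsets.UTF_8, CSVFormat.RFC4180);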
for (CSVRecord csvRecord : parser) {
...
}
This will read and parse the contents of the file using the RFC 4180 format.
To parse CSV input in a format like Excel, you write:
CSVParser parser = CSVParser.parse(csvData, StandardCharsets.UTF_8, CSVFormat.EXCEL);
for (CSVRecord csvRecord : parser) {
...
}
If the predefined formats don't match the format at hand, custom formats can be defined. More information about
customising CSVFormats is available in the CSVFormat Javadoc.
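As a minimal sketch of a custom format, the example below derives a semicolon-delimited format from CSVFormat.RFC4180. The class name SemicolonFormatDemo and the method firstRecordValues are hypothetical names used only for illustration. It relies on CSVFormat.withDelimiter(char), which newer releases mark as deprecated in favour of the builder API, so treat the exact call as an assumption about the library version in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class; not part of Commons CSV.
final class SemicolonFormatDemo {

    /** Parses semicolon-separated input and returns the values of the first record. */
    static List<String> firstRecordValues(String input) {
        // Derive a custom format from a predefined one; only the delimiter changes.
        CSVFormat format = CSVFormat.RFC4180.withDelimiter(';');
        try (CSVParser parser = CSVParser.parse(input, format)) {
            CSVRecord first = parser.iterator().next();
            List<String> values = new ArrayList<>();
            for (String value : first) { // CSVRecord is Iterable<String>
                values.add(value);
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstRecordValues("a;b\nc;d")); // prints [a, b]
    }
}
```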
Parsing into memory
If parsing record-wise is not desired, the contents of the input can be read completely into memory.
Reader in = new StringReader("a;b\nc;d");
CSVParser parser = new CSVParser(in, CSVFormat.EXCEL);
List<CSVRecord> list = parser.getRecords();
There are two constraints that have to be kept in mind:
- Parsing into memory starts at the current position of the parser. If you have already parsed records from the input, those records will not end up in the in-memory representation of your CSV data.
- Parsing into memory may consume a lot of system resources depending on the input. For example, if you're parsing a 150MB file of CSV data, the contents will be read completely into memory.
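The first constraint can be illustrated with a short sketch: one record is consumed through the iterator, and getRecords() then returns only what is left. RemainingRecordsDemo and remainingAfterFirst are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class; not part of Commons CSV.
final class RemainingRecordsDemo {

    /** Consumes one record, then counts the records that getRecords() still returns. */
    static int remainingAfterFirst(String input) {
        try (CSVParser parser = CSVParser.parse(input, CSVFormat.RFC4180)) {
            parser.iterator().next();                   // the first record is now gone
            List<CSVRecord> rest = parser.getRecords(); // starts at the current position
            return rest.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(remainingAfterFirst("a,b\nc,d\ne,f")); // prints 2
    }
}
```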
Notes
Internal parser state is completely covered by the format and the reader state.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
- (package private) class CSVParser.CSVRecordIterator
- private static final class CSVParser.Headers: Header information based on name and position.
-
Field Summary
Fields
Modifier and Type, Field, Description
- private final long characterOffset: Lexer offset when the parser does not start parsing at the beginning of the source.
- private final CSVParser.CSVRecordIterator csvRecordIterator
- private final CSVFormat format
- private final CSVParser.Headers headers
- private final Lexer lexer
- recordList: A record buffer for getRecord().
- private long recordNumber: The next record number to assign.
- private final Token reusableToken
-
Constructor Summary
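Constructors
- CSVParser(Reader reader, CSVFormat format): Customized CSV parser using the given CSVFormat.
- CSVParser(Reader reader, CSVFormat format, long characterOffset, long recordNumber): Customized CSV parser using the given CSVFormat.
-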
Method Summary
Modifier and Type, Method, Description
- private void addRecordValue(boolean lastRecord)
- void close(): Closes resources.
- private CSVParser.Headers createHeaders(): Creates the name to index mapping if the format defines a header.
- long getCurrentLineNumber(): Returns the current line number in the input stream.
- getFirstEndOfLine(): Gets the first end-of-line string encountered.
- getHeaderMap(): Returns a copy of the header map.
- getHeaderMapRaw(): Returns the header map.
- getHeaderNames(): Returns a read-only list of header names that iterates in column order.
- long getRecordNumber(): Returns the current record number in the input stream.
- getRecords(): Parses the CSV input according to the given format and returns the content as a list of CSVRecords.
- private String handleNull(String input): Handle whether input is parsed as null
- boolean isClosed(): Tests whether this parser is closed.
- private boolean isStrictQuoteMode()
- iterator(): Returns the record iterator.
- (package private) CSVRecord nextRecord(): Parses the next record from the current point in the stream.
- static CSVParser parse(File file, Charset charset, CSVFormat format): Creates a parser for the given File.
- static CSVParser parse(InputStream inputStream, Charset charset, CSVFormat format): Creates a CSV parser using the given CSVFormat.
- static CSVParser parse(Reader reader, CSVFormat format): Creates a CSV parser using the given CSVFormat.
- static CSVParser parse(String string, CSVFormat format): Creates a parser for the given String.
- static CSVParser parse(URL url, Charset charset, CSVFormat format): Creates and returns a parser for the given URL, which the caller MUST close.
- static CSVParser parse(Path path, Charset charset, CSVFormat format): Creates and returns a parser for the given Path, which the caller MUST close.
- stream(): Returns a sequential Stream with this collection as its source.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Field Details
-
format
-
headers
-
lexer
-
csvRecordIterator
-
recordList
A record buffer for getRecord(). Grows as necessary and is reused.
-
recordNumber
private long recordNumber
The next record number to assign.
-
characterOffset
private final long characterOffset
Lexer offset when the parser does not start parsing at the beginning of the source. Usually used in combination with recordNumber.
-
reusableToken
-
-
Constructor Details
-
CSVParser
Customized CSV parser using the given CSVFormat.
If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.
- Parameters:
reader - a Reader containing CSV-formatted input. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format are null.
IOException - If there is a problem reading the header or skipping the first record
-
CSVParser
public CSVParser(Reader reader, CSVFormat format, long characterOffset, long recordNumber) throws IOException
Customized CSV parser using the given CSVFormat.
If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.
- Parameters:
reader - a Reader containing CSV-formatted input. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
characterOffset - Lexer offset when the parser does not start parsing at the beginning of the source.
recordNumber - The next record number to assign
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format are null.
IOException - If there is a problem reading the header or skipping the first record
- Since:
- 1.1
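The following sketch shows one way the four-argument constructor might be used to resume parsing in the middle of a source: the reader is positioned at the start of a later record, and characterOffset and recordNumber tell the parser where that record sits in the original input. ResumeParsingDemo and firstRecordFrom are hypothetical names, and the caller is assumed to know the offset of a record boundary.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class; not part of Commons CSV.
final class ResumeParsingDemo {

    /** Parses the first record found at the given offset of the full input. */
    static CSVRecord firstRecordFrom(String fullInput, int offset, long nextRecordNumber) {
        // The reader only sees the tail of the input, starting at a record boundary.
        Reader tail = new StringReader(fullInput.substring(offset));
        try (CSVParser parser = new CSVParser(tail, CSVFormat.RFC4180, offset, nextRecordNumber)) {
            return parser.iterator().next();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "c,d" starts at character offset 4 and is the second record of the full input.
        CSVRecord record = firstRecordFrom("a,b\nc,d\ne,f", 4, 2);
        System.out.println(record.get(0));                 // prints c
        System.out.println(record.getRecordNumber());      // prints 2
        System.out.println(record.getCharacterPosition()); // prints 4
    }
}
```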
-
-
Method Details
-
parse
Creates a parser for the given File.
- Parameters:
file - a CSV file. Must not be null.
charset - The Charset to decode the given file.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either file or format are null.
IOException - If an I/O error occurs
-
parse
public static CSVParser parse(InputStream inputStream, Charset charset, CSVFormat format) throws IOException
Creates a CSV parser using the given CSVFormat.
If you do not read all records from the given input stream, you should call close() on the parser, unless you close the input stream.
- Parameters:
inputStream - an InputStream containing CSV-formatted input. Must not be null.
charset - The Charset to decode the given input stream.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new CSVParser configured with the given input stream and format.
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either inputStream or format are null.
IOException - If there is a problem reading the header or skipping the first record
- Since:
- 1.5
-
parse
Creates and returns a parser for the given Path, which the caller MUST close.
- Parameters:
path - a CSV file. Must not be null.
charset - The Charset to decode the given file.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either path or format are null.
IOException - If an I/O error occurs
- Since:
- 1.5
-
parse
Creates a CSV parser using the given CSVFormat.
If you do not read all records from the given reader, you should call close() on the parser, unless you close the reader.
- Parameters:
reader - a Reader containing CSV-formatted input. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new CSVParser configured with the given reader and format.
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either reader or format are null.
IOException - If there is a problem reading the header or skipping the first record
- Since:
- 1.5
-
parse
Creates a parser for the given String.
- Parameters:
string - a CSV string. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either string or format are null.
IOException - If an I/O error occurs
-
parse
Creates and returns a parser for the given URL, which the caller MUST close.
If you do not read all records from the given url, you should call close() on the parser, unless you close the url.
- Parameters:
url - a URL. Must not be null.
charset - the charset for the resource. Must not be null.
format - the CSVFormat used for CSV parsing. Must not be null.
- Returns:
- a new parser
- Throws:
IllegalArgumentException - If the parameters of the format are inconsistent or if either url, charset or format are null.
IOException - If an I/O error occurs
-
addRecordValue
private void addRecordValue(boolean lastRecord) -
close
Closes resources.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException - If an I/O error occurs
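Because the parser implements AutoCloseable, a try-with-resources statement is one convenient way to make sure close() is called. The sketch below is illustrative only; CloseDemo and closedAfterUse are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class; not part of Commons CSV.
final class CloseDemo {

    /** Iterates over all records inside try-with-resources and reports whether the parser ended up closed. */
    static boolean closedAfterUse(String input) {
        try {
            CSVParser parser = CSVParser.parse(input, CSVFormat.RFC4180);
            try (CSVParser autoClosed = parser) {
                for (CSVRecord record : autoClosed) {
                    record.size(); // process the record here
                }
            }
            // close() has been called implicitly at the end of the inner try block.
            return parser.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(closedAfterUse("a,b\nc,d")); // prints true
    }
}
```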
-
createEmptyHeaderMap
-
createHeaders
Creates the name to index mapping if the format defines a header.
- Returns:
- null if the format has no header.
- Throws:
IOException - if there is a problem reading the header or skipping the first record
-
getCurrentLineNumber
public long getCurrentLineNumber()
Returns the current line number in the input stream.
ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the record number.
- Returns:
- current line number
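The difference between line numbers and record numbers can be seen with a quoted value that spans two lines. In this sketch, LineVersusRecordDemo and countersAfterFirstRecord are hypothetical names; the input is assumed to end its first record with a line feed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical demo class; not part of Commons CSV.
final class LineVersusRecordDemo {

    /** Returns {record number, line number} as reported right after the first record is parsed. */
    static long[] countersAfterFirstRecord(String input) {
        try (CSVParser parser = CSVParser.parse(input, CSVFormat.RFC4180)) {
            parser.iterator().next(); // parse exactly one record
            return new long[] { parser.getRecordNumber(), parser.getCurrentLineNumber() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first record contains a quoted value with an embedded line break,
        // so it occupies two physical lines of the input.
        long[] counters = countersAfterFirstRecord("a,\"line1\nline2\"\nb,c");
        System.out.println(counters[0]); // prints 1 (records)
        System.out.println(counters[1]); // prints 2 (lines)
    }
}
```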
-
getFirstEndOfLine
Gets the first end-of-line string encountered.
- Returns:
- the first end-of-line string
- Since:
- 1.5
-
getHeaderMap
Returns a copy of the header map.
The map keys are column names. The map values are 0-based indices.
Note: The map can only provide a one-to-one mapping when the format did not contain null or duplicate column names.
- Returns:
- a copy of the header map.
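A small sketch of how the header map relates column names to indices, assuming the first record of the input is the header. HeaderMapDemo and its methods are hypothetical names. The format is built with CSVFormat.withFirstRecordAsHeader(), which newer releases deprecate in favour of the builder API, so the exact call depends on the library version.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical demo class; not part of Commons CSV.
final class HeaderMapDemo {

    // Assumption: the first record of the input holds the column names.
    private static final CSVFormat FORMAT = CSVFormat.RFC4180.withFirstRecordAsHeader();

    /** Returns the column-name to 0-based-index mapping of the input. */
    static Map<String, Integer> headerMap(String input) {
        try (CSVParser parser = CSVParser.parse(input, FORMAT)) {
            return parser.getHeaderMap();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the header names in column order. */
    static List<String> headerNames(String input) {
        try (CSVParser parser = CSVParser.parse(input, FORMAT)) {
            return parser.getHeaderNames();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String input = "id,name\n1,Ada";
        System.out.println(headerMap(input).get("name")); // prints 1
        System.out.println(headerNames(input));           // prints [id, name]
    }
}
```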
-
getHeaderMapRaw
Returns the header map.
- Returns:
- the header map.
-
getHeaderNames
Returns a read-only list of header names that iterates in column order.
Note: The list provides strings that can be used as keys in the header map. The list will not contain null column names if they were present in the input format.
- Returns:
- read-only list of header names that iterates in column order.
- Since:
- 1.7
-
getRecordNumber
public long getRecordNumber()
Returns the current record number in the input stream.
ATTENTION: If your CSV input has multi-line values, the returned number does not correspond to the line number.
- Returns:
- current record number
-
getRecords
Parses the CSV input according to the given format and returns the content as a list of CSVRecords.
The returned content starts at the current parse-position in the stream.
- Returns:
- list of CSVRecords, may be empty
- Throws:
IOException - on parse error or input read-failure
-
handleNull
Handles whether input is parsed as null.
- Parameters:
input - the cell data to be further processed
- Returns:
- null if input is parsed as null, or input itself if input isn't parsed as null
-
isClosed
public boolean isClosed()
Tests whether this parser is closed.
- Returns:
- whether this parser is closed.
-
isStrictQuoteMode
private boolean isStrictQuoteMode()
- Returns:
- true if the format's QuoteMode is QuoteMode.ALL_NON_NULL or QuoteMode.NON_NUMERIC.
-
iterator
Returns the record iterator.
An IOException caught during the iteration is re-thrown as an IllegalStateException.
If the parser is closed, a call to Iterator.next() will throw a NoSuchElementException.
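The closed-parser behaviour described above can be sketched as follows. IteratorAfterCloseDemo and nextFailsAfterClose are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// Hypothetical demo class; not part of Commons CSV.
final class IteratorAfterCloseDemo {

    /** Returns true if Iterator.next() throws NoSuchElementException once the parser is closed. */
    static boolean nextFailsAfterClose(String input) {
        try {
            CSVParser parser = CSVParser.parse(input, CSVFormat.RFC4180);
            Iterator<CSVRecord> records = parser.iterator();
            parser.close(); // close before reading anything
            try {
                records.next();
                return false; // a record was returned despite the parser being closed
            } catch (NoSuchElementException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nextFailsAfterClose("a,b\nc,d")); // prints true
    }
}
```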
nextRecord
Parses the next record from the current point in the stream.
- Returns:
- the record as an array of values, or null if the end of the stream has been reached
- Throws:
IOException - on parse error or input read-failure
-
stream
Returns a sequential Stream with this collection as its source.
- Returns:
- a sequential Stream with this collection as its source.
- Since:
- 1.9.0
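As an illustration of stream(), the sketch below collects the first value of every record. StreamDemo and firstColumn are hypothetical names, and the input is assumed to have at least one value per record.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;

// Hypothetical demo class; not part of Commons CSV.
final class StreamDemo {

    /** Collects the first value of each record using the parser's stream. */
    static List<String> firstColumn(String input) {
        try (CSVParser parser = CSVParser.parse(input, CSVFormat.RFC4180)) {
            return parser.stream()
                    .map(record -> record.get(0)) // value at column index 0
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstColumn("a,b\nc,d")); // prints [a, c]
    }
}
```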
-