As I was about to hack it up into something usable as a library, I happened to be perusing some Scala docs and was reminded of Scala's built-in parser combinators. I gave them a shot even though there are other implementations out there; the existing docs and examples are pretty rough, though. I have to say that, after some effort, the result is pretty clean and, at 23 lines (if you don't count the "parseCSV" convenience methods), far shorter than my ANTLR grammar:
    import scala.util.parsing.combinator.RegexParsers

    trait CSVParser extends RegexParsers {
      import scala.util.parsing.input.CharSequenceReader
      import scala.util.parsing.input.StreamReader
      import java.io._

      // Whitespace is significant inside CSV fields, so don't skip it.
      override def skipWhitespace = false

      protected def records = repsep(record, """\r?\n|\r""".r)
      protected def record: Parser[List[String]] = repsep(field, ",".r)
      protected def field: Parser[String] = quoted_field | unquoted_field
      // Strip the surrounding quotes and collapse doubled quotes.
      protected def quoted_field: Parser[String] = """"(""|[^"])*"""".r ^^ {
        s: String => s.substring(1, s.length() - 1).replaceAll("\"\"", "\"")
      }
      protected def unquoted_field: Parser[String] = """[^,"\r\n]*""".r

      /**
       * Returns a list of CSV records, each a list of strings. If there were
       * no records found or there was an error, None is returned.
       */
      def parseCSV(reader: scala.util.parsing.input.Reader[Elem]): Option[List[List[String]]] = {
        parseAll(records, reader) match {
          case Success(result, _) => Some(result)
          case _ => None
        }
      }

      def parseCSV(input: CharSequence): Option[List[List[String]]] =
        parseCSV(new CharSequenceReader(input))

      def parseCSV(input: InputStream): Option[List[List[String]]] =
        parseCSV(StreamReader(new InputStreamReader(input)))

      def parseCSV(reader: Reader): Option[List[List[String]]] =
        parseCSV(StreamReader(reader))

      def parseCSV(file: File): Option[List[List[String]]] = {
        val fr = new java.io.FileReader(file)
        try {
          parseCSV(fr)
        } finally {
          fr.close()
        }
      }
    }

Even with the convenience wrapper methods, coming in at 48 lines for something readily usable is pretty impressive. I really do like ANTLR, but at this point I can't justify the effort of hacking up my grammar to get to something usable (in Java).
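The one rule in there that isn't self-explanatory is quoted_field. As a rough sketch of what it does to a single field, here is the same regex and the same unescaping step using only the standard library's regex support. The object name QuotedFieldSketch and the unquote helper are made up for illustration and are not part of the parser above.

```scala
object QuotedFieldSketch {
  // Same pattern as quoted_field: an opening quote, then any run of
  // doubled quotes or non-quote characters, then a closing quote.
  private val QuotedField = """"(""|[^"])*"""".r

  // Hypothetical helper: returns the unescaped contents if the whole
  // string is a well-formed quoted field, otherwise None.
  def unquote(raw: String): Option[String] =
    if (QuotedField.pattern.matcher(raw).matches())
      // Drop the surrounding quotes, then collapse each "" into a single ".
      Some(raw.substring(1, raw.length - 1).replace("\"\"", "\""))
    else
      None

  def main(args: Array[String]): Unit = {
    println(unquote("\"say \"\"hi\"\", ok\"")) // Some(say "hi", ok)
    println(unquote("\"a,b\""))                // Some(a,b)
    println(unquote("plain"))                  // None
  }
}
```

Since commas and line breaks are legal inside the quotes, this is also why field tries quoted_field before falling back to unquoted_field.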
Scala 1, ANTLR 0.