As I was about to hack it up into something usable as a library, I happened to be perusing some Scala docs and was reminded of Scala's built-in parser combinators. I gave them a shot, even though there are others out there and the existing docs/examples are pretty rough. I have to say, after some effort the result is pretty clean and, at 23 lines (if you don't count the "parseCSV" convenience methods), far shorter than my ANTLR grammar:
import scala.util.parsing.combinator.RegexParsers

trait CSVParser extends RegexParsers {
  import scala.util.parsing.input.CharSequenceReader
  import scala.util.parsing.input.StreamReader
  import java.io._

  // Whitespace is significant in CSV, so don't let RegexParsers skip it
  override def skipWhitespace = false

  protected def records = repsep(record, """\r?\n|\r""".r)
  protected def record: Parser[List[String]] = repsep(field, ",".r)
  protected def field: Parser[String] = quoted_field | unquoted_field
  // Strip the surrounding quotes and collapse each doubled quote into one
  protected def quoted_field: Parser[String] = """"(""|[^"])*"""".r ^^ {
    s: String => s.substring(1, s.length() - 1).replaceAll("\"\"", "\"")
  }
  protected def unquoted_field: Parser[String] = """[^,"\r\n]*""".r

  /**
   * Returns a list of CSV records, each a list of strings. If the input could not be parsed, None is returned
   */
  def parseCSV(reader: scala.util.parsing.input.Reader[Elem]): Option[List[List[String]]] = {
    parseAll(records, reader) match {
      // Match with the extractor; a type pattern on Success[...] is unchecked because of erasure
      case Success(result, _) => Some(result)
      case _ => None
    }
  }

  def parseCSV(input: CharSequence): Option[List[List[String]]] = parseCSV(new CharSequenceReader(input))
  def parseCSV(input: InputStream): Option[List[List[String]]] = parseCSV(StreamReader(new InputStreamReader(input)))
  def parseCSV(reader: Reader): Option[List[List[String]]] = parseCSV(StreamReader(reader))
  def parseCSV(file: File): Option[List[List[String]]] = {
    val fr = new java.io.FileReader(file)
    try {
      parseCSV(fr)
    } finally {
      fr.close()
    }
  }
}

Even with the convenience wrapper methods, coming in at 48 lines to get to something readily usable is pretty impressive. I really do like ANTLR, but at this point I can't justify the effort of hacking up my grammar to get to something usable (in Java).

Scala 1, ANTLR 0.
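
The least obvious part of the parser is probably the quoted_field rule, so here is a rough sketch of what it does in isolation, using only the standard library. The QuotedFieldDemo object and the unquote helper are names I made up for illustration; they are not part of the trait above, and the sketch only mirrors the regex and the substring/replace step rather than running the combinator itself.

```scala
object QuotedFieldDemo {
  // Same pattern as quoted_field: an opening quote, then any run of
  // doubled quotes or non-quote characters, then a closing quote.
  val quoted = """"(""|[^"])*"""".r

  // Hypothetical helper: drop the outer quotes, then turn each "" into ".
  def unquote(s: String): String =
    s.substring(1, s.length - 1).replace("\"\"", "\"")

  def main(args: Array[String]): Unit = {
    // The raw CSV text of one field: "say ""hi"", ok"
    val raw = "\"say \"\"hi\"\", ok\""
    // findPrefixOf matches only at the start of the input, much like a regex parser does
    quoted.findPrefixOf(raw).map(unquote).foreach(println) // prints: say "hi", ok
  }
}
```

Note that the comma inside the quotes survives, which is the whole point of trying quoted_field before unquoted_field in the field rule.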