kenneth westelinck added a comment - 07/Apr/06 07:10 AM
Attached you can find working source, based on CSVReader and CSVWriter from opencsv (http://opencsv.sourceforge.net/).
If a CSV file can be read into a JDBC RecordSet, the rest could be handled by MULE-756.
Yes, but then you will need a RecordSet to CSV converter. I think my approach is easier, since it just transforms a CSV file to a list of String[] or String[][] and then I simply let the XStream serialize the result and vice versa.
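A minimal sketch of the "CSV to a list of String[]" step described above, using only the JDK so it runs on its own. It is a stand-in for opencsv's CSVReader, not the attached source: the class and method names (`CsvToListSketch`, `parseLine`, `parse`) are hypothetical, and line breaks inside quoted fields are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of opencsv or Mule.
final class CsvToListSketch {

    // Splits one CSV line into fields; understands double-quoted fields
    // and the "" escape for a literal quote inside a quoted field.
    static String[] parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote
                    i++;
                } else {
                    inQuotes = !inQuotes; // opening or closing quote
                }
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields.toArray(new String[0]);
    }

    // Parses a whole document into a list of rows, skipping empty lines.
    static List<String[]> parse(String csv) {
        List<String[]> rows = new ArrayList<>();
        for (String line : csv.split("\r?\n")) {
            if (!line.isEmpty()) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }
}
```

The resulting `List<String[]>` is the kind of plain structure that could then be handed to a serializer such as XStream, as suggested in the comment above.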
I've taken a good look at some of the available CSV parsers and opencsv does look like a good choice.
Holger, have you checked http://servingxml.sourceforge.net/ ?
Andrew, thanks for the link to ServingXML - looks very interesting but also way more complicated than a "simple" CSV parser. The things I looked out for were mostly:
While ServingXML could probably be used as a substitute for any other CSV reader/writer, it looks like a lot more work to integrate, and its heavy bias towards XML is not always a good thing. Sometimes a List of Strings really works just fine, especially if you want to populate a regular object.
Yes, it looks powerful when used on its own, but may be too much for a fine-grained service; I just wanted to run it by you. We will definitely have a plain Csv2Xml converter without all the bells and whistles, and keep ServingXML in mind (in case a need for something like that arises).
Looks like things are coming together: http://jakarta.apache.org/commons/sandbox/csv/
Aims to unify all the different implementations that are out there, based on OpenCSV. About time.. Still waiting for the contributor's agreement. Ross?
CLA received so this can go into sandbox for further development. thanks
Initial commit to the sandbox in r3061, with created POM, 1.4-backported and cleaned up code. Still needs some more love and especially proper testing.
Can I help on this? If so, please give me some hints.
A couple of things that I found during initial code read -
However, none of this is really critical since it's not going to be in rc5 - maybe rc6 if we do one, more likely 1.3 final. If you find the time for some patches or more tests it would be appreciated. You can attach anything to this issue or start a csv-test project in the new contributor repo, see http://xircles.codehaus.org/projects/mule/repo/contrib
One test I'd like to see is stability, memory and performance behaviour with a big CSV (or XML) file: >10 megabytes at least, more if you find the time. These test resources should probably be generated dynamically or be stored in compressed form and unzipped on the fly. It would be nice if the transformer could handle an arbitrarily big XML file and directly pipe it into a CSV file on disk without using loads of memory. However, we don't have stream support for transformers in the official API yet, so this may or may not be possible for now; let us know what you find out. Thanks
Hi Kenneth,
Thanks for the submission. This is now available at http://sven.codehaus.org/mule-contrib/modules/csv
You should have commit access to this, and to build the module use Maven 2.0.4:
mvn install
Just for the record:
I've deleted the obsolete attachment to avoid confusion.
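For the big-file test requested earlier in this thread, a rough sketch of generating the test resource dynamically and reading it back one line at a time, so that memory use stays flat regardless of file size. This is only an assumption about how such a test could be set up; `BigCsvSketch`, `generate` and `countDataRows` are hypothetical names, and the real test would feed the rows through the transformer instead of just counting them.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical test helper, not part of Mule or opencsv.
final class BigCsvSketch {

    // Writes a header line plus `rows` data lines to a temp file,
    // one line at a time, and returns the file's path.
    static Path generate(int rows, int columns) {
        try {
            Path target = Files.createTempFile("big-csv-sketch", ".csv");
            try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
                for (int r = 0; r <= rows; r++) { // r == 0 is the header
                    StringBuilder line = new StringBuilder();
                    for (int c = 0; c < columns; c++) {
                        if (c > 0) {
                            line.append(',');
                        }
                        line.append(r == 0 ? "col" + c : "r" + r + "c" + c);
                    }
                    out.write(line.toString());
                    out.newLine();
                }
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Streams the file and counts data rows (header excluded)
    // without loading the whole file into memory.
    static long countDataRows(Path source) {
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            long lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return Math.max(lines - 1, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Raising `rows` and `columns` until the file passes 10 megabytes would give the size mentioned above; the generated file should be deleted when the test finishes.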
Since I find myself doing this, I'll take it
However, I have a transformer I've used in the past to do simple CSV/XML transformations. Will post it for review when I locate it.
This has worked out well so far. I've done a bit of rework in some areas, and added a few options. After proper testing, will add it to trunk.
Very good news! Please make sure the existing test (the little main driver in the sandbox) is added as proper AbstractTransformerTestCase, though if proper round-tripping is not possible for some reason, a "normal" test is of course fine too.
Btw, OpenCSV can also deal with JDBC RecordSets - maybe this helps with the items mentioned at the top of this issue? Thanks
I have completed my refactoring of this transformer. It is now three transformers: CSVToList, ListToCSV and MapToCSV. Having it serve xstream directly was too restrictive. Returning a List (of Maps) allows us to split with the array splitter, which fits at least two user scenarios I'm aware of. You can always convert to XML later.
I'm also putting this in the file transport. I know it doesn't quite fit, but it now certainly doesn't fit with the xml module. In most cases, people will be dealing with CSV files anyhow.
Closed in r4810.
I have been convinced by Holger that it is logical to put these three CSV transformers in their own module. We had a discussion about it on Friday. If you disagree, talk to the man! This commit has a test case. I also updated the site docs.
A note on speed: I tested with an 80-column, 7000-line CSV file. My test was reading the file in, converting with CSVToMapList, splitting with the FilteringListMessageSplitter, sending each Map to a JMS queue, reading each Map back out, converting back to CSV with MapToCSV and storing in a separate file (this was a client scenario, which is why I followed this process). Total time to do this, and write the 7000 files, was under 1 minute. So OpenCSV and these transformers seem to function quite nicely.
reopen for fix-version
Reopening because it has some problems. The idea of using Maps with the column number as key will not work as implemented, since the values of a Map can (and will!) be iterated in undefined order, depending on JDK version. Theoretically you could use TreeMaps or sort things by key, but all that is way too complicated. IMHO it would be easier to just use Lists (i.e. a List of List elements), which have implicit ordering.
r5027.
Much of this comes out of Holger's & my conversation. Renamed transformers to try to minimize confusion. Better handling of bad input data. Ordering of output data determined by labels, whether provided or extracted. Added tests to make sure a sloppy CSV file is correctly transformed. Holger, please open another JIRA if you want to do your MapToXML transformer in the xml module. I don't think we need a special CSVToXml transformer, since we can just chain them.
Actually the current implementation is still 80% not what I suggested since it still uses !"§$% HashMaps as general data structure
Reopening since it is still incomplete and has the wrong title.
Don't forget there are use cases for sending the Maps!!! We have to have a way to tie labels to values, especially when the output is split up. You can add a CSVToList, but don't remove CSVToMaps.
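One way to reconcile the two concerns raised in this thread (labels tied to values, but a defined column order) is an insertion-ordered map. The sketch below is not what was committed to module-csv; it only assumes that each row carries its labels, and `OrderedRowSketch`, `toOrderedRow` and `toCsvLine` are hypothetical names. `java.util.LinkedHashMap` iterates in insertion order, unlike `HashMap`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of module-csv.
final class OrderedRowSketch {

    // Builds one row keyed by column label. LinkedHashMap keeps the
    // labels in the order they were inserted, i.e. the column order.
    // Missing trailing values are filled with empty strings.
    static Map<String, String> toOrderedRow(String[] labels, String[] values) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < labels.length; i++) {
            row.put(labels[i], i < values.length ? values[i] : "");
        }
        return row;
    }

    // Joins the values in column order (no quoting or escaping here).
    static String toCsvLine(Map<String, String> row) {
        return String.join(",", row.values());
    }
}
```

Each such Map can still be split off and sent on its own, and the column order survives the round trip as long as the receiving side also keeps an ordered map.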
Don't worry, there is nothing wrong with having CSVToMaps; the abstractions by CSVInput/OutputParser just make it very difficult to extend. Also using a Writer for all output is too restrictive.
After talking to several people at MuleCon I have come to the conclusion that rewriting our own flat file support is too much work (even when using OpenCSV) and likely not good enough for many people's needs. Despite being on holiday I have read through the documentation & APIs of PZFileReader (http://pzfilereader.sourceforge.net/)
The new module could e.g. be named module-flatfile or something similar. Dirk, for further development I suggest you move module-csv out of trunk and into the sandbox where you can go wild.
I don't want to be a nag, but does PZFileReader support the following output file:
"C1","C2", "C3"
"val1",12,9.0
"val2", 5,3.5
I have to interface with an application that only reads the above format. So columns containing text should be placed between "-characters (or whatever character is used as text delimiter). Columns containing numeric values should not contain quotes. OpenCSV does not support this, so I've used ServingXML for this. Also, from what I can read, PZFileReader only supports parsing CSV files (hence its name, I guess). I realize ServingXML is less lightweight, but it supports almost anything, even reading/writing EDI files (see MULE-758).
PZFileReader also supports flat files with fixed length or delimited records, just check the docs. This is exactly why I got interested in it in the first place.
Oh and btw there will still be a ServingXML transformer (thanks for starting on that!) but we need to start somewhere and got a lot of feedback at MuleCon. Both are hammers but just like with every good toolbox there's one for nails and one for walls.
True, ServingXML is a sledge hammer
I am still wondering, however, how we will write CSV/delimited files, since PZFileReader is only for reading/parsing files. The idea was to have CsvToXml and vice versa.
Output is indeed a bit limited and will probably have to be written as extension to the PZ* classes and contributed back to PZFileReader; however it should be easy to write since all the necessary information is contained in the meta-data and output is essentially always the same. The samples contain a simple CSV writer and any other delimiter can be substituted easily.
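As an illustration of how small such a writer can be, here is a sketch of the output format asked for earlier in this thread: text columns quoted, numeric columns left bare. It is not PZFileReader or OpenCSV code; `SelectiveQuoteSketch`, `formatField` and `formatLine` are hypothetical names, and "numeric" is assumed to mean an optional minus sign, digits, and an optional decimal part.

```java
import java.util.regex.Pattern;

// Hypothetical writer helper, not part of PZFileReader or OpenCSV.
final class SelectiveQuoteSketch {

    // Assumption: plain integers and decimals count as numeric.
    private static final Pattern NUMERIC = Pattern.compile("-?\\d+(\\.\\d+)?");

    // Numeric values are written as-is; everything else is wrapped in
    // double quotes, with embedded quotes doubled.
    static String formatField(String value) {
        if (NUMERIC.matcher(value).matches()) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Formats one record using a comma as the delimiter.
    static String formatLine(String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(formatField(fields[i]));
        }
        return line.toString();
    }
}
```

With this, `formatLine("val1", "12", "9.0")` yields `"val1",12,9.0`. Deciding by value rather than by column type is a simplification; with column meta-data available, the quoting decision would more reliably come from the declared column type.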
setting fix-version to 1.4.1
Descoping the 1.4.1, unset Fix Version for some issues.
This is not going to be implemented as part of the regular Mule codebase. There's the flatfiles project on MuleForge