Convert an XML file to a CSV file using Java

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/21413978/

Date: 2020-08-13 08:36:23  Source: igfitidea

Convert an XML file to CSV file using java

java, xml, csv

Asked by Emre801

I need help understanding the steps involved in converting an XML file into a CSV file using Java. Here is an example XML file:

<?xml version="1.0"?>
<Sites>
<Site id="101" name="NY-01" location="New York">
    <Hosts>
        <Host id="1001">
           <Host_Name>srv001001</Host_Name>
           <IP_address>10.1.2.3</IP_address>
           <OS>Windows</OS>
           <Load_avg_1min>1.3</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1002">
           <Host_Name>srv001002</Host_Name>
           <IP_address>10.1.2.4</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>1.4</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1003">
           <Host_Name>srv001003</Host_Name>
           <IP_address>10.1.2.5</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>3.3</Load_avg_1min>
           <Load_avg_5min>1.6</Load_avg_5min>
           <Load_avg_15min>1.8</Load_avg_15min>
        </Host>
        <Host id="1004">
           <Host_Name>srv001004</Host_Name>
           <IP_address>10.1.2.6</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>2.3</Load_avg_1min>
           <Load_avg_5min>4.5</Load_avg_5min>
           <Load_avg_15min>4.2</Load_avg_15min>
        </Host>     
    </Hosts>
</Site>
</Sites>

And here is the CSV file it should produce:

site_id, site_name, site_location, host_id, host_name, ip_address, operative_system, load_avg_1min, load_avg_5min, load_avg_15min
101, NY-01, New York, 1001, srv001001, 10.1.2.3, Windows, 1.3, 2.5, 1.2
101, NY-01, New York, 1002, srv001002, 10.1.2.4, Linux, 1.4, 2.5, 1.2
101, NY-01, New York, 1003, srv001003, 10.1.2.5, Linux, 3.3, 1.6, 1.8
101, NY-01, New York, 1004, srv001004, 10.1.2.6, Linux, 2.3, 4.5, 4.2

I was thinking of using a DOM parser to read the XML file. The problem with that is I would need to refer to specific elements by name in the code, and I want to be able to parse the file without doing that.

Are there any tools or libraries in Java that would help me achieve this?

Suppose I have an XML file in the format below and want the value of InitgPty to appear in the same row as MsgId. (Please note: InitgPty is one tag level deeper, so its value currently gets printed on the next row.)

<?xml version="1.0"?>
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>XYZ07/ABC</MsgId>
<NbOfTxs>100000</NbOfTxs>
<InitgPty>
<Nm>XYZ</Nm>
</InitgPty>

Accepted answer by Guy Gavriely

Here's a working example; data.xml contains your data:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

class Xml2Csv {

    public static void main(String args[]) throws Exception {
        File stylesheet = new File("src/main/resources/style.xsl");
        File xmlSource = new File("src/main/resources/data.xml");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);

        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(new File("/tmp/x.csv"));
        transformer.transform(source, outputTarget);
    }
}

style.xsl

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:text>Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min&#xA;</xsl:text>
<xsl:for-each select="//Host">
<xsl:value-of select="concat(Host_Name,',',IP_address,',',OS,',',Load_avg_1min,',',Load_avg_5min,',',Load_avg_15min,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

output:

Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
srv001001,10.1.2.3,Windows,1.3,2.5,1.2
srv001002,10.1.2.4,Linux,1.4,2.5,1.2
srv001003,10.1.2.5,Linux,3.3,1.6,1.8
srv001004,10.1.2.6,Linux,2.3,4.5,4.2
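
The question also asked for the site attributes and the host id on every row. A minimal sketch of one way to get them with the same XSLT approach is shown here, assuming the `Sites/Site/Hosts/Host` layout from the question. The class name `SiteHostCsv`, the inline stylesheet string, and the trimmed one-host `SAMPLE` are illustrative assumptions, not part of the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SiteHostCsv {

    // Stylesheet kept inline so the sketch is self-contained; it could just as
    // well live in its own .xsl file. ../../@id climbs from Host to Site.
    static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:text>site_id,site_name,site_location,host_id,host_name,ip_address,"
        + "operative_system,load_avg_1min,load_avg_5min,load_avg_15min&#xA;</xsl:text>"
        + "<xsl:for-each select=\"//Site/Hosts/Host\">"
        + "<xsl:value-of select=\"concat(../../@id,',',../../@name,',',../../@location,',',"
        + "@id,',',Host_Name,',',IP_address,',',OS,',',"
        + "Load_avg_1min,',',Load_avg_5min,',',Load_avg_15min,'&#xA;')\"/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample: the question's XML reduced to a single host.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>"
        + "<Sites><Site id=\"101\" name=\"NY-01\" location=\"New York\"><Hosts>"
        + "<Host id=\"1001\"><Host_Name>srv001001</Host_Name>"
        + "<IP_address>10.1.2.3</IP_address><OS>Windows</OS>"
        + "<Load_avg_1min>1.3</Load_avg_1min><Load_avg_5min>2.5</Load_avg_5min>"
        + "<Load_avg_15min>1.2</Load_avg_15min></Host>"
        + "</Hosts></Site></Sites>";

    // Applies the stylesheet to the given XML text and returns the CSV text.
    static String toCsv(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(toCsv(SAMPLE));
    }
}
```

Values are joined as-is, so a field that itself contains a comma or a quote would need CSV quoting added before this could be relied on.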

Answer by Jono

Three steps:

  1. Parse the XML file into a Java XML library object.
  2. Retrieve the relevant data from the object for each row.
  3. Write the results to a text file using native Java functions, saving it with a *.csv extension.
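
The three steps above can be sketched roughly as follows, using the JDK's built-in DOM parser. This is only one possible reading of the answer: the class name `DomToCsv`, the `rowTag` parameter, and the trimmed `SAMPLE` document are assumptions made for illustration. Only the row element is named; its child elements are discovered at run time, which is close to what the question asked for.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomToCsv {

    // Hypothetical sample: two hosts with only two fields each.
    static final String SAMPLE =
          "<Hosts>"
        + "<Host id=\"1001\"><Host_Name>srv001001</Host_Name><OS>Windows</OS></Host>"
        + "<Host id=\"1002\"><Host_Name>srv001002</Host_Name><OS>Linux</OS></Host>"
        + "</Hosts>";

    // Steps 1 and 2: parse the XML, then build one CSV line per <rowTag> element.
    static List<String> toCsvLines(String xml, String rowTag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList rows = doc.getElementsByTagName(rowTag);
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            List<String> names = new ArrayList<>();
            List<String> values = new ArrayList<>();
            NodeList children = rows.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip the whitespace text nodes between child elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                    values.add(child.getTextContent().trim());
                }
            }
            if (i == 0) {
                lines.add(String.join(",", names)); // header taken from the first row
            }
            lines.add(String.join(",", values));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // Step 3: write the lines to a file with a .csv extension.
        Path out = Files.createTempFile("hosts", ".csv");
        Files.write(out, toCsvLines(SAMPLE, "Host"), StandardCharsets.UTF_8);
        System.out.println("Wrote " + out);
    }
}
```

The sketch assumes every row element has the same children in the same order and that no value contains a comma; real data would need a check for both.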

Answer by Pedantic

Your best bet is to use XSLT to "transform" the XML to CSV. There are some Q&As on SO (like here) that cover how to do this. The key is to provide a schema for your source data so the XSLT transform process knows how to read it and can properly format the results.

Then you can use Xalan to read the XML, apply the XSLT, and output your results.

Answer by injecteer

Your file looks really flat and simple. You don't necessarily need an XML parser to convert it. Just read it line by line with LineNumberReader.readLine() and use a regexp to extract specific fields.
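
A rough sketch of that line-by-line idea follows. It assumes each leaf element opens and closes on a single line and that `</Host>` sits on its own line, as in the sample file; anything else (entities, CDATA, elements split across lines) would break it. The class name `RegexHostCsv` and the `SAMPLE` text are hypothetical.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexHostCsv {

    // Matches a leaf element that opens and closes on one line, e.g. <OS>Linux</OS>.
    private static final Pattern LEAF = Pattern.compile("<(\\w+)>([^<]*)</\\1>");

    // Hypothetical sample laid out one element per line, like the question's file.
    static final String SAMPLE =
          "<Host id=\"1001\">\n"
        + "   <Host_Name>srv001001</Host_Name>\n"
        + "   <IP_address>10.1.2.3</IP_address>\n"
        + "   <OS>Windows</OS>\n"
        + "</Host>\n"
        + "<Host id=\"1002\">\n"
        + "   <Host_Name>srv001002</Host_Name>\n"
        + "   <IP_address>10.1.2.4</IP_address>\n"
        + "   <OS>Linux</OS>\n"
        + "</Host>\n";

    // Collects leaf values until a closing </Host> line, then emits one CSV row.
    static List<String> toCsvLines(LineNumberReader in) throws IOException {
        List<String> lines = new ArrayList<>();
        List<String> current = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = LEAF.matcher(line);
            if (m.find()) {
                current.add(m.group(2).trim());
            } else if (line.trim().equals("</Host>")) {
                lines.add(String.join(",", current));
                current.clear();
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        try (LineNumberReader in = new LineNumberReader(new StringReader(SAMPLE))) {
            toCsvLines(in).forEach(System.out::println);
        }
    }
}
```

This trades robustness for simplicity, so it is only reasonable when the input format is fully under your control.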

Another option is to use StAX, a streaming API for XML processing. It's pretty simple and you don't need to load the whole document in RAM.
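
As an illustration of the StAX option, here is a minimal sketch built on the JDK's `javax.xml.stream` cursor API. It assumes the `Site`/`Host` layout from the question; the class name `StaxHostCsv` and the one-host `SAMPLE` are made up for the example. The reader moves through the document one event at a time, so only the current row is held in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxHostCsv {

    // Hypothetical sample: the question's XML reduced to one host, two fields.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>"
        + "<Sites><Site id=\"101\" name=\"NY-01\" location=\"New York\"><Hosts>"
        + "<Host id=\"1001\"><Host_Name>srv001001</Host_Name><OS>Windows</OS></Host>"
        + "</Hosts></Site></Sites>";

    static List<String> toCsvLines(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> lines = new ArrayList<>();
        String siteId = "", siteName = "", siteLocation = "";
        List<String> row = null;                 // non-null only while inside a <Host>
        StringBuilder text = new StringBuilder();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    text.setLength(0);           // drop whitespace seen between elements
                    if (reader.getLocalName().equals("Site")) {
                        siteId = reader.getAttributeValue(null, "id");
                        siteName = reader.getAttributeValue(null, "name");
                        siteLocation = reader.getAttributeValue(null, "location");
                    } else if (reader.getLocalName().equals("Host")) {
                        row = new ArrayList<>();
                        row.add(siteId);
                        row.add(siteName);
                        row.add(siteLocation);
                        row.add(reader.getAttributeValue(null, "id"));
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    text.append(reader.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (reader.getLocalName().equals("Host")) {
                        lines.add(String.join(",", row));   // row complete
                        row = null;
                    } else if (row != null) {
                        row.add(text.toString().trim());    // a child element of <Host>
                    }
                    text.setLength(0);
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        toCsvLines(SAMPLE).forEach(System.out::println);
    }
}
```

For a genuinely large file the rows would be written to the output as they complete instead of being collected in a list; the list is used here only to keep the sketch short.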

Answer by Lochrann

The answer has already been provided by Pedantic (using the DOM-like approach {Document Object Model}) and Jono (with the SAX-like approach this time) in January.

My opinion is that both methods work well for small files but the latter works better with big XML files. You didn't mention the actual size of your XML files but you should take this into account.

Whatever method is used, a specific program (one that detects tags tailored to your particular XML) will be easier to write but won't work on another XML flavor without code changes, while a more generic program will be harder to devise but will work for all XML files. You said you wanted to parse a file without specifying element names, so I guess the generic approach is what you prefer. I agree with that, but please note that it's easier said than done. Indeed, I had the same problem in January too, this time involving a big XML file (>>100 MB), and I was surprised that nothing was available on the Internet so far. Turning frustration into something better is always a good thing, so I decided to tackle that specific problem myself in the most generic way, with special attention to the big-XML-file issue.

You might be interested to know that the generic Java library I wrote, which is now published as free software, converted your XML file into CSV the way you expected (in -x -u mode {please refer to the documentation for further information}).

So the answer to the last part of your question is: yes, there is at least one library which will help you achieve your goal, mine, which is named "XML2CSV-Generic-Converter". There might be other ones of course, and better ones certainly, but I couldn't pick any decent (free) one by myself.

I won't provide any link here, to comply with Peter Foti's judicious remark, but if you type "XML2CSV-Generic-Converter" into your favorite search engine you should find it easily.

Answer by Amol

http://beanio.org/2.1/docs/reference/index.html#Records This is a quick and robust solution.
