XML文件的解析

DOM/SAX/JDOM/DOM4J

Posted by Claire on June 12, 2020

XML文件的解析

mybatis的学习当中,配置的加载传统的是通过解析用户配置的文件,获取到数据库连接与对象实体映射关系的,书中有描述到DOM\SAX\StAX 几种方案,一方面是夯实基础,一方面是了解三种方法的优缺点,今天就学习一下XML文件的解析

一、XML的解析方式

  • DOM 解析XML底层接口之一,跨平台,跨语言
  • SAX 解析XML底层接口之一
  • Jdom/dom4J 基于底层API的封装,Java语言,更方便便捷

二、DOM解析

DOM 解析的原理:树形结构,依赖内存加载文件,树在内存中持久化存储,映射成Document对象,解析DOM树,识别父节点、当前节点、兄弟节点和子节点,在节点间游历,就能够获取到整棵树的数据。操作简易,消耗大,适用于小文档或者需要频繁修改文档的场景。

使用JDK自带的DocumentBuilderFactory进行解析

//生成工厂实例
  DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
//创建builder实例
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        //读取文件内容,加载入Document对象
        Document document = documentBuilder.parse("store.xml");
        //根据Name获取节点
        NodeList productList = document.getElementsByTagName("product");
       for(int i =0;i<productList.getLength();i++){
           Node product = productList.item(i);
           Node firstChild = product.getFirstChild();
           Node lastChild = product.getLastChild();
           String nodeName = product.getNodeName();
           short nodeType = product.getNodeType();
           String nodeValue = product.getNodeValue();
           //获取节点内容
           NamedNodeMap namedNodeMap = product.getAttributes();
           for(int j=0;j<namedNodeMap.getLength();j++){
               Node node = namedNodeMap.item(j);
               String name = node.getNodeName();
               String value = node.getNodeValue();
               short type = node.getNodeType();
               System.out.println("name:"+name+";value:"+value);
           }
           //获取子节点内容
           NodeList childNodes = product.getChildNodes();
           for(int k =0 ;k<childNodes.getLength();k++){
               Node item = childNodes.item(k);
               String itemNodeName = item.getNodeName();
               short itemNodeType = item.getNodeType();
               String itemNodeValue = item.getNodeValue();
               System.out.println("itemName:"+itemNodeName+";itemValue:"+itemNodeValue);

               NodeList children = item.getChildNodes();
               for(int m =0;m<children.getLength();m++){
                   Node item1 = children.item(m);
                   System.out.println("itemName:"+item1.getNodeName()+";itemValue:"+item1.getNodeValue());
               }
           }
       }

使用XPath解析

        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        Document document = documentBuilder.parse("store.xml");
        List<Product> products = new ArrayList<>();
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xPath = xPathFactory.newXPath();
        NodeList nodeList = (NodeList)xPath.evaluate("store/product",document, XPathConstants.NODESET);
        for(int i=0 ;i <nodeList.getLength();i++){
            Node node = nodeList.item(i);
            Node nodeAttr = node.getAttributes().item(0);
            Integer id = Integer.valueOf(nodeAttr.getNodeValue());
            String name = String.valueOf(xPath.evaluate("name", node, XPathConstants.STRING));
            Double price = (Double)xPath.evaluate("price", node, XPathConstants.NUMBER);
            Integer inventory =((Double)xPath.evaluate("inventory", node, XPathConstants.NUMBER)).intValue();

            Product product = new Product();
            product.setId(id);
            product.setName(name);
            product.setPrice(price);
            product.setInventory(inventory);
            products.add(product);

        }
        products.forEach(System.out::println);

三、SAX解析

SAX解析原理:基于事件的模型,不需要将全部文档加载入内存,只需要用户读取的时候加载相应的内容片段即可,响应快,消耗小,适用于大型文档读取的场景。

使用JDK自带的SAXParserFactory

  • 创建一个用于解析的Handler, 重写父类的关于片段解析的代码
public class SaxHandler extends DefaultHandler {
    private List<Product> products = null;
    private Product product;
    private String currentTag = null;
    private String currentValue = null;
    private String nodeName = null;

    public SaxHandler(String nodeName) {
        this.nodeName = nodeName;
    }

    public List<Product> getProducts() {
        return products;
    }


    @Override
    public void startDocument() throws SAXException {
        // 读到一个开始标签会触发
        super.startDocument();

        products = new ArrayList<Product>();
    }

    @Override
    public void endDocument() throws SAXException {
        //自动生成的方法存根
        super.endDocument();
    }

    @Override
    public void startElement(String uri, String localName, String name,
                             Attributes attributes) throws SAXException {
        //文档的开头调用
        super.startElement(uri, localName, name, attributes);

        if (name.equals(nodeName)) {
            product = new Product();
        }
        if (attributes != null && product != null) {
            for (int i = 0; i < attributes.getLength(); i++) {
                if (attributes.getQName(i).equals("id")) {
                    product.setId(Integer.valueOf(attributes.getValue(i)));
                }
            }
        }
        currentTag = name;
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        //处理在XML文件中读到的内容
        super.characters(ch, start, length);

        if (currentTag != null && product != null) {
            currentValue = new String(ch, start, length);
            if (!currentValue.trim().equals("") && !currentValue.trim().equals("\n")) {
                if (currentTag.equals("name")) {
                    product.setName(currentValue);
                } else if (currentTag.equals("price")) {
                    product.setPrice(Double.valueOf(currentValue));
                }else if(currentTag.equals("inventory")){
                    product.setInventory(Integer.valueOf(currentValue));
                }
            }
        }
        currentTag = null;
        currentValue = null;
    }

    @Override
    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // 结束标签的时候调用
        super.endElement(uri, localName, name);

        if (name.equals(nodeName)) {
            products.add(product);
        }
    }

}
  • 创建一个用于接收的对象(别的数据结构也可)
public class Product {
    private Integer id;
    private String name;
    private Double price;
    private Integer inventory;

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Double getPrice() {
        return price;
    }

    public void setPrice(Double price) {
        this.price = price;
    }

    public Integer getInventory() {
        return inventory;
    }

    public void setInventory(Integer inventory) {
        this.inventory = inventory;
    }

    @Override
    public String toString() {
        return "Product{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", price=" + price +
                ", inventory=" + inventory +
                '}';
    }
}
  • 读取文件,并使用Handler进行解析
       SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
        SAXParser saxParser = saxParserFactory.newSAXParser();
        SaxHandler saxHandler = new SaxHandler("product");
        saxParser.parse(new InputSource("store.xml"), saxHandler);
        List<Product> products = saxHandler.getProducts();
        products.forEach(System.out::println);

四、JDOM/Dom4J

  • JDOM的目的是成为Java特定文档模型,它简化与XML的交互并且比使用DOM实现更快

  • Dom4J,是JDOM 的一种智能分支,合并了许多超出基本 XML 文档表示的功能

  • 以下为Dom4j为例
  • 添加maven依赖
       <dependency>
            <groupId>dom4j</groupId>
            <artifactId>dom4j</artifactId>
            <version>1.6.1</version>
        </dependency>
  • 读取xml文件
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(new File("store.xml"));
  • 处理解析逻辑
        Element rootElement = document.getRootElement();
        List<Element> elements = rootElement.elements();
        List<Product> products = new ArrayList<>();
        for(Element element : elements){
            Integer id = Integer.valueOf(element.attributeValue("id"));
            String name = element.element("name").getText();
            Double price = Double.valueOf(element.element("price").getText());
            Integer inventory = Integer.valueOf(element.elementText("inventory"));
            Product product = new Product();
            product.setId(id);
            product.setName(name);
            product.setPrice(price);
            product.setInventory(inventory);
            
            products.add(product);
        }
        products.forEach(System.out::println);

五、StAX解析

它是与SAX类似的流式模式来解析XML文件,但是SAX采用的“推模式”来解析,即事件由计解析器产生,应用程序调用通过回调的方式获取结果,StAX采用“拉模式”,解析器控制解析进程,应用程序控制解析进程的推进,可以简化应用处理XML文档的代码,并决定何时停止解析,而且可以同时处理多个XML文档

采用JDK自带的XMLInputFactory

StAX解析查询方式有多种:
1. 基于光标
2. 基于迭代模型
3. 基于过滤器
4. 基于xpath
5. 使用 XMLStreamWriter 创建文档
6. 通过 Transformer 更新节点信息
7. StAX如果想定制事件,那就会比较复杂,手动实现XMLEvent的每一个方法,定制EventReader和XMLEventWriter
  • 简单示例 : 基于光标、基于迭代器、基于过滤器
//=====================StAX 流 光标模型==========================//
        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        FileReader fileReader = new FileReader("store.xml");
        xmlInputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(fileReader);
        List<Product> products = new ArrayList<>();

        while (xmlStreamReader.hasNext()) {
            int type = xmlStreamReader.next();
            if (XMLStreamConstants.START_DOCUMENT == type) {
            } else if (XMLStreamConstants.START_ELEMENT == type) {
                String name = xmlStreamReader.getName().toString();
                if ("product".equals(name)) {
                    QName key = xmlStreamReader.getAttributeName(0);
                    if ("id".equals(key.toString())) {
                        Product product = new Product();
                        product.setId(Integer.valueOf(xmlStreamReader.getAttributeValue(0)));
                        products.add(product);
                    }
                } else if ("name".equals(name)) {
                    Product product = products.get(products.size() - 1);
                    product.setName(xmlStreamReader.getElementText());
                } else if ("price".equals(name)) {
                    Product product = products.get(products.size() - 1);
                    product.setPrice(Double.valueOf(xmlStreamReader.getElementText()));
                } else if ("inventory".equals(name)) {
                    Product product = products.get(products.size() - 1);
                    product.setInventory(Integer.valueOf(xmlStreamReader.getElementText()));
                }
            } else if (XMLStreamConstants.CHARACTERS == type) {
//                System.out.println(xmlStreamReader.getText().trim());
            } else if (XMLStreamConstants.END_ELEMENT == type) {
                // System.out.println("/"+xmlStreamReader.getName());
            } else if (XMLStreamConstants.END_DOCUMENT == type) {

            }
        }
        products.forEach(System.out::println);

        //=====================StAX 迭代模型==========================//

        XMLEventReader xmlEventReader = xmlInputFactory.createXMLEventReader(fileReader);
        while (xmlEventReader.hasNext()) {
            XMLEvent xmlEvent = xmlEventReader.nextEvent();
            if (xmlEvent.isStartDocument()) {

            } else if (xmlEvent.isStartElement()) {

            } else if (xmlEvent.isEndElement()) {

            } else if (xmlEvent.isEndDocument()) {

            }
        }

               //=====================StAX 过滤器模型==========================//
        XMLEventReader reader = xmlInputFactory.createFilteredReader(xmlInputFactory.createXMLEventReader(fileReader),
                new EventFilter() {
                    @Override
                    public boolean accept(XMLEvent event) {
                        //返回true表示会显示
                        if(event.isStartElement()) {
                            String name = event.asStartElement().getName().toString();
                            if(name.equals("price")) {
                                return true;
                            }
                        }
                        return false;
                    }
                });
        //后续依旧照常处理

六、几种方法的对比

种类 JDom Dom4J SAX StAX
形式 基于内存 基于内存 基于流 基于流
功能 基础 加强 基础 加强
  • 基于内存:基于树形结构存储,大文件占用内存,能够维系树节点之间关联,遍历&增删&切换节点都比较遍历,适用于树形结构性对象,节点内容修改较多,节点间关联逻辑复杂的场景

  • 基于流:基于事读取,内存友好,选取性读取,无需内容全部加载,大文件能够轻松快速获取节点内容,功能多样,编写逻辑较复杂

All codes are here:this page