Parsing XML in Java
Source: http://www.importnew.com/17918.html
There are two approaches to parsing XML: DOM and SAX.
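- DOM
Builds a tree structure in memory that mirrors the hierarchy of the XML document; elements, attributes, and text are all wrapped as node objects in that tree.
- Pros: creating, reading, updating, and deleting nodes is easy.
- Cons: a very large XML file can exhaust memory.
- SAX
Uses an event-driven model that parses while reading: the document is read sequentially from top to bottom, and each time a construct such as a start tag is encountered, the corresponding handler method is called.
- Pros: no tree is built, so large files do not exhaust memory.
- Cons: querying is inconvenient, and creating, updating, or deleting nodes is not supported.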
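The contrast between the two models can be sketched with the JDK's own parsers. The example below is a minimal illustration, not code from the original article: the class name `DomVsSaxSketch`, its helper methods, and the inline sample document are assumptions made for the demonstration. Both helpers count elements with a given tag name; the DOM version builds the full tree first, while the SAX version only reacts to start-tag events.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class DomVsSaxSketch {

    // Hypothetical sample document, used only for this illustration.
    static final String XML =
            "<beans><bean id=\"id1\"><property name=\"a\"/></bean><bean id=\"id2\"/></beans>";

    /** DOM: the whole tree is built in memory first, then queried. */
    static int countWithDom(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(tag).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    /** SAX: no tree is built; a callback fires for each start tag as it is read. */
    static int countWithSax(String xml, final String tag) {
        final int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes attributes) {
                            if (qName.equals(tag)) {
                                count[0]++;
                            }
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("DOM count: " + countWithDom(XML, "bean"));
        System.out.println("SAX count: " + countWithSax(XML, "bean"));
    }
}
```

Both helpers return the same count; the difference lies in what is held in memory while they run.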
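Various companies and organizations provide parsers that implement the DOM and SAX approaches:
- JAXP from Sun
- dom4j from the Dom4j project (widely used, for example by Hibernate)
- jdom from the JDom project
For the history of these three parsers, see the article "java解析xml文件四种方式" (Four ways to parse XML files in Java).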
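Parsing with JAXP
JAXP is part of Java SE. Its javax.xml.parsers package provides the following classes for DOM and SAX parsing:
DOM
- DocumentBuilder
- DocumentBuilderFactory
SAX
- SAXParser
- SAXParserFactory
The sample XML is shown below; the following sections use JAXP to create, read, update, and delete its nodes.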
- config.xml

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE beans SYSTEM "constraint.dtd">
    <beans>
        <bean id="id1" class="com.fq.domain.Bean">
            <property name="isUsed" value="true"/>
        </bean>
        <bean id="id2" class="com.fq.domain.ComplexBean">
            <property name="refBean" ref="id1"/>
        </bean>
    </beans>

- constraint.dtd

    <!ELEMENT beans (bean*)>
    <!ELEMENT bean (property*)>
    <!ATTLIST bean
        id CDATA #REQUIRED
        class CDATA #REQUIRED>
    <!ELEMENT property EMPTY>
    <!ATTLIST property
        name CDATA #REQUIRED
        value CDATA #IMPLIED
        ref CDATA #IMPLIED>

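A DTD only takes effect when the parser is asked to validate. The sketch below shows one way to do that with JAXP, under a few assumptions: the class name `DtdValidationSketch` and its `validate` helper are hypothetical, and the constraints from constraint.dtd are inlined as an internal DTD subset so the example does not depend on a file on disk. Validity errors are collected through an `ErrorHandler` rather than thrown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {

    // Same rules as constraint.dtd, inlined so the sketch is self-contained.
    static final String DOCTYPE =
            "<!DOCTYPE beans [\n"
            + "<!ELEMENT beans (bean*)>\n"
            + "<!ELEMENT bean (property*)>\n"
            + "<!ATTLIST bean id CDATA #REQUIRED class CDATA #REQUIRED>\n"
            + "<!ELEMENT property EMPTY>\n"
            + "<!ATTLIST property name CDATA #REQUIRED value CDATA #IMPLIED ref CDATA #IMPLIED>\n"
            + "]>\n";

    /** Parses DOCTYPE + body with validation on and returns the validity errors reported. */
    static List<String> validate(String body) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // enable DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // warnings are ignored in this sketch
                }

                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage()); // recoverable validity error
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // document is not well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("<beans><bean id=\"id1\" class=\"com.fq.domain.Bean\"/></beans>"));
        // The next document omits the required class attribute.
        System.out.println(validate("<beans><bean id=\"id1\"/></beans>"));
    }
}
```

Without `setValidating(true)` the parser still reads the DTD but does not report documents that break its rules.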
JAXP-Dom
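
    import java.io.IOException;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import org.junit.Test;
    import org.w3c.dom.Document;
    import org.xml.sax.SAXException;

    /**
     * @author jifang
     * @since 16/1/13 11:24 PM.
     */
    public class XmlRead {

        @Test
        public void client() throws ParserConfigurationException, IOException, SAXException {
            // Create a DOM parser
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Parse the XML file found on the classpath
            Document document = builder.parse(ClassLoader.getSystemResourceAsStream("config.xml"));
            // ...
        }
    }

DocumentBuilder's parse(String/File/InputSource/InputStream param) method parses an XML file into a Document object that represents the whole document. Document (in the org.w3c.dom package) is an interface whose parent interface is Node; other sub-interfaces of Node include Element, Attr, and Text.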
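To make the Node hierarchy concrete, here is a small sketch that parses XML from an in-memory string (through the `InputSource` overload) and reports which Node sub-interface each piece corresponds to. The class `NodeTypesSketch` and its two helpers are hypothetical names introduced for this illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NodeTypesSketch {

    /** Parses XML held in a String by wrapping it in an InputSource. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Maps a node's type constant to the name of the matching Node sub-interface. */
    static String describe(Node node) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                return "Document";
            case Node.ELEMENT_NODE:
                return "Element";
            case Node.ATTRIBUTE_NODE:
                return "Attr";
            case Node.TEXT_NODE:
                return "Text";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<bean id=\"id1\">hello</bean>");
        Element root = doc.getDocumentElement();
        System.out.println(describe(doc));
        System.out.println(describe(root));
        System.out.println(describe(root.getAttributeNode("id")));
        System.out.println(describe(root.getFirstChild()));
    }
}
```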
- Node
Common Node methods | Description |
---|---|
Node appendChild(Node newChild) | Adds the node newChild to the end of the list of children of this node. |
Node removeChild(Node oldChild) | Removes the child node indicated by oldChild from the list of children, and returns it. |
NodeList getChildNodes() | A NodeList that contains all children of this node. |
NamedNodeMap getAttributes() | A NamedNodeMap containing the attributes of this node (if it is an Element) or null otherwise. |
String getTextContent() | This attribute returns the text content of this node and its descendants. |
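One detail of `getChildNodes()` is easy to miss: in an indented document the whitespace between tags is also returned, as Text nodes. The sketch below, with a hypothetical class `ChildElementsSketch` and an assumed sample document, filters the children down to elements only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ChildElementsSketch {

    // Hypothetical indented sample: the line breaks and spaces become Text nodes.
    static final String XML =
            "<beans>\n  <bean id=\"id1\"/>\n  <bean id=\"id2\"/>\n</beans>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Returns only the Element children of parent, skipping text and other node types. */
    static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); ++i) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) child);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        System.out.println("all children: " + root.getChildNodes().getLength());
        System.out.println("element children: " + childElements(root).size());
    }
}
```

This is the same check on `getNodeType()` that the recursive tag-printing example in this article relies on.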
- Document
Common Document methods | Description |
---|---|
NodeList getElementsByTagName(String tagname) | Returns a NodeList of all the Elements in document order with a given tag name and are contained in the document. |
Element createElement(String tagName) | Creates an element of the type specified. |
Text createTextNode(String data) | Creates a Text node given the specified string. |
Attr createAttribute(String name) | Creates an Attr of the given name. |
DOM queries
- Read all attributes on every <bean/> element

    public class XmlRead {

        private Document document;

        @Before
        public void setUp() throws ParserConfigurationException, IOException, SAXException {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(ClassLoader.getSystemResourceAsStream("config.xml"));
        }

        @Test
        public void client() {
            NodeList beans = document.getElementsByTagName("bean");
            for (int i = 0; i < beans.getLength(); ++i) {
                NamedNodeMap attributes = beans.item(i).getAttributes();
                scanNameNodeMap(attributes);
            }
        }

        // Print every attribute in the map as "name -> value"
        private void scanNameNodeMap(NamedNodeMap attributes) {
            for (int i = 0; i < attributes.getLength(); ++i) {
                Attr attribute = (Attr) attributes.item(i);
                System.out.printf("%s -> %s%n", attribute.getName(), attribute.getValue());
                // System.out.println(attribute.getNodeName() + " -> " + attribute.getTextContent());
            }
        }
    }

- Print the names of all elements in the XML file

    @Test
    public void client() {
        list(document, 0);
    }

    // Depth-first walk that prints each element name, indented by its depth
    private void list(Node node, int depth) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; ++i) {
                System.out.print("\t");
            }
            System.out.println("<" + node.getNodeName() + ">");
        }
        NodeList childNodes = node.getChildNodes();
        for (int i = 0; i < childNodes.getLength(); ++i) {
            list(childNodes.item(i), depth + 1);
        }
    }

Adding a node with DOM
- Add a <property/> element under the first <bean/> element. The expected result:

    <bean id="id1" class="com.fq.domain.Bean">
        <property name="isUsed" value="true"/>
        <property name="name" value="simple-bean">新添加的</property>
    </bean>

    /**
     * @author jifang
     * @since 16/1/17 5:56 PM.
     */
    public class XmlAppend {

        // Writes the document back to disk
        private Transformer transformer;

        // The XML document
        private Document document;

        @Before
        public void setUp() throws ParserConfigurationException, IOException, SAXException {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(ClassLoader.getSystemResourceAsStream("config.xml"));
        }

        @Test
        public void client() {
            // Get the first <bean/> element
            Node firstBean = document.getElementsByTagName("bean").item(0);

            // Create a <property/> element
            Element property = document.createElement("property");

            // Add attributes to the <property/> element
            // property.setAttribute("name", "name");
            // property.setAttribute("value", "feiqing");
            Attr name = document.createAttribute("name");
            name.setValue("name");
            property.setAttributeNode(name);
            Attr value = document.createAttribute("value");
            value.setValue("simple-bean");
            property.setAttributeNode(value);

            // Add text content to the <property/> element
            // property.setTextContent("新添加的");
            property.appendChild(document.createTextNode("新添加的"));

            // Append the <property/> element to the <bean/> element
            firstBean.appendChild(property);
        }

        @After
        public void tearDown() throws TransformerException {
            transformer = TransformerFactory.newInstance().newTransformer();
            // Write the in-memory DOM back to the XML file
            transformer.transform(new DOMSource(document),
                    new StreamResult("src/main/resources/config.xml"));
        }
    }

Note: the in-memory DOM must be written back to the XML document for the change to take effect.
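The write-back step can be tried without touching a file by serializing into a `StringWriter`. The following is a minimal sketch under stated assumptions: the class `DomWriteBackSketch`, its helper method, and the one-line input document are hypothetical, and the XML declaration is suppressed only to keep the output short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomWriteBackSketch {

    /** Appends a property to the first bean, then serializes the modified tree to a String. */
    static String appendPropertyAndSerialize(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Modify the in-memory tree only.
            Element property = doc.createElement("property");
            property.setAttribute("name", "name");
            property.setAttribute("value", "simple-bean");
            doc.getElementsByTagName("bean").item(0).appendChild(property);

            // An identity Transformer copies the DOM to the output unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("write-back failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(appendPropertyAndSerialize("<beans><bean id=\"id1\"/></beans>"));
    }
}
```

One caveat when writing to a real file: the identity transform does not emit the original DOCTYPE line by itself, so a document that relies on constraint.dtd may also need the `OutputKeys.DOCTYPE_SYSTEM` output property set on the Transformer.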
Updating a node with DOM
- Change the <property/> element just added to the following:

    <property name="name" value="new-simple-bean">simple-bean是新添加的</property>

    @Test
    public void client() {
        NodeList properties = document.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); ++i) {
            Element property = (Element) properties.item(i);
            // Find the property added earlier by its value attribute
            if (property.getAttribute("value").equals("simple-bean")) {
                property.setAttribute("value", "new-simple-bean");
                property.setTextContent("simple-bean是新添加的");
                break;
            }
        }
    }

Deleting a node with DOM
- Delete the <property/> element just modified

    @Test
    public void client() {
        NodeList properties = document.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); ++i) {
            Element property = (Element) properties.item(i);
            if (property.getAttribute("value").equals("new-simple-bean")) {
                // A node is removed through its parent
                property.getParentNode().removeChild(property);
                break;
            }
        }
    }

JAXP-SAX
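A SAXParser instance is obtained from the newSAXParser() method of a SAXParserFactory instance. Its parse(String uri, DefaultHandler dh) method, which parses an XML file, has no return value; compared with the DOM method it takes an additional event-handler argument, a DefaultHandler:
- when a start tag is read, the DefaultHandler's startElement() method is called automatically;
- when element content (text) is read, the DefaultHandler's characters() method is called automatically, possibly more than once for a single run of text;
- when an end tag is read, the DefaultHandler's endElement() method is called automatically.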
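The three callbacks above can be combined to collect the text of a given element. The sketch below is an illustration with hypothetical names (`SaxTextSketch`, `textOf`) and an assumed sample document. Because characters() may deliver one run of text in several chunks, the handler appends to a buffer and only reads it in endElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTextSketch {

    // Hypothetical sample document with text inside <property> elements.
    static final String XML =
            "<beans><bean>"
            + "<property name=\"a\">first</property>"
            + "<property name=\"b\">second &amp; third</property>"
            + "</bean></beans>";

    /** Returns the text content of every element named tag, in document order. */
    static List<String> textOf(String xml, final String tag) {
        final List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a matching element.
            private StringBuilder buffer;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if (qName.equals(tag)) {
                    buffer = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (buffer != null) {
                    buffer.append(ch, start, length); // may be called several times
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tag) && buffer != null) {
                    texts.add(buffer.toString());
                    buffer = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(textOf(XML, "property"));
    }
}
```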
SAX queries
- Print the whole XML document

    import java.io.IOException;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.junit.Test;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    /**
     * @author jifang
     * @since 16/1/17 9:16 PM.
     */
    public class SaxRead {

        @Test
        public void client() throws ParserConfigurationException, IOException, SAXException {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(ClassLoader.getSystemResourceAsStream("config.xml"), new SaxHandler());
        }

        private class SaxHandler extends DefaultHandler {

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                System.out.print("<" + qName);
                for (int i = 0; i < attributes.getLength(); ++i) {
                    String attrName = attributes.getQName(i);
                    String attrValue = attributes.getValue(i);
                    // Quote the value so the output looks like XML again
                    System.out.print(" " + attrName + "=\"" + attrValue + "\"");
                }
                System.out.print(">");
            }

            @Override
            public void characters(char[] ch, int start, int length) throws SAXException {
                System.out.print(new String(ch, start, length));
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                System.out.print("</" + qName + ">");
            }
        }
    }

- A handler that prints the content of every <property/> element

    private class SaxHandler extends DefaultHandler {

        // True only while the parser is inside a <property/> element.
        // The parser invokes these callbacks one after another on the parsing
        // thread, so a plain flag is enough and no lock is needed.
        private boolean isProperty = false;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) throws SAXException {
            if (qName.equals("property")) {
                isProperty = true;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (isProperty) {
                System.out.println(new String(ch, start, length));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("property")) {
                isProperty = false;
            }
        }
    }

Note: SAX cannot create, update, or delete nodes.
Parsing with Dom4j
Dom4j is a fork of JDom that split off from the original JDom project. It provides a parser with more features and better performance than JDom (XPath support, for example). To use Dom4j, add the following dependency to the pom:

    <dependency>
        <groupId>dom4j</groupId>
        <artifactId>dom4j</artifactId>
        <version>1.6.1</version>
    </dependency>

The sample XML is shown below; the following sections use Dom4j to create, read, update, and delete its nodes:
- config.xml

    <?xml version="1.0" encoding="utf-8"?>
    <beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns="http://www.fq.me/context"
           xsi:schemaLocation="http://www.fq.me/context http://www.fq.me/context/context.xsd">
        <bean id="id1" class="com.fq.benz">
            <property name="name" value="benz"/>
        </bean>
        <bean id="id2" class="com.fq.domain.Bean">
            <property name="isUsed" value="true"/>
            <property name="complexBean" ref="id1"/>
        </bean>
    </beans>

- context.xsd

    <?xml version="1.0" encoding="utf-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.fq.me/context"
            elementFormDefault="qualified">
        <element name="beans">
            <complexType>
                <sequence>
                    <element name="bean" maxOccurs="unbounded">
                        <complexType>
                            <sequence>
                                <element name="property" maxOccurs="unbounded">
                                    <complexType>
                                        <attribute name="name" type="string" use="required"/>
                                        <attribute name="value" type="string" use="optional"/>
                                        <attribute name="ref" type="string" use="optional"/>
                                    </complexType>
                                </element>
                            </sequence>
                            <attribute name="id" type="string" use="required"/>
                            <attribute name="class" type="string" use="required"/>
                        </complexType>
                    </element>
                </sequence>
            </complexType>
        </element>
    </schema>

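
    import org.dom4j.Document;
    import org.dom4j.DocumentException;
    import org.dom4j.io.SAXReader;
    import org.junit.Test;

    /**
     * @author jifang
     * @since 16/1/18 4:02 PM.
     */
    public class Dom4jRead {

        @Test
        public void client() throws DocumentException {
            SAXReader reader = new SAXReader();
            // Read config.xml from the classpath into a Dom4j Document
            Document document = reader.read(ClassLoader.getSystemResource("config.xml"));
            // ...
        }
    }

As with JAXP, Document is an interface (in the org.dom4j package) whose parent interface is Node. The sub-interfaces of Node include Element, Attribute, Document, Text, CDATA, and Branch.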
- Node
Common Node methods | Description |
---|---|
Element getParent() | getParent returns the parent Element if this node supports the parent relationship or null if it is the root element or does not support the parent relationship. |
- Document
Common Document methods | Description |
---|---|
Element getRootElement() | Returns the root Element for this document. |
- Element
Common Element methods | Description |
---|---|
void add(Attribute/Text param) | Adds the given Attribute/Text to this element. |
Element addAttribute(String name, String value) | Adds the attribute value of the given local name. |
Attribute attribute(int index) | Returns the attribute at the specified index. |
Attribute attribute(String name) | Returns the attribute with the given name |
Element element(String name) | Returns the first element for the given local name and any namespace. |
Iterator elementIterator() | Returns an iterator over all of this element's child elements. |
Iterator elementIterator(String name) | Returns an iterator over the elements contained in this element which match the given local name and any namespace. |
List elements() | Returns the elements contained in this element. |
List elements(String name) | Returns the elements contained in this element with the given local name and any namespace. |
- Branch
Common Branch methods | Description |
---|---|
Element addElement(String name) | Adds a new Element node with the given name to this branch and returns a reference to the new node. |
boolean remove(Node node) | Removes the given Node if the node is an immediate child of this branch. |
Dom4j queries
- Print all attribute information:

    /**
     * @author jifang
     * @since 16/1/18 4:02 PM.
     */
    public class Dom4jRead {

        private Document document;

        @Before
        public void setUp() throws DocumentException {
            document = new SAXReader()
                    .read(ClassLoader.getSystemResource("config.xml"));
        }

        @Test
        @SuppressWarnings("unchecked")
        public void client() {
            Element beans = document.getRootElement();
            for (Iterator iterator = beans.elementIterator(); iterator.hasNext(); ) {
                Element bean = (Element) iterator.next();
                String id = bean.attributeValue("id");
                String clazz = bean.attributeValue("class");
                System.out.println("id: " + id + ", class: " + clazz);
                scanProperties(bean.elements());
            }
        }

        // Print the name of each property plus its value or ref attribute, if present
        public void scanProperties(List<? extends Element> properties) {
            for (Element property : properties) {
                System.out.print("name: " + property.attributeValue("name"));
                Attribute value = property.attribute("value");
                if (value != null) {
                    System.out.println("," + value.getName() + ": " + value.getValue());
                }
                Attribute ref = property.attribute("ref");
                if (ref != null) {
                    System.out.println("," + ref.getName() + ": " + ref.getValue());
                }
            }
        }
    }

Adding a node with Dom4j
Append a <property/> element to the end of the first <bean/> element:

    <bean id="id1" class="com.fq.benz">
        <property name="name" value="benz"/>
        <property name="refBean" ref="id2">新添加的标签</property>
    </bean>

    /**
     * @author jifang
     * @since 16/1/19 9:50 AM.
     */
    public class Dom4jAppend {

        //...

        @Test
        public void client() {
            Element beans = document.getRootElement();
            Element firstBean = beans.element("bean");
            // addElement creates the child and appends it in one step
            Element property = firstBean.addElement("property");
            property.addAttribute("name", "refBean");
            property.addAttribute("ref", "id2");
            property.setText("新添加的标签");
        }

        @After
        public void tearDown() throws IOException {
            // Write the document back to the XML file and release the stream
            OutputFormat format = OutputFormat.createPrettyPrint();
            try (FileOutputStream out = new FileOutputStream("src/main/resources/config.xml")) {
                XMLWriter writer = new XMLWriter(out, format);
                writer.write(document);
                writer.flush();
            }
        }
    }

The code for reading and writing XML can be wrapped in a utility class, which makes later calls more convenient:

    /**
     * @author jifang
     * @since 16/1/19 2:12 PM.
     */
    public class XmlUtils {

        // Load an XML document from the classpath
        public static Document getXmlDocument(String config) {
            try {
                return new SAXReader().read(ClassLoader.getSystemResource(config));
            } catch (DocumentException e) {
                throw new RuntimeException(e);
            }
        }

        // Pretty-print the document to the given path, closing the stream afterwards
        public static void writeXmlDocument(String path, Document document) {
            try (FileOutputStream out = new FileOutputStream(path)) {
                XMLWriter writer = new XMLWriter(out, OutputFormat.createPrettyPrint());
                writer.write(document);
                writer.flush();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }

Insert a <property/> element after the first <property/> of the first <bean/>:

    <bean id="id1" class="com.fq.benz">
        <property name="name" value="benz"/>
        <property name="rate" value="3.14"/>
        <property name="refBean" ref="id2">新添加的标签</property>
    </bean>

    public class Dom4jAppend {

        private Document document;

        @Before
        public void setUp() {
            document = XmlUtils.getXmlDocument("config.xml");
        }

        @Test
        @SuppressWarnings("unchecked")
        public void client() {
            Element beans = document.getRootElement();
            Element firstBean = beans.element("bean");
            List<Element> properties = firstBean.elements();

            // Create the element in the same namespace as its future parent
            //Element property = DocumentHelper
            //        .createElement(QName.get("property", firstBean.getNamespaceURI()));
            Element property = DocumentFactory.getInstance()
                    .createElement("property", firstBean.getNamespaceURI());
            property.addAttribute("name", "rate");
            property.addAttribute("value", "3.14");

            // Inserting into the child list at index 1 places it after the first property
            properties.add(1, property);
        }

        @After
        public void tearDown() {
            XmlUtils.writeXmlDocument("src/main/resources/config.xml", document);
        }
    }

Modifying a node with Dom4j
Change the first <property/> of the id1 bean to the following:

    <property name="name" value="翡青"/>

    @Test
    public void client() {
        Element beans = document.getRootElement();
        // The first <bean/> is the one with id="id1"
        Element firstBean = beans.element("bean");
        // element(name) returns the first matching child
        Element property = firstBean.element("property");
        // Overwrite the value attribute in place
        property.attribute("value").setValue("翡青");
    }

Deleting a node with Dom4j
Delete the node just modified:

    @Test
    @SuppressWarnings("unchecked")
    public void delete() {
        List<Element> beans = document.getRootElement().elements("bean");
        for (Element bean : beans) {
            if (bean.attributeValue("id").equals("id1")) {
                List<Element> properties = bean.elements("property");
                for (Element property : properties) {
                    if (property.attributeValue("name").equals("name")) {
                        // Remove the node through its parent
                        property.getParent().remove(property);
                        break;
                    }
                }
                break;
            }
        }
    }

Dom4j example
In the article "Java 反射" (Java Reflection) we implemented an object pool that loads beans from a JSON configuration file. We can now add support for loading them from an XML configuration (the XML file is the same as above):

    /**
     * @author jifang
     * @since 16/1/18 9:18 PM.
     */
    public class XmlParse {

        private static final ObjectPool POOL = ObjectPoolBuilder.init(null);

        public static Element parseBeans(String config) {
            try {
                return new SAXReader().read(ClassLoader.getSystemResource(config)).getRootElement();
            } catch (DocumentException e) {
                throw new RuntimeException(e);
            }
        }

        public static void processObject(Element bean, List<? extends Element> properties)
                throws ClassNotFoundException, IllegalAccessException,
                       InstantiationException, NoSuchFieldException {
            Class<?> clazz = Class.forName(bean.attributeValue(CommonConstant.CLASS));
            Object targetObject = clazz.newInstance();

            for (Element property : properties) {
                String fieldName = property.attributeValue(CommonConstant.NAME);
                Field field = clazz.getDeclaredField(fieldName);
                field.setAccessible(true);

                if (property.attributeValue(CommonConstant.VALUE) != null) {
                    // The property has a value attribute: set a simple value
                    SimpleValueSetUtils.setSimpleValue(field, targetObject,
                            property.attributeValue(CommonConstant.VALUE));
                } else if (property.attributeValue(CommonConstant.REF) != null) {
                    // The property has a ref attribute: look the object up in the pool
                    String refId = property.attributeValue(CommonConstant.REF);
                    Object object = POOL.getObject(refId);
                    field.set(targetObject, object);
                } else {
                    throw new RuntimeException("neither value nor ref");
                }
            }

            POOL.putObject(bean.attributeValue(CommonConstant.ID), targetObject);
        }
    }

Note: the code above is only the XML-parsing part of the object-pool project; the complete project is available at git@git.oschina.net:feiqing/commons-frame.git
XPath
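XPath is a language for finding information in an XML document; it can be used to navigate the elements and attributes of a document.
Expression | Description |
---|---|
/ | Selects from the root node ( /beans : matches the root <beans/> ; /beans/bean : matches each <bean/> directly under <beans/> ) |
// | Selects matching nodes anywhere in the document, regardless of position ( //property : matches every <property/> in the document ) |
* | Matches any element node ( /* : matches the root element; //* : matches every element ) |
@ | Selects attributes (e.g. //@name : matches every name attribute ) |
[position] | Positional predicate (e.g. //property[1] : matches each <property/> that is the first <property/> child of its parent; //property[last()] : matches each one that is the last such child; (//property)[1] : matches only the first <property/> in the whole document ) |
[@attr] | Attribute predicate (e.g. //bean[@id] : matches every <bean/> that has an id attribute; //bean[@id='id1'] : matches every <bean/> whose id attribute equals 'id1' ) |
Predicates: a predicate selects a specific node, or nodes that contain a specific value. For the full XPath syntax, see the W3School XPath tutorial.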
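The expressions in the table can be tried out with the XPath engine that ships with the JDK (javax.xml.xpath), which is a different API from the Dom4j one used elsewhere in this article. The sketch below is an illustration only: the class `XPathTableSketch`, its `count` helper, and the inline document (the JAXP sample without its DOCTYPE line) are assumptions made for the demonstration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathTableSketch {

    // The JAXP sample document from earlier, without the DOCTYPE line.
    static final String XML =
            "<beans>"
            + "<bean id=\"id1\" class=\"com.fq.domain.Bean\">"
            + "<property name=\"isUsed\" value=\"true\"/></bean>"
            + "<bean id=\"id2\" class=\"com.fq.domain.ComplexBean\">"
            + "<property name=\"refBean\" ref=\"id1\"/></bean>"
            + "</beans>";

    /** Evaluates the expression against XML and returns how many nodes it selects. */
    static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/beans/bean", "//property", "/*", "//*", "//@name",
            "//property[1]", "(//property)[1]", "//bean[@id]", "//bean[@id='id1']"
        };
        for (String expression : expressions) {
            System.out.println(expression + " selects " + count(expression) + " node(s)");
        }
    }
}
```

Comparing `//property[1]` with `(//property)[1]` shows that a positional predicate applies per parent unless the path is parenthesized first.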
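XPath support in Dom4j
Out of the box, Dom4j cannot evaluate XPath expressions; the following jaxen dependency has to be added to the pom:

    <dependency>
        <groupId>jaxen</groupId>
        <artifactId>jaxen</artifactId>
        <version>1.1.6</version>
    </dependency>

Dom4j's Node interface provides the following methods for XPath:
Method |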
---|
List selectNodes(String xpathExpression) |
List selectNodes(String xpathExpression, String comparisonXPathExpression) |
List selectNodes(String xpathExpression, String comparisonXPathExpression, boolean removeDuplicates) |
Object selectObject(String xpathExpression) |
Node selectSingleNode(String xpathExpression) |
Querying with XPath
- Read the attribute values of every <bean/> element

    /**
     * @author jifang
     * @since 16/1/20 9:28 AM.
     */
    public class XPathRead {

        private Document document;

        @Before
        public void setUp() {
            document = XmlUtils.getXmlDocument("config.xml");
        }

        @Test
        @SuppressWarnings("unchecked")
        public void client() {
            // Unprefixed names in XPath match only elements in no namespace, so this
            // assumes a config.xml without a default xmlns (such as the JAXP sample).
            List<Element> beans = document.selectNodes("//bean");
            for (Element bean : beans) {
                System.out.println("id: " + bean.attributeValue("id")
                        + ", class: " + bean.attributeValue("class"));
            }
        }
    }

Updating with XPath
- Delete the <bean/> whose id is "id2"

    @Test
    public void client() {
        Node bean = document.selectSingleNode("//bean[@id=\"id2\"]");
        // Remove the selected node through its parent
        bean.getParent().remove(bean);
    }

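A caveat applies when the document declares a default namespace, as the Dom4j sample config.xml does with xmlns="http://www.fq.me/context": in XPath 1.0 an unprefixed name such as bean only matches elements in no namespace, so //bean selects nothing. The sketch below demonstrates the effect and two workarounds using the JDK's javax.xml.xpath API rather than Dom4j. The class `NamespaceXPathSketch`, the prefix `c`, and the shortened sample document are assumptions for illustration. Dom4j offers a comparable way to bind prefixes on an XPath object created through DocumentHelper.createXPath, which is not shown here.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NamespaceXPathSketch {

    static final String NS = "http://www.fq.me/context";

    // Shortened version of the namespaced sample document.
    static final String XML =
            "<beans xmlns=\"" + NS + "\"><bean id=\"id1\"/><bean id=\"id2\"/></beans>";

    /** Counts the nodes selected by expression, with the prefix "c" bound to NS. */
    static int count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "c".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }

                @Override
                public String getPrefix(String uri) {
                    return NS.equals(uri) ? "c" : null;
                }

                @Override
                public Iterator<String> getPrefixes(String uri) {
                    return NS.equals(uri)
                            ? Collections.singletonList("c").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("//bean -> " + count("//bean"));
        System.out.println("//c:bean -> " + count("//c:bean"));
        System.out.println("//*[local-name()='bean'] -> " + count("//*[local-name()='bean']"));
    }
}
```

Binding a prefix keeps the expression precise, while the local-name() form is a quick alternative when the namespace does not matter.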