Task: You have a huge XML file (300 MB) and need to extract from it only one tag with a specific attribute value.
On Ubuntu 12.04, install an XSLT processor:
sudo apt-get install xsltproc
Structure of the huge XML file (hugeFile.xml):
<?xml version="1.0" encoding="ISO-8859-1"?>
<Feed ExtractDate="07/25/2013" ExtractTime="15:30:15">
  ..... a lot of companies information
  <COMPANY ... LegalName="MyCompany" .....>
    ..... a lot of inner tags .....
  </COMPANY>
  ..... a lot of companies information
</Feed>
Create the extract.xsl file:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:element name="TagToExtract">
      <xsl:apply-templates select="//COMPANY[@LegalName='MyCompany']" />
    </xsl:element>
  </xsl:template>
  <xsl:template match="//COMPANY[@LegalName='MyCompany']">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Run the XSLT processor (takes 12 seconds):
xsltproc extract.xsl hugeFile.xml > 1.xml
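xsltproc parses the whole document into memory before applying the stylesheet, so for even larger files a streaming approach may be worth trying. The following is only a rough sketch of the same extraction in Python using the standard library's xml.etree.ElementTree.iterparse, not a replacement for the stylesheet above. The function name extract_matching, its parameters, and the Address inner tag in the sample are made up for illustration, and the sketch assumes that COMPANY elements are direct children of Feed and are not nested inside one another.

```python
import io
import xml.etree.ElementTree as ET


def extract_matching(source, tag="COMPANY", attr="LegalName",
                     value="MyCompany", wrapper="TagToExtract"):
    """Return a <wrapper> element holding every <tag> element in `source`
    (a file path or a binary file object) whose `attr` equals `value`.

    Assumption: the <tag> elements are direct children of the root element,
    as in the sample feed, and are not nested inside one another.
    """
    result = ET.Element(wrapper)
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            if elem.get(attr) == value:
                result.append(elem)  # keep the complete subtree
            root.clear()  # drop already-processed children to limit memory use
    for child in result:
        child.tail = "\n"  # tidy the whitespace between extracted elements
    return result


if __name__ == "__main__":
    # Tiny stand-in for hugeFile.xml; <Address> is a hypothetical inner tag.
    sample = io.BytesIO(b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<Feed ExtractDate="07/25/2013" ExtractTime="15:30:15">
  <COMPANY LegalName="Other"><Address>1 Main St</Address></COMPANY>
  <COMPANY LegalName="MyCompany"><Address>2 Side St</Address></COMPANY>
</Feed>""")
    extracted = extract_matching(sample)
    print(ET.tostring(extracted, encoding="unicode"))
```

For the real file, pass the path instead of the in-memory sample, for example extract_matching("hugeFile.xml"), and write the result with ET.ElementTree(extracted).write("1.xml", encoding="utf-8", xml_declaration=True).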