Importing data into Google Sheets is easy thanks to the many IMPORT functions. One of the most useful functions for retrieving data from websites is the IMPORTXML Google Sheets function. This article covers everything you need to know about using XML files in Google Sheets, including a detailed step-by-step guide with screenshots. If you want to know more, read on.
Contents
- 1 What is XML?
- 2 Ways to Import XML to Google Sheets
- 3 What is the Google Sheet IMPORTXML Function, and When Should You Use It?
- 4 Syntax for the IMPORTXML Google Sheets Function
- 5 What is XPath?
- 6 How to Use IMPORTXML in Google Sheets
- 7 Frequently Asked Questions
- 8 Wrapping Up the Google Sheets IMPORTXML Guide
What is XML?
XML, like the HTML language, stands for eXtensible Markup Language. However, there is one difference between the two. The XML language has no predefined tags to use. Instead, you can define your own tags that meet your needs.
This makes XML a highly customizable language, allowing you to store data in any format you want to store, share, and find. Another advantage of using XML is that XML is a standardized language, so data can be transmitted and shared across multiple platforms and systems over the Internet or locally. Recipients can still parse the data because of the standard syntax.
Ways to Import XML to Google Sheets
Here are some ways to import XML into Google Sheets:
- Manual Import: Using this method requires copying and pasting the contents of the XML file into a Google Sheet. This approach is relatively ineffective and should only be used when dealing with small amounts of data.
- IMPORTXML Function: Google Sheets’ IMPORTXML function imports structured data from multiple sources including XML, HTML, and TSV.
- Google Apps Script: Works great for writing a little custom code to load XML data into Google Sheets. However, scripting and configuration requires more technical expertise, so it can be difficult for beginners.
What is the Google Sheet IMPORTXML Function, and When Should You Use It?
You can import XML into Google Sheets using the IMPORTXML Google Sheets function. Data can come from structured data such as HTML, CSV, XML, TSV, RSS, and ATOM feeds.
You can get information from an XML field enclosed in HTML tags like this: and. One of the best use cases for the XML import feature in Google Sheets is to use it to import data that is publicly available on the internet. This means you can scrape data from other websites.
Syntax for the IMPORTXML Google Sheets Function
Before we look at the IMPORTXML Google Sheets example, let’s see how the function formula works. The formula syntax for importing XML into Google Sheets is:
=IMPORTXML(link, xpath_query)
This formula has two parameters that are required for it to work these are:
- link: This parameter defines the web page link that the formula will check. URL parameter values must be enclosed in quotes or contain a reference to the cell containing the URL.
- xpath_query: This is the query to run on your data. XPath is used as an expression language to specify parts of XML documents. The value of this parameter must be enclosed in quotation marks. This can also be a reference to a cell containing text. More on this below.
What is XPath?
XPath is a path language for XML. It uses a non-XML syntax and provides a flexible way of dealing with different parts of an XML document. Additionally, it can be used to check addressed nodes in a document to see if they follow a pattern.
Before we get into the details, let’s take a look at some HTML basics:
- All information on the website is stored in XML format and displayed using HTML.
- for example, the xpath_query “/h1/@title” will give you all the “title” attributes found in the h1 tags available on the page.
- These XPath arguments can be modified to match your website’s search criteria.
Some of the most commonly used XPath Google Sheets queries include:
- //h1 – This query is used to get all h1’s in the document. You can change the numbers in your query to represent different header types.
- //title – //h1 – This query is used to get all titles in the document.
- //meta[@name=’description’]/@content – This query is used to collect meta descriptions within the document.
- //@href – This query is used to collect all links in the document.
- //link[@rel=’canonical’]/@href – This query is used to collect all canonicals in the document.
- //*[@itemtype]/@itemtype – This query is used to collect the schema types in the document.
- //*[@hreflang] – This query is used to scrape the hreflang inside the document.
If you’re using Google Chrome, you can use the DevTools feature to import XPaths into Google Sheets. To do this, right-click and select Inspect from the drop-down menu. This will bring up the Elements panel. Then right-click the highlighted element, click Copy, and then click Copy XPath.
If entering manually, be careful of typos to avoid errors.
How to Use IMPORTXML in Google Sheets
Now that you know how the XML Google Sheets Import function works, let’s see how to use the Google Sheets XML Import function.
Importing an XML Table
This example imports data from a table into a spreadsheet. Use this Wikipedia page as an example. Import the table into the “Main Species” section.
To do this you need to follow these steps:
- Open your browser and go to the website. Be sure to use a modern browser that can inspect the code of web pages. This example uses the chrome-based Microsoft Edge.
- Scroll down to find the table and right click on the XML element. In the dropdown menu, select the Inspect option.
- This will open the page’s source code in the sidebar on the right side of the screen. Here you can see how the data is tagged. It is recommended to right click on the exact element. However, as you hover your cursor over different elements, your browser highlights them on her web page so you know which one to choose.
- in this case, the data we want to retrieve is a table, so we write …. If you want to write this in an expression, write “//tr”
- Now go to your spreadsheet and click on the cell where you want to enter the formula.
- The first part of the IMPORTXML expression =IMPORTXML(
- Here you need to enter the first parameter, the link to the web page from which you want to import the XML data. In this case, write “https://en.wikipedia.org/wiki/Fish_farming”. Be sure to add quotation marks.
- Then add a comma (,) to separate the two parameters.
- Create an XPath query for the second parameter. In this example, we saw the required query in step 4. Write it as “//tr” and add the quotes.
- Add square brackets to close the formula and press the Enter button to execute the formula.
Running formulas can add extra data to your spreadsheet. You can nest this function in other Google Sheets functions to clean up your data.
However, for small tables like this, there is another simpler solution. Simply select and copy the data you want to save in your spreadsheet. Then click the cell where you want to paste the data and paste. This will remove the formula from the data. This means that the formula will not update when changes are made to the web page.
Related reference: Limitations of Google Sheets
Importing an XML Class
Similar to what you did in the previous method, you can use IMPORTXML to import XML classes from your website into your spreadsheet. In this example, we have another page from Wikipedia and extract the header from it. Here are the steps you need to follow for that:
- Open your browser and go to the website.
- Right-click an XML element. In the dropdown menu, select the Inspect option.
- look for the name of the class of data you selected. In this example, the class is named mw-headline and resides in . The XPath tag used here is “//span[@class=’mw-headline’]”. Be sure to add quotation marks.
- Now go to your spreadsheet and click on the cell where you want to enter the formula. Write the first part of the IMPORTXML expression (=IMPORTXML(.
- Here you need to enter the first parameter, the link to the web page from which you want to import the XML data. In this case, write “https://ja.wikipedia.org/wiki/Meat”. Be sure to add quotation marks.
- Then add a comma (,) to separate the two parameters.
- Create an XPath query for the second parameter. In this example, we saw the required query in step 4. Write it as “//span[@class=’mw-headline’]” and be sure to add the quotation marks.
- Add square brackets to close the formula and press the Enter button to execute the formula.
Frequently Asked Questions
What Is the Google Sheets IMPORTXML Function?
XML stands for eXtensible Markup Language. It is a language similar to the HTML language. However, there is one difference between the two. The XML language does not have predefined tags to use. Instead, you can define your own tags that meet your needs.
The IMPORTXML function in Google Sheets imports structured data from multiple sources. This includes XML, HTML and TSV.
How Does the IMPORTXML Function Work in Google Sheets?
IMPORTXML is a feature that comes with Google Sheets. Can be used to collect publicly accessible structured data from websites. IMPORTXML allows you to import data from structured data such as CSV, XML, RSS, TSV, ATOM XML feeds, and HTML.
The formula syntax for importing XML into Google Sheets is:
=IMPORTXML(link, xpath_query)
This formula has two parameters that are required for it to work. The first link is used to define the web page link that the formula checks. The second query is the xpath_query, the query to run on the data. For example, use “//h2” to find all H2 headings from a web page
What Are Some Common XPaths?
Here are some commonly used XPaths you should know:
- a “//@href” query can retrieve all links on a web page.
- To extract all internal links on a web page, use “//a[contains(@href, ‘example.com’)]/@href”.
- To extract all external links on a web page, use “//a[not(contains(@href, ‘example.com’))]/@href.
- use the “//title” tag for the title of the page.
- for meta descriptions, use the “//meta[@name=’description’]/@content” tag.
Wrapping Up the Google Sheets IMPORTXML Guide
After reading this IMPORTXML Google Sheets guide, you should have a thorough understanding of how to scrape XML data from a website into Google Sheets. Having worked on this function, it should be much easier to learn other import expressions. It’s often best to start early by learning as much as you can about Google Sheets before problems arise. That way we can resolve the issue more quickly. With that in mind, check out our Google Sheets ranking of the best online courses to get a head start.