How to scrape sites with multiple pages (pagination)

How do I handle pagination?

Introduction

Have you ever tried to scrape data from multiple pages at once? It can be a tedious and time-consuming process, especially if you’re dealing with a large number of pages. But with TaskMagic, you can automate this process and save yourself a lot of time and effort. In this tutorial, we’ll show you how to use TaskMagic to scrape data from multiple pages with just a few clicks. By the end of this tutorial, you’ll be able to extract hundreds of pieces of information quickly and easily.

 
☝️🤓 Fun fact on pagination!
Pagination is the process of dividing a large document, like a book or a website, into smaller parts so that it’s easier to read. The term can also refer to the numbers or marks that indicate the sequence of pages, as in a book. For example, a long article on a website may be divided into multiple pages, and you can click the numbers at the bottom of the page to go to the next page and continue reading.

Video tutorial on how to do it.

 

Or read the documentation down below.

To start, we first need to capture our automation¹. Click the Capture Steps button to start recording. Make sure you are in the workspace you want².

 
Notion image
 

After you click the Capture Steps button, you'll see a countdown¹. Let's wait for the 3-second countdown to finish. You’ll also see a pop-up window. That’s where we’ll navigate².

 
Notion image
 

How to handle Amazon's Captcha

 

Once the 3-second countdown finishes, let’s go to the URL¹ we want to scrape. In this case it’s Amazon. But we get a Captcha² ³ ⁴ when we go to Amazon. Below, we’ll explain how to work around this.

⚠️
In this situation, we complete the captcha during the recording like you normally would in any browser. First, read what you need to input. Once you know what is needed to pass the captcha, type in the answer³. When you are confident of the answer, submit it⁴ and you’ll go to the Amazon homepage.
 
Notion image
 

Once we’ve passed the captcha, we can start browsing Amazon¹ like we normally would. For this example, we’re going to be scraping dresses because we want to know the price point for some of the dresses on Amazon. We go to the search bar² and type dress³ (only because we’re looking for dresses; if you were looking for “books” you would type “books”). Our last step is to submit the query: we can press the Enter key on our keyboard, or we can click the button that submits the query, in this case Go.

 
Notion image
 

Once we’ve pressed the Enter key on our keyboard (or clicked the Go button), you’ll see the results for what you typed in the search bar¹. Now we want to scrape the name and price of each article on this page. To do that, we go to our Recording Bar and click on Scrape: List².

 
Notion image
 

Because we’re going to be scraping a list (more information here: Scraping a list vs individual text), we have to select 2 elements of exactly the same kind. For this example we choose the names¹ and the ratings*. Because we are scraping a list, it is important that WE DO NOT click on any ads* (or anything featured), because they are NOT the same exact elements as the regular items on the page. Once you start scraping, you’ll see the information stored on the Recording Bar¹.

 
Notion image
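
For readers curious why ads break a list scrape, here is a minimal Python sketch of the idea. It is not how TaskMagic works internally; the HTML, the class names (result, sponsored, title) and the helper scrape_titles are all made up for illustration. It collects every element that matches the same pattern and skips items marked as sponsored.

```python
from html.parser import HTMLParser

# Hypothetical markup: the class names and the "sponsored" marker are
# assumptions for illustration, not Amazon's real page structure.
SAMPLE_HTML = """
<div class="result sponsored"><h2 class="title">Featured Dress</h2></div>
<div class="result"><h2 class="title">Red Dress</h2></div>
<div class="result"><h2 class="title">Blue Dress</h2></div>
"""


class TitleScraper(HTMLParser):
    """Collects the text of every title that sits inside a non-sponsored result."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._skip_result = False  # True while inside a sponsored result
        self._in_title = False     # True while inside a title we want

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "result" in classes:
            # Ads share the "result" class but carry an extra marker
            self._skip_result = "sponsored" in classes
        elif tag == "h2" and "title" in classes and not self._skip_result:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


def scrape_titles(html):
    """Return the list of non-sponsored titles found in the given HTML."""
    parser = TitleScraper()
    parser.feed(html)
    return parser.titles


print(scrape_titles(SAMPLE_HTML))
```

The sponsored item is left out because it does not match the same pattern as the regular results, which is the same reason the tutorial tells you not to click on ads when picking your two sample elements.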
 

Once we are happy with our scrape¹, we scroll down just like we normally would when navigating any page and look for the Next² button. Once you find the Next² button, click on it.

 
Notion image
 

Once you see the next page¹ you are all set. You can now click the Stop recording¹ button. You are now done with pagination! But we still have a couple of adjustments to make so our automation can loop as many times as you’d like!

 
Notion image
 

When you click the Stop recording button you’ll see this pop-up. We can now send all this information to a Google Sheet¹ page or to a Webhook². In this example, we’re going to send this information to a Google Sheet¹ page.

 
Notion image
 

Make sure your information is correct and that you’re on the desired page. Check the Spreadsheet name¹, page¹ and header¹. If everything is good to go, click on Looks good².

 
Notion image

If you used a URL for your Google Sheet:

If you go to your Google Sheet, in the top right corner you’ll see this Share button.

Notion image

In this pop-up you can see a link icon on the bottom left. This button copies the link. You’ll also have to share the sheet with the following email: automations@taskmagic.com

 
Notion image

Make sure your information is correct and that you’re on the desired page. Check the Spreadsheet name¹, page¹ and header¹. If everything is good to go, click on Looks good².

 
Notion image
 

After we set up our sheet, you’ll see the automation page. Here you can do a ton of magic to your automation, but the only piece of magic we need for this to work is the Setup trigger¹ button. If you want more information, go to Setup trigger: What it does and how to use it.

 
Notion image
 

Once we’re in the Setup trigger menu, we have a lot of options, but for this tutorial we just need the Loop through data¹ option, so we click on that option.

 
Notion image
 

Once you enter the Loop through data option, you’ll see the Add in Google Sheet Data page. This page is there in case you want to set up a variable¹ (for more information, click here) so the automation can search for different items instead of the same item every time (for example, if we also want to search for deodorant, shampoo, etc. in one automation). We are going to Skip² this for now.

 
Notion image
 

Once you’re done with the Add in Google Sheet Data page you’ll see the Setup Loop Schedule page. This page is where you set the number of times you want the automation to loop. Let’s say I want to scrape 5 pages: I would enter “5” as the [number] in I want to use [number] times¹. Once you’re satisfied with the number of pages you want to scrape, click on Continue².

 
Notion image
 

Once you’re done with the Setup Loop Schedule page you’ll see the Schedule to run page. This page is where you set up a schedule, if that's what you want¹. If you don't want to schedule a run (like me), you can just click on the Continue² button.

 
Notion image
 

Remember that captcha we had in the beginning? Well, it’s time to address it! We have two options: we can either Allow error¹ on these steps, or Delete Step² for the captcha steps (in this case we Allow error¹ or Delete Step² for steps 1 - 4). Either option works in this case because a captcha is a one-time thing. Once we have used the browser once, the information is stored, but only in this automation.

 
Notion image
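
To picture what Allow error means for the run, here is a small Python sketch. It assumes that a step with Allow error is simply skipped when it fails and the run carries on; this is an illustration of the idea, not TaskMagic's actual implementation, and run_steps, solve_captcha and search_dress are hypothetical names.

```python
def run_steps(steps):
    """Run (name, action, allow_error) tuples in order.

    Returns the names of the steps that completed. A failing step is
    skipped if allow_error is True; otherwise the error stops the run.
    """
    completed = []
    for name, action, allow_error in steps:
        try:
            action()
        except Exception:
            if not allow_error:
                raise  # a required step failed, so stop the whole run
            continue   # an allowed error: skip this step and keep going
        completed.append(name)
    return completed


def solve_captcha():
    # Hypothetical step: on later runs the captcha is no longer shown,
    # so the recorded step cannot find what it needs and fails.
    raise RuntimeError("captcha not shown on this run")


def search_dress():
    # Hypothetical step that always succeeds
    pass


steps = [
    ("solve captcha", solve_captcha, True),   # Allow error is on
    ("search dress", search_dress, False),    # must succeed
]

print(run_steps(steps))
```

Deleting the captcha steps has the same effect on later runs as allowing errors on them: the automation goes straight to the steps that matter.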
 

We double-check the whole automation to see if our steps were captured correctly¹, and we clean up any duplicate or unnecessary steps² (in this case I deleted Step 5 and Step 6 because I already had Dress in Step 3 and Enter in Step 4).

 
Notion image
 

Once our automation is clean and ready to go, we have one last step to make this magic happen. We need to set up our loop trigger¹. To set our trigger, we need to find the first Scrape step in the automation and the step where we Click on the Next button, because we are going to be scraping and pressing Next in a loop¹.

 
Notion image
 

In this case, the first scrape was a Scrape many and it’s in step 5.

 
Notion image
 

And the step when we click the Next button was on step 9.

 
Notion image
 

So we set our loop from step 5¹ through step 9¹.

 
Notion image
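
If it helps to see the loop as logic, here is a rough Python sketch of "scrape, then click Next, repeated a set number of times". It assumes each page is just a list of names held in memory; PAGES and scrape_pages are hypothetical and only stand in for the recorded steps, they are not part of TaskMagic.

```python
# Hypothetical in-memory "site": each inner list stands for one results page
PAGES = [
    ["Red Dress", "Blue Dress"],
    ["Green Dress"],
    ["Black Dress", "White Dress"],
]


def scrape_pages(pages, loop_count):
    """Repeat the scrape step and the Next step up to loop_count times."""
    rows = []
    if not pages:
        return rows
    page_index = 0
    for _ in range(loop_count):
        # First step of the loop: scrape the list on the current page
        rows.extend(pages[page_index])
        # Last step of the loop: click Next, if there is a next page
        if page_index + 1 >= len(pages):
            break
        page_index += 1
    return rows


print(scrape_pages(PAGES, 2))
```

With a loop count of 2, only the first two pages are scraped. A loop count larger than the number of pages stops at the last page in this sketch, since there is nothing left to click.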
 

Once your loop¹ is set, the only thing left is to Play steps² and see the magic happen!

 
Notion image
 

Watch the automation play¹ and go through all the pages you want².

 
Notion image
 

Do you see this message¹? That means you’re good to go and you can check your Google Sheet page².

 
Notion image
 

And there you have it, the secret sauce to conquering the world of web scraping across multiple pages – TaskMagic! We've taken you on a whirlwind tour through this nifty tool, showing you how to breeze through those pesky Captchas, scrape lists like a pro, and even set up your own looping magic.

Gone are the days of tediously flipping through pages and jotting down data. With TaskMagic, you can sip your coffee while it does the heavy lifting. So, whether you're hunting for the trendiest dresses on Amazon or the coolest books in town, TaskMagic has your back.

Now you're equipped to dive into the world of data extraction armed with a newfound superpower. Go ahead, hit that play button, watch the automation unfold, and see your Google Sheets light up with info. Let's scrape, shall we? Happy hunting!
