Deep scraping on a podcast website using loop action and loop/list trigger in one flow

Automating Deep Scraping on a Podcast Website: A Step-by-Step Guide


Have you ever wanted to extract a large amount of data from a website but dreaded going through every page and collecting the information one item at a time? Fret no more: TaskMagic can take the tedium out of it. In this blog post, we walk through deep scraping a podcast website by combining a Loop action with a List trigger in a single TaskMagic flow.

Scraping Every Name on the Cards

The first step is to scrape all the names from the cards on the website. Add a Scrape a List step, then click the first and second names in the list (Donald and Alexander here); TaskMagic highlights the remaining names, and you confirm the step. This gives us every name on the page and, at the same time, the list of cards we will later click to scrape each person's details.
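Conceptually, a Scrape a List step collects the text of every element on the page that matches one pattern. The Python sketch below illustrates that idea with the standard library's html.parser; it is not TaskMagic's implementation, and the card markup and the card-name class are hypothetical, so a real site will need its own pattern.

```python
from html.parser import HTMLParser
from typing import List


class CardNameParser(HTMLParser):
    """Collect the text of every element whose class list contains `name_class`.

    Assumes (hypothetically) that each card shows its name as plain text in
    its own element, e.g. <h3 class="card-name">Donald</h3>. Capture stops at
    the next closing tag, so names with nested markup are not handled.
    """

    def __init__(self, name_class: str = "card-name") -> None:
        super().__init__()
        self.name_class = name_class
        self.names: List[str] = []
        self._capturing = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.name_class in classes:
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capturing:
            self.names.append("".join(self._buffer).strip())
            self._capturing = False


def scrape_names(html: str, name_class: str = "card-name") -> List[str]:
    """Return every card name in page order, like a Scrape a List step."""
    parser = CardNameParser(name_class)
    parser.feed(html)
    parser.close()
    return parser.names


# Hypothetical markup standing in for the podcast site's speaker cards.
sample = """
<div class="card"><h3 class="card-name">Donald</h3><p>Title</p></div>
<div class="card"><h3 class="card-name">Alexander</h3><p>Title</p></div>
"""
print(scrape_names(sample))  # ['Donald', 'Alexander']
```

The returned list plays the same role as the scraped list in the flow: it is both the data we keep and the set of items the loop will click through.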

Implementing the Loop Step

Once we have collected all the names, the next step is to add a Loop step that iterates over every item we scraped. We pass the Scrape a List step into the loop and save it. TaskMagic then repeats the steps inside the loop for each name in turn: click into the card, scrape its details, and move on to the next one.

Recording and Refining Steps

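Inside the loop, we record three steps. First, a click on the first name: when you select Donald, TaskMagic works out that you want to iterate over all the names on the page. Second, a scrape of his details (title, bio, and the whole sessions block; the name itself is already in the list). Third, a click on the X icon to close the pop-up so the next card can be opened. Review the recorded steps before moving on, since it is easy to record a scrape field twice by accident.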
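The order of operations inside the loop can be modeled in Python as follows. FakePage and its click_card, scrape_popup and close_popup methods are hypothetical stand-ins for the recorded Click, Scrape and close steps, not TaskMagic functions, and the pop-up contents are placeholder data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakePage:
    """Hypothetical stand-in for the browser page; not a TaskMagic API."""

    details: Dict[str, Dict[str, str]]  # pop-up contents keyed by card name
    open_card: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def click_card(self, name: str) -> None:
        self.open_card = name
        self.log.append(f"click {name}")

    def scrape_popup(self) -> Dict[str, str]:
        if self.open_card is None:
            raise RuntimeError("no pop-up is open to scrape")
        return dict(self.details[self.open_card])

    def close_popup(self) -> None:
        self.log.append(f"close {self.open_card}")
        self.open_card = None


def scrape_each_card(page: FakePage, names: List[str]) -> List[Dict[str, str]]:
    rows = []
    for name in names:  # Loop step: one iteration per scraped name
        page.click_card(name)  # Click step: open this card's pop-up
        row = {"name": name, **page.scrape_popup()}  # Scrape step: title, bio, sessions
        page.close_popup()  # Click on the X icon so the next card can be clicked
        rows.append(row)
    return rows


page = FakePage({
    "Donald": {"title": "Title A", "bio": "Bio A", "sessions": "Sessions A"},
    "Alexander": {"title": "Title B", "bio": "Bio B", "sessions": "Sessions B"},
})
rows = scrape_each_card(page, ["Donald", "Alexander"])
print(page.log)  # ['click Donald', 'close Donald', 'click Alexander', 'close Alexander']
```

The log shows the pattern the loop relies on: each pop-up is closed before the next card is clicked.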

Handling Pagination and Next Page Navigation

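Once the loop has worked through every card on a page, the flow has to move to the next page. Record a click step on the next-page icon, and make sure to add it in the main flow after the loop (as step 4) rather than inside it. Because this website's layout changes, update the step's selectors: set the selector that matches the button's title (next page) as both the first and second option, delete the extra selectors, and save. Optionally, add a Delay step after the click so the next page has a moment to load.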
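In the video, the next-page click is made robust by keeping only the selector that targets the button's title attribute. Assuming TaskMagic tries a step's selectors in order and uses the first one that matches (an assumption, not documented behavior), the idea can be sketched as below; the selector strings, including the exact title value, and the dictionary standing in for the page are hypothetical.

```python
from typing import Callable, Optional, Sequence


def first_match(
    selectors: Sequence[str],
    find: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the element found by the first selector that matches, or None.

    `find` is a hypothetical lookup that returns an element (here just a
    string label) or None when the selector matches nothing on the page.
    """
    for selector in selectors:
        element = find(selector)
        if element is not None:
            return element
    return None


# Hypothetical page after a layout change: the position-based selector no
# longer matches, but the title-based one still does.
page = {'[title="next page"]': "next-page-button"}
fragile_first = ["div.pager > a:nth-child(3)", '[title="next page"]']
print(first_match(fragile_first, page.get))  # next-page-button
```

Putting the stable attribute selector in every slot, as the video does, means the step never depends on a position-based selector that may break when the layout shifts.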

Setting Up the List Trigger for Iterative Scraping

To repeat all of this for every page, we add a List trigger and set the number of times to loop to the total number of pages on the website (23 in this case). The website is opened only once: after the next-page click, the flow does not reload the site but goes back to step 2, re-scrapes the new list of people, and runs the loop again. Repeating this for each page scrapes every person on every page without manual intervention.
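The List trigger effectively wraps steps 2 onward in an outer loop that runs once per page, while step 1 (opening the site) runs only once. The sketch below models that control flow under those assumptions; the callables passed to run_flow are hypothetical placeholders for the recorded steps, pages=23 matches this site's page count, and delay_seconds stands in for the optional Delay step.

```python
import time
from typing import Callable, Dict, List


def run_flow(
    open_site: Callable[[], None],
    scrape_list: Callable[[], List[str]],
    scrape_card: Callable[[str], Dict[str, str]],
    click_next_page: Callable[[], None],
    pages: int = 23,
    delay_seconds: float = 0.0,
) -> List[Dict[str, str]]:
    open_site()  # Step 1: runs once; the site is not reloaded between pages
    rows: List[Dict[str, str]] = []
    for _ in range(pages):  # List trigger: repeat steps 2 onward once per page
        for name in scrape_list():  # Step 2 re-scrapes the current page's names
            rows.append(scrape_card(name))  # Step 3: loop (click, scrape, close)
        click_next_page()  # Step 4: next-page click in the main flow
        time.sleep(delay_seconds)  # Step 5: optional delay for the page to load
    return rows


# Hypothetical two-page site used only to exercise the control flow.
site = [["Donald", "Alexander"], ["Speaker 3"]]
state = {"page": 0}


def go_to_next_page() -> None:
    state["page"] = min(state["page"] + 1, len(site) - 1)


rows = run_flow(
    open_site=lambda: None,
    scrape_list=lambda: site[state["page"]],
    scrape_card=lambda name: {"name": name, "page": str(state["page"] + 1)},
    click_next_page=go_to_next_page,
    pages=len(site),
)
print([row["name"] for row in rows])  # ['Donald', 'Alexander', 'Speaker 3']
```

Because the page count is fixed when the trigger is set up, it has to be updated by hand if the site gains or loses pages.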

Conclusion

In conclusion, automating deep scraping with TaskMagic saves a great deal of repetitive clicking when the data you need is spread across many cards and many pages. By combining a Scrape a List step, a Loop step, a next-page click, and a List trigger, as outlined in this post, you can collect everything in one flow and focus on the more strategic parts of your project. Happy scraping!



Steps

Step 1- Click on Scrape a List

Step 2- Select the first and second names and click Confirm

Step 3- Click on Loop to record a loop step

Step 4- Select step 2 (the Scrape a List step) and click Save

Step 5- Click on green + icon

Step 6- Click on Click button

Step 7- Select the first name (Donald) and click Confirm

Step 8- Click on Scrape button

Step 9- Scrape Donald's details (title, bio, and sessions block) and click Confirm

Step 10- Record another click step

Step 11- Click the X icon and click Confirm

Step 12- Go back to the recording window and click Record

Step 13- Click on Green + Icon

Step 14- Click the + icon in the main flow (outside the loop) and choose Click to record the next step

Step 15- Click the next page button, then click Confirm

Step 16- Click the three dots and choose Update selectors

Step 17- Set the selector that matches the title "next page" as both the first and second option, delete the extra selectors, and click Save

Step 18- Optionally, add a Delay step so the next page has time to load

Step 19- Click on trigger

Step 20- Click on List

Step 21- Enter the number of pages to loop over (23 for this site), then click Continue

VIDEO TRANSCRIPT

Okay, so first part of this is going to be scraping all of the names from these cards. Let me just quickly show you, in Chrome at least, what I'm making here. So, we are going to want to scrape every single person. We need to go through every single page. And we need to click each card and scrape the details from that card, uh, just to, you know, scrape everything.

So here is that. So first step is going to be adding a scrape a list step, and this is going to allow us to get all of the names on the card, which is also giving us a list of all the items we want to click. So I click the first and the second names in the list right here, Donald and Alexander.

This is going to highlight the rest of them, and we can confirm this step. This is a scrape list step that finds all of these elements on the page. All of the names. Next what we can do is we can add a loop step that is going to loop over every item we just scraped. So we pass it the scrape list step, and then we can save that.

So now we go to a page, we scrape all the names, and we add a loop. Now inside of the loop, we need to record a couple of steps. We need to click into it, and then we need to scrape, and then we need to close the pop up. So first step is going to be clicking plus, and then click, and then I can select the first one on the page.

When I select Donald here, Task Magic will figure out that you're trying to iterate over all of the ones on the page. So after we click Donald, we need to scrape his details, so I'll click the plus icon, and then scroll down to scrape. And we're gonna get his name, we don't need to get his name again, we already have that.

I'm going to get his title, the bio, and then instead of getting individual sessions, we're just going to get the entire sessions block. Um, I don't really know the purpose of this, but, um, that's just what I'm doing. At least if you want to get really creative with this, you can, um, but you can also just break this up this way.

If it's important, let me know and I'll adjust this tutorial. So we can confirm that step and that scrapes all of our details. Next, we need to close this pop up so that we can move on to the next item. So I'll record a click step of the X icon here, and then I can close this or confirm that. And that is it.

So let me quickly check over this. So we go to a page, scrape all the names, then we're going to loop over all of the names. We're going to click Donald, scrape the sessions he has, his bio, his title. It looks like I might have recorded that twice. I don't know what I did there. Anyways, and then we click the close button to close that pop up.

After we do that, Task Magic is going to start on the next item. Uh, because of the loop and it's going to click on Alexander, whoever the next person was, I'm going to go back into the recording window. Um, I just wanted to kind of see all those steps laid out again, and we can add in the pagination for this actual like next page section here.

So what our automation is doing is it's scraping every name, then it's clicking the name, scraping details, and clicking the close icon. After it does it for Donald, this loop step is going to automatically tell it to move on to Alexander and so on. After this loop step finishes though, we need to record a click step of this next page icon here.

So I'll click the plus icon. I need to click the plus icon that's outside of the loop step. It's a little bit easier to see this when you expand the window. Um, but we have the loop here, which has these steps. And then under it is going to be the main flow. So we want to add this as step four, not something like step 3.1.

So I'll click plus. And then a click step. And then I'm going to click the next page button here. Now, because this website layout changes, we're going to want to quickly update this selector. And it's really easy. In this case, I can click the three dots and then advanced settings. And then, um, that's not what I meant to do.

Yes. Three dots and then update selectors. And then we're going to grab this title equals next page selector, and we're going to copy and paste this to make it our first and second option. So I'll paste that here and here, and then I'll delete these extra selectors and then save this. This will make sure that we always click that next page button, which we can test by clicking the play icon a bunch in here.

Uh, trying to think. We may want to add a delay after clicking next, just so that we have a little bit of time for this page to load, and that is going to be all. Go to the page, scrape a list, loop over all of them, and then click next page. Now to make our next page click loop, we need to add the list trigger.

So I'll click setup, and then I'll select list, and then the amount of times I want to loop is going to be the amount of pages there are here, which is 23. So I'll enter 23 here, Continue. And then we need to adjust our slider. I'll explain why after; it's going to be this. Basically, what we want to do is go to the podcast show and then scrape a list, right?

I won't repeat all of this a ton. We loop over all of those cards and then we click next page. After we click next page, we don't want to reload that website. We just want to scrape a new list because we have a new list of people. And that's what this loop does. It runs the first step, runs the second step, and so on until the fifth.

Um, then it will go back to step two, which re-scrapes the items, which also reruns this loop, which then allows us to iterate through every single person and every single page.
