Scraping Results From a Page (Building 2 Automations)
Unleashing the Power of Automation: Scraping Horse Data with Task Magic
Are you looking to streamline your data collection process and automate repetitive tasks efficiently? In today's digital age, automation has become a game-changer for businesses and individuals seeking to boost productivity and save time. In this blog post, we will delve into the world of automation by exploring a step-by-step guide on how to scrape horse data from a website using Task Magic, a powerful automation tool.
The Power of Automation
Automation is revolutionizing the way we work by allowing us to automate mundane and repetitive tasks, freeing up valuable time to focus on more strategic and creative endeavors. From data scraping to workflow automation, the possibilities are endless when it comes to harnessing the power of automation tools like Task Magic.
Building the Automation
Our journey begins with the task of scraping horse data from a website. Using Task Magic, we build two automations: one that exports each horse's name and link to a Google Sheet, and a second that loops over that sheet and visits each link to collect the extra details. The video transcript below walks through the procedure step by step.
Step 1: Scraping Horse Information
The first step involves scraping all the necessary details about the horses from the website. By exporting the names and links to a Google sheet, we can then loop over the data to retrieve additional information about each horse. This process ensures that we have a comprehensive database of horse details at our disposal.
Step 2: Setting Up the Scraping Automation
The video shows how to set up the first automation in Task Magic: go to the listing URL, click Thoroughbreds, scrape the names and links as a list, and record a click on the next-page button so the scrape can repeat across every page of results. A list trigger then controls how many times those steps loop.
Step 3: Enhancing Data Collection
Once the initial scraping automation is completed, the script guides us on setting up a second automation to further enhance the data collection process. By looping over the URLs from the Google sheet and extracting detailed information about each horse, we can create a dynamic system that captures all the necessary data seamlessly.
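For readers who think in code, the two-automation pattern described above can be sketched in Python. This is an illustration of the general "listing pass, then detail pass" idea only, not how Task Magic works internally. The `FAKE_SITE` dictionary, the URLs, and the helper names are all hypothetical stand-ins so the sketch runs without any network access.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (text, href) pairs for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


class TextCollector(HTMLParser):
    """Collects every non-empty piece of text in a page."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


# Hypothetical stand-in for the website: URL -> HTML.
FAKE_SITE = {
    "/horses?page=1": '<a href="/horse/101">Horse A</a>'
                      '<a href="/horse/102">Horse B</a>',
    "/horse/101": "<h1>Horse A</h1><p>mare</p>",
    "/horse/102": "<h1>Horse B</h1><p>gelding</p>",
}


def scrape_listing(url):
    """First pass: one row per horse with its name and link."""
    parser = LinkCollector()
    parser.feed(FAKE_SITE[url])
    return [{"name": name, "links": href} for name, href in parser.links]


def scrape_detail(url):
    """Second pass: visit one horse page and grab its text details."""
    parser = TextCollector()
    parser.feed(FAKE_SITE[url])
    return {"links": url, "details": parser.chunks}


# The list of rows plays the role of the Google Sheet between the two passes.
sheet = scrape_listing("/horses?page=1")
details = [scrape_detail(row["links"]) for row in sheet]
```

Keeping the intermediate list of names and links separate from the detail pass mirrors the design in the post: the second pass can be rerun, or run in smaller batches, without repeating the listing scrape.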
Streamlining the Workflow
By leveraging automation tools like Task Magic, we can streamline our workflow and significantly reduce the time and effort required for manual tasks. The ability to automate data scraping processes not only boosts efficiency but also ensures accuracy and consistency in the extracted data.
Conclusion
In conclusion, automation is transforming the way we approach tasks that were once time-consuming and laborious. By following the steps outlined in the script, we can harness the power of automation to scrape horse data efficiently and effectively. Task Magic provides a user-friendly platform for creating automation processes that simplify complex tasks and optimize workflow efficiency.
Unleash the power of automation with Task Magic and revolutionize the way you work. Start automating your tasks today and experience the transformative impact of streamlined workflows and increased productivity.
Steps
Step 1: Enter the web address in the prompt window, then click Save and go to URL.
Step 2: Click the Click option to record a click.
Step 3: Click Thoroughbreds on the page, then click Confirm.
Step 4: Click Scrape a list.
Step 5: Click the horse names to scrape them into one column, add a new column for the links, then click Confirm.
Step 6: Click the Click step.
Step 7: Click the next-page arrow, then click Confirm.
Step 8: Click Trigger.
Step 9: Click List, then click Continue.
Step 10: Enter the number of times to loop (at least the number of result pages), then click Continue.
Step 11: Open a Google Sheet, name it Horse, click Share, share it with automations@taskmagic.com, and click Done.
Step 12: Copy the spreadsheet link.
Step 13: Click Send to Google Sheets, paste the link in the URL field, select Sheet 1, and click Looks good.
Step 14: Click Rename automation, rename it, and click Save.
Step 15: Click Build, then click New automation.
Step 16: Click Web, then click Continue.
Step 17: Grab a horse link from the sheet, paste it in the URL field, then click Save and go to URL.
Step 18: Click Scrape single.
Step 19: Select the horse details one by one, clicking Add new column each time, then click Confirm.
Step 20: Add a new sheet and enter a header for each detail column.
Step 21: Click Trigger again, click List, then click Continue.
Step 22: Paste the sheet link, select the sheet, and click Looks good.
Step 23: Click Use list from a Google Sheet, set the starting row and the number of rows to loop, then click Continue.
Step 24: Open the Go to page step, select @Links, and click Save.
Step 25: Click Play steps to run the automation.
VIDEO TRANSCRIPT
Okay. So we are going to build an automation that scrapes, oh, scrapes all of the horses from this website. So we'll see here that we have a list. We have a bunch of names, their price, things like this, and then we can click to see more information about the horse with some information, some photos and things like that.
So the way that we're going to build this is we're going to export all of the names and the links to the horses. So like this name and then this link, because clicking this opens a link here. We're going to export all of that to a Google Sheet. And then what we can do is we can loop over that Google Sheet, go to every page, and grab the extra details that we want.
So what this looks like is if I go and I grab the URL, this is all I'm going to need, this URL that I want to record with, and then I go to Task Magic. And I click new automation, web, and then we don't need to use cookies for this. We can start building this.
So first step is going to be going to this new vacations website. So I'm going to enter that URL, then click save and go to page. And then we'll see that on the right here, we have now gone to this page. We have all of the horses here. And if we scroll down, we have the next page buttons that we can click to load more of them, which will come in handy a little bit later.
So first step is going to be, as soon as we load this page, we're going to want to click, um, for this particular use case, we wanted to just scrape thoroughbreds. So I'm going to click thoroughbreds, then click confirm. And we'll see that that clicks this on the page. Now, next is going to be scraping a list of all of the names and links.
So I'll click scrape a list. Then I'm going to click on the name here and then I'll click on the second name and we'll see that this finds the rest of the records on that page automatically for us. If it didn't, we could just continue clicking until everything that we want to highlight in that column has been highlighted.
So in this case, it was just the name of the horse. Now I'll click add new column so that we can scrape the link to the horse. I'm going to go ahead and click the first button here, and we'll see that this found the rest of the results on the page. All we need to do is click the three dots here, that's a little hidden, and then click scrape links. And now we're grabbing the specific link to a specific horse. Then I'll confirm the step.
Lastly, we only need to record a click of the next page button. That way we can loop over, um, every page of listings here. So I'll click Click, and then I'll record this click step, hit confirm, and we'll see it do that click step after we record that.
And now we are on page two. So that's it for setting up the scraping automation, as far as recording steps. All we need to do inside of Task Magic, we can dismiss these actually, is set up our list trigger. So to do that, we're assuming, um, that we're going to be having a Google Sheet here, and this is the Google Sheet we're going to send all of that data to.
Um, I guess the list trigger actually plays into both. Backtracking a little bit, sorry. Let us set up our trigger, um, and then we'll set up sending to this Google Sheet. Okay. Apologies for that. Yeah. First, we're going to be using the list trigger as a really easy way to handle clicking this next page button.
So all we'll do here is click here, click list. And then we can click continue anyways, since yes, we want to use this. And then here for the amount of times we want to loop, we're just going to enter the amount of pages on the website, um, which in that case was four. We could enter five. It's okay if we're wrong here, we just want to make sure that we, um, enter enough that it clicks through every page that was there.
This can be improved to use some other things, um, but this is a very simple way to do it, which works perfectly fine in this use case. Next, what we want to do is we want to adjust our loop so it only repeats the scraping and the clicking of the next page button, um, and it does not repeat loading the URL every single time.
So to do that, we're going to drag this back slider all the way to the end there, because we always want it to run this next page click. Then the first slider is going to go to the scrape a list step, which is where we are grabbing the name for every horse that's on the page. So our loop will look like this.
We're going to run step one, run step two. Um, okay. Then we run step three, step four, click next page, and we restart from this scrape step, um, which is going to be scraping the next page of results, five times because we put five in our loop trigger here, which means it runs these steps five times. So it clicks next page five times.
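The effect of the two sliders can be pictured with a short Python sketch. It is a simplified model of what the transcript describes, not Task Magic's actual behavior: the function name, the 0-based indices, and the step labels are assumptions made for illustration.

```python
def execution_order(steps, loop_start, loop_end, times):
    """Return the order steps run in when steps[loop_start..loop_end] repeat.

    Indices are 0-based and inclusive (an assumption for this sketch).
    Steps before loop_start run once, the looped span runs `times` times,
    and any steps after loop_end run once.
    """
    before = steps[:loop_start]
    looped = steps[loop_start:loop_end + 1]
    after = steps[loop_end + 1:]
    return before + looped * times + after


# Hypothetical labels for the recorded steps.
steps = ["go to URL", "click thoroughbreds", "scrape a list", "click next page"]

# First slider on "scrape a list", back slider on the last step, five loops.
order = execution_order(steps, loop_start=2, loop_end=3, times=5)
```

In this model the URL loads once and the filter is clicked once, while the scrape and the next-page click repeat five times, which matches the intent of moving the first slider past the setup steps.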
Okay. So now that that's done, I am going to set up sending this to my Google Sheet. So I have a Google Sheet here. I'm going to just name this horse for now. Um, and then we need to share this with automations@taskmagic.com. So I'll just quickly enter that and then click send. Now we grab the URL from the Google Sheet and we're going to paste this inside of Task Magic.
So I'll click back into Task Magic, click send to Google Sheets. And then I'll paste my URL here and then I'll select my worksheet and I'll see the headers that I have and the steps that will go to the corresponding header. We'll click looks good and we are good to go. So now when I click play steps, this is going to go to this website,
Click thoroughbreds, scrape a list, scrape the links, click next page, and then go back to scraping a list five times. That way we go through every page of results here.
So there's it scraping the name after it clicked thoroughbreds. Next, it's going to grab the link. I'll just scroll down a little bit so we can see that get highlighted.
And then we click next page. We're scraping the names again. That green border might not show that well on this button, but if we go back into the app, we'll see that there were 12 results scraped for both, uh, scrape steps there. Next page has been loaded and we will just continue scraping these. I'll just pause until the results are in our Google Sheet now, since it's just going to repeat this for the rest of the pages.
Okay. So now that this automation is finished, it ran through all of our steps. It completed, the browser closed, everything was great. Uh, we will see the names and the link to every horse in this worksheet. Now, what matters here is that we want to grab all of the details from this page inside here. So this page is what we would see if we had clicked on a horse, um, to see its results.
So if I click view more, or if I go to this page, it's going to load all of these results. So we're going to set up a second automation that loops over our Google sheet and it grabs the rest of the details there. So first, I'm going to rename this just so I don't lose this scrape all, and then we're going to make another automation that loops over all of them.
So to do that, I'll click back into my workspace here, clicking on the left, clicking up here, clicking home, whatever you want to do, and then click new automation, web. And then not on this one, because again, we're not using cookies here. Now, what we're planning to do is we're going to loop over the URLs from our Google Sheet.
So we have a URL like this in our Google sheet, and that's what we're going to be visiting to grab the details. You can see how there's something different. There's a different number for every horse here, which is why it's loading a different horse each time or the correct horse for that matter. So to record this, we only need to record it like we're doing this perfectly once.
So I'm going to grab this link so that I have an example to work with. And then I'll click save and go to URL. This step is going to be updated later on to use a variable instead of just always going to this page. I'll go ahead and close these since that doesn't matter. And now I'm back in just my Chromium window, which has this specific page opened, and I can add my scrape steps to learn more about the horse.
So I'll click plus scrape single, and then I'll click on whatever I want to scrape like this. So I'll click here, click on Nigeria, then click add new column. If it had a barn name, I would want to scrape it. So we'll record that click step. Now, um, if we really needed to, we could find a horse with a barn name, but, um, this will work for this example, since it'll know to always grab this box.
Then I'll click add new column. We'll get mare. Oh, I missed 2021. If we mess up our selection, we can just reclick because we're still working in that same column. We only need to click add new column when we want to move on to the next step. So 2021, mare, 16.2, uh, and whatever else we wanted to grab here that was important. I'll just wrap up with grabbing the city. So that's that. Those are all of our scrape steps. Um, and that's really all we need for this automation.
Um, of course, if we wanted to grab images and things like that, we can just set up additional steps to grab the other details here. So I'll click I'm done. And then all we need to do next is set up our loop that loops over our Google Sheet list. Whoops, wrong one again. Our Google Sheet list, and it goes to every page and then it's going to export all of those details to this other Google Sheet that I'm making here.
Um, I'm just going to enter a bunch of empty headers here. Obviously, you should have something more descriptive for your flow. Um, but here we have a bunch of headers, which is going to allow us to export all of those steps I had just recorded. So, to set this up, I'm going to click in my trigger. I'm going to select list, because that's what I want to build as a list trigger.
I'll continue through that. And now we just set up our Google sheet connection and how many times we want to loop. So I'm going to click use list from a Google sheet. Since we want to loop over all of these details from this page, I'll click set up sheet connection, and then I'm going to enter the Google sheet here.
This is a sheet that maybe we have to share with automations@taskmagic.com if it's a different one, um, or if we haven't shared that before, but we did that in the previous automation. Then we want to read from this worksheet with all of our results. And we'll see that those headers are there too. And again, we can just double check that, um, by clicking in here.
Now, back to editing our automation, we'll see that we have a couple new options here, such as which row we want to start at and how many rows we want to loop. So these three settings are really important depending on how you want to run this automation. If I want this to grab the details for every single, um, item in this Google Sheet, then I should be setting it up to loop the same amount of rows as I have here.
So this is starting at two and it's going down until 41. Um, so we should make this loop 40 times because the numbers are going to be off by one because of the headers. So I'll enter 40 here. And what this is going to do is this is going to start at row two and it's going to run 40 rows worth of information, which ends up stopping us at row 41 there.
So then if we wanted to only run, let's say 10 at a time, we could make this 10. And the first time this runs, it'll run 10 times, which will end up changing this number to 12. After it runs again, it'll be 22, 32, and so on. So this number has the ability to update this number. That way you can move down a list.
If you don't want to do that, you can enter two as your starting row here. And then you can say, stop row from incrementing. And this will always run with the same 10 rows of data every time. In this example, um, we just want to scrape all of these results one time so that we can start building our list.
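The row arithmetic described here is easy to get off by one, so a small Python sketch may help. It only models the numbers mentioned in the transcript. The function name and the `increment` flag are hypothetical labels for the starting row, rows to loop, and stop-incrementing settings.

```python
def run_window(start_row, rows_to_loop, increment=True):
    """Return (rows processed this run, starting row for the next run)."""
    rows = list(range(start_row, start_row + rows_to_loop))
    next_start = start_row + rows_to_loop if increment else start_row
    return rows, next_start


# Row 1 holds the headers, so data starts at row 2.
# Looping 40 rows from row 2 covers rows 2 through 41.
all_rows, _ = run_window(start_row=2, rows_to_loop=40)

# Batches of 10 with incrementing on: the start row moves 2 -> 12 -> 22 -> 32.
starts = [2]
for _ in range(3):
    _, nxt = run_window(starts[-1], 10)
    starts.append(nxt)

# With incrementing stopped, every run reuses the same 10 rows.
same_rows, fixed_start = run_window(2, 10, increment=False)
```

The key point is that the loop count is the number of data rows, not the last row number: rows 2 through 41 are 40 rows because the header occupies row 1.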
And then we're going to end up having this automation, um, run repetitively, uh, down the road. So right now, just to initialize everything that we need, get all of these results now and get started with something, I'm going to make it start at row two and run for 40 rows. Now, when we connected this Google sheet, we also added some headers, um, from this Google sheet as variables.
So anything from our Google Sheet that was in this first row, its data can be used as a variable, which is how we're going to build what we're building here. The first go to page step, instead of being new vacations. org, is going to be, uh, @links, and then we'll click save. What that does is that makes sure that when this runs and it grabs these 10 rows of data, it replaces it with the links value instead.
So the first time this runs, if it's grabbing 40 rows of data, the first run is going to be using this row of data. If we say @links, it would replace with this. If we say @name, it would replace with this. After it runs all of these steps and it starts its next loop, which we'll go over, it's going to be using this data.
And now when we say @links, it replaces with this, or with this for @name, and so on until it moves its way down the list. Now to make the list trigger do what we want and loop over the correct steps, we need to update this to go all the way from the beginning to the end. What this means is it's going to go to the URL from our Google Sheet.
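The variable substitution described above can be modeled in a few lines of Python. This is only a sketch of the idea that a header name prefixed with @ is swapped for that row's value. The function name, the sample rows, and the example.com URLs are made up, and the exact matching rules Task Magic uses (for instance around capitalization) are not covered by the transcript.

```python
import re


def fill_variables(template, row):
    """Replace each @header token with that header's value from `row`."""

    def replace(match):
        key = match.group(1)
        # Leave unknown tokens untouched rather than guessing a value.
        return str(row.get(key, match.group(0)))

    return re.sub(r"@(\w+)", replace, template)


# Hypothetical rows, keyed by the sheet's header names.
rows = [
    {"name": "Horse A", "links": "https://example.com/horse/101"},
    {"name": "Horse B", "links": "https://example.com/horse/102"},
]

# Each loop iteration uses the next row, so the same "@links" step
# resolves to a different URL every time.
urls = [fill_variables("@links", row) for row in rows]
labels = [fill_variables("Scraping @name", row) for row in rows]
```

Because the step stores the token rather than a fixed URL, recording the automation once against a single example page is enough for it to work on every row.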
Then it's going to scrape a bunch of different items here. Then it's going to restart from step one with the next URL and repeat these scrape steps. And that's going to allow us to move our way down this list until we finally export all of them to this Google sheet by setting up our send to Google sheet function.
So I'll copy the Google Sheet again. I'll click send to Google Sheets, paste that there, select the worksheet we want to send to, and we'll see the data that is going to be sent to the correct columns. If I click looks good, we're good to go. And now we can run this automation. I'm going to just make this loop five times for the purposes of a video and then I'll click play steps.
So what this will do is this is going to use the first five rows of data from this Google sheet and loop over each URL here until it exports all of those scrape details to my new Google sheet.
All right. I think this is the last one. So we see on the left, it's scraping all these details. This URL is changing after it finishes scraping, which is our step one step running. And then at the end of this, we will see all of those details show up in our Google sheet. So here we go again, remember that one horse didn't have a barn name, which is why this is empty.
These other ones look like they do. Um, but that is the automation. If we were to run for more loops, which again, I edited right here to be five, this would have grabbed more details instead of just those five. But we can see that the names, Nigeria, More Than Tough, uh, I Do What I Do, et cetera, are all matching up to what is being scraped on this side.
So that's our automation to scrape all of these horse details. We can set up some additional logic with apps and things like that so that when new rows are added, we can check for duplicates and automatically kind of filter our database to make sure it does. Yeah.
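As a rough illustration of the duplicate check mentioned above, here is a minimal Python sketch. The transcript does not say how the check would be built, so this assumes that a horse's link uniquely identifies it. The function name and sample rows are hypothetical.

```python
def merge_new_rows(existing, incoming, key="links"):
    """Append only the incoming rows whose key is not already present."""
    seen = {row[key] for row in existing}
    merged = list(existing)
    for row in incoming:
        if row[key] not in seen:
            merged.append(row)
            seen.add(row[key])
    return merged


# Hypothetical data: one horse is already in the sheet, one is new.
existing = [{"name": "Horse A", "links": "/horse/101"}]
incoming = [
    {"name": "Horse A", "links": "/horse/101"},
    {"name": "Horse B", "links": "/horse/102"},
]
merged = merge_new_rows(existing, incoming)
```

Keying on the link rather than the name is a design choice for this sketch, since two horses could share a name but each detail page has its own URL.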