Vigyata.AI

Scraping the web with the help of AI - NodeJS/Puppeteer Tutorial

6.7K views· 198 likes· 18:25· Jul 27, 2023


Start scraping with Bright Data ► https://brdta.com/developerfilip
GitHub Project Link ► https://github.com/FilipGrebowski/job-scraper-app

Maximize your job search with Bright Data's scraping browser and AI - ChatGPT. This video explores how to extract hidden salary information from IT job listings at scale, seamlessly overcoming captchas and IP bans. Streamlining Puppeteer coding with AI simplifies data extraction, enhancing efficiency. Discover the power of AI and advanced scraping tools in transforming industrial-scale data collection for your job search.

INQUIRIES AND COLLABORATIONS ► grebowskifilip@gmail.com

0:00 Learn about web scraping
0:29 The SCRAPING PLAN
1:36 How will we scrape job data?
2:43 The scraping PROBLEM
3:14 What is Bright Data?
3:49 Creating our Next.js Project
5:12 How to write your FIRST SCRAPER SCRIPT
6:56 The project overview
10:30 LETS SCRAPE!!
12:20 DATA (LOVE)
16:38 Using ChatGPT to write puppeteer code TRICK

MY SOCIALS
----------
Follow me on Twitter ► https://www.twitter.com/developerfilip
Follow me on Instagram ► https://www.instagram.com/developerfilip/
Check me out on GitHub ► https://github.com/FilipGrebowski

MUSIC BY ► @epidemicsound

#developer #webscraping #data

About This Video

The internet holds a huge amount of valuable data, and to us as engineers it's honestly beautiful. In this video I show you how to harness that data at industrial scale using Node.js, Puppeteer, and Bright Data's Scraping Browser. The goal is simple (and it's my biggest pet peeve): job ads that hide salary info. So we build a small app that scrapes indeed.com, extracts relevant job details, and only returns listings that match your criteria and actually include a salary.

I walk through spinning up a Next.js project, refactoring the folder structure the way I like it, and wiring an API route that runs the scraper. Then we connect Puppeteer (puppeteer-core) to Bright Data via a browser WebSocket endpoint, so we can avoid the usual scraping pain: IP bans, bot detection, and captchas killing your workflow.

You'll see the scraper fill a CSV in real time, scale from small runs to hundreds/thousands of results, and then apply filtering (like requiring a £ sign) so "full-time" doesn't pretend to be a salary.

To finish, I show my favorite little trick: using ChatGPT to write Puppeteer extraction code. You copy an element's HTML from DevTools, paste it into ChatGPT, and it can generate the selector logic for you, saving you from staring at nausea-inducing HTML when you just want the data.
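
As a rough illustration of the salary filter and CSV step described above, here is a minimal sketch in plain Node.js. It is not the code from the video or its repository: the function names (`hasSalary`, `csvField`, `toCsv`), the column names, and the sample listings are all hypothetical, and it assumes each scraped job is an object with a `salary` string.

```javascript
// Hypothetical helpers for illustration; names and fields are assumptions,
// not taken from the video's repository.

// Keep a listing only if its salary text contains a pound sign,
// so a value like "Full-time" is not mistaken for a salary.
function hasSalary(job) {
  return typeof job.salary === "string" && job.salary.includes("£");
}

// Quote one CSV field: wrap it in double quotes and double any inner quotes.
function csvField(value) {
  return `"${String(value ?? "").replace(/"/g, '""')}"`;
}

// Build CSV text (header row plus one row per job that passes the filter).
function toCsv(jobs, columns = ["title", "company", "salary"]) {
  const header = columns.join(",");
  const rows = jobs
    .filter(hasSalary)
    .map((job) => columns.map((column) => csvField(job[column])).join(","));
  return [header, ...rows].join("\n");
}

// Made-up sample data standing in for scraped listings.
const sample = [
  { title: "Frontend Developer", company: "Acme", salary: "£45,000 - £55,000 a year" },
  { title: "Backend Developer", company: "Globex", salary: "Full-time" },
];

console.log(toCsv(sample));
```

With the sample above, only the first listing survives the filter, because "Full-time" contains no £ sign. A real run would pass in the objects extracted by Puppeteer instead of `sample`, and the check would need adjusting for other currencies.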
