Web Scraping with Cloudscraper in Google Cloud Functions



Cloudscraper is a Node.js library for web scraping and other web requests against sites that are protected by Cloudflare. It is built on the Request library, so it is compact, simple, and easy to use.

Cloudflare sites behave differently because they carry an additional layer of security. When you load a Cloudflare site, it first serves a small page containing a mathematical challenge. Your browser normally completes this challenge for you, but because we are not using a web browser, the response we get back is the challenge page rather than the page we intended to load.

Cloudscraper saves us time and effort by automatically loading the Cloudflare page, completing the mathematical challenge, and then loading the page we actually requested.
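
As a rough sketch of what that looks like from the calling side, assuming the cloudscraper package is installed: the whole challenge-then-fetch sequence sits behind a single promise-returning get call. The helper name fetchPage, its optional client parameter, and the example.com URL are illustrative placeholders, not part of the library.

```javascript
// Fetch a page, letting the client handle any challenge page first.
// `client` is any object with a promise-returning get(url). It defaults to
// cloudscraper, which is required lazily so a stub can be passed instead.
function fetchPage(url, client) {
  const http = client || require('cloudscraper');
  return http.get(url); // resolves with the body of the requested page
}

// Usage (hypothetical URL):
// fetchPage('https://example.com/').then(console.log, console.error);
```

Passing the client in makes it easy to swap cloudscraper for another library later, which matters given the note below.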

The following diagram illustrates this process.


NOTE: Cloudscraper has now been officially deprecated. It may still work for some sites, but your results may vary and the library is no longer supported. If I find a suitable alternative I will update this page.


How to Install Cloudscraper in Google Cloud Functions


  1. Navigate to cloud.google.com
  2. Launch the Google Cloud Shell from the top right. You may also read Launching the Google Cloud Shell Console.
  3. Once the shell console loads, type npm install -g npm to ensure npm is up to date.
  4. Next we need to install request and request-promise, which are dependencies of cloudscraper.
  5. Type npm install --save request
  6. Next, type npm install --save request-promise
  7. Now we will install cheerio, a library that helps extract content from the fetched page. To install cheerio, type npm install cheerio
  8. Finally we will install cloudscraper. Type npm install cloudscraper
  9. You should now have request-promise, cheerio, and cloudscraper installed. Time to create your first project, using my free templates to get you started.
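
If you want to confirm the installs above worked before moving on, a small check like the following can help. It is only a sketch: checkDependencies is a hypothetical helper name, and it relies on Node's built-in require.resolve, which throws when a package cannot be found from the current directory.

```javascript
// Report which packages Node can resolve from the current directory.
function checkDependencies(names) {
  const report = {};
  for (const name of names) {
    try {
      require.resolve(name); // throws if the package is not installed
      report[name] = true;
    } catch (err) {
      report[name] = false;
    }
  }
  return report;
}

// The four packages installed in the steps above.
console.log(checkDependencies(['request', 'request-promise', 'cheerio', 'cloudscraper']));
```

Save it as a .js file in the directory where you ran npm install and run it with node. Any package reported as false needs installing again.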

Create a Node.js project with Cloudscraper


  1. Create a new Google Cloud function.
  2. Copy the following code into the index.js and package.json files as follows:
  3. The code to copy is embedded below. You can also find it on my GitHub repository.

  4. Click Create. Your function may take a couple of minutes to create, and you should get a green tick once it has finished.
  5. Click the URL in the Trigger tab to test that your code works. It should fetch the page and render it on screen, which will tell you whether the code is working. NOTE: The lukestoolkit blog doesn't use Cloudflare, therefore you will receive a blank response.
  6. Now it's your turn to start developing. Review the documentation for cloudscraper linked below.
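
To give a feel for the shape an index.js for the steps above might take, here is a minimal sketch of an HTTP-triggered function. It is not the template itself: TARGET_URL, makeHandler, fetchWithCloudscraper, and the scrape entry point are illustrative names, and example.com is a placeholder. It assumes package.json lists cloudscraper, request, and request-promise under dependencies so that they are installed when the function deploys.

```javascript
// index.js: a sketch of an HTTP-triggered function (illustrative names).

// Hypothetical target; replace with the page you want to fetch.
const TARGET_URL = 'https://example.com/';

// Fetch a page through cloudscraper. The module is required lazily so this
// file still loads when a different fetch function is supplied instead.
function fetchWithCloudscraper(url) {
  const cloudscraper = require('cloudscraper');
  return cloudscraper.get(url); // promise that resolves with the page body
}

// Build the handler around any function of the form (url) => Promise<string>.
function makeHandler(fetchHtml) {
  return async function scrape(req, res) {
    try {
      const html = await fetchHtml(TARGET_URL);
      res.status(200).send(html); // render the fetched page in the browser
    } catch (err) {
      res.status(502).send('Scrape failed: ' + err.message);
    }
  };
}

// "scrape" must match the entry point configured for the function.
exports.scrape = makeHandler(fetchWithCloudscraper);
```

Keeping the fetch step separate from the handler means that, if cloudscraper stops working for your site, only fetchWithCloudscraper needs replacing.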





Recommended Reading
