Using Headless Chrome to Convert Website Div Content into Text String or ASCII

Updated by Tim Rabbetts on Thu, 01/02/2024

Converting the content of a website's div element to text or ASCII can be useful in scenarios ranging from data extraction to automated testing. One approach is Headless Chrome, which runs the Chrome browser without a visible window and is therefore well suited to web scraping and automation tasks. Here it is driven from Node.js through the Puppeteer library.

To begin, ensure you have Node.js and npm installed on your system. By default, Puppeteer downloads a compatible browser build during installation, so a separate Chrome install is usually not required.

1. Setting up the environment:

Install the necessary dependency by running the following command:

```
npm install puppeteer
```

2. Writing the Headless Chrome script:

Create a new JavaScript file, for example `divToText.js`, and insert the following code:

```javascript
const puppeteer = require('puppeteer');

(async () => {
  // Launch Headless Chrome
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Navigate to the target website
  await page.goto('https://example.com'); // Replace with your target website

  // Extract the div content
  const divContent = await page.evaluate(() => {
    const divElement = document.querySelector('div'); // Replace 'div' with your desired div selector
    // Return null when nothing matches, rather than throwing inside the page
    return divElement ? divElement.textContent : null;
  });

  // Close Headless Chrome
  await browser.close();

  if (divContent === null) {
    console.error('No element matched the selector.');
    return;
  }

  // Output the div content
  console.log(divContent); // You can modify this to print or further process the content

  // Embed the div content within HTML tags
  const htmlContent = `<p>${divContent}</p>`;
  console.log(htmlContent);
})();
```

Remember to replace `'https://example.com'` with the target website of your choice, and change `'div'` to a selector that matches the specific div you want to extract, such as an id or class selector. Note that `document.querySelector` returns only the first matching element.

3. Running the script:

In your terminal, navigate to the directory containing `divToText.js` and run the script with Node.js:

```
node divToText.js
```

After running the script, you will see the extracted div content twice: once as plain text and once embedded within HTML `<p>` tags.

With Headless Chrome and Puppeteer, you can automate the extraction of div content from websites and pass it on as plain text. Keep in mind that `textContent` returns a Unicode string, so producing strict ASCII requires an additional step that replaces or removes any non-ASCII characters.
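The script above prints whatever Unicode text the div contains. For the ASCII conversion mentioned in the title, one possible approach is sketched below. The helper names `toAscii` and `collapseWhitespace` are hypothetical and not part of Puppeteer or the original script, and replacing unmappable characters with `?` is an assumption; you may prefer to drop them or transliterate them differently.

```javascript
// Hypothetical helper: reduce a Unicode string to 7-bit ASCII.
function toAscii(text) {
  return text
    .normalize('NFKD')                 // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .replace(/[^\x00-\x7F]/gu, '?');   // assumption: mark anything still outside ASCII with '?'
}

// Hypothetical helper: collapse runs of whitespace (including newlines) into single spaces.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// Example with a literal string standing in for divContent
console.log(toAscii(collapseWhitespace('  Café   déjà vu \n naïve ')));
// prints "Cafe deja vu naive"
```

In `divToText.js`, the same helpers could be applied to `divContent` before it is printed, for example `console.log(toAscii(collapseWhitespace(divContent)))`.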