How to get the title of a remote web page using JavaScript and Node.js


When I add a new bookmark to my bookmarks collection, I usually set the title of the new bookmark to the title of the web page being bookmarked - I assume the authors have put some thought into it. To make this happen automatically, I use a technique called web scraping. I cannot do it in the front end (Angular), since most of the URLs are outside of its domain. So the magic happens in the back end, in Node.js, with the help of a library called cheerio - thank you, Matthew, for that. Cheerio is a fast, flexible, and lean implementation of core jQuery designed specifically for the server. Read on to learn how this works.

Source code is available on Github - frontend and backend.

Check the dependencies

In the package.json make sure you have the cheerio and request dependencies:

     "name": "",
     "version": "1.0.0",
     "private": true,
     "scripts": {
       "start": "node ./bin/www"
     "dependencies": {
       "body-parser": "~1.15.1",
       "cheerio": "latest",
       "cookie-parser": "~1.4.3",
       "debug": "~2.2.0",
       "express": "~4.13.4",
       "jade": "~1.11.0",
       "keycloak-connect": "2.5.0",
       "mongoose": "~4.3.7",
       "mongoose-unique-validator": "1.0.2",
       "morgan": "~1.7.0",
       "request": "latest",
       "serve-favicon": "~2.3.0",
       "showdown": "^1.6.4"
     "devDependencies": {
       "nodemon": "^1.10.2"

Request is a simplified HTTP request client for Node.js. It is designed to be the simplest way possible to make HTTP calls.
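As a side note, request's error-first callback can be wrapped in a Promise if you prefer that style. The sketch below is my own addition, not part of the article's code; the client is passed in as a parameter so the wrapper stays dependency-free, and in the real route you would pass `require('request')` as the client:

```javascript
// Hedged sketch: wrap a request-style, error-first callback client in a Promise.
// `client` is assumed to follow request's signature: client(url, cb(err, response, body)).
function fetchBody(client, url) {
  return new Promise(function (resolve, reject) {
    client(url, function (error, response, body) {
      if (error) return reject(error);
      if (!response || response.statusCode !== 200) {
        return reject(new Error('Unexpected HTTP status: ' + (response && response.statusCode)));
      }
      resolve(body);
    });
  });
}

// Example with a fake client standing in for request (no network needed):
const fakeClient = function (url, cb) {
  cb(null, { statusCode: 200 }, '<title>Hello</title>');
};

fetchBody(fakeClient, 'https://example.org').then(function (body) {
  console.log(body); // <title>Hello</title>
});
```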


var express = require('express');
var request = require('request');
var cheerio = require('cheerio');
var router = express.Router();
var Bookmark = require('../models/bookmark');
var MyError = require('../models/error');

router.get('/scrape', function (req, res, next) {
  request(req.query.url, function (error, response, body) {
    if (!error && response.statusCode === 200) {
      const $ = cheerio.load(body);
      const webpageTitle = $('title').text();
      const metaDescription = $('meta[name=description]').attr('content');
      const webpage = {
        title: webpageTitle,
        metaDescription: metaDescription
      };
      res.send(webpage);
    } else {
      next(error);
    }
  });
});

What happens?

The URL of the web page comes in as the `url` query parameter of the `/scrape` resource. The web page at that URL is then requested. If there is no error and we receive an HTTP 200 OK status, the body of the response is loaded into cheerio.
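One detail worth noting (my addition, not from the original post): the bookmarked URL has to be percent-encoded before being placed into the `url` query parameter, otherwise its own `?`, `&`, and `=` characters would break the `/scrape` query string. The `/api/scrape` base path below is a hypothetical example:

```javascript
// Build the scrape request URL on the client side.
// '/api/scrape' is a hypothetical base path, for illustration only.
const target = 'https://example.org/article?id=42&lang=en';
const scrapeUrl = '/api/scrape?url=' + encodeURIComponent(target);

console.log(scrapeUrl);
// /api/scrape?url=https%3A%2F%2Fexample.org%2Farticle%3Fid%3D42%26lang%3Den
```

On the server side, Express decodes the parameter automatically, so `req.query.url` already contains the original URL.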

Cheerio first needs to be passed the HTML document it should work on.

Once loaded, we call the text() function to get the content of the title element, and we read the content attribute of the meta description. We return both in a custom webpage object. If I am not satisfied with the title or the description, I can edit them manually afterwards.
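To make the extraction concrete without fetching anything, here is a dependency-free sketch of what the two cheerio selectors pull out of a page, approximated with regular expressions on a made-up static HTML string. Regexes are far too brittle for real-world HTML - which is exactly why the route uses cheerio - but they illustrate what data ends up in the webpage object:

```javascript
// Hypothetical page markup, for illustration only.
const html = '<html><head>' +
  '<title>Example Domain</title>' +
  '<meta name="description" content="An example page">' +
  '</head><body></body></html>';

// Rough equivalents of $('title').text() and $('meta[name=description]').attr('content'):
const title = (html.match(/<title>([^<]*)<\/title>/i) || [])[1];
const metaDescription = (html.match(/<meta\s+name="description"\s+content="([^"]*)"/i) || [])[1];

const webpage = { title: title, metaDescription: metaDescription };
console.log(webpage); // { title: 'Example Domain', metaDescription: 'An example page' }
```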

If you found this useful, please star it, share it and improve it:


A note on web scraping from Wikipedia - "the legality of web scraping varies across the world. In general, web scraping may be against the terms of use of some websites, but the enforceability of these terms is unclear". Since I am only using this method to get the title for bookmarking and then linking back to the source, I think I am not doing anything illegal, but be wary…


Adrian Matei
