Dave Tucker

Calculating the Value of a Book Collection with Python

Posted on February 28, 2016

I have a number of used books I'm interested in selling.  While I'm happy to buy used books online for 1 cent, selling them at that price is really not worth it to me.  I needed a way to evaluate a large number of books and see what they might be worth, both individually and as a group.

I found the site bookfinder.com and saw that it provides good price data based on a book's ISBN.  However, doing this for each individual book would take too long.  I decided to script the approach.  Initially I toyed with the Amazon API for getting book information, but I was having trouble getting good data as a single ISBN was returning multiple books, with no great way for me to determine which was the one I wanted.

The code below requests pricing data from bookfinder.com, scrapes the prices (and title), and computes the minimum, maximum, mean, and median used price for each book.  I was surprised how few lines of code it took to do this.  Most of the heavy lifting was done by the BeautifulSoup library.  The results are printed in CSV format so I can open the data in Excel and play with it.
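The core scraping idea is small enough to show in isolation.  This is a sketch of how BeautifulSoup pulls prices out of the results markup; the HTML fragment here is a simplified stand-in, not bookfinder.com's actual page, but the span class and the dollar-stripping match the script below.

```python
from decimal import Decimal
from bs4 import BeautifulSoup

# Simplified stand-in for the bookfinder.com results markup.
sample = '''
<table>
  <span class="results-price"><a href="#">$4.50</a></span>
  <span class="results-price"><a href="#">$12.99</a></span>
</table>
'''

bs = BeautifulSoup(sample, 'html.parser')
# Find every price span, strip the dollar sign, and keep exact decimals.
prices = [Decimal(span.a.text.strip('$'))
          for span in bs.findAll('span', {'class': 'results-price'})]
print(prices)  # [Decimal('4.50'), Decimal('12.99')]
```

Decimal is used instead of float so prices like $4.50 round-trip exactly.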

The program reads ISBNs from standard input, one per line, and is run like this: cat isbns.txt | get_book_values.py > results.csv

As a result, I know I have 45 books to sell worth at least $282.79 in total (summing each book's minimum price), or $430.45 if each sold at its mean price.


import sys
import numpy as np
from decimal import Decimal
from bs4 import BeautifulSoup
import urllib2

# Parses HTML to extract the book title and a list of decimal used prices
def extract_name_and_prices(html):
    bs = BeautifulSoup(html, 'html.parser')

    # Find the title span and extract its text
    title = bs.findAll('span', { 'itemprop': 'name' })[0].text

    # Get the td area for used prices (not the one for new prices)
    used_prices_area = bs.findAll('td', {'valign': 'top', 'align': 'left'})[1]

    # Sanity check
    if not used_prices_area.h3.text.startswith('Used books'):
        raise Exception('Failed to find used prices area!')
    # Get the spans that actually hold the price data
    price_areas = used_prices_area.table.findAll('span', { 'class': 'results-price' } )
    # Convert each price to a decimal value and add to list
    prices = [Decimal(price.a.text.strip('$')) for price in price_areas]
    return title, prices

# Calculates statistical properties of a price list
def calc_price_data(price_list):
    result = dict()

    result['Min'] = np.amin(price_list)
    result['Max'] = np.amax(price_list)
    result['Mean'] = np.mean(price_list)
    result['Median'] = np.median(price_list)

    return result

# Grabs HTML from bookfinder.com for a given ISBN
def get_price_html(isbn):
    URL_BASE = 'http://www.bookfinder.com/search/?keywords=%(isbn)s&new_used=*&lang=en&st=sh&ac=qr&submit='
    url = URL_BASE % {'isbn': isbn}

    result = urllib2.urlopen(url).read()
    return result 

# Print the header row for the CSV results
def print_csv_header():
    sys.stdout.write('ISBN,Title,Min,Max,Mean,Median\n')

# Print a row of the CSV results
def print_csv_row(isbn, title, price_data):
    sys.stdout.write(isbn + ',')
    sys.stdout.write('"' + title +'",')
    sys.stdout.write(str(price_data['Min']) + ',')
    sys.stdout.write(str(price_data['Max']) + ',')
    sys.stdout.write(str(price_data['Mean']) + ',')
    sys.stdout.write(str(price_data['Median']) + '\n')

# Get title and price statistics for a single ISBN and print a CSV row
def process_and_print_isbn(isbn):
    html = get_price_html(isbn)
    title, prices = extract_name_and_prices(html)
    price_data = calc_price_data(prices)
    print_csv_row(isbn, title, price_data)

print_csv_header()
for isbn in sys.stdin:
    process_and_print_isbn(isbn.strip())

Results look like this:

9781413321180,"Nolo's Essential Guide to Buying Your First Home (Nolo's Essential Guidel to Buying Your First House)",11.76,22.74,18.8328,19.05
9780895262479,"Coolidge, An American Enigma",4.06,8.18,5.7048,5.17
9780812979275,"Buffett: The Making of an American Capitalist",6.12,9.04,7.3008,6.88
9781598808360,"Rick Steves' Mediterranean Cruise Ports",3.48,4.97,4.0176,4.00
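The totals above (the $282.79 floor and the $430.45 mean-price estimate) come from summing columns of the CSV.  A short script like this sketch can do that aggregation; it assumes the column order ISBN, Title, Min, Max, Mean, Median produced above, and the two-row sample here is just for illustration (skip the header row when reading a real results file).

```python
import csv
import io
from decimal import Decimal

def total_values(csv_text):
    """Sum the Min column (worst-case total) and the Mean column
    (expected total) of the results CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    floor = sum(Decimal(r[2]) for r in rows)     # column 2: Min
    expected = sum(Decimal(r[4]) for r in rows)  # column 4: Mean
    return floor, expected

sample = '''9780895262479,"Coolidge, An American Enigma",4.06,8.18,5.7048,5.17
9781598808360,"Rick Steves' Mediterranean Cruise Ports",3.48,4.97,4.0176,4.00
'''

floor, expected = total_values(sample)
print(floor, expected)  # 7.54 9.7224
```

The csv module handles the quoted titles (which contain commas), and Decimal keeps the sums exact.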