The Legislator

Poking around the NYTimes API, I noticed they had information on bills recently introduced in Congress.  Personally, I think it’s hilarious and amazing that legitimate bills being brought to Congress include things like “A bill to establish the composition known as America the Beautiful as the national anthem” side by side with “Authorization for the Use of Military Force against the Islamic State of Iraq and the Levant.”  These are both real bills, going through the Senate at roughly the same time.  The language used for titling bills is pretty interesting.  Bold action mixes with the mundane in bill titles like “Robocall Enforcement Improvements Act of 2014.”  Some titles are simply undecipherable unless you’re in the know: “SCRUB Act of 2014.”

So I thought it’d be fun to take the actual language used in these bills, mix it around, and come up with new bill names.  But I didn’t want to just randomly pull words out of a hat, so I tried to give some structure to the sentences.  This meant identifying the part of speech of each word in each bill title.  Once the words were properly categorized by part of speech, I could create a structure for my Python-based congressperson to spit out bill title after bill title.

Backing up a bit, the first step was to actually get bill titles to work with.  So I used the Python requests library to make a call to the NYTimes API.

import requests
import json

with open('cred.json') as cred_file:
    creds = json.load(cred_file)

recentlyIntroducedBillsURL = 'http://api.nytimes.com/svc/politics/v3/us/legislative/congress/113/senate/bills/introduced.json'

#Take url endpoint, return json data
def fetchData(url, key):
    r = requests.get(url + '?api-key=' + key)
    return r.json()

data = fetchData(recentlyIntroducedBillsURL, creds['CONGRESS_KEY'])
bills = data['results'][0]['bills']   #extract bills from data structure

I loaded my API credentials from an external file using the json module.  Then, using the appropriate NYTimes API endpoint, I could pass the URL and my credentials to a function that makes the call and returns the JSON data.  This endpoint returned 20 bills introduced to Congress in 2014.  Extracting the bills as a list would allow me to get at their titles.
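One thing the string concatenation in fetchData does not do is escape the key. Here is a minimal sketch of building the same query string with the standard library, so that any special characters in the key are URL-encoded. The helper name build_url is made up for illustration, and the sketch is written for Python 3, whereas the code in this post is Python 2. The requests library can also do this itself if you pass a dict as the params argument of requests.get.

```python
from urllib.parse import urlencode  # Python 3 standard library


def build_url(endpoint, key):
    """Return the endpoint with the API key appended as an escaped query string.

    build_url is a hypothetical helper, not part of the original script.
    """
    return endpoint + '?' + urlencode({'api-key': key})


# The key below is a placeholder, not a real credential
print(build_url(
    'http://api.nytimes.com/svc/politics/v3/us/legislative/congress/113/senate/bills/introduced.json',
    'PLACEHOLDER_KEY'))
```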

Next, all the titles were extracted and added to an array:

titles = []
for bill in bills:
    titles.append(bill['title'])

Then the titles were broken up and all the words placed into a single array:

# Decompose lists of strings into one big
# array where each index = one word
def soupify(obj):
    wordSoup = []
    for item in obj:
        # If this item is a string:
        if isinstance(item, basestring):
            arr = item.split()
            for s in arr:
                wordSoup.append(s)
        # Non-string items (e.g. nested lists) are not handled yet; just flag them
        else:
            print 'notstring '

    return wordSoup

wordSoup = soupify(titles)

I envisioned my soupify function being flexible, so I could put lists of lists, lists of strings, or whatever into it, and it would break all the individual words out into one single array.  I didn’t implement it fully: all it does now is check that each item is a string, then split that string and add each word to a master array.
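For what it's worth, the flexible version could be sketched with a little recursion. This is only one possible way to do it, not what the script above does: the name soupify_nested is hypothetical, it is written for Python 3 (so it checks for str rather than basestring), and it assumes every non-string item is something you can iterate over, such as a list or tuple.

```python
def soupify_nested(obj):
    """Flatten a string, or arbitrarily nested lists/tuples of strings,
    into one flat list of words.

    Hypothetical variant of soupify; non-string, non-iterable items
    (such as integers) are not handled and will raise a TypeError.
    """
    if isinstance(obj, str):
        # Base case: a single string becomes its whitespace-separated words
        return obj.split()
    words = []
    for item in obj:
        # Recurse into each element, whether it is a string or another list
        words.extend(soupify_nested(item))
    return words


print(soupify_nested(['SCRUB Act of 2014', ['Robocall Enforcement', ['Improvements Act']]]))
```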

Time to categorize those words.  Python has a third-party library called nltk (the Natural Language Toolkit), which was relatively easy to get up and running with.  After about 15 minutes of confusion, I managed to install nltk and use it to categorize my list of words like so:

import nltk

# Expects an array of words
def categorize(soup):
    d = {}
    text = nltk.Text(soup)      # Turn word soup into text
    tags = nltk.pos_tag(text)   # Tag each word with POS

    # Break individual pairs into one large dict
    # categorized by POS (part of speech)
    for pair in tags:
        # group words with like tags into arrays within a dict
        tag = pair[1]
        if tag in d:
            # tag already exists, just append new word
            d[tag].append(pair[0])
        else:
            # Tag doesn't yet exist, create an array
            d[tag] = [pair[0]]
    return d

words = categorize(wordSoup)    # Create dict of categorized words
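To make the shape of that dictionary concrete, here is a small sketch of just the grouping step, without nltk. The tagged pairs below are assigned by hand for illustration and are not actual nltk output, and group_by_tag is a hypothetical name. It uses collections.defaultdict, which removes the need for the "does this tag exist yet" check.

```python
from collections import defaultdict


def group_by_tag(tagged_pairs):
    """Group (word, tag) pairs into a dict mapping each tag to its words."""
    grouped = defaultdict(list)
    for word, tag in tagged_pairs:
        # A missing tag automatically starts out as an empty list
        grouped[tag].append(word)
    return dict(grouped)


# Hand-tagged sample, for illustration only (not produced by nltk)
sample = [
    ('Robocall', 'NNP'), ('Enforcement', 'NNP'), ('Improvements', 'NNPS'),
    ('Act', 'NNP'), ('of', 'IN'), ('2014', 'CD'),
]
print(group_by_tag(sample))
```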

Finally, I had to come up with a structure for my bill creation algorithm.  Looking at my list of words, I picked an order of verbs, nouns, adjectives, and articles that I thought would make sense.  It was helpful to create a helper function to get a random word from my dictionary by feeding it the tags that I wanted to choose from.  For instance, I could say “give me a random word with either the tag ‘NN’ (noun), or the tag ‘NNP’ (noun, proper)”.  legislate() is what finally builds the bill name.  The structure is controlled by an array of arrays, where each index of the parent array represents one word in the final sentence.  So in this example, the first word is a random selection from any of the tags VB, VBG, or VBD.  The second word is a noun with any of the tags NN, NNPS, NNS, NNP.  And so on.  The basic structure ends up being: Verb -> Noun -> Preposition or Conjunction -> Adjective -> Noun -> Preposition -> Determiner -> Noun

import random

def getRandomWord(d, tagList):
    l = []
    for tag in tagList:
        # Skip tags that never appeared in the fetched titles
        for word in d.get(tag, []):
            l.append(word)

    return random.choice(l)

def legislate(wds):
    # Structure:
    #   VB/VBG/VBD -> NN/NNPS/NNS/NNP -> IN -> JJ -> NN/NNPS/NNS/NNP -> IN -> DT -> NN/NNPS/NNS/NNP

    sentenceStructure = [
            ['VB', 'VBG', 'VBD'], 
            ['NN', 'NNPS', 'NNS', 'NNP'], 
            ['IN'], 
            ['JJ'], 
            ['NN', 'NNPS', 'NNS', 'NNP'], 
            ['IN'], 
            ['DT'], 
            ['NN', 'NNPS', 'NNS', 'NNP']
    ]

    sentence = ''
    for i in range(0, 8):
        currentWord = getRandomWord(wds, sentenceStructure[i])
        if currentWord[-1] == '.':
            currentWord = currentWord[:len(currentWord) - 1]
        if i == 0:
            sentence += currentWord.capitalize() + ' '
        else:
            sentence += currentWord.lower() + ' '

    print sentence

And here’s some sample output:

Guarding land of national technology as the big
Remove delinquency in certain enforcement against the land
Provide trading of certain brothers of the filibuster
Improve compassionate of categorical southeastern of the forest
Allow beautiful in certain a of the act
Restore resolution of categorical brothers of the executive
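If you want to play with the template idea without hitting the API or installing nltk, here is a self-contained sketch. It is written for Python 3 and is not the script above: the names pick_word and build_title, the toy word lists, and the shortened tag lists are all made up for illustration. It returns the title instead of printing it, strips any trailing periods, and takes a random number generator as an argument so a run can be repeated with a fixed seed.

```python
import random


def pick_word(words_by_tag, tags, rng):
    """Pick a random word from the pooled word lists of the given tags."""
    # Tags missing from the dictionary simply contribute no candidates
    candidates = [w for tag in tags for w in words_by_tag.get(tag, [])]
    return rng.choice(candidates)


def build_title(words_by_tag, structure, rng=random):
    """Fill each slot of the structure with a random word of a matching tag."""
    words = [pick_word(words_by_tag, tags, rng).rstrip('.').lower()
             for tags in structure]
    words[0] = words[0].capitalize()  # Only the first word is capitalized
    return ' '.join(words)


# Toy dictionary, hand-written for illustration (not real tagger output)
toy_words = {
    'VB': ['Provide', 'Restore'],
    'NN': ['land', 'forest'],
    'IN': ['of'],
    'JJ': ['certain'],
    'DT': ['the'],
}
toy_structure = [['VB'], ['NN', 'NNS'], ['IN'], ['JJ'], ['NN'], ['IN'], ['DT'], ['NN']]

title = build_title(toy_words, toy_structure, random.Random(0))
print(title)
```

Note that pick_word still fails with an IndexError if none of the requested tags has any words, so a real word dictionary needs at least one entry per slot.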
