RWET Assignment 1

Playing Around

For this assignment, I tried a few different things. First, to get comfortable with Python syntax and structure, I spent some time recreating a Unix command, tr. It took about an hour, but eventually I had a utility that performed roughly the same function as tr, which is to replace characters with other characters. The entire code for this can be seen at the bottom of this post, but the important things I learned from it were reading arguments from sys.argv, general loop structure, reading and writing files without redirecting input using '<', and basic string functions. The result is that you could run:

python tr.py 'pmg' '123' fruit.txt

where fruit.txt is a list of fruit:

apple
pear
mango

and the output would be

a11le
1ear
2an3o
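For comparison, the same character mapping can be sketched with a dictionary lookup instead of rebuilding the string by index. This is only an illustration, not the tr.py listed at the bottom of the post: the function name translate_chars is made up for this example, and it gives the same result as tr.py for the fruit list above.

```python
def translate_chars(text, in_chars, out_chars):
    """Replace each character of in_chars with its counterpart in out_chars."""
    # Pad out_chars with its last character when it is shorter than in_chars
    if out_chars and len(out_chars) < len(in_chars):
        out_chars += out_chars[-1] * (len(in_chars) - len(out_chars))
    # Build a lookup table, e.g. 'p' -> '1', 'm' -> '2', 'g' -> '3'
    table = dict(zip(in_chars, out_chars))
    # Characters that are not in the table pass through unchanged
    return ''.join(table.get(ch, ch) for ch in text)


for fruit in ['apple', 'pear', 'mango']:
    print(translate_chars(fruit, 'pmg', '123'))
```

A dictionary keeps the lookup and the replacement in one step, so there is no need to track the current index while walking the line.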

The Actual Assignment

Mimicking a Unix command wasn't the assignment, but it was a good exercise for me. For the actual assignment, I went a bit morbid and collected the opening paragraphs from the Wikipedia pages for the 10 leading causes of death in the U.S. I then piped the text into a script that keeps every nth word and skips the rest (n is set at runtime).

Here’s the code – it’s pretty straightforward.  For each line, it loops through all the words, and if the word is at an index that is evenly divisible by our input parameter, we keep that word, otherwise we ignore it.  Once it’s all done, we join all the saved words into a string and print it out.

import sys

# Assumed: the step size is passed as the first command-line argument
skipValue = int(sys.argv[1])

# empty array to hold all new lines
newLines = []
for line in sys.stdin:
    # Drop the trailing newline, then break this line into an array based on spaces
    lineArr = line.rstrip('\n').split(' ')

    # Array to hold this line's words
    newLineArr = []

    # enumerate gives us the word and its index together
    for wordIndex, word in enumerate(lineArr):
        # If the index is divisible by our parameter, add the word
        if wordIndex % skipValue == 0:
            newLineArr.append(word)

    # Append the entire, altered line to our container array
    newLines.append(newLineArr)

for line in newLines:
    # Join the array items into a string
    # Print to stdout - making it pipeable to other unix commands
    print(' '.join(line))
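The index check in the loop above can also be written with Python's extended slicing, which takes every nth item of a list directly. This is a minimal sketch of that alternative; the function name every_nth_word is an assumption made for this example.

```python
def every_nth_word(line, step):
    """Keep the words whose index is a multiple of step."""
    words = line.split(' ')
    # words[::step] takes items 0, step, 2*step, ...
    return ' '.join(words[::step])


sample = 'No treatments stop or reverse its progression, though some'
print(every_nth_word(sample, 4))  # prints: No reverse some
```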

Now for the output. After running all the text I pulled from Wikipedia through this little script, I picked out the results I liked the most.
The entry on Alzheimer's seemed especially appropriate, given the fragmented nature of the script's output.

Some choice output:
Alzheimer's – Every fourth word

No reverse some symptoms.[2] rely assistance, burden the social, economic are to living improve behavioural due antipsychotics not to little increased death.[11][12]

Original

No treatments stop or reverse its progression, though some may temporarily improve symptoms.[2] Affected people increasingly rely on others for assistance, often placing a burden on the caregiver; the pressures can include social, psychological, physical, and economic elements.[9] Exercise programmes are beneficial with respect to activities of daily living and can potentially improve outcomes.[10] Treatment of behavioural problems or psychosis due to dementia with antipsychotics is common but not usually recommended due to there often being little benefit and an increased risk of early death.[11][12]

Cancer – Every third word

Tobacco the about cancer 10% to poor of and alcohol.[1][4] include exposure radiation, pollutants.[5] developing 20% are infections hepatitis C, papilloma These at by genes cell.[6] such are cancer 5–10% are genetic from parents.[7] be certain symptoms tests.[1] then investigated imaging by

Original

Tobacco use is the cause of about 22% of cancer deaths.[1] Another 10% is due to obesity, a poor diet, lack of physical activity, and consumption of alcohol.[1][4] Other factors include certain infections, exposure to ionizing radiation, and environmental pollutants.[5] In the developing world nearly 20% of cancers are due to infections such as hepatitis B, hepatitis C, and human papilloma virus (HPV).[1] These factors act, at least partly, by changing the genes of a cell.[6] Typically many such genetic changes are required before cancer develops.[6] Approximately 5–10% of cancers are due to genetic defects inherited from a person's parents.[7] Cancer can be detected by certain signs and symptoms or screening tests.[1] It is then typically further investigated by medical imaging and confirmed by biopsy.[8]

Suicide – Every fourth word

Views been existential religion, meaning Abrahamic suicide God belief of samurai seppuku a for a Sati, by expected to her either pressure and attempted illegal, in It offense In 21st the has rare medium kamikaze have a tactic.[11] from sui oneself".

Original

Views on suicide have been influenced by broad existential themes such as religion, honor, and the meaning of life. The Abrahamic religions traditionally consider suicide an offense towards God due to the belief in the sanctity of life. During the samurai era in Japan, seppuku was respected as a means of atonement for failure or as a form of protest. Sati, a practice outlawed by the British Raj, expected the Indian widow to immolate herself on her husband's funeral pyre, either willingly or under pressure from the family and society.[10] Suicide and attempted suicide, while previously illegal, are no longer in most Western countries. It remains a criminal offense in many countries. In the 20th and 21st centuries, suicide in the form of self-immolation has been used on rare occasions as a medium of protest, and kamikaze and suicide bombings have been used as a military or terrorist tactic.[11] The word is from Latin suicidium, from sui caedere, "to kill oneself".

Suicide – Every fifth word

Suicide intentionally Suicide as the frequently disorder disorder, alcoholism, well as interpersonal prevention to as mental and crisis is effectiveness.[4]

Original

Suicide is the act of intentionally causing one's own death. Suicide is often carried out as a result of despair, the cause of which is frequently attributed to a mental disorder such as depression, bipolar disorder, schizophrenia, borderline personality disorder,[1] alcoholism, or drug abuse,[2] as well as stress factors such as financial difficulties, troubles with interpersonal relationships, and bullying.[3] Suicide prevention efforts include limiting access to method of suicide such as firearms and poisons, treating mental illness and drug misuse, and improving economic conditions. Although crisis hotlines are common, there is little evidence for their effectiveness.[4]

Lastly, I thought it would be interesting to see how a piece of text degrades as you take more and more words out of it. Starting with the unaltered text (step 1), each successive version increases the step amount by one up to 15, and then by five up to 35.

1: Influenza spreads around the world in a yearly outbreak, resulting in about three to five million cases of severe illness and about 250,000 to 500,000 deaths.[1] In the Northern and Southern parts of the world outbreaks occur mainly in winter while in areas around the equator outbreaks may occur at any time of the year.[1] Death occurs mostly in the young, the old and those with other health problems.[1] Larger outbreaks known as pandemics are less frequent.[4] In the 20th century three influenza pandemics occurred: Spanish influenza in 1918, Asian influenza in 1958, and Hong Kong influenza in 1968, each resulting in more than a million deaths.[9] The World Health Organization declared an outbreak of a new type of influenza A/H1N1 to be a pandemic in June 2009.[10] Influenza may also affect other animals, including pigs, horses and birds.[11]

2: Influenza around world a outbreak, in three five cases severe and 250,000 500,000 In Northern Southern of world occur in while areas the outbreaks occur any of year.[1] occurs in young, old those other problems.[1] outbreaks as are frequent.[4] the century influenza occurred: influenza 1918, influenza 1958, Hong influenza 1968, resulting more a deaths.[9] World Organization an of new of A/H1N1 be pandemic June Influenza also other including horses birds.[11]

3: Influenza the a resulting three million severe about 500,000 the Southern the occur winter areas equator occur time year.[1] mostly young, and other Larger as less the three occurred: in influenza and influenza each more million World declared of type A/H1N1 a June may other pigs, birds.[11]

4: Influenza world outbreak, three cases and 500,000 Northern of occur while the occur of occurs young, those problems.[1] as frequent.[4] century occurred: 1918, 1958, influenza resulting a World an new A/H1N1 pandemic Influenza other horses

5: Influenza in in million and deaths.[1] Southern outbreaks while equator any Death young, with outbreaks less century Spanish influenza Kong resulting million Organization a A/H1N1 in also pigs,

6: Influenza a three severe 500,000 Southern occur areas occur year.[1] young, other as the occurred: influenza influenza more World of A/H1N1 June other birds.[11]

7: Influenza yearly five about Northern outbreaks areas at occurs and outbreaks In occurred: in 1968, million an influenza June animals,

8: Influenza outbreak, cases 500,000 of while occur occurs those as century 1918, influenza a an A/H1N1 Influenza horses

9: Influenza resulting severe the occur equator year.[1] and as three influenza each World type June pigs,

10: Influenza in and Southern while any young, outbreaks century influenza resulting Organization A/H1N1 also

11: Influenza about 250,000 the the Death other In 1918, each Organization to other

12: Influenza three 500,000 occur occur young, as occurred: influenza World A/H1N1 other

13: Influenza to In winter of with the in a type also

14: Influenza five Northern areas occurs outbreaks occurred: 1968, an June

15: Influenza million Southern equator young, less influenza million A/H1N1 pigs,

20: Influenza and while young, century resulting A/H1N1

25: Influenza deaths.[1] any less resulting in

30: Influenza Southern young, influenza A/H1N1

35: Influenza outbreaks outbreaks million
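A series like the one above could be produced by running the same word-skipping step over a list of step sizes. This is a sketch under assumptions: the names every_nth_word and degrade are invented for the example, and the sample is just the first few words of the influenza paragraph.

```python
def every_nth_word(line, step):
    """Keep the words whose index is a multiple of step."""
    return ' '.join(line.split(' ')[::step])


def degrade(text, steps):
    """Return (step, thinned text) pairs, one for each step size."""
    return [(step, every_nth_word(text, step)) for step in steps]


sample = 'Influenza spreads around the world in a yearly outbreak,'
for step, version in degrade(sample, [1, 2, 3]):
    print('%d: %s' % (step, version))
```

Because index 0 is always divisible by the step, the first word survives every pass, which is why every version above starts with "Influenza".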

Code for tr.py

import sys

# I want to imitate tr
# ex. tr 'abcd' '1234' file - changes all a's in file to 1, all b's to 2, all c's to 3 and all d's to 4

# So, I need 3 inputs, so use sys.argv 
if len(sys.argv) < 4:
    raise ValueError('wrong number of CLI arguments. Must pass: inString outString inFile [outFile]')

# Get all cli inputs - these three are MANDATORY params
inString = sys.argv[1]
outString = sys.argv[2]
inFile = sys.argv[3]

# Read infile, copy each line to lines
with open(inFile, 'r') as i:
    lines = i.readlines()

# map input stream to output stream
# If outString has fewer characters than inString, extend the last char until lengths match
if len(outString) < len(inString):
    lastChar = outString[-1]
    for i in range(len(outString), len(inString)):
        outString = outString + lastChar

# Iterate through all lines in input text, replacing characters with appropriate counterpart
newLines = []
for line in lines:
    # Keep track of number of loops
    currentIndex = 0

    # Iterate through all characters in this particular line
    for char in line:

        # If the character matches something from our filter string
        if char in inString:

            # Get the index where this character exists in our filter string
            idx = inString.index(char)

            # Replace character and assign to a new var
            # Strings are IMMUTABLE so I have to always create a new string
            line = line[:currentIndex] + outString[idx] + line[currentIndex+1:] 

        # up the index of the current character
        currentIndex += 1

    # Append newly created lines to a new array
    newLines.append(line)


# Optional CLI input: file to write to
if len(sys.argv) == 5:
    outFile = sys.argv[4]     # If I want to have python write to a file, pass this last arg
    with open(outFile, 'w') as o:
        for line in newLines:
            o.write(line)
else:
    for line in newLines:
        # Remove only the trailing newline (print adds its own), keeping other whitespace
        print(line.rstrip('\n'))
