For this assignment, I tried a few different things. First, to get comfortable with Python syntax and structure, I spent time recreating a Unix command, tr. It took about an hour, but eventually I had a utility that performed roughly the same function as tr, which is to replace characters with other characters. The entire code can be seen at the bottom of this post, but the important things I learned from it were reading arguments from sys.argv, general loop structure, reading and writing files without redirecting input using ‘<’, and basic string functions. The result is that you could run:
python tr.py 'pmg' '123' fruit.txt
where fruit.txt is a list of fruit:
apple pear mango
and the output would be
a11le 1ear 2an3o
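For comparison, the same character mapping can be sketched with Python's built-in str.maketrans and str.translate. This is not the script from this post (that one is at the bottom). It assumes Python 3, and the function name translate_chars is made up for illustration. Like the script at the bottom, it pads the replacement set with its last character when it is shorter than the search set.

```python
def translate_chars(text, in_chars, out_chars):
    """Replace each character of in_chars with its counterpart in out_chars."""
    # Pad the replacement set with its last character if it is shorter.
    if out_chars and len(out_chars) < len(in_chars):
        out_chars += out_chars[-1] * (len(in_chars) - len(out_chars))
    # str.maketrans requires both strings to be the same length,
    # so trim whichever one is still longer.
    table = str.maketrans(in_chars[:len(out_chars)], out_chars[:len(in_chars)])
    return text.translate(table)


print(translate_chars("apple pear mango", "pmg", "123"))
# prints: a11le 1ear 2an3o
```

One difference worth knowing: if a character appears twice in the search set, str.maketrans keeps the last mapping, while a lookup with index() uses the first.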
The Actual Assignment
Mimicking a Unix command wasn’t the assignment, but it was a good exercise for me. For the actual assignment, I went a bit morbid and collected the opening paragraphs from the Wikipedia pages for the 10 leading causes of death in the U.S. I then piped the text into a script that skips over words, keeping only every nth one (n is settable at runtime).
Here’s the code – it’s pretty straightforward. For each line, it loops through all the words, and if a word is at an index that is evenly divisible by our input parameter, we keep that word; otherwise we ignore it. Once every line has been processed, we join each line's saved words into a string and print it out.
import sys

# Number of words to step by (assumed here to be the first command-line argument)
skipValue = int(sys.argv[1])

# Empty array to hold all new lines
newLines = []

for line in sys.stdin:
    # Break this line into array based on spaces
    lineArr = line.split(' ')
    # Array to hold this line's words
    newLineArr = []
    # Iteration counter
    wordIndex = 0
    for word in lineArr:
        # If the index is divisible by our parameter, add the word
        if wordIndex % skipValue == 0:
            newLineArr.append(word)
        wordIndex += 1
    # Append the entire, altered line to our container array
    newLines.append(newLineArr)

for line in newLines:
    # Join the array items into a string
    line = ' '.join(line)
    # Print to stdout - making it pipeable to other unix commands
    print(line)
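The same selection can be sketched more compactly with list slicing: words[::n] keeps indices 0, n, 2n and so on, which are exactly the indices where index % n == 0. This is a minimal sketch, not the script above. The function name every_nth_word is hypothetical, and it splits on any whitespace rather than on single spaces.

```python
def every_nth_word(text, n):
    """Keep the words at indices 0, n, 2n, ... and join them with spaces."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    words = text.split()
    return " ".join(words[::n])


sample = ("No treatments stop or reverse its progression, "
          "though some may temporarily improve symptoms.")
print(every_nth_word(sample, 4))
# prints: No reverse some symptoms.
```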
Now for the output. After running all the text I pulled from Wikipedia through this little script, I found a few that I liked the most.
The entry on Alzheimer's seemed especially appropriate given the fragmented nature of the output from this script.
Some choice output:
Alzheimer's – Every fourth word
No reverse some symptoms. rely assistance, burden the social, economic are to living improve behavioural due antipsychotics not to little increased death.
No treatments stop or reverse its progression, though some may temporarily improve symptoms. Affected people increasingly rely on others for assistance, often placing a burden on the caregiver; the pressures can include social, psychological, physical, and economic elements. Exercise programmes are beneficial with respect to activities of daily living and can potentially improve outcomes. Treatment of behavioural problems or psychosis due to dementia with antipsychotics is common but not usually recommended due to there often being little benefit and an increased risk of early death.
Cancer – Every third word
Tobacco the about cancer 10% to poor of and alcohol. include exposure radiation, pollutants. developing 20% are infections hepatitis C, papilloma These at by genes cell. such are cancer 5–10% are genetic from parents. be certain symptoms tests. then investigated imaging by
Tobacco use is the cause of about 22% of cancer deaths. Another 10% is due to obesity, a poor diet, lack of physical activity, and consumption of alcohol. Other factors include certain infections, exposure to ionizing radiation, and environmental pollutants. In the developing world nearly 20% of cancers are due to infections such as hepatitis B, hepatitis C, and human papilloma virus (HPV). These factors act, at least partly, by changing the genes of a cell. Typically many such genetic changes are required before cancer develops. Approximately 5–10% of cancers are due to genetic defects inherited from a person's parents. Cancer can be detected by certain signs and symptoms or screening tests. It is then typically further investigated by medical imaging and confirmed by biopsy.
Suicide – Every fourth word
Views been existential religion, meaning Abrahamic suicide God belief of samurai seppuku a for a Sati, by expected to her either pressure and attempted illegal, in It offense In 21st the has rare medium kamikaze have a tactic. from sui oneself".
Views on suicide have been influenced by broad existential themes such as religion, honor, and the meaning of life. The Abrahamic religions traditionally consider suicide an offense towards God due to the belief in the sanctity of life. During the samurai era in Japan, seppuku was respected as a means of atonement for failure or as a form of protest. Sati, a practice outlawed by the British Raj, expected the Indian widow to immolate herself on her husband's funeral pyre, either willingly or under pressure from the family and society. Suicide and attempted suicide, while previously illegal, are no longer in most Western countries. It remains a criminal offense in many countries. In the 20th and 21st centuries, suicide in the form of self-immolation has been used on rare occasions as a medium of protest, and kamikaze and suicide bombings have been used as a military or terrorist tactic. The word is from Latin suicidium, from sui caedere, "to kill oneself".
Suicide – Every fifth word
Suicide intentionally Suicide as the frequently disorder disorder, alcoholism, well as interpersonal prevention to as mental and crisis is effectiveness.
Suicide is the act of intentionally causing one's own death. Suicide is often carried out as a result of despair, the cause of which is frequently attributed to a mental disorder such as depression, bipolar disorder, schizophrenia, borderline personality disorder, alcoholism, or drug abuse, as well as stress factors such as financial difficulties, troubles with interpersonal relationships, and bullying. Suicide prevention efforts include limiting access to method of suicide such as firearms and poisons, treating mental illness and drug misuse, and improving economic conditions. Although crisis hotlines are common, there is little evidence for their effectiveness.
Lastly, I thought it would be interesting to see how a piece of text degrades as you take more and more words out of it. Starting with this unaltered piece of text, each successive version increases the step amount by one up to 15, and then by five:
1: Influenza spreads around the world in a yearly outbreak, resulting in about three to five million cases of severe illness and about 250,000 to 500,000 deaths. In the Northern and Southern parts of the world outbreaks occur mainly in winter while in areas around the equator outbreaks may occur at any time of the year. Death occurs mostly in the young, the old and those with other health problems. Larger outbreaks known as pandemics are less frequent. In the 20th century three influenza pandemics occurred: Spanish influenza in 1918, Asian influenza in 1958, and Hong Kong influenza in 1968, each resulting in more than a million deaths. The World Health Organization declared an outbreak of a new type of influenza A/H1N1 to be a pandemic in June 2009. Influenza may also affect other animals, including pigs, horses and birds.
2: Influenza around world a outbreak, in three five cases severe and 250,000 500,000 In Northern Southern of world occur in while areas the outbreaks occur any of year. occurs in young, old those other problems. outbreaks as are frequent. the century influenza occurred: influenza 1918, influenza 1958, Hong influenza 1968, resulting more a deaths. World Organization an of new of A/H1N1 be pandemic June Influenza also other including horses birds.
3: Influenza the a resulting three million severe about 500,000 the Southern the occur winter areas equator occur time year. mostly young, and other Larger as less the three occurred: in influenza and influenza each more million World declared of type A/H1N1 a June may other pigs, birds.
4: Influenza world outbreak, three cases and 500,000 Northern of occur while the occur of occurs young, those problems. as frequent. century occurred: 1918, 1958, influenza resulting a World an new A/H1N1 pandemic Influenza other horses
5: Influenza in in million and deaths. Southern outbreaks while equator any Death young, with outbreaks less century Spanish influenza Kong resulting million Organization a A/H1N1 in also pigs,
6: Influenza a three severe 500,000 Southern occur areas occur year. young, other as the occurred: influenza influenza more World of A/H1N1 June other birds.
7: Influenza yearly five about Northern outbreaks areas at occurs and outbreaks In occurred: in 1968, million an influenza June animals,
8: Influenza outbreak, cases 500,000 of while occur occurs those as century 1918, influenza a an A/H1N1 Influenza horses
9: Influenza resulting severe the occur equator year. and as three influenza each World type June pigs,
10: Influenza in and Southern while any young, outbreaks century influenza resulting Organization A/H1N1 also
11: Influenza about 250,000 the the Death other In 1918, each Organization to other
12: Influenza three 500,000 occur occur young, as occurred: influenza World A/H1N1 other
13: Influenza to In winter of with the in a type also
14: Influenza five Northern areas occurs outbreaks occurred: 1968, an June
15: Influenza million Southern equator young, less influenza million A/H1N1 pigs,
20: Influenza and while young, century resulting A/H1N1
25: Influenza deaths. any less resulting in
30: Influenza Southern young, influenza A/H1N1
35: Influenza outbreaks outbreaks million
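A run like the one above could be produced by looping over step sizes. This is a sketch under assumptions: the helpers thin and degrade and the shortened sample sentence are illustrative, not the code used for this post.

```python
def thin(text, step):
    """Keep every step-th word of text, starting with the first."""
    return " ".join(text.split()[::step])


def degrade(text, steps):
    """Return (step, thinned text) pairs, one per step size."""
    return [(step, thin(text, step)) for step in steps]


sample = "Influenza spreads around the world in a yearly outbreak"
# Steps 1 through 3 here; the list above uses 1 through 15, then 20, 25, 30 and 35.
for step, result in degrade(sample, range(1, 4)):
    print("%d: %s" % (step, result))
```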
Code for tr.py
import sys

# I want to imitate tr
# ex. tr 'abcd' '1234' file - changes all a's in file to 1, all b's to 2, all c's to 3 and all d's to 4
# So, I need 3 inputs, so use sys.argv
if len(sys.argv) < 4:
    raise ValueError('number of cli arguments does not match. Must pass: inString outString inFile [outFile]')

# Get all cli inputs - these three are MANDATORY params
inString = sys.argv[1]
outString = sys.argv[2]
inFile = sys.argv[3]

# Read infile, copy each line to lines
with open(inFile, 'r') as i:
    lines = i.readlines()

# Map input stream to output stream
# If outString has fewer characters than inString, extend the last char until lengths match
if len(outString) < len(inString):
    lastChar = outString[-1]
    for i in range(len(outString), len(inString)):
        outString = outString + lastChar

# Iterate through all lines in input text, replacing characters with appropriate counterpart
newLines = []
for line in lines:
    # Keep track of number of loops
    currentIndex = 0
    # Iterate through all characters in this particular line
    for char in line:
        # If the character matches something from our filter string
        if char in inString:
            # Get the index where this character exists in our filter string
            idx = inString.index(char)
            # Replace character and assign to a new var
            # Strings are IMMUTABLE so I have to always create a new string
            line = line[:currentIndex] + outString[idx] + line[currentIndex+1:]
        # Up the index of the current character
        currentIndex += 1
    # Append newly created lines to a new array
    newLines.append(line)

# Optional CLI input: file to write to
if len(sys.argv) == 5:
    # If I want to have python write to a file, pass this last arg
    outFile = sys.argv[4]
    with open(outFile, 'w') as o:
        for line in newLines:
            o.write(line)
else:
    for line in newLines:
        line = line.strip()
        print(line)