Contents: Trigger warning | Introduction | Getting and cleaning data | Trends in violence by region | Is South America more dangerous for transgender people, or just for all people? | Country level analysis | Proportion of murder victims that are transgender | Number of transgender victims by age | Conclusions
Trigger warning: This is an exploratory data analysis of murders of transgender people. The data contains graphic descriptions of violence against transgender people.
2018 was a crazy year for me: a move, a new job, a new career path, and many more ups and downs. Through all of it, the soundtrack to my life was audiobooks. I listened to over 50 books this year, and the good news is that most were excellent! So without further ado, here are my favorite books that I read (listened to) in 2018. Note that these are not the best books released in 2018, just the books I read this year that I loved.
Contents: Introduction | Getting data with rgbif | Data cleaning | Data wrangling | Make the animation | Another example with Kudzu
Introduction
Since I discovered GBIF, I’ve been hooked. What is GBIF? From their website: “GBIF—the Global Biodiversity Information Facility—is an international network and research infrastructure funded by the world’s governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth.” In 2018, GBIF passed the mark of one billion occurrence records, which is just incredible.
Contents: Introduction | Trials and tribulations | The solution
Introduction
Drama, intrigue, arrogance, dashed hopes, rock-bottom, perseverance, and eventual triumph: this post has it all! It starts with me watching Rachael Tatman’s recent live-coding video, and ends with a thrilling race-to-the-bottom between two pathetically slow functions. What lies ahead: many a WTF moment, lots of trial and error, and some useful tidyverse data wrangling tips.
Rachael Tatman is a data scientist at Kaggle, and does these awesome live coding sessions every Friday.
Disclaimer: I’m a trained microbiologist/biochemist, which means most of my bioinformatics knowledge was self-taught. What you’re about to see may not be pretty; the code might be janky or the workflow inefficient. But I have gone through countless hours of googling, reading, and trial and error to learn this, and it works pretty well for me, so it might for you too. Let me know if you spot errors or have suggestions for improvement!
The more I use the tidyverse in my R coding, the more I ask myself: does Hadley Wickham hate dogs, or does he just need help with dog-related package names? See, of the packages Hadley has developed for the tidyverse, there are two that have cat-inspired names (forcats and purrr) but zero that pay homage to man’s best friend. It’s not like doggo names are hard to think of for R packages; it took me 30 seconds to come up with baRk and woofR.
I’ve been going through the job application cycle recently, which meant updating my CV. You can write a CV with Microsoft Word, but I find it exceptionally frustrating to do any sort of fancy formatting in Word, and more importantly, I want my CV to be a page on my website (not just a downloadable file) that has the responsiveness expected of any modern webpage. I found this excellent HTML/CSS template from Thomas Hardy, and decided it was the aesthetic I was going for.
This week I’ve been ploughing through final figure revisions for a big paper that’s been a couple years in the making 👏👏👏. Everything was going (relatively) smoothly until I got to a tree I was trying to plot with associated barcharts. The idea was to summarize some data on major clades in this tree by putting barcharts of summary statistics aligned with each major clade to the side of the tree.
Recently, as part of my work characterizing plant cell walls, I needed to express a few proteins that would serve as molecular probes. I read a couple of papers and the boss man gave me the green light. The first step is to find the protein sequence and have the gene synthesized so that we can transform it into some E. coli and start expression tests. So to order the synthesized gene I went to the paper that described the protein, scoured the methods, and found that, as expected, they didn’t give the sequence; they simply referred to a previous paper, which referred to another previous paper. After going down the rabbit hole, I finally found the original reference.
Contents: 1 Introduction | 1.1 Essential Tools and Basic Knowledge | 1.1.1 Fasta Format | 1.1.2 Notepad++ | 1.1.3 BioEdit | 1.1.4 Unix Terminal | 1.1.5 R | 1.2 Getting Data | 1.2.1 Keyword Search | 1.2.2 BLAST | 1.3 Cleaning Data | 1.3.1 Concatenate Sequences | 1.3.2 Remove Duplicate Sequences | 1.3.3 Remove False Hits | 1.3.4 Trim Extra Domains | 1.3.5 Clean up Names | 1.4 Conclusions
Disclaimer: I’m a trained microbiologist/biochemist, which means most of my bioinformatics knowledge was self-taught.