Posts

Speaking like a president

Natural language processing on the first 2016 presidential debate

The first debate in the 2016 presidential race was held on September 26. It’s no secret that Clinton and Trump are running on drastically different platforms, but how do they compare when it comes to their speech patterns and word choice? To quantify this, I dug into the data, using the debate transcript and natural language processing. I measured the sentiment of Clinton’s and Trump’s responses and examined how emotional their words were throughout the debate. I also looked at each candidate’s most commonly used adjectives. Building on the work of Alvin Chang at Vox, I also examined how Clinton’s and Trump’s speech patterns changed between directly answering the questions and skirting them.

Sentiment

Using the Google Cloud Natural Language API, I measured the sentiment of each candidate’s answers. The polarity of a response is a measure of how positive or negative it is, and the magnitude indicates how much emotion the words convey. The chart below shows the polarity of each candidate’s responses, weighted by the magnitude. Trump and Clinton matched each other’s polarity for the first half of the debate, but after his defense of stop-and-frisk around 9:50 PM, Trump’s words became much more negative. Throughout the rest of the debate (during the questions on birtherism, cyber security, homegrown terrorism, nuclear weapons, and Clinton’s looks and stamina) Clinton became more positive and Trump more negative. The combination of polarity and magnitude gives us the best understanding of each line’s overall sentiment, and each candidate’s most positive and negative responses are posted here.

Braggadocios, and other adjectives

I was also interested in the adjectives each candidate used most frequently during the debate. Using syntax analysis to extract each word’s part of speech, I identified the most-used adjectives of each candidate.
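As a rough illustration of the “polarity weighted by magnitude” idea from the Sentiment section, here is a minimal Python sketch. It assumes the weighting is a simple product of the two scores, which is one plausible reading and not necessarily what the chart used. The `Response` class, the function names, and the sample scores are all hypothetical, and nothing here calls the actual API.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """One debate response with sentiment scores (hypothetical structure)."""
    speaker: str
    polarity: float   # negative to positive, roughly -1 to 1
    magnitude: float  # overall emotional strength, 0 or greater


def weighted_polarity(response):
    """Assumed weighting: polarity scaled by magnitude."""
    return response.polarity * response.magnitude


def mean_weighted_polarity(responses, speaker):
    """Average weighted polarity over one speaker's responses."""
    scores = [weighted_polarity(r) for r in responses if r.speaker == speaker]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up scores, for illustration only.
sample = [
    Response("A", 0.5, 2.0),
    Response("A", -0.5, 4.0),
    Response("B", 0.25, 4.0),
]
for name in ("A", "B"):
    print(name, mean_weighted_polarity(sample, name))
```

With this reading, a strongly emotional negative answer pulls the score down much further than a mildly negative one, which matches the intuition of combining both numbers into one line on a chart.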
Answers vs non-answers

As Chang found, the candidates spent a lot of time not answering Holt’s questions: 48% of Clinton’s words and a whopping 69% of Trump’s words were used in non-answers. Using the data Chang compiled, I was able to look at how the candidates’ speech patterns differed when answering and not answering the questions.

Sentence subjects (“I alone can fix it”)

Using part-of-speech tagging, I also identified the subjects of each candidate’s sentences. Clinton was more inclusive in her words, but only when directly responding to questions, where she used the plural “we” more frequently than the singular “I”. When she was avoiding a response, the opposite was true. Trump, on the other hand, was always more likely to use “I” over “we”.

Non-answer phrases

The words each candidate used when directly answering the questions are all, unsurprisingly, highly related to the questions Holt asked. What’s interesting here are the topics the candidates defaulted to when avoiding a response.

A handful of my findings didn’t make it into this post. If you’re interested in more, there’s some additional analysis, including multiple classification models, in the project’s GitHub repo. The text of this article (excluding this sentence) has polarity -0.4 and magnitude 15.5, so despite my best efforts it’s leaning slightly negative.

Many thanks to Alvin Chang and Vox for their permission to use their annotated transcript, and to Kelsey Scherer for designing the charts and lead image. Analysis was performed in R. Plots were generated using ggplot2, and then styled by Scherer using Sketch. The sentiment scores, part-of-speech tags, and all of the other NLP datasets can be found in the GitHub repo. ...

Mapping the frozen yogurt shop closest to each Manhattan apartment

I love frozen yogurt. When I first moved to New York three years ago, I lived only 1/8th of a mile from the closest froyo shop. The convenience of this 4-minute walk is something I neither appreciated nor utilized enough at the time. After moving to Harlem last year, it’s been harder than ever to satisfy my near-constant craving for this cold candy soup: I’m now a 24-minute walk from the nearest frozen yogurt. As someone who loves data and has too much time to spare, I decided to find the locations in Manhattan with the highest and lowest froyo density. Inspired by Ben Wellington’s work on I Quant NY, I calculated the distance from every lot in Manhattan to the nearest froyo shop and mapped it out.

https://brianweinstein.cartodb.com/viz/27dd05e0-2486-11e6-98ba-0e98b61680bf/embed_map

The highest density of froyo is right around West 33rd St. and 8th Ave., with three shops within a 1-block radius. The lowest density is right in Harlem. The red circle on the map shows the location farthest from frozen yogurt. The record belongs to 700 Esplanade Gardens Plaza, a co-op right by the 145th St. stop on the 3-train, with a 51-minute trek across Manhattan to the Pinkberry by Columbia. The map shows all of the froyo shops in Manhattan, and you can click on any lot to find the distance to the closest shop. R code posted here.

All distances in the map are measured using great-circle distance (i.e., “as the crow flies”), according to the spherical law of cosines. Frozen yogurt locations were found via the Google Places Nearby Search API. The API returned some non-froyo-exclusive shops like Ben and Jerry’s, which I kept in the dataset since they technically serve some frozen yogurt (although we all know these shops don’t really count). I only included froyo shops that were in Manhattan, so some lots may have a closer shop than the one listed if we include those in other boroughs. Manhattan lot locations are from PLUTO. The map was created using CartoDB.
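For readers curious about the distance calculation, here is a minimal Python sketch of great-circle distance via the spherical law of cosines, plus a nearest-shop lookup. This is not the R code behind the map: the function names, the Earth-radius constant, and the sample coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles using the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Clamp to [-1, 1] so rounding error cannot break acos for nearby points.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_MILES * math.acos(cos_angle)


def nearest_shop(lot, shops):
    """Return (name, distance) for the shop closest to a (lat, lon) lot."""
    return min(
        ((name, great_circle_miles(lot[0], lot[1], lat, lon))
         for name, (lat, lon) in shops.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical coordinates, for illustration only.
shops = {"Shop A": (40.81, -73.96), "Shop B": (40.75, -73.99)}
name, miles = nearest_shop((40.80, -73.95), shops)
print(name, round(miles, 2))
```

For distances as short as a few city blocks, the law-of-cosines form can lose precision in floating point, which is why the clamp is there; the haversine formula is a common alternative for that case.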
Tons of inspiration for this came from Ben Wellington’s work on I Quant NY. ...

Wave Equation

The wave equation is a partial differential equation that describes the propagation of various types of waves. The equation appears throughout many fields in physics, including acoustics, fluid dynamics, electromagnetism, and quantum mechanics. With some modifications, it can even describe the spread of traffic jams on busy highways! The one-dimensional equation was first discovered by d’Alembert in 1746 as he studied how vibrations propagated through a string, and the two- and three-dimensional equations were solved soon after by Euler during his study of acoustics. The simulations above show the propagation of a disturbance on a two-dimensional surface for two different sets of boundary conditions [1] [2]. Mathematica code posted here. ...
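To give a feel for how such a simulation can work, here is a minimal Python sketch for the one-dimensional case, u_tt = c^2 u_xx, on a string with both ends held fixed. It is not the Mathematica code behind the animations: the leapfrog finite-difference scheme, the Gaussian starting bump, and every name and parameter below are assumptions chosen for illustration.

```python
import math


def simulate_string(n_points=101, n_steps=100, courant=1.0):
    """Leapfrog finite differences for u_tt = c^2 u_xx with fixed ends.

    courant is c*dt/dx; the scheme is stable for courant <= 1.
    Returns the displacement at each grid point after n_steps steps.
    """
    r2 = courant ** 2
    mid = n_points // 2
    # Initial shape: a Gaussian bump at the center, released from rest.
    u_prev = [math.exp(-((i - mid) / 5.0) ** 2) for i in range(n_points)]
    u_prev[0] = u_prev[-1] = 0.0
    # First step uses half the curvature term because initial velocity is zero.
    u = u_prev[:]
    for i in range(1, n_points - 1):
        u[i] = u_prev[i] + 0.5 * r2 * (u_prev[i + 1] - 2 * u_prev[i] + u_prev[i - 1])
    for _ in range(n_steps - 1):
        u_next = [0.0] * n_points  # end points stay at zero (fixed boundary)
        for i in range(1, n_points - 1):
            u_next[i] = (2 * u[i] - u_prev[i]
                         + r2 * (u[i + 1] - 2 * u[i] + u[i - 1]))
        u_prev, u = u_next if False else u, u_next
    return u


final = simulate_string()
print(round(final[len(final) // 2], 6))
```

With the default settings, the bump splits into two pulses that travel outward, flip sign when they reflect off the fixed ends, and meet again at the center upside down. Changing what happens at the end points is how different boundary conditions would enter a sketch like this.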

Platonic Solids

A Platonic solid is a polyhedron where (1) each face is the same regular polygon, and (2) each vertex joins the same number of faces. The Platonic solids are highly symmetrical, and, in three dimensions, only five such solids can exist: the tetrahedron, cube, octahedron, dodecahedron, and icosahedron. This was first proven in Euclid’s Elements around 300 B.C., and has since been more rigorously proven using the Euler characteristic. The proofs are relatively easy to follow, and if you’re interested you can check them out both here and here.

Mathematica code:

pSolids={"Tetrahedron","Cube","Octahedron","Dodecahedron","Icosahedron"}
Manipulate[Graphics3D[
  {Opacity[0.8],Rotate[PolyhedronData[pSolids[[n]],"Faces"],th,{0,0,1}],
  Opacity[0],Circumsphere[PolyhedronData[pSolids[[n]], "VertexCoordinates"][[1;;4]]]},
  Boxed->False,SphericalRegion->True],{n,1,5,1},{th,0,2\[Pi]}]

...
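The counting step in the Euler-characteristic argument can be checked with a short Python sketch. Suppose each face has p sides and q faces meet at each vertex. Every edge borders two faces and joins two vertices, so pF = 2E and qV = 2E, and substituting into V - E + F = 2 gives 1/p + 1/q - 1/2 = 1/E. Only pairs with 1/p + 1/q > 1/2 can work. The function name and the search limit below are my own choices for illustration.

```python
from fractions import Fraction


def platonic_candidates(max_value=10):
    """List (p, q, V, E, F) for every p, q >= 3 with 1/p + 1/q > 1/2.

    p is the number of sides per face, q the number of faces per vertex.
    """
    found = []
    for p in range(3, max_value + 1):
        for q in range(3, max_value + 1):
            excess = Fraction(1, p) + Fraction(1, q) - Fraction(1, 2)
            if excess > 0:
                edges = 1 / excess          # from 1/p + 1/q - 1/2 = 1/E
                vertices = 2 * edges / q    # from qV = 2E
                faces = 2 * edges / p       # from pF = 2E
                found.append((p, q, int(vertices), int(edges), int(faces)))
    return found


for p, q, v, e, f in platonic_candidates():
    print(f"p={p} q={q}: V={v} E={e} F={f}")
```

Raising the search limit does not add anything, since 1/p + 1/q can only exceed 1/2 when both values are small. The five rows that come out are the tetrahedron, octahedron, icosahedron, cube, and dodecahedron.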

Lonely Runner Conjecture

Imagine n runners on a circular track of length 1. The runners start from the same spot at the same time, and each has a distinct, constant speed. A runner is considered “lonely” whenever it is a distance of at least 1/n from every other runner. The Lonely Runner Conjecture (LRC) states that each runner will eventually, at some point in time, be lonely. Said differently, the LRC states that for each runner, the spacing around it will eventually be greater than or equal to the spacing it would experience if all of the runners were equally distributed around the track. The conjecture has been proven to be true for 7 or fewer runners, but, interestingly enough, has never been proven to work for all cases of 8 or more runners. [In my 8-runner simulation above, I’ve only shown that it works for a specific set of runner speeds; I haven’t proven that it works for all sets of speeds.] In the GIFs above, an arc appears around a runner whenever the runner is lonely, and the color of a runner fades after it’s been lonely at least once. Mathematica code posted here. Additional sources not linked above: [1] [2] [3] ...
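The loneliness check itself is easy to sketch in Python. The version below is not the Mathematica code behind the GIFs: it assumes integer speeds, steps through time on a fixed grid using exact fractions, and records the first grid time at which each runner is lonely. The function names and the grid size are hypothetical choices, and like the simulation described above it only tests one set of speeds and proves nothing in general.

```python
from fractions import Fraction


def circular_distance(a, b):
    """Shortest distance between two positions on a track of length 1."""
    d = (a - b) % 1
    return min(d, 1 - d)


def first_lonely_times(speeds, steps_per_lap=360):
    """First grid time in (0, 1] at which each runner is lonely, else None.

    Assumes distinct integer speeds, so positions repeat with period 1.
    """
    n = len(speeds)
    threshold = Fraction(1, n)
    result = [None] * n
    for k in range(1, steps_per_lap + 1):
        t = Fraction(k, steps_per_lap)
        positions = [(v * t) % 1 for v in speeds]
        for i in range(n):
            if result[i] is None and all(
                circular_distance(positions[i], positions[j]) >= threshold
                for j in range(n) if j != i
            ):
                result[i] = t
        if all(r is not None for r in result):
            break
    return result


print(first_lonely_times([0, 1, 2]))
```

Exact fractions matter here because loneliness is defined with “at least 1/n”, and the interesting moments are often the ones where a distance equals 1/n exactly. A finer grid may be needed for other speed sets, since a coarse one can step over the moment a runner becomes lonely.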