import { useState } from 'react';
import './Article.css'
import MakeYourOwn from './components/MakeYourOwn/MakeYourOwn';
import WordCloud from './components/WordCloud';
import WordLines from './components/WordLines';

const Article = () => {

  const Heading = props => (
    <div className='heading' {...props}>
      {props.children}
    </div>
  )

  const Content = ({ as='p', ...props }) => {
    let As = as
    return (
      <As className='content' {...props}>
        {props.children}
      </As>
    )
  }

  const Quote = ({children}) => `‘‘${children}’’`

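  // rel='noopener noreferrer' keeps pages opened in a new tab from reaching back through window.opener
  const Link = ({ href, children }) => (href !== undefined ? <a className='dark-link' href={href} target='_blank' rel='noopener noreferrer'>{children}</a> : children)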

  const AboutData = ({children, startsOpen=false}) => {
    const [isOpen, setOpen] = useState(startsOpen)
    return (
      <>
        <Content as='div' style={{ marginBottom: 5, userSelect: 'none' }}>
          <div
            className='dark-link'
            onClick={() => setOpen(!isOpen)}
            style={{ textAlign: 'center' }}
          >About the data in this visual</div>
        </Content>
        <Content as='div' style={{ fontSize: '20px', color: '#e2e5ff', transitionDuration: '0.5s', opacity: isOpen ? 1 : 0 }}>
          {isOpen && children}
        </Content>
        <br />
      </>
    )
  }

  return (
    <div id='app'>
      <div id='page'>
        <div id='title'>
          Twitter and the 2020 Election
        </div>
        <div id='subtitle'>
          Visualizing Trump and Biden's language on Twitter in the year leading up to the 2020 presidential election
        </div>
        <div id='author'>
          Roy Nehoran · November 20, 2020
        </div>
        <br />
        <br />
        <Content>
          In recent years, social media has been at the forefront of the political conversation. Given the prevalence of mail-in ballots during the pandemic, Trump and Biden's messages had to reach voters' homes, and social media was an invaluable tool to accomplish this.
        </Content>
        <Content>
          Among the popular social media platforms, Twitter stood out during the 2016 election and throughout Trump's time in office. A single tweet could <Link href='https://www.cnn.com/2013/04/23/tech/social-media/tweet-ripple-effect/index.html'>cause a stock market plunge</Link>, or <Link href='https://www.bbc.com/news/world-asia-china-38167022'>jeopardize relations with countries key to US foreign policy</Link>. In the 2020 election, both Biden and Trump campaigned vigorously on Twitter, each averaging over 17 tweets a day, and up to 85 tweets the day before the election. Even though candidates don't typically run their own social media accounts, their Twitter activity can represent the main concerns and messages aimed at their core online audience. By looking into what they posted on Twitter leading up to the 2020 election, we might be able to draw some interesting insights about the differences between the campaigns and their changing focus during that time.
        </Content>
        <Content as='div'>
          To tackle this, I analyzed two datasets of tweets, filtered to the year leading up to the election (beginning of November 2019 to the end of October 2020):
          <ul>
            <li><Link href='https://www.kaggle.com/rohanrao/joe-biden-tweets'>tweets posted by @JoeBiden</Link>, Joe Biden's official Twitter account, and</li>
            <li><Link href='https://www.thetrumparchive.com/'>tweets posted by @realDonaldTrump</Link>, Donald Trump's official Twitter account</li>
          </ul>
        </Content>
        <Heading>Usage of words over time</Heading>
        <Content>
          First, let's break up all of the tweets into individual words and map out the top 100 used by Biden and Trump, combined, over each month in the past year. From this, we might be able to get a sense of what candidates were concerned with and when, including each other. Try clicking on the words on the right or searching for your own. You can also focus in on a month by clicking on it in the plot.
        </Content>
        <WordLines chartHeight={650} />
        <AboutData>
          The words in this chart are gleaned from tweets by removing all punctuation, lowercasing, and removing stopwords (defined below).
          <div style={{ height: 15 }} />
          The y-axis of the plot is the count of each word from both Biden's and Trump's Twitter accounts, divided by the total number of tweets by both for each month, multiplied by 1000, between November 2019 and October 2020. In other words, this is the number of mentions per 1,000 tweets. This is to remove bias caused by months with more tweets (e.g. the month before the election).
          <div style={{height: 15}}/>
          Words in the word list on the right are sorted by raw number of mentions, with that number shown beside them. Only the top 100 are shown or searchable. If a month is selected, the top 100 updates to that month: the number beside each word becomes the number of mentions for that month, and rank and order are for that month as well. Words are outlined in the color of the candidate who used that word more, by raw count (red for Trump, blue for Biden). Once a word is clicked in the word list, the number of <i>overall</i> mentions split by candidate is shown, whether or not a month is selected.
          <div style={{ height: 15 }} />
          <b>Stopwords:</b> This is my logic for defining stopwords. I edited them over time to make the visualizations as informative and meaningful as possible.
          <ul>
            <li>Removed regardless: 'this', 'will', 'that', 'have', 'they', 'with', 'their', 'from', 'were', 'should', 'than', 'your', 'just', 'what', 'doing', 'when', 'many', 'been', 'there', 'those', 'these', 'going', 'said'</li>
            <li>Kept regardless (not stopwords): 'our', 'you', 'aoc', 'now', 'new', 'big', 'day', 'joe', 'job', 'win', 'i', 'my', 'me', 'im', 'god'</li>
            <li>Otherwise, remove if 3 or fewer characters long</li>
          </ul>
        </AboutData>
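        {/*
A minimal sketch of the word cleaning and per-1,000 normalization described in the notes above. This is not the actual pipeline (most of the data processing was done ahead of time in Python); the names tokenize and mentionsPer1000 are hypothetical, and stripping everything except ASCII letters, digits, and whitespace is an assumption about what "removing punctuation" means.

```javascript
// Stopword lists copied from the notes above.
const REMOVED = new Set(['this', 'will', 'that', 'have', 'they', 'with', 'their',
  'from', 'were', 'should', 'than', 'your', 'just', 'what', 'doing', 'when',
  'many', 'been', 'there', 'those', 'these', 'going', 'said']);
const KEPT = new Set(['our', 'you', 'aoc', 'now', 'new', 'big', 'day', 'joe',
  'job', 'win', 'i', 'my', 'me', 'im', 'god']);

// Lowercase, strip punctuation, split on whitespace, then drop stopwords.
// A word survives if it is on the kept list, or if it is not on the removed
// list and is longer than 3 characters.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '') // assumption: ASCII-only cleanup
    .split(/\s+/)
    .filter(word => word.length > 0)
    .filter(word => KEPT.has(word) || (!REMOVED.has(word) && word.length > 3));
}

// Mentions of `word` per 1,000 tweets, given one month's tweets (an array of
// strings from both accounts). Repeats within a single tweet all count.
function mentionsPer1000(word, tweets) {
  if (tweets.length === 0) return 0;
  let count = 0;
  for (const tweet of tweets) {
    for (const token of tokenize(tweet)) {
      if (token === word) count += 1;
    }
  }
  return (count / tweets.length) * 1000;
}
```

Calling mentionsPer1000 once per word and month would give the y-values for one line in the chart.
        */}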
        <Content>
          A couple of interesting months to look into: March 2020, when COVID-19 began to spread in the US, and June 2020, when Biden won the Democratic primary. Among the top 100 words tweeted by Trump and Biden in the past year, you might have already noticed that <Quote>covid</Quote> barely makes the list. If you search for it and click on it, you'll see that it was mentioned a lot in April, followed by a slow decline, even as cases rose and the pandemic worsened.
        </Content>
        <Heading>Trump vs. Biden in words</Heading>
        <Content>
          Now that we know what both candidates tweeted about, let's compare the words used by Trump and Biden. From this, we might be able to see how their messages differed. Below, you'll see a word cloud of the few hundred most used words each month, sized by how often they were used by both candidates. In white are words used with similar frequency by both candidates, in blue are words used more frequently by Biden, and in red are words used more frequently by Trump. Press play or click through to see the words and corresponding tweets from that month.
        </Content>
        <WordCloud chartHeight={650} />
        <AboutData>
          The visualization is shown with monthly granularity. When played, it advances to the next month every 10 seconds. The data discussed below is monthly. All tweets by each candidate are shown on either side for that month.
          <div style={{ height: 15 }} />
          Words are sourced in the same way as in the previous visualization. Stopwords (defined there) are removed after removing punctuation and lowercasing. If you click a word in the word cloud, the tweets on either side will be filtered to those that include that word, after lowercasing and removing punctuation from the tweets.
          <div style={{ height: 15 }} />
          The frequency of each word is calculated as follows, by candidate (monthly):
          <ul>
            <li>Count the total number of uses of that word (multiple times per tweet does count)</li>
            <li>Divide by the number of tweets (by that candidate in the selected month). This is to make up for the candidates tweeting different amounts each month.</li>
          </ul>
          Words are then separated into 5 equal quantiles based on the difference between the candidates' frequencies. This means that words in white have similar frequencies from both candidates, but were not necessarily used the same number of times, since one candidate may have tweeted a lot more that month. Also, the quantitative center (equal frequency from both candidates), while it tends to fall in the white region, does not necessarily end up there (since the split is made so that there is an equal number of words in each quantile). This isn't ideal, but can be fixed in future versions.
          <div style={{ height: 15 }} />
          Sizes of words in the word cloud are the log (base 10) of the frequency of both candidates added together, on a separate scale for each quantile. This means that, for instance, when Biden uses a word that Trump uses far more often, Biden's uses still make it bigger, even though it is dark red. Word sizes can't be compared across different quantiles, but it is more aesthetic this way.
          <div style={{ height: 15 }} />
          The number of words shown for each quantile depends on how many words fit using the <Link href='https://github.com/jasondavies/d3-cloud'>d3-cloud</Link> layout algorithm. The largest words (by combined frequency) are always shown. The number shown per quantile can be anywhere up to 100 words, but tends to be around 40-50.
        </AboutData>
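        {/*
A rough sketch of the frequency, quantile, and size steps from the notes above, under a few assumptions: the function names are hypothetical, quantile 0 is taken to mean "most Biden" and quantile 4 "most Trump", and the per-quantile size scale is left out (only the raw log value is computed).

```javascript
// Uses of a word divided by the number of tweets by that candidate that month.
function frequency(uses, tweetCount) {
  return tweetCount === 0 ? 0 : uses / tweetCount;
}

// words: [{ word, biden, trump }] where biden and trump are frequencies.
// Sort by (trump - biden), then cut the sorted list into equal-sized buckets,
// so each bucket holds the same number of words regardless of where the
// "equal frequency" point falls.
function assignQuantiles(words, buckets = 5) {
  const sorted = [...words].sort(
    (a, b) => (a.trump - a.biden) - (b.trump - b.biden)
  );
  return sorted.map((entry, index) => ({
    ...entry,
    quantile: Math.min(buckets - 1, Math.floor((index * buckets) / sorted.length)),
    // Combined frequency on a log (base 10) scale; frequencies below 1 give
    // negative values, so a real layout would rescale this per quantile.
    logWeight: Math.log10(entry.biden + entry.trump),
  }));
}
```

Because the buckets are cut by rank, the middle (white) bucket is simply the middle fifth of the words, which is why the equal-frequency point is not guaranteed to land in it.
        */}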
        <Content>
          Trump and Biden seem to use noticeably different types of language to engage their audiences. They mention each other a lot some months, and Biden mentions <Quote>covid</Quote> far more frequently than Trump in October, the last month before the election.
        </Content>
        <Heading>Word sequence of campaign tweets</Heading>
        <Content>
          Now, for a more interactive part: what is it like to write a Trump or Biden tweet? Choose your first word, and the tool will show you the words that candidate most commonly uses next. Think of it as using the word suggestions on their phone. Switch between the candidates at any point in your tweet to get a sense of how they follow up the same words differently (you won't lose your progress). When you're ready, click the bottom right or &lt;end&gt;, and you'll get to see how close you were to their real tweets, as well as an estimate of how many likes and retweets the post would get if it were tweeted from their accounts. Try exploring the different topics that their tweets cover, and see if you can gain any interesting insights.
        </Content>
        <MakeYourOwn chartHeight={650} />
        <AboutData>
          Suggested words at each step are generated from a simple bigram model (a more complex language model could be used in the future for better suggestions). Sizes of suggested words (including the &lt;end&gt; keyword to indicate the end of a tweet) are on a log scale of the number of times they appeared after the previous word (or as the first word in the tweet). Tweet data is from November 2019 to October 2020.
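          {/*
A minimal sketch of a bigram suggester like the one described in these notes. The names buildBigramCounts and suggest, the '<start>' marker, and the log(count + 1) sizing are all assumptions for illustration, not the tool's actual code.

```javascript
const START = '<start>'; // hypothetical marker for "first word in the tweet"
const END = '<end>';     // marks the end of a tweet

// Count how often each word follows each other word across all tweets.
// Returns Map(previousWord -> Map(nextWord -> count)).
function buildBigramCounts(tweets) {
  const counts = new Map();
  for (const tweet of tweets) {
    const words = tweet.toLowerCase().split(/\s+/).filter(Boolean);
    const sequence = [START, ...words, END];
    for (let i = 0; i < sequence.length - 1; i += 1) {
      const followers = counts.get(sequence[i]) || new Map();
      const next = sequence[i + 1];
      followers.set(next, (followers.get(next) || 0) + 1);
      counts.set(sequence[i], followers);
    }
  }
  return counts;
}

// Most common followers of `previous`, most frequent first (ties broken
// alphabetically), each with a log-scaled size.
function suggest(counts, previous = START, limit = 5) {
  const followers = counts.get(previous) || new Map();
  return [...followers.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, limit)
    .map(([word, count]) => ({ word, count, size: Math.log10(count + 1) }));
}
```

Building one count table per candidate is what would let you switch candidates mid-tweet: the words so far stay the same, and only the table used for the next suggestion changes.
          */}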
          <div style={{ height: 15 }} />
          Similarity to real tweets is calculated as follows (both as a tweet is constructed and once it is complete):
          <ul>
            <li>Jaccard similarity is calculated between the user-constructed tweet and all tweets of the selected presidential candidate</li>
            <li>Return the maximum Jaccard similarity (i.e. similarity of the most similar tweet). There are better NLP metrics for this, but so far this seems to be decently intuitive, and works well in the edge cases.</li>
          </ul>
          Document similarity when a tweet is complete uses Jaccard similarity as mentioned above. The top 30 most similar tweets of each candidate are then displayed in order.
          <div style={{ height: 15 }} />
          Projected likes and retweets are calculated as a weighted average of the likes and retweets of the top 30 tweets displayed, weighted by the document similarity metric, and then multiplied by the overall similarity to real tweets.
          <div style={{ height: 15 }} />
          To maximize your likes and retweets (I knew you'd ask), you want to construct a tweet with the highest overall similarity metric (i.e. as close as possible to 100% similar to at least one tweet), but where the 30 most similar tweets are also highly liked and retweeted. Good luck!
        </AboutData>
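        {/*
A sketch of the similarity and projection steps from the notes above, assuming Jaccard similarity is taken over sets of lowercased words (the notes don't say what the set elements are). The names jaccard and project and the { text, likes, retweets } tweet shape are hypothetical.

```javascript
function wordSet(text) {
  return new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  for (const word of setA) {
    if (setB.has(word)) shared += 1;
  }
  return shared / (setA.size + setB.size - shared);
}

// tweets: [{ text, likes, retweets }] for the selected candidate.
// Rank tweets by similarity to the draft, keep the top N, then take the
// similarity-weighted average of their likes and retweets and scale it by
// the best similarity found.
function project(draft, tweets, topN = 30) {
  const closest = tweets
    .map(tweet => ({ ...tweet, similarity: jaccard(draft, tweet.text) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topN);
  const best = closest.length > 0 ? closest[0].similarity : 0;
  const totalWeight = closest.reduce((sum, t) => sum + t.similarity, 0);
  const weighted = key =>
    totalWeight === 0
      ? 0
      : closest.reduce((sum, t) => sum + t[key] * t.similarity, 0) / totalWeight;
  return {
    similarity: best,
    likes: weighted('likes') * best,
    retweets: weighted('retweets') * best,
    closest,
  };
}
```

Under this reading, a draft that exactly matches one real tweet scores a similarity of 1, and its projection is then driven by how popular the other near matches are.
        */}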
        <Content>
          Maybe this helps us see a bit into the minds of Biden and Trump as they write their tweets (or their trusted social media managers, anyway). Even when they cover similar topics, they convey them quite differently. Interestingly, the most similar-sounding rhetoric between them might be when they discuss each other (which they seem to do a lot). Twitter's 280-character limit is likely a factor here, but neither seems to hold back in directly criticizing or insulting the other.
        </Content>
        <br />
        <Content>
          We already know what a Trump presidency is like, but maybe through the language used on these two Twitter accounts, these visualizations can help demonstrate what the next four years might look like in comparison.
        </Content>
        <br />
        <Heading>References</Heading>
        <Content as='div'>
          <ul>
            <li>I used <Link href='https://github.com/jasondavies/d3-cloud'>d3-cloud</Link>, a layout library, to create the word clouds</li>
            <li>I built off of the word cloud examples at <Link href='https://www.d3-graph-gallery.com/wordcloud'>Wordcloud | The D3 Graph Gallery</Link></li>
            <li>I learned how to create complex multi-line charts from <Link href='https://observablehq.com/@d3/multi-line-chart'>D3's Observable Documentation</Link></li>
            <li>For the meat of the charts, I used <Link href='https://d3js.org/'>D3</Link>. For state management and component structure, I used <Link href='https://reactjs.org/'>React</Link>.</li>
            <li>I did most of the data processing ahead of time in Python, and the rest of the state-dependent data logic is in JavaScript and JSX.</li>
            <li>I used the <Link href='https://material.io/resources/icons/'>Material Design Icons</Link> icon set</li>
            <li>Trump tweet data originates from the <Link href='https://twitter.com/realDonaldTrump'>@realDonaldTrump</Link> Twitter account and the corresponding <Link href='https://www.thetrumparchive.com/'>dataset</Link></li>
            <li>Biden tweet data originates from the <Link href='https://twitter.com/JoeBiden/'>@JoeBiden</Link> Twitter account and the corresponding <Link href='https://www.kaggle.com/rohanrao/joe-biden-tweets'>dataset</Link></li>
            <li>Tweet design based on <Link href='https://developer.twitter.com/en/docs/twitter-for-websites/embedded-tweets/overview'>Twitter's embedded tweets</Link></li>
          </ul>
        </Content>
      </div>
    </div>
  )
}

export default Article
