To Do List: News-scrapper


All

Priority Project #
20 News-scrapper #: 2
20.01

Update scrapers which have broken over time

Working / Not working
  • CNBC
  • Teletrader
  • FT Markets
  • Economist
  • FIA
  • Matt Levine
  • ISDA
  • BIS
  • Clarus
  • Economist Espresso
  • FirstFT, FT Main
  • Reuters
  • Economist Special Reports
  • Risk
  • EACH
  • FN
  • Global Custodian
  • Securities Finance
Complete
20.04

 New sites to scrape

- Complete
20.06

User-specific summary email

  • Once the broken scrapers are fixed, the main ‘ask’ is to make the summary email “user-specific” (i.e. each user can define their own list of stories and categories to include). So the following need to be linked to the logged-in user:
    • Headline categories (eg Macro, CV19, Corp, Cryp, Pol, Ukraine, Clearing, x)
    • What to include/exclude for the email
    • What economic stats are key to them
    • Their own list of favourites and commentaries
  • At the moment each of these fields is attached to the news article, so we need to create a User-News field relationship to achieve this. Does that make sense to you?
Complete
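
A minimal sketch of the User-News relationship described above, written in Python purely for illustration (the site itself appears to be a Symfony app, so the real version would be an entity). The names UserNews, UserNewsStore and every field are assumptions, not the project's actual schema. The idea is that category, include/exclude, favourite and commentary live on a per-user link rather than on the article.

```python
from dataclasses import dataclass, field


@dataclass
class UserNews:
    """Per-user state for one news article (hypothetical link entity)."""
    user_id: int
    news_id: int
    category: str = ""             # e.g. "Macro", "Clearing"
    include_in_email: bool = False
    is_favourite: bool = False
    commentary: str = ""


@dataclass
class UserNewsStore:
    """In-memory stand-in for the link table, keyed by (user, article)."""
    links: dict = field(default_factory=dict)

    def get(self, user_id: int, news_id: int) -> UserNews:
        # Create the link lazily the first time a user touches an article.
        key = (user_id, news_id)
        if key not in self.links:
            self.links[key] = UserNews(user_id, news_id)
        return self.links[key]

    def email_selection(self, user_id: int) -> list:
        # Article ids this user has chosen for their own summary email.
        return sorted(
            link.news_id
            for link in self.links.values()
            if link.user_id == user_id and link.include_in_email
        )
```

With this shape two users can categorise or select the same article differently, and the summary email is built from one user's links only.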
20.31

Other

  • Make the Economic Stats scrape time 1am and 8.30am
  • Cron does not include FirstFT
  • Add a button to test that the FT.com login is working
  • Full content button (FT) only shows in first block (unassigned)
  • Upon user login, run a check on the user's logins and passwords and determine access accordingly
- Complete
20.02

News Lists

  • The issue is that each button triggers a page refresh, which is slow.
  • Can one overcome this with a JavaScript flag per item and a single “apply all” button pressed at the end?

Users

  • Scrape the content of the articles centrally. Check each user's site login once, at their login stage, and record whether it is good. Then show on the title page whether the login is confirmed as valid.
- Pending
20.02

Favicon

  •  Not working, both in bookmark and in tab, but only when on the home page… Odd
- Complete
20.03

Login

  • Add ability to register
Complete
20.05

Formatting

  • Login button
  • Subscription page to show when not logged in
  • Ability to look at each headline
  • User Edit page -
    • Show categories
    • Button to delete photo
    • Show photo
    • Hide Status from user
  • New stories
    • No like/read buttons
- Complete
20.12

Economic stats

  • Highlight and suppress buttons not working
  • Button to use the Teletrader defaults
  • Button to remove your highlight or take standard
  • Or if blank use the default - over-ride
- Complete
20.07

Settings

  • Add VCard code (Jeroen)
Complete
20.08
  • Review of security (Dashboard - changed to ROLE_USER)
- Complete
20.10

Read count

  • The new count needs to be user specific. 
  • So this is the number of unread articles in the past 24 hours by user.   Service by user and source
  • The read all button should not affect archives
    • Perhaps in archive you can have unread shown separately. 
- Complete
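
A rough sketch of the per-user unread count, in Python for illustration only. It assumes articles are available as (id, source, published_at) tuples and that each user has a set of read article ids; both shapes and the function name are hypothetical. Only the 24-hour window is counted.

```python
from collections import Counter
from datetime import datetime, timedelta


def unread_counts(articles, read_ids, now, window_hours=24):
    """Count one user's unread articles per source within the window.

    articles: iterable of (article_id, source, published_at) tuples.
    read_ids: set of article ids this user has already marked as read.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter()
    for article_id, source, published_at in articles:
        # Ignore anything older than the window or already read by this user.
        if published_at >= cutoff and article_id not in read_ids:
            counts[source] += 1
    return dict(counts)
```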
20.09

Removed the CMS and Settings entities

Why isn't LinkedIn password appearing on the list in a User's profile?

- Complete
20.11

Pricing

The functions of the website are

  • LinkedIn scrape
  • News:
    • Single place to read articles
      • Summary access only
      • Hover for full article
    • Ability to mark articles as read, to avoid re-reading
    • Ability to select key articles and send summary email
    • Mark economic stats as favourite - to generate an email
    • See what others are liking - are you missing an important well read article?

 

  • Summary Read-only  
    • Full articles available via a link 
  • One-stop Read access
    • However
      •  
- Complete
20.32

Economic Market Statistics

  • Cron job.  Refresh every 15 mins
    • Button to refresh manually if >10 mins
  • Historical view by stat
Complete
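
A small sketch of the manual-refresh rule above, in Python for illustration. The function and constant names are assumptions. The 15-minute schedule itself would normally be a crontab entry such as `*/15 * * * *`; the code only covers the "button if >10 mins" check.

```python
from datetime import datetime, timedelta

MANUAL_REFRESH_AFTER = timedelta(minutes=10)  # threshold taken from the note above


def manual_refresh_allowed(last_refresh, now):
    """True when the last refresh is more than 10 minutes old (or never ran)."""
    if last_refresh is None:
        return True
    return now - last_refresh > MANUAL_REFRESH_AFTER
```

The view would show the refresh button only when this returns True, so a user cannot hammer the scraper between cron runs.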
20.19

LinkedIn:

  • Popup that checks that the LinkedIn login and password are successful
  • The first result should return the number of connections and estimate the time to download them, before proceeding.
    • Advise the user how long it will take to download and that a file will be emailed to them 
  • Email CSV file in one step directly after the scrape, i.e. save a file in the database at the end of the scrape and email it (merging the 3 buttons we have)
- Complete
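
A sketch of the time estimate shown before the download starts, in Python for illustration. The rate of 2 seconds per contact is a placeholder assumption, not a measured figure, and both function names are hypothetical.

```python
import math


def estimate_download_minutes(connections, seconds_per_contact=2.0):
    """Rough download time in whole minutes, rounded up.

    seconds_per_contact is a placeholder; measure it on a real scrape.
    """
    return math.ceil(connections * seconds_per_contact / 60)


def download_notice(connections, seconds_per_contact=2.0):
    """Message advising the user of the wait and the emailed file."""
    minutes = estimate_download_minutes(connections, seconds_per_contact)
    return (
        f"Found {connections} connections. The download should take about "
        f"{minutes} min; the file will be emailed to you when it is ready."
    )
```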
20.34

Memberships

  • Create a 30-day free trial period
    • Unable to extend
    • Link the systems login to LinkedIn login and other site logins to avoid gaming
    • Buttons to upgrade
Complete
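
A sketch of the trial rules above, in Python for illustration. All names are hypothetical. It assumes each account carries a list of linked logins (system login, LinkedIn login, other site logins) and that logins which have already used a trial are recorded somewhere.

```python
from datetime import date, timedelta

TRIAL_DAYS = 30


def trial_end(start):
    """First day on which the trial is no longer valid."""
    return start + timedelta(days=TRIAL_DAYS)


def trial_active(start, today):
    """True while today falls inside the 30-day window starting at start."""
    return start <= today < trial_end(start)


def can_start_trial(linked_logins, logins_already_trialled):
    # "Unable to extend": refuse a new trial if any linked login
    # (e.g. the LinkedIn login) has already been used for one.
    return not (set(linked_logins) & set(logins_already_trialled))
```

Once trial_active returns False the page would show only the upgrade buttons.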
20.14

Favicon

- Complete
20.13

Market Data

  • Future T+2 data….  
- Complete
20.15

Subscription page/Memberships

  • Check whether a user has a membership upon opening the Subscription page
    • If not, create a New membership, initially a Free membership
    • The new Membership button shouldn't open the form, but just save the details as per the button
      • Include Today's date 
      • The new membership button shows all users to Non-Admin users
Complete
20.01

Scrape content

  • Check that the scraping of all the news sites works
  • Also ensure that the content, and not just the headline, is scraped and saved in the ‘fullContent’ field in the news entity.
  • In the view, hovering over the title will display the full content
  • For websites where the content is behind a paywall, log in and scrape where available.
- Pending
20.03

Set up a payment gateway - Stripe?

- Pending
20.16

CompanyDetails

  • The HideOther inputs not coming through into Live
- Complete
20.22

LinkedIn scrape (https://www.linkedin.com/in/stephen-j-nurse/)

  • Include the summary experience and the employment history
Complete
20.26

Bugs

  • Market Economic Stats scraper doesn't work in live.  Not just United States (filters).  Aman is re-writing the scrape 
  • Cron job for MarketStats doesn't work
Complete
20.17

Bugs

  • subscriptions_buttons line 45
    • Make this dynamic?
  • source\index line 43
    • Make this dynamic?
  • Favicon issue on live (when favicon file exists)
Complete
20.18

Bugs

  • Merge the ‘Highlight/Standard/Low’ buttons into a pop-up.  To avoid confusion. 
  • Set User bug in Market Stats
- Complete
20.20

User

Add time zones

- Complete
20.21

LinkedIn contact export

  • Develop a view of the contacts that can be expanded/contracted [like Excel]
  • Import photo and save file
  • Export file - two types: Full CSV and Outlook
    • Export to directory per user
    • Develop a concatenated Notes for the Outlook export
  • Control the number of exports (2 now)
  • Don't show the flashing screen.  Warn on timings
  • Automatically email file
Complete
20.23

LinkedIn Contacts

  • Deleting LinkedIn Contacts - just the owner's contact and the languages spoken
- Complete
20.28

Users

  • Delete doesn't work
  • Include in user view.
    • Membership
- Complete
20.24

LinkedIn Contacts

  • Don't save a Language Spoken for the null case
- Complete
20.25

LinkedIn contacts

  • Create a User-Settings upon scrape
- Complete
20.04

User Password checks

  • Need to develop an algorithm to check the passwords upon login
- Pending
20.27

User Passwords

  • List the users in alphabetical order
  • When you edit a User Password it changes the User name
Complete
20.05

Deleting LinkedIn Users

  • Delete MY LinkedIn contacts should not delete them but mark them as hidden
  • How to handle when we have a contact shared with multiple "owners"?
Pending
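
One possible answer to the shared-contact question, sketched in Python for illustration: keep the hidden flag on the owner-contact link rather than on the contact itself. ContactOwnership and ContactBook are hypothetical names, and this is only one option, not a decided design.

```python
from dataclasses import dataclass, field


@dataclass
class ContactOwnership:
    """Hypothetical link between one owner (user) and one LinkedIn contact."""
    owner_id: int
    contact_id: int
    hidden: bool = False


@dataclass
class ContactBook:
    links: list = field(default_factory=list)

    def add(self, owner_id, contact_id):
        self.links.append(ContactOwnership(owner_id, contact_id))

    def hide_my_contacts(self, owner_id):
        # "Delete MY contacts": flag this owner's links, delete nothing.
        for link in self.links:
            if link.owner_id == owner_id:
                link.hidden = True

    def visible_contacts(self, owner_id):
        return sorted(
            link.contact_id
            for link in self.links
            if link.owner_id == owner_id and not link.hidden
        )
```

Under this assumption a contact shared by two owners stays visible to the second owner after the first one hides theirs.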
20.30

Market stats

  • Scrape doesn't work in live
- Complete
20.29

Security

  • Add role hierarchy in security.yaml
Complete
20.33

User memberships

  • Dates
Complete
20.06

LinkedIn export

  • Notes need to include the Employment and Education details
  • Searchable text for mission statement
  • Export function seems to be failing
Pending
