To Do List: News-scrapper



Priority Project #
18 News-scrapper #: 4
18.01

Update scrapers which have broken over time

Working / Not working
  • CNBC
  • Teletrader
  • FT Markets
  • Economist
  • FIA
  • Matt Levine
  • ISDA
  • BIS
  • Clarus
  • Economist Espresso
  • FirstFT, FT Main
  • Reuters
  • Economist Special Reports
  • Risk
  • EACH
  • FN
  • Global Custodian
  • Securities Finance
Complete
18.04

 New sites to scrape

- Complete
18.06

User-specific summary email

  • Once the broken scrapers are fixed, the main ‘ask’ is to make the email “user-specific”, i.e., each user can define their own list of stories and categories to include in the summary email. The following therefore need to be linked to the logged-in user:
    • Headline categories (e.g., Macro, CV19, Corp, Cryp, Pol, Ukraine, Clearing, x)
    • What to include/exclude for the email
    • Which economic stats are key to them
    • Their own list of favourites and commentaries
  • At the moment each of these fields is attached to the news article, so we need to create a User-News field relationship to achieve this. Does that make sense to you?
Complete
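A minimal sketch of the User-News relationship, assuming one join record per (user, article) pair that carries the per-user flags now stored on the article, plus a per-user preferences record for categories and key stats. All class and field names are hypothetical, the sample data is made up, and Python is used only for illustration; in the app this would be an ORM entity relationship.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    id: int
    headline: str
    category: str  # headline category, e.g. "Macro" or "Clearing"


@dataclass
class UserArticle:
    """Hypothetical join record: one row per (user, article) pair.

    Holds the per-user fields that currently sit on the article itself.
    """
    user_id: int
    article_id: int
    include_in_email: bool = False
    favourite: bool = False
    commentary: str = ""


@dataclass
class UserPreferences:
    """Hypothetical per-user settings that are not tied to a single article."""
    user_id: int
    categories: set = field(default_factory=set)  # categories this user follows
    key_stats: set = field(default_factory=set)   # economic stats this user cares about


def summary_email_articles(articles, links, prefs):
    """Articles this user flagged for the email, limited to their categories."""
    flagged = {
        link.article_id
        for link in links
        if link.user_id == prefs.user_id and link.include_in_email
    }
    return [a for a in articles if a.id in flagged and a.category in prefs.categories]


articles = [
    Article(1, "Sample macro headline", "Macro"),
    Article(2, "Sample clearing headline", "Clearing"),
    Article(3, "Sample crypto headline", "Cryp"),
]
links = [
    UserArticle(7, 1, include_in_email=True),
    UserArticle(7, 3, include_in_email=True),
    UserArticle(8, 2, include_in_email=True),
]
prefs = UserPreferences(7, categories={"Macro", "Clearing"})
print([a.id for a in summary_email_articles(articles, links, prefs)])  # [1]
```

Because the flags live on the join record, two users can flag the same article differently without touching the shared article row.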
18.31

Other

  • Schedule the Economic Stats scrape for 1am and 8.30am
  • The cron job does not include FirstFT
  • Add a button to test that the FT.com login is working
  • The FT full-content button only shows in the first block (unassigned)
  • Upon user login, run a check on the user's logins and passwords and determine access accordingly
- Complete
18.04

News Lists

  • The issue is that each button press triggers a page refresh, which is slow.
  • Could this be overcome with a JavaScript flag and a single ‘do all’ button pressed at the end?

Users

  • Scrape the content of the articles centrally. Check each user's login once, when they log in, and record whether it is good. Then show on the title page whether the login is confirmed as valid.
Pending
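The JavaScript-flag idea amounts to buffering button presses locally and sending them in one batch when the ‘do all’ button is pressed. The logic is sketched in Python for illustration only; in the page it would be JavaScript, and the class name, method names and batch format are all assumptions.

```python
class PendingActions:
    """Hypothetical buffer for like/read presses.

    Each press is recorded locally instead of triggering a page refresh;
    a single 'do all' press sends everything in one request.
    """

    def __init__(self):
        self._changes = {}  # (article_id, action) -> latest value

    def press(self, article_id, action, value):
        # A later press on the same button overwrites the earlier one.
        self._changes[(article_id, action)] = value

    def flush(self, send):
        """Send all buffered changes with one call to send(); return how many."""
        if not self._changes:
            return 0
        batch = [
            {"article_id": article_id, "action": action, "value": value}
            for (article_id, action), value in sorted(self._changes.items())
        ]
        send(batch)
        self._changes.clear()
        return len(batch)


sent = []
pending = PendingActions()
pending.press(12, "read", True)
pending.press(12, "like", True)
pending.press(12, "read", False)  # the user changed their mind
print(pending.flush(sent.append))  # 2
```

Collapsing repeated presses on the same button means the server only ever sees the final state of each toggle.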
18.02

Favicon

  • Not working, either in bookmarks or in the tab, but only on the home page… Odd
- Complete
18.03

Login

  • Add ability to register
- Complete
18.05

Formatting

  • Login button
  • Subscription page to show when not logged in
  • Ability to look at each headline
  • User Edit page -
    • Show categories
    • Button to delete photo
    • Show photo
    • Hide Status from user
  • New stories
    • No like/read buttons
- Complete
18.12

Economic stats

  • Highlight and suppress buttons not working
  • Button to use the Teletrader defaults
  • Button to remove your own highlight and revert to the standard
  • If blank, use the default; otherwise the user's choice overrides it
Complete
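The override rule above can be sketched as a simple fallback lookup: the user's own level wins, a blank or missing entry falls back to the Teletrader default, and "remove your highlight" just deletes the override. The level names come from the ‘Highlight/Standard/Low’ buttons; the final "Standard" fallback, the function names and the sample stats are assumptions.

```python
DEFAULT_LEVEL = "Standard"  # assumed fallback when neither user nor Teletrader sets a level


def effective_level(stat, user_levels, teletrader_defaults):
    """Return the level to display for one economic stat.

    The user's own choice overrides the default; a blank or missing
    choice falls back to the Teletrader default.
    """
    choice = user_levels.get(stat)
    if choice:  # None and "" both count as blank
        return choice
    return teletrader_defaults.get(stat, DEFAULT_LEVEL)


def reset_to_default(stat, user_levels):
    """'Remove your highlight': drop the override so the default applies again."""
    user_levels.pop(stat, None)


defaults = {"US CPI": "Highlight", "UK GDP": "Standard"}
mine = {"UK GDP": "Highlight", "US CPI": ""}
print(effective_level("US CPI", mine, defaults))  # Highlight (blank, so default)
print(effective_level("UK GDP", mine, defaults))  # Highlight (user override)
reset_to_default("UK GDP", mine)
print(effective_level("UK GDP", mine, defaults))  # Standard
```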
18.07

Settings

  • Add vCard code (Jeroen)
- Complete
18.08
  • Review security (Dashboard changed to ROLE_USER)
Complete
18.10

Read count

  • The ‘new’ count needs to be user-specific.
  • That is, the number of unread articles in the past 24 hours for each user, computed by user and source.
  • The ‘Read all’ button should not affect archives
    • Perhaps the archive could show unread articles separately.
Complete
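A minimal sketch of the per-user unread count, assuming each article is a (id, source, published_at) tuple and each user has a set of read article ids. It also assumes "archive" means anything older than the 24-hour window, so ‘Read all’ only marks articles inside that window. Function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def unread_counts(articles, read_ids, now, window=WINDOW):
    """Count one user's unread articles per source within the window.

    articles: iterable of (article_id, source, published_at) tuples
    read_ids: ids of the articles this user has already read
    """
    cutoff = now - window
    counts = Counter()
    for article_id, source, published_at in articles:
        if published_at >= cutoff and article_id not in read_ids:
            counts[source] += 1
    return counts


def mark_all_read(articles, read_ids, now, window=WINDOW):
    """'Read all' for one user: only articles inside the window are marked,
    so anything older (assumed here to be the archive) is left untouched."""
    cutoff = now - window
    recent = {aid for aid, _source, published_at in articles if published_at >= cutoff}
    return set(read_ids) | recent


now = datetime(2024, 1, 2, 12, 0)
articles = [
    (1, "FT", now - timedelta(hours=1)),
    (2, "FT", now - timedelta(hours=2)),
    (3, "Risk", now - timedelta(hours=3)),
    (4, "FT", now - timedelta(hours=30)),  # older than 24h: archive
]
read = {2}
print(dict(unread_counts(articles, read, now)))  # {'FT': 1, 'Risk': 1}
read = mark_all_read(articles, read, now)
print(sorted(read))  # [1, 2, 3]: article 4 stays unread in the archive
```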
18.09

Removed the CMS and Settings entities

Why isn't the LinkedIn password appearing in the list on a user's profile?

Complete
18.11

Pricing

The functions of the website are:

  • LinkedIn scrape
  • News:
    • Single place to read articles
      • Summary access only
      • Hover for full article
    • Ability to mark articles as read, to avoid re-reading
    • Ability to select key articles and send a summary email
    • Mark economic stats as favourites to generate an email
    • See what others are liking - are you missing an important well read article?

 

  • Summary Read-only  
    • Full articles available via a link 
  • One-stop Read access
    • However
Complete
18.32

Economic Market Statistics

  • Cron job: refresh every 15 minutes
    • Button to refresh manually if the data is more than 10 minutes old
  • Historical view by stat
Complete
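The two timing rules can be sketched as below, assuming the 15-minute refresh runs on the standard crontab schedule `*/15 * * * *` (minutes 0, 15, 30 and 45) and the manual button is enabled only when the last refresh is more than 10 minutes old. Function names are hypothetical.

```python
from datetime import datetime, timedelta

CRON_SCHEDULE = "*/15 * * * *"  # standard crontab syntax: every 15 minutes
MANUAL_MIN_AGE = timedelta(minutes=10)


def manual_refresh_allowed(last_refresh, now):
    """Enable the manual refresh button only if data is more than 10 minutes old."""
    return now - last_refresh > MANUAL_MIN_AGE


def next_cron_run(now):
    """Next run of the '*/15 * * * *' schedule strictly after now."""
    base = now.replace(second=0, microsecond=0)
    return base + timedelta(minutes=15 - base.minute % 15)


last = datetime(2024, 1, 1, 9, 0)
print(manual_refresh_allowed(last, datetime(2024, 1, 1, 9, 5)))   # False
print(manual_refresh_allowed(last, datetime(2024, 1, 1, 9, 11)))  # True
print(next_cron_run(datetime(2024, 1, 1, 9, 7, 30)))  # 2024-01-01 09:15:00
```

Computing the next run from the clock rather than from the last refresh reflects that a manual refresh at, say, 9:12 does not move the cron job's 9:15 run.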
18.19

LinkedIn:

  • Popup that checks that the LinkedIn login and password are successful
  • The first result should return the number of connections and estimate the time to download them, before proceeding.
    • Advise the user how long it will take to download and that a file will be emailed to them 
  • Email the CSV file in one step directly after the scrape, i.e., save a file in the database at the end of the scrape and email it (merging the 3 buttons we have)
- Complete
18.34

Memberships

  • Create a 30-day free trial period
    • The trial cannot be extended
    • Link the system login to the LinkedIn login and other site logins to prevent gaming
    • Buttons to upgrade
Complete
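A minimal sketch of the trial rules, assuming trials are keyed on the user's LinkedIn identity (one reading of "link the system login to the LinkedIn login to prevent gaming") and that the trial end date is exclusive. There is deliberately no extend function. All names are hypothetical.

```python
from datetime import date, timedelta

TRIAL_LENGTH = timedelta(days=30)


class TrialAlreadyUsed(Exception):
    """Raised when an identity that already had a trial asks for another."""


def start_trial(linkedin_id, start, used_trials):
    """Start a 30-day free trial and return its (exclusive) end date.

    used_trials maps a LinkedIn identity to its trial end date. Keying on
    the LinkedIn login rather than the site login stops a user from
    registering again to get a fresh trial.
    """
    if linkedin_id in used_trials:
        raise TrialAlreadyUsed(linkedin_id)
    used_trials[linkedin_id] = start + TRIAL_LENGTH
    return used_trials[linkedin_id]


def trial_active(trial_end, today):
    return today < trial_end


used = {}
end = start_trial("li-123", date(2024, 1, 1), used)
print(end)                                   # 2024-01-31
print(trial_active(end, date(2024, 1, 30)))  # True
print(trial_active(end, date(2024, 1, 31)))  # False
```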
18.14

Favicon

Complete
18.13

Market Data

  • Future T+2 data….  
- Complete
18.15

Subscription page/Memberships

  • Check whether a user has a membership upon opening the Subscription page
    • If not, create a new membership, initially a Free membership
    • The New Membership button shouldn't open the form, but just save the details as specified by the button
      • Include today's date
      • The New Membership button shows all users to non-admin users
Complete
18.04

Scrape content

  • Check that the scraping of all the news sites works
  • Also ensure that the content, not just the headline, is scraped and saved in the ‘fullContent’ field of the News entity.
  • In the view, hovering over the title will display the full content.
  • For websites where the content is behind a paywall, log in and scrape where available.
Pending
18.05

Set up a payment gateway (Stripe?)

Pending
18.16

CompanyDetails

  • The HideOther inputs are not coming through into Live
- Complete
18.22

LinkedIn scrape (https://www.linkedin.com/in/stephen-j-nurse/)

  • Include the summary experience and the employment history
- Complete
18.26

Bugs

  • The Market Economic Stats scraper doesn't work in live. Not just United States (filters). Aman is rewriting the scraper.
  • The cron job for MarketStats doesn't work
Complete
18.17

Bugs

  • subscriptions_buttons line 45
    • Make this dynamic?
  • source\index line 43
    • Make this dynamic?
  • Favicon issue on live (when the favicon file exists)
- Complete
18.18

Bugs

  • Merge the ‘Highlight/Standard/Low’ buttons into a pop-up to avoid confusion.
  • Set User bug in Market Stats
Complete
18.20

User

Add time zones

- Complete
18.21

LinkedIn contact export

  • Develop a view of the contacts that can be expanded/contracted [like Excel]
  • Import photo and save file
  • Export file - two types: Full CSV and Outlook
    • Export to directory per user
    • Develop a concatenated Notes for the Outlook export
  • Limit the number of exports (currently 2)
  • Don't show the flashing screen; warn about timings instead
  • Automatically email file
- Complete
18.23

LinkedIn Contacts

  • Deleting LinkedIn contacts: delete just the owner's contact and the languages spoken
Complete
18.28

Users

  • Delete doesn't work
  • Include in user view.
    • Membership
- Complete
18.24

LinkedIn Contacts

  • Don't save a Language Spoken for the null case
Complete
18.25

LinkedIn contacts

  • Create a User-Settings upon scrape
Complete
18.03

User Password checks

  • Need to develop an algorithm to check the passwords upon login
Pending
18.27

User Passwords

  • List the users in alphabetical order
  • Editing a User Password changes the user name
- Complete
18.01

Deleting LinkedIn Users

  • ‘Delete MY LinkedIn contacts’ should not delete them but mark them as hidden
  • How to handle when we have a contact shared with multiple "owners"?
- Pending
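One way to handle contacts shared by several owners is to record "hidden" on the owner-contact link rather than on the contact, so hiding my contacts never removes them for anyone else. This is a sketch under that assumption; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContactOwnership:
    """Hypothetical link between an owner (app user) and a scraped contact.

    A contact shared by several owners has one link per owner, so hiding
    is recorded on the link, never on the contact itself.
    """
    owner_id: int
    contact_id: int
    hidden: bool = False


def hide_my_contacts(links, owner_id):
    """'Delete MY LinkedIn contacts': hide this owner's links; return how many changed."""
    changed = 0
    for link in links:
        if link.owner_id == owner_id and not link.hidden:
            link.hidden = True
            changed += 1
    return changed


def visible_contacts(links, owner_id):
    return sorted(link.contact_id for link in links if link.owner_id == owner_id and not link.hidden)


links = [ContactOwnership(1, 100), ContactOwnership(2, 100), ContactOwnership(1, 101)]
print(hide_my_contacts(links, 1))  # 2
print(visible_contacts(links, 1))  # []
print(visible_contacts(links, 2))  # [100]: the shared contact survives for owner 2
```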
18.30

Market stats

  • Scrape doesn't work in live
Complete
18.29

Security

  • Add a role hierarchy in security.yaml
Complete
18.33

User memberships

  • Dates
Complete
18.01

LinkedIn export

  • Notes need to include the Employment and Education details
  • Searchable text for mission statement
  • Export function seems to be failing
- Pending
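A minimal sketch of the Outlook export with a concatenated Notes field built from the Employment and Education details. The contact dictionary keys are hypothetical, and the three column names are an assumed subset of Outlook's contact CSV columns that should be checked against a real Outlook export.

```python
import csv
import io

# Assumed subset of Outlook's contact CSV columns; verify against a real export.
OUTLOOK_FIELDS = ["First Name", "Last Name", "Notes"]


def build_notes(contact):
    """Concatenate employment and education details into one Notes field."""
    lines = []
    if contact.get("employment"):
        lines.append("Employment:")
        lines += [f"- {job['title']}, {job['company']}" for job in contact["employment"]]
    if contact.get("education"):
        lines.append("Education:")
        lines += [f"- {school}" for school in contact["education"]]
    return "\n".join(lines)


def outlook_csv(contacts):
    """Render contacts as CSV text; multi-line Notes are quoted by the csv module."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=OUTLOOK_FIELDS)
    writer.writeheader()
    for c in contacts:
        writer.writerow({
            "First Name": c["first_name"],
            "Last Name": c["last_name"],
            "Notes": build_notes(c),
        })
    return buf.getvalue()


contacts = [{
    "first_name": "Ada",
    "last_name": "Lovelace",
    "employment": [{"title": "Analyst", "company": "Example Ltd"}],
    "education": ["Example University"],
}]
print(outlook_csv(contacts))
```

Using the csv module rather than joining strings by hand means commas and line breaks inside Notes are quoted correctly.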
