Full Scrape: Improvements
1. Trim the Employers
- When scraping, trim each employer name to remove any leading, trailing, or extra spaces.
- The search function was struggling with names that differed only in their spacing, so I figured the best fix was to trim as far upstream as possible.
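As a rough illustration of the trimming described above, here is a minimal Python sketch. The function name `clean_employer_name` is hypothetical and not part of the existing scraper; it assumes the employer name arrives as a plain string.

```python
def clean_employer_name(raw: str) -> str:
    """Normalise the whitespace in a scraped employer name.

    str.split() with no arguments splits on any run of whitespace
    (including tabs and non-breaking spaces) and discards leading and
    trailing whitespace, so re-joining with a single space both trims
    the name and collapses extra internal spaces.
    """
    return " ".join(raw.split())
```

Applying this at scrape time, before the name is stored, would mean the search function only ever sees one spelling of each employer.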
2. Add new languages
- You might have noticed that in the LinkedInConnections index and in the SearchResults, I have consolidated the languages spoken into a single column (as screen real estate is at a premium), and I have also built “Languages spoken” into the search functionality.
- This means that I can handle any number of languages (without adding columns and destroying the visual impact), so I no longer need to lump languages outside the top 5 into ‘Other’.
- Therefore, when we scrape the language section of a profile, can we add any new language to the Language entity if it doesn’t already exist?
- I can add the 2-letter code and the flag manually later, but capturing the new Language at scrape time would be terrific.
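The "add it if it doesn’t exist already" step could look something like the sketch below. It is only an illustration: it uses SQLite for convenience, and the `language` table, its columns, and the `ensure_language` function are all assumed names, not the real Language entity.

```python
import sqlite3

# Hypothetical schema: code and flag stay NULL until filled in by hand later.
LANGUAGE_SCHEMA = """
CREATE TABLE language (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE COLLATE NOCASE,
    code TEXT,
    flag TEXT
)
"""

def ensure_language(conn: sqlite3.Connection, name: str):
    """Return the id of the language, inserting it first if it is new."""
    name = " ".join(name.split())  # trim, as for employer names
    if not name:
        return None  # nothing usable was scraped
    # The column's NOCASE collation makes this match ignore ASCII case.
    row = conn.execute(
        "SELECT id FROM language WHERE name = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO language (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid
```

Calling `ensure_language` once per language found on a profile would leave new entries with an empty code and flag, ready to be completed manually.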
3. Additional connections scrape
- If you recall, I asked that when processing a full-profile scrape, we also capture a ‘Basic’ scrape of the suggested profiles displayed on the right. See below for an example.
- This is a critical function for building out our population set as fast as possible.
- On reflection, I think you were correct when you suggested we assign a ‘system-user’ as the owner of these ‘orphan’ connections, as ultimately we will want to perform the full scrape on these profiles, and each will need a user to match against in order to trigger that. Do you agree, or could the system-linkedin cookie just look for non-owned (orphaned) users?
- If we could do it without a system-owner, it might make the database easier to maintain. Open to ideas.
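To make the "no system-owner" option a bit more concrete, here is one possible sketch, again using SQLite purely for illustration. The `linkedin_connection` table, its columns, and both function names are hypothetical. The idea is that an orphan is simply a row with no owner, and the system job picks up orphans that have not had a full scrape yet.

```python
import sqlite3

# Hypothetical schema: owner_id is NULL for orphans, full_scraped is 0 or 1.
CONNECTION_SCHEMA = """
CREATE TABLE linkedin_connection (
    id           INTEGER PRIMARY KEY,
    profile_url  TEXT NOT NULL UNIQUE,
    name         TEXT,
    owner_id     INTEGER,
    full_scraped INTEGER NOT NULL DEFAULT 0
)
"""

def save_suggested(conn: sqlite3.Connection, profile_url: str, name: str) -> None:
    """Store the basic scrape of a suggested profile as an orphan.

    INSERT OR IGNORE skips profiles whose URL is already in the table.
    """
    conn.execute(
        "INSERT OR IGNORE INTO linkedin_connection (profile_url, name) "
        "VALUES (?, ?)",
        (profile_url, name),
    )
    conn.commit()

def orphans_pending_full_scrape(conn: sqlite3.Connection, limit: int = 50):
    """List un-owned profiles that still need a full scrape, oldest first."""
    rows = conn.execute(
        "SELECT profile_url FROM linkedin_connection "
        "WHERE owner_id IS NULL AND full_scraped = 0 "
        "ORDER BY id LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this approach there is no system-user row to maintain. The trade-off is that any code assuming every connection has an owner would need to cope with a NULL owner.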