This Vulnerability site is intended to provide summary information that will help the IPD team do whatever it is you do. Right now that is fundamentally the Analyze page.
It's pretty simple. When you click the Analyze link, you'll see:
Only three fields and a button! Pretty simple. And even better, most of the time you're only going to fill in ONE field...
You must fill in EITHER the 'Title' and 'URL' fields, OR the 'Shortcut' field, before clicking the 'Analyze' button at the bottom.
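For the curious, the either/or rule above can be sketched in a few lines of Python. This is only an illustration: the function name `validate_form` and the choice to let the 'Shortcut' field win when everything is filled in are assumptions, not a description of the site's actual code.

```python
def validate_form(title: str = "", url: str = "", shortcut: str = "") -> str:
    """Return which input mode applies, or raise ValueError.

    Hypothetical helper: the parameter names mirror the form labels,
    but the real server-side check is not documented here.
    """
    title, url, shortcut = title.strip(), url.strip(), shortcut.strip()
    if shortcut:
        # Assumption: a filled-in Shortcut takes precedence over the other fields.
        return "shortcut"
    if title and url:
        return "title_and_url"
    raise ValueError("Fill in both 'Title' and 'URL', or the 'Shortcut' field.")
```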
The 'Title' is the name of the source document as defined in the 'Title' column of the Vulnerability Classification (Master) sheet.
The 'URL' is in that same cell, but you have to click on the cell in the Google sheet, copy just the URL out of the =HYPERLINK string, and paste that into the 'URL' field on this Analyze page.
Not difficult to do, but it takes several back-and-forth copy-and-paste actions between the Google sheet and this site.
To make this faster, use the 'Shortcut' box. In the Google sheet, copy the entire contents of the 'Title' cell and paste it into the 'Shortcut' box. It will look something like this:
=HYPERLINK("https://database.ich.org/sites/default/files/E6_R2_Addendum.pdf","ICH E6(R2): Good Clinical Practice- Integrated Addendum to E6(R1)")
Then...click the 'Analyze' button below.
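If you wonder how a pasted =HYPERLINK string can be split back into its 'URL' and 'Title' parts, here is a minimal sketch. The name `parse_shortcut` and the regular expression are illustrative assumptions. The sketch also assumes that neither the URL nor the title contains a double quote.

```python
import re

# Matches =HYPERLINK("url","label"). Assumes no double quotes inside either part.
_HYPERLINK_RE = re.compile(
    r'^\s*=HYPERLINK\(\s*"([^"]+)"\s*,\s*"([^"]*)"\s*\)\s*$',
    re.IGNORECASE,
)


def parse_shortcut(shortcut: str) -> tuple[str, str]:
    """Split a pasted =HYPERLINK(...) formula into (title, url)."""
    match = _HYPERLINK_RE.match(shortcut)
    if match is None:
        raise ValueError("Shortcut must look like =HYPERLINK(\"url\",\"title\")")
    url, title = match.group(1), match.group(2)
    return title, url
```

Called on the example string above, this returns the document title and the PDF address as two separate values.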
At THIS point there is a whole lot going on that you can't see...so don't worry if nothing happens for several seconds. In short, there's a virtual server somewhere in the U.S. with a few thousand lines of code on it that will
- Validate the input you provided
- Go find the source PDF, convert it to text, and store it
- Retrieve the list of current trigger and category words that your team cares about
- Search the PDF (now text equivalent) for all of those words
- Generate a synopsis that will show wherever the tags appear, the surrounding context (lines above and below), and where in that context other tags appear
- Capture page number references and include in the synopsis, so that you can more easily find the section if you need to review the source in more detail. (You would be surprised how difficult it is to get page numbers out of an arbitrary PDF file...)
- Save the synopsis file to this cloud-based server, and send your browser a page that summarizes what has just happened.
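To give a feel for the search step in the list above, here is a rough sketch that finds tag words in the converted text and records a page number for each hit. It is not the site's actual code. The function name `find_tag_hits` is made up, and the sketch assumes the converted text marks page breaks with form-feed characters, which some PDF-to-text converters do and others do not.

```python
import re


def find_tag_hits(text: str, tags: list[str]) -> list[dict]:
    """Return one record per (line, tag) match, with page and line numbers."""
    hits = []
    # Assumption: pages in the converted text are separated by form feeds ("\f").
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line_no, line in enumerate(page.splitlines(), start=1):
            for tag in tags:
                # Whole-word, case-insensitive match on this line.
                if re.search(rf"\b{re.escape(tag)}\b", line, re.IGNORECASE):
                    hits.append(
                        {"tag": tag, "page": page_no, "line": line_no, "text": line}
                    )
    return hits
```

Each record carries enough information to show the hit and point back to the page it came from.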
So here's an example. Let's look at the "Policy and Procedures on Protection of Human Subjects in EPA Conducted or Supported Research" document in the Vulnerability Classification (Master) sheet (currently row 13 but this will likely change):
Note that when you select the cell, the formula bar shows the entire hyperlink:
=HYPERLINK("https://www.usaid.gov/sites/default/files/documents/1864/200mbe.pdf","Policy and Procedures on Protection of Human Subjects in EPA Conducted or Supported Research")
even though what appears in the cell view is just the label component:
Policy and Procedures on Protection of Human Subjects in EPA Conducted or Supported Research
- So place your mouse in the 'formula' window and click quickly three times...and the entire content will be selected. Then copy it (Ctrl+C).
Next switch to the Vulnerability site and go to the Analyze page, click in the 'Shortcut' box and paste the hyperlink you've just copied.
...and then click on the 'Analyze' button
Next...nothing may happen for a few seconds. That's normal, because a lot is happening in the background. In most cases, in less than 10 seconds you will see the results screen:
(This is the top of the screen that, most of the time, is all you will care about)
First, there are three links to information related to this request:
- A link to the synopsis file that has just been generated
- A link to the original PDF source file (this is the same as what is in the Vulnerability Classification (Master) sheet)
- A link to the text version of the PDF file that was created so that the tag search could happen
Second, a useful bit to make it easier to record this info in the Vulnerability Classification (Master) sheet.
- The "Copy" button will copy to the clipboard the =HYPERLINK... string that you need if you want to update the Vulnerability Classification (Master) sheet with this new synopsis
Hit the "Copy" button, and you'll get confirmation that the information has been copied (the alert will look different depending on what browser you're using)
Now switch to the Vulnerability Classification (Master) sheet and click in the 'Synopsis' column on the same row as the document that you've selected, and click in the (empty) formula bar:
Now paste (Ctrl+V)...and the =HYPERLINK...string you need will be pasted into the cell. Then hit the [Enter] key to save it.
Now you'll see the new 'Synopsis' link in the cell...
And if you hover over it and click the hover pop-up...the Synopsis report will appear.
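The =HYPERLINK string that the "Copy" button places on the clipboard could be put together along these lines. This is a sketch only: the function name `make_hyperlink_formula`, the default label, and the example address are all assumptions.

```python
def make_hyperlink_formula(url: str, label: str = "Synopsis") -> str:
    """Build a spreadsheet =HYPERLINK formula for the given address and label."""
    # A double quote inside a spreadsheet string literal is written as two quotes.
    safe_url = url.replace('"', '""')
    safe_label = label.replace('"', '""')
    return f'=HYPERLINK("{safe_url}","{safe_label}")'
```

For example, `make_hyperlink_formula("https://example.com/synopsis/123.html")` gives a formula that shows the word Synopsis in the cell and opens that (made-up) address when clicked.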
The synopsis report starts with a list of all the 'classification' tags that were found in the document, and how many times it found them, with hyperlinks to each tag report section.
Click on any one of the tags to see where it was found and what the surrounding context was (in this case it was the 'job' tag):
- You will see that the classification tags are highlighted in whatever color is specified (the default is yellow)
- Each block of text shows a number of rows both above and below the matched tag, for context
- These 'content blocks' are merged together if you have a section of the report where the tag shows up multiple times in the same or nearby lines
- Within each content block, the tag will be highlighted in its specified color (again, yellow is the default), but also any other classification tag that is found will be highlighted in pink; and any trigger word found will be highlighted in grey
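The merging of nearby 'content blocks' described above can be pictured as merging overlapping line ranges. Here is a minimal sketch. The name `merge_context_blocks`, the 0-based line numbers, and the rule that blocks which touch are also merged are assumptions made for illustration.

```python
def merge_context_blocks(
    hit_lines: list[int], context: int, total_lines: int
) -> list[tuple[int, int]]:
    """Return inclusive (start, end) line ranges, merging overlapping windows.

    hit_lines holds 0-based line numbers where a tag was found; context is
    the number of lines shown above and below each hit.
    """
    blocks: list[tuple[int, int]] = []
    for line in sorted(set(hit_lines)):
        start = max(0, line - context)
        end = min(total_lines - 1, line + context)
        if blocks and start <= blocks[-1][1] + 1:
            # Overlaps or touches the previous block: extend it.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks
```

With hits on lines 10, 12 and 40 and three lines of context, the first two hits end up in one block and the third gets its own.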
In the results page after you analyze a document, after the links you'll find some additional information; here's an example:
- The 'cost of conversion' is the number of seconds it took the 3rd-party service to convert the PDF file to text (there are a limited number of seconds available). Once a PDF has been converted, the text is stored on the server and the file will not be converted again. So if you re-run an analysis on a source PDF file that has been processed before, the cost of conversion will be zero.
- The 'remaining conversion seconds' are exactly that - how much more time has been paid for.
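The "convert once, then reuse" behaviour behind the zero cost can be sketched as a simple file cache. Everything named here is hypothetical: `get_text`, the cache folder, and the `convert` argument (a stand-in for the 3rd-party conversion service) are assumptions, and the real site may key and store its files differently.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable


def get_text(
    url: str, cache_dir: Path, convert: Callable[[str], str]
) -> tuple[str, float]:
    """Return (text, cost_in_seconds); the cost is 0.0 when the text was cached."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the cache file name is derived from a hash of the source URL.
    cached = cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt")
    if cached.exists():
        return cached.read_text(encoding="utf-8"), 0.0
    started = time.monotonic()
    text = convert(url)  # stand-in for the paid conversion call
    cost = time.monotonic() - started
    cached.write_text(text, encoding="utf-8")
    return text, cost
```

The second request for the same address never reaches the conversion service, so it uses none of the paid seconds.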
When you hit the 'Analyze' button, the server gets busy. And it pays no attention at all to your browser. Browsers in general are needy...like Yorkshire Terriers...and if you don't pay attention to them they will say "piss off!" by disconnecting your session. Most browsers have a 30-second attention span...after that, if you don't talk to them, they disconnect.
If the source file you are processing is big/complex enough, it may take more than 30 seconds to process it. If that looks like it is going to happen, the server will save the state of the work done so far and return control to the browser - with messages revealing what it did, and what you need to do to resume.
If this happens, you will see something like this in the Notices at the bottom of the results page:
Successfully saved the work-in-progress.
The file being processed is large/complex that it must be processed in pieces.
The work-in-progress has been saved. Click the 'Back' button on your browser
to return to the 'Analyze' page, then click the 'Analyze' button again to complete the processing.
Code execution completed in (some amount of time that is almost 30) seconds.
If you see these messages...do just what they say. Back up to the prior page and hit the 'Analyze' button again. The server will pick right back up where it left off. Note it is possible that you could have to do this more than once.
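The stop-and-resume behaviour can be pictured as working through the document in pieces against a time budget. The sketch below is an illustration under assumptions, not the server's code: `process_in_slices`, the `state` dictionary, and the 25-second default budget are all invented here.

```python
import time
from typing import Callable, Sequence


def process_in_slices(
    items: Sequence,
    state: dict,
    handle: Callable,
    budget_seconds: float = 25.0,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Process items from state["next"] onward until done or out of time.

    Returns True when everything has been processed. Returns False when the
    time budget ran out; state["next"] then marks where to resume.
    """
    deadline = clock() + budget_seconds
    while state["next"] < len(items):
        if clock() >= deadline:
            # Out of time: the caller saves `state` and asks for another run.
            return False
        handle(items[state["next"]])
        state["next"] += 1
    return True
```

Each click on 'Analyze' corresponds to one call. As long as the saved `state` is passed back in, the work continues from the first unprocessed piece, and a large file may need several calls.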