Skip to content

[SVCS-488]Public_file query param and office365 renderer #282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

AddisonSchiller
Copy link
Contributor

refs: https://openscience.atlassian.net/browse/SVCS-488

Purpose

docx rendering is very intensive on the osf. By using Office Online to render publicly available docx files, we can remove a lot of pressure from the unoconv container.

Summary of changes.

Added support for public renderers
Added support to parse for a public_file query param. This query param is optional.public_file=1 denotes that the file is public, while public_file=0 denotes that it is private. All other values for public_file cause errors to be raised.
ProviderMetadata now has an is_public value, which defaults to false.
Added the office365 renderer, which is a public renderer.

Testing Notes

the Office365 renderer does not use the pdf renderer like unoconv used to, so the pdfs that get made by this renderer will not display exactly the same.

There is a README.md in the renderer with more information about testing.

In order to test this with the OSF, an OSF side change needs to be made to make use of the new public_file query param.

Adding support for a `public_file` query param so the OSF can request a
public renderer. Added office365 which is a public renderer. This uses office
online to do .docx file conversions.
@AddisonSchiller AddisonSchiller force-pushed the feature/docx-office365-online-renderer branch from 582453b to ba1dfaa Compare October 4, 2017 17:38
@AddisonSchiller AddisonSchiller changed the title Public_file query param and office365 renderer [SVCS-488]Public_file query param and office365 renderer Oct 4, 2017
@cslzchen cslzchen self-requested a review October 25, 2017 15:21
Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works as expected. As discussed with @AddisonSchiller , we may want to change how is_public is surfaced from OSF to WB/MFR.

Currently in the PR: OSF provides the download URL with query param is_public=1 for MFR. My suggestion is to ask OSF to export is_public to the file metadata (which is a more general approach can be used for other/future). But this requires WB update as well. @felliott

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass done! 🎆 🎆

setup.py Outdated
@@ -40,6 +40,10 @@ def parse_requirements(requirements):
'http = mfr.providers.http:HttpProvider',
'osf = mfr.providers.osf:OsfProvider',
],
'mfr.public_renderers': [
'.docx = mfr.extensions.office365:Office365Renderer',

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this blank line intentional?

code=400,
)

if cleaned_url.args['public_file'] == '1':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_public = False
if ...
    if ...
        raise ...
    is_public = cleaned_url.args['public_file'] == '1'

return ...

@@ -48,17 +48,19 @@ def download(self):

class ProviderMetadata:

def __init__(self, name, ext, content_type, unique_key, download_url):
def __init__(self, name, ext, content_type, unique_key, download_url, is_public=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make is_public position args if not too refactoring is required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean like get rid of the public_file=false
? and it just being..

.... download_url, is_public)

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, keep it as kwargs.

invoke_on_load=True,
invoke_args=(metadata, file_path, url, assets_url, export_url),
).driver
except:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be more explicit on the exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we talked about this. The reason it isn't specific is because any specific exception will be reported below.
If this is new let me know.

@@ -0,0 +1,20 @@

# Office 365 Renderer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Need to update this later when OSF side is updated.


def render(self):
download_url = furl.furl(self.metadata.download_url).set(query='')
url = settings.OFFICE_BASE_URL + download_url.url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encode the URL since it is the value of the src query param. It works without encoding during local test since the URL is probably not ambiguous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Need proper encoding.

What we feed to MS Office Live is an WB URL and here is an example of my local one which is more complicated than OSF url we test.

http://192.168.168.167:7777/v1/resources/ygcb4/providers/osfstorage/5a01dd60005aca00414c567d?action=download&public_file=1&version=1&mode=render&direct

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is what mine looks like
This is the download_url after the query params are stripped off

http://192.168.168.167:7777/v1/resources/3uq58/providers/osfstorage/5a03579db4eef2001827a8c3

as long as the project is public, going to that link will download the file. Also there are no chars that -need- to be encoded. If they should be encoded, well probably? We briefly discussed this. Thoughts on encoding the / marks?

@coveralls
Copy link

coveralls commented Nov 8, 2017

Coverage Status

Changes Unknown when pulling 8c014ea on AddisonSchiller:feature/docx-office365-online-renderer into ** on CenterForOpenScience:develop**.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second pass done. 🎆 🎆

@@ -76,6 +78,20 @@ def make_renderer(name, metadata, file_path, url, assets_url, export_url):
:rtype: :class:`mfr.core.extension.BaseRenderer`
"""
normalized_name = (name and name.lower()) or 'none'
logger.info(str(metadata.is_public))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Remove this line.

Probably need to remove the imports as well.


def render(self):
download_url = furl.furl(self.metadata.download_url).set(query='')
url = settings.OFFICE_BASE_URL + download_url.url
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Need proper encoding.

What we feed to MS Office Live is an WB URL and here is an example of my local one which is more complicated than OSF url we test.

http://192.168.168.167:7777/v1/resources/ygcb4/providers/osfstorage/5a01dd60005aca00414c567d?action=download&public_file=1&version=1&mode=render&direct

@@ -48,17 +48,19 @@ def download(self):

class ProviderMetadata:

def __init__(self, name, ext, content_type, unique_key, download_url):
def __init__(self, name, ext, content_type, unique_key, download_url, is_public=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, keep it as kwargs.

@coveralls
Copy link

coveralls commented Nov 8, 2017

Coverage Status

Changes Unknown when pulling e3c30ff on AddisonSchiller:feature/docx-office365-online-renderer into ** on CenterForOpenScience:develop**.

Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎆 🎆 🎆 🎆 🎆

Notes for Phase 2

  • Encoding

WB download URL

Here is the code and our discussions: https://github.com/CenterForOpenScience/modular-file-renderer/pull/282/files#diff-2fc1019752cc1cd0a7f98aa9bec5974fR27. We are not sure whether to use furl or python's urllib. Currently, we leave it not encoded :-) for experiment purpose.

Notes for the Future

  • Does Microsoft enforces a rate limit?
  • How can we handle the "a few hour" cache problem?
  • What if this external service is down? Should we detect in advance?

Notes for the Merge/Deployment

  • Requires OSF side to take effect
  • This PR can slip into dev / prod first.
  • We should test on staging by ourself before QA

@cslzchen
Copy link
Contributor

cslzchen commented Dec 7, 2017

Replaced by #304.

@cslzchen cslzchen closed this Dec 7, 2017
@NyanHelsing NyanHelsing mentioned this pull request Aug 3, 2018
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants