-
Notifications
You must be signed in to change notification settings - Fork 71
[SVCS-488]Public_file
query param and office365 renderer
#282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-488]Public_file
query param and office365 renderer
#282
Conversation
Adding support for a `public_file` query param so the OSF can request a public renderer. Added office365 which is a public renderer. This uses office online to do .docx file conversions.
582453b
to
ba1dfaa
Compare
Public_file
query param and office365 rendererPublic_file
query param and office365 renderer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works as expected. As discussed with @AddisonSchiller , we may want to change how is_public
is surfaced from OSF to WB/MFR.
Currently in the PR: OSF provides the download URL with query param is_public=1
for MFR. My suggestion is to ask OSF to export is_public
to the file metadata (which is a more general approach can be used for other/future). But this requires WB update as well. @felliott
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass done! 🎆 🎆
setup.py
Outdated
@@ -40,6 +40,10 @@ def parse_requirements(requirements): | |||
'http = mfr.providers.http:HttpProvider', | |||
'osf = mfr.providers.osf:OsfProvider', | |||
], | |||
'mfr.public_renderers': [ | |||
'.docx = mfr.extensions.office365:Office365Renderer', | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this blank line intentional?
mfr/providers/osf/provider.py
Outdated
code=400, | ||
) | ||
|
||
if cleaned_url.args['public_file'] == '1': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_public = False
if ...
if ...
raise ...
is_public = cleaned_url.args['public_file'] == '1'
return ...
@@ -48,17 +48,19 @@ def download(self): | |||
|
|||
class ProviderMetadata: | |||
|
|||
def __init__(self, name, ext, content_type, unique_key, download_url): | |||
def __init__(self, name, ext, content_type, unique_key, download_url, is_public=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make is_public
position args if not too refactoring is required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean like get rid of the public_file=false
? and it just being..
.... download_url, is_public)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, keep it as kwargs
.
invoke_on_load=True, | ||
invoke_args=(metadata, file_path, url, assets_url, export_url), | ||
).driver | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Be more explicit on the exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we talked about this. The reason it isn't specific is because any specific exception will be reported below.
If this is new let me know.
@@ -0,0 +1,20 @@ | |||
|
|||
# Office 365 Renderer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Need to update this later when OSF side is updated.
|
||
def render(self): | ||
download_url = furl.furl(self.metadata.download_url).set(query='') | ||
url = settings.OFFICE_BASE_URL + download_url.url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encode the URL since it is the value of the src
query param. It works without encoding during local test since the URL is probably not ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Need proper encoding.
What we feed to MS Office Live is an WB URL and here is an example of my local one which is more complicated than OSF url we test.
http://192.168.168.167:7777/v1/resources/ygcb4/providers/osfstorage/5a01dd60005aca00414c567d?action=download&public_file=1&version=1&mode=render&direct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is what mine looks like
This is the download_url
after the query params are stripped off
http://192.168.168.167:7777/v1/resources/3uq58/providers/osfstorage/5a03579db4eef2001827a8c3
as long as the project is public, going to that link will download the file. Also there are no chars that -need- to be encoded. If they should be encoded, well probably? We briefly discussed this. Thoughts on encoding the /
marks?
Changes Unknown when pulling 8c014ea on AddisonSchiller:feature/docx-office365-online-renderer into ** on CenterForOpenScience:develop**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second pass done. 🎆 🎆
mfr/core/utils.py
Outdated
@@ -76,6 +78,20 @@ def make_renderer(name, metadata, file_path, url, assets_url, export_url): | |||
:rtype: :class:`mfr.core.extension.BaseRenderer` | |||
""" | |||
normalized_name = (name and name.lower()) or 'none' | |||
logger.info(str(metadata.is_public)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Remove this line.
Probably need to remove the imports as well.
|
||
def render(self): | ||
download_url = furl.furl(self.metadata.download_url).set(query='') | ||
url = settings.OFFICE_BASE_URL + download_url.url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Need proper encoding.
What we feed to MS Office Live is an WB URL and here is an example of my local one which is more complicated than OSF url we test.
http://192.168.168.167:7777/v1/resources/ygcb4/providers/osfstorage/5a01dd60005aca00414c567d?action=download&public_file=1&version=1&mode=render&direct
@@ -48,17 +48,19 @@ def download(self): | |||
|
|||
class ProviderMetadata: | |||
|
|||
def __init__(self, name, ext, content_type, unique_key, download_url): | |||
def __init__(self, name, ext, content_type, unique_key, download_url, is_public=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, keep it as kwargs
.
Changes Unknown when pulling e3c30ff on AddisonSchiller:feature/docx-office365-online-renderer into ** on CenterForOpenScience:develop**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎆 🎆 🎆 🎆 🎆
Notes for Phase 2
- Encoding
WB download URL
Here is the code and our discussions: https://github.com/CenterForOpenScience/modular-file-renderer/pull/282/files#diff-2fc1019752cc1cd0a7f98aa9bec5974fR27. We are not sure whether to use furl
or python's urllib
. Currently, we leave it not encoded :-) for experiment purpose.
Notes for the Future
- Does Microsoft enforces a rate limit?
- How can we handle the "a few hour" cache problem?
- What if this external service is down? Should we detect in advance?
Notes for the Merge/Deployment
- Requires OSF side to take effect
- This PR can slip into dev / prod first.
- We should test on staging by ourself before QA
Replaced by #304. |
refs: https://openscience.atlassian.net/browse/SVCS-488
Purpose
docx rendering is very intensive on the osf. By using Office Online to render publicly available docx files, we can remove a lot of pressure from the unoconv container.
Summary of changes.
Added support for public renderers
Added support to parse for a
public_file
query param. This query param is optional.public_file=1
denotes that the file is public, whilepublic_file=0
denotes that it is private. All other values forpublic_file
cause errors to be raised.ProviderMetadata now has an is_public value, which defaults to false.
Added the office365 renderer, which is a public renderer.
Testing Notes
the Office365 renderer does not use the pdf renderer like unoconv used to, so the pdfs that get made by this renderer will not display exactly the same.
There is a README.md in the renderer with more information about testing.
In order to test this with the OSF, an OSF side change needs to be made to make use of the new
public_file
query param.