[TIKA-XXXX] Refactor(core): Modularize Classes, Methods, and Associations for Clarity. #2171
… Unidirectional, and Pull-up Method to improve modularity.
I have also committed the below changes into the files:
Why did you delete some javadocs?
There are some minor refactorings that I will look at.
I removed them temporarily for my own convenience and forgot to add them back. I'll restore them and submit a PR if that is the only issue. And yes, this was related to my university assignment: I used Designite to detect the code smells and then refactored some code accordingly. I will go through the link you provided and make the necessary changes. Thank you! If there is anything specific you would like me to do, please let me know.
What you could do:
Be aware that I might still not apply the remaining changes, which may be frustrating to you. Maybe somebody else will, maybe not.
I'm wondering whether the change with the MIME_TYPE_TAG is correct. The original code has a null check for
This pull request includes multiple code refactorings aimed at improving clarity, readability, and maintainability in the Apache Tika codebase. The changes preserve original functionality while making the code more expressive and modular.
Refactorings Applied
Extract Method + Decompose Conditional
Location: MediaTypeRegistry#getSupertype()
Replaced deeply nested if-else blocks with helper methods like isXmlSubtype(), isTextType(), isEmptyType(), etc.
Used early returns to simplify control flow and reduce cyclomatic complexity.
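For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of Decompose Conditional with early returns. It is not the code from this PR or from MediaTypeRegistry: it works on plain strings, the class name SupertypeSketch and the helper isRootType() are hypothetical, and the supertype rules are deliberately simplified assumptions.

```java
public class SupertypeSketch {

    private static final String OCTET_STREAM = "application/octet-stream";
    private static final String TEXT_PLAIN = "text/plain";
    private static final String APPLICATION_XML = "application/xml";

    // Hypothetical helper: true for structured-syntax types such as "application/rss+xml".
    static boolean isXmlSubtype(String mediaType) {
        return mediaType.endsWith("+xml");
    }

    // Hypothetical helper: true for any "text/*" type.
    static boolean isTextType(String mediaType) {
        return mediaType.startsWith("text/");
    }

    // Hypothetical helper: the root type has no supertype in this sketch.
    static boolean isRootType(String mediaType) {
        return OCTET_STREAM.equals(mediaType);
    }

    // Each condition is named by a helper and handled with an early return,
    // so no branch is nested inside another.
    static String getSupertype(String mediaType) {
        if (mediaType == null || isRootType(mediaType)) {
            return null;
        }
        if (isXmlSubtype(mediaType)) {
            return APPLICATION_XML;
        }
        if (isTextType(mediaType) && !TEXT_PLAIN.equals(mediaType)) {
            return TEXT_PLAIN;
        }
        return OCTET_STREAM;
    }

    public static void main(String[] args) {
        System.out.println(getSupertype("application/rss+xml"));
        System.out.println(getSupertype("text/html"));
        System.out.println(getSupertype("image/png"));
    }
}
```

The null check stays as the first guard, which is the kind of behavior a reviewer would want to confirm is preserved when conditionals are split up.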
Rename Variable
Locations: MediaTypeRegistry.java, JsonPipesIterator.java
Renamed variables to improve self-documentation:
type → mediaType
t → tuple
r → reader
Introduce Explaining Constants
Location: TextStatistics#looksLikeUTF8()
Replaced magic numbers (e.g. 0x20, 0x80, 0xc0) with named constants for better readability and understanding of UTF-8 byte range logic.
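As an illustration of the idea, the sketch below names the UTF-8 byte-range boundaries and uses them in a simple structural check. It is an assumption-laden example, not the implementation of TextStatistics#looksLikeUTF8(): the class name Utf8ConstantsSketch, the constant names, and the byte-by-byte algorithm are all hypothetical, and the check ignores overlong encodings and surrogate ranges.

```java
public class Utf8ConstantsSketch {

    // Boundaries of the UTF-8 byte ranges (constant names are illustrative).
    private static final int FIRST_CONTINUATION_BYTE = 0x80; // 10xxxxxx
    private static final int FIRST_TWO_BYTE_LEAD = 0xC0;     // 110xxxxx
    private static final int FIRST_THREE_BYTE_LEAD = 0xE0;   // 1110xxxx
    private static final int FIRST_FOUR_BYTE_LEAD = 0xF0;    // 11110xxx
    private static final int FIRST_INVALID_LEAD = 0xF8;      // 11111xxx is never valid
    private static final int UNSIGNED_BYTE_MASK = 0xFF;

    static boolean isContinuation(int b) {
        return b >= FIRST_CONTINUATION_BYTE && b < FIRST_TWO_BYTE_LEAD;
    }

    // Number of continuation bytes that must follow the given lead byte,
    // or -1 if the byte cannot start a sequence.
    static int expectedContinuations(int b) {
        if (b < FIRST_CONTINUATION_BYTE) {
            return 0; // single-byte ASCII
        }
        if (b < FIRST_TWO_BYTE_LEAD) {
            return -1; // stray continuation byte
        }
        if (b < FIRST_THREE_BYTE_LEAD) {
            return 1;
        }
        if (b < FIRST_FOUR_BYTE_LEAD) {
            return 2;
        }
        if (b < FIRST_INVALID_LEAD) {
            return 3;
        }
        return -1;
    }

    // Structural check only: every lead byte is followed by the right
    // number of continuation bytes and the input does not end mid-sequence.
    static boolean looksLikeUtf8(byte[] data) {
        int pending = 0;
        for (byte raw : data) {
            int b = raw & UNSIGNED_BYTE_MASK;
            if (pending > 0) {
                if (!isContinuation(b)) {
                    return false;
                }
                pending--;
            } else {
                pending = expectedContinuations(b);
                if (pending < 0) {
                    return false;
                }
            }
        }
        return pending == 0;
    }
}
```

With the boundaries named, each comparison reads as a statement about the encoding rather than a bare hexadecimal literal, which is the readability gain this refactoring is after.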
Note: I am awaiting access to the Apache Tika Jira issue tracker to file a formal issue.
Once granted access, I will: