Commit 4ecf6cc

[Data Liberation] Add XML API, Stream API, WXR URL Rewriter API (#1952)
A part of #1894. Follows up on #1893.

This PR brings in a few more PHP APIs that were initially explored outside of Playground so that they can be incubated in Playground. See the linked descriptions for more details about each API:

* XML Processor from WordPress/wordpress-develop#6713
* Stream chain from adamziel/wxr-normalize#1
* A draft of a WXR URL Rewriter class capable of rewriting URLs in WXR files

## Testing instructions

* Confirm the PHPUnit tests pass in CI
* Confirm the test suite looks reasonable
* That's it for now! It's all new code that's not actually used anywhere in Playground yet. I just want to merge it to keep iterating and improving.
1 parent e5813df commit 4ecf6cc

25 files changed, +6400 -172 lines changed

CHANGELOG.md

+18 -23
@@ -4,65 +4,60 @@ All notable changes to this project are documented in this file by a CI job
 that runs on every NPM release. The file follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
 format.

-## [v1.0.7] (2024-10-28)
+## [v1.0.7] (2024-10-28)

-
-
-
-## [v1.0.6] (2024-10-28)
+## [v1.0.6] (2024-10-28)

 ### Website

-- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
-- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))
+- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
+- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))

 ### Contributors

 The following contributors merged PRs in this release:

 @adamziel @bgrgicak

-
-## [v1.0.5] (2024-10-25)
+## [v1.0.5] (2024-10-25)

 ### Enhancements

-- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))
+- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))

 ### Blueprints

-- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
-- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))
+- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
+- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))

 ### Documentation

-- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))
+- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))

 ### PHP WebAssembly

-- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
-- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))
+- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
+- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))

 ### Website

-- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
-- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))
+- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
+- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))

 ### Bug Fixes

-- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
-- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
-- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
-- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
-- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))
+- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
+- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
+- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
+- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
+- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))

 ### Contributors

 The following contributors merged PRs in this release:

 @adamziel @ajotka @bgrgicak @bph @brandonpayton @ockham @psrpinto

-
 ## [v1.0.4] (2024-10-21)

 ### Enhancements

packages/docs/site/docs/main/changelog.md

+18 -23
@@ -9,65 +9,60 @@ All notable changes to this project are documented in this file by a CI job
 that runs on every NPM release. The file follows the [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
 format.

-## [v1.0.7] (2024-10-28)
+## [v1.0.7] (2024-10-28)

-
-
-
-## [v1.0.6] (2024-10-28)
+## [v1.0.6] (2024-10-28)

 ### Website

-- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
-- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))
+- Query API: Preserve multiple ?plugin= query params. ([#1947](https://github.com/WordPress/wordpress-playground/pull/1947))
+- [Remote] Enable releasing @wp-playground/remote by making it public. ([#1948](https://github.com/WordPress/wordpress-playground/pull/1948))

 ### Contributors

 The following contributors merged PRs in this release:

 @adamziel @bgrgicak

-
-## [v1.0.5] (2024-10-25)
+## [v1.0.5] (2024-10-25)

 ### Enhancements

-- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))
+- [CORS Proxy] Rate-limits IPv6 requests based on /64 subnets, not specific addresses. ([#1923](https://github.com/WordPress/wordpress-playground/pull/1923))

 ### Blueprints

-- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
-- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))
+- Reload after autologin to set login cookies during boot. ([#1914](https://github.com/WordPress/wordpress-playground/pull/1914))
+- Skip empty lines in the runSql step. ([#1939](https://github.com/WordPress/wordpress-playground/pull/1939))

 ### Documentation

-- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))
+- Clarified wp beta to also include rc version. ([#1936](https://github.com/WordPress/wordpress-playground/pull/1936))

 ### PHP WebAssembly

-- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
-- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))
+- Enable CURL in Playground Web. ([#1935](https://github.com/WordPress/wordpress-playground/pull/1935))
+- PHP: Implement TLS 1.2 to decrypt https:// and ssl:// traffic and translate it into fetch(). ([#1926](https://github.com/WordPress/wordpress-playground/pull/1926))

 ### Website

-- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
-- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))
+- Hide Settings menu after clicking "Restore from .zip. ([#1904](https://github.com/WordPress/wordpress-playground/pull/1904))
+- Publish @wp-playground/remote (types only). ([#1924](https://github.com/WordPress/wordpress-playground/pull/1924))

 ### Bug Fixes

-- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
-- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
-- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
-- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
-- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))
+- CORS Proxy: Index update_at column because it is used for lookup. ([#1931](https://github.com/WordPress/wordpress-playground/pull/1931))
+- CORS Proxy: Reject targeting self. ([#1932](https://github.com/WordPress/wordpress-playground/pull/1932))
+- Docs: Fix typo. ([#1934](https://github.com/WordPress/wordpress-playground/pull/1934))
+- Explicitly request no-cache to discourage WP Cloud from edge caching CORS proxy results. ([#1930](https://github.com/WordPress/wordpress-playground/pull/1930))
+- Remove test code added in #1914. ([#1928](https://github.com/WordPress/wordpress-playground/pull/1928))

 ### Contributors

 The following contributors merged PRs in this release:

 @adamziel @ajotka @bgrgicak @bph @brandonpayton @ockham @psrpinto

-
 ## [v1.0.4] (2024-10-21)

 ### Enhancements

packages/playground/data-liberation/bootstrap.php

+14
@@ -1,5 +1,13 @@
 <?php

+require_once __DIR__ . '/src/stream-api/WP_Stream_Processor.php';
+require_once __DIR__ . '/src/stream-api/WP_Byte_Stream_State.php';
+require_once __DIR__ . '/src/stream-api/WP_Byte_Stream.php';
+require_once __DIR__ . '/src/stream-api/WP_Processor_Byte_Stream.php';
+require_once __DIR__ . '/src/stream-api/WP_File_Byte_Stream.php';
+require_once __DIR__ . '/src/stream-api/WP_Stream_Paused_State.php';
+require_once __DIR__ . '/src/stream-api/WP_Stream_Chain.php';
+
 require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-token.php";
 require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-span.php";
 require_once __DIR__ . "/src/wordpress-core-html-api/class-wp-html-text-replacement.php";
@@ -20,6 +28,12 @@
 require_once __DIR__ . '/src/WP_Block_Markup_Url_Processor.php';
 require_once __DIR__ . '/src/WP_URL_In_Text_Processor.php';
 require_once __DIR__ . '/src/WP_URL.php';
+
+require_once __DIR__ . '/src/xml-api/WP_XML_Decoder.php';
+require_once __DIR__ . '/src/xml-api/WP_XML_Tag_Processor.php';
+require_once __DIR__ . '/src/xml-api/WP_XML_Processor.php';
+require_once __DIR__ . '/src/WP_WXR_URL_Rewrite_Processor.php';
+
 require_once __DIR__ . '/vendor/autoload.php';

packages/playground/data-liberation/phpunit.xml

+4 -1
@@ -2,12 +2,15 @@
 <phpunit xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" bootstrap="bootstrap.php" colors="true" xsi:noNamespaceSchemaLocation="https://schema.phpunit.de/10.0/phpunit.xsd" cacheDirectory=".phpunit.cache">
     <testsuites>
         <testsuite name="Application Test Suite">
+            <file>tests/WPWXRURLRewriterTests.php</file>
             <file>tests/WPRewriteUrlsTests.php</file>
             <file>tests/WPURLInTextProcessorTests.php</file>
             <file>tests/WPBlockMarkupProcessorTests.php</file>
             <file>tests/WPBlockMarkupUrlProcessorTests.php</file>
             <file>tests/URLParserWHATWGComplianceTests.php</file>
-            <file>tests/UrldecodeNTests.php</file>
+            <file>tests/WPXMLProcessorTests.php</file>
+            <file>tests/WPXMLTagProcessorTests.php</file>
+            <file>tests/UrldecodeNTests.php</file>
         </testsuite>
     </testsuites>
 </phpunit>

packages/playground/data-liberation/src/WP_URL_In_Text_Processor.php

+1 -1
@@ -233,7 +233,7 @@ public function next_url() {
 }

 $tld = strtolower( substr( $parsed_url->hostname, $last_dot_position + 1 ) );
-if ( empty( self::$public_suffix_list[ $tld ] ) ) {
+if ( empty( self::$public_suffix_list[ $tld ] ) && $tld !== 'internal' ) {
     // This TLD is not in the public suffix list. It's not a valid domain name.
     continue;
 }
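
The one-line change above lets hostnames ending in `.internal` pass the public suffix check. A minimal sketch of that condition in isolation follows. The `is_allowed_tld()` helper and the two-entry suffix list are hypothetical stand-ins for illustration (the real list is `WP_URL_In_Text_Processor::$public_suffix_list`), and rejecting hostnames without a dot is an assumption of this sketch rather than something the diff shows.

```php
<?php
// Hypothetical stand-in for the real public suffix list; keys are TLDs.
$public_suffix_list = array(
    'com' => 1,
    'org' => 1,
);

/**
 * Hypothetical helper mirroring the updated condition in next_url():
 * a TLD passes if it is in the suffix list or is the literal 'internal'.
 */
function is_allowed_tld( string $hostname, array $public_suffix_list ): bool {
    $last_dot_position = strrpos( $hostname, '.' );
    if ( false === $last_dot_position ) {
        // Assumption of this sketch: hostnames without a dot are rejected.
        return false;
    }
    $tld = strtolower( substr( $hostname, $last_dot_position + 1 ) );
    return ! empty( $public_suffix_list[ $tld ] ) || 'internal' === $tld;
}
```

With this list, `example.com` and `playground.internal` would be accepted, while a hostname with an unknown TLD would be skipped.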
packages/playground/data-liberation/src/WP_WXR_URL_Rewrite_Processor.php

+47
@@ -0,0 +1,47 @@
+<?php
+
+class WP_WXR_URL_Rewrite_Processor {
+
+
+    public static function stream( $current_site_url, $new_site_url ) {
+        return WP_XML_Processor::stream(
+            function ( $processor ) use ( $current_site_url, $new_site_url ) {
+                if ( static::is_wxr_content_node( $processor ) ) {
+                    $text         = $processor->get_modifiable_text();
+                    $updated_text = wp_rewrite_urls(
+                        array(
+                            'block_markup'     => $text,
+                            'current-site-url' => $current_site_url,
+                            'new-site-url'     => $new_site_url,
+                        )
+                    );
+                    if ( $updated_text !== $text ) {
+                        $processor->set_modifiable_text( $updated_text );
+                    }
+                }
+            }
+        );
+    }
+
+    private static function is_wxr_content_node( WP_XML_Processor $processor ) {
+        $breadcrumbs = $processor->get_breadcrumbs();
+        if (
+            ! in_array( 'excerpt:encoded', $breadcrumbs, true ) &&
+            ! in_array( 'content:encoded', $breadcrumbs, true ) &&
+            ! in_array( 'guid', $breadcrumbs, true ) &&
+            ! in_array( 'link', $breadcrumbs, true ) &&
+            ! in_array( 'wp:attachment_url', $breadcrumbs, true ) &&
+            ! in_array( 'wp:comment_content', $breadcrumbs, true ) &&
+            ! in_array( 'wp:base_site_url', $breadcrumbs, true ) &&
+            ! in_array( 'wp:base_blog_url', $breadcrumbs, true )
+            // Meta values are not supported yet. We'll need to support
+            // WordPress core options that may be saved as JSON, serialized PHP, or XML,
+            // and then provide extension points for plugin authors to support
+            // their own options.
+            // !in_array('wp:postmeta', $processor->get_breadcrumbs())
+        ) {
+            return false;
+        }
+        return true;
+    }
+}
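
The breadcrumb test in `is_wxr_content_node()` amounts to asking whether any element on the current path is one of the URL-bearing WXR tags. A minimal standalone sketch of that logic is shown below. The function name `breadcrumbs_contain_rewritable_tag()` is hypothetical, it takes the breadcrumbs array directly instead of a `WP_XML_Processor`, and the example paths are assumed WXR shapes rather than output captured from the processor.

```php
<?php
/**
 * Hypothetical standalone version of is_wxr_content_node() that works
 * on a plain breadcrumbs array instead of a WP_XML_Processor instance.
 */
function breadcrumbs_contain_rewritable_tag( array $breadcrumbs ): bool {
    // Same tag names as in the class above.
    $rewritable_tags = array(
        'excerpt:encoded',
        'content:encoded',
        'guid',
        'link',
        'wp:attachment_url',
        'wp:comment_content',
        'wp:base_site_url',
        'wp:base_blog_url',
    );
    // True when at least one element on the path is a rewritable tag.
    return count( array_intersect( $rewritable_tags, $breadcrumbs ) ) > 0;
}

// Assumed example paths, for illustration only.
var_dump( breadcrumbs_contain_rewritable_tag( array( 'rss', 'channel', 'item', 'content:encoded' ) ) ); // bool(true)
var_dump( breadcrumbs_contain_rewritable_tag( array( 'rss', 'channel', 'item', 'wp:postmeta' ) ) );     // bool(false)
```

Keeping the tag names in one array would make it easier to add `wp:postmeta` later, once meta values are supported.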
packages/playground/data-liberation/src/stream-api/WP_Byte_Stream.php

+80
@@ -0,0 +1,80 @@
+<?php
+
+abstract class WP_Byte_Stream {
+
+    protected $state;
+
+    public function __construct() {
+        $this->state = new WP_Byte_Stream_State();
+    }
+
+    public function is_eof(): bool {
+        return ! $this->state->output_bytes && $this->state->state === WP_Byte_Stream_State::STATE_FINISHED;
+    }
+
+    public function get_file_id() {
+        return $this->state->file_id;
+    }
+
+    public function skip_file(): void {
+        $this->state->last_skipped_file = $this->state->file_id;
+    }
+
+    public function is_skipped_file() {
+        return $this->state->file_id === $this->state->last_skipped_file;
+    }
+
+    public function get_chunk_type() {
+        if ( $this->get_last_error() ) {
+            return '#error';
+        }
+
+        if ( $this->is_eof() ) {
+            return '#eof';
+        }
+
+        return '#bytes';
+    }
+
+    public function append_eof() {
+        $this->state->input_eof = true;
+    }
+
+    public function append_bytes( string $bytes, $context = null ) {
+        $this->state->input_bytes  .= $bytes;
+        $this->state->input_context = $context;
+    }
+
+    public function get_bytes() {
+        return $this->state->output_bytes;
+    }
+
+    public function next_bytes() {
+        $this->state->reset_output();
+        if ( $this->is_eof() ) {
+            return false;
+        }
+
+        // Process any remaining buffered input:
+        if ( $this->generate_next_chunk() ) {
+            return ! $this->is_skipped_file();
+        }
+
+        if ( ! $this->state->input_bytes ) {
+            if ( $this->state->input_eof ) {
+                $this->state->finish();
+            }
+            return false;
+        }
+
+        $produced_bytes = $this->generate_next_chunk();
+
+        return $produced_bytes && ! $this->is_skipped_file();
+    }
+
+    abstract protected function generate_next_chunk(): bool;
+
+    public function get_last_error(): string|null {
+        return $this->state->last_error;
+    }
+}
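
To make the `append_bytes()` / `next_bytes()` / `get_bytes()` contract easier to follow, here is a condensed, self-contained sketch of the same pull loop. `Sketch_Byte_Stream`, `Sketch_Uppercase_Stream`, and `sketch_drain()` are hypothetical names, not classes from this commit. The sketch folds the state object into the stream and leaves out error, file-id, and skip handling, so it illustrates the idea rather than reproducing `WP_Byte_Stream`.

```php
<?php
// Simplified stand-in for WP_Byte_Stream + WP_Byte_Stream_State (hypothetical).
abstract class Sketch_Byte_Stream {
    protected $input_bytes  = null;
    protected $input_eof    = false;
    protected $output_bytes = null;
    protected $finished     = false;

    public function append_bytes( string $bytes ): void {
        $this->input_bytes .= $bytes;
    }

    public function append_eof(): void {
        $this->input_eof = true;
    }

    public function get_bytes() {
        return $this->output_bytes;
    }

    public function is_eof(): bool {
        return ! $this->output_bytes && $this->finished;
    }

    public function next_bytes(): bool {
        // Each call starts with an empty output buffer.
        $this->output_bytes = null;
        if ( $this->is_eof() ) {
            return false;
        }
        if ( $this->generate_next_chunk() ) {
            return true;
        }
        // Nothing produced: finish only once the input is drained and closed.
        if ( ! $this->input_bytes && $this->input_eof ) {
            $this->finished = true;
        }
        return false;
    }

    abstract protected function generate_next_chunk(): bool;
}

// Example subclass: turns all buffered input into one uppercase chunk.
class Sketch_Uppercase_Stream extends Sketch_Byte_Stream {
    protected function generate_next_chunk(): bool {
        if ( ! $this->input_bytes ) {
            return false;
        }
        $this->output_bytes = strtoupper( $this->input_bytes );
        $this->input_bytes  = null;
        return true;
    }
}

// Pulls chunks until the stream has nothing more to give right now.
function sketch_drain( Sketch_Byte_Stream $stream ): string {
    $out = '';
    while ( $stream->next_bytes() ) {
        $out .= $stream->get_bytes();
    }
    return $out;
}
```

A caller would append input as it arrives, drain the stream after each append, and call `append_eof()` once the source is exhausted. After the final drain, `is_eof()` reports that the stream is done.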
packages/playground/data-liberation/src/stream-api/WP_Byte_Stream_State.php

+40
@@ -0,0 +1,40 @@
+<?php
+
+/**
+ * This interface describes standalone streams, but it can also be
+ * used to describe a stream Processor like WP_XML_Processor.
+ *
+ * In this prototype there are no pipes, streams, and processors. There
+ * are only Byte Streams that can be chained together with the StreamChain
+ * class.
+ */
+class WP_Byte_Stream_State {
+    const STATE_STREAMING = '#streaming';
+    const STATE_FINISHED  = '#finished';
+
+    public $input_eof     = false;
+    public $input_bytes   = null;
+    public $output_bytes  = null;
+    public $state         = self::STATE_STREAMING;
+    public $last_error    = null;
+    public $input_context = null;
+
+    public $file_id;
+    public $last_skipped_file;
+
+    public function reset_output() {
+        $this->output_bytes = null;
+        $this->file_id      = 'default';
+        $this->last_error   = null;
+    }
+
+    public function consume_input_bytes() {
+        $bytes             = $this->input_bytes;
+        $this->input_bytes = null;
+        return $bytes;
+    }
+
+    public function finish() {
+        $this->state = self::STATE_FINISHED;
+    }
+}
