[RFC]: Add support for string arrays in stdlib

### Full name

Dhruv Arvind Singh

### University status

Yes

### University name

Indian Institute Of Information Technology, Nagpur

### University program

Computer Science Engineering

### Expected graduation

2027

### Short biography

I’m a second-year Computer Science and Engineering student at the Indian Institute of Information Technology, Nagpur. My fascination with technology began in high school when I started experimenting with mobile app development, which quickly turned into a passion for building things with code.

College opened up new opportunities for me to explore computer science more systematically. I got hands-on with languages like Python and JavaScript, gradually developing an interest in backend systems, data modeling, and the logic behind how real-world applications work. That curiosity has fueled my learning journey ever since.

Over time, my interest started shifting toward building real-world applications. That curiosity led me to explore TypeScript and Rust, which introduced me to the world of web development and backend architecture. I eventually immersed myself in full-stack development, working with frameworks like React, FastAPI, and PostgreSQL, while also exploring containerization and CI/CD pipelines.

Lately, I’ve been actively contributing to open-source initiatives that align with my interests in backend systems and tooling. These experiences have not only deepened my technical skills but also connected me with a global community of developers and mentors.


### Timezone

Indian Standard Time (GMT+5:30)

### Contact details

dhruvsinghxd@gmail.com

### Platform

Linux

### Editor

My preferred code editor is VSCode, with extensions such as ESLint and Dev Containers.

### Programming experience

I was first introduced to programming in high school through Kotlin, which sparked my curiosity for using code to solve logical problems. That early experience laid the foundation for my deeper dive into computer science after entering college, where I explored languages like C and C++ to enhance my algorithmic thinking and understand computational fundamentals.

As I delved further, I became increasingly interested in building user-facing applications. That’s when I transitioned into web development, picking up JavaScript and gradually expanding into the full-stack ecosystem with technologies like React.js, Next.js, and Express.js. To diversify my backend skill set, I’ve recently been exploring Rust and its ecosystem.

Throughout this journey, I’ve worked on a variety of projects, from interactive web apps to CLI-based utilities. One of my favourite projects is Multiplayer Ludo, a real-time game platform that connects users globally for online matches. The system is built on a Node.js backend with WebSocket support for live interaction and uses MySQL for persistent data. Features like custom game rooms and targeted matchmaking enhance the multiplayer experience significantly.

### JavaScript experience

I was first introduced to JavaScript during a college course, and that moment marked the beginning of my journey into web development. As I started experimenting with it more, I soon discovered frameworks like React and Express, which I used to build one of my early full-stack projects, a chatbot.

That experience sparked a deeper interest in development, eventually leading me into the world of open-source. Contributing to real-world projects has significantly strengthened my JavaScript skills and given me valuable insight into writing maintainable, production-level code.


### Node.js experience

Over time, I’ve become quite comfortable working with Node.js, handling everything from writing scripts and building RESTful APIs to working with file systems and integrating databases. My experience has been shaped largely by backend-focused projects, where I frequently used Express.js to architect and manage server-side logic.

I've also spent a lot of time exploring the npm ecosystem, leveraging a variety of libraries to streamline development and enhance functionality.

One project I’m particularly proud of is a collaborative whiteboard application. It features real-time drawing powered by a WebSocket server, with a Node.js backend and a Next.js frontend. MySQL serves as the primary database, enabling persistent and synchronized user interactions.

### C/Fortran experience

In college, C was the first language I was introduced to, and I’ve developed a solid understanding of it through solving various competitive programming problems focused on data structures and algorithms.

As part of my first-semester coursework, I built a string library in C from scratch. For my third-semester project, I developed a Library Management System in C++, a terminal-based application that makes extensive use of C++ classes and stores and manipulates its data in a .csv file.

I’ve also built several smaller projects in C and C++ to further strengthen my understanding.

While I don’t have much experience with Fortran, I’m always open to exploring new languages and technologies.

### Interest in stdlib

JavaScript is often seen purely as a tool for building user interfaces, but its role in data engineering and analysis is quickly evolving. With its growing ecosystem and runtime versatility, it's becoming just as capable in backend and scientific workflows. That’s why stdlib’s mission to strengthen JavaScript’s utility across domains feels especially timely, and it’s a movement I’m excited to contribute to.

While working on various contributions, I’ve always been fascinated by stdlib's data type and array-focused utilities and have been consistently impressed by how approachable and well-structured the library is. Even with limited exploration, it’s evident that stdlib offers robust support for a broad range of development needs.

What makes the experience even better is the sense of community. Clear onboarding documentation, supportive maintainers, and an active contributor network made getting involved not just easy but enjoyable. It’s helped me refine my skills, expand my knowledge, and grow as a developer through real-world collaboration.

Contributing to this project has definitely made me a better programmer, and I wish to learn and grow more!

### Version control

Yes

### Contributions to stdlib

- Adds C implementation:
  - [#4388](https://github.com/stdlib-js/stdlib/pull/4388) (merged) @stdlib/stats/base/dists/logistic/logpdf
  - [#4424](https://github.com/stdlib-js/stdlib/pull/4424) (merged) @stdlib/stats/base/dists/laplace/logpdf
  - [#4437](https://github.com/stdlib-js/stdlib/pull/4437) (merged) @stdlib/stats/base/dists/laplace/quantile
  - [#4324](https://github.com/stdlib-js/stdlib/pull/4324/) (merged) @stdlib/stats/base/dists/weibull/mode
  - [#4422](https://github.com/stdlib-js/stdlib/pull/4422) (merged) @stdlib/stats/base/dists/laplace/logcdf
  - [#4352](https://github.com/stdlib-js/stdlib/pull/4352) (merged) @stdlib/stats/base/dists/logistic/quantile
  - [#4440](https://github.com/stdlib-js/stdlib/pull/4440) (merged) @stdlib/stats/base/dists/laplace/cdf
  - [#4790](https://github.com/stdlib-js/stdlib/pull/4790) (open) @stdlib/math/base/special/gammainc
  - [#4455](https://github.com/stdlib-js/stdlib/pull/4455) (open) @stdlib/math/base/special/rising-factorial


- Refactor existing math/special/BLAS packages to follow current conventions:
  - [#4651](https://github.com/stdlib-js/stdlib/pull/4651) (merged) @stdlib/stats/base/dmeanvarpn
  - [#4648](https://github.com/stdlib-js/stdlib/pull/4648) (merged) @stdlib/stats/base/dmeanvar
  - [#4647](https://github.com/stdlib-js/stdlib/pull/4647) (merged) @stdlib/stats/base/dmeanstdevpn
  - [#4618](https://github.com/stdlib-js/stdlib/pull/4618) (merged) @stdlib/math/base/assert/is-finitef
  - [#4617](https://github.com/stdlib-js/stdlib/pull/4617) (merged) @stdlib/math/base/assert/is-finite
  - [#4615](https://github.com/stdlib-js/stdlib/pull/4615) (merged) @stdlib/math/base/assert/is-infinitef
  - #4614 (merged) @stdlib/math/base/assert/is-nanf
  - [#4612](https://github.com/stdlib-js/stdlib/pull/4612) (merged) @stdlib/stats/base/dmeanstdev
  - [#4539](https://github.com/stdlib-js/stdlib/pull/4539) (merged) @stdlib/stats/base/dnanstdevch
  - [#4538](https://github.com/stdlib-js/stdlib/pull/4538) (merged) @stdlib/stats/base/dnanvariancepn
  - [#4537](https://github.com/stdlib-js/stdlib/pull/4537) (merged) @stdlib/stats/base/dnanstdevpn
  - [#4536](https://github.com/stdlib-js/stdlib/pull/4536) (merged) @stdlib/stats/base/dnanstdevyc
  - [#4511](https://github.com/stdlib-js/stdlib/pull/4511) (merged) @stdlib/stats/base/snanvarianceyc
  - [#4535](https://github.com/stdlib-js/stdlib/pull/4535) (merged) @stdlib/stats/base/dnanvarianceyc
  - [#4509](https://github.com/stdlib-js/stdlib/pull/4509) (merged) @stdlib/stats/base/sstdevpn
  - [#4508](https://github.com/stdlib-js/stdlib/pull/4508) (merged) @stdlib/stats/base/sstdevch
  - [#4507](https://github.com/stdlib-js/stdlib/pull/4507) (merged) @stdlib/stats/base/sstdev
  - [#4505](https://github.com/stdlib-js/stdlib/pull/4505) (merged) @stdlib/stats/base/snanvariancetk
  - [#4504](https://github.com/stdlib-js/stdlib/pull/4504) (merged) @stdlib/stats/base/snanvariancewd

+10 more merged PRs from stats/base/*


- Add ndarray support:
  - [#4543](https://github.com/stdlib-js/stdlib/pull/4543) (open) @stdlib/stats/base/dvarmpn
  - [#4720](https://github.com/stdlib-js/stdlib/pull/4720) (open) @stdlib/stats/base/dmeanvarpn
  - [#4726](https://github.com/stdlib-js/stdlib/pull/4726) (open) @stdlib/stats/base/dmeanpn
  - [#4727](https://github.com/stdlib-js/stdlib/pull/4727) (open) @stdlib/stats/base/sdsnanmeanors


- Ideas: Proposed the following features/ideas (issues):
  - [#4889](https://github.com/stdlib-js/stdlib/pull/4889) (open) adds napi/create-bool
  - [#4635](https://github.com/stdlib-js/stdlib/pull/4635) (merged) adds napi/argv-bool

### stdlib showcase

I have explored the stdlib repository through hands-on experimentation and by building educational and practical demos that highlight its capabilities. My university's mechanical engineering teachers use stdlib's numerical functions to solve difficult mathematical problems.

### Goals

This project aims to introduce a dedicated string-typed array, called StringArray, designed to support variable-length strings. The main motivation behind adding this data type is to improve interoperability between JavaScript and C. This is especially important for enabling support for ndarrays with string data types, as a significant portion of ndarray iteration logic is implemented in C.

The project also aims to add the string methods needed by StringArray, with robust error handling, along with the assertion utilities and other supporting packages that StringArray requires.

### Why this project?

Unlike numeric types, which have fixed sizes and integrate seamlessly into typed arrays, strings present a more complex challenge because of their variable length. This project tackles that issue by proposing a structured memory layout tailored for string storage, offering both efficiency and clarity. It’s a critical step toward enabling robust, high-performance handling of textual data in low-level JavaScript environments.

Additionally, I have prior knowledge of string methods and manipulation from my [string.h](https://github.com/DhruvArvindSingh/String-Library-in-C) library project.

### Qualifications

During my time in college, I’ve studied JavaScript alongside fundamental 
computer science subjects such as object-oriented programming, algorithms, 
operating systems, computer architecture, Linux, and Git. I’ve also developed a solid 
understanding of the stdlib codebase and have spent a significant amount of time 
researching existing string implementations to explore how these new features can 
be effectively introduced.


### Prior art

I have some prior experience implementing string methods and string manipulation from my [string](https://github.com/DhruvArvindSingh/String-Library-in-C) library project in C.

After researching how strings are implemented in libraries and languages such as NumPy and Java, I found the following approaches:

#### NumPy

NumPy stores string data as UTF-8 sequences, in which each character takes 1 to 4 bytes of storage. It uses Bjoern Hoehrmann’s DFA-based UTF-8 decoder to validate UTF-8 sequences.

UTF-8 is a variable-width encoding, which means:

1. ASCII characters (U+0000 to U+007F): 1 byte each
2. Latin/Greek/Cyrillic etc. (U+0080 to U+07FF): 2 bytes each
3. CJK and other BMP scripts (U+0800 to U+FFFF): 3 bytes each
4. Supplementary characters (U+10000 to U+10FFFF): 4 bytes each
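
As a rough illustration of the widths listed above, here is a minimal JavaScript sketch that computes how many bytes a string would occupy in UTF-8. The function names `utf8Width` and `utf8ByteLength` are hypothetical helpers written for this proposal; they are not part of NumPy or stdlib.

```javascript
// Returns the number of bytes UTF-8 needs to encode one Unicode code point.
function utf8Width( codePoint ) {
	if ( codePoint <= 0x7F ) {
		return 1; // ASCII
	}
	if ( codePoint <= 0x7FF ) {
		return 2; // Latin/Greek/Cyrillic etc.
	}
	if ( codePoint <= 0xFFFF ) {
		return 3; // CJK and other BMP scripts
	}
	return 4; // Supplementary characters
}

// Returns the total UTF-8 byte length of a JavaScript string.
function utf8ByteLength( str ) {
	var total = 0;
	var ch;

	// `for...of` iterates by code point, so a surrogate pair is visited once.
	for ( ch of str ) {
		total += utf8Width( ch.codePointAt( 0 ) );
	}
	return total;
}

console.log( utf8ByteLength( 'abc' ) );       // 3
console.log( utf8ByteLength( '\u00E9' ) );    // 2
console.log( utf8ByteLength( '\u4E2D' ) );    // 3
console.log( utf8ByteLength( '\u{1F600}' ) ); // 4
```
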

Each string carries the following metadata flags:

- **NPY_STRING_MISSING**: Whether the string is missing/null.
- **NPY_STRING_INITIALIZED**: Whether it has been initialized.
- **NPY_STRING_OUTSIDE_ARENA**: Whether it is stored outside the arena (the arena is one of the string storage methods, discussed later).
- **NPY_STRING_LONG**: Whether the string length is greater than 255 bytes.

Empty strings are handled specially for efficiency.

The implementation manages memory in a thread-safe way using mutex locks.

In NumPy, each string element is stored in a memory block containing a `size` variable and a `*buffer` pointer. The `size` variable stores the size of the data at `*buffer` (each character can take 1-4 bytes). If the string is stored in the arena, a `cursor` variable is also present, storing the current position in the arena.

NumPy uses three methods for storing strings, depending on the size of the string: short string, arena, and heap.
**Short string**:

- Strings whose size is at most 15 bytes (on 64-bit systems) or 7 bytes (on 32-bit systems) use this storage method.
- The string data is stored directly in the array buffer.
- The flags and the size are stored together in a single one-byte field called `size_and_flags`: the upper 4 bits hold the flags mentioned above and the lower 4 bits hold the size.
- Methods identify the storage method of a string from these flags.
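
To make the idea of a combined `size_and_flags` field concrete, the following is a minimal JavaScript sketch, assuming the flags occupy the upper four bits of a byte and the size the lower four. The bit values and the helper names (`packSizeAndFlags`, `unpackSize`, `hasFlag`) are assumptions chosen for illustration, not NumPy's API.

```javascript
// Illustrative flag bits in the upper 4 bits of the byte (assumed values).
var FLAG_MISSING = 0x80;
var FLAG_INITIALIZED = 0x40;
var FLAG_OUTSIDE_ARENA = 0x20;
var FLAG_LONG = 0x10;

var FLAGS_MASK = 0xF0; // upper 4 bits
var SIZE_MASK = 0x0F;  // lower 4 bits, so a size of 0-15 fits

// Combines a short-string size and a set of flags into a single byte.
function packSizeAndFlags( size, flags ) {
	if ( size < 0 || size > SIZE_MASK ) {
		throw new RangeError( 'size must be between 0 and 15' );
	}
	return ( flags & FLAGS_MASK ) | size;
}

// Extracts the size from a packed byte.
function unpackSize( packed ) {
	return packed & SIZE_MASK;
}

// Tests whether a flag is set in a packed byte.
function hasFlag( packed, flag ) {
	return ( packed & flag ) !== 0;
}

var packed = packSizeAndFlags( 5, FLAG_INITIALIZED | FLAG_OUTSIDE_ARENA );
console.log( unpackSize( packed ) );                // 5
console.log( hasFlag( packed, FLAG_INITIALIZED ) ); // true
console.log( hasFlag( packed, FLAG_LONG ) );        // false
```

Packing both values into one byte leaves the remaining bytes of the fixed-size element free for the string data itself, which is why the short-string limit is one byte less than the element size.
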

**Arena**:

- Strings whose size is greater than 15 bytes (on 64-bit systems) or 7 bytes (on 32-bit systems) use this storage method.
- The arena is a contiguous block of memory used to store one or more strings.
- Each block contains the string data.
- It stores the starting address of the string data and adds an offset to it in order to access a string element.
- The size is stored in the `size_and_flags` variable.
- The arena grows by a factor of 1.25.
- It works well for arrays of multiple strings, since only the starting address of each string needs to be stored.
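
The arena approach can be sketched in JavaScript as a single growable byte buffer in which each string is identified by an offset and a size. This is only a simplified model under my own assumptions (the `Arena` constructor, its `push`/`get` methods, and the entry shape are hypothetical); it does not reproduce NumPy's actual data structures.

```javascript
// A minimal append-only arena: one contiguous byte buffer shared by many strings.
function Arena( initialCapacity ) {
	this.buffer = new Uint8Array( initialCapacity );
	this.cursor = 0; // index of the next free byte
}

// Appends the UTF-8 bytes of `str` and returns where they were stored.
Arena.prototype.push = function push( str ) {
	var capacity;
	var needed;
	var grown;
	var bytes;
	var entry;

	bytes = new TextEncoder().encode( str );
	needed = this.cursor + bytes.length;
	if ( needed > this.buffer.length ) {
		// Grow by a factor of 1.25, or to the needed size if that is larger:
		capacity = Math.max( needed, Math.ceil( this.buffer.length * 1.25 ) );
		grown = new Uint8Array( capacity );
		grown.set( this.buffer.subarray( 0, this.cursor ) );
		this.buffer = grown;
	}
	this.buffer.set( bytes, this.cursor );
	entry = {
		'offset': this.cursor,
		'size': bytes.length
	};
	this.cursor = needed;
	return entry;
};

// Reads back the string described by an entry returned from `push`.
Arena.prototype.get = function get( entry ) {
	var view = this.buffer.subarray( entry.offset, entry.offset + entry.size );
	return new TextDecoder().decode( view );
};

var arena = new Arena( 32 );
var first = arena.push( 'a string longer than fifteen bytes' );
var second = arena.push( 'another fairly long string value' );
console.log( arena.get( first ) );
console.log( arena.get( second ) );
```

Because entries hold offsets rather than absolute addresses, they remain valid after the buffer is reallocated during growth.
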


**Heap/Long string**:

- Strings whose size is greater than 255 bytes fall into this category.
- It uses a direct pointer to heap memory rather than an offset into the arena.
- Since the arena becomes inefficient beyond 255 bytes, NumPy uses the heap.



 

#### Java

Java uses two methods for storing string data. One stores the string in heap memory; the other stores two copies, one in the heap and one in a separate area inside the heap dedicated to strings (SLC). Its implementation looks out of scope for our project.

#### Idea

Taking the [Complex64 array](https://github.com/stdlib-js/stdlib/blob/5dbb01dba2b1b305c6a11b66652ee2e4ccac15e2/lib/node_modules/%40stdlib/array/complex64/lib/main.js) and [Boolean Array](https://github.com/Jaysukh-409/stdlib/tree/b88f312cfecea430d79ea65509e2289509fc0681/lib/node_modules/%40stdlib/array/bool) implementations as references, one approach is to use a Uint32Array to store the UTF-8 sequences along with four metadata fields. This approach is similar to NumPy’s short string memory allocation method and would have a project timeline and approach similar to those of the [Boolean Array](https://github.com/stdlib-js/stdlib/wiki/GSoC-2024-%E2%80%90-Jaysukh-Makvana) project. The StringArray prototype functions can be inspired by the C string library, general TypedArray methods, and native JavaScript methods.

StringArray can also draw on NumPy’s arena and short string implementations by allocating additional free space equal to 0.25 times the required length. The extra space will not be accessible to the user until data is stored in it and the length of the array is greater than the index. This decreases the chance of having to reallocate string data.

Example: `let a = new StringArray( '6_byte' );` initializes a string with a capacity of 8 (`ceil( 1.25 * 6 )`), with only indices 0 to 5 accessible to the user.

We will need to store four metadata fields, namely `size`, `length`, `is_initialized`, and `is_null`, as done by the NumPy string array:

- **size**: (may be removed after final discussion) stores the length of the allocated Uint32Array.
- **length**: stores the length of the accessible string.
- **is_initialized**: indicates whether the string has been initialized.
- **is_null**: indicates whether the StringArray is an empty array.

Implementing ndarray support for StringArray will be easier if we use Uint32Array as the base.
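
A minimal sketch of this storage idea follows, assuming each Uint32Array element holds the UTF-8 bytes (1 to 4) of one character and that the backing array is over-allocated by a factor of 1.25. All names here (`packChar`, `unpackChar`, `createStorage`, `readStorage`) are hypothetical and only illustrate the layout; the final design is still open for discussion.

```javascript
var encoder = new TextEncoder();
var decoder = new TextDecoder();

// Packs the UTF-8 bytes (1-4) of a single character into one unsigned 32-bit word.
function packChar( ch ) {
	var bytes = encoder.encode( ch );
	var word = 0;
	var i;
	for ( i = 0; i < bytes.length; i++ ) {
		// `>>> 0` keeps the value unsigned once the top bit is set.
		word = ( ( word << 8 ) | bytes[ i ] ) >>> 0;
	}
	return word;
}

// Recovers the character stored in a 32-bit word.
function unpackChar( word ) {
	var bytes = [];
	do {
		bytes.unshift( word & 0xFF );
		word >>>= 8;
	} while ( word !== 0 );
	return decoder.decode( new Uint8Array( bytes ) );
}

// Builds the backing storage and metadata for a string.
function createStorage( str ) {
	var chars = Array.from( str ); // one entry per code point
	var size = Math.ceil( chars.length * 1.25 ); // over-allocate by 25%
	var data = new Uint32Array( size );
	var i;
	for ( i = 0; i < chars.length; i++ ) {
		data[ i ] = packChar( chars[ i ] );
	}
	return {
		'data': data,
		'size': size,
		'length': chars.length,
		'is_initialized': true,
		'is_null': chars.length === 0
	};
}

// Reads back only the accessible part of the storage.
function readStorage( storage ) {
	var out = '';
	var i;
	for ( i = 0; i < storage.length; i++ ) {
		out += unpackChar( storage.data[ i ] );
	}
	return out;
}

var a = createStorage( '6_byte' );
console.log( a.size );           // 8
console.log( a.length );         // 6
console.log( readStorage( a ) ); // 6_byte
```

Keeping one character per 32-bit element gives constant-time indexing at the cost of unused bytes for ASCII-heavy data, which is one of the trade-offs to settle during the community bonding period.
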

It will have the following static properties:

- **BYTES_PER_ELEMENT**: stores the size of each memory block (4 bytes).
- **name**: stores the string `StringArray`.

It will have the following prototype functions:

- **name**: returns "StringArray".
- **byteLength**: returns the string length in bytes.
- **byteOffset**: returns the offset (in bytes) of the array from the start of its underlying ArrayBuffer.
- **BYTES_PER_ELEMENT**: returns the size of each block, i.e., 4 bytes.
- **length**: returns the length of the string.
- **from( src[, clbk[, thisArg]] )**: creates a new StringArray from an array-like object or iterable. `clbk` is an optional callback function invoked for each `src` element. `thisArg` is an optional execution context for `clbk`.
- **of( src )**: creates a new StringArray from a variable number of arguments.
- **map( callbackfn, thisArg )**: returns a new array in which each element is the result of the provided callback function. `thisArg` is the execution context for `callbackfn`.
- **get( index )**: returns the string data present at the provided index.
- **set( value, index )**: sets the data at the provided index.
- **indexOf( value )**: returns the index of the first occurrence of `value` in the StringArray.
- **lastIndexOf( value )**: returns the index of the last occurrence of `value` in the StringArray.
- **toLowerCase()**: returns a StringArray in which every element is lowercase.
- **toUpperCase()**: returns a StringArray in which every element is uppercase.
- **reverse()**: reverses the StringArray.
- **includes( value )**: returns true if `value` exists in the array and false otherwise.
- **startsWith( value )**: returns true if the array starts with the same values as `value` and false otherwise.
- **endsWith( value )**: returns true if the array ends with the same values as `value` and false otherwise.
- **slice( arg1, arg2 )**: returns a StringArray from index `arg1` up to index `arg2` of the parent StringArray.
- **substring( arg1, arg2 )**: returns a StringArray from index `arg1` up to index `arg2` of the parent StringArray. The only difference between `substring` and `slice` is that, if `arg1 > arg2`, `substring` returns the string from index `arg2` up to `arg1`, while `slice` returns an empty array.
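
Since the proposed methods are inspired by native JavaScript methods, the `slice`/`substring` distinction can be illustrated with the built-in `String.prototype` methods. Whether StringArray ends up mirroring these semantics exactly is an assumption pending discussion.

```javascript
var s = 'stdlib';

// With start <= end, both methods return the same range:
console.log( s.slice( 1, 4 ) );     // 'tdl'
console.log( s.substring( 1, 4 ) ); // 'tdl'

// With start > end, `slice` returns an empty result...
console.log( s.slice( 4, 1 ) );     // ''

// ...while `substring` swaps the two arguments:
console.log( s.substring( 4, 1 ) ); // 'tdl'
```
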
Many more methods will be added after final discussion.

We will need to add assert functions (`is-stringarray` and `is-same-stringarray`) in `@stdlib/assert/*`, similar to those for other typed arrays.

The ndarray implementation comes next: we need to add StringArray support to the ndarray wrapper and add the essential assert packages.

Lastly, a well-documented README.md file will be added, describing the usage, methods, and other details of the newly added package.


### Commitment

My summer break starts on May 15, which means I’ll be fully available once the official coding period begins on May 27. For the first two months, I can dedicate over 40 hours per week to the project, as I won’t have any overlapping commitments during that time.
In the final month, when college resumes, I’ll still be able to contribute around 25 hours per week alongside my coursework. Altogether, I expect to commit between 400 and 450 hours to the program, comfortably meeting the time expectations for the project.



### Schedule
Assuming a 12 week schedule,

- Community Bonding Period:
  - Discuss and plan the proposed features in detail to gain more clarity on the goals and approach.
  - Once a clear roadmap is finalized, we can start early, as my summer break begins on May 15.

- Week 1, 2 & 3:
  - Implement the StringArray constructor function with error handling.
  - Write test cases and benchmark files, and create the README.md file with the current package data.

- Week 4 & 5:
  - Implement the easier prototype methods discussed above, along with any additional methods finalized after discussion (e.g., indexOf, lastIndexOf, get, set, toLowerCase, toUpperCase, includes, startsWith, endsWith).
  - List the methods in the README, with tests.

- Week 6:
  - Implement the more difficult prototype methods discussed above, along with any additional methods finalized after discussion (e.g., map).
  - (midterm): By midterm, most features should be complete: indexOf, lastIndexOf, get, set, toLowerCase, toUpperCase, includes, startsWith, map, and endsWith, with documentation, benchmarks, and test files.

- Week 7, 8 & 9:
  - Implement StringArray support in ndarray with test cases and benchmarks.
  - Implement the remaining difficult prototype methods with test cases and benchmarks.

- Week 10:
  - Add the necessary packages in @stdlib/assert/* and @stdlib/array/base/assert/*.
  - Write tests for the added features and complete remaining work.

- Week 11:
  - Continue writing and finalizing tests and complete remaining work.
  - Write tutorials and documentation.

- Week 12:
  - Handle pending work, bugs, tests, etc.

- Final Week: Project submission!


Notes:

- The community bonding period is a 3 week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
- Usually, even week 1 deliverables include some code.
- By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
- By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
- During the final week, you'll be submitting your project.


### Related issues

- [#44](https://github.com/stdlib-js/google-summer-of-code/issues/44) - [Idea]: add support for string arrays in stdlib

### Checklist

- [x] I have read and understood the [Code of Conduct](https://github.com/stdlib-js/stdlib/blob/develop/CODE_OF_CONDUCT.md).
- [x] I have read and understood the application materials found in this repository.
- [x] I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
- [x] I have read and understood the [patch requirement](https://github.com/stdlib-js/google-summer-of-code/blob/main/README.md#patch-requirement) which is necessary for my application to be considered for acceptance.
- [x] I have read and understood the [stdlib showcase requirement](https://github.com/stdlib-js/google-summer-of-code/blob/main/README.md#showcase-requirement) which is necessary for my application to be considered for acceptance.
- [x] The issue name begins with `[RFC]:` and succinctly describes your proposal.
- [x] I understand that, in order to apply to be a GSoC contributor, I must submit my final application to <https://summerofcode.withgoogle.com/> **before** the submission deadline.
