#5568 reviewing change

Improve string handling by splitting read strings into only ASCII and non-ASCII strings

Reported by:	sergz	Assignee:	sergz
Priority:	P2	Milestone:
Module:	Libadblockplus	Keywords:
Cc:	fhd, hfiguiere, mjethani	Blocked By:
Blocking:		Platform:	Unknown / Cross platform
Ready:	no	Confidential:	no
Tester:	Unknown	Verified working:	no
Review URL(s):	https://codereview.adblockplus.org/29531696/

Description

V8 does support internally one byte character strings and we should use this feature in order to optimize a bit memory usage.

Currently when we read a file (see readFromFile in io.js) and pass its content into JS to go through each line V8 finds there non-ASCII characters and creates a regular UCS-2 JavaScript string, namely two bytes string. In addition when we work with a string, e.g. create a new one which is a substring of the original string but copied internally by V8, the new string has the same type, namely one byte string or two bytes string. It may be also valid for some transformations like changing the case of letters.

For example, in EasyList subscription there are only few lines with non-ASCII characters. So, we could save some memory if we managed to advice V8 when the line is one byte character string. Since we are not allowed to shuffle the lines in the read file it seems we should implement readFromFile from io.js in C++. However one should pay attention to the performance, e.g. one should consider whether the calling of listener from C++ is too expensive or not (e.g., profile how it performs on android, of course compiled in release).

In addition one should consider how to use it with downloaded data.

Change History (6)

comment:1 Changed on 08/28/2017 at 01:55:54 PM by sergz

Owner set to sergz

comment:2 Changed on 08/30/2017 at 05:14:22 PM by sergz

Review URL(s) modified (diff)
Status changed from new to reviewing

comment:3 Changed on 08/31/2017 at 01:33:12 PM by abpbot

A commit referencing this issue has landed:
Issue 5568 - Improve string handling by splitting read strings into only ASCII and non-ASCII strings

comment:4 follow-up: ↓ 5 Changed on 10/14/2017 at 11:20:56 PM by oleksandr

Should this be closed? I understand it has landed already.

comment:5 in reply to: ↑ 4 Changed on 10/15/2017 at 05:06:04 PM by sergz

Replying to oleksandr:

Should this be closed? I understand it has landed already.

I would consider it as not finished yet because the downloaded data are still one big 2 bytes per character string.

comment:6 Changed on 05/23/2018 at 08:39:01 AM by mjethani

Cc mjethani added

Context Navigation

#5568 reviewing change

Improve string handling by splitting read strings into only ASCII and non-ASCII strings

Description

Attachments (0)

Change History (6)

comment:1 Changed on 08/28/2017 at 01:55:54 PM by sergz

comment:2 Changed on 08/30/2017 at 05:14:22 PM by sergz

comment:3 Changed on 08/31/2017 at 01:33:12 PM by abpbot

comment:4 follow-up: ↓ 5 Changed on 10/14/2017 at 11:20:56 PM by oleksandr

comment:5 in reply to: ↑ 4 Changed on 10/15/2017 at 05:06:04 PM by sergz

comment:6 Changed on 05/23/2018 at 08:39:01 AM by mjethani

Add Comment

Modify Ticket

Changed by kzar

Download in other formats:

Summary:
Description:	You may use WikiFormatting here. V8 does support internally one byte character strings and we should use this feature in order to optimize a bit memory usage. Currently when we read a file (see `readFromFile` in `io.js`) and pass its content into JS to go through each line V8 finds there non-ASCII characters and creates a regular UCS-2 JavaScript string, namely two bytes string. In addition when we work with a string, e.g. create a new one which is a substring of the original string but copied internally by V8, the new string has the same type, namely one byte string or two bytes string. It may be also valid for some transformations like changing the case of letters. For example, in EasyList subscription there are only few lines with non-ASCII characters. So, we could save some memory if we managed to advice V8 when the line is one byte character string. Since we are not allowed to shuffle the lines in the read file it seems we should implement `readFromFile` from `io.js` in C++. However one should pay attention to the performance, e.g. one should consider whether the calling of `listener` from C++ is too expensive or not (e.g., profile how it performs on android, of course compiled in release). In addition one should consider how to use it with downloaded data.
Type:		Priority:
Milestone:		Module:
Keywords:		Cc:
Blocked By:		Blocking:
Platform:		Ready:
Confidential:		Tester:
Verified working:
Review URL(s):	https://codereview.adblockplus.org/29531696/