Opened on 05/26/2015 at 11:37:49 AM
Closed on 07/14/2015 at 02:19:01 PM
#2586 closed change (fixed)
Clear unused accounts from Trac database of the issue tracker at issues1
Reported by: | trev | Assignee: | matze |
---|---|---|---|
Priority: | P2 | Milestone: | |
Module: | Infrastructure | Keywords: | |
Cc: | philll, fhd, fred | Blocked By: | |
Blocking: | #2771, #2780, #2782, #2783 | Platform: | Unknown |
Ready: | yes | Confidential: | no |
Tester: | Unknown | Verified working: | no |
Review URL(s): |
Description (last modified by philll)
Background
We currently have beyond 26000 accounts in the Trac database, less than 500 are actually being used. That makes Trac slow, its inefficient data structures cannot deal with a large number of accounts. We should ensure staff accounts to not be removed.
What to change
Remove unused accounts from the Trac database used to authenticate users for the issue tracker at issues1, excluding accounts with email domains @eyeo.com and @adblockplus.org
Attachments (0)
Change History (15)
comment:1 Changed on 05/26/2015 at 11:40:26 AM by philll
- Description modified (diff)
- Summary changed from Clear unused accounts from Trac database to Clear unused accounts from Trac database of the issue tracker at issues1
comment:2 Changed on 05/26/2015 at 11:41:58 AM by philll
- Cc philll added
comment:3 Changed on 07/03/2015 at 12:21:09 PM by philll
- Description modified (diff)
- Tester set to Unknown
comment:4 Changed on 07/03/2015 at 02:05:00 PM by matze
- Priority changed from Unknown to P2
- Ready set
comment:5 Changed on 07/10/2015 at 05:41:30 PM by matze
- Blocking 2771 added
comment:6 Changed on 07/13/2015 at 01:31:37 PM by matze
- Cc fhd fred added; matze removed
- Owner set to matze
comment:7 Changed on 07/14/2015 at 10:54:17 AM by matze
Table modifications applied to the issues (trac) database:
mysql> alter table session modify sid character varying(100); Query OK, 190086 rows affected (15.20 sec) Records: 190086 Duplicates: 0 Warnings: 0
mysql> alter table session_attribute modify sid character varying(100); Query OK, 895880 rows affected (1 min 6.82 sec) Records: 895880 Duplicates: 0 Warnings: 0
mysql> alter table session_attribute modify name character varying(100); Query OK, 895880 rows affected (1 min 4.70 sec) Records: 895880 Duplicates: 0 Warnings: 0
Table modifications to the orders (trac_orders) one:
mysql> alter table session modify sid character varying(100); Query OK, 38 rows affected (0.40 sec) Records: 38 Duplicates: 0 Warnings: 0
mysql> alter table session_attribute modify sid character varying(100); Query OK, 82728 rows affected (3.19 sec) Records: 82728 Duplicates: 0 Warnings: 0
mysql> alter table session_attribute modify name character varying(100); Query OK, 82728 rows affected (3.23 sec) Records: 82728 Duplicates: 0 Warnings: 0
comment:8 Changed on 07/14/2015 at 11:13:46 AM by matze
Deletion of obsolete session data (we may want to establish this as a regular cron job):
mysql> delete ms, msa from session ms left join session_attribute msa on ms.sid = msa.sid and ms.authenticated = msa.authenticated where ms.authenticated = 0 and ms.last_visit < unix_timestamp(now() - interval 10 day); Query OK, 603069 rows affected (1 min 50.88 sec)
Note that the interval 10 day has been used because of the hard-coded cookie lifetime:
trac@issues1:~$ grep -A1 -B1 -i "delete from auth_cookie where time" /usr/local/lib/python2.7/dist-packages/trac/web/auth.py # Delete cookies older than 10 days db("DELETE FROM auth_cookie WHERE time < %s", (int(time.time()) - 86400 * 10,))
comment:9 Changed on 07/14/2015 at 12:30:01 PM by matze
- Blocking 2780 added
comment:10 Changed on 07/14/2015 at 12:49:03 PM by matze
Before deleting the accounts, here's an overview of the backups available on hand in production (in addition to mine) - just in case we need to react quickly:
+----------------------------+ | Tables_in_trac (_backup_%) | +----------------------------+ | _backup_session | | _backup_session_attribute | +----------------------------+
(Before any manual operation.)
+----------------------------------+ | Tables_in_trac (_intermediate_%) | +----------------------------------+ | _intermediate_session | | _intermediate_session_attribute | +----------------------------------+
(After altering tables and deleting obsolete sessions.)
comment:11 Changed on 07/14/2015 at 01:29:11 PM by matze
The following query has been used to delete the fake accounts older than 10 days:
mysql> delete s, sa from session s join session_attribute sc on s.sid = sc.sid and s.authenticated = sc.authenticated join sess ion_attribute sa on s.sid = sa.sid and s.authenticated = sa.authenticated where sc.name = "email_verification_token" and s.last _visit < unix_timestamp(now() - interval 10 day); Query OK, 346199 rows affected (33.23 sec)
Note that it seems like we get a few thousand new fake accounts every day. Thus we may want to execute the query above on a regular basis as well. Instead of investing time in a captcha solution that does not work anyway.
comment:12 Changed on 07/14/2015 at 02:02:31 PM by matze
As discussed with @philll, I've executed the query again, this time restricting to those records older than 5 days:
mysql> delete s, sa from session s join session_attribute sc on s.sid = sc.sid and s.authenticated = sc.authenticated join session_attribute sa on s.sid = sa.sid and s.authenticated = sa.authenticated where sc.name = "email_verification_token" and s.last_visit < unix_timestamp(now() - interval 5 day); Query OK, 9334 rows affected (0.48 sec)
This is the version that should become a cron job as well - due to the aforementioned permanent creation of new fake accounts we still have a few thousand laying around and slowing down the lookups:
mysql> select count(*) from session where authenticated = 1; +----------+ | count(*) | +----------+ | 5701 | +----------+ 1 row in set (0.01 sec)
comment:13 Changed on 07/14/2015 at 02:15:10 PM by matze
- Blocking 2782 added
comment:14 Changed on 07/14/2015 at 02:17:29 PM by matze
- Blocking 2783 added
comment:15 Changed on 07/14/2015 at 02:19:01 PM by matze
- Resolution set to fixed
- Status changed from new to closed
Beside the vast amount of machine-generated accounts and the missing scheme optimizations we are going to address today, there is also a large amount of actual session data from unauthenticated users laying around in the same tables.
Unfortunately, this data seems to not get cleaned up automatically. There's month's old records in there. Other than the spam accounts, however, these can easily be distinguished and even deleted via e.g. cron. I'll post a query later.