Opened on 05/26/2015 at 11:37:49 AM

Closed on 07/14/2015 at 02:19:01 PM

#2586 closed change (fixed)

Clear unused accounts from Trac database of the issue tracker at issues1

Reported by: trev Assignee: matze
Priority: P2 Milestone:
Module: Infrastructure Keywords:
Cc: philll, fhd, fred Blocked By:
Blocking: #2771, #2780, #2782, #2783 Platform: Unknown
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by philll)

Background

We currently have beyond 26000 accounts in the Trac database, less than 500 are actually being used. That makes Trac slow, its inefficient data structures cannot deal with a large number of accounts. We should ensure staff accounts to not be removed.

What to change

Remove unused accounts from the Trac database used to authenticate users for the issue tracker at issues1, excluding accounts with email domains @eyeo.com and @adblockplus.org

Attachments (0)

Change History (15)

comment:1 Changed on 05/26/2015 at 11:40:26 AM by philll

  • Description modified (diff)
  • Summary changed from Clear unused accounts from Trac database to Clear unused accounts from Trac database of the issue tracker at issues1

comment:2 Changed on 05/26/2015 at 11:41:58 AM by philll

  • Cc philll added

comment:3 Changed on 07/03/2015 at 12:21:09 PM by philll

  • Description modified (diff)
  • Tester set to Unknown

comment:4 Changed on 07/03/2015 at 02:05:00 PM by matze

  • Priority changed from Unknown to P2
  • Ready set

comment:5 Changed on 07/10/2015 at 05:41:30 PM by matze

  • Blocking 2771 added

comment:6 Changed on 07/13/2015 at 01:31:37 PM by matze

  • Cc fhd fred added; matze removed
  • Owner set to matze

Beside the vast amount of machine-generated accounts and the missing scheme optimizations we are going to address today, there is also a large amount of actual session data from unauthenticated users laying around in the same tables.

Unfortunately, this data seems to not get cleaned up automatically. There's month's old records in there. Other than the spam accounts, however, these can easily be distinguished and even deleted via e.g. cron. I'll post a query later.

comment:7 Changed on 07/14/2015 at 10:54:17 AM by matze

Table modifications applied to the issues (trac) database:

mysql> alter table session modify sid character varying(100);
Query OK, 190086 rows affected (15.20 sec)
Records: 190086  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify sid character varying(100);                                                        
Query OK, 895880 rows affected (1 min 6.82 sec)
Records: 895880  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify name character varying(100);                                                       
Query OK, 895880 rows affected (1 min 4.70 sec)
Records: 895880  Duplicates: 0  Warnings: 0

Table modifications to the orders (trac_orders) one:

mysql> alter table session modify sid character varying(100);
Query OK, 38 rows affected (0.40 sec)
Records: 38  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify sid character varying(100);
Query OK, 82728 rows affected (3.19 sec)
Records: 82728  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify name character varying(100);
Query OK, 82728 rows affected (3.23 sec)
Records: 82728  Duplicates: 0  Warnings: 0
Last edited on 07/14/2015 at 01:38:12 PM by matze

comment:8 Changed on 07/14/2015 at 11:13:46 AM by matze

Deletion of obsolete session data (we may want to establish this as a regular cron job):

mysql> delete ms, msa from session ms left join session_attribute msa on ms.sid = msa.sid and ms.authenticated = msa.authenticated where ms.authenticated = 0 and ms.last_visit < unix_timestamp(now() - interval 10 day);
Query OK, 603069 rows affected (1 min 50.88 sec)

Note that the interval 10 day has been used because of the hard-coded cookie lifetime:

trac@issues1:~$ grep -A1 -B1 -i "delete from auth_cookie where time" /usr/local/lib/python2.7/dist-packages/trac/web/auth.py
            # Delete cookies older than 10 days
            db("DELETE FROM auth_cookie WHERE time < %s",
               (int(time.time()) - 86400 * 10,))

comment:9 Changed on 07/14/2015 at 12:30:01 PM by matze

  • Blocking 2780 added

comment:10 Changed on 07/14/2015 at 12:49:03 PM by matze

Before deleting the accounts, here's an overview of the backups available on hand in production (in addition to mine) - just in case we need to react quickly:

+----------------------------+
| Tables_in_trac (_backup_%) |
+----------------------------+
| _backup_session            |
| _backup_session_attribute  |
+----------------------------+

(Before any manual operation.)

+----------------------------------+
| Tables_in_trac (_intermediate_%) |
+----------------------------------+
| _intermediate_session            |
| _intermediate_session_attribute  |
+----------------------------------+

(After altering tables and deleting obsolete sessions.)

comment:11 Changed on 07/14/2015 at 01:29:11 PM by matze

The following query has been used to delete the fake accounts older than 10 days:

mysql> delete s, sa from session s join session_attribute sc on s.sid = sc.sid and s.authenticated = sc.authenticated join sess
ion_attribute sa on s.sid = sa.sid and s.authenticated = sa.authenticated where sc.name = "email_verification_token" and s.last
_visit < unix_timestamp(now() - interval 10 day);                                                                              
Query OK, 346199 rows affected (33.23 sec)

Note that it seems like we get a few thousand new fake accounts every day. Thus we may want to execute the query above on a regular basis as well. Instead of investing time in a captcha solution that does not work anyway.

Last edited on 07/14/2015 at 01:34:09 PM by matze

comment:12 Changed on 07/14/2015 at 02:02:31 PM by matze

As discussed with @philll, I've executed the query again, this time restricting to those records older than 5 days:

mysql> delete s, sa from session s join session_attribute sc on s.sid = sc.sid and s.authenticated = sc.authenticated join session_attribute sa on s.sid = sa.sid and s.authenticated = sa.authenticated where sc.name = "email_verification_token" and s.last_visit < unix_timestamp(now() - interval 5 day);
Query OK, 9334 rows affected (0.48 sec)

This is the version that should become a cron job as well - due to the aforementioned permanent creation of new fake accounts we still have a few thousand laying around and slowing down the lookups:

mysql> select count(*) from session where authenticated = 1;
+----------+
| count(*) |
+----------+
|     5701 |
+----------+
1 row in set (0.01 sec)

comment:13 Changed on 07/14/2015 at 02:15:10 PM by matze

  • Blocking 2782 added

comment:14 Changed on 07/14/2015 at 02:17:29 PM by matze

  • Blocking 2783 added

comment:15 Changed on 07/14/2015 at 02:19:01 PM by matze

  • Resolution set to fixed
  • Status changed from new to closed

Add Comment

Modify Ticket

Change Properties
Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
to The owner will be changed from matze.
 
Note: See TracTickets for help on using tickets.