Separation of logical and physical location of 'albums/' directory inside CPG
 


Started by slausen, April 08, 2008, 01:00:48 AM


slausen

Hi-

This feature request is related to this thread: http://forum.coppermine-gallery.net/index.php/topic,51660.0.html

CPG is great because it allows for storage of a lot of the attributes of image and other files in the database, without regard for where the actual files are in the filesystem. However, it's currently not possible to completely separate the storage of files from CPG, by placing the 'albums/' directory outside of the CPG webroot. Currently, CPG only stores one value for the path to the albums in the 'fullpath' variable in the config table.

There are many reasons why it would be desirable to have flexibility in this regard.

As the number of files inside the 'albums/' directory grows large, copying or backing up the entire 'albums/' directory every time you want to do a server move or upgrade can be problematic. Also, customers with large numbers of files will want to be able to move the file storage location to separate partitions, devices or servers for performance optimization reasons.

Fortunately, this is not a new problem. Other open source applications have solved this problem by storing two values for paths to certain types of files:

1) File path (physical) - the full path to the actual location of the files in the filesystem (used for backend application reads/writes to files)
2) URL path (logical) - the path to be appended to the URL (output to the browser for HTTP requests for files)

Example "File path": '/var/www/html/files'
Example "URL path": 'http://www.companyserver.com/images/'

With this type of separation, it would even be possible for very high-volume customers to place their album files on a separate server (or even several mirrored servers behind some type of load-balancing proxy), with the album server accessible via NFS for reads and writes by CPG's background processes, and the servers themselves accessible directly via end user web browsers. To use a programming analogy, CPG doesn't necessarily need to "own" the gallery files as long as it can pass end users a pointer (URL path) to the files.

In addition to giving customers flexibility with regard to storage of gallery files, there would be benefits when upgrading, since customers could have a configuration like this:

/path/to/cpg_image_data/albums  <-- storage location of 'albums/' data

/home/coppermine/cpg1.4.13 <--- separate CPG installs, each with own codebase and DB, but all pointing to same 'albums/' "File path"
/home/coppermine/cpg1.4.15
/home/coppermine/cpg1.4.16
/home/coppermine/cpg1.5-pre-alpha

To do a non-destructive upgrade, customers could simply unpack the new CPG into its own directory tree, load a copy of the CPG DB for the new install, run the DB upgrade script, and set the "File Path" to '/path/to/cpg_image_data/albums'. For customers that wanted to have separate sets of content for the different installs, this would still be possible by changing the "File path" to different locations.

Each install could have its own "URL Path", which would allow customers to implement the standard operating environment for enterprise software development (4 environments: Development, QA, Test, and Production), by mapping Apache aliases (or even virtual servers) to the different installs of CPG. For example:

Alias /development /home/coppermine/cpg1.6-pre-alpha
Alias /qa /home/coppermine/cpg1.5-pre-alpha
Alias /test /home/coppermine/cpg1.4.17
Alias /production /home/coppermine/cpg1.4.16

When the new version has been fully tested, moving users over to production would be simply a matter of changing the alias in Apache.

So what would be involved in making this change?

It's possible that all that would be needed is:

1) Two new DB config values with names '$CONFIG['album_url_path']' and '$CONFIG['album_filesystem_path']' to replace the existing '$CONFIG['fullpath']'.
2) Modification of current references to '$CONFIG['fullpath']', where appropriate. This would mean referencing '$CONFIG['album_filesystem_path']' where CPG accesses the filesystem directly, and referencing '$CONFIG['album_url_path']' where CPG builds a URL reference to send to the browser.
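
A minimal PHP sketch of points 1 and 2, assuming the two proposed settings existed. Neither 'album_filesystem_path' nor 'album_url_path' is a real CPG setting, and the helper names album_file() and album_url() are made up for illustration. The example values are the "File path" and "URL path" given earlier in this post; the picture name is invented.

```php
<?php
// Hypothetical settings from this feature request; CPG itself only has 'fullpath'.
$CONFIG = [
    'album_filesystem_path' => '/var/www/html/files',
    'album_url_path'        => 'http://www.companyserver.com/images/',
];

// Physical path: used when the application reads or writes the file itself.
function album_file(array $config, string $relative): string
{
    return rtrim($config['album_filesystem_path'], '/') . '/' . ltrim($relative, '/');
}

// Logical path: used when building a link that is sent to the browser.
function album_url(array $config, string $relative): string
{
    // Encode each path segment separately so the slashes survive.
    $segments = array_map('rawurlencode', explode('/', ltrim($relative, '/')));
    return rtrim($config['album_url_path'], '/') . '/' . implode('/', $segments);
}

$picture = 'holiday/beach 1.jpg'; // invented name, relative to the albums directory

echo album_file($CONFIG, $picture), "\n"; // /var/www/html/files/holiday/beach 1.jpg
echo album_url($CONFIG, $picture), "\n";  // http://www.companyserver.com/images/holiday/beach%201.jpg
```

The point of the split is that each existing use of 'fullpath' would go through exactly one of these two helpers, depending on whether the result is handed to the filesystem or to the browser.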

I am not a CPG expert, but I did a 'grep -r -I 'fullpath' *' on all the files in the CPG root and it showed that the 'fullpath' config value is only referenced in about 25 files (not including lang files). So it might not be a hugely complicated change.
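
For anyone who wants to repeat that count without a shell, here is a rough PHP equivalent. It is only a sketch: it looks at *.php files only (the grep above looks at every non-binary file), and it assumes that language files live in a directory named "lang". The function name files_mentioning() is made up.

```php
<?php
// Lists PHP files under $root whose source mentions $needle, skipping any
// path that contains a "lang" directory (assumed location of language files).
function files_mentioning(string $root, string $needle): array
{
    $matches = [];
    $walker = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($walker as $file) {
        if (!$file->isFile() || $file->getExtension() !== 'php') {
            continue;
        }
        $path = str_replace('\\', '/', $file->getPathname());
        if (strpos($path, '/lang/') !== false) {
            continue;
        }
        if (strpos(file_get_contents($file->getPathname()), $needle) !== false) {
            $matches[] = $path;
        }
    }
    sort($matches);
    return $matches;
}

// Directory to scan is the first command-line argument, defaulting to ".".
$hits = files_mentioning($argv[1] ?? '.', 'fullpath');
printf("%d files reference 'fullpath'\n", count($hits));
```

Keep the script outside the directory being scanned, otherwise it counts itself.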

I'm not sure how many enterprise customers are using CPG currently, but this would be an attractive feature for customers with large amounts of picture and other data to manage. And as current CPG users see their album directory getting bigger and bigger, I bet many of them would find it useful as well.

Does this sound doable?


slausen

Quote from: Nibbler on April 08, 2008, 01:23:21 AM
There's a gsoc proposal that would make adding this fairly easy.

http://forum.coppermine-gallery.net/index.php/topic,51449.0.html

That is an interesting proposal.

It seems to me that this feature request (adding two new paths '$CONFIG['album_url_path']' and '$CONFIG['album_filesystem_path']') might allow for straightforward implementation of several of the thoughts expressed in that GSOC proposal, in addition to the implementation options described at the top of this thread.

1) single off-site server
album_file_path = '/local/path/to/files'
album_url_path = 'http://s3.amazon.com/album'

2) round-robin style load balancer
album_file_path = '/local/path/to/nfs_mount' <-- NFS mount to primary mirror server; primary server replicates album data to slave servers
album_url_path = 'http://roundrobin.companyserver.com/albums'; <-- points at roundrobin (maybe squid or apache) proxy which cycles through servers

3) intelligent load balancer
album_file_path = '/local/path/to/nfs_mount' <-- NFS mount to primary mirror server; primary server replicates album data to slave servers
album_url_path = 'http://loadbalancer.companyserver.com/albums'; <-- points at intelligent loadbalancer (maybe squid or localdirector) proxy which routes requests to appropriate server


slausen

Quote from: slausen on April 08, 2008, 03:42:57 AM
[...]
1) single off-site server
album_file_path = '/local/path/to/files'
album_url_path = 'http://s3.amazon.com/album'
[...]

I just re-read this and noticed a mistake.

Obviously, the single off-site server option is not going to work for a third-party service like s3 (unless s3 offers some type of network filesystem access), but it would work for a separate server maintained by the customer, like this:

1) single server maintained by customer
album_file_path = '/local/path/to/nfs_mount' <-- path to NFS mount linked to separate hardware
album_url_path = 'http://image-optimized.companyserver.com/album'

or a separate disk on the same server, like this:

1.5) separate disk device on same server
album_file_path = '/disk/with/fast/seek/time/disk2/albums' <-- path to files
album_url_path = 'album/'
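
The examples in this thread use both absolute values ('http://image-optimized.companyserver.com/album') and relative ones ('album/') for the URL path, so the code would have to accept either. A minimal PHP sketch of one way to do that, assuming the setting existed; the function name album_url_prefix() and the gallery URL 'http://www.example.com/gallery/' are placeholders, not anything from CPG.

```php
<?php
// Returns the URL prefix for album files. $albumUrlPath is the hypothetical
// 'album_url_path' setting; $galleryBaseUrl is the URL of the CPG install.
function album_url_prefix(string $albumUrlPath, string $galleryBaseUrl): string
{
    // Absolute values (with a scheme, or protocol-relative) are used as-is.
    if (preg_match('#^(?:[a-z][a-z0-9+.-]*:)?//#i', $albumUrlPath) === 1) {
        return rtrim($albumUrlPath, '/') . '/';
    }
    // Anything else is treated as relative to the gallery itself.
    return rtrim($galleryBaseUrl, '/') . '/' . trim($albumUrlPath, '/') . '/';
}

$gallery = 'http://www.example.com/gallery/'; // placeholder

echo album_url_prefix('http://image-optimized.companyserver.com/album', $gallery), "\n";
// http://image-optimized.companyserver.com/album/
echo album_url_prefix('album/', $gallery), "\n";
// http://www.example.com/gallery/album/
```

Note that this sketch also treats a root-relative value such as '/album/' as relative to the gallery, which may or may not be what is wanted.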

Joachim Müller

The current design that only allows a relative path has been introduced to allow easy moving of coppermine, renaming the coppermine root folder or any folder above it and to allow easy migration from one server to the other. That's one of the main benefits of coppermine in my opinion. I'm not sure it's worth considering additional ways of addressing images or of storing the location of a file.
As suggested in the other thread, I can't see the point of this: those who know their way around and have the skills can easily use symlinks or different mounts on "real" server operating systems. The performance impact is negligible compared to the amount of CPU cycles burned by the "regular" usage of coppermine. "Ordinary" coppermine users are not aware of such stuff, and I'm opposed to changing this ease-of-use feature for the benefit of a small minority of users while creating huge issues for a larger number of users who don't know their way around so well.
Yes, you could add warnings to the docs about the impact of using another storage method, and yes, you could offer a wizard-like interface to help users migrate. But do docs get read? No. People constantly use their FTP app to delete files that already exist in coppermine's database, because they have no idea of the impact that this will have. We always have to tell them "you should have read the docs before doing something silly like XXX", but in the end this only shows that average users have no idea how server-driven applications work. A feature like the one you propose will be used not only by power users who know what they are doing, but by newbies as well. As a result, I'm afraid that we'd be getting a load of frustrated end users complaining that their gallery is broken. Imo there are more important features that should go into future versions. I can't see the point in this: webspace is so cheap. We should rather teach our users how to use it wisely instead of allowing them to store ridiculously large files off-site.
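
For reference, the symlink approach mentioned in this post can be sketched with PHP's own rename() and symlink() functions (the shell's mv and ln -s do the same job). This is an illustration under assumptions, not a tested migration procedure: the function name relocate_albums() and the paths in the usage comment are placeholders, and the web server must be configured to follow symlinks (for Apache, Options FollowSymLinks).

```php
<?php
// Moves an albums directory to another location and leaves a symlink behind,
// so a relative setting such as 'albums/' keeps working unchanged.
function relocate_albums(string $albumsDir, string $newLocation): void
{
    if (is_link($albumsDir)) {
        return; // already relocated
    }
    // rename() only moves a directory within one filesystem; moving across
    // devices needs a copy (for example with rsync) before this step.
    if (!rename($albumsDir, $newLocation)) {
        throw new RuntimeException("Could not move $albumsDir to $newLocation");
    }
    if (!symlink($newLocation, $albumsDir)) {
        throw new RuntimeException("Could not link $albumsDir to $newLocation");
    }
}

// Placeholder paths, not taken from a real install:
// relocate_albums('/home/coppermine/cpg1.4.16/albums', '/path/to/cpg_image_data/albums');
```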

slausen

Quote from: Joachim Müller on April 08, 2008, 08:08:49 AM
The current design that only allows a relative path has been introduced to allow easy moving of coppermine, renaming the coppermine root folder or any folder above it and to allow easy migration from one server to the other. [...]

Well...

I don't think separating the physical and logical locations is a non-standard approach, so I don't think it is likely to "confuse" users or cause "huge issues". If anything, it is standard in the web development world to allow for the setting of two paths. I would even go so far as to describe it as a "best practice". For example, a few open source applications that use an approach similar to the one I described in this feature request are Apache, gallery2, WordPress, and Drupal. Those apps seem to have survived without crashing and burning under a deluge of user complaints.

And I don't see how having two paths makes it any harder to do migrations. If anything, it seems to me that it would make it easier, as outlined above in my first post. I really don't see why any "wizard-like" migration interface would be required, since this is just one more config setting. Assuming that users are capable of entering data into the existing config settings, they should probably be able to handle one extra field.

I don't think that any warnings would be needed regarding additional "storage methods", since from a user perspective, this would just be one additional config setting. To me that's the beauty of it. Simply by adding one additional setting, you can greatly increase the power, sophistication, and possible use cases for CPG from a customer standpoint. If someone wanted to do enterprise-level performance optimization, the extra setting would give them that flexibility. But if they wanted to have a simplified setup, they could do that also. Maybe even make the simplified path setup the default.

As for your comments about past issues you have had personally with end-user confusion, I guess I can only say that there are always tradeoffs between power and simplicity... it still seems to me that adding one extra config field is a good trade for the extra flexibility you get from separating the logical and physical definitions of the file locations.

But I respect your opinion. ;D

Given the large amount of interest in the GSOC thread/project to address this same issue, I am curious as to what others may think. I realize it's possible that I may be pushing the limits of what CPG was designed to do, and that there may not be much interest in this. But it's possible that this one small change could provide a relatively short path to get to some of the things described in that GSOC thread, and if there is interest, it might be worth doing.

hypers

IMHO, this possibility could be very useful for a range of users. It certainly would be for me.

I'm not sure about the implementation difficulties, but it seems quite possible to me. Regarding the "user-confusing" issues: I suppose there's a way to make it more or less invisible to non-advanced users, for example a line in the config file with a warning not to change it until it is fully understood what is intended and what modifying the line will accomplish. This kind of warning is as common in web scripts as web scripts themselves :) and usually applies to the whole config file as well as to individual lines/settings. A non-advanced user is not supposed to touch the config file at all. A basic CPG installation must have this 'separation' feature switched off by default, so as long as he/she is a non-advanced user, he/she doesn't even realize that there is another way to manage album storage, or that there are settings for it. If he/she does come across this feature, he/she should definitely read up on it first.
And there's no insurance against users who delete/modify config file lines without realizing what they are doing at all.  ;)
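
"Switched off by default" could look roughly like the following PHP sketch. The two 'album_*' keys are the hypothetical settings proposed in this thread, the function name effective_album_paths() is made up, and 'albums/' is assumed to be the stock value of 'fullpath'.

```php
<?php
// Resolves the effective album locations. The two 'album_*' keys are the
// hypothetical settings from this thread; when they are missing or empty,
// both values fall back to the existing relative 'fullpath' setting.
function effective_album_paths(array $config): array
{
    $fallback = $config['fullpath'];
    return [
        'file' => !empty($config['album_filesystem_path']) ? $config['album_filesystem_path'] : $fallback,
        'url'  => !empty($config['album_url_path']) ? $config['album_url_path'] : $fallback,
    ];
}

// Default install: nothing changes for users who never touch the new settings.
print_r(effective_album_paths(['fullpath' => 'albums/']));

// Advanced install: both locations overridden (example values from this thread).
print_r(effective_album_paths([
    'fullpath'              => 'albums/',
    'album_filesystem_path' => '/path/to/cpg_image_data/albums/',
    'album_url_path'        => 'http://image-optimized.companyserver.com/album/',
]));
```

With this shape, a gallery whose owner never sets the extra values behaves exactly as it does today.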

I may be wrong, but I support this request, as I see much value and little risk in it.

Thank you.