JA-SIG 2008 Tuesday’s notes

By redhotikalegs

General Session: Moving Community Source to the Mainstream, Ira Fuchs, Mellon Foundation

Educause recognized uPortal as a ground-breaking open source project.

Legal, economic, psychological impediments to open source

Why doesn’t every campus use open source or participate in development?

Why growth is needed:

  • Communities are organic, so growing = life.
  • More usage = greater synergies, more feedback, more user-based support. Funders get more social returns, contributors get more ego boosts, and vendors are attracted and find ways to serve the communities. Growth begets growth and reduces overall costs.

Obstacles to growth:

  • Legal obstacles: witness the issues with Blackboard(?). Community source (c-source) is subjected to a double layer of risk assessment, e.g. for intellectual property (IP) infringement. Fuchs refutes this: campuses are always pushing the IP edge with standard research. Strategy: 501(c)(3)s to hold copyright and get universities out of the line of fire.
  • Need for professional support: as more institutions outsource key services like email, IT support may be shrinking at institutions. Vendors are critical to IT at universities. Many of the significant c-source projects were vendor-supported, e.g. through vendor tech support services.
  • Real cost of c-source ownership: we need to show the real costs of implementation; there are enough 2nd- and 3rd-generation adopters to help identify them. Fuchs is confident the costs are still competitive. C-source gives institutions greater agility.
  • Open source (o-source) as an anti-commercial movement: a concern among some top leaders of academia. However, big companies use open source software. Academia can be just as professional as business.
  • Wealthier institutions have an obligation to help less-wealthy institutions: o-source follows this model. Not all institutional leaders share this view. We need to work as a community to show the pay-offs to large institutions.
  • Perception & marketing: o-source risks are perceived as higher and benefits as lower. Clear, persistent communication is needed. We must communicate the value of participation and collaboration both horizontally & vertically. Software project wiki sites need to provide info for senior executives. Inadequate marketing materials are why o-source has not penetrated further. Marketing cannot be relegated to vendors only. Zotero, a Firefox plug-in, has marketing done well. Why shouldn’t academic institutions help out with other project needs like marketing, e.g. multilingual marketing? Shouldn’t open source and community source communities value this kind of contribution?

Projects to note:

RIT-Space (avail on website)

OpenCollection:

  • Any museum type, any metadata schema, multinational consortium; from collections to a full-fledged MIS. Mellon grant. Museum of the Moving Image in New York and Berkeley are leading the project, using SOA (Service Oriented Architecture). First design workshops.

VUE:

  • VUE website: concept mapping. Rich connections to external resources. Network analytics, filtering tools. Pathways and presentation support.

Sophie:

  • eBook authoring. Rich media, timelines, annotation, interactive conversations via Sophie server. Can use Flash, PageMaker, QuarkXPress. Created a book online, 10 pages per hour.
  • Try to locate the Sophie book: poema de la siguiriya gitana

Zotero:

  • Citation manager, bookmarks on steroids. Integration with the Internet Archive = permanent citations for web resources: IA will assign a permanent URI to whatever site you want and make it available through its own search interface for use by others. API: plug-ins for the plug-in, e.g. Vertov, which will let you cite a video, determine in and out points and add metadata. George Mason Univ., funded by Mellon. Now working on a Zotero server for sharing citations.

SEASR:

  • Rich media analytics for humanists and artists. Laptop to grid scale. Automated marshalling of resources. Components, workflow, SOA/web services.

Bamboo:

  • Chicago & Berkeley leading. Shared tech services for arts & humanities scholarship. Community design process. SOA. By using an SOA environment, hoping for greater applicability and sustainability. Maybe connect with MIT SIMILE, the Fedora project, standards-based annotation systems. First round of Bamboo workshops underway. projectbamboo.org.

Synergy among these projects is critical for every project’s sustainability.

To succeed, we need to market this to campus leaders: benefits minus costs.

Session: If we build it, will they come? Cornell DSpace, George Kozak

6 programmers, web designers. Topic of presentation : promotion of their repository

Cornell’s history:

  • Original Fedora work
  • digital preservation aDORe
  • physics arXiv

DigitalCommons@ILR

CUL Media Archive using Fedora & Ruby

Funding for deployment & maintenance of dspace thru foundation. Operational responsibility funded by grant.

Dean of Faculty’s dream: Open access for faculty. Create 193 communities. Grad Ofc: ETDs. Students offered print on demand services in exchange for voluntarily submitting.

Code enhancements by Cornell info tech grp:

  • quick submit program
  • view counter for items
  • Offered to load materials & provide metadata or tech spt
  • Other selling points: guaranteed open access, google harvesting, harvesting, guaranteed storage & web access.

Position of assoc univ librarian for scholarly communication & collections was created. AUL set up an IR team of librarians & tech staff.

Several upgrades.

  • rebranded site.
  • removed empty communities
  • the new paint attracted a lot more attention.

More requests for inputting items came in.

Physicists wanted the archives dept to store their videos. They submit video into their repositories. They are starting to stream videos.

Some collections: they provide CDs of content when desired.

Archived some websites that would no longer be sustained, using HTTrack (?) software.

Cornell will be using Sakai and Dspace to make content available through Sakai.

Focus on providing materials that were “losing their home” or previously unavailable on web.

Working with grad school to mandate e-submission.

Other avenues:

  • harvesting our domain
  • works seeking publisher
  • local communities

Size of repository tripled in one year. Many items added through batch loads.

Tracked increase in # of hits. 70% from robots. non-robot hits were around 130K to 150K per month

40% of downloads from robots.

George tracks hits and downloads, with and without bots. Tracks them by year and by item.

Tracked unique IPs, majority from outside univ.
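
The with-and-without-bots tallies described above could be computed along these lines. This is only a rough sketch, not Cornell’s actual tooling: the `Hit` record shape, the `BOT_MARKERS` list and the function names are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """One request from a web log (hypothetical record shape)."""
    ip: str
    user_agent: str
    item_id: str
    year: int


# Assumed, deliberately short list of substrings that mark crawler user agents.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_robot(user_agent: str) -> bool:
    """Return True when the user agent string looks like a crawler."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def summarize(hits: Iterable[Hit]) -> dict:
    """Count hits with and without robots, plus unique non-robot IPs."""
    total = 0
    human = 0
    human_ips = set()
    human_by_year_item = Counter()
    for hit in hits:
        total += 1
        if is_robot(hit.user_agent):
            continue  # robots count toward the total only
        human += 1
        human_ips.add(hit.ip)
        human_by_year_item[(hit.year, hit.item_id)] += 1
    return {
        "total_hits": total,
        "human_hits": human,
        "robot_share": (total - human) / total if total else 0.0,
        "unique_human_ips": len(human_ips),
        "human_hits_by_year_item": dict(human_by_year_item),
    }
```

Matching on user-agent substrings is the simplest possible filter; a real report would likely need a maintained robot list, since crawlers do not always identify themselves.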

Problem: How do you balance quantity vs. “scholarly”?

Library controversy about community content. Is it ephemeral, or should it be preserved?

Storage and network transmission costs are significant.

How do we measure success? hits? downloads? # collections?

Need for fulltime funded staff for outreach & recruitment.

They took out self-registration because people got in and wanted access to closed collections.

Kozak is working on a white paper to convince people why a repository is needed and why it needs staff.

Their new university librarian is Anne Kenney.

They are getting 2 new servers.

Another DSpace project, OpenPolicy, is thinking about charging for statistics. Kozak does charge for conversion to PDF. They are working on the statistics issue. Using Google Analytics.

Embargo code is from University of Maryland and being used in ETD collection.

National Library of Medicine is highest.

dspace.org is using Google Analytics to track their stats.

Work on submission form to make a quick submit routine was not fruitful. But got feedback that licence page would be best up front.

ETDs also being sent to Proquest. Workflow will be students submit to ecommons & library forwards it to proquest.

Students like the print on demand service, library will print & distribute to committee members.

A SHORT CONVERSATION WITH MICHELE KIMPTON

Statistics is a committer priority as of OR2008. The committers agreed they will take the UMinho statistics module and integrate it into a future release of DSpace, Manakin and (?) JSP UI. The Minho patch works with 1.4 only, not 1.5. Temporary solution for Manakin: Google Analytics JavaScript code in web pages. It will not show closed collection views, though. Mark mentioned Google Analytics presents some processing overhead and cost issues, too.

CONVERSATION WITH LIBRARIAN FROM WOODS HOLE

Copyright issues: Sherpa can be misleading. Must read the actual license agreement. A publisher may define “self-archiving” as archiving onto the author’s PC or a storage mechanism completely under the author’s control.

To get publications into her DSpace, she programs a zip code query into major database feeds, gets automated reports on what is getting published in her neighborhood, then contacts the authors for preprints, NOT published versions. Word docs qualify as preprints. She converts them to PDF before loading.
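
The zip code filtering step might look something like the sketch below. It is purely illustrative: the record fields (`author`, `title`, `zip`) and the function name are assumptions, and the actual database feeds and their query syntax were not described.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def preprint_requests(
    records: Iterable[Dict[str, str]], zip_codes: Set[str]
) -> Dict[str, List[str]]:
    """Map each local author to the titles to request preprints for.

    Each record is a hypothetical dict with "author", "title" and "zip"
    keys, standing in for one entry of an automated publication report.
    """
    requests: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        # Keep only publications whose author address is in the neighborhood.
        if record.get("zip") in zip_codes:
            requests[record["author"]].append(record["title"])
    return dict(requests)
```

Grouping by author means one contact per person, listing every paper for which a preprint is wanted.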

Session: FLUID: FLEXIBLE USER INTERFACE DESIGN

Adaptive Technology Resource Centre, U Toronto, world leader in accessibility and usability

Cross-project collaboration

Share UX (user experience) resources across projects, solve common challenges, recognize recurring user interface idioms and needs

How do non-technical people get involved in OSS? How to do distributed user testing?

Reusable, flexible, rich UI components; lightweight JavaScript; great interaction designs

  • UX toolkit
  • UI design patterns
  • UX walkthroughs
  • testing techniques
  • user profiles

what you need to design great user interfaces

  • components=recurring interactions
  • common activities uploading, finding, navigating thru content and tools, drag & drop
  • activities & contexts
  • UX walkthrus
  • checklists, pain points, solns, techniques

U-camps or user camps provide

  • basic UX vocabulary, techniques
  • OS distributed usability testing, competing with svcs like Morae
  • surveys, screen recording, keyboard tracking, etc. VU lab to be released soon.

UI design patterns

  • pattern = proven solution to a common problem in a specified context.
  • first open source pattern repository
  • share patterns across communities
  • www.uidesignpatterns.org

Goals:

  • make it easier for developers to build better, more accessible user interfaces.
  • support collaboration with designers
  • foster sharing of design and code
  • adaptable for a variety of tools & workflows
  • diverse presentation frameworks

fluid component:

  • client side: html, style sheets, javascript, accessibility metadata
  • server: ability to respond to RESTful requests (get, post), ability to deliver appropriate markup and data

UI adaptation

  • flexible layouts & linearization: switching from multiple to one column
  • enhanced nav aids: turn on/off sitemaps, breadcrumbs
  • keyboard support: shortcuts, navigation
  • work based on jQuery

fluid components built to work with portals. support for multiple instances. dom searches constrained to fragments.

Fearless javascript workshop wiki.fluidproject.org/x/71Mk

fluidproject.org,wiki.fluidproject.org

other dhtml toolkits with accessibility: dojo. Will be incl. in jquery release.

Graceful degradation issue. Possibility of over dependence on client side javascript. They are very interested in open source renderers.

SIMILE widgets like Timeline are great but not accessible.

Session: OPEN SOURCE LONG TERM PRESERVATION ARCHIVES: Richard Matthews, Sun Microsystems

Richard working on Honeycomb project

Sun Microsystems’ commitment to open source has strategic goals : increase core developers. This results in more partners, more awareness of trends. Also larger user community and funding support.

Solaris is open source software (oss). Commitment to port utilities to solaris. opensource.org.

Sun xVM is Sun’s counterpart to VMware (virtualization)

Sun announcing completely open platform including apache, php, ubuntu, synopsys, mysql, opensolaris & opensparc: hw & sw

www.opensolaris.org : site for new stuff

DTrace: open source dynamic tracing tool for debugging

www.sun-pasig.org May 27 will have more info about preservation archiving. sun preservation archiving community.

Reasons: compliance, book & image sharing, national heritage content, newspapers, data, applications & systems, journals, born digital, tiered repositories.

Proposed soln: fedora front end plus sun honeycomb

sam-qfs project : best policy based multi-tiered archive manager. Application transparent dynamic data movement, 4 tiers, local & remote, continuous archive=cdp, WORM & retention mgmt

Infinite archive system: scalable multi-tiered SAM-QFS, platform base, 10-256 TB systems, data-in-place upgrade

Tier 1: disk archive. Tier 2: disk cache. Tier 3: tape archive.

Serving Library Of Congress, petabyte of data, and Dept of Defense customers.

Sun storagetek 5800 Honeycomb

Smart, network attached, clustered, racked storage system

Metadata awareness built into design of box & data layout on disk

Open system, open source sw

RAIN architecture based on “cells” disk architecture

  • l2 load spreading switches
  • service processor
  • ea node Opteron-based SunFire server

Honeycomb

  • architecture optimized to store & retrieve unstructured fixed content. Object storage, metadata aware.
  • Dublin Core metadata
  • WebDAV
  • future: XAM (metadata & query model).
  • extreme data protection via RAID6. Mean time to data loss > 2M yrs
  • demoed running video and pulling out disks at the same time.
  • standard java & C APIs in SDK
  • horizontal scaling
  • dublin core is only beginning
  • platform agnostic

near future: onboard local data services available (’storage beans’)

why fedora & honeycomb?

  • To address scalability, need Fedora as aggregator of many different repositories.
  • designed w proper intelligence in proper places; metadata integral to storage, world-class reliability, persistency and scalability; end-2-end oss, automated wide area backup option

Storage beans

  • discrete services inside Honeycomb. example apps:
  • asynchronous background jobs : transformations (take all MP3s and turn into MP7 files : remastering of your files), periodic data scrubbing, duplicate consolidation (de-duping)
  • synchronous: audit logs, watermarking, encryption
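
As an illustration of the duplicate consolidation (de-duping) job mentioned above, here is a minimal sketch that groups stored objects by a content hash. It does not use the Honeycomb SDK or any storage bean API; the in-memory `objects` mapping and the function names are assumptions made purely for the example.

```python
import hashlib
from typing import Dict, List


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies identical content."""
    return hashlib.sha256(data).hexdigest()


def consolidate_duplicates(objects: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Group object IDs by content digest.

    `objects` is a hypothetical mapping of object ID to stored bytes.
    IDs are visited in sorted order so each group lists them predictably.
    """
    groups: Dict[str, List[str]] = {}
    for object_id in sorted(objects):
        digest = content_digest(objects[object_id])
        groups.setdefault(digest, []).append(object_id)
    return groups


def duplicate_ids(objects: Dict[str, bytes]) -> List[str]:
    """Return IDs whose content duplicates an earlier (sorted) ID."""
    extras: List[str] = []
    for ids in consolidate_duplicates(objects).values():
        extras.extend(ids[1:])  # the first ID in each group is kept
    return sorted(extras)
```

A background job could then replace each duplicate with a reference to the retained copy; how that would be done inside the storage system itself was not covered in the session.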

Fedora now runs on Solaris/Open Solaris

  • Server + storage reference configs
  • inclusion of fedora 3 in Open solaris as ‘Indiana’ repository
  • Fedora on Solaris
  • Johns Hopkins now using this config.

eresearch, preservation archive, publishing going into fedora commons and going to fast disk, honeycomb, tape.

Can control views of metadata, e.g. blot out privacy data in mri file. Handles embedded EXIF data.

www.sun.com/storagetek/disk_systems/enterprise/5800/index.xml

www.sun-pasig.org

storagetek/management_software/data_management/sam/index.xml

www.opensolaris.org/os/project/honeycomb

Questions/comments from audience:

issue with the tremendous number of filehandles for tiny bits of data. 5800 was designed for large data files.

issue with headroom on each node so you needed a lot more storage than advertised
